| 16 Nov 2024 |
hexa |
error: tensorflow-gpu-2.13.0 not supported for interpreter python3.12
| 20:45:57 |
hexa | the sound of nixos 24.05 hits hard | 20:46:03 |
| 17 Nov 2024 |
Gaétan Lepage | Yes... Let's hope zeuner finds the time to finish the TF bump... | 10:38:39 |
| 18 Nov 2024 |
hexa | wyoming-faster-whisper[4505]: File "/nix/store/dfp38l0dy3n97wvrgz5i62mwvsmshd3n-python3.12-faster-whisper-unstable-2024-07-26/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 145, in __init__
wyoming-faster-whisper[4505]: self.model = ctranslate2.models.Whisper(
wyoming-faster-whisper[4505]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wyoming-faster-whisper[4505]: RuntimeError: CUDA failed with error unknown error
systemd[1]: wyoming-faster-whisper-medium-en.service: Main process exited, code=exited, status=1/FAILURE
| 02:09:21 |
hexa | also loving unknown error errors | 02:09:26 |
hexa | wyoming-faster-whisper[4745]: File "/nix/store/dfp38l0dy3n97wvrgz5i62mwvsmshd3n-python3.12-faster-whisper-unstable-2024-07-26/lib/python3.12/site-packages/faster_whisper/transcribe.py", line 145, in __init__
wyoming-faster-whisper[4745]: self.model = ctranslate2.models.Whisper(
wyoming-faster-whisper[4745]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
wyoming-faster-whisper[4745]: RuntimeError: CUDA failed with error no CUDA-capable device is detected
| 02:10:44 |
hexa | baby steps | 02:10:46 |
hexa | I can confirm the card is still seated correctly 😄 | 02:10:58 |
hexa | hardening at work | 02:18:46 |
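One plausible reading of "hardening at work" above: systemd sandboxing options such as `PrivateDevices=true` hide `/dev/nvidia*` from the service, which is exactly how CUDA ends up reporting "no CUDA-capable device is detected". A hypothetical drop-in illustrating the idea (the drop-in filename and the exact device list are assumptions, not taken from the actual unit):

```ini
# Hypothetical drop-in:
# /etc/systemd/system/wyoming-faster-whisper-medium-en.service.d/cuda-access.conf
# PrivateDevices=true (common in hardened units) mounts a private /dev
# without the NVIDIA device nodes, so CUDA initialization fails.
[Service]
PrivateDevices=no
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
DeviceAllow=/dev/nvidia-uvm rw
```

On NixOS the same effect would be achieved by overriding the service's `serviceConfig` in the module system rather than via a drop-in file.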
connor (he/him) | Ugh I don’t like computers | 05:10:46 |
connor (he/him) | Anyway in the interest of splitting my attention ever more thinly I decided to start trying to work on some approach toward evaluation of derivations and building them
The idea being to have
- a service which is given a flake ref and an attribute path and efficiently produces a list of attribute paths to derivations existing under the given attribute path and stores the eval time somewhere
- a service which is given a flake ref and an attribute path to a derivation and produces the JSON representation of the closure of derivations required to realize the derivation, again storing eval time somewhere
- a service which functions as a job scheduler, using historical data about costs (space, time, memory, CPU usage, etc.) and information about locality (existing store paths on different builders) to realize a derivation, which is updated upon realization of a derivation
| 05:18:41 |
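The scheduler idea above (costs plus locality) can be sketched as a toy cost model: pick, for each derivation, the builder that minimizes estimated build time plus the cost of copying missing input store paths. This is not Connor's actual design; every name, number, and the flat per-path copy cost are hypothetical.

```python
# Toy locality-aware scheduler sketch; all values are made up.
from dataclasses import dataclass, field


@dataclass
class Builder:
    name: str
    speed: float  # relative CPU speed (1.0 = baseline)
    local_paths: set = field(default_factory=set)  # store paths already present


def pick_builder(builders, inputs, cpu_seconds, copy_cost_per_path=60.0):
    """Return (builder, estimated_seconds) minimizing build + transfer time."""
    def estimate(b):
        missing = len(inputs - b.local_paths)  # paths we'd have to copy over
        return cpu_seconds / b.speed + missing * copy_cost_per_path
    best = min(builders, key=estimate)
    return best, estimate(best)


builders = [
    Builder("fast-remote", speed=2.0, local_paths={"/nix/store/aaa-glibc"}),
    Builder("slow-local", speed=1.0,
            local_paths={"/nix/store/aaa-glibc", "/nix/store/bbb-cuda"}),
]

# A CUDA-heavy derivation: cheap to build, expensive to ship inputs for.
best, eta = pick_builder(
    builders,
    inputs={"/nix/store/aaa-glibc", "/nix/store/bbb-cuda"},
    cpu_seconds=60,
)
print(best.name, eta)  # slow-local 60.0 -- locality beats raw speed here
```

A real scheduler would learn `cpu_seconds` and transfer costs from the historical data the first two services record, rather than hardcoding them.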
connor (he/him) | Because why have one project when you can have many? | 05:18:55 |
connor (he/him) | https://github.com/ConnorBaker/nix-eval-graph
And I’ve decided to write it in Rust, which I am self teaching.
And I’ll probably use a graph database, because why not.
And I’ll use NixOS tests for integration testing, because also why not. | 05:20:02 |
connor (he/him) | All this is to say I am deeply irritated when I see my builders copying around gigantic CUDA libraries constantly. | 05:20:31 |
connor (he/him) | Unrelated to closure woes, I tried to package https://github.com/NVIDIA/MatX and https://github.com/NVIDIA/nvbench and nearly pulled my hair out. If anyone has suggestions for doing so without creating a patched and vendored copy of https://github.com/rapidsai/rapids-cmake or writing my own CMake for everything, I’d love to hear! | 05:23:26 |
connor (he/him) | Also, anyone know how the ROCm maintainers are doing? | 05:26:35 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org
Anyway in the interest of splitting my attention ever more thinly I decided to start trying to work on some approach toward evaluation of derivations and building them […]
Awesome! I've been bracing myself to look into that too. What's your current idea regarding costs and locality? | 07:09:42 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org Unrelated to closure woes, I tried to package https://github.com/NVIDIA/MatX and https://github.com/NVIDIA/nvbench and nearly pulled my hair out. If anyone has suggestions for doing so without creating a patched and vendored copy of https://github.com/rapidsai/rapids-cmake or writing my own CMake for everything, I’d love to hear! we'd need to do that if we were to package rapids itself too, wouldn't we? | 07:11:11 |
connor (he/him) | In reply to @ss:someonex.net Awesome! I've been bracing myself to look into that too. What's your current idea regarding costs and locality? Currently I don't know how I'd even model it... but I've been told that job scheduling is a well-researched problem in HPC communities ;) I started to write something about how I think of high-level tradeoffs between choosing where to build to build moar fast, reduce the number of rebuilds (if they are at all permitted), reduce network traffic, etc. and then thought "well what if the machines aren't homogeneous" and I've decided it's time for bed. | 08:40:34 |
connor (he/him) | In reply to @ss:someonex.net we'd need to do that if were to package rapids itself too, wouldn't we? I have been avoiding rapids so hard lmao 🙅‍♂️ | 08:40:49 |
connor (he/him) | Unrelated -- if anyone has experience with NixOS VM tests and getting multiple nodes to talk to each other, I'd appreciate pointers. ping can resolve hostnames but curl can't for some reason (https://github.com/ConnorBaker/nix-eval-graph/commit/c5a1e2268ead6ff6ffaab672762c1eedee53f403). | 08:43:02 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org Currently I don't know how I'd even model it... but I've been told that job scheduling is a well-researched problem in HPC communities ;) I started to write something about how I think of high-level tradeoffs between choosing where to build to build moar fast, reduce the number of rebuilds (if they are at all permitted), reduce network traffic, etc. and then thought "well what if the machines aren't homogenous" and I've decided it's time for bed. True. I'm still yet to read up on how SLURM and friends do this. Shameless plug: https://github.com/sinanmohd/evanix (slides) | 12:20:00 |
SomeoneSerge (back on matrix) | You should chat with picnoir too | 12:20:44 |
SomeoneSerge (back on matrix) | In reply to @connorbaker:matrix.org Unrelated -- if anyone has experience with NixOS VM tests and getting multiple nodes to talk to each other, I'd appreciate pointers. ping can resolve hostnames but curl can't for some reason (https://github.com/ConnorBaker/nix-eval-graph/commit/c5a1e2268ead6ff6ffaab672762c1eedee53f403). Should just work, what is the error? | 12:22:30 |
connor (he/him) | In reply to @ss:someonex.net True. I'm still yet to read up on how SLURM and friends do this. Shameless plug: https://github.com/sinanmohd/evanix (slides) Woah! Thanks for the links, I wasn't aware of these | 20:17:47 |
| 19 Nov 2024 |
hexa | python-updates with numpy 2.1 has landed in staging | 00:31:36 |
hexa | sowwy | 00:31:40 |
connor (he/him) | In reply to @ss:someonex.net Should just work, what is the error? Curl threw connection refused or something similar; I’ll try to get the log tomorrow | 06:34:11 |
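For context on the "connection refused" above: in a NixOS VM test, node hostnames resolve via the generated `/etc/hosts` (which is why `ping` works), so a refused connection often means the client raced ahead of the server and nothing was listening yet. The usual remedy is to gate the client on the server's port. A non-runnable fragment of such a test script (node names and the port are assumptions; `wait_for_open_port`, `wait_until_succeeds`, and `succeed` are the standard test-driver helpers, available as methods on the node objects the driver injects):

```python
# Fragment of a NixOS VM test script; `server` and `client` are
# globals injected by the test driver, not importable objects.
start_all()
server.wait_for_open_port(8080)  # don't curl before the service listens
client.wait_until_succeeds("ping -c 1 server")
client.succeed("curl --fail http://server:8080/")
```

If the service is bound only to localhost inside the server VM, curl from the client would still be refused even after the port check passes on the server itself.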