!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

288 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
56 Servers


Sender | Message | Time
26 Jun 2024
@hexa:lossy.network hexa (UTC+1) | saw this behavior weeks ago as well | 15:43:12
@hexa:lossy.network hexa (UTC+1) | since I have to build pyannote-audio for whisper-ctranslate2 🫠 | 15:43:41
@connorbaker:matrix.org connor (burnt/out) (UTC-8) | Doesn’t torch still have that problem where, when compiling the CUDA kernels for flash attention, if you don’t limit parallelism to 1 then you can easily consume 100GB+ of RAM? | 16:23:11
@connorbaker:matrix.org connor (burnt/out) (UTC-8) | I remember enabling ZRAM partly because of that. And I’m pretty sure they were all zero pages too, because they compressed to absolutely nothing lmao | 16:23:46
@hexa:lossy.network hexa (UTC+1) | 128GB RAM here | 16:50:58
@hexa:lossy.network hexa (UTC+1) | With 50% zramswap | 16:51:15
@hexa:lossy.network hexa (UTC+1) | So yeah, pretty scuffed | 16:51:26
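For reference, the zram setup mentioned in these messages could look roughly like the fragment below on NixOS. This is a sketch, not anyone's actual configuration from the chat: `zramSwap.enable` and `zramSwap.memoryPercent` are NixOS module options, and the value 50 only mirrors the "50% zramswap" figure quoted above.

```nix
# configuration.nix fragment (illustrative only)
{
  zramSwap = {
    enable = true;       # create a compressed swap device backed by RAM
    memoryPercent = 50;  # let the zram device grow to at most 50% of total RAM
  };
}
```

Build parallelism is a separate knob: Nix's `cores` setting (for example `--cores 1` on the command line) sets `NIX_BUILD_CORES`, which only helps for builds that honour that variable.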
@ss:someonex.net SomeoneSerge (back on matrix) | Gosh ofborg darn evals are so slow | 18:32:10
@ss:someonex.net SomeoneSerge (back on matrix) | Here goes https://github.com/NixOS/nixpkgs/pull/256230 | 19:17:07
@hexa:lossy.network hexa (UTC+1) | SomeoneSerge (UTC+3): have you seen faiss fail on remote building? | 22:40:56
@hexa:lossy.network hexa (UTC+1) | the builders in this case are skylake | 22:41:05
@hexa:lossy.network hexa (UTC+1)
pytorch-metric-learning> tests/testers/test_global_embedding_space_tester.py Fatal Python error: Aborted
pytorch-metric-learning> 
pytorch-metric-learning> Thread 0x00007ffdbac006c0 (most recent call first):
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 331 in wait
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 629 in wait
pytorch-metric-learning>   File "/nix/store/mm7hz4pnl7m3cvg6vlgi8ngqqffpg0p1-python3.11-tqdm-4.66.4/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 1002 in _bootstrap
pytorch-metric-learning> 
pytorch-metric-learning> Current thread 0x00007ffff7eae740 (most recent call first):
pytorch-metric-learning>   File "/nix/store/lwybzhchhmq9g0crvfd5301ig5kwdqgp-faiss-1.7.4/lib/python3.11/site-packages/faiss/swigfaiss.py", line 1000 in get_num_gpus
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/inference.py", line 261 in try_gpu
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/inference.py", line 189 in __call__
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/accuracy_calculator.py", line 472 in get_accuracy
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/testers/global_embedding_space.py", line 21 in do_knn_and_accuracies
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/testers/base_tester.py", line 303 in test
pytorch-metric-learning>   File "/build/source/tests/testers/test_global_embedding_space_tester.py", line 43 in test_global_embedding_space_tester
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 579 in _callTestMethod
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 623 in run
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 678 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/unittest.py", line 321 in runtest
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 172 in pytest_runtest_call
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 240 in <lambda>
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 340 in from_call
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 239 in call_and_report
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 134 in runtestprotocol
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 339 in _main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 285 in wrap_session
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/config/__init__.py", line 174 in main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/config/__init__.py", line 197 in console_main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/pytest/__main__.py", line 7 in <module>
pytorch-metric-learning>   File "<frozen runpy>", line 88 in _run_code
pytorch-metric-learning>   File "<frozen runpy>", line 198 in _run_module_as_main
pytorch-metric-learning> 
pytorch-metric-learning> Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, faiss._swigfaiss, sklearn.__check_build._check_build, lz4._version, lz4.frame._frame, psutil._psutil_linux, psutil._psutil_posix, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, 
sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, PIL._imaging, PIL._imagingft (total: 152)
pytorch-metric-learning> /nix/store/7v4isfyqx892id87k3dsn3pa81wpw0d0-pytest-check-hook/nix-support/setup-hook: line 53:   511 Aborted                 (core dumped) /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/bin/python3.11 -m pytest -k "not TestDistributedLossWrapper and not TestInference and not test_get_nearest_neighbors and not test_tuplestoweights_sampler and not test_untrained_indexer and not test_metric_loss_only and not test_pca and not test_distributed_classifier_loss_and_miner"
23:58:21
@hexa:lossy.network hexa (UTC+1) | very cool | 23:58:22
27 Jun 2024
@hexa:lossy.network hexa (UTC+1) | crashes reliably here | 00:01:04
@hexa:lossy.network hexa (UTC+1) | reproduces on intel 6th gen, 8th gen and zen 3 | 00:13:45
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @hexa:lossy.network
SomeoneSerge (UTC+3): have you seen faiss fail on remote building?
Nope. I only used local builds, they seem to work consistently on a threadripper
09:32:20
@coruscate:matrix.org coruscate joined the room. | 10:13:08
@coruscate:matrix.org coruscate | is there any up to date documentation on how to use cuda properly? I find rather conflicting information and stumble across new information whenever I google. My OpenSYCL project seems to require 11.5 or earlier, while 10 would be in the common nixpkgs, so I thought I'd ask before going with that. | 10:18:32
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @coruscate:matrix.org
I think the section in the official manual is not in an unreasonable state, although messy
10:38:20
@ss:someonex.net SomeoneSerge (back on matrix)
In reply to @coruscate:matrix.org
Nixpkgs' unstable is at 12.2 🤔
10:39:19
@ss:someonex.net SomeoneSerge (back on matrix) | * I think the section in the official manual is not in an unreasonable state, although messy: https://nixos.org/manual/nixpkgs/unstable/#cuda | 10:39:49
@coruscate:matrix.org coruscate | should have said later then, i guess? <11.5 weirdly enough | 10:46:27
@ss:someonex.net SomeoneSerge (back on matrix) | Ah so opensycl requires an older release? | 10:54:07
@ss:someonex.net SomeoneSerge (back on matrix) | Just use cudaPackages_11_5. Start ad hoc, building only opensycl against it. If you wish to rebuild the whole package set, use an overlay. | 10:55:35
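The two approaches in the message above can be sketched in Nix as follows. This is an illustration only: it assumes the nixpkgs revision in use still provides `cudaPackages_11_5`, and that the package being overridden accepts a `cudaPackages` argument, which is not confirmed for `opensycl` anywhere in this chat. The names `cuda115Overlay`, `adHoc` and `wholeSet` are made up for the example.

```nix
let
  # Overlay: make CUDA 11.5 the default `cudaPackages` for the whole package set.
  cuda115Overlay = final: prev: {
    cudaPackages = final.cudaPackages_11_5;
  };

  # CUDA packages are unfree, so allowUnfree is needed in both cases.
  pkgs = import <nixpkgs> { config.allowUnfree = true; };

  pkgsCuda115 = import <nixpkgs> {
    config.allowUnfree = true;
    overlays = [ cuda115Overlay ];
  };
in
{
  # Ad hoc: rebuild only one package against CUDA 11.5.
  # Assumption: this package's expression takes `cudaPackages` as an argument.
  adHoc = pkgs.opensycl.override {
    cudaPackages = pkgs.cudaPackages_11_5;
  };

  # Whole set: every package that consumes `cudaPackages` now sees 11.5,
  # which generally means rebuilding those packages locally.
  wholeSet = pkgsCuda115;
}
```

The ad hoc override keeps rebuilds limited to one derivation, whereas the overlay changes the input of every CUDA-dependent package, so it is the more expensive option.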
@ss:someonex.net SomeoneSerge (back on matrix) | They don't seem to specify any constraints: https://github.com/archibate/OpenSYCL/blob/b919667ea53f99dbc55a9832f297cf0cb689034e/cmake/FindCUDA.cmake#L31 | 11:02:25
@ss:someonex.net SomeoneSerge (back on matrix) | * They don't seem to specify any constraints: https://github.com/archibate/OpenSYCL/blob/b919667ea53f99dbc55a9832f297cf0cb689034e/cmake/FindCUDA.cmake#L31 (oh, this is some fork) | 11:02:44
@coruscate:matrix.org coruscate | my issue seems to be the packaged clang version in the nixpkgs opensycl package, i'll probably simply repackage it. | 11:16:02
@matthewcroughan:defenestrate.it matthewcroughan | Does anybody get cicc died due to signal 9 (Kill signal) | 12:19:18
@matthewcroughan:defenestrate.it matthewcroughan | When trying to build onnxruntime with cuda support? | 12:19:24
