!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

289 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers

26 Jun 2024
@glepage:matrix.orgGaétan Lepage
In reply to @hexa:lossy.network
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7981/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_aligned_k64.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7982/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) [7983/8253] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/mem_eff_attention/kernels/cutlassB_f16_notaligned_k65536_dropout.cu.o
stderr) nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
stderr) client_loop: send disconnect: Broken pipe
stderr) error: Nix daemon disconnected unexpectedly (maybe it crashed?)
stderr) error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
stderr)        last 1 log lines:
stderr)        > client_loop: send disconnect: Broken pipe
stderr)        For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
8253 ? That's torchWithCuda. (Why the hell do I know that by heart 🫠 ?)
13:26:36
@hexa:lossy.networkhexa (UTC+1)
because you are as emotionally damaged by cudaSupport as me
13:27:07
@hexa:lossy.networkhexa (UTC+1)
the only thing worse is being gaslit by bazel build jobs
13:27:28
@glepage:matrix.orgGaétan Lepage
In reply to @hexa:lossy.network
the only thing worse is being gaslit by bazel build jobs
I suffer from this too
13:27:56
@ss:someonex.netSomeoneSerge (back on matrix)
❯ ag cuda_runtime.h
...

pkgs/development/python-modules/torch/default.nix
453:        cuda_cudart.dev # cuda_runtime.h and libraries
13:32:00
@hexa:lossy.networkhexa (UTC+1)
the first 7800 build steps of torchWithCuda are essentially free
14:17:54
@hexa:lossy.networkhexa (UTC+1)
I'm not sure what happens between 7800 and 8253
14:18:01
@hexa:lossy.networkhexa (UTC+1)
it feels like I'm doing proof of work in that range
14:18:12
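The `[7981/8253]` prefixes in the build log quoted earlier look like ninja-style progress counters. As a minimal sketch (the helper names are hypothetical, and the sample line is a shortened stand-in rather than a real log line), such counters could be pulled out of a log to see how far a build got before it died. By step count, the range from 7800 to 8253 is only 453 steps, roughly 5.5% of the total, so step count clearly says little about wall-clock time here.

```python
import re

# Matches ninja-style progress prefixes such as "[7981/8253]".
PROGRESS = re.compile(r"\[(\d+)/(\d+)\]")


def parse_progress(line):
    """Return (done, total) from a log line, or None if it has no counter."""
    match = PROGRESS.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def remaining_fraction(done, total):
    """Fraction of build steps that have not run yet."""
    return (total - done) / total


# Shortened, made-up sample in the same shape as the quoted log lines.
sample = "stderr) [7981/8253] Building CUDA object example.cu.o"
print(parse_progress(sample))
print(round(remaining_fraction(7800, 8253), 3))
```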
@glepage:matrix.orgGaétan Lepage
cuda kernels :/
14:18:55
@glepage:matrix.orgGaétan Lepage
They are excruciatingly slow
14:19:17
@ss:someonex.netSomeoneSerge (back on matrix)
And it's good to limit parallelism there...
14:19:31
@hexa:lossy.networkhexa (UTC+1)
image.png
14:20:28
@hexa:lossy.networkhexa (UTC+1)
do we? 😄
14:20:29
@hexa:lossy.networkhexa (UTC+1)
I mean, it's all userspace load, so it's not too bad
14:20:44
@hexa:lossy.networkhexa (UTC+1)
some 5% system load
14:21:06
@hexa:lossy.networkhexa (UTC+1)
error: builder for '/nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv' failed with exit code 1;
       last 1 log lines:
       > client_loop: send disconnect: Broken pipe
       For full logs, run 'nix log /nix/store/5dx3yrij3jn4fybsmxvl6dk6d4hl7hzg-faiss-1.7.4.drv'.
15:40:24
@hexa:lossy.networkhexa (UTC+1)
remote building faiss fails repeatedly 😕
15:40:42
@hexa:lossy.networkhexa (UTC+1)
saw this behavior weeks ago as well
15:43:12
@hexa:lossy.networkhexa (UTC+1)
since I have to build pyannotate-audio for whisper-ctranslate2 🫠
15:43:41
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
Doesn’t torch still have that problem where, when compiling the CUDA kernels for flash attention, if you don’t limit parallelism to 1 then you can easily consume 100GB+ of RAM?
16:23:11
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
I remember enabling ZRAM partly because of that. And I’m pretty sure they were all zero pages too, because they compressed to absolutely nothing lmao
16:23:46
@hexa:lossy.networkhexa (UTC+1)
128GB RAM here
16:50:58
@hexa:lossy.networkhexa (UTC+1)
With 50% zramswap
16:51:15
@hexa:lossy.networkhexa (UTC+1)
So yeah, pretty scuffed
16:51:26
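The memory-versus-parallelism trade-off discussed above can be put as back-of-the-envelope arithmetic: cap the number of concurrent compile jobs so that their combined peak memory fits in RAM. The sketch below is only an illustration. `max_parallel_jobs` is a hypothetical helper, and the per-job peak, core count, and reserve are placeholder assumptions, not measurements from this conversation; only the 128 GB total comes from the chat.

```python
def max_parallel_jobs(total_ram_gib, per_job_peak_gib, cores, reserve_gib=8):
    """Largest job count whose combined peak memory fits in RAM, capped by cores.

    Always returns at least 1, since a single job has to run even if it swaps.
    """
    usable_gib = total_ram_gib - reserve_gib
    if usable_gib < per_job_peak_gib:
        return 1
    return max(1, min(cores, int(usable_gib // per_job_peak_gib)))


# Placeholder figures: 128 GiB machine, 32 cores, 10 GiB peak per heavy nvcc job.
print(max_parallel_jobs(128, 10, 32))  # → 12
```

A value like this could then be used as the build's job limit, for example through Nix's `cores` setting. Whether a given package's build system honours that limit is a separate question.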
@ss:someonex.netSomeoneSerge (back on matrix)
Gosh ofborg darn evals are so slow
18:32:10
@ss:someonex.netSomeoneSerge (back on matrix)
Here goes https://github.com/NixOS/nixpkgs/pull/256230
19:17:07
@hexa:lossy.networkhexa (UTC+1)
SomeoneSerge (UTC+3): have you seen faiss fail on remote building?
22:40:56
@hexa:lossy.networkhexa (UTC+1)
the builders in this case are skylake
22:41:05
@hexa:lossy.networkhexa (UTC+1)
pytorch-metric-learning> tests/testers/test_global_embedding_space_tester.py Fatal Python error: Aborted
pytorch-metric-learning> 
pytorch-metric-learning> Thread 0x00007ffdbac006c0 (most recent call first):
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 331 in wait
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 629 in wait
pytorch-metric-learning>   File "/nix/store/mm7hz4pnl7m3cvg6vlgi8ngqqffpg0p1-python3.11-tqdm-4.66.4/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/threading.py", line 1002 in _bootstrap
pytorch-metric-learning> 
pytorch-metric-learning> Current thread 0x00007ffff7eae740 (most recent call first):
pytorch-metric-learning>   File "/nix/store/lwybzhchhmq9g0crvfd5301ig5kwdqgp-faiss-1.7.4/lib/python3.11/site-packages/faiss/swigfaiss.py", line 1000 in get_num_gpus
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/inference.py", line 261 in try_gpu
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/inference.py", line 189 in __call__
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/utils/accuracy_calculator.py", line 472 in get_accuracy
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/testers/global_embedding_space.py", line 21 in do_knn_and_accuracies
pytorch-metric-learning>   File "/build/source/src/pytorch_metric_learning/testers/base_tester.py", line 303 in test
pytorch-metric-learning>   File "/build/source/tests/testers/test_global_embedding_space_tester.py", line 43 in test_global_embedding_space_tester
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 579 in _callTestMethod
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 623 in run
pytorch-metric-learning>   File "/nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/lib/python3.11/unittest/case.py", line 678 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/unittest.py", line 321 in runtest
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 172 in pytest_runtest_call
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 240 in <lambda>
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 340 in from_call
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 239 in call_and_report
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 134 in runtestprotocol
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/runner.py", line 115 in pytest_runtest_protocol
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 339 in _main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 285 in wrap_session
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_callers.py", line 102 in _multicall
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_manager.py", line 119 in _hookexec
pytorch-metric-learning>   File "/nix/store/lmhsqyj6106bg07x0lfznyw279x6qz4i-python3.11-pluggy-1.4.0/lib/python3.11/site-packages/pluggy/_hooks.py", line 501 in __call__
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/config/__init__.py", line 174 in main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/_pytest/config/__init__.py", line 197 in console_main
pytorch-metric-learning>   File "/nix/store/vmfa4h5g5mf0lzxgwpjs0q7806g0dgbc-python3.11-pytest-8.1.1/lib/python3.11/site-packages/pytest/__main__.py", line 7 in <module>
pytorch-metric-learning>   File "<frozen runpy>", line 88 in _run_code
pytorch-metric-learning>   File "<frozen runpy>", line 198 in _run_module_as_main
pytorch-metric-learning> 
pytorch-metric-learning> Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, 
scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, faiss._swigfaiss, sklearn.__check_build._check_build, lz4._version, lz4.frame._frame, psutil._psutil_linux, psutil._psutil_posix, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, 
sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, PIL._imaging, PIL._imagingft (total: 152)
pytorch-metric-learning> /nix/store/7v4isfyqx892id87k3dsn3pa81wpw0d0-pytest-check-hook/nix-support/setup-hook: line 53:   511 Aborted                 (core dumped) /nix/store/4rf5qybw37b4lh1g0xczlv14sqdbmnpm-python3-3.11.9/bin/python3.11 -m pytest -k "not TestDistributedLossWrapper and not TestInference and not test_get_nearest_neighbors and not test_tuplestoweights_sampler and not test_untrained_indexer and not test_metric_loss_only and not test_pca and not test_distributed_classifier_loss_and_miner"
23:58:21
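The traceback above ends with the interpreter aborting inside `faiss.get_num_gpus`. A Python `try`/`except` cannot intercept that, because the process is killed by a signal rather than by an exception. One possible workaround, sketched here as an assumption and not something proposed in the conversation, is to run the risky call in a child interpreter and look only at its exit status. `probe_ok` is a hypothetical helper; the faiss usage in the last comment is untested.

```python
import subprocess
import sys


def probe_ok(snippet, timeout=60):
    """Run `snippet` in a fresh interpreter; True only if it exits with status 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


# A child that calls abort() does not take the parent process down with it.
print(probe_ok("import os; os.abort()"))  # False
print(probe_ok("pass"))                   # True

# Assumed usage with faiss (only meaningful where faiss is importable):
# gpu_probe_ok = probe_ok("import faiss; faiss.get_num_gpus()")
```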
