!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

289 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
57 Servers

23 Sep 2024
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Kevin Mittman: does NVIDIA happen to have JSON (or otherwise structured) versions of their dependency constraints for packages somewhere, or are the tables on the docs for each respective package the only source? I'm working on update scripts and I'd like to avoid the manual stage of "go look on the website, find the table (it may have moved), and encode the contents as a Nix expression" 18:39:25
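For reference, NVIDIA does publish structured redistrib manifests (redistrib_<version>.json) next to the redist tarballs for products such as cuda, cudnn and cutensor; they describe package names, versions, per-architecture tarballs and hashes rather than the support-matrix tables from the docs, so they only partially cover the dependency constraints asked about here. A minimal sketch of fetching and listing one such manifest, with the URL pattern and the example product/version as assumptions to verify against the actual redist index:

# Hedged sketch: fetch an NVIDIA redistrib manifest and list its packages.
# The URL pattern and the example product/version are assumptions to check
# against https://developer.download.nvidia.com/compute/<product>/redist/.
import json
import urllib.request

REDIST_BASE = "https://developer.download.nvidia.com/compute/{product}/redist"

def fetch_manifest(product: str, version: str) -> dict:
    # e.g. product="cudnn", version="9.3.0" (hypothetical example values)
    url = f"{REDIST_BASE.format(product=product)}/redistrib_{version}.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    manifest = fetch_manifest("cudnn", "9.3.0")
    for name, entry in manifest.items():
        if not isinstance(entry, dict):
            continue  # skip top-level string fields such as "release_date"
        archs = [k for k in entry if k.startswith(("linux-", "windows-"))]
        print(f"{name} {entry.get('version', '?')}: {', '.join(archs)}")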
24 Sep 2024
@pascal.grosmann:scs.ems.host@pascal.grosmann:scs.ems.host set a profile picture.08:56:22
@hexa:lossy.networkhexa
_______ TestKernelLinearOperatorLinOpReturn.test_solve_matrix_broadcast ________

self = <test.operators.test_kernel_linear_operator.TestKernelLinearOperatorLinOpReturn testMethod=test_solve_matrix_broadcast>

    def test_solve_matrix_broadcast(self):
        linear_op = self.create_linear_op()
    
        # Right hand side has one more batch dimension
        batch_shape = torch.Size((3, *linear_op.batch_shape))
        rhs = torch.randn(*batch_shape, linear_op.size(-1), 5)
        self._test_solve(rhs)
    
        if linear_op.ndimension() > 2:
            # Right hand side has one fewer batch dimension
            batch_shape = torch.Size(linear_op.batch_shape[1:])
            rhs = torch.randn(*batch_shape, linear_op.size(-1), 5)
>           self._test_solve(rhs)

linear_operator/test/linear_operator_test_case.py:1115: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
linear_operator/test/linear_operator_test_case.py:615: in _test_solve
    self.assertAllClose(arg.grad, arg_copy.grad, **self.tolerances["grad"])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.operators.test_kernel_linear_operator.TestKernelLinearOperatorLinOpReturn testMethod=test_solve_matrix_broadcast>
tensor1 = tensor([[[[ 1.8514e+04,  7.1797e+03, -1.1073e+04, -6.6690e+03,  1.2985e+04,
            6.8468e+03],
          [ 1.685...  -3.0153e+04],
          [-9.0042e+04, -1.3429e+04, -3.1822e+04,  1.3839e+04,  5.9735e+04,
           -5.4315e+04]]]])
tensor2 = tensor([[[[ 1.8514e+04,  7.1797e+03, -1.1073e+04, -6.6690e+03,  1.2985e+04,
            6.8468e+03],
          [ 1.685...  -3.0153e+04],
          [-9.0042e+04, -1.3429e+04, -3.1822e+04,  1.3839e+04,  5.9735e+04,
           -5.4315e+04]]]])
rtol = 0.03, atol = 1e-05, equal_nan = False

    def assertAllClose(self, tensor1, tensor2, rtol=1e-4, atol=1e-5, equal_nan=False):
        if not tensor1.shape == tensor2.shape:
            raise ValueError(f"tensor1 ({tensor1.shape}) and tensor2 ({tensor2.shape}) do not have the same shape.")
    
        if torch.allclose(tensor1, tensor2, rtol=rtol, atol=atol, equal_nan=equal_nan):
            return True
    
        if not equal_nan:
            if not torch.equal(tensor1, tensor1):
                raise AssertionError(f"tensor1 ({tensor1.shape}) contains NaNs")
            if not torch.equal(tensor2, tensor2):
                raise AssertionError(f"tensor2 ({tensor2.shape}) contains NaNs")
    
        rtol_diff = (torch.abs(tensor1 - tensor2) / torch.abs(tensor2)).view(-1)
        rtol_diff = rtol_diff[torch.isfinite(rtol_diff)]
        rtol_max = rtol_diff.max().item()
    
        atol_diff = (torch.abs(tensor1 - tensor2) - torch.abs(tensor2).mul(rtol)).view(-1)
        atol_diff = atol_diff[torch.isfinite(atol_diff)]
        atol_max = atol_diff.max().item()
    
>       raise AssertionError(
            f"tensor1 ({tensor1.shape}) and tensor2 ({tensor2.shape}) are not close enough. \n"
            f"max rtol: {rtol_max:0.8f}\t\tmax atol: {atol_max:0.8f}"
        )
E       AssertionError: tensor1 (torch.Size([2, 3, 4, 6])) and tensor2 (torch.Size([2, 3, 4, 6])) are not close enough. 
E       max rtol: 0.03577567            max atol: 0.00741313

linear_operator/test/base_test_case.py:46: AssertionError
11:40:36
@hexa:lossy.networkhexaI think this one has been failing for me on the linear-operator package11:41:02
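For context on what that assertion checks: torch.allclose passes iff |tensor1 - tensor2| <= atol + rtol * |tensor2| elementwise, so the reported max rtol of ~0.0358 against rtol=0.03 means the worst gradient entry is off by roughly 3.6% relative error, just outside the tolerance. A toy sketch of the criterion (made-up numbers, not taken from the failing test):

# Toy illustration of the torch.allclose criterion used by assertAllClose above:
# two tensors pass iff |a - b| <= atol + rtol * |b| elementwise.
import torch

a = torch.tensor([100.0, 200.0, 300.0])
b = torch.tensor([100.5, 207.5, 300.0])  # second entry is ~3.6% off

print(torch.allclose(a, b, rtol=0.03, atol=1e-5))  # False: 7.5 > 1e-5 + 0.03 * 207.5
print(torch.allclose(a, b, rtol=0.04, atol=1e-5))  # True:  7.5 < 1e-5 + 0.04 * 207.5

# Same "max rtol" diagnostic the test prints:
rel = (torch.abs(a - b) / torch.abs(b)).max().item()
print(f"max rtol: {rel:0.8f}")  # ~0.03614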
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) As a sanity check — has anyone been able to successfully use torch.compile to speed up model training, or do they also get a python stack trace when torch tries to call into OpenAI’s triton 15:23:08
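A minimal smoke test along those lines, assuming PyTorch 2.x on a CUDA machine; on GPU the default Inductor backend generates Triton kernels, so a broken or version-mismatched triton usually surfaces as an exception on the first call of the compiled function:

# Hedged sketch of a torch.compile smoke test (assumes PyTorch 2.x; uses CUDA if available).
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # A few pointwise ops are enough to force the Inductor backend to generate a kernel.
    return torch.sin(x) + torch.cos(x) * x

device = "cuda" if torch.cuda.is_available() else "cpu"
compiled = torch.compile(f)

x = torch.randn(1024, device=device)
out = compiled(x)  # compilation (and any triton failure) happens on this first call
# Loose tolerance: fused kernels are not guaranteed to be bit-identical to eager mode.
print(torch.allclose(out, f(x), rtol=1e-3, atol=1e-5))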
25 Sep 2024
@ss:someonex.netSomeoneSerge (back on matrix)It used to work but now our triton is lagging 1 major version behind19:36:58
@glepage:matrix.orgGaétan LepageBecause those geniuses are not able to tag a freaking release20:20:55
@glepage:matrix.orgGaétan Lepage https://github.com/triton-lang/triton/issues/3535 20:21:18
@ss:someonex.netSomeoneSerge (back on matrix)unstable-yyyy-mm-dd is ok for us; there were some minor but unresolved issues with the PR that does the bump though20:23:04
26 Sep 2024
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8)
In reply to @glepage:matrix.org
https://github.com/triton-lang/triton/issues/3535
Well that’s an infuriating read
16:33:18
@glepage:matrix.orgGaétan LepageIt's OK, OpenAI is just a small startup with only a few people. And deep learning is not even their main activity17:07:38
@connorbaker:matrix.orgconnor (burnt/out) (UTC-8) Yeah and they're definitely not a for-profit organization 17:20:14
@adam:robins.wtf@adam:robins.wtf"open" is in their name17:24:26
@gsaurel:laas.frnim65sit's such a joke that I find it sad it was not opened one day earlier17:28:20
@glepage:matrix.orgGaétan Lepage "I propose a 200€ bounty for this PR. Please git tag the freaking commit." 21:09:04
@glepage:matrix.orgGaétan LepageThe ease of spinning up a release is a decreasing function of the project/company resources.21:09:40
@gsaurel:laas.frnim65ssame issue on a one-man project abandoned for the last year or so: https://github.com/bab2min/EigenRand/issues/56 : <48h21:47:05
28 Sep 2024
@shekhinah:she.khinah.xyzshekhinah changed their profile picture.07:04:58
@kaya:catnip.eekaya 𖤐 changed their profile picture.16:55:46
1 Oct 2024
@-_o:matrix.org-_o joined the room.21:00:15
2 Oct 2024
@hexa:lossy.networkhexa Gaétan Lepage: please take care of tensordict 00:25:19
@hexa:lossy.networkhexa [image attachment: image.png]00:25:22
@glepage:matrix.orgGaétan Lepage Sure, I will have a look right now.
I have not faced any failure on my end, weird...
06:21:33
@glepage:matrix.orgGaétan LepageIs this on staging?06:23:26
@glepage:matrix.orgGaétan Lepage All failures that I was able to find on hydra are timeouts or upstream dependency failures.
I was able to build tensordict on all architectures...
07:05:50
@hexa:lossy.networkhexathis is on trunk11:03:39
@hexa:lossy.networkhexathen you probably need to increase meta.timeout11:04:00
@glepage:matrix.orgGaétan Lepage Now that you mention it, I remember this package being stuck (indefinitely) during mass rebuilds.
I don't know if increasing the timeout will help. When everything works fine, it builds in ~1min...
Also, nothing has changed in the derivation for the past few months.
11:47:12
