!eWOErHSaiddIbsUNsJ:nixos.org

NixOS CUDA

290 Members
CUDA packages maintenance and support in nixpkgs | https://github.com/orgs/NixOS/projects/27/ | https://nixos.org/manual/nixpkgs/unstable/#cuda
58 Servers



24 Sep 2024
@hexa:lossy.network hexa (UTC+1):
_______ TestKernelLinearOperatorLinOpReturn.test_solve_matrix_broadcast ________

self = <test.operators.test_kernel_linear_operator.TestKernelLinearOperatorLinOpReturn testMethod=test_solve_matrix_broadcast>

    def test_solve_matrix_broadcast(self):
        linear_op = self.create_linear_op()
    
        # Right hand size has one more batch dimension
        batch_shape = torch.Size((3, *linear_op.batch_shape))
        rhs = torch.randn(*batch_shape, linear_op.size(-1), 5)
        self._test_solve(rhs)
    
        if linear_op.ndimension() > 2:
            # Right hand size has one fewer batch dimension
            batch_shape = torch.Size(linear_op.batch_shape[1:])
            rhs = torch.randn(*batch_shape, linear_op.size(-1), 5)
>           self._test_solve(rhs)

linear_operator/test/linear_operator_test_case.py:1115: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
linear_operator/test/linear_operator_test_case.py:615: in _test_solve
    self.assertAllClose(arg.grad, arg_copy.grad, **self.tolerances["grad"])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.operators.test_kernel_linear_operator.TestKernelLinearOperatorLinOpReturn testMethod=test_solve_matrix_broadcast>
tensor1 = tensor([[[[ 1.8514e+04,  7.1797e+03, -1.1073e+04, -6.6690e+03,  1.2985e+04,
            6.8468e+03],
          [ 1.685...  -3.0153e+04],
          [-9.0042e+04, -1.3429e+04, -3.1822e+04,  1.3839e+04,  5.9735e+04,
           -5.4315e+04]]]])
tensor2 = tensor([[[[ 1.8514e+04,  7.1797e+03, -1.1073e+04, -6.6690e+03,  1.2985e+04,
            6.8468e+03],
          [ 1.685...  -3.0153e+04],
          [-9.0042e+04, -1.3429e+04, -3.1822e+04,  1.3839e+04,  5.9735e+04,
           -5.4315e+04]]]])
rtol = 0.03, atol = 1e-05, equal_nan = False

    def assertAllClose(self, tensor1, tensor2, rtol=1e-4, atol=1e-5, equal_nan=False):
        if not tensor1.shape == tensor2.shape:
            raise ValueError(f"tensor1 ({tensor1.shape}) and tensor2 ({tensor2.shape}) do not have the same shape.")
    
        if torch.allclose(tensor1, tensor2, rtol=rtol, atol=atol, equal_nan=equal_nan):
            return True
    
        if not equal_nan:
            if not torch.equal(tensor1, tensor1):
                raise AssertionError(f"tensor1 ({tensor1.shape}) contains NaNs")
            if not torch.equal(tensor2, tensor2):
                raise AssertionError(f"tensor2 ({tensor2.shape}) contains NaNs")
    
        rtol_diff = (torch.abs(tensor1 - tensor2) / torch.abs(tensor2)).view(-1)
        rtol_diff = rtol_diff[torch.isfinite(rtol_diff)]
        rtol_max = rtol_diff.max().item()
    
        atol_diff = (torch.abs(tensor1 - tensor2) - torch.abs(tensor2).mul(rtol)).view(-1)
        atol_diff = atol_diff[torch.isfinite(atol_diff)]
        atol_max = atol_diff.max().item()
    
>       raise AssertionError(
            f"tensor1 ({tensor1.shape}) and tensor2 ({tensor2.shape}) are not close enough. \n"
            f"max rtol: {rtol_max:0.8f}\t\tmax atol: {atol_max:0.8f}"
        )
E       AssertionError: tensor1 (torch.Size([2, 3, 4, 6])) and tensor2 (torch.Size([2, 3, 4, 6])) are not close enough. 
E       max rtol: 0.03577567            max atol: 0.00741313

linear_operator/test/base_test_case.py:46: AssertionError
11:40:36
@hexa:lossy.network hexa (UTC+1): I think this one has been failing for me on the linear-operator package 11:41:02
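For reference, a minimal sketch (not part of the chat, assuming a recent PyTorch) of the tolerance check the traceback above trips over: the reported worst-case relative error of about 3.6% is just past the test's rtol of 0.03, so torch.allclose rejects tensors that a slightly looser tolerance would accept.

import torch

# Magnitudes comparable to the gradients in the traceback (~1e4).
torch.manual_seed(0)
reference = torch.randn(4, 6) * 1e4
perturbed = reference * 1.036            # ~3.6% relative error, just over rtol=0.03

print(torch.allclose(reference, perturbed, rtol=0.03, atol=1e-5))  # False
print(torch.allclose(reference, perturbed, rtol=0.05, atol=1e-5))  # True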
@connorbaker:matrix.org connor (burnt/out) (UTC-8): As a sanity check, has anyone been able to successfully use torch.compile to speed up model training, or do they also get a Python stack trace when torch tries to call into OpenAI's triton? 15:23:08
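For context, a hedged sketch (not from the chat) of the kind of torch.compile call being asked about: on a CUDA device the default Inductor backend lowers to Triton kernels, which is where a broken or version-mismatched triton package tends to surface as a Python stack trace. The model and shapes here are purely illustrative.

import torch
import torch.nn as nn

# Toy model; any module works, the point is exercising the compile path.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

compiled = torch.compile(model)          # default backend is Inductor

x = torch.randn(32, 128, device=device)
y = compiled(x)                          # first call triggers compilation (and Triton on CUDA)
print(y.shape)                           # torch.Size([32, 10])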
25 Sep 2024
@ss:someonex.net SomeoneSerge (back on matrix): It used to work, but now our triton is lagging one major version behind 19:36:58
@glepage:matrix.org Gaétan Lepage: Because those geniuses are not able to tag a freaking release 20:20:55
@glepage:matrix.org Gaétan Lepage: https://github.com/triton-lang/triton/issues/3535 20:21:18
@ss:someonex.net SomeoneSerge (back on matrix): unstable-yyyy-mm-dd is OK for us; there were some minor but unresolved issues with the PR that does the bump, though 20:23:04
26 Sep 2024
@connorbaker:matrix.org connor (burnt/out) (UTC-8)
In reply to @glepage:matrix.org
https://github.com/triton-lang/triton/issues/3535
Well that’s an infuriating read
16:33:18
@glepage:matrix.org Gaétan Lepage: It's OK, OpenAI is just a small startup with only a few people. And deep learning is not even their main activity 17:07:38
@connorbaker:matrix.org connor (burnt/out) (UTC-8): Yeah, and they're definitely not a for-profit organization 17:20:14
@adam:robins.wtf: "open" is in their name 17:24:26
@gsaurel:laas.fr nim65s: it's such a joke that I find it sad it was not opened one day earlier 17:28:20
@glepage:matrix.orgGaétan Lepage "I propose a 200€ bounty for this PR. Please git tag the freaking commit. 21:09:04
@glepage:matrix.orgGaétan Lepage * "I propose a 200€ bounty for this PR. Please git tag the freaking commit." 21:09:07
@glepage:matrix.org Gaétan Lepage: The ease of spinning up a release is a decreasing function of the project/company resources. 21:09:40
@gsaurel:laas.fr nim65s: same issue on a one-man project abandoned for the last year or so: https://github.com/bab2min/EigenRand/issues/56 : <48h 21:49:56
28 Sep 2024
@shekhinah:she.khinah.xyz shekhinah changed their profile picture. 07:04:58
@kaya:catnip.ee kaya 𖤐 changed their profile picture. 16:55:46
1 Oct 2024
@-_o:matrix.org -_o joined the room. 21:00:15
2 Oct 2024
@hexa:lossy.network hexa (UTC+1): Gaétan Lepage: please take care of tensordict 00:25:19
@hexa:lossy.network hexa (UTC+1): [image: image.png] 00:25:22
@glepage:matrix.org Gaétan Lepage: Sure, I will have a look right now.
I have not faced any failure on my end, weird...
06:21:33
@glepage:matrix.org Gaétan Lepage: Is this on staging? 06:23:26
@glepage:matrix.org Gaétan Lepage: All failures that I was able to find on Hydra are timeouts or upstream dependency failures.
I was able to build tensordict on all architectures...
07:05:50
@hexa:lossy.network hexa (UTC+1): this is on trunk 11:03:39
@hexa:lossy.network hexa (UTC+1): then you probably need to increase meta.timeout 11:04:00
@glepage:matrix.org Gaétan Lepage: Now that you mention it, I remember this package being stuck (indefinitely) during mass rebuilds.
I don't know if increasing the timeout will help. When everything works fine, it builds in ~1min...
Also, nothing has changed in the derivation for the past few months.
11:47:12
@justbrowsing:matrix.org Kevin Mittman (UTC-8): Back from vacation 18:23:19


