!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

271 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
86 Servers


Sender Message Time
19 Oct 2024
@joerg:thalheim.io Mic92: It was confusing as hell anyway 15:33:35
@hexa:lossy.network hexa (signing key rotation when): for you maybe 15:33:45
@joerg:thalheim.io Mic92: I had some back from the future moments 15:33:49
@hexa:lossy.network hexa (signing key rotation when): not for me and delroth at the time 15:33:52
@joerg:thalheim.io Mic92: okay. Thanks for updating this. 15:35:31
@hexa:lossy.network hexa (signing key rotation when): https://md.darmstadt.ccc.de/VXRS-Wk9QvqKxUlNXNk9rA# overview over patches in h.n.o-2.1.9 branch 15:37:07
@k900:0upti.me K900: I don't think deploying Hydra master without the download slot limit patch is a good idea 15:39:10
@hexa:lossy.network hexa (signing key rotation when): I don't think running an EOL nixos is a good idea either 15:39:26
@hexa:lossy.network hexa (signing key rotation when): 🪨 me 🪨 (hard place) 15:39:47
@hexa:lossy.network hexa (signing key rotation when): I might try to port the patches to master, but I'm a dummy, sooooo 15:40:15
@vcunat:matrix.org vcunat: It was (also) meant to increase the throughput of compression of outputs, which seems like a very significant bottleneck these days, right? 15:44:26
@hexa:lossy.network hexa (signing key rotation when): so for example there is https://github.com/nixos/hydra/commit/18466e83261d39b997a73bbd9f0f249c3a91fbeb 15:44:59
@hexa:lossy.network hexa (signing key rotation when): the conflict looks like this 15:45:17
@hexa:lossy.network hexa (signing key rotation when):
diff --cc src/hydra-queue-runner/build-remote.cc
index 1cabd291,b1629595..00000000
--- a/src/hydra-queue-runner/build-remote.cc
+++ b/src/hydra-queue-runner/build-remote.cc
@@@ -43,31 -48,63 +43,44 @@@ static Strings extraStoreArgs(std::stri
      return result;
  }
  
 -static void openConnection(::Machine::ptr machine, Path tmpDir, int stderrFD, SSHMaster::Connection & child)
 +static std::unique_ptr<SSHMaster::Connection> openConnection(
 +    ::Machine::ptr machine, SSHMaster & master)
  {
 -    std::string pgmName;
 -    Pipe to, from;
 -    to.create();
 -    from.create();
 -
 -    Strings argv;
 +    Strings command = {"nix-store", "--serve", "--write"};
      if (machine->isLocalhost()) {
 -        pgmName = "nix-store";
 -        argv = {"nix-store", "--builders", "", "--serve", "--write"};
 +        command.push_back("--builders");
 +        command.push_back("");
      } else {
 -        pgmName = "ssh";
 -        auto sshName = machine->sshName;
 -        Strings extraArgs = extraStoreArgs(sshName);
 -        argv = {"ssh", sshName};
 -        if (machine->sshKey != "") append(argv, {"-i", machine->sshKey});
 -        if (machine->sshPublicHostKey != "") {
 -            Path fileName = tmpDir + "/host-key";
 -            auto p = machine->sshName.find("@");
 -            std::string host = p != std::string::npos ? std::string(machine->sshName, p + 1) : machine->sshName;
 -            writeFile(fileName, host + " " + machine->sshPublicHostKey + "\n");
 -            append(argv, {"-oUserKnownHostsFile=" + fileName});
 -        }
 -        append(argv,
 -            { "-x", "-a", "-oBatchMode=yes", "-oConnectTimeout=60", "-oTCPKeepAlive=yes"
 -            , "--", "nix-store", "--serve", "--write" });
 -        append(argv, extraArgs);
 +        command.splice(command.end(), extraStoreArgs(machine->sshName));
      }
  
 -    child.sshPid = startProcess([&]() {
 -        restoreProcessContext();
 -
 -        if (dup2(to.readSide.get(), STDIN_FILENO) == -1)
 -            throw SysError("cannot dup input pipe to stdin");
 -
 -        if (dup2(from.writeSide.get(), STDOUT_FILENO) == -1)
 -            throw SysError("cannot dup output pipe to stdout");
 +    auto ret = master.startCommand(std::move(command), {
 +        "-a", "-oBatchMode=yes", "-oConnectTimeout=60", "-oTCPKeepAlive=yes"
 +    });
  
 -        if (dup2(stderrFD, STDERR_FILENO) == -1)
 -            throw SysError("cannot dup stderr");
 +    // XXX: determine the actual max value we can use from /proc.
  
 -        execvp(argv.front().c_str(), (char * *) stringsToCharPtrs(argv).data()); // FIXME: remove cast
++<<<<<<< HEAD
 +    // FIXME: Should this be upstreamed into `startCommand` in Nix?
  
 -        throw SysError("cannot start %s", pgmName);
 -    });
 +    int pipesize = 1024 * 1024;
  
 -    to.readSide = -1;
 -    from.writeSide = -1;
 +    fcntl(ret->in.get(), F_SETPIPE_SZ, &pipesize);
 +    fcntl(ret->out.get(), F_SETPIPE_SZ, &pipesize);
  
 +    return ret;
++||||||| parent of 18466e83 (queue-runner: try larger pipe buffer sizes)
++    child.in = to.writeSide.release();
++    child.out = from.readSide.release();
++=======
+     child.in = to.writeSide.release();
+     child.out = from.readSide.release();
+ 
+     // XXX: determine the actual max value we can use from /proc.
+     int pipesize = 1024 * 1024;
+     fcntl(child.in.get(), F_SETPIPE_SZ, &pipesize);
+     fcntl(child.out.get(), F_SETPIPE_SZ, &pipesize);
++>>>>>>> 18466e83 (queue-runner: try larger pipe buffer sizes)
  }
15:45:23
@hexa:lossy.network hexa (signing key rotation when): which isn't super fun 15:45:33
@hexa:lossy.network hexa (signing key rotation when): and also I cannot access child any longer I think 15:45:57
@hexa:lossy.network hexa (signing key rotation when): huh, nvm 15:47:21
@hexa:lossy.network hexa (signing key rotation when): that patch was upstreamed 15:47:25
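
The `XXX` comment in the conflict above asks how to determine the actual maximum pipe size from /proc. A minimal sketch of one way to do that on Linux, assuming the limit is read from `/proc/sys/fs/pipe-max-size`. The helper names `readPipeMaxSize` and `growPipe` are hypothetical and not part of Hydra or Nix. Note that fcntl(2) documents the `F_SETPIPE_SZ` argument as an `int` passed by value, so the sketch passes the size directly.

```cpp
#include <fcntl.h>    // fcntl, F_SETPIPE_SZ, F_GETPIPE_SZ (Linux-specific)
#include <unistd.h>   // pipe, close
#include <algorithm>  // std::min
#include <climits>    // INT_MAX
#include <fstream>    // std::ifstream

// Hypothetical helper: read the system-wide limit on the pipe size that an
// unprivileged process may request. Returns `fallback` if /proc is unreadable.
static int readPipeMaxSize(int fallback)
{
    std::ifstream f("/proc/sys/fs/pipe-max-size");
    long value = 0;
    if (f >> value && value > 0)
        return static_cast<int>(std::min<long>(value, INT_MAX));
    return fallback;
}

// Hypothetical helper: try to grow the buffer of the pipe behind `fd` to
// `wanted` bytes, capped at the limit from /proc. A failed resize is ignored.
// Returns the capacity the kernel reports afterwards, or -1 if `fd` is not
// a pipe.
static int growPipe(int fd, int wanted)
{
    int target = std::min(wanted, readPipeMaxSize(wanted));
    fcntl(fd, F_SETPIPE_SZ, target);   // size goes in by value, not by pointer
    return fcntl(fd, F_GETPIPE_SZ);    // actual capacity after the attempt
}
```

With something like this, the two calls in the patch could become `growPipe(fd, 1024 * 1024)` on each pipe end. Whether this belongs in Hydra or in `startCommand` in Nix is the open question raised in the `FIXME`.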
@joerg:thalheim.io Mic92:
In reply to @hexa:lossy.network
https://md.darmstadt.ccc.de/VXRS-Wk9QvqKxUlNXNk9rA# overview over patches in h.n.o-2.1.9 branch
Okay, which one is the most important one?
15:48:51
@hexa:lossy.network hexa (signing key rotation when): I could pick most of the relevant patches w/o conflicts, so we're probably good 15:51:17
@hexa:lossy.network hexa (signing key rotation when):
1c951d90 queue-runner: release machine reservation while copying outputs
10282dce queue-runner: switch to pseudorandom ordering of builds processing
a3fc251a queue runner: introduce some parallelism for remote paths lookup
56bd53c0 queue-runner: reduce the time between queue monitor restarts
6035a590 queue-runner: remove id > X from new builds query
268ba5e8 queue-runner: add prom metrics to allow detecting internal bottlenecks
3992ccd6 web: include current step status on /machines
ca737c7e queue-runner: limit parallelism of CPU intensive operations
15:51:54
@joerg:thalheim.io Mic92: hexa (signing key rotation when): can we not apply those to hydra master? 15:52:14
@hexa:lossy.network hexa (signing key rotation when): In theory we could, in practice I don't maintain hydra 15:52:33
@joerg:thalheim.io Mic92:
In reply to @hexa:lossy.network
In theory we could, in practice I don't maintain hydra
I do.
15:52:47
@vcunat:matrix.org vcunat *:
a51bd392 queue-runner: limit parallelism of CPU...

this was the one mentioned. I'm not sure how much it's supposed to improve throughput, though.

15:53:22
@hexa:lossy.network hexa (signing key rotation when): updating rhea 15:53:47
@hexa:lossy.network hexa (signing key rotation when):
In reply to @joerg:thalheim.io
I do.
I think all patches have seen enough exposure on hydra.nixos.org fwiw, but they could use a review, and some were picked from open PRs
15:54:33
@vcunat:matrix.org vcunat: "steps completed per minute" keeps showing weird stalls since the upgrade, but maybe it just hasn't warmed up enough or something. 15:54:44
@joerg:thalheim.io Mic92:
In reply to @hexa:lossy.network
I think all patches have seen enough exposure on hydra.nixos.org fwiw, but they could use a review, and some were picked from open PRs
Ok. I can get John for review
15:55:03
