!RROtHmAaQIkiJzJZZE:nixos.org

NixOS Infrastructure

277 Members
Next Infra call: 2024-07-11, 18:00 CEST (UTC+2) | Infra operational issues backlog: https://github.com/orgs/NixOS/projects/52 | See #infra-alerts:nixos.org for real time alerts from Prometheus.
88 Servers

19 Oct 2024
@hexa:lossy.networkhexa (signing key rotation when)
diff --cc src/hydra-queue-runner/build-remote.cc
index 1cabd291,b1629595..00000000
--- a/src/hydra-queue-runner/build-remote.cc
+++ b/src/hydra-queue-runner/build-remote.cc
@@@ -43,31 -48,63 +43,44 @@@ static Strings extraStoreArgs(std::stri
      return result;
  }
  
 -static void openConnection(::Machine::ptr machine, Path tmpDir, int stderrFD, SSHMaster::Connection & child)
 +static std::unique_ptr<SSHMaster::Connection> openConnection(
 +    ::Machine::ptr machine, SSHMaster & master)
  {
 -    std::string pgmName;
 -    Pipe to, from;
 -    to.create();
 -    from.create();
 -
 -    Strings argv;
 +    Strings command = {"nix-store", "--serve", "--write"};
      if (machine->isLocalhost()) {
 -        pgmName = "nix-store";
 -        argv = {"nix-store", "--builders", "", "--serve", "--write"};
 +        command.push_back("--builders");
 +        command.push_back("");
      } else {
 -        pgmName = "ssh";
 -        auto sshName = machine->sshName;
 -        Strings extraArgs = extraStoreArgs(sshName);
 -        argv = {"ssh", sshName};
 -        if (machine->sshKey != "") append(argv, {"-i", machine->sshKey});
 -        if (machine->sshPublicHostKey != "") {
 -            Path fileName = tmpDir + "/host-key";
 -            auto p = machine->sshName.find("@");
 -            std::string host = p != std::string::npos ? std::string(machine->sshName, p + 1) : machine->sshName;
 -            writeFile(fileName, host + " " + machine->sshPublicHostKey + "\n");
 -            append(argv, {"-oUserKnownHostsFile=" + fileName});
 -        }
 -        append(argv,
 -            { "-x", "-a", "-oBatchMode=yes", "-oConnectTimeout=60", "-oTCPKeepAlive=yes"
 -            , "--", "nix-store", "--serve", "--write" });
 -        append(argv, extraArgs);
 +        command.splice(command.end(), extraStoreArgs(machine->sshName));
      }
  
 -    child.sshPid = startProcess([&]() {
 -        restoreProcessContext();
 -
 -        if (dup2(to.readSide.get(), STDIN_FILENO) == -1)
 -            throw SysError("cannot dup input pipe to stdin");
 -
 -        if (dup2(from.writeSide.get(), STDOUT_FILENO) == -1)
 -            throw SysError("cannot dup output pipe to stdout");
 +    auto ret = master.startCommand(std::move(command), {
 +        "-a", "-oBatchMode=yes", "-oConnectTimeout=60", "-oTCPKeepAlive=yes"
 +    });
  
 -        if (dup2(stderrFD, STDERR_FILENO) == -1)
 -            throw SysError("cannot dup stderr");
 +    // XXX: determine the actual max value we can use from /proc.
  
 -        execvp(argv.front().c_str(), (char * *) stringsToCharPtrs(argv).data()); // FIXME: remove cast
++<<<<<<< HEAD
 +    // FIXME: Should this be upstreamed into `startCommand` in Nix?
  
 -        throw SysError("cannot start %s", pgmName);
 -    });
 +    int pipesize = 1024 * 1024;
  
 -    to.readSide = -1;
 -    from.writeSide = -1;
 +    fcntl(ret->in.get(), F_SETPIPE_SZ, &pipesize);
 +    fcntl(ret->out.get(), F_SETPIPE_SZ, &pipesize);
  
 +    return ret;
++||||||| parent of 18466e83 (queue-runner: try larger pipe buffer sizes)
++    child.in = to.writeSide.release();
++    child.out = from.readSide.release();
++=======
+     child.in = to.writeSide.release();
+     child.out = from.readSide.release();
+ 
+     // XXX: determine the actual max value we can use from /proc.
+     int pipesize = 1024 * 1024;
+     fcntl(child.in.get(), F_SETPIPE_SZ, &pipesize);
+     fcntl(child.out.get(), F_SETPIPE_SZ, &pipesize);
++>>>>>>> 18466e83 (queue-runner: try larger pipe buffer sizes)
  }
15:45:23
@hexa:lossy.networkhexa (signing key rotation when)
which isn't super fun
15:45:33
@hexa:lossy.networkhexa (signing key rotation when)
and also I cannot access child any longer I think
15:45:57
@hexa:lossy.networkhexa (signing key rotation when)
huh, nvm
15:47:21
@hexa:lossy.networkhexa (signing key rotation when)
that patch was upstreamed
15:47:25
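The `XXX` comment in the diff above asks how to determine the usable maximum pipe size from /proc. A minimal Linux-only sketch of that idea follows. The helper names `readPipeMaxSize` and `growPipe` are hypothetical and are not part of Hydra or Nix. The sketch also assumes the behaviour documented in fcntl(2), where `F_SETPIPE_SZ` takes the size as a plain `int` argument; the pasted diff passes `&pipesize` instead, which may be worth double-checking.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // F_SETPIPE_SZ / F_GETPIPE_SZ are Linux extensions
#endif
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <fstream>
#include <optional>

// Hypothetical helper: read the largest pipe size an unprivileged process
// may request. Returns nullopt if the file is missing or unparsable.
std::optional<int> readPipeMaxSize(const char * path = "/proc/sys/fs/pipe-max-size")
{
    std::ifstream f(path);
    int value = 0;
    if (f >> value && value > 0) return value;
    return std::nullopt;
}

// Hypothetical helper: try to grow the pipe behind `fd` to `wanted` bytes,
// clamped to the system limit when it can be read. Returns the capacity
// reported by the kernel afterwards, or -1 if the resize was refused.
int growPipe(int fd, int wanted)
{
    if (auto max = readPipeMaxSize()) wanted = std::min(wanted, *max);
    // The size is passed by value, not as a pointer.
    if (fcntl(fd, F_SETPIPE_SZ, wanted) == -1) return -1;
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Called as `growPipe(fd, 1024 * 1024)`, this would stand in for the hard-coded size while staying within the limit for unprivileged processes. A return value of -1 means the kernel refused the resize and the pipe keeps its previous capacity.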
@joerg:thalheim.ioMic92
In reply to @hexa:lossy.network
https://md.darmstadt.ccc.de/VXRS-Wk9QvqKxUlNXNk9rA# overview over patches in h.n.o-2.1.9 branch
Okay, which one is the most important one?
15:48:51
@hexa:lossy.networkhexa (signing key rotation when)
I could pick most of the relevant patches w/o conflicts, so we're probably good
15:51:17
@hexa:lossy.networkhexa (signing key rotation when)
1c951d90 queue-runner: release machine reservation while copying outputs
10282dce queue-runner: switch to pseudorandom ordering of builds processing
a3fc251a queue runner: introduce some parallelism for remote paths lookup
56bd53c0 queue-runner: reduce the time between queue monitor restarts
6035a590 queue-runner: remove id > X from new builds query
268ba5e8 queue-runner: add prom metrics to allow detecting internal bottlenecks
3992ccd6 web: include current step status on /machines
ca737c7e queue-runner: limit parallelism of CPU intensive operations
15:51:54
@joerg:thalheim.ioMic92
hexa (signing key rotation when): can we not apply those to hydra master?
15:52:14
@hexa:lossy.networkhexa (signing key rotation when)
In theory we could, in practice I don't maintain hydra
15:52:33
@joerg:thalheim.ioMic92
In reply to @hexa:lossy.network
In theory we could, in practice I don't maintain hydra
I do.
15:52:47
@vcunat:matrix.orgvcunat *
a51bd392 queue-runner: limit parallelism of CPU...

this was the one mentioned. I'm not sure how much it's supposed to improve throughput, though.

15:53:22
@hexa:lossy.networkhexa (signing key rotation when)
updating rhea
15:53:47
@hexa:lossy.networkhexa (signing key rotation when)
In reply to @joerg:thalheim.io
I do.
I think all patches have seen enough exposure on hydra.nixos.org fwiw, but they could use a review, and some were picked from open PRs
15:54:33
@vcunat:matrix.orgvcunat
"steps completed per minute" keeps showing weird stalls since the upgrade, but maybe it just hasn't warmed up enough or something.
15:54:44
@joerg:thalheim.ioMic92
In reply to @hexa:lossy.network
I think all patches have seen enough exposure on hydra.nixos.org fwiw, but they could use a review, and some were picked from open PRs
Ok. I can get John for review
15:55:03
@rick:matrix.ciphernetics.nlMindavi
Also take a look at the open PRs, I made one yesterday-ish to fix a Perl issue
15:56:12
@rick:matrix.ciphernetics.nlMindavi
And another one to reduce verbose logging from git, but likely not so important
15:56:35
@rick:matrix.ciphernetics.nlMindavi
Still nice to have readable logs :)
15:56:50
@joerg:thalheim.ioMic92 *
Mindavi: ok. I'm currently making one that will make the nixos module use hydra from the repository. Should be easy to review
16:02:20
@hexa:lossy.networkhexa (signing key rotation when)
ok, looks like everything still applies to master
16:04:20
@k900:0upti.meK900
Did the Hydra update kill the unstable-large eval? :(
16:13:20
@hexa:lossy.networkhexa (signing key rotation when)
probably?
16:14:07
@hexa:lossy.networkhexa (signing key rotation when)
we have lots of defunct hydra-eval-jobs processes 🫠
16:15:06
@hexa:lossy.networkhexa (signing key rotation when)
nix-gc is fast now, given that it couldn't gc properly in a long time
16:16:41
@hexa:lossy.networkhexa (signing key rotation when)
104G free space
16:16:49
@k900:0upti.meK900
It's evaluating staging-next :(
16:19:14
@joerg:thalheim.ioMic92
Mindavi: merged your stuff
16:20:47
