Hello,
Mathieu Othacehe <othacehe@gnu.org> writes:
> Hello,
>
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ‘/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv’
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
> all of this 404 thing, pushing something like:
> https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
> closes the communication.
>
> Any other cause I could be missing?
Looking at several recent 'cannot build missing derivation' build
failures on Cuirass, I see for example:
substitute:
substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
substitute: could not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ‘/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv’
So it seems guix-publish was under too heavy a load to reply in time,
and the nginx proxy in front of it returned a 504 (gateway timeout)
response for the narinfo request.
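To check this pattern across more build logs, something like the following could pair each failed narinfo fetch with the derivation later reported missing. This is only a rough sketch: the function name `correlate` and the regular expressions are my own, and it assumes the log lines look like the excerpt above (a 32-character store hash shared by the narinfo URL and the .drv file name).

```python
import re

# "could not fetch <url>/<hash>.narinfo <status>", as in the excerpt above.
FETCH_RE = re.compile(r"could not fetch \S+/([0-9a-z]{32})\.narinfo (\d{3})")
# "cannot build missing derivation <quote>/gnu/store/<hash>-<name>.drv<quote>".
MISSING_RE = re.compile(
    r"cannot build missing derivation [^/]*/gnu/store/([0-9a-z]{32})-(\S+?\.drv)"
)


def correlate(log_text):
    """Return (derivation name, HTTP status or None) for each missing derivation.

    The status is that of an earlier failed narinfo fetch with the same
    store hash, or None when no such fetch failure appears in the log.
    """
    failed = {}   # store hash -> HTTP status of the failed narinfo fetch
    results = []
    for line in log_text.splitlines():
        fetch = FETCH_RE.search(line)
        if fetch:
            failed[fetch.group(1)] = int(fetch.group(2))
            continue
        missing = MISSING_RE.search(line)
        if missing:
            results.append((missing.group(2), failed.get(missing.group(1))))
    return results


# Lines taken from the Cuirass log quoted above.
SAMPLE_LOG = """\
substitute: could not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation '/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv'
"""

if __name__ == "__main__":
    for name, status in correlate(SAMPLE_LOG):
        print(name, status)
```

A missing derivation reported with a status of None would hint at a cause other than a failed narinfo fetch.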
Looking into /var/log/guix-publish.log for a corresponding entry, I
found:
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
So the connection was apparently severed (?), resulting in the "broken
pipe" error.
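For finding other occurrences in guix-publish.log, a small filter along these lines might help. It is a sketch under a loud assumption: it attributes each "Broken pipe" line to the most recent GET line, which only holds if entries from concurrent requests are not interleaved in the log. The name `broken_pipe_requests` is hypothetical.

```python
import re

# "<date> <time> GET <path>", as in the guix-publish.log excerpt above.
GET_RE = re.compile(r"^(\S+ \S+) GET (\S+)")


def broken_pipe_requests(log_text):
    """Return (timestamp, path) for GET requests followed by a 'Broken pipe'.

    Assumption: a backtrace directly follows the request that caused it,
    so the most recent GET line is taken to be the affected request.
    """
    last_get = None
    hits = []
    for line in log_text.splitlines():
        match = GET_RE.match(line)
        if match:
            last_get = (match.group(1), match.group(2))
        elif "Broken pipe" in line and last_get is not None:
            hits.append(last_get)
            last_get = None  # count each request at most once
    return hits


# First and last lines of the excerpt quoted above.
SAMPLE_PUBLISH_LOG = """\
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
"""

if __name__ == "__main__":
    for timestamp, path in broken_pipe_requests(SAMPLE_PUBLISH_LOG):
        print(timestamp, path)
```

Comparing the returned timestamps with the times of the 504 responses in the nginx logs would show whether the two always coincide.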
Here's a different one:
substitute:
substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
substitute: could not fetch http://141.80.167.131/p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ‘/gnu/store/p2lfyvbxicjqsm4qp6368bx76gp0g948-python-astropy-healpix-0.7.drv’
It occurred around the same time, and the failure mode was the same, per
guix-publish.log:
2023-08-21 23:59:35 GET /p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 50> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
I wonder if these could be related to the DDoS protection discovered on
the Berlin network. I'll keep looking for other, potentially different
occurrences.
--
Thanks,
Maxim