Downloading substitutes is too slow upon nginx cache misses

  • Done
Details
7 participants
  • dian_cecht
  • Ludovic Courtès
  • Maxim Cournoyer
  • Tobias Geerinckx-Rice
  • Mark H Weaver
  • Florian Pelz
  • Ricardo Wurmus
Owner
unassigned
Submitted by
dian_cecht
Severity
important

dian_cecht wrote on 20 Mar 2017 18:44
No notification of cache misses when downloading substitutes
(name . GuixSD)(address . bug-guix@gnu.org)
20170320184449.5ac06051@khaalida
Just ran guix pull and guix package -u, and found some of the programs
download VERY slowly (<100kb/s, usually around 95). I asked on #guix
and lfam mentioned it was probably a cache miss.

It would be nice if there was some notification that a cache miss
happened and the download will likely be slow, otherwise a user might
wonder what problem there is with their connection.
Tobias Geerinckx-Rice wrote on 20 Mar 2017 19:46
(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)
144e9ba8-af93-fb18-d2b9-f198ae7c11e9@tobias.gr
Hullo,

On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> Just ran guix pull and guix package -u, and found some of the programs
> download VERY slowly (<100kb/s, usually around 95). I asked on #guix
> and lfam mentioned it was probably a cache miss.

Do you mean that *substitutes* existed, but were not yet on
mirror.hydra.gnu.org and so were silently proxied from the much slower
hydra.gnu.org?

Or did Guix fall back to downloading *source* tarballs from some slow
upstream to build locally?

(I've no access to IRC at the mo'.)

Kind regards,

T G-R
dian_cecht wrote on 20 Mar 2017 19:52
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
20170320195247.05f72fc9@khaalida
On Tue, 21 Mar 2017 03:46:29 +0100
Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> Hullo,
>
> On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> > Just ran guix pull and guix package -u, and found some of the
> > programs download VERY slowly (<100kb/s, usually around 95). I
> > asked on #guix and lfam mentioned it was probably a cache miss.
>
> Do you mean that *substitutes* existed, but were not yet on
> mirror.hydra.gnu.org and so were silently proxied from the much slower
> hydra.gnu.org?

The URL displayed during the download was mirror.hydra.gnu.org.

>
> Or did Guix fall back to downloading *source* tarballs from some slow
> upstream to build locally?

It was a binary download, not source. At least, I don't recall anything
about compiles at any point (and I'm sure it didn't take long enough to
do that; one package was icecat which I'm sure wouldn't have downloaded
at 90k/s then compiled in less than 15 minutes (fwiw, according to my
build logs firefox takes about 2 hours to build, so unless icecat is
magically orders of magnitude faster to build, then I'm sure it was
just a download + install, and not download + compile + install)
Tobias Geerinckx-Rice wrote on 20 Mar 2017 20:57
(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)
8e7e07d1-563f-666f-2c32-2a772757c86f@tobias.gr
Ahoy,

On 21/03/17 03:52, dian_cecht@zoho.com wrote:
> The URL displayed during the download was mirror.hydra.gnu.org.
> [...] It was a binary download, not source.

Oh, OK. I'm not an expert on how Hydra's set up these days, but will
assume it's not too different from my own (a fast nginx proxy_cache,
mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

Whenever you're the first to request a substitute, mirror.hydra.gnu.org
transparently forwards the request to hydra.gnu.org.

The latter has to compress the response on the fly, leading to much
slower transfer speeds. It slowly sends it back to the mirror, which
slowly sends it on to you while also saving it on disc so all subsequent
downloads will be fast — by Hydra standards – and not involve hydra.gnu.org.
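
The hit/miss behaviour described above can be illustrated with a minimal Python sketch. It is a toy model, not the actual nginx or Hydra code: `ReadThroughCache` and `slow_origin` are hypothetical names, and the artificial delay only stands in for on-the-fly compression at the origin.

```python
import time


class ReadThroughCache:
    """Toy model of a caching proxy sitting in front of a slow origin."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin: callable(path) -> bytes; a hypothetical
        # stand-in for the build farm behind the mirror.
        self._fetch_from_origin = fetch_from_origin
        self._store = {}

    def get(self, path):
        """Return (status, body), where status is "HIT" or "MISS"."""
        if path in self._store:
            return "HIT", self._store[path]
        # First request for this path: go to the origin (slow), then
        # keep a copy so later requests never reach the origin.
        body = self._fetch_from_origin(path)
        self._store[path] = body
        return "MISS", body


def slow_origin(path, delay=0.05):
    """Pretend to pack and compress a nar on the fly."""
    time.sleep(delay)
    return b"nar for " + path.encode()


if __name__ == "__main__":
    mirror = ReadThroughCache(slow_origin)
    for attempt in (1, 2):
        start = time.perf_counter()
        status, _ = mirror.get("/nar/example")
        elapsed = time.perf_counter() - start
        print(f"request {attempt}: {status} in {elapsed:.3f}s")
```

Only the first requester pays the origin's cost; everyone after that is served from the stored copy.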

Maybe you knew all this, but it's also the reason that...

> On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> It would be nice if there was some notification that a cache miss
> happened and the download will likely be slow, otherwise a user might
> wonder what problem there is with their connection.

...I'm afraid this makes no sense from guix's point of view.

The term ‘cache miss’ here is an implementation detail of our current
Hydra set-up, not something guix can or IMO should care about. There are
hundreds of reasons why your connection might be slow at any given time.
Guix should just tell you so (it does), not guess why. Or worse: know.

(But if others disagree, we'll have to extend the Hydra API to somehow
relay this information to the client, in the spirit of the modern Web.)

HTTP 200½: OK, fine, but it's Going to Suck.

T G-R
dian_cecht wrote on 20 Mar 2017 21:48
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
20170320214809.466dc5fe@khaalida
On Tue, 21 Mar 2017 04:57:09 +0100
Tobias Geerinckx-Rice <me@tobias.gr> wrote:

> Ahoy,
>
> On 21/03/17 03:52, dian_cecht@zoho.com wrote:
> > The URL displayed during the download was mirror.hydra.gnu.org.
> > [...] It was a binary download, not source.
>
> Oh, OK. I'm not an expert on how Hydra's set up these days, but will
> assume it's not too different from my own (a fast nginx proxy_cache,
> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).
>
> Whenever you're the first to request a substitute,
> mirror.hydra.gnu.org transparently forwards the request to
> hydra.gnu.org.
>
> The latter has to compress the response on the fly, leading to much
> slower transfer speeds. It slowly sends it back to the mirror, which
> slowly sends it on to you while also saving it on disc so all
> subsequent downloads will be fast — by Hydra standards – and not
> involve hydra.gnu.org.
>
> Maybe you knew all this, but it's also the reason that...

I'm not familiar with the implementation details, nor how hydra is
currently set up.

> > On 21/03/17 02:44, dian_cecht@zoho.com wrote:
> > It would be nice if there was some notification that a cache miss
> > happened and the download will likely be slow, otherwise a user
> > might wonder what problem there is with their connection.
>
> ...I'm afraid this makes no sense from guix's point of view.
>
> The term ‘cache miss’ here is an implementation detail of our current
> Hydra set-up, not something guix can or IMO should care about. There
> are hundreds of reasons why your connection might be slow at any
> given time. Guix should just tell you so (it does), not guess why. Or
> worse: know.

I'm not suggesting having Guix tell me why my network is slow, only if
the download might be slow because it's having to pull from
hydra.gnu.org. Having Guix automagically troubleshoot networking
problems is well beyond the scope of a package manager, even one that
goes as far beyond simple package management as Guix does.

>
> (But if others disagree, we'll have to extend the Hydra API to somehow
> relay this information to the client, in the spirit of the modern
> Web.)

AFAIK, Guix devs are working on a replacement for the current build
system, so the sane option wouldn't be extending the current hydra
system to handle a new API call, but to try and work this type of
feature into the next system. Unless, of course, something like this
could be done in hydra reasonably easily, in which case why not.

Another option would be to have the mirrors automatically cache the
files as soon as they are available to try. I'd hope this would be how
things are handled already, but one never knows.
Tobias Geerinckx-Rice wrote on 20 Mar 2017 23:21
(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)
d8962205-0e0f-59ef-c957-923ba9bc01d4@tobias.gr
Mornin',

On 21/03/17 05:48, dian_cecht@zoho.com wrote:
> I'm not suggesting having Guix tell me why my network is slow,

I never mentioned your network. Your proxied connection to a substitute
server, yes. And, well, this very bug report is for Guix to tell you why
that's slow...

> only if the download might be slow because it's having to pull from
> hydra.gnu.org.

(Side note: ‘it’ here is mirror.hydra.gnu.org, never a well-configured
Guix client.)

So to implement this, the client would need to display a ‘warning’
message or flag sent by the substitute server, to notify the user that
their download might be slower... sometimes... by an unknown amount...
possibly?

But see, that wouldn't be true at all on my system (and surely others),
despite being set up nearly identically to Hydra. On the other hand, my
home download speed fluctuates wildly, even between simultaneous
connections to the same server. Whether or not a file is cached makes no
difference. To be told would be noise at best, misleading at worst.

I'd be against this only for those reasons, but I promise I'm not.

It's just all a bit vague, 's all, and my personal opinion is that once
the vagueness is resolved, not much will remain. But who knows.

> AFAIK, Guix devs are working on a replacement for the current build
> system, so the sane option wouldn't be extending the current hydra
> system to handle a new API call, but to try and work this type of
> feature into the next system.

My point is that it wouldn't be sane, and would be an ugly hack in
either system. Cuirass isn't really different from Hydra in this regard.

Me shut up now :-) I'm more interested in what others have to say.

Kind regards,

T G-R
dian_cecht wrote on 20 Mar 2017 23:49
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
20170320234912.46680062@khaalida
On Tue, 21 Mar 2017 07:21:54 +0100
Tobias Geerinckx-Rice <me@tobias.gr> wrote:
> > only if the download might be slow because [mirror.hydra] is having
> > to pull from hydra.gnu.org.
>
> So to implement this, the client would need to display a ‘warning’
> message or flag sent by the substitute server, to notify the user that
> their download might be slower... sometimes... by an unknown amount...
> possibly?

Simply a notification that mirror.hydra doesn't currently have a cached
version of the file and the download might be slower than normal would
be fine. As-is, looking up and seeing download speeds that amount to
less than 10% of one's normal bandwidth is a bit concerning since it
would seem like there is a problem. In this case, Guix would be giving
the user some notification that something /is/ out of the ordinary, and
possibly save the user some effort trying to determine the cause of the
slowdown.

> But see, that wouldn't be true at all on my system (and surely
> others), despite being set up nearly identically to Hydra. On the
> other hand, my home download speed fluctuates wildly, even between
> simultaneous connections to the same server.

I'm not sure how any of this matters. If you are running a local Hydra
instance or whatever, then I'd assume you'd be aware of what, if any,
problems that could arise. In this case, I'd hope hydra would allow you
to disable this feature.

> Whether or not a file is cached makes no difference. To be told would
> be noise at best, misleading at worst.

Had I been notified that mirror.hydra was currently pulling from hydra,
it would have saved me the time of jumping on IRC and asking what was
up, which only worked because someone was in #guix and had an idea of
what was going on; had that not been the case, I would have started
looking for the cause for the slowdown and wasted several minutes (at
least) trying to figure out what was wrong, and since it was on
mirror.hydra's end, I'd have no way to know the slowdown was on their
end and not mine, nor my ISP's problem.

> > AFAIK, Guix devs are working on a replacement for the current build
> > system, so the sane option wouldn't be extending the current hydra
> > system to handle a new API call, but to try and work this type of
> > feature into the next system.
>
> My point is that it wouldn't be sane, and would be an ugly hack in
> either system.

I don't see how this would have to be "an ugly hack". It's simply a
query and response. The simplest way I can see for this to work would
be for mirror.hydra to either just send the requested file, or a
response that the file isn't cached then start to trickle the file on to
the client.
Florian Pelz wrote on 21 Mar 2017 05:59
(address . 26201@debbugs.gnu.org)
be6b7b69-5ab9-3d4e-68fe-4d582699b2cc@pelzflorian.de
On Mon, 2017-03-20 at 21:48 -0700, dian_cecht@zoho.com wrote:
> Another option would be to have the mirrors automatically cache the
> files as soon as they are available to try. I'd hope this would be how
> things are handled already, but one never knows.
>

If it cached everything, it wouldn’t be a cache?
Tobias Geerinckx-Rice wrote on 21 Mar 2017 07:55
(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)
1bbd8ee3-1745-3642-27ed-f095c732dc11@tobias.gr
Hullo!

On 21/03/17 07:49, dian_cecht@zoho.com wrote:
> I'm not sure how any of this matters. If you are running a local
> Hydra instance or whatever, then I'd assume you'd be aware of what,
> if any, problems that could arise.

It matters for the reasons mentioned. It's not a ‘local Hydra’ & I have
no idea what problems you're talking about.

My problem is that every invocation of Guix already fills several
screens with Guile cache misses. Adding another warning (‘warning! the
system is working exactly as designed!’) will only serve to make those
other warnings look less silly, and I think that would be a shame.

To clarify:

- Warnings should be scary because warnings should be actionable.
There's nothing the user can or needs to do about a cache miss.
- It would be randomly shown to everyone, since this happens constantly.
- The behaviour warned about is not incorrect or abnormal.
- As already noted, it's how caching works.

> I don't see how this would have to be "an ugly hack". It's simply a
> query and response. The simplest way I can see for this to work would
> be for mirror.hydra to either just send the requested file, or a
> response that the file isn't cached then start to trickle the file on
> to the client.

Well, yeah... That's the ugly hack. :-)

It's not that your suggestion's hard to implement. In fact, it's
just one line for nginx (which it turns out I already had):

add_header X-Cache-Status $upstream_cache_status;

and 6 lines of lightly-tested Guile (attached)¹. And presto. This thing.

Doesn't mean we should.

Kind regards,

T G-R

¹: Why? Practice. Irony. Light masochism.
From 6d459a442d73628a0628385283c7cf04dff1b797 Mon Sep 17 00:00:00 2001
From: Tobias Geerinckx-Rice <me@tobias.gr>
Date: Tue, 21 Mar 2017 15:31:56 +0100
Subject: [PATCH] http-client: Warn on proxy cache misses.

Still not a good idea.

* guix/http-client.scm (http-fetch): Add #:peek-behind-cache? parameter
to expose caching proxy implementation details as a scary warning.
* guix/scripts/substitute.scm (fetch): Use it.
---
guix/http-client.scm | 10 +++++++++-
guix/scripts/substitute.scm | 3 ++-
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/guix/http-client.scm b/guix/http-client.scm
index 6874c51..2366f5e 100644
--- a/guix/http-client.scm
+++ b/guix/http-client.scm
@@ -2,6 +2,7 @@
;;; Copyright © 2012, 2013, 2014, 2015, 2016, 2017 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2015 Mark H Weaver <mhw@netris.org>
;;; Copyright © 2012, 2015 Free Software Foundation, Inc.
+;;; Copyright © 2017 Tobias Geerinckx-Rice <me@tobias.gr>
;;;
;;; This file is part of GNU Guix.
;;;
@@ -222,7 +223,8 @@ or if EOF is reached."
(define* (http-fetch uri #:key port (text? #f) (buffered? #t)
keep-alive? (verify-certificate? #t)
- (headers '((user-agent . "GNU Guile"))))
+ (headers '((user-agent . "GNU Guile")))
+ (peek-behind-cache? #f))
"Return an input port containing the data at URI, and the expected number of
bytes available or #f. If TEXT? is true, the data at URI is considered to be
textual. Follow any HTTP redirection. When BUFFERED? is #f, return an
@@ -253,8 +255,14 @@ Raise an '&http-get-error' condition if downloading fails."
(http-get uri #:streaming? #t #:port port
#:keep-alive? #t
#:headers headers))
+ ((headers)
+ (response-headers resp))
((code)
(response-code resp)))
+ (when (and peek-behind-cache?
+ (equal? (assoc-ref headers 'x-cache-status) "MISS"))
+ (warning (_ "the caching proxy is working properly!~%"))
+ (warning (_ "and there's nothing you can do about it.~%")))
(case code
((200)
(values data (response-content-length resp)))
diff --git a/guix/scripts/substitute.scm b/guix/scripts/substitute.scm
index faeb019..4a4f115 100755
--- a/guix/scripts/substitute.scm
+++ b/guix/scripts/substitute.scm
@@ -216,7 +216,8 @@ provide."
(unless (or buffered? (not (file-port? port)))
(setvbuf port _IONBF)))
(http-fetch uri #:text? #f #:port port
- #:verify-certificate? #f))))))
+ #:verify-certificate? #f
+ #:peek-behind-cache? #t))))))
(else
(leave (_ "unsupported substitute URI scheme: ~a~%")
(uri->string uri)))))
--
2.9.3
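
For readers who do not read Guile, the client-side check in the patch above can be sketched in Python. This is an illustration under assumptions, not part of Guix: `cache_status` and `miss_notice` are hypothetical helper names, the header name follows the `add_header X-Cache-Status` line shown earlier, and the message wording is invented.

```python
def cache_status(headers):
    """Return the upper-cased X-Cache-Status value, or None if absent.

    `headers` is any mapping of header names to string values; header
    names are compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "x-cache-status":
            return value.strip().upper()
    return None


def miss_notice(headers):
    """Return a human-readable note on a proxy cache miss, else None."""
    if cache_status(headers) == "MISS":
        # Hypothetical wording; the patch above prints a different message.
        return ("note: the mirror has not cached this substitute yet; "
                "the download may be slower than usual")
    return None


if __name__ == "__main__":
    print(miss_notice({"X-Cache-Status": "MISS"}))
    print(miss_notice({"X-Cache-Status": "HIT"}))
```

Like the patch, this only works against a proxy that chooses to expose the header; a server that does not send it yields no notice at all.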
dian_cecht wrote on 21 Mar 2017 08:32
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
20170321083239.3cbf1e8d@khaalida
On Tue, 21 Mar 2017 15:55:05 +0100
Tobias Geerinckx-Rice <me@tobias.gr> wrote:
> To clarify:
>
> - Warnings should be scary because warnings should be actionable.

There are warnings and there are errors. Warnings don't have to be
scary; I get them every time I update emacs because of duplicate icons
stored in two different directories in the store. Is that actionable?
Not as far as I am concerned, unless I want to hand delete something
from the store, which, as far as I understand it, shouldn't be done.

> There's nothing the user can or needs to do about a cache miss.

Please reread the 2nd part of my response in Message #23 in this
bugreport for why this is needed.

> - It would be randomly shown to everyone, since this happens
> constantly.

Unless mirror.hydra randomly loses data in its cache from hydra, it
won't be random in the least.

> - The behaviour warned about is not incorrect or abnormal.

No, but the behavior would inform the user that the unusual and random
slowdown isn't another problem and is because mirror.hydra is having to
update its cache, which, as I explained before, is useful information.

> [...]

Quite frankly I'd like someone else to take a look at this bug, if
for no other reason than I'm not sure if we're communicating clearly
with each other here. Most of what you are saying makes no sense
whatsoever and seems to miss the point I have attempted to make.

While I will thank you for actually writing a patch, saying "the
caching proxy is working properly! and there's nothing you can do about
it." seems rather cynical and clearly misses the point of what I'm
requesting here.
dian_cecht wrote on 21 Mar 2017 08:35
(name . Florian Pelz)(address . pelzflorian@pelzflorian.de)(address . 26201@debbugs.gnu.org)
20170321083536.639716a9@khaalida
On Tue, 21 Mar 2017 13:59:27 +0100
Florian Pelz <pelzflorian@pelzflorian.de> wrote:

> On Mon, 2017-03-20 at 21:48 -0700, dian_cecht@zoho.com wrote:
> > Another option would be to have the mirrors automatically cache the
> > files as soon as they are available to try. I'd hope this would be
> > how things are handled already, but one never knows.
> >
>
> If it cached everything, it wouldn’t be a cache?

If the point is to reduce the load on hydra, then at some point it
could have everything. If it doesn't, then why have a mirror when it's
just pulling right from the source all the time anyways?
Tobias Geerinckx-Rice wrote on 21 Mar 2017 09:07
(address . dian_cecht@zoho.com)(address . 26201@debbugs.gnu.org)
553699c2-fb50-5cf4-a80d-8ee0a70c039d@tobias.gr
On 21/03/17 16:32, dian_cecht@zoho.com wrote:
> Unless mirror.hydra randomly loses data in its cache from hydra, it
> won't be random in the least.

It will. Whether one is first to download from the cache after the
substitute is built is essentially random.

> Quite frankly I'd like someone else to take a look at this bug,

Glad you agree.

> if for no other reason than I'm not sure if we're communicating clearly
> with each other here. Most of what you are saying makes no sense
> whatsoever and seems to miss the point I have attempted to make.

I assure you it does not.

Kind regards,

T G-R
Ludovic Courtès wrote on 21 Mar 2017 09:43
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
8760j2wpfy.fsf@gnu.org
Hello!

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> Oh, OK. I'm not an expert on how Hydra's set up these days, but will
> assume it's not too different from my own (a fast nginx proxy_cache,
> mirror.hydra.gnu.org, in front of a slower build farm, hydra.gnu.org).

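I think there’s room for improvement in our nginx config at
<https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.

For instance, I just discovered ‘proxy_cache_lock’ while looking at
<http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
in reducing load on hydra.gnu.org. Surely there are other ways to tweak
caching.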

Besides, I’d like to use ‘guix publish’ on hydra.gnu.org. I suspect
it’s going to be faster than Starman (the HTTP server behind Hydra), and
also it uses an in-process gzip by default, as opposed to bzip2 which is
what Hydra uses (better compression ratio, but super CPU-intensive).

At any rate, clients should not paper over server-side performance
issues IMO.

Thanks,
Ludo’.
Tobias Geerinckx-Rice wrote on 21 Mar 2017 10:08
(address . ludo@gnu.org)(address . 26201@debbugs.gnu.org)
9889a4b5-c300-cd03-1095-1115428067fb@tobias.gr
Ludo',

On 21/03/17 17:43, Ludovic Courtès wrote:
> I think there’s room for improvement in our nginx config at
> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.
>
> For instance, I just discovered ‘proxy_cache_lock’ while looking at
> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
> in reducing load on hydra.gnu.org. Surely there are other ways to tweak
> caching.

Indeed! For reference, here's my cache configuration.

That's right. Now you can all¹ steal some criminally overpriced Belgian
bandwidth!

  server {
    server_name substitutes.tobias.gr;
    listen [::]:443 ssl http2;
    listen 443 ssl http2;

    # FIXME move to main LE cert
    ssl_certificate substitutes.pem;
    ssl_certificate_key substitutes.key;

    # "" means ‘inherit from upstream’ here.
    add_header Cache-Control "";
    # So does ‘off’. This is all a bit hacky.
    expires off;
    proxy_hide_header Set-Cookie;
    proxy_ignore_headers Set-Cookie;

    # Almost all traffic is already compressed.
    gzip off;

    ...

    location / {
      limit_except GET { deny all; }
      proxy_pass SUPER_SEKRIT_BACKEND;

      # https://www.nginx.com/blog/nginx-caching-guide
      add_header X-Cache-Status $upstream_cache_status;

      proxy_cache default;
      # We allow only GET requests, so don't waste key space:
      proxy_cache_key "$request_uri";
      proxy_cache_lock on;
      proxy_cache_lock_timeout 3h; #yolo
      proxy_cache_use_stale error timeout
                            http_500 http_502 http_503 http_504;
    }
    ...
  }

I'm sure it's hardly optimal (or, erm, ‘good’) either but it works.

> Besides, I’d like to use ‘guix publish’ on hydra.gnu.org. I suspect
> it’s going to be faster than Starman (the HTTP server behind Hydra), and
> also it uses an in-process gzip by default, as opposed to bzip2 which is
> what Hydra uses (better compression ratio, but super CPU-intensive).

Back when I used Hydra-the-software I did so briefly and I think it
worked. But no hard tests.

> At any rate, clients should not paper over server-side performance
> issues IMO.

Entirely off-topic, but this 'tude is a part of what drew me to Guix in
the first place. So, like, thanks, in general :-)

Kind regards,

T G-R

¹: Just put it *after* mirror.hydra.gnu.org, OK?
Ludovic Courtès wrote on 22 Mar 2017 15:06
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
87fui50xws.fsf@gnu.org
Hey Tobias,

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> On 21/03/17 17:43, Ludovic Courtès wrote:
>> I think there’s room for improvement in our nginx config at
>> <https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/mirror.conf>.
>>
>> For instance, I just discovered ‘proxy_cache_lock’ while looking at
>> <http://nginx.org/en/docs/http/ngx_http_proxy_module.html>; looks useful
>> in reducing load on hydra.gnu.org. Surely there are other ways to tweak
>> caching.
>
> Indeed! For reference, here's my cache configuration.
>
> That's right. Now you can all¹ steal some criminally overpriced Belgian
> bandwidth!

Heheh. :-)

> limit_except GET { deny all; }
> proxy_pass SUPER_SEKRIT_BACKEND;
>
> # https://www.nginx.com/blog/nginx-caching-guide
> add_header X-Cache-Status $upstream_cache_status;
>
> proxy_cache default;
> # We allow only GET requests, so don't waste key space:
> proxy_cache_key "$request_uri";
> proxy_cache_lock on;
> proxy_cache_lock_timeout 3h; #yolo
> proxy_cache_use_stale error timeout
> http_500 http_502 http_503 http_504;

I didn’t fully understand the docs for the last 3 directives here. For
instance, what happens when 10 clients do GET /nar/xyz-texlive? Do the
9 unlucky clients wait for 3 hours and then get 404?

Anyway, thanks for sharing your tips. :-)

> Entirely off-topic, but this 'tude is a part of what drew me to Guix in
> the first place. So, like, thanks, in general :-)

:-)

Ludo’.
Ludovic Courtès wrote on 22 Mar 2017 15:22
hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
87r31pyms2.fsf_-_@gnu.org
Hi again!

Until now hydra.gnu.org was using Hydra (the software) to serve not only
the Web interface but also all the .narinfo and /nar URLs (substitute
meta-data and substitutes).

Starting from now, hydra.gnu.org directs all .narinfo and corresponding
nar requests to ‘guix publish’ instead of Hydra.

‘guix publish’ should be faster and less resource-hungry than Hydra. It
uses in-process gzip for nar compression instead of bzip2 (I chose level
7, which seems to provide compression ratios close to what bzip2
provides with its default compression level, while being 3 times
faster). Unlike Hydra it never forks so for instance, 404 responses for
.narinfo URLs should be quicker. Hopefully, that will improve the
worst-case (cache miss) throughput.
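
The gzip-versus-bzip2 trade-off mentioned above is easy to measure locally. The following Python sketch is only a rough way to do that with the standard library; it is not how ‘guix publish’ is implemented, `measure` and `compare` are hypothetical helper names, and the sample payload is made up, so real nars will give different ratios and timings.

```python
import bz2
import gzip
import time


def measure(compress, decompress, data):
    """Compress `data`, check the round trip, and report size and time."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data  # sanity check: lossless round trip
    return {"bytes": len(packed), "seconds": elapsed}


def compare(data):
    """Compare gzip at level 7 with bzip2 at its default level."""
    return {
        "gzip-7": measure(lambda d: gzip.compress(d, compresslevel=7),
                          gzip.decompress, data),
        "bzip2-default": measure(bz2.compress, bz2.decompress, data),
    }


if __name__ == "__main__":
    # Made-up sample payload; substitute the bytes of a real nar to get
    # numbers that mean something for your workload.
    sample = b"(define-public hello (package (name \"hello\")))\n" * 20000
    for name, result in compare(sample).items():
        print(f"{name}: {result['bytes']} bytes in {result['seconds']:.4f}s")
```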

I configured nginx in such a way that the former Hydra-provided /nar
URLs (which are cached in nginx instances, in our
/var/guix/substitute/cache directories, etc.) are still available.
‘guix publish’ uses the /guix/nar URLs while Hydra uses /nar, so the
nginx config redirects to either Hydra or ‘guix publish’ depending on
the URL:


Hydra-provided .narinfos are still cached here and there; they’ll
progressively expire and be replaced by ‘guix publish’-provided
.narinfos.

Let me know if you notice anything fishy!

Ludo’.
Ricardo Wurmus wrote on 23 Mar 2017 03:29
(name . Ludovic Courtès)(address . ludo@gnu.org)
87mvccs2uu.fsf@elephly.net
Ludovic Courtès <ludo@gnu.org> writes:

> Until now hydra.gnu.org was using Hydra (the software) to serve not only
> the Web interface but also all the .narinfo and /nar URLs (substitute
> meta-data and substitutes).
>
> Starting from now, hydra.gnu.org directs all .narinfo and corresponding
> nar requests to ‘guix publish’ instead of Hydra.

That’s very cool! I’m happy to see more of Hydra replaced.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
Mark H Weaver wrote on 23 Mar 2017 11:36
Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(name . Ludovic Courtès)(address . ludo@gnu.org)
87inmzrgbf.fsf@netris.org
ludo@gnu.org (Ludovic Courtès) writes:

> Hi again!
>
> Until now hydra.gnu.org was using Hydra (the software) to serve not only
> the Web interface but also all the .narinfo and /nar URLs (substitute
> meta-data and substitutes).
>
> Starting from now, hydra.gnu.org directs all .narinfo and corresponding
> nar requests to ‘guix publish’ instead of Hydra.
>
> ‘guix publish’ should be faster and less resource-hungry than Hydra. It
> uses in-process gzip for nar compression instead of bzip2 (I chose level
> 7, which seems to provide compression ratios close to what bzip2
> provides with its default compression level, while being 3 times
> faster). Unlike Hydra it never forks so for instance, 404 responses for
> .narinfo URLs should be quicker. Hopefully, that will improve the
> worst-case (cache miss) throughput.

Excellent! Any improvement in 404 response time will be very helpful.
I've noticed that spikes of narinfo requests resulting in 404 have been a
major source of overloading on Hydra, because these requests cannot be
cached for very long. The reason: if we cache those failures for N
minutes, this effectively delays the appearance of new nars by N minutes
(if it was requested before that). This forces us to choose a small N
for negative cache entries, which means the cache is not much help here.
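
The negative-caching trade-off described above can be made concrete with a small Python sketch. It is a toy model under assumptions: `NegativeCache` is a hypothetical class, the clock is injected so the example does not need to sleep, and the 300-second lifetime is an arbitrary stand-in for N.

```python
class NegativeCache:
    """Toy proxy that remembers 404 answers for `ttl_seconds`."""

    def __init__(self, origin_has, ttl_seconds, clock):
        self._origin_has = origin_has    # callable(path) -> bool
        self._ttl = ttl_seconds          # lifetime of a cached 404 (N)
        self._clock = clock              # callable() -> current time
        self._not_found_until = {}       # path -> expiry time of cached 404

    def lookup(self, path):
        """Return the status code the proxy would answer with."""
        now = self._clock()
        expiry = self._not_found_until.get(path)
        if expiry is not None and now < expiry:
            # Served from the negative cache: the origin is not asked,
            # even if the narinfo has appeared there in the meantime.
            return 404
        if self._origin_has(path):
            self._not_found_until.pop(path, None)
            return 200
        self._not_found_until[path] = now + self._ttl
        return 404


if __name__ == "__main__":
    now = [0.0]
    published = set()
    proxy = NegativeCache(lambda p: p in published, 300, lambda: now[0])
    print(proxy.lookup("/abc.narinfo"))   # not built yet
    published.add("/abc.narinfo")         # the build farm finishes
    now[0] = 60.0
    print(proxy.lookup("/abc.narinfo"))   # cached 404 still in force
    now[0] = 300.0
    print(proxy.lookup("/abc.narinfo"))   # cached 404 has expired
```

A larger lifetime shields the origin from more repeated 404 requests but hides freshly built substitutes for longer, which is why N has to stay small.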

One question: what will happen in the case of multiple concurrent
requests for the same nar? Will multiple nar-pack-and-bzip2 processes
be run on-demand? Recall that the nginx proxy will pass all of those
requests through, and not create the cache entry until it has received a
complete response. This has caused us severe problems with huge nars
such as texinfo-texmf, to the point that we had to crudely block those
nar requests. Unfortunately, it is not obvious how to block the
associated narinfo requests due to the lack of job name in the URL, so
this results in failures on the client side that must be manually worked
around.

Thanks,
Mark
Tobias Geerinckx-Rice wrote on 23 Mar 2017 11:52
(address . mhw@netris.org)
25b2472a-c705-53fe-f94f-04de9a2d484e@tobias.gr
Mark,

On 23/03/17 19:36, Mark H Weaver wrote:
> One question: what will happen in the case of multiple concurrent
> requests for the same nar? Will multiple nar-pack-and-bzip2 processes
> be run on-demand?

I think this used to be the case with the previous nginx configuration,
but the recent changes pushed by Ludo' were aimed in part at preventing
that.

> Recall that the nginx proxy will pass all of those requests through,

Are you sure? I was under the impression¹ that this is exactly what
‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
— anyone! — correct me if I'm misguided.

Kind regards,

T G-R

¹:
Tobias Geerinckx-Rice wrote on 23 Mar 2017 12:25
(address . ludo@gnu.org)(address . 26201@debbugs.gnu.org)
a1f7cae6-0d37-6d6b-8ed9-8fd124fc037c@tobias.gr
Ludo',

On 22/03/17 23:06, Ludovic Courtès wrote:
> Tobias Geerinckx-Rice <me@tobias.gr> skribis:
>> proxy_cache_lock on;
>> proxy_cache_lock_timeout 3h; #yolo
>> proxy_cache_use_stale error timeout
>> http_500 http_502 http_503 http_504;
> I didn’t fully understand the docs for the last 3 directives here. For
> instance, what happens when 10 clients do GET /nar/xyz-texlive? Do the
> 9 unlucky clients wait for 3 hours and then get 404?

From ‘proxy_cache_lock’ [1]:

“When enabled, only one request at a time will be allowed to populate
a new cache element identified according to the proxy_cache_key
directive by passing a request to a proxied server. Other requests
of the same cache element will either wait for a response to appear
in the cache or the cache lock for this element to be released, up
to the time set by the proxy_cache_lock_timeout directive.”

Hmm. Good point: ‘to appear in the cache’, when we don't cache 404s or
even 410s.

I don't actually know.

Kind regards,

T G-R

[1]: https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
Maxim Cournoyer wrote on 23 Mar 2017 19:15
Re: bug#26201: No notification of cache misses when downloading substitutes
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
87efxnzagb.fsf@gmail.com
Hi!

Tobias Geerinckx-Rice <me@tobias.gr> writes:

> On 21/03/17 16:32, dian_cecht@zoho.com wrote:
>> Unless mirror.hydra randomly loses data in its cache from hydra, it
>> won't be random in the least.
>
> It will. Whether one is first to download from the cache after the
> substitute is built is essentially random.
>
>> Quite frankly I'd like someone else to take a look at this bug,
>
> Glad you agree.
>
>> if for no other reason than I'm not sure if we're communicating clearly
>> with each other here. Most of what you are saying makes no sense
>> whatsoever and seems to miss the point I have attempted to make.
>
> I assure you it does not.
>
> Kind regards,
>
> T G-R

Please allow me to jump in and voice my opinion here. To me it doesn't
make sense to concern the Guix client with implementation details of how
the caching of substitutes happens and what impact it has.

This situation is bound to change in the future or become irrelevant
(say, if a new build farm would be able to sustain higher transfer
speeds to the cache mirror), or if the caching implementation changes.

If the current cache building implementation is slow to the point of
being a problem it should be fixed (or documented).

Cheers,

Maxim

Mark H Weaver wrote on 24 Mar 2017 01:12
Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
87y3vvozy5.fsf@netris.org
Hi,

Tobias Geerinckx-Rice <me@tobias.gr> writes:

> On 23/03/17 19:36, Mark H Weaver wrote:
>> One question: what will happen in the case of multiple concurrent
>> requests for the same nar? Will multiple nar-pack-and-bzip2 processes
>> be run on-demand?
>
> I think this used to be the case with the previous nginx configuration,
> but the recent changes pushed by Ludo' were aimed in part at preventing
> that.
>
>> Recall that the nginx proxy will pass all of those requests through,
>
> Are you sure? I was under the impression¹ that this is exactly what
> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
> — anyone! — correct me if I'm misguided.

I agree that "proxy_cache_lock on" should prevent multiple concurrent
requests for the same URL, but unfortunately its behavior is quite
undesirable, and arguably worse than leaving it off in our case. See:

https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock

Specifically:

Other requests of the same cache element will either wait for a
response to appear in the cache or the cache lock for this element to
be released, up to the time set by the proxy_cache_lock_timeout
directive.

In our problem case, it takes more than an hour for Hydra to finish
sending a response for the 'texlive-texmf' nar. During that time, the
nar will be slowly sent to the first client while it's being packed and
bzipped on-demand.

IIUC, with "proxy_cache_lock on", we have two choices of how other
client requests will be treated:

(1) If we increase "proxy_cache_lock_timeout" to a huge value, then
there will be *no* data sent to the other clients until the first
client has received the entire nar, which means they wait over an
hour before receiving the first byte. I guess this will result in
timeouts on the client side.

(2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
will get failure responses until the first client has received the
entire nar.

Either way, this would cause users to see the same download failures
(requiring user work-arounds like --fallback) that this fix is intended
to prevent for 'texlive-texmf', but instead of happening only for that
one nar, it will now happen for *all* large nars.

Or at least that's what I'd expect based on my reading of the nginx docs
linked above. I haven't tried it.

IMO, the best solution is to *never* generate nars on Hydra in response
to client requests, but rather to have the build slaves pack and
compress the nars, copy them to Hydra, and then serve them as static
files using nginx.

A far inferior solution, but possibly acceptable and closer to the
current approach, would be to arrange for all concurrent responses for
the same nar to be sent incrementally from a single nar-packing process.
More concretely, while packing and sending a nar response to the first
client, the data would also be written to a file. Subsequent requests
for the same nar would be serviced using the equivalent of:

tail --bytes=+0 --follow FILENAME

This way, no one would have to wait an hour to receive the first byte.
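
To make the `tail --follow` idea concrete, here is a minimal Python sketch. It is not how Hydra, nginx, or ‘guix publish’ work; `produce`, `follow`, and `demo` are hypothetical names, and a thread stands in for the single nar-packing process. One producer appends to a file while several readers stream the same bytes as they appear.

```python
import os
import tempfile
import threading
import time


def produce(path, chunks, done, delay=0.01):
    """Stand-in for the single nar-packing process: append chunks to PATH."""
    with open(path, "ab") as out:
        for chunk in chunks:
            out.write(chunk)
            out.flush()          # make the bytes visible to followers
            time.sleep(delay)    # pretend that packing is slow
    done.set()                   # tell followers that no more data will come


def follow(path, done, poll=0.005):
    """Yield bytes from PATH as they appear, like `tail --bytes=+0 --follow`,
    and stop once the producer is finished and everything has been read."""
    with open(path, "rb") as src:
        while True:
            data = src.read(65536)
            if data:
                yield data
            elif done.is_set():
                rest = src.read()    # drain bytes written just before `done`
                if rest:
                    yield rest
                return
            else:
                time.sleep(poll)     # at end of file for now; try again


def demo(clients=3):
    """Run one producer and several followers; return what each one saw."""
    chunks = [b"nar-part-%d;" % i for i in range(20)]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    done = threading.Event()
    results = [None] * clients

    def client(i):
        results[i] = b"".join(follow(path, done))

    threads = [threading.Thread(target=produce, args=(path, chunks, done))]
    threads += [threading.Thread(target=client, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.remove(path)
    return results, b"".join(chunks)
```

Every follower starts receiving data as soon as the first chunk is written, and all of them end up with the complete byte stream while the packing work is done only once.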

What do you think?

Mark
Ludovic Courtès wrote on 24 Mar 2017 02:25
(name . Mark H Weaver)(address . mhw@netris.org)
87d1d710xc.fsf@gnu.org
Hi!

Mark H Weaver <mhw@netris.org> skribis:

> Tobias Geerinckx-Rice <me@tobias.gr> writes:

[...]

>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
>> — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
> https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
> Other requests of the same cache element will either wait for a
> response to appear in the cache or the cache lock for this element to
> be released, up to the time set by the proxy_cache_lock_timeout
> directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed and
> bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
> there will be *no* data sent to the other clients until the first
> client has received the entire nar, which means they wait over an
> hour before receiving the first byte. I guess this will result in
> timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
> will get failure responses until the first client has received the
> entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is intended
> to prevent for 'texlive-texmf', but instead of happening only for that
> one nar, it will now happen for *all* large nars.

My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads of the same item at the same time, while
also avoiding starvation (proxy_cache_lock_timeout should ensure that
nobody ends up waiting until the nar-compression process is done.)

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded.)
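
The trade-off between a huge and a small ‘proxy_cache_lock_timeout’ can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not a measurement of nginx: `simulate` is a hypothetical function, it ignores ‘proxy_cache_lock_age’, it assumes the lock holder is answered immediately, and it assumes that a request whose wait exceeds the timeout is passed to the proxied server, which is what the nginx documentation describes.

```python
def simulate(arrivals, fetch_time, lock_timeout):
    """Toy model of a single cache key under proxy_cache_lock.

    arrivals:     request times in seconds, sorted; the first takes the lock.
    fetch_time:   seconds the upstream needs to deliver the whole response.
    lock_timeout: seconds a request may wait for the cache entry.

    Returns (waits, upstream_requests), where waits[i] is how long request i
    sits idle before it starts being answered.
    """
    cached_at = arrivals[0] + fetch_time    # the cache entry is complete here
    waits, upstream = [0.0], 1              # the lock holder is proxied at once
    for t in arrivals[1:]:
        if t >= cached_at:
            waits.append(0.0)               # plain cache hit
        elif cached_at - t <= lock_timeout:
            waits.append(float(cached_at - t))   # blocked until the entry appears
        else:
            waits.append(float(lock_timeout))    # gave up waiting, goes upstream
            upstream += 1
    return waits, upstream


# A nar that takes an hour to produce, requested at t = 0 s, 10 s and 4000 s.
huge_timeout = simulate([0, 10, 4000], fetch_time=3600, lock_timeout=3 * 3600)
small_timeout = simulate([0, 10, 4000], fetch_time=3600, lock_timeout=5)
```

In this model the huge timeout keeps the second client idle for almost the full hour with a single upstream request, whereas the 5-second timeout bounds the wait at the cost of a second upstream request.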

> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.

The problem is that we want nars to be signed by the master node. Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file. Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
> tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

Yes. I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.

Other solutions I’ve thought about:

1. Produce narinfos and nars periodically rather than on-demand and
serve them as static files.

pros: better HTTP latency and bandwidth
pros: allows us to add a Content-Length for nars
cons: doesn’t reduce load on hydra.gnu.org
cons: introduces arbitrary delays in delivering nars
cons: difficult/expensive to know what new store items are available

2. Produce a narinfo and corresponding nar the first time they are
requested. So, the first time we receive “GET foo.narinfo”, return
404 and spawn a thread to compute foo.narinfo and foo.nar. Return
200 only when both are ready.

The precomputed nar{,info}s would be kept in a cache and we could
make sure a narinfo and its nar have the same lifetime, which
addresses one of the problems we have.

pros: better HTTP latency and bandwidth
pros: allows us to add a Content-Length for nars
pros: helps keep narinfo/nar lifetime in sync
cons: doesn’t reduce load on hydra.gnu.org
cons: exposes inconsistency between the store contents and the HTTP
response (you may get 404 even if the thing is actually in
store), but maybe that’s not a problem
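
Option 2 could be sketched roughly as follows in Python. This is an illustration only, not the ‘guix publish’ implementation: `BakingCache`, `bake`, and `drain` are hypothetical names, and `bake` stands in for whatever computes the narinfo and the nar of a store item.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BakingCache:
    """Answer 404 until both the narinfo and the nar of an item are ready."""

    def __init__(self, bake, workers=2):
        self._bake = bake                  # hypothetical: item -> (narinfo, nar)
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._lock = threading.Lock()
        self._ready = {}                   # item -> (narinfo, nar)
        self._pending = set()              # items currently being baked

    def get_narinfo(self, item):
        with self._lock:
            if item in self._ready:
                return 200, self._ready[item][0]
            if item not in self._pending:  # bake each item at most once
                self._pending.add(item)
                self._pool.submit(self._run, item)
            return 404, None

    def get_nar(self, item):
        with self._lock:
            if item in self._ready:
                return 200, self._ready[item][1]
            return 404, None

    def _run(self, item):
        result = self._bake(item)          # slow part, done outside the lock
        with self._lock:
            self._ready[item] = result     # narinfo and nar appear together
            self._pending.discard(item)

    def drain(self):
        """Wait for all queued baking jobs (handy for tests)."""
        self._pool.shutdown(wait=True)
```

The number of workers bounds how many items are baked at the same time, and since the narinfo and the nar are stored as one pair, neither can be served without the other.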

Thoughts?

Ludo’.
Tobias Geerinckx-Rice wrote on 26 Mar 2017 10:35
(address . mhw@netris.org)
1988d01c-1e67-bf47-2b43-cf3551d0651b@tobias.gr
Mark,

On 24/03/17 09:12, Mark H Weaver wrote:
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> [badly, ed.]

Eh. You're probably (and disappointingly) right.

When configuring my little cache, I had a clear idea of how such a cache
should work (basically, your last scenario below), then looked at the
nginx documentation to find what I had in mind. ‘proxy_cache_lock’ matched.

I should have been more pessimistic and done more testing.
Shame on me, &c. Too many other things on my mind. :-/

> Or at least that's what I'd expect based on my reading of the nginx docs
> linked above. I haven't tried it.

I can try to do some simple tests tomorrow.

> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.

A true mirror at last! Do we have the disc space for that?

And could Hydra actually handle compressing *everything*, without an
infinitely growing back-log? I don't have access to any statistics, but
I'm guessing that a fair number of package+versions are never actually
requested, and hence never compressed. This would change that.

> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file. Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
> tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.

^ This is so obviously the right solution, that it would be
disappointing if nginx really couldn't be made to do it. It already
buffers proxy responses to a temporary file anyway...

Kind regards,

T G-R
Ludovic Courtès wrote on 27 Mar 2017 04:20
Bandwidth when retrieving substitutes
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
8760ivm0dx.fsf_-_@gnu.org
Hi there!

ludo@gnu.org (Ludovic Courtès) skribis:

> ‘guix publish’ should be faster and less resource-hungry than Hydra. It
> uses in-process gzip for nar compression instead of bzip2 (I chose level
> 7, which seems to provide compression ratios close to what bzip2
> provides with its default compression level, while being 3 times
> faster). Unlike Hydra it never forks so for instance, 404 responses for
> .narinfo URLs should be quicker. Hopefully, that will improve the
> worst-case (cache miss) throughput.

Another interesting data point on the client side this time:

$ wget -O- https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |bunzip2 >/dev/null
--2017-03-27 13:12:50-- https://mirror.hydra.gnu.org/nar/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 53.01M 9.29MB/s in 5.5s

2017-03-27 13:12:55 (9.57 MB/s) - written to stdout [55582050]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 |gunzip >/dev/null
--2017-03-27 13:13:00-- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 59.19M 40.8MB/s in 1.4s

2017-03-27 13:13:02 (40.8 MB/s) - written to stdout [62068901]

$ wget -O- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17 >/dev/null
--2017-03-27 13:15:58-- https://mirror.hydra.gnu.org/guix/nar/gzip/v6rq6j9wdx8ixsks05dxhxr26jgmr6z3-mysql-5.7.17
Resolving mirror.hydra.gnu.org (mirror.hydra.gnu.org)... 131.159.14.26, 2001:4ca0:2001:10:225:90ff:fedb:c720
Connecting to mirror.hydra.gnu.org (mirror.hydra.gnu.org)|131.159.14.26|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/x-nix-archive]
Saving to: ‘STDOUT’

- [ <=> ] 59.19M 42.5MB/s in 1.4s

2017-03-27 13:16:00 (42.5 MB/s) - written to stdout [62068901]

40 MB/s vs. 10 MB/s! (Both items were cached on mirror.hydra.gnu.org.)

IOW, bunzip2 was the bottleneck when retrieving substitutes (and that’s
on an i7.) With ‘perf timechart’ we see that bunzip2 is indeed busy
all the time right from the start.
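
For anyone who wants to compare the two decompressors without going through the network, here is a rough Python sketch. It uses the standard `bz2` and `gzip` modules on synthetic data, so the absolute numbers will not match the `wget` runs above; `sample_data` and `decompression_rate` are hypothetical helpers, and gzip level 7 is taken from the ‘guix publish’ description quoted earlier.

```python
import bz2
import gzip
import random
import time


def sample_data(size=1 << 20, seed=0):
    """Build SIZE bytes of repetitive, text-like data (synthetic, not a nar)."""
    rng = random.Random(seed)
    words = [bytes(rng.choices(range(97, 123), k=rng.randint(3, 10)))
             for _ in range(2000)]
    out = bytearray()
    while len(out) < size:
        out += rng.choice(words) + b" "
    return bytes(out[:size])


def decompression_rate(decompress, blob, repeat=3):
    """Return the best observed rate, in MB of output per second."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        data = decompress(blob)
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


data = sample_data()
bz2_blob = bz2.compress(data)                    # bzip2 at its default level
gzip_blob = gzip.compress(data, compresslevel=7)

print("bunzip2: %.1f MB/s" % decompression_rate(bz2.decompress, bz2_blob))
print("gunzip:  %.1f MB/s" % decompression_rate(gzip.decompress, gzip_blob))
```

The ratio between the two rates depends on the machine and on the data, which is why no expected output is given here.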

Ludo’.
Tobias Geerinckx-Rice wrote on 27 Mar 2017 11:47
Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(address . 26201@debbugs.gnu.org)(address . ludo@gnu.org)
bad0ed66-6c44-7147-fc3d-01622cf6c62f@tobias.gr
Guix,

On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:
> I can try to do some simple tests tomorrow.

Two observations:

- ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
‘proxy_cache_lock_age’ must also be set to an equally ridiculously
long span. Otherwise, multiple requests will still be sent to ‘guix
publish’ if they are more than 5s apart. Bleh.

(The problem then becomes that clients will stall while the file is
being cached, as explained by Mark. curl patiently waited.)

- Say client A requests a nar from ‘guix publish’ (no nginx involved).
If another client requests the same nar while A's still downloading,
‘guix publish’ will... silently drop A's connection?
I was not expecting this.

Kind regards,

T G-R
Ludovic Courtès wrote on 28 Mar 2017 07:47
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)(address . 26201@debbugs.gnu.org)
87wpb931cd.fsf@gnu.org
Hey!

Tobias Geerinckx-Rice <me@tobias.gr> skribis:

> On 26/03/17 19:35, Tobias Geerinckx-Rice wrote:
>> I can try to do some simple tests tomorrow.
>
> Two observations:
>
> - ‘proxy_cache_lock_timeout’ alone won't suffice to serialise requests;
> ‘proxy_cache_lock_age’ must also be set to an equally ridiculously
> long span. Otherwise, multiple requests will still be sent to ‘guix
> publish’ if they are more than 5s apart. Bleh.
>
> (The problem then becomes that clients will stall while the file is
> being cached, as explained by Mark. curl patiently waited.)

Setting ‘proxy_cache_lock_timeout’ to 5s is reasonable I think: if
you’re unlucky, you wait for 5 seconds, and then we get ‘guix publish’
threads serving the same request in parallel; in the most common case,
there’s only ever one instance of a given request being served at a
given time.

> - Say client A requests a nar from ‘guix publish’ (no nginx involved).
> If another client requests the same nar while A's still downloading,
> ‘guix publish’ will... silently drop A's connection?
> I was not expecting this.

That would be a bug. Do you have an easy way to reproduce?

Thanks,
Ludo’.
Ludovic Courtès wrote on 8 Apr 2017 14:17
control message for bug #26201
(address . control@debbugs.gnu.org)
87wpauoayn.fsf@gnu.org
retitle 26201 Downloading substitutes is too slow upon nginx cache misses
Ludovic Courtès wrote on 8 Apr 2017 14:18
(address . control@debbugs.gnu.org)
87vaqeoayc.fsf@gnu.org
severity 26201 important
Ludovic Courtès wrote on 17 Apr 2017 14:36
Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(name . Mark H Weaver)(address . mhw@netris.org)
87inm2ogxl.fsf@gnu.org
Hello,

ludo@gnu.org (Ludovic Courtès) skribis:

> Other solutions I’ve thought about:
>
> 1. Produce narinfos and nars periodically rather than on-demand and
> serve them as static files.
>
> pros: better HTTP latency and bandwidth
> pros: allows us to add a Content-Length for nars
> cons: doesn’t reduce load on hydra.gnu.org
> cons: introduces arbitrary delays in delivering nars
> cons: difficult/expensive to know what new store items are available
>
> 2. Produce a narinfo and corresponding nar the first time they are
> requested. So, the first time we receive “GET foo.narinfo”, return
> 404 and spawn a thread to compute foo.narinfo and foo.nar. Return
> 200 only when both are ready.
>
> The precomputed nar{,info}s would be kept in a cache and we could
> make sure a narinfo and its nar have the same lifetime, which
> addresses one of the problems we have.
>
> pros: better HTTP latency and bandwidth
> pros: allows us to add a Content-Length for nars
> pros: helps keep narinfo/nar lifetime in sync
> cons: doesn’t reduce load on hydra.gnu.org
> cons: exposes inconsistency between the store contents and the HTTP
> response (you may get 404 even if the thing is actually in
> store), but maybe that’s not a problem

The ‘wip-publish-baking’ branch implements #2 as a new option to ‘guix
publish’. It gives some control on the upper bound on CPU usage since
we can specify how many worker threads are used.

I’ll finish it soon so we can experiment with it.

Thanks,
Ludo’.
Ludovic Courtès wrote on 18 Apr 2017 14:27
(name . Mark H Weaver)(address . mhw@netris.org)
87o9vts8xb.fsf@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> 2. Produce a narinfo and corresponding nar the first time they are
> requested. So, the first time we receive “GET foo.narinfo”, return
> 404 and spawn a thread to compute foo.narinfo and foo.nar. Return
> 200 only when both are ready.
>
> The precomputed nar{,info}s would be kept in a cache and we could
> make sure a narinfo and its nar have the same lifetime, which
> addresses one of the problems we have.
>
> pros: better HTTP latency and bandwidth
> pros: allows us to add a Content-Length for nars
> pros: helps keep narinfo/nar lifetime in sync
> cons: doesn’t reduce load on hydra.gnu.org
> cons: exposes inconsistency between the store contents and the HTTP
> response (you may get 404 even if the thing is actually in
> store), but maybe that’s not a problem

Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.

I’ll set up a test instance shortly.

Ludo’.
Ludovic Courtès wrote on 19 Apr 2017 07:24
Heads-up: hydra.gnu.org uses ‘guix publish --cache’
(name . Mark H Weaver)(address . mhw@netris.org)
87vaq0o4pd.fsf_-_@gnu.org
ludo@gnu.org (Ludovic Courtès) skribis:

> ludo@gnu.org (Ludovic Courtès) skribis:
>
>> 2. Produce a narinfo and corresponding nar the first time they are
>> requested. So, the first time we receive “GET foo.narinfo”, return
>> 404 and spawn a thread to compute foo.narinfo and foo.nar. Return
>> 200 only when both are ready.
>>
>> The precomputed nar{,info}s would be kept in a cache and we could
>> make sure a narinfo and its nar have the same lifetime, which
>> addresses one of the problems we have.
>>
>> pros: better HTTP latency and bandwidth
>> pros: allows us to add a Content-Length for nars
>> pros: helps keep narinfo/nar lifetime in sync
>> cons: doesn’t reduce load on hydra.gnu.org
>> cons: exposes inconsistency between the store contents and the HTTP
>> response (you may get 404 even if the thing is actually in
>> store), but maybe that’s not a problem
>
> Implemented in commit 00753f7038234a0f5a79be3ec9ab949840a18743.
>
> I’ll set up a test instance shortly.

I ended up deploying it on hydra.gnu.org directly. :-)

Progressively the cached nar/narinfo at {,mirror.}hydra.gnu.org will be
replaced with the new ones. Now that the /guix/nar URLs have a
‘Content-Length’ header, you should see a progress bar when downloading
one of these:

$ ./pre-inst-env guix build vim
The following file will be downloaded:
/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
@ substituter-started /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 /gnu/store/rnpz1svz4aw75kibb5qb02hhccy2m4y0-guix-0.12.0-7.aabe/libexec/guix/substitute
Downloading https://mirror.hydra.gnu.org/guix/nar/gzip/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566 (23.4MiB installed)...
vim-8.0.0566 7.8MiB 385KiB/s 00:21 [####################] 100.0%

@ substituter-succeeded /gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566
/gnu/store/ax5cm9gr1741pcq17w7bhgss5nvq5470-vim-8.0.0566

This new caching scheme should put an end to caching of truncated nars
in nginx, which has been too frequent lately.

It should also mostly avoid the problem where we have a narinfo for
something but not the corresponding nar, which leads to user frustration
(‘guix’ reports that the thing will be downloaded and eventually fails
with 410 “Gone” while trying to download it), because ‘guix publish’
caches narinfo/nar pairs together. I say “mostly” because nginx caching
in front of ‘guix publish’ makes things more complicated.

The bandwidth issue reported at the beginning of this thread should be
mostly fixed: serving a narinfo or nar URL is now just sendfile(2),
which is the best we can do; 404s on narinfo should be immediate.
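
As an illustration of why a pre-baked file is cheap to serve, here is a small Python sketch. It is not the ‘guix publish’ code: `serve_file` and `demo` are hypothetical names, and the response is reduced to a status line plus ‘Content-Length’. Python's `socket.sendfile` uses `os.sendfile` where the platform provides it, so the file contents go from the file to the socket without being read into the program.

```python
import os
import socket
import tempfile


def serve_file(conn, path):
    """Send PATH over CONN as a minimal HTTP response."""
    size = os.path.getsize(path)       # known up front because the file exists
    header = ("HTTP/1.1 200 OK\r\n"
              f"Content-Length: {size}\r\n"
              "\r\n").encode("ascii")
    conn.sendall(header)
    with open(path, "rb") as f:
        conn.sendfile(f)               # kernel-level copy where available


def demo(payload=b"x" * 4096):
    """Serve a temporary file over a local socket pair; return the bytes read."""
    fd, path = tempfile.mkstemp()
    os.write(fd, payload)
    os.close(fd)
    server, client = socket.socketpair()
    serve_file(server, path)
    server.close()
    received = b""
    while True:
        chunk = client.recv(65536)
        if not chunk:
            break
        received += chunk
    client.close()
    os.remove(path)
    return received
```

Because the size of the file is known before the first byte is sent, the ‘Content-Length’ header comes for free, which is what makes a client-side progress bar possible.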

Of course, when the machine is overloaded, we’ll still experience
increased latency and lower bandwidth, but that should be less acute
than with the previous setting.

Please report any problems you may have!

Ludo’.
Ludovic Courtès wrote on 25 Apr 2017 03:11
control message for bug #26201
(address . control@debbugs.gnu.org)
87h91cdcfv.fsf@gnu.org
tags 26201 fixed
close 26201
Mark H Weaver wrote on 3 May 2017 01:11
Re: bug#26201: hydra.gnu.org uses ‘guix publish’ for nars and narinfos
(name . Tobias Geerinckx-Rice)(address . me@tobias.gr)
877f1yjr64.fsf@netris.org
Reviving an old thread...

Tobias Geerinckx-Rice <me@tobias.gr> writes:

>> IMO, the best solution is to *never* generate nars on Hydra in response
>> to client requests, but rather to have the build slaves pack and
>> compress the nars, copy them to Hydra, and then serve them as static
>> files using nginx.
>
> A true mirror at last! Do we have the disc space for that?
>
> And could Hydra actually handle compressing *everything*, without an
> infinitely growing back-log? I don't have access to any statistics, but
> I'm guessing that a fair number of package+versions are never actually
> requested, and hence never compressed. This would change that.

Actually, IIUC, the build slaves are _already_ compressing everything,
and they always have. They compress the build outputs for transmission
back to the master machine. In the current framework, the master
machine immediately decompresses them upon receipt, and this compression
and decompression is considered an internal detail of the network
transport.

Currently, the master machine stores all build outputs uncompressed in
/gnu/store, and then later recompresses them for transmission to users
and other build slaves. The needless decompression and recompression is
a tremendous amount of wasted work on our master machine. That it's all
stored uncompressed is also a significant waste of disk space, which
leads to significant additional costs during garbage collection.

Essentially, my proposal is for the build slaves to be modified to
prepare the compressed NARs in a form suitable for delivery to end users
(and other build slaves) with minimal processing by our master node.
The master node would be significantly modified to receive, store, and
forward NARs explicitly, without ever decompressing them. As far as I
can tell, this would mean strictly less work to do and less data to
store for every machine and in every case.

Ludovic has pointed out that we cannot do this because Hydra must add
its digital signature, and that this digital signature is stored within
the compressed NAR. Therefore, we cannot avoid having the master
machine decompress and recompress every NAR that is delivered to users.

In my opinion, we should change the way we sign NARs. Signatures should
be external to the NARs, not internal. Not only would this allow us to
decentralize production of our NARs, but more importantly, it would
enable a community of independent builders to add their signatures to a
common pool of NARs. Having a common pool of NARs enables us to store
these NARs in a shared distribution network without duplication. We
cannot even have a common pool of NARs if they contain
build-farm-specific data such as signatures.

Thoughts?

Mark
Ludovic Courtès wrote on 3 May 2017 02:25
(name . Mark H Weaver)(address . mhw@netris.org)
87k25ywaul.fsf@gnu.org
Hello,

Mark H Weaver <mhw@netris.org> skribis:

> Actually, IIUC, the build slaves are _already_ compressing everything,
> and they always have. They compress the build outputs for transmission
> back to the master machine. In the current framework, the master
> machine immediately decompresses them upon receipt, and this compression
> and decompression is considered an internal detail of the network
> transport.
>
> Currently, the master machine stores all build outputs uncompressed in
> /gnu/store, and then later recompresses them for transmission to users
> and other build slaves. The needless decompression and recompression is
> a tremendous amount of wasted work on our master machine. That it's all
> stored uncompressed is also a significant waste of disk space, which
> leads to significant additional costs during garbage collection.
>
> Essentially, my proposal is for the build slaves to be modified to
> prepare the compressed NARs in a form suitable for delivery to end users
> (and other build slaves) with minimal processing by our master node.
> The master node would be significantly modified to receive, store, and
> forward NARs explicitly, without ever decompressing them. As far as I
> can tell, this would mean strictly less work to do and less data to
> store for every machine and in every case.

I agree that the redundant compression/decompression is terrible. Yet
I’m not sure how to architect a solution where compression is performed
by build machines. The main issue is that offloading and publication
are two independent mechanisms, as things are.

Maybe for a build farm use-case we could have a
“semi-offloading” mechanism whereby the master spawns a remote build
without retrieving its result, something akin to:

GUIX_DAEMON_SOCKET=ssh://build-machine.example.org \
guix build /gnu/store/…-foo.drv

In addition, the build machine would publish its result via ‘guix
publish’, which the master could then simply mirror and cache with
nginx.

There’s the issue of signatures, but perhaps we could have a more
sophisticated PKI and have the master delegate to build machines…

Then there are other issues such as that of synchronizing the TTL of a
narinfo and its corresponding nar, which --cache addresses.

Tricky!

> Ludovic has pointed out that we cannot do this because Hydra must add
> its digital signature, and that this digital signature is stored within
> the compressed NAR. Therefore, we cannot avoid having the master
> machine decompress and recompress every NAR that is delivered to users.
>
> In my opinion, we should change the way we sign NARs. Signatures should
> be external to the NARs, not internal. Not only would this allow us to
> decentralize production of our NARs, but more importantly, it would
> enable a community of independent builders to add their signatures to a
> common pool of NARs. Having a common pool of NARs enables us to store
> these NARs in a shared distribution network without duplication. We
> cannot even have a common pool of NARs if they contain
> build-farm-specific data such as signatures.

Currently the signature is in the narinfos, not in nars proper¹. So we
can already add signatures on an externally provided nar, for instance.

There’s a silly limitation currently, which is that the signature is
computed over all the fields of the narinfo. That’s silly because it
means that if you change, say, the compression format or the URL of the
nar, then the signature becomes invalid. We should fix that at some
point.
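
The limitation can be shown with a small Python sketch. Everything here is a stand-in: an HMAC with a made-up key replaces the real public-key signature, the narinfo values are invented, and the choice of which fields count as "stable" is an assumption made for the example, not a decision taken in Guix.

```python
import hashlib
import hmac

KEY = b"demo-key"   # hypothetical secret; real narinfos use public-key signatures

# Assumed set of fields that describe the store item itself rather than the
# way one particular server happens to deliver it.
STABLE = ("StorePath", "NarHash", "NarSize", "References")


def canonical(fields, names):
    """Serialise the selected fields in a fixed order."""
    return "\n".join(f"{name}: {fields[name]}" for name in names).encode()


def sign_all(fields):
    """Signature over every field, like the current scheme."""
    return hmac.new(KEY, canonical(fields, sorted(fields)), hashlib.sha256).hexdigest()


def sign_stable(fields):
    """Signature over the delivery-independent fields only."""
    return hmac.new(KEY, canonical(fields, STABLE), hashlib.sha256).hexdigest()


narinfo = {
    "StorePath": "/gnu/store/abc-hello-1.0",   # invented example values
    "URL": "nar/gzip/abc-hello-1.0",
    "Compression": "gzip",
    "NarHash": "sha256:0000",
    "NarSize": "12345",
    "References": "abc-hello-1.0",
}

# Same store item, served from another URL with another compression format.
moved = dict(narinfo, URL="nar/none/abc-hello-1.0", Compression="none")

print(sign_all(narinfo) == sign_all(moved))        # the signature no longer matches
print(sign_stable(narinfo) == sign_stable(moved))  # the signature still matches
```

With a signature restricted to such fields, a mirror could recompress or relocate a nar without having to re-sign the narinfo.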

Ludo’.

¹ For ‘guix publish’. ‘guix archive --export’ appends a signature to
the nar set.