Hi!
Mark H Weaver <mhw@netris.org> skribis:
> Tobias Geerinckx-Rice <me@tobias.gr> writes:
>> Are you sure? I was under the impression¹ that this is exactly what
>> ‘proxy_cache_lock on;’ prevents. I'm no nginx guru, obviously, so please
>> — anyone! — correct me if I'm misguided.
>
> I agree that "proxy_cache_lock on" should prevent multiple concurrent
> requests for the same URL, but unfortunately its behavior is quite
> undesirable, and arguably worse than leaving it off in our case. See:
>
> https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_lock
>
> Specifically:
>
> Other requests of the same cache element will either wait for a
> response to appear in the cache or the cache lock for this element to
> be released, up to the time set by the proxy_cache_lock_timeout
> directive.
>
> In our problem case, it takes more than an hour for Hydra to finish
> sending a response for the 'texlive-texmf' nar. During that time, the
> nar will be slowly sent to the first client while it's being packed and
> bzipped on-demand.
>
> IIUC, with "proxy_cache_lock on", we have two choices of how other
> client requests will be treated:
>
> (1) If we increase "proxy_cache_lock_timeout" to a huge value, then
> there will be *no* data sent to the other clients until the first
> client has received the entire nar, which means they wait over an
> hour before receiving the first byte. I guess this will result in
> timeouts on the client side.
>
> (2) If "proxy_cache_lock_timeout" is *not* huge, then all other clients
> will get failure responses until the first client has received the
> entire nar.
>
> Either way, this would cause users to see the same download failures
> (requiring user work-arounds like --fallback) that this fix is intended
> to prevent for 'texlive-texmf', but instead of happening only for that
> one nar, it will now happen for *all* large nars.
My understanding is that proxy_cache_lock allows us to avoid spawning
concurrent compression threads for the same item, while also avoiding
starvation (proxy_cache_lock_timeout should ensure that nobody ends up
waiting until the nar-compression process is done).

IOW, it should help reduce load in most cases, while introducing small
delays in some cases (if you’re downloading a nar that’s already being
downloaded).
> IMO, the best solution is to *never* generate nars on Hydra in response
> to client requests, but rather to have the build slaves pack and
> compress the nars, copy them to Hydra, and then serve them as static
> files using nginx.
The problem is that we want nars to be signed by the master node. Or,
if we don’t require that, we need a PKI that allows us to express the
fact that hydra.gnu.org delegates to the build machines.
> A far inferior solution, but possibly acceptable and closer to the
> current approach, would be to arrange for all concurrent responses for
> the same nar to be sent incrementally from a single nar-packing process.
> More concretely, while packing and sending a nar response to the first
> client, the data would also be written to a file. Subsequent requests
> for the same nar would be serviced using the equivalent of:
>
> tail --bytes=+0 --follow FILENAME
>
> This way, no one would have to wait an hour to receive the first byte.
Yes. I would think that NGINX does something like that for its caching,
but I don’t know exactly when/how.
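
To make the ‘tail --follow’ idea concrete, here is a minimal Python
sketch of it. It only illustrates the technique and is not how NGINX or
Hydra work: ‘pack’ is a hypothetical stand-in for the nar-packing
process, and ‘follow’, ‘demo’ and the ‘done’ event are names made up for
the example. A second client reads the partially written file from byte
0 and keeps reading until the writer signals that it has finished.

```python
import os
import tempfile
import threading
import time


def pack(path, pieces, done, delay=0.01):
    """Hypothetical stand-in for the nar-packing process: append slowly."""
    with open(path, "ab") as f:
        for piece in pieces:
            f.write(piece)
            f.flush()          # make the bytes visible to followers
            time.sleep(delay)
    done.set()                 # signal that the file is complete


def follow(path, done, chunk_size=65536, poll=0.01):
    """Yield the file's bytes from offset 0 as they are appended.

    Roughly `tail --bytes=+0 --follow`, except that it stops once the
    writer has set `done` and everything has been read.
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                yield chunk
            elif done.is_set():
                # The writer may have appended between our read and the
                # check above, so drain once more before stopping.
                rest = f.read()
                if rest:
                    yield rest
                return
            else:
                time.sleep(poll)   # at EOF for now; wait for more data


def demo():
    pieces = [b"nar-", b"chunk-1;", b"chunk-2;", b"end"]
    done = threading.Event()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        writer = threading.Thread(target=pack, args=(path, pieces, done))
        writer.start()
        received = b"".join(follow(path, done))
        writer.join()
        return received
    finally:
        os.remove(path)
```

With something like this, a follower gets its first bytes as soon as the
packer has written them, instead of waiting for the whole nar.
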
Other solutions I’ve thought about:

  1. Produce narinfos and nars periodically rather than on-demand and
     serve them as static files.

       pros: better HTTP latency and bandwidth
       pros: allows us to add a Content-Length for nars
       cons: doesn’t reduce load on hydra.gnu.org
       cons: introduces arbitrary delays in delivering nars
       cons: difficult/expensive to know what new store items are available

  2. Produce a narinfo and corresponding nar the first time they are
     requested.  So, the first time we receive “GET foo.narinfo”, return
     404 and spawn a thread to compute foo.narinfo and foo.nar.  Return
     200 only when both are ready.

     The precomputed nar{,info}s would be kept in a cache and we could
     make sure a narinfo and its nar have the same lifetime, which
     addresses one of the problems we have.

       pros: better HTTP latency and bandwidth
       pros: allows us to add a Content-Length for nars
       pros: helps keep narinfo/nar lifetime in sync
       cons: doesn’t reduce load on hydra.gnu.org
       cons: exposes inconsistency between the store contents and the HTTP
             response (you may get 404 even if the thing is actually in
             store), but maybe that’s not a problem
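
To illustrate option 2, here is a rough Python sketch of the “404 first,
200 once baked” behavior. Everything in it is an assumption made for the
example: ‘BakeCache’, its ‘bake’ callback and the ‘wait’ helper are
hypothetical names, not existing code. The narinfo and the nar are
stored as one cache entry, which is what keeps their lifetimes in sync.

```python
import threading


class BakeCache:
    """Hypothetical cache: the first request for a key returns 404 and
    starts a background job; requests return 200 once it has finished."""

    def __init__(self, bake):
        self._bake = bake        # function: key -> (narinfo, nar)
        self._lock = threading.Lock()
        self._ready = {}         # key -> (narinfo, nar), one shared entry
        self._pending = {}       # key -> thread currently baking it

    def _worker(self, key):
        result = self._bake(key)             # slow: pack and compress
        with self._lock:
            self._ready[key] = result        # both become visible at once
            del self._pending[key]

    def get(self, key):
        """Return (status, payload) without ever blocking on a bake."""
        with self._lock:
            if key in self._ready:
                return 200, self._ready[key]
            if key not in self._pending:     # at most one bake per key
                thread = threading.Thread(target=self._worker, args=(key,))
                self._pending[key] = thread
                thread.start()
            return 404, None

    def wait(self, key):
        """Helper for tests: block until an in-flight bake is done."""
        with self._lock:
            thread = self._pending.get(key)
        if thread is not None:
            thread.join()
```

Concurrent requests for the same item share a single bake, and a real
implementation would also need eviction, which is left out here.
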
Thoughts?
Ludo’.