GNU bug report logs

#20816 Excessive negative cache time for DNS lookups

Package	Source(s)		Maintainer(s)
guix		PTS Buildd Popcon

Reported by Mark H Weaver <mhw@netris.org>
Date 2015-06-15T02:45:03
Severity normal

Reply or subscribe to this bug. View this bug as an mbox, status mbox, or maintainer mbox

Report forwarded to bug-guix@gnu.org:
bug#20816; Package guix. (Mon, 15 Jun 2015 02:45:03 GMT) (full text, mbox, link).

Acknowledgement sent to Mark H Weaver <mhw@netris.org>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org. (Mon, 15 Jun 2015 02:45:04 GMT) (full text, mbox, link).

Message #5 received at submit@debbugs.gnu.org (full text, mbox, reply):

On my GuixSD system, DNS lookup failures (due to lack of network) are
cached for at least 4 minutes, and possibly much longer (I've not had
the patience to wait until it expires).

Even after restarting nscd, the negative cache entry persists for about
another 20 seconds.

I run into this problem quite frequently after waking my system up from
suspend-to-ram.  It typically takes several seconds for wicd to
establish a network connection, and if I make the mistake of triggering
a name lookup of my server (world.peace.net), then I must do the
following:

(1) deco restart nscd
(2) wait about 20 seconds

Looking at %nscd-default-caches in (gnu services base), I see that the
intent is to cache host lookup failures for 20 seconds, and service
lookup failures for an hour.  The fact that the negative cache entry
persists for 20 seconds after restarting nscd would seem to indicate
that nscd considers this to be a host lookup.  So perhaps this is a bug
in nscd.

Is anyone else able to reproduce this problem?

      Mark

Information forwarded to bug-guix@gnu.org:
bug#20816; Package guix. (Mon, 15 Jun 2015 13:16:03 GMT) (full text, mbox, link).

Message #8 received at 20816@debbugs.gnu.org (full text, mbox, reply):

On Sun, Jun 14, 2015 at 10:43 PM, Mark H Weaver <mhw@netris.org> wrote:
> On my GuixSD system, DNS lookup failures (due to lack of network) are
> cached for at least 4 minutes, and possibly much longer (I've not had
> the patience to wait until it expires).
>
> Even after restarting nscd, the negative cache entry persists for about
> another 20 seconds.
>
> I run into this problem quite frequently after waking my system up from
> suspend-to-ram.  It typically takes several seconds for wicd to
> establish a network connection, and if I make the mistake of triggering
> a name lookup of my server (world.peace.net), then I must do the
> following:
>
> (1) deco restart nscd
> (2) wait about 20 seconds
>
> Looking at %nscd-default-caches in (gnu services base), I see that the
> intent is to cache host lookup failures for 20 seconds, and service
> lookup failures for an hour.  The fact that the negative cache entry
> persists for 20 seconds after restarting nscd would seem to indicate
> that nscd considers this to be a host lookup.  So perhaps this is a bug
> in nscd.
>
> Is anyone else able to reproduce this problem?

Yes, I had this exact same issue on Friday after having wireless
network connectivity problems.

- Dave

Information forwarded to bug-guix@gnu.org:
bug#20816; Package guix. (Mon, 15 Jun 2015 20:12:02 GMT) (full text, mbox, link).

Message #11 received at 20816@debbugs.gnu.org (full text, mbox, reply):

[Message part 1 (text/plain, inline)]

Mark H Weaver <mhw@netris.org> skribis:

> On my GuixSD system, DNS lookup failures (due to lack of network) are
> cached for at least 4 minutes, and possibly much longer (I've not had
> the patience to wait until it expires).
>
> Even after restarting nscd, the negative cache entry persists for about
> another 20 seconds.

To reproduce it, I ran ‘guix system vm’ on this:

--8<---------------cut here---------------start------------->8---
(use-modules (gnu) (guix))
(use-package-modules base)

(operating-system
  (host-name "komputilo")
  (timezone "Europe/Berlin")
  (locale "en_US.UTF-8")
  (bootloader (grub-configuration (device "/dev/sdX")))
  (file-systems (cons (file-system
                        (device "root")
                        (title 'label)
                        (mount-point "/")
                        (type "ext4"))
                      %base-file-systems))
  (packages (cons* (canonical-package glibc) %base-packages))
  (services (map (lambda (mservice)
                   (mlet %store-monad ((service mservice))
                     (if (memq 'nscd (service-provision service))
                         (nscd-service (nscd-configuration
                                        (debug-level 10)))
                         mservice)))
                 %base-services)))
--8<---------------cut here---------------end--------------->8---

Then I had one console running “tail -f /var/log/nscd.log”, and another
one where I would run “ping www.gnu.org.”

Initially looking up www.gnu.org fails (because the network is down),
and nscd.log indeed shows:

--8<---------------cut here---------------start------------->8---
Mon Jun 15 21:04:20 2015 - 276: handle_request: request received (Version = 2) from PID 296
Mon Jun 15 21:04:20 2015 - 276: 	GETAI (www.gnu.org)
Mon Jun 15 21:04:20 2015 - 276: Haven't found "www.gnu.org" in hosts cache!
Mon Jun 15 21:04:20 2015 - 276: add new entry "www.gnu.org" of type GETAI for hosts to cache (first)
--8<---------------cut here---------------end--------------->8---

For subsequent requests, it shows fewer details, meaning that the cached
failure is used:

--8<---------------cut here---------------start------------->8---
Mon Jun 15 21:04:25 2015 - 276: handle_request: request received (Version = 2) from PID 297
Mon Jun 15 21:04:25 2015 - 276: 	GETFDHST
Mon Jun 15 21:04:25 2015 - 276: provide access to FD 6, for hosts
--8<---------------cut here---------------end--------------->8---

Twenty seconds after the original failure, we see:

--8<---------------cut here---------------start------------->8---
Mon Jun 15 21:04:40 2015 - 276: pruning hosts cache; time 1434395080
Mon Jun 15 21:04:40 2015 - 276: considering GETAI entry "www.gnu.org", timeout 1434395080
--8<---------------cut here---------------end--------------->8---

So cache expiry seems to be working as expected.

Now, if we enable networking with “dhclient -v eth0” and see what
happens, we get again a series of lookup failures, even after the
www.gnu.org entry has been removed from cache; apparently new failures
get cached.  Eventually, after a bunch of cache prunes (so a few times
20 seconds), lookup succeeds.

The situation is different if you use ‘getent hosts’ instead of ‘ping’–
note that the latter uses getaddrinfo whereas the former uses
gethostbyname{,v6}, which is cached differently.  Here we get a
successful lookup within 20 seconds after networking has been put up.

So my guess is that the GETAI cache is racy or something.

I’ve run nscd with Helgrind, which reports a few things (see attached
file), but I suspect most of it are false positives.

Ludo’.

[vg.log (text/plain, attachment)]

Display info messages

Send a report that this bug log contains spam.

debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Tue Oct 29 14:12:08 2024; Machine Name: wallace-server

GNU bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.