GNU bug report logs

#39258 Faster guix search using an sqlite cache


Message #283 received at 39258@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: zimoun <zimon.toutoune@gmail.com>
Subject: Re: [PATCH v4 0/3] Faster cache generation (similar as v3)
References: <20200503150154.26532-1-zimon.toutoune@gmail.com>
Date: Sun, 03 May 2020 18:43:41 +0200
In-Reply-To: <20200503150154.26532-1-zimon.toutoune@gmail.com> (zimoun's
 message of "Sun, 3 May 2020 17:01:51 +0200")
Message-ID: <87r1w1ynnm.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Cc: arunisaac@systemreboot.net, mail@ambrevar.xyz, 39258@debbugs.gnu.org
Hello!

zimoun <zimon.toutoune@gmail.com> skribis:

> The aim of this version v4 is to keep the same search performance as the previous version v3 while drastically reducing the time needed to generate the cache.  On my laptop, the overhead is now 4 seconds, compared to more than 20 seconds for v2 and v3.
>
> # default
> time guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check
> # v4
> time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
>
> |      | default  | v4        |
> |------+----------+-----------|
> | real | 0m6.012s | 0m10.244s |
> | user | 0m0.541s | 0m0.542s  |
> | sys  | 0m0.033s | 0m0.032s  |

Not bad!

> In version v3, the cache is built using 'cons' and 'fold-packages' (a wrapper around 'fold-module-public-variables').  Version v4 instead modifies the function 'generate-package-cache', which uses 'vhash' and 'fold-module-public-variables*', so that it stores additional information.
>
> Therefore the cache '/lib/guix/package.cache' contains more
> information.

This breaks the binary interface, so we’ll have to analyze the impact of
such a change and devise a strategy.

> (The v4 structure of 'package.cache' is a quick draft, so details
> should be discussed; an interesting move would be to have a
> structured (binary and all strings) S-exp, because it should become an
> entry point for exporting the package list to JSON.  WDYT?)

It’s on purpose that this cache is an object file: it just needs to be
mmap’d, and that’s it.  It’s the cheapest possible way to do it.
Parsing sexps would be more costly, and since we’re talking about
startup time, this is sensitive.
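
To illustrate the point about object files, here is a minimal Guile sketch that writes a small literal to a '.go' file and loads it back.  The entry layout and the file name are assumptions made up for this example; they are not the actual layout of 'package.cache'.

```scheme
(use-modules (system base compile)
             (ice-9 binary-ports))

;; Hypothetical cache entries: #(NAME VERSION) vectors.  This is NOT the
;; real layout of the Guix package cache.
(define entries
  '(#("hello" "2.10") #("guile" "3.0.2")))

;; Hypothetical location for the demo object file.
(define cache-file
  (string-append (or (getenv "TMPDIR") "/tmp") "/demo-package-cache.go"))

;; Compile a quoted literal to bytecode and write it out as an object file.
(call-with-output-file cache-file
  (lambda (port)
    (put-bytevector port
                    (compile `'(,@entries)
                             #:to 'bytecode
                             #:opts '(#:to-file? #t)))))

;; Loading the object file evaluates the quoted literal and returns it;
;; the Scheme reader is never invoked on the cached data.
(define cache (load-compiled cache-file))
```

A sexp-based cache would instead have to go through 'read' on every startup, which is the parsing cost mentioned above.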

> Now we are comparing apples to apples, and the cost of computing BM25 (v2) is not free at all.  Remember that BM25 is the state of the art in information retrieval (relevance ranking) and that it is delegated to Xapian (v2).  I do not know whether there is a performance bottleneck between Guix, Guile-Xapian and Xapian itself, but the computation of BM25 is certainly not free.  More about that soon.
>
> To be clear about BM25 and caching, what I have in mind is:
>   1. "guix search --build-index" optionally run by users if they want, for example, the BM25 ranking.

Something that must be done explicitly doesn’t seem great to me.  As a
user, I’d rather not think about search indexes and all.  But I don’t
know, maybe if it happened automatically on the first ‘guix search’
invocation that’d be fine.

>   2. Use BM25 metrics to detect poor package metadata (synopsis and description); if it is worthwhile, why not add another checker to "guix lint"?

That’d be interesting!

>  1. The name 'fold-packages*' may be misleading since it does not return "true" packages.

Did you see ‘fold-available-packages’?  It seems you could extend it
instead of introducing ‘fold-packages*’, no?
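
For reference, a rough sketch of how ‘fold-available-packages’ is called, assuming a Guix checkout or installation on the load path.  The helper name 'available-package-names' is made up for this example; extending the procedure would presumably mean passing more keyword arguments to the callback.

```scheme
(use-modules (gnu packages))

;; Hypothetical helper: collect the names of packages that are supported
;; on the current system and not deprecated.  The callback receives NAME,
;; VERSION and the accumulated RESULT, plus keyword arguments.
(define (available-package-names)
  (fold-available-packages
   (lambda* (name version result
                  #:key supported? deprecated?
                  #:allow-other-keys)
     (if (and supported? (not deprecated?))
         (cons name result)
         result))
   '()))
```

Because the callback uses #:allow-other-keys, new keywords could be added without breaking existing callers.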

>  2. The function 'package->recutils' in 'guix/ui.scm' is modified, but this is not the best solution.
>
>           (match (package-supported-systems p)
>             (('cache supported-systems)
>              (string-join supported-systems))
>             (_
>              (string-join (package-transitive-supported-systems p)))))
>
>     However, it avoids duplicating code, as is done in version v3.

I made suggestions to Arun’s v3 about the API here.  Essentially, I
think I proposed having a procedure that takes the list of fields as
keyword parameters, and ‘package->recutils’ would just delegate to that.
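
A minimal sketch of that shape, using a stand-in record rather than the real <package> type.  All names here ('fields->recutils', '<demo-package>', 'demo-package->recutils') are hypothetical and only illustrate the delegation; they are not the API proposed in the v3 review.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a Guix package; not the real <package> record.
(define-record-type <demo-package>
  (make-demo-package name version systems)
  demo-package?
  (name    demo-package-name)
  (version demo-package-version)
  (systems demo-package-systems))

;; Hypothetical low-level printer: every field arrives as a keyword
;; argument, so it can be fed from a package object or from a cache entry.
(define* (fields->recutils port #:key name version (systems '()))
  (format port "name: ~a~%" name)
  (format port "version: ~a~%" version)
  (format port "systems: ~a~%" (string-join systems)))

;; The package-level procedure only extracts fields and delegates.
(define (demo-package->recutils p port)
  (fields->recutils port
                    #:name (demo-package-name p)
                    #:version (demo-package-version p)
                    #:systems (demo-package-systems p)))

(define demo
  (make-demo-package "hello" "2.10" '("x86_64-linux" "i686-linux")))
```

With this split, code that reads the cache can call the keyword-based procedure directly, without a 'match' on the shape of the supported-systems field.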


>  3. Deprecated packages are displayed (bug in v3 too).
>
>  4. Impolite '@@' is used to access the private license constructor.

(guix licenses) could provide a ‘string->license’ procedure.
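
One possible shape for such a procedure, sketched outside of (guix licenses) and assuming Guix is on the load path.  The '%known-licenses' list is a made-up, deliberately tiny subset, and looking licenses up by their 'license-name' is an assumption of this sketch; no such procedure is confirmed to exist by this message.

```scheme
(use-modules (guix licenses)
             (srfi srfi-1))

;; Hypothetical lookup table: a small subset of the licenses exported by
;; (guix licenses).  A real implementation would need to cover all of them.
(define %known-licenses
  (list gpl3+ lgpl3+ expat bsd-3))

;; Hypothetical procedure: return the license object whose name is STR,
;; or #f when STR matches none of the known licenses.
(define (string->license str)
  (find (lambda (license)
          (string=? (license-name license) str))
        %known-licenses))
```

Returning an existing license object this way would remove the need to reach the private constructor with '@@'.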

Stopping here for now because I’m sorta drowning in patch review.  :-)

Thanks for exploring this design space, we’re making progress!

Ludo’.






debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Sun Jun 22 20:54:47 2025; Machine Name: wallace-server

GNU bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.