GNU bug report logs

#39258 Faster guix search using an sqlite cache

PackageSource(s)Maintainer(s)
guix-patches PTS Buildd Popcon
Full log

Message #307 received at 39258@debbugs.gnu.org (full text, mbox, reply):

Received: (at 39258) by debbugs.gnu.org; 1 Jun 2020 01:25:28 +0000
From debbugs-submit-bounces@debbugs.gnu.org Sun May 31 21:25:28 2020
Received: from localhost ([127.0.0.1]:34190 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
	id 1jfZCh-0005eP-Tj
	for submit@debbugs.gnu.org; Sun, 31 May 2020 21:25:28 -0400
Received: from mail-qt1-f193.google.com ([209.85.160.193]:39444)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <zimon.toutoune@gmail.com>) id 1jfZCg-0005e9-MH
 for 39258@debbugs.gnu.org; Sun, 31 May 2020 21:25:27 -0400
Received: by mail-qt1-f193.google.com with SMTP id k22so6551607qtm.6
 for <39258@debbugs.gnu.org>; Sun, 31 May 2020 18:25:26 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=/gbfEJHcOpkZHNmmRKEOTH6rOPp5GZXuXF3+L3Pe7W4=;
 b=RJVpNU9buXibDcBj67narNogTWh01MzOur+jAvsaj4KC/NNGcSl9grVoc6y5RRRb7e
 fPlff1++tSGa79Yos+N0ViBFaSGLvps9twGpqG+8mdBkTKKQ/rn1Qu+uvEjlKMp71zNs
 dxzaURhDOpVjkPRgojzlNK5T1Kl3n5v+lMCKKtbv4Mt01z8dkEha+TA5hUpwj6a5PqYq
 EO0LKzNsUhgK8kcSwFgzW5t4ieVUkT9kDh06A9gBXo/Lq+A96EmCO7copG+vzrzM0OXn
 E5VILhL+Xk/ffd8BiBy82fUOyhT3UyY+E9jfnhpM5zOSALrzrKqEIMABZlQpATapYtBa
 sdaA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=/gbfEJHcOpkZHNmmRKEOTH6rOPp5GZXuXF3+L3Pe7W4=;
 b=Acw21+JJ9HCUfRptyDhuz2T2Oxw8YNJyfWIFdbtbPB+y4+L0yuxCS1xVjd0FJ3W1+D
 FxCFslVm2O4oFmgOzTsnqQM9MMBWiwXIAmQxca6UCgHfu/ufoj8mcDT3WN8dSycFtlwK
 5UFcHyfOrIFYsggH3gQU577r8jwuOWLzHHgze3H5nfxI40P9m6BkG5TXK9XcPjXEJd5R
 H/KDs0DGwAKHm0Tk2LEwBLDJvUI9BQBMVgsmWIVmxjT/+kQFwCz1IkOVVhX0f20UtzN1
 lP73kjKbkkD36Wq3U1smim++dsqLCtAfZZ2peZgRLt7hb+/YAW/WBVbXMd+5ZwJIFFwp
 cGyA==
X-Gm-Message-State: AOAM530IFsp6097852P9mPEz6MKRKaYktKP5RmQPcHM0paAMc/gElxBy
 ZRXfB5oq1diwehQ9Rx1uBO/RMIW3IYtPDdJrf0I=
X-Google-Smtp-Source: ABdhPJx8NosTpnZl78ZOtvdCZ9rAxvg+5jRMkVvoQL5iAPDN7bnCmA6VMBsEXBJfTEJaXOGWbSwUIl1SfZf+TimgxjQ=
X-Received: by 2002:aed:3169:: with SMTP id 96mr18719772qtg.211.1590974720770; 
 Sun, 31 May 2020 18:25:20 -0700 (PDT)
MIME-Version: 1.0
References: <20200601000030.7443-1-arunisaac@systemreboot.net>
In-Reply-To: <20200601000030.7443-1-arunisaac@systemreboot.net>
From: zimoun <zimon.toutoune@gmail.com>
Date: Mon, 1 Jun 2020 03:25:09 +0200
Message-ID: <CAJ3okZ1aJWz9k2EwYNfPdBmKNzx-OTC27J4bg7as48qxcoLG-w@mail.gmail.com>
Subject: Re: [PATCH v5 0/4] Optimize guix search
To: Arun Isaac <arunisaac@systemreboot.net>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Score: -0.0 (/)
X-Debbugs-Envelope-To: 39258
Cc: Ludovic Courtès <ludo@gnu.org>, 39258@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit@debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request@debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request@debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
X-Spam-Score: -1.0 (-)
Hi Arun,

On Mon, 1 Jun 2020 at 02:00, Arun Isaac <arunisaac@systemreboot.net> wrote:

> Sorry for the long delay in replying to this thread.

Based on the Ludo's comments [1] on v4 which is a simple re-write of
your v3, I am finishing a vN+1.. but time flies and I am late on the
topic too. :-)

Well, this still unsent vN+1 series has the same performance of v4 on
"guix pull" which is a key point compared to v3.  Obviously, the
performance on "guix search" are equivalent on both version.  This
vN+1 builds two caches -- to avoid binary breakage -- in only one go;
the consuming 'fold-modules-public-variables*' is applied only once.

[1] http://issues.guix.gnu.org/39258#93


> I think Ludo is right in that we can improve guix search performance with only
> simple code improvements rather than including xapian or improving our
> existing cache. Here are a few patches on those lines.

Well, improving the cache is easy; at least as you did in v3 by adding
another one.
The most annoying part is the arguments rewrite of 'package->recutils'
to be compliant.

However after some comparisons, I am not convinced that BM25 will be
worth to implement...


> In `relevance`, we set our score to 0 if any of the regexps don't match. Then,
> we might as well not match the remaining regexps. Patch 1 does this early cut
> off optimization.

Interesting.


> Often our search strings are only literal strings. So, we can save some time
> by using string-contains instead of invoking the regexp engine. Patch 2 does
> this. In addition, guile's string-contains uses a naive O(n^2) string search
> algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In
> fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code
> mentions this. If implemented, the KMP algorithm could speed up guix search
> further.
>
> [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

Really interesting idea,


> Patch 3 and 4 are minor improvements.
>
> Here's a rough performance comparison.

On cold or warm cache?


> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search game
>
> real    0m2.261s
> user    0m2.351s
> sys     0m0.104s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search game
>
> real    0m2.661s
> user    0m2.843s
> sys     0m0.080s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time ./pre-inst-env guix search strategy game
>
> real    0m1.613s
> user    0m1.635s
> sys     0m0.096s
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> time guix search strategy game
>
> real    0m2.520s
> user    0m2.583s
> sys     0m0.112s
> --8<---------------cut here---------------end--------------->8---

So in the best case, you have the ratio old/new is 1.5; this new
version is 1.5 faster.

Well, in the extra cache approach (v3 or v4) the ration old/new is
really higher: 3.1 faster on cold cache (which is the one I am
interested in) and 2.4 faster on warm cache.


I will give a look to this new series and report what happens on my
laptop.  But basically, I would like "guix search" under the 1.0
second on my machine. ;-)


Thank you for this new input.

Cheers,
simon




Send a report that this bug log contains spam.


debbugs.gnu.org maintainers <help-debbugs@gnu.org>. Last modified: Thu Jul 10 13:13:52 2025; Machine Name: wallace-server

GNU bug tracking system

Debbugs is free software and licensed under the terms of the GNU Public License version 2. The current version can be obtained from https://bugs.debian.org/debbugs-source/.

Copyright © 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson, 2005-2017 Don Armstrong, and many other contributors.