GNU bug report logs

#24937 "deleting unused links" GC phase is too slow


Message #64 received at 24937@debbugs.gnu.org:

From: Ludovic Courtès <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Subject: Re: bug#24937: "deleting unused links" GC phase is too slow
Date: Sat, 13 Nov 2021 17:56:52 +0100
In-Reply-To: <87v90ys911.fsf@gmail.com> (Maxim Cournoyer's message of "Thu, 11
 Nov 2021 15:59:54 -0500")
Message-ID: <87v90wat9n.fsf@gnu.org>
Cc: 24937@debbugs.gnu.org

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> I haven't done any analysis, just grabbed the result, but here is what
> it looks like for me:

A bit more than 35% of the deduplicated files are < 1KiB, and there is
not much to be gained by deduplicating them.

On IRC several people shared the results from their machines; several
had similar results, and one person had many more of those small files
(50% of deduplicated files were < 1KiB).
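
For anyone who wants to check the share of small files on their own
data, here is a minimal sketch.  It is not the code used to obtain the
figures above: ‘sizes’ is made-up sample data and ‘fraction-below’ is a
hypothetical helper, so treat it as an illustration only.

```scheme
(use-modules (srfi srfi-1))             ;for 'count'

;; Made-up sample: sizes in bytes of eight deduplicated files.
(define sizes '(120 800 1023 1024 4096 70000 300000 512))

(define (fraction-below sizes threshold)
  "Return the fraction of SIZES that are strictly below THRESHOLD bytes."
  (/ (count (lambda (size) (< size threshold)) sizes)
     (length sizes)))

;; Four of the eight sample files are below 1 KiB, so this is 1/2.
(fraction-below sizes 1024)
```

Note that this counts files; it says nothing about how many bytes those
files save, which is a separate question.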

The chart (with a kinda bogus layout) below is perhaps more interesting:
it shows the contribution of files below a certain size to the overall
space savings.

[Figure: space-saving-contribution.png, contribution (%) to the overall space savings of files below a given size (B)]

In a nutshell:

  • Files < 1KiB account for 0.3% of the space savings;

  • Files < 4KiB account for 2.5% of the space savings;

  • Files < 256KiB account for 42% of the space savings.

You can create this plot with:

--8<---------------cut here---------------start------------->8---
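;; ‘l’ (the list of deduplicated files), ‘saved-space’, and
;; ‘deduplicated-file-size’ are assumed to be defined already.
(use-modules (srfi srfi-26))            ;for ‘cut’

(make-scatter-plot #:title "Contribution to space savings"
                   #:write-to-png "/tmp/space-saving-contribution.png"
                   #:chart-width 1000
                   #:y-axis-label "contribution (%)"
                   #:x-axis-label "size (B)"
                   #:log-x-base 2
                   #:min-x 513
                   #:data
                   (let ((total (saved-space l)))
                     `(("contribution"
                        ,@(map (lambda (size)
                                 ;; Savings from files smaller than SIZE,
                                 ;; as a percentage of TOTAL (hence the
                                 ;; division by .01).
                                 (cons size
                                       (/ (saved-space (filter (lambda (file)
                                                                 (< (deduplicated-file-size
                                                                     file)
                                                                    size))
                                                               l))
                                          total .01)))
                               ;; Thresholds: 2^10 (1 KiB) to 2^21 (2 MiB).
                               (map (cut expt 2 <>)
                                    (iota 12 10 1)))))))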
--8<---------------cut here---------------end--------------->8---

You can also compute individual points like this:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 1024))
					     l))
			(saved-space l) 1.)
$60 = 0.0034284626558736746
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) 4096))
					     l))
			(saved-space l) 1.)
$62 = 0.025190871178467848
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
					       (< (deduplicated-file-size file) (expt 2 18)))
					     l))
			(saved-space l) 1.)
$65 = 0.42411104869782185
--8<---------------cut here---------------end--------------->8---

Choosing a deduplication threshold of 2KiB or 4KiB would have a
negligible impact on disk usage on my machine: at most about 2.5% of
the space savings would be lost.
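
To make the effect of a threshold concrete, here is a self-contained
sketch.  Everything in it is hypothetical: the ‘<dedup-entry>’ record,
the sample entries, and the assumption that a file of SIZE bytes shared
by COPIES store files saves SIZE × (COPIES − 1) bytes.  It is not the
‘saved-space’ procedure used above.

```scheme
(use-modules (srfi srfi-1)              ;for 'fold'
             (srfi srfi-9))             ;for 'define-record-type'

;; Hypothetical record: SIZE in bytes, COPIES = number of store files
;; that share this content.
(define-record-type <dedup-entry>
  (dedup-entry size copies)
  dedup-entry?
  (size   dedup-entry-size)
  (copies dedup-entry-copies))

(define (entry-savings entry)
  "Bytes saved by ENTRY, assuming one copy must be kept on disk."
  (* (dedup-entry-size entry)
     (- (dedup-entry-copies entry) 1)))

(define (total-savings entries)
  "Sum of the bytes saved by all of ENTRIES."
  (fold (lambda (entry sum) (+ sum (entry-savings entry)))
        0 entries))

(define (savings-lost entries threshold)
  "Fraction of the savings given up if files smaller than THRESHOLD
bytes were no longer deduplicated."
  (/ (total-savings (filter (lambda (entry)
                              (< (dedup-entry-size entry) threshold))
                            entries))
     (total-savings entries)
     1.))

;; Made-up sample data; total savings are 1,000,000 bytes.
(define entries
  (list (dedup-entry 1000 3)            ;saves   2,000 B
        (dedup-entry 3000 2)            ;saves   3,000 B
        (dedup-entry 95000 2)           ;saves  95,000 B
        (dedup-entry 450000 3)))        ;saves 900,000 B

(savings-lost entries 2048)             ;2,000 / 1,000,000
(savings-lost entries 4096)             ;5,000 / 1,000,000
```

With this sample, a 4KiB threshold gives up 0.5% of the savings; the
real figures depend entirely on what is in the store.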

Thanks,
Ludo’.
