[PATCH 0/7] Add 'generic-html' updater

Status: Done
Participants: Léo Le Bouter, Ludovic Courtès
Owner: unassigned
Submitted by: Ludovic Courtès
Severity: normal

Ludovic Courtès wrote on 13 Mar 2021 13:43
(address . guix-patches@gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214326.28052-1-ludo@gnu.org
Hi!

These patches allow ‘guix refresh’ coverage to go from 78% to 88%
as reported by ‘guix refresh --list-updaters’ (both are probably
slightly overestimated) by adding a new ‘generic-html’ updater.

The updater crawls the web page where the package’s source tarball
is stored, using Guile-Lib’s (htmlprag), which we depend on since
commit 02e2e093e858e8a0ca7bd66c1f1f6fd0a1705edb. Among other things,
it handles freedesktop.org packages.

Feedback welcome!

Thanks,
Ludo’.

Ludovic Courtès (7):
gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
gnu-maintenance: 'latest-html-release' considers non-relative URLs.
gnu-maintenance: 'release-file?' rejects checksum files.
gnu-maintenance: 'latest-html-release' can determine signature file
name.
gnu-maintenance: 'latest-html-release' better computes version number.
gnu-maintenance: Add 'generic-html' updater.
gnu: hwloc: Add 'release-monitoring-url' property.

doc/guix.texi | 6 +-
gnu/packages/mpi.scm | 6 ++
guix/gnu-maintenance.scm | 136 ++++++++++++++++++++++++++++-----------
3 files changed, 108 insertions(+), 40 deletions(-)

--
2.30.1
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 1/7] gnu-maintenance: Use (htmlprag) for 'latest-html-release'.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-1-ludo@gnu.org
* guix/gnu-maintenance.scm (html->sxml): Remove. Autoload (htmlprag)
instead.
* doc/guix.texi (Requirements): Mention 'guix refresh' for the Guile-Lib
dependency.
---
doc/guix.texi | 3 ++-
guix/gnu-maintenance.scm | 13 +------------
2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 4cf241c56a..97094a7d0a 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -865,7 +865,8 @@ the @code{crate} importer (@pxref{Invoking guix import}).
@item
@uref{https://www.nongnu.org/guile-lib/doc/ref/htmlprag/, Guile-Lib} for
-the @code{go} importer (@pxref{Invoking guix import}).
+the @code{go} importer (@pxref{Invoking guix import}) and for some of
+the ``updaters'' (@pxref{Invoking guix refresh}).
@item
When @url{http://www.bzip.org, libbz2} is available,
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 9e393d18cd..febed57c3a 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -38,6 +38,7 @@
#:use-module (guix upstream)
#:use-module (guix packages)
#:autoload (zlib) (call-with-gzip-input-port)
+ #:autoload (htmlprag) (html->sxml) ;from Guile-Lib
#:export (gnu-package-name
gnu-package-mundane-name
gnu-package-copyright-holder
@@ -447,18 +448,6 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
;;; Latest HTTP release.
;;;
-(define (html->sxml port)
- "Read HTML from PORT and return the corresponding SXML tree."
- (let ((str (get-string-all port)))
- (catch #t
- (lambda ()
- ;; XXX: This is the poor developer's HTML-to-XML converter. It's good
- ;; enough for directory listings at <https://kernel.org/pub> but if
- ;; needed we could resort to (htmlprag) from Guile-Lib.
- (call-with-input-string (string-replace-substring str "<hr>" "<hr />")
- xml->sxml))
- (const '(html))))) ;parse error
-
(define (html-links sxml)
"Return the list of links found in SXML, the SXML tree of an HTML page."
(let loop ((sxml sxml)
--
2.30.1
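For readers unfamiliar with SXML, here is a minimal sketch of how links can be collected from a parsed page. It is not the 'html-links' procedure from guix/gnu-maintenance.scm (only its first lines appear as diff context above); the name 'sxml-links' and the sample tree are made up for illustration, and the tree is written by hand so that the example runs without Guile-Lib.

```scheme
(use-modules (ice-9 match)
             (srfi srfi-1))

(define (sxml-links sxml)
  "Return the 'href' values of the <a> elements found in SXML, in document
order.  Hypothetical helper, for illustration only."
  (define (collect sxml links)
    (match sxml
      ;; An anchor with an attribute list: keep its 'href', if any.
      (('a ('@ attributes ...) body ...)
       (match (assq 'href attributes)
         ((_ (? string? url)) (cons url links))
         (_ links)))
      ;; Any other element with attributes: only visit its children.
      ((tag ('@ _ ...) body ...)
       (fold collect links body))
      ;; An element without attributes.
      ((tag body ...)
       (fold collect links body))
      ;; Text nodes and anything else.
      (_ links)))

  (reverse (collect sxml '())))

;; Hand-written tree standing in for the result of parsing a directory
;; listing; the URLs are placeholders.
(define sample-page
  '(html (body (a (@ (href "foo-1.0.tar.gz")) "foo-1.0.tar.gz")
               (p (a (@ (href "https://example.org/bar")) "bar")))))

(sxml-links sample-page)
;; => ("foo-1.0.tar.gz" "https://example.org/bar")
```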
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 2/7] gnu-maintenance: 'latest-html-release' considers non-relative URLs.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-2-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Allow for URL to be an
arbitrary URL rather than a relative URL reference.
---
guix/gnu-maintenance.scm | 30 ++++++++++++++++--------------
1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index febed57c3a..98d326e500 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -1,5 +1,5 @@
;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 Ludovic Courtès <ludo@gnu.org>
;;; Copyright © 2012, 2013 Nikita Karetnikov <nikita@karetnikov.org>
;;; Copyright © 2021 Simon Tournier <zimon.toutoune@gmail.com>
;;;
@@ -479,19 +479,21 @@ return the corresponding signature URL, or #f it signatures are unavailable."
(port (http-fetch/cached uri #:ttl 3600))
(sxml (html->sxml port)))
(define (url->release url)
- (and (string=? url (basename url)) ;relative reference?
- (release-file? package url)
- (let-values (((name version)
- (package-name->name+version
- (tarball-sans-extension url)
- #\-)))
- (upstream-source
- (package name)
- (version version)
- (urls (list (string-append base-url directory "/" url)))
- (signature-urls
- (list (file->signature
- (string-append base-url directory "/" url))))))))
+ (let* ((base (basename url))
+ (url (if (string=? base url)
+ (string-append base-url directory "/" url)
+ url)))
+ (and (release-file? package base)
+ (let-values (((name version)
+ (package-name->name+version
+ (tarball-sans-extension base)
+ #\-)))
+ (upstream-source
+ (package name)
+ (version version)
+ (urls (list url))
+ (signature-urls
+ (list (file->signature url))))))))
(define candidates
(filter-map url->release (html-links sxml)))
--
2.30.1
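The link handling introduced by this patch can be tried in isolation. The sketch below copies the test used in 'url->release' (a link is treated as relative when it is equal to its own basename); 'resolve-link' is a hypothetical name and the URLs are placeholders.

```scheme
(define (resolve-link base-url directory link)
  "Return LINK unchanged unless it is a bare file name, in which case prepend
BASE-URL and DIRECTORY.  Hypothetical helper mirroring 'url->release'."
  (if (string=? (basename link) link)
      (string-append base-url directory "/" link)
      link))

(resolve-link "https://example.org" "/pub/foo" "foo-1.0.tar.gz")
;; => "https://example.org/pub/foo/foo-1.0.tar.gz"

(resolve-link "https://example.org" "/pub/foo"
              "https://cdn.example.org/foo-1.0.tar.gz")
;; => "https://cdn.example.org/foo-1.0.tar.gz"
```

Note that a relative reference containing a slash, such as "v1/foo-1.0.tar.gz", is not equal to its basename and is therefore passed through as is.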
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 3/7] gnu-maintenance: 'release-file?' rejects checksum files.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-3-ludo@gnu.org
* guix/gnu-maintenance.scm (release-file?): Reject ".md5sum",
".sha1sum", and ".sha256sum".
---
guix/gnu-maintenance.scm | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 98d326e500..a8b24fa336 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -247,7 +247,9 @@ network to check in GNU's database."
(define (release-file? project file)
"Return #f if FILE is not a release tarball of PROJECT, otherwise return
true."
- (and (not (member (file-extension file) '("sig" "sign" "asc")))
+ (and (not (member (file-extension file)
+ '("sig" "sign" "asc"
+ "md5sum" "sha1sum" "sha256sum")))
(and=> (regexp-exec %tarball-rx file)
(lambda (match)
;; Filter out unrelated files, like `guile-www-1.1.1'.
--
2.30.1
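As a rough illustration of the extension check above, here is a self-contained sketch. It only covers the extension test, not the '%tarball-rx' match that 'release-file?' also performs; 'file-extension*' stands in for the 'file-extension' procedure used in the patch, and both names below are hypothetical.

```scheme
(define (file-extension* file)
  "Return the part of FILE after its last dot, or #f if there is none.
Stand-in for the 'file-extension' procedure used in the patch."
  (let ((dot (string-rindex file #\.)))
    (and dot (substring file (+ dot 1)))))

;; Extensions rejected after this patch: detached signatures plus checksums.
(define %auxiliary-extensions
  '("sig" "sign" "asc" "md5sum" "sha1sum" "sha256sum"))

(define (auxiliary-file? file)
  "Return #t if FILE looks like a signature or checksum file."
  (and (member (file-extension* file) %auxiliary-extensions) #t))

(auxiliary-file? "foo-1.0.tar.gz.sha256sum")   ;; => #t
(auxiliary-file? "foo-1.0.tar.gz.asc")         ;; => #t
(auxiliary-file? "foo-1.0.tar.gz")             ;; => #f
```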
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 4/7] gnu-maintenance: 'latest-html-release' can determine signature file name.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-4-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): #:file->signature
defaults to #f.
[file->signature/guess]: New procedure.
[url->release]: Use it when FILE->SIGNATURE is #f.
Introduce 'links' variable.
(url-prefix-rewrite): Check whether URL is true before calling
'string-prefix?'.
(latest-savannah-release): Adjust comment about detached signatures.
---
guix/gnu-maintenance.scm | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index a8b24fa336..3bffa4d11e 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -470,16 +470,29 @@ hosted on ftp.gnu.org, or not under that name (this is the case for
#:key
(base-url "https://kernel.org/pub")
(directory (string-append "/" package))
- (file->signature (cut string-append <> ".sig")))
+ file->signature)
"Return an <upstream-source> for the latest release of PACKAGE (a string) on
SERVER under DIRECTORY, or #f. BASE-URL should be the URL of an HTML page,
typically a directory listing as found on 'https://kernel.org/pub'.
-FILE->SIGNATURE must be a procedure; it is passed a source file URL and must
-return the corresponding signature URL, or #f it signatures are unavailable."
- (let* ((uri (string->uri (string-append base-url directory "/")))
- (port (http-fetch/cached uri #:ttl 3600))
- (sxml (html->sxml port)))
+When FILE->SIGNATURE is omitted or #f, guess the detached signature file name,
+if any. Otherwise, FILE->SIGNATURE must be a procedure; it is passed a source
+file URL and must return the corresponding signature URL, or #f if signatures
+are unavailable."
+ (let* ((uri (string->uri (string-append base-url directory "/")))
+ (port (http-fetch/cached uri #:ttl 3600))
+ (sxml (html->sxml port))
+ (links (delete-duplicates (html-links sxml))))
+ (define (file->signature/guess url)
+ (let ((base (basename url)))
+ (any (lambda (link)
+ (any (lambda (extension)
+ (and (string=? (string-append base extension)
+ (basename link))
+ (string-append url extension)))
+ '(".asc" ".sig" ".sign")))
+ links)))
+
(define (url->release url)
(let* ((base (basename url))
(url (if (string=? base url)
@@ -495,10 +508,10 @@ return the corresponding signature URL, or #f it signatures are unavailable."
(version version)
(urls (list url))
(signature-urls
- (list (file->signature url))))))))
+ (list ((or file->signature file->signature/guess) url))))))))
(define candidates
- (filter-map url->release (html-links sxml)))
+ (filter-map url->release links))
(close-port port)
(match candidates
@@ -614,7 +627,7 @@ releases are on gnu.org."
(define (url-prefix-rewrite old new)
"Return a one-argument procedure that rewrites URL prefix OLD to NEW."
(lambda (url)
- (if (string-prefix? old url)
+ (if (and url (string-prefix? old url))
(string-append new (string-drop url (string-length old)))
url)))
@@ -646,9 +659,8 @@ releases are on gnu.org."
(directory (dirname (uri-path uri)))
(rewrite (url-prefix-rewrite %savannah-base
"mirror://savannah")))
- ;; Note: We use the default 'file->signature', which adds ".sig", but not
- ;; all projects on Savannah follow that convention: some use ".asc" and
- ;; perhaps some lack signatures altogether.
+ ;; Note: We use the default 'file->signature', which adds ".sig", ".asc",
+ ;; or whichever detached signature naming scheme PACKAGE uses.
(and=> (latest-html-release package
#:base-url %savannah-base
#:directory directory)
--
2.30.1
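The guessing logic of 'file->signature/guess' can be exercised outside of 'latest-html-release'. The sketch below is the same procedure with the list of links passed explicitly; the name 'guess-signature-url' and the sample links are assumptions made for the example.

```scheme
(use-modules (srfi srfi-1))

(define (guess-signature-url url links)
  "Return the URL of the detached signature of URL if LINKS, the links found
on the page, contain a matching file name; return #f otherwise."
  (let ((base (basename url)))
    (any (lambda (link)
           (any (lambda (extension)
                  (and (string=? (string-append base extension)
                                 (basename link))
                       (string-append url extension)))
                '(".asc" ".sig" ".sign")))
         links)))

(guess-signature-url "https://example.org/pub/foo-1.0.tar.gz"
                     '("foo-1.0.tar.gz" "foo-1.0.tar.gz.asc"))
;; => "https://example.org/pub/foo-1.0.tar.gz.asc"

(guess-signature-url "https://example.org/pub/foo-1.0.tar.gz"
                     '("foo-1.0.tar.gz"))
;; => #f
```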
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 5/7] gnu-maintenance: 'latest-html-release' better computes version number.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-5-ludo@gnu.org
* guix/gnu-maintenance.scm (latest-html-release): Use 'tarball->version'
rather than 'package-name->name+version' to extract the version number.
This fixes problems with packages like 'netsurf' and 'libdom' that have
"-src" in their tarball name, where "src" would be taken as the new
version number.
---
guix/gnu-maintenance.scm | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 3bffa4d11e..5aa16acfde 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -499,12 +499,9 @@ are unavailable."
(string-append base-url directory "/" url)
url)))
(and (release-file? package base)
- (let-values (((name version)
- (package-name->name+version
- (tarball-sans-extension base)
- #\-)))
+ (let ((version (tarball->version base)))
(upstream-source
- (package name)
+ (package package)
(version version)
(urls (list url))
(signature-urls
--
2.30.1
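To make the "-src" problem concrete, here is a small sketch contrasting the two approaches. It does not reproduce 'tarball->version' or 'package-name->name+version'; the three procedures and the regular expressions are simplified stand-ins written for this example, and the file name is only a sample.

```scheme
(use-modules (ice-9 regex))

(define (strip-archive-extension file)
  "Return FILE without a trailing '.tar.*', '.tgz', or '.zip'."
  (let ((m (string-match "\\.(tar\\.[a-z0-9]+|tgz|zip)$" file)))
    (if m (match:prefix m) file)))

(define (version-after-last-hyphen file)
  "Split at the last hyphen, as the code removed by this patch did."
  (let* ((stem (strip-archive-extension file))
         (index (string-rindex stem #\-)))
    (and index (substring stem (+ index 1)))))

(define (version-from-digits file)
  "Take the first hyphen-separated component that starts with a digit."
  (let ((m (string-match "-([0-9][0-9A-Za-z.]*)"
                         (strip-archive-extension file))))
    (and m (match:substring m 1))))

(version-after-last-hyphen "netsurf-3.10-src.tar.gz")   ;; => "src"
(version-from-digits "netsurf-3.10-src.tar.gz")         ;; => "3.10"
```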
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 7/7] gnu: hwloc: Add 'release-monitoring-url' property.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-7-ludo@gnu.org
* gnu/packages/mpi.scm (hwloc-1)[properties]: New field.
---
gnu/packages/mpi.scm | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/gnu/packages/mpi.scm b/gnu/packages/mpi.scm
index 53ee6ef1cd..a8ebd8aeb8 100644
--- a/gnu/packages/mpi.scm
+++ b/gnu/packages/mpi.scm
@@ -66,6 +66,12 @@
(sha256
(base32
"0za1b9lvrm3rhn0lrxja5f64r0aq1qs4m0pxn1ji2mbi8ndppyyx"))))
+
+ (properties
+ ;; Tell the 'generic-html' updater to monitor this URL for updates.
+ `((release-monitoring-url
+ . "https://www-lb.open-mpi.org/software/hwloc/current")))
+
(build-system gnu-build-system)
(outputs '("out" ;'lstopo' & co., depends on Cairo, libx11, etc.
"lib" ;small closure
--
2.30.1
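The 'release-monitoring-url' property is read by 'latest-html-updatable-release' (patch 6/7) to decide which page to crawl. The sketch below isolates that decision, assuming the source URL and the property alist are already at hand; 'monitoring-location' is a hypothetical name and the example.org URLs are placeholders.

```scheme
(use-modules (web uri))

(define (monitoring-location source-url properties)
  "Return two values: the base URL and the directory to crawl.  When
PROPERTIES has a 'release-monitoring-url' entry, it takes precedence."
  (let ((uri (string->uri source-url))
        (custom (assoc-ref properties 'release-monitoring-url)))
    (if custom
        (values custom "")
        (values (string-append (symbol->string (uri-scheme uri))
                               "://" (uri-host uri))
                (dirname (uri-path uri))))))

;; Without the property, the directory of the tarball is crawled.
(monitoring-location "https://example.org/pub/foo/foo-1.0.tar.gz" '())
;; => "https://example.org" and "/pub/foo"

;; With the property, its value is used and the directory is empty.
(monitoring-location
 "https://example.org/pub/foo/foo-1.0.tar.gz"
 '((release-monitoring-url . "https://example.org/foo/current")))
;; => "https://example.org/foo/current" and ""
```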
Ludovic Courtès wrote on 13 Mar 2021 13:46
[PATCH 6/7] gnu-maintenance: Add 'generic-html' updater.
(address . 47126@debbugs.gnu.org)(name . Ludovic Courtès)(address . ludo@gnu.org)
20210313214620.28186-6-ludo@gnu.org
This brings total updater coverage, as reported by 'guix refresh
--list-updaters', from 78% to 88.3%. Among many other things, it covers
freedesktop.org packages.

* guix/gnu-maintenance.scm (html-updatable-package?)
(latest-html-updatable-release): New procedures.
(%generic-html-updater): New variable.
* doc/guix.texi (Invoking guix refresh): Document it.
---
doc/guix.texi | 3 +++
guix/gnu-maintenance.scm | 58 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/doc/guix.texi b/doc/guix.texi
index 97094a7d0a..89c8c58295 100644
--- a/doc/guix.texi
+++ b/doc/guix.texi
@@ -11693,6 +11693,9 @@ the updater for @uref{https://www.stackage.org, Stackage} packages.
the updater for @uref{https://crates.io, Crates} packages.
@item launchpad
the updater for @uref{https://launchpad.net, Launchpad} packages.
+@item generic-html
+a generic updater that crawls the HTML page where the source tarball of
+the package is hosted, when applicable.
@end table
For instance, the following command only checks for updates of Emacs
diff --git a/guix/gnu-maintenance.scm b/guix/gnu-maintenance.scm
index 5aa16acfde..ced5497b37 100644
--- a/guix/gnu-maintenance.scm
+++ b/guix/gnu-maintenance.scm
@@ -28,6 +28,7 @@
#:use-module (srfi srfi-1)
#:use-module (srfi srfi-11)
#:use-module (srfi srfi-26)
+ #:use-module (srfi srfi-34)
#:use-module (rnrs io ports)
#:use-module (system foreign)
#:use-module (guix http-client)
@@ -66,7 +67,8 @@
%gnu-ftp-updater
%savannah-updater
%xorg-updater
- %kernel.org-updater))
+ %kernel.org-updater
+ %generic-html-updater))
;;; Commentary:
;;;
@@ -697,6 +699,53 @@ releases are on gnu.org."
#:file->signature file->signature)
(cut adjusted-upstream-source <> rewrite))))
+(define html-updatable-package?
+ ;; Return true if the given package may be handled by the generic HTML
+ ;; updater.
+ (let ((hosting-sites '("github.com" "github.io" "gitlab.com"
+ "notabug.org" "sr.ht"
+ "gforge.inria.fr" "gitlab.inria.fr"
+ "ftp.gnu.org" "download.savannah.gnu.org"
+ "pypi.org" "crates.io" "rubygems.org"
+ "bioconductor.org")))
+ (url-predicate (lambda (url)
+ (match (string->uri url)
+ (#f #f)
+ (uri
+ (let ((scheme (uri-scheme uri))
+ (host (uri-host uri)))
+ (and (memq scheme '(http https))
+ (not (member host hosting-sites))))))))))
+
+(define (latest-html-updatable-release package)
+ "Return the latest release of PACKAGE. Do that by crawling the HTML page of
+the directory containing its source tarball."
+ (let* ((uri (string->uri
+ (match (origin-uri (package-source package))
+ ((? string? url) url)
+ ((url _ ...) url))))
+ (custom (assoc-ref (package-properties package)
+ 'release-monitoring-url))
+ (base (or custom
+ (string-append (symbol->string (uri-scheme uri))
+ "://" (uri-host uri))))
+ (directory (if custom
+ ""
+ (dirname (uri-path uri))))
+ (package (package-upstream-name package)))
+ (catch #t
+ (lambda ()
+ (guard (c ((http-get-error? c) #f))
+ (latest-html-release package
+ #:base-url base
+ #:directory directory)))
+ (lambda (key . args)
+ ;; Return false and move on upon connection failures.
+ (unless (memq key '(gnutls-error tls-certificate-error
+ system-error))
+ (apply throw key args))
+ #f))))
+
(define %gnu-updater
;; This is for everything at ftp.gnu.org.
(upstream-updater
@@ -737,4 +786,11 @@ releases are on gnu.org."
(pred (url-prefix-predicate "mirror://kernel.org/"))
(latest latest-kernel.org-release)))
+(define %generic-html-updater
+ (upstream-updater
+ (name 'generic-html)
+ (description "Updater that crawls HTML pages.")
+ (pred html-updatable-package?)
+ (latest latest-html-updatable-release)))
+
;;; gnu-maintenance.scm ends here
--
2.30.1
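The URL test at the heart of 'html-updatable-package?' can be sketched on plain strings, without the 'url-predicate' wrapper that applies it to a package's source. The name 'html-crawlable-url?' is hypothetical, the list of hosting sites is deliberately shortened, and the URLs are placeholders.

```scheme
(use-modules (web uri))

;; Shortened list; the patch excludes more hosts than these.
(define %hosting-sites
  '("github.com" "gitlab.com" "pypi.org" "ftp.gnu.org"))

(define (html-crawlable-url? url)
  "Return #t if URL is an HTTP(S) URL whose host is not one of the hosting
sites that the generic HTML updater leaves alone."
  (let ((uri (string->uri url)))
    (and uri
         (memq (uri-scheme uri) '(http https))
         (not (member (uri-host uri) %hosting-sites)))))

(html-crawlable-url? "https://www.freedesktop.org/software/foo/foo-1.0.tar.xz")
;; => #t
(html-crawlable-url? "https://github.com/example/foo/archive/v1.0.tar.gz")
;; => #f
```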
Léo Le Bouter wrote on 17 Mar 2021 03:18
[PATCH 0/7] Add 'generic-html' updater
(address . 47126@debbugs.gnu.org)
5a2391930ed890f8cf61da88ccda6df9bb874630.camel@zaclys.net
Hello!

That's awesome thanks a lot Ludo!!

I am wondering, does this handle cases where there's a subfolder with
version and then another tarball with version as well?

Like GNOME for example:
https://download.gnome.org/sources/NetworkManager/1.31/NetworkManager-1.31.1.tar.xz

I see this is a generic solution, I see you made available some options
to customize per-package as needed but can we get as precise/reliable
as Debian's watch/uscan with that?

Or if I understand correctly, we should always point it to a page where
the link for the latest release is always published? That last thing
really sounds nice!

Léo
Ludovic Courtès wrote on 17 Mar 2021 06:52
(name . Léo Le Bouter)(address . lle-bout@zaclys.net)(address . 47126@debbugs.gnu.org)
87lfal51lf.fsf_-_@gnu.org
Hi Léo,

Léo Le Bouter <lle-bout@zaclys.net> writes:

> That's awesome thanks a lot Ludo!!

Just pushed this series as fe96f64110676f28b948f0d31a1726501abdae0e.
Unleash your update powers, comrades! :-)

> I am wondering, does this handle cases where there's a subfolder with
> version and then another tarball with version as well?
>
> Like GNOME for example:
> https://download.gnome.org/sources/NetworkManager/1.31/NetworkManager-1.31.1.tar.xz
>
> I see this is a generic solution, I see you made available some options
> to customize per-package as needed but can we get as precise/reliable
> as Debian's watch/uscan with that?

There’s a ‘gnome’ updater for GNOME:


And yes, it actually works. :-)

In the case of NetworkManager, there’s a bug right now:

$ guix refresh network-manager
following redirection to 'https://download.gnome.org/sources/NetworkManager/cache.json'...
following redirection to 'https://fr2.rpmfind.net/linux/gnome.org/sources/NetworkManager/cache.json'...
gnu/packages/gnome.scm:7648:13: network-manager would be upgraded from 1.24.0 to rc2

I’ll see what’s up. But otherwise ‘guix refresh -t gnome’ produces
sensible results.

At any rate, updaters sometimes bitrot and produce buggy results, as in
the example above. Please do use ‘guix refresh’ and report any issues!

Also, there are still ~12% of packages for which none of the updaters
apply. We should investigate and see how we can bring that down to
zero.

Thanks for your feedback!

Ludo’.
Ludovic Courtès wrote on 17 Mar 2021 06:53
control message for bug #47126
(address . control@debbugs.gnu.org)
87k0q551ky.fsf@gnu.org
tags 47126 fixed
close 47126
quit
This issue is archived.