Report forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Sun, 23 Dec 2018 14:21:02 GMT) (full text, mbox, link).
Acknowledgement sent
to Ludovic Courtès <ludo@gnu.org>:
New bug report received and forwarded. Copy sent to bug-guix@gnu.org.
(Sun, 23 Dec 2018 14:21:02 GMT) (full text, mbox, link).
Thanks for looking into this, Ludo.
At first glance, I'd say that this is not a compilation option but the way
strings are encoded by default. It seems that multibyte encoding is used all
over the place by a few compilers including SBCL (and CCL I think).
One way I know around this (I'm by no mean a Common Lisp expert) is the
flexi-streams package for re-encoding.
More generally, shouldn't we make the reference scanner a bit smarter? In
particular, how does it handle non-ASCII references? Maybe it would not be
unreasonable to handle UTF-8 and UCS-4 for instance?
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Sun, 23 Dec 2018 18:32:35 +0100
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
[...]
>> Apparently this string literal is stored as UTF-32 (UCS-4) or similar,
>> which prevents the reference scanner and the grafting code from finding
>> it, and problems ensue. :-)
>
> IMO, we should consider modifying Guix to search for store references
> encoded in UTF-32 and/or UTF-16. I wouldn't be surprised if some other
> programs use those encodings. I'd be willing to work on it.
I don’t think we’ve encountered the problem before. This would require
fixing both the scanner and the grafting code (though eventually that
might be a single code base when the Scheme-implemented daemon is
merged) in non-trivial ways.
One issue is that users of an old daemon would get a different behavior
than users of a new daemon. It would be the first time we introduce
such a significant change in the daemon since Guix was started.
For now I lean towards looking for a way to address the issue
specifically for SBCL. I’d be tempted to generalize if and only if we
find other occurrences of the problem that would make the benefits
outweigh the development and maintenance costs.
WDYT?
I remember discussing in the past some sort of “pluggable” reference
scanning mechanism that could also work for compressed archives, etc.
That also looks like the right thing, but it has a development and
maintenance cost that’s pretty high whereas we might be able to address
the same problems in much simpler ways.
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Sun, 23 Dec 2018 22:02:02 GMT) (full text, mbox, link).
> I don’t think we’ve encountered the problem before.
Actually it does ring a bell for me. Didn't we have a similar issue with Fish,
or some dependency?
> For now I lean towards looking for a way to address the issue
> specifically for SBCL.
Don't forget that we currently have 5 Lisp compilers.
Besides, it's not clear that this can be fixed on the compiler's side, it could
very well be that patches will be required on a per-project basis.
--
Pierre Neidhardt
https://ambrevar.xyz/
Severity set to 'important' from 'normal'
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Mon, 24 Dec 2018 14:56:02 GMT) (full text, mbox, link).
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 24 Dec 2018 14:58:02 GMT) (full text, mbox, link).
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 24 Dec 2018 15:57:50 +0100
Hi!
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> Thanks for looking into this, Ludo.
>
> At first glance, I'd say that this is not a compilation option but the way
> strings are encoded by default. It seems that multibyte encoding is used all
> over the place by a few compilers including SBCL (and CCL I think).
>
> One way I know around this (I'm by no mean a Common Lisp expert) is the
> flexi-streams package for re-encoding.
OK, we need to investigate.
> More generally, shouldn't we make the reference scanner a bit smarter? In
> particular, how does it handle non-ASCII references? Maybe it would not be
> unreasonable to handle UTF-8 and UCS-4 for instance?
Store file names are always ASCII so problems arise when they are stored
as UTF-16 or UTF-32/UCS-4.
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 24 Dec 2018 15:07:01 GMT) (full text, mbox, link).
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 24 Dec 2018 16:06:09 +0100
Hi Pierre,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>> I don’t think we’ve encountered the problem before.
>
> Actually it does ring a bell for me. Didn't we have a similar issue with Fish,
> or some dependency?
We did have a problem with Fish but I can no longer find it. Do you
remember what it was? Something with C++, no?
>> For now I lean towards looking for a way to address the issue
>> specifically for SBCL.
>
> Don't forget that we currently have 5 Lisp compilers.
> Besides, it's not clear that this can be fixed on the compiler's side, it could
> very well be that patches will be required on a per-project basis.
I know little about CL but maybe we can find a solution that works for
all five compilers. At least that would be the first approach I would
suggest following.
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 24 Dec 2018 17:10:01 GMT) (full text, mbox, link).
> Store file names are always ASCII so problems arise when they are stored
> as UTF-16 or UTF-32/UCS-4.
I understand that most programs stick to ASCII filenames, but what about the odd
one using non-English, special characters?
> We did have a problem with Fish but I can no longer find it. Do you
> remember what it was? Something with C++, no?
I think bug #30265.
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 24 Dec 2018 13:12:23 -0500
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>
>>> For now I lean towards looking for a way to address the issue
>>> specifically for SBCL.
>>
>> Don't forget that we currently have 5 Lisp compilers.
>> Besides, it's not clear that this can be fixed on the compiler's side, it could
>> very well be that patches will be required on a per-project basis.
>
> I know little about CL but maybe we can find a solution that works for
> all five compilers. At least that would be the first approach I would
> suggest following.
I can't imagine a solution that would work for all five compilers, but
perhaps that's a failure of imagination on my part. Of course, you're
welcome to search for such a solution. Can you give me a rough outline
of what you have in mind?
Of course, the usual reason to choose UTF-32 is to support non-ASCII
characters while retaining fixed-width code points, so that string
lookups are straightforward and efficient. Using UTF-8 improves space
efficiency, but at the cost of extra code complexity. That extra
complexity is what I guess we would need to add to each program that
currently uses UTF-32. Alternatively, we could extend the on-disk
format to support UTF-8 and then add some kind of "load hook" that
converts the string to UTF-32 at load time. Either way, it's likely to
be a can of worms.
Consider the case of Guile. Years ago we agreed to switch to UTF-8 as
its sole internal string encoding, but it hasn't yet been done because
it's a big job, even for those of us already intimately familiar with
the code.
Now imagine how hard it would be for someone who barely uses Guile, but
nevertheless felt compelled to change our internal string representation
to use UTF-8. Moreover, imagine that they hoped to find a single
solution that would work for several different Scheme implementations.
What would you say to them if they proposed to find a general solution
to convert several Scheme implementations to use UTF-8 as their string
representation, to save themselves the trouble of having to understand
each implementation individually?
I really think it would be a mistake to try to force every program and
language implementation to use our preferred string representation. I
suspect it would be vastly easier to compromise and support a few other
popular string representations in Guix, namely UTF-16 and UTF-32.
If you don't want to change the daemon, it could be worked around in our
build-side code as follows: we could add a new phase to certain build
systems (or possibly gnu-build-system) that scans each output for
UTF-16/32 encoded store references that are never referenced in UTF-8.
If such references exist, a file with an unobtrusive name would be added
to that output containing those references encoded in UTF-8. This would
enable our daemon's existing reference scanner to find all of the
references.
Our grafting code would then need to be extended to recognize and
transform store references encoded in UTF-16/32 as well as UTF-8.
What do you think?
Regards,
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 24 Dec 2018 23:59:02 GMT) (full text, mbox, link).
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Wed, 26 Dec 2018 17:07:33 +0100
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>> Store file names are always ASCII so problems arise when they are stored
>> as UTF-16 or UTF-32/UCS-4.
>
> I understand that most programs stick to ASCII filenames, but what about the odd
> one using non-English, special characters?
That’s a separate debate. :-) Essentially this restriction on store
file names has always been there in Guix (and Nix before that). If we
were to change it, that would raise compatibility issues.
>> We did have a problem with Fish but I can no longer find it. Do you
>> remember what it was? Something with C++, no?
>
> I think bug #30265.
Oh I see, UCS-4 as well. (I can’t believe this bug is still open given
the relatively simple solutions outlined at
<https://issues.guix.info/issue/30265#8>. :-))
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Wed, 26 Dec 2018 16:15:01 GMT) (full text, mbox, link).
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Wed, 26 Dec 2018 17:14:09 +0100
Hello!
Mark H Weaver <mhw@netris.org> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>>
>>>> For now I lean towards looking for a way to address the issue
>>>> specifically for SBCL.
>>>
>>> Don't forget that we currently have 5 Lisp compilers.
>>> Besides, it's not clear that this can be fixed on the compiler's side, it could
>>> very well be that patches will be required on a per-project basis.
>>
>> I know little about CL but maybe we can find a solution that works for
>> all five compilers. At least that would be the first approach I would
>> suggest following.
>
> I can't imagine a solution that would work for all five compilers, but
> perhaps that's a failure of imagination on my part. Of course, you're
> welcome to search for such a solution. Can you give me a rough outline
> of what you have in mind?
I have nothing specific in mind, I’m just brainstorming with everyone
here. :-)
For a similar situation in C++, there’s a fairly simple and local
workaround:
https://issues.guix.info/issue/30265#8
I’m not familiar with CL but I thought that it we could achieve
something similar, that would be great—I’m not suggesting to change the
CL compilers in any non-trivial way.
For example I guess we could always store the file name as a literal
byte vector/list and add a call to turn that into a string.
Does that make sense?
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 10:38:01 GMT) (full text, mbox, link).
> : > Store file names are always ASCII so problems arise when they are stored
> : > as UTF-16 or UTF-32/UCS-4.
> :
> : I understand that most programs stick to ASCII filenames, but what about the odd
> : one using non-English, special characters?
>
> That’s a separate debate. :-) Essentially this restriction on store
> file names has always been there in Guix (and Nix before that). If we
> were to change it, that would raise compatibility issues.
But what happens if we attempt to store "á" in the store?
> For example I guess we could always store the file name as a literal
> byte vector/list and add a call to turn that into a string.
In the case of Next, that would be a simple patch, but other programs could get
much more complicated. In the end, this approach requires a linear amount of
work. Conversely, adding UCS-* support to the scanner would fix this issue once
and for all.
> : > We did have a problem with Fish but I can no longer find it. Do you
> : > remember what it was? Something with C++, no?
> :
> : I think bug #30265.
>
> Oh I see, UCS-4 as well. (I can’t believe this bug is still open given
> the relatively simple solutions outlined at
> <https://issues.guix.info/issue/30265#8>. :-))
Well, if currently only two packages out of 8500+ suffer from this, then I think
it's easier to go with Ludo's suggestion of patching the code to use ASCII
strings.
Does anyone know about more packages with this issue? It could also be that
more packages suffer from this, unbeknownst to us.
--
Pierre Neidhardt
https://ambrevar.xyz/
Hi Mark,
On Mon, 24 Dec 2018 13:12:23 -0500
Mark H Weaver <mhw@netris.org> wrote:
> Of course, the usual reason to choose UTF-32 is to support non-ASCII
> characters while retaining fixed-width code points, so that string
> lookups are straightforward and efficient.
This kind of lookup is almost never what is necessary. There are many
people who assume character is the same as codepoint and to those people
UTF-32 brings something to the table, but it's really not useful if people
do text processing correctly, see below.
(Of course whether packages actually do this remains to be seen)
> Using UTF-8 improves space efficiency, but at the cost of extra code
>complexity.
I agree.
> That extra
> complexity is what I guess we would need to add to each program that
> currently uses UTF-32.
Yes, but they usually have to do stream processing even with UTF-32 (because
a character can be composed of possibly infinite number of codepoints),
so the infrastructure should be already there and the effort should be
minimal.
> Alternatively, we could extend the on-disk
> format to support UTF-8 and then add some kind of "load hook" that
> converts the string to UTF-32 at load time. Either way, it's likely to
> be a can of worms.
If it ever came to that, a pluggable reference scanner would be
preferrable. But really, it would irk me to have so much complexity
in something so basic (the reference scanner) for no end-user gain
(as a distribution we could just mandate UTF-8 for references and the
problem would be gone for the user with no loss of functionality).
It's always easy to add special cases - but more code means more bugs
and I think if possible it's best to have only the simple case implemented
in the core - because it's less complicated which means more likely
to be correct (for the case it does handle). In the end it depends on
what would be more code, and more widely used.
Also, if we wanted to debug reference errors, we couldn't use grep anymore
because it can't handle utf-32 either (neither can any of the other UNIX tools).
Also, I really don't want to return to the time where I had to call iconv
once every three commands to be able to do anything useful on UNIX.
Also, the build daemon is written in C++ and C++ strings are widely
known to have very very bad codepoint awareness (to say nothing about
the horrible conversion facilities).
Also, if both UTF-32 and UTF-8 are used on disk, care needs to not misdetect
an UTF-8 sequence as an UTF-32 sequence of different text - or the other way
around -, but that's unlikely for ASCII strings.
> I really think it would be a mistake to try to force every program and
> language implementation to use our preferred string representation. I
> suspect it would be vastly easier to compromise and support a few other
> popular string representations in Guix, namely UTF-16 and UTF-32.
In 1992, UTF-8 was invented. Subsequently, most of the Internet,
all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversion
etc standardized on UTF-8, with the eventual goal of standardizing all
network transfer and storage to UTF-8. I think that by now the outliers
are the ones who need to change, otherwise these senseless encoding
conversions will never cease. It's not like different encodings allow for
better expression of writings or anything useful to the end user.
As a distribution we can't force upstream to change, but just filing
bug reports upstream would make us see where they stand on this.
> If you don't want to change the daemon, it could be worked around in our
> build-side code as follows: we could add a new phase to certain build
> systems (or possibly gnu-build-system) that scans each output for
> UTF-16/32 encoded store references that are never referenced in UTF-8.
> If such references exist, a file with an unobtrusive name would be added
> to that output containing those references encoded in UTF-8. This would
> enable our daemon's existing reference scanner to find all of the
> references.
I agree that that would be nice. As a first step, even just detecting
problems like that and erroring out would be okay - in order to find them
in the first place. Right now, it's difficult to detect and so also difficult
to say how wide-spread the problem is. If the problem is wide-spread enough
my tune could change very quickly.
What you propose is similar to what I did in Java in Guix, only it gives
us even more advantages in the Java case (faster class loading and
eventual non-propagated inputs).
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 09:03:12 -0500
Pierre Neidhardt <mail@ambrevar.xyz> writes:
>> : > Store file names are always ASCII so problems arise when they are stored
>> : > as UTF-16 or UTF-32/UCS-4.
>> :
>> : I understand that most programs stick to ASCII filenames, but what about the odd
>> : one using non-English, special characters?
>>
>> That’s a separate debate. :-) Essentially this restriction on store
>> file names has always been there in Guix (and Nix before that). If we
>> were to change it, that would raise compatibility issues.
>
> But what happens if we attempt to store "á" in the store?
Indeed. Although we might restrict the immediate entries within
/gnu/store to ASCII characters, file names deeper within those
directories may have non-ASCII characters. More generally, store
references may occur within larger strings which might include non-ASCII
characters.
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 14:31:02 GMT) (full text, mbox, link).
Cc: Ludovic Courtès <ludo@gnu.org>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 09:29:42 -0500
Hi Danny,
Danny Milosavljevic <dannym@scratchpost.org> writes:
> On Mon, 24 Dec 2018 13:12:23 -0500
> Mark H Weaver <mhw@netris.org> wrote:
>
>> Of course, the usual reason to choose UTF-32 is to support non-ASCII
>> characters while retaining fixed-width code points, so that string
>> lookups are straightforward and efficient.
>
> This kind of lookup is almost never what is necessary. There are many
> people who assume character is the same as codepoint and to those people
> UTF-32 brings something to the table, but it's really not useful if people
> do text processing correctly, see below.
>
> (Of course whether packages actually do this remains to be seen)
>
>> That extra
>> complexity is what I guess we would need to add to each program that
>> currently uses UTF-32.
>
> Yes, but they usually have to do stream processing even with UTF-32 (because
> a character can be composed of possibly infinite number of codepoints),
I agree with you. However, as silly as it might be, the fact remains
that almost every modern programming language and string library uses
code points as the base units by which to index strings.
> so the infrastructure should be already there and the effort should be
> minimal.
The infrastructure might or might not be there, depending on the
sophistication of the program's unicode support, but even if it _is_
there, it will most likely be a layer that expects to iterate over
strings indexed by code point to find graphemes, etc.
Anyway, if you truly believe the effort should be minimal, feel free to
investigate and propose patches to fix our 5 common lisp compilers and
Fish to avoid storing UTF-32 in the object code.
> Also, if both UTF-32 and UTF-8 are used on disk, care needs to not misdetect
> an UTF-8 sequence as an UTF-32 sequence of different text - or the other way
> around -, but that's unlikely for ASCII strings.
This is not an issue because the substrings that the reference scanner
and grafter are looking for are ASCII-only, even if they are part of a
larger non-ASCII string. Specifically, they only need to look for the
nix hashes.
>> I really think it would be a mistake to try to force every program and
>> language implementation to use our preferred string representation. I
>> suspect it would be vastly easier to compromise and support a few other
>> popular string representations in Guix, namely UTF-16 and UTF-32.
>
> In 1992, UTF-8 was invented. Subsequently, most of the Internet,
> all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversion
> etc standardized on UTF-8, with the eventual goal of standardizing all
> network transfer and storage to UTF-8. I think that by now the outliers
> are the ones who need to change,
I agree that we need to standardize on Unicode. However, given the
perhaps unfortunate fact that almost everyone has standardized on code
points as the units by which to index strings, choosing UTF-32 as an
internal representation is a very reasonable choice, IMO.
Anyway, feel free to engage with the developers of the Common Lisp
implementations that use UTF-32 and try to convince them to change.
The remaining question is: what to do if upstream refuses to change? Do
we exclude that software in Guix, or do we maintain our own patches to
override upstream's decision?
>> If you don't want to change the daemon, it could be worked around in our
>> build-side code as follows: we could add a new phase to certain build
>> systems (or possibly gnu-build-system) that scans each output for
>> UTF-16/32 encoded store references that are never referenced in UTF-8.
>> If such references exist, a file with an unobtrusive name would be added
>> to that output containing those references encoded in UTF-8. This would
>> enable our daemon's existing reference scanner to find all of the
>> references.
>
> I agree that that would be nice. As a first step, even just detecting
> problems like that and erroring out would be okay - in order to find them
> in the first place. Right now, it's difficult to detect and so also difficult
> to say how wide-spread the problem is. If the problem is wide-spread enough
> my tune could change very quickly.
Sure, it would be useful to have more data on what packages are
currently affected by this issue.
Regards,
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 14:46:01 GMT) (full text, mbox, link).
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 15:45:32 +0100
Hello,
Mark H Weaver <mhw@netris.org> skribis:
> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>>> : > Store file names are always ASCII so problems arise when they are stored
>>> : > as UTF-16 or UTF-32/UCS-4.
>>> :
>>> : I understand that most programs stick to ASCII filenames, but what about the odd
>>> : one using non-English, special characters?
>>>
>>> That’s a separate debate. :-) Essentially this restriction on store
>>> file names has always been there in Guix (and Nix before that). If we
>>> were to change it, that would raise compatibility issues.
>>
>> But what happens if we attempt to store "á" in the store?
>
> Indeed. Although we might restrict the immediate entries within
> /gnu/store to ASCII characters, file names deeper within those
> directories may have non-ASCII characters. More generally, store
> references may occur within larger strings which might include non-ASCII
> characters.
Right. For example ‘nss-certs’ contains non-ASCII, UTF-8-encoded file
names.
For “top-level” store file names, the restriction is enforced by
‘checkStoreName’ in libstore/store-api.cc.
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 15:03:01 GMT) (full text, mbox, link).
Danny Milosavljevic <dannym@scratchpost.org> writes:
> In 1992, UTF-8 was invented. Subsequently, most of the Internet,
> all new GNU Linux distributions etc, all UNIX GUI frameworks, Subversion
> etc standardized on UTF-8, with the eventual goal of standardizing all
> network transfer and storage to UTF-8. I think that by now the outliers
> are the ones who need to change, otherwise these senseless encoding
> conversions will never cease. It's not like different encodings allow for
> better expression of writings or anything useful to the end user.
>
> As a distribution we can't force upstream to change, but just filing
> bug reports upstream would make us see where they stand on this.
I agree with this. Reporting upstream should be a first step.
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 18:03:11 +0100
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> Just to be sure I understand: non-toplevel, non-ASCII file names will
> not be scanned properly, right?
Every file in the store is properly scanned for references. It’s just
that users cannot create top-level items with a non-ASCII file name.
I hope this clarifies things!
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 18:58:01 GMT) (full text, mbox, link).
> Every file in the store is properly scanned for references. It’s just
> that users cannot create top-level items with a non-ASCII file name.
So if '/gnu/store/...-foo/á' is stored as UTF-8 in a binary, then it will be
found? Is it because the filesystem encoding is also UTF-8 and Guix scans over
byte arrays?
Sorry for dragging on this, I guess I should look at the code at this point but
I have very little time these days.
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 22:54:36 +0100
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>> Every file in the store is properly scanned for references. It’s just
>> that users cannot create top-level items with a non-ASCII file name.
>
> So if '/gnu/store/...-foo/á' is stored as UTF-8 in a binary, then it will be
> found? Is it because the filesystem encoding is also UTF-8 and Guix scans over
> byte arrays?
The reference scanner, currently written in C++, traverses whole
directory trees. Being C++ it treats file names as byte arrays so it
doesn’t matter what the file name encoding is.
Note also that the reference scanner only looks for “xyz…-foo”; what
comes before and after doesn’t matter. So for example if you have
“/gnu/store/xyz…-foo/à”, what’s important is the “xyz…-foo” bit.
This is all happening in libstore/references.cc (which is surprisingly
small) and in (guix build graft) for the grafting part, which Mark wrote
a while back.
HTH,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 27 Dec 2018 22:06:02 GMT) (full text, mbox, link).
> The reference scanner, currently written in C++, traverses whole
> directory trees. Being C++ it treats file names as byte arrays so it
> doesn’t matter what the file name encoding is.
But what matters then is that the filename encodings on the filesystem and in the
binary match, right?
> Note also that the reference scanner only looks for “xyz…-foo”; what
> comes before and after doesn’t matter. So for example if you have
> “/gnu/store/xyz…-foo/à”, what’s important is the “xyz…-foo” bit.
OK, makes sense, then my main worry is just moot :)
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 27 Dec 2018 23:59:12 +0100
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
>> The reference scanner, currently written in C++, traverses whole
>> directory trees. Being C++ it treats file names as byte arrays so it
>> doesn’t matter what the file name encoding is.
>
> But what matters then is that the filename encodings on the filesystem and in the
> binary match, right?
I’m not sure what you call “the binary”. Do you mean the nar?
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Fri, 28 Dec 2018 07:48:01 GMT) (full text, mbox, link).
> I’m not sure what you call “the binary”. Do you mean the nar?
No, in this case I referred to "/bin/next" in sbcl-next. So any file in the nar
passed to the reference scanner.
--
Pierre Neidhardt
https://ambrevar.xyz/
Many grafting issues with Nyxt have been reported recently:
https://github.com/atlas-engineer/nyxt/issues/1257
Seems that glib is being grafted, which breaks cl-cffi-gtk and thus
Nyxt.
Is there a way to disable grafting for a given package? This would at
least be a local workaround for Nyxt.
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Mark H Weaver <mhw@netris.org>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Tue, 30 Mar 2021 22:09:10 +0200
Hi,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> Many grafting issues with Nyxt have been reported recently:
>
> https://github.com/atlas-engineer/nyxt/issues/1257
>
> Seems that glib is being grafted, which breaks cl-cffi-gtk and thus
> Nyxt.
>
> Is there a way to disable grafting for a given package? This would at
> least be a local workaround for Nyxt.
No. I provided guidance in 2018 on how to address this issue:
https://issues.guix.gnu.org/33848
Did anyone talk to the SBCL folks? Please, let’s address this rather
than look for workarounds.
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Wed, 31 Mar 2021 07:11:02 GMT) (full text, mbox, link).
A brief discussion has ensued on SBCL bugtracker:
- It's unlikely that SBCL will change its internal string
representation.
- The main recommendation for an easy fix without updating the scanner
is that we tweaked our build system to dump the store reference to a
separate ASCII file.
Thoughts?
Guillaume?
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Guillaume Le Vaillant <glv@posteo.net>, Mark H Weaver <mhw@netris.org>,
33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Wed, 31 Mar 2021 22:42:24 +0200
Hi,
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> A brief discussion has ensued on SBCL bugtracker:
Nice, thanks for starting the discussion!
> - It's unlikely that SBCL will change its internal string
> representation.
Of course, I would not suggest that.
What could have been nice is if there’s a way to mark specific strings
as being ASCII, or if there’s a “byte vector” data type compatible with
strings, for instance.
> - The main recommendation for an easy fix without updating the scanner
> is that we tweaked our build system to dump the store reference to a
> separate ASCII file.
That’s a good idea: simple and efficient. We do that for the initrd,
for instance (commit b36e06c2b0889f1d0f939465589d36887ff24d33).
This could be done in ‘asdf-build-system/sbcl’ I suppose. I can see two
drawbacks:
1. Some packages like ‘nyxt’ don’t use it, so we’d have to duplicate
the phase there.
2. It may be coarse-grain compared to scanning binaries for references
(for example, we might retain references to build-only tools, such
as libraries used only for tests).
That’s probably acceptable though, and certainly better than the status quo.
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Wed, 31 Mar 2021 20:58:02 GMT) (full text, mbox, link).
Hi Ludo,
> What could have been nice is if there’s a way to mark specific strings
> as being ASCII, or if there’s a “byte vector” data type compatible with
> strings, for instance.
It does and it could work, but according to upstream it's just simpler
to dump deps in a separate file.
> 1. Some packages like ‘nyxt’ don’t use it, so we’d have to duplicate
> the phase there.
In the case of Nyxt, the reason it does not use it is because the
asdf-build-system/sbcl binary production has some drawbacks. I could
work on overhauling the build system to fix the uncanny behaviour, then
Nyxt would use it.
> 2. It may be coarse-grain compared to scanning binaries for references
> (for example, we might retain references to build-only tools, such
> as libraries used only for tests).
The build system could easily leave out Lisp native-inputs, no?
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
To: Pierre Neidhardt <mail@ambrevar.xyz>, Ludovic Courtès
<ludo@gnu.org>, Guillaume Le Vaillant <glv@posteo.net>
Cc: 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 01 Apr 2021 02:03:06 -0400
Pierre Neidhardt <mail@ambrevar.xyz> writes:
> - The main recommendation for an easy fix without updating the scanner
> is that we tweaked our build system to dump the store reference to a
> separate ASCII file.
Sounds good. I made a similar proposal in Dec 2018, earlier in this
thread <https://bugs.gnu.org/33848#31>. I wrote:
If you don't want to change the daemon, it could be worked around in our
build-side code as follows: we could add a new phase to certain build
systems (or possibly gnu-build-system) that scans each output for
UTF-16/32 encoded store references that are never referenced in UTF-8.
If such references exist, a file with an unobtrusive name would be added
to that output containing those references encoded in UTF-8. This would
enable our daemon's existing reference scanner to find all of the
references.
Our grafting code would then need to be extended to recognize and
transform store references encoded in UTF-16/32 as well as UTF-8.
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 01 Apr 2021 07:14:01 GMT) (full text, mbox, link).
Thank you for the reminder, Mark, I had forgotten about your suggestion.
If we are going for a SBCL-specific solution, I believe that all
references are stored in plain text files at some points (the .asd files
and the .lisp files) which are often encoded in ASCII / UTF-8, so
searching these files without having to deal with their encoding might
be enough. But of course this is less general and more brittle.
Mark, Guillaume, would you have time to give this a try?
I'm a bit busy at the moment. Let me know if you can't work on it, I'll
try to find time to work on a patch.
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 01 Apr 2021 09:57:07 +0200
Hi,
Mark H Weaver <mhw@netris.org> skribis:
> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>> - The main recommendation for an easy fix without updating the scanner
>> is that we tweaked our build system to dump the store reference to a
>> separate ASCII file.
>
> Sounds good. I made a similar proposal in Dec 2018, earlier in this
> thread <https://bugs.gnu.org/33848#31>. I wrote:
>
> If you don't want to change the daemon, it could be worked around in our
> build-side code as follows: we could add a new phase to certain build
> systems (or possibly gnu-build-system) that scans each output for
> UTF-16/32 encoded store references that are never referenced in UTF-8.
> If such references exist, a file with an unobtrusive name would be added
> to that output containing those references encoded in UTF-8. This would
> enable our daemon's existing reference scanner to find all of the
> references.
>
> Our grafting code would then need to be extended to recognize and
> transform store references encoded in UTF-16/32 as well as UTF-8.
Oh thanks for the reminder.
The separate ASCII file doesn’t solve it all because, as you write, we’d
need to change the grafting code as well.
Then it might be simpler to use a “byte vector” data type for those
strings. We’ll have to wait for Pierre’s patches to get a better idea.
:-)
Thanks,
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 01 Apr 2021 08:49:02 GMT) (full text, mbox, link).
Hi!
> The separate ASCII file doesn’t solve it all because, as you write, we’d
> need to change the grafting code as well.
>
> Then it might be simpler to use a “byte vector” data type for those
> strings.
Which strings and where would we use byte vectors?
> We’ll have to wait for Pierre’s patches to get a better idea.
> :-)
By "Pierre's patches", you mean a patch to add a build phase that
generates an file listings all references?
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> If we are going for a SBCL-specific solution, I believe that all
> references are stored in plain text files at some points (the .asd files
> and the .lisp files) which are often encoded in ASCII / UTF-8, so
> searching these files without having to deal with their encoding might
> be enough. But of course this is less general and more brittle.
A store reference to a C library in a standalone Lisp binary can come
either from the current package or from some dependencies
(cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the source
code of all the Lisp dependencies recursively to get the full list of
store refrences.
And as Mark wrote below, with the current grafting code, this list of
store references will not solve grafting for paths that are in UTF-32 or
both in ASCII and UTF-32 in the binary file.
Ludovic Courtès <ludo@gnu.org> skribis:
> Mark H Weaver <mhw@netris.org> skribis:
>
>> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>>> - The main recommendation for an easy fix without updating the scanner
>>> is that we tweaked our build system to dump the store reference to a
>>> separate ASCII file.
>>
>> Sounds good. I made a similar proposal in Dec 2018, earlier in this
>> thread <https://bugs.gnu.org/33848#31>. I wrote:
>>
>> If you don't want to change the daemon, it could be worked around in our
>> build-side code as follows: we could add a new phase to certain build
>> systems (or possibly gnu-build-system) that scans each output for
>> UTF-16/32 encoded store references that are never referenced in UTF-8.
>> If such references exist, a file with an unobtrusive name would be added
>> to that output containing those references encoded in UTF-8. This would
>> enable our daemon's existing reference scanner to find all of the
>> references.
>>
>> Our grafting code would then need to be extended to recognize and
>> transform store references encoded in UTF-16/32 as well as UTF-8.
>
> Oh thanks for the reminder.
>
> The separate ASCII file doesn’t solve it all because, as you write, we’d
> need to change the grafting code as well.
>
> Then it might be simpler to use a “byte vector” data type for those
> strings. We’ll have to wait for Pierre’s patches to get a better idea.
> :-)
I'm not sure what you mean with the "byte vector" data type here. Could
you explain what you're thinking about with a few more details?
Hi Guillaume!
> A store reference to a C library in a standalone Lisp binary can come
> either from the current package or from some dependencies
> (cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the source
> code of all the Lisp dependencies recursively to get the full list of
> store refrences.
I don't think there is need to scan recursively: if package A depends on
B which depends on C, then A can lists the dependency on B in a file,
and B can do the same for C. This way the relationship between A and C
is properly stored.
Am I missing something?
> And as Mark wrote below, with the current grafting code, this list of
> store references will not solve grafting for paths that are in UTF-32 or
> both in ASCII and UTF-32 in the binary file.
Indeed, and that's the core of the issue here I believe, since grafting
is what breaks Nyxt in practice.
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> Hi Guillaume!
>
>> A store reference to a C library in a standalone Lisp binary can come
>> either from the current package or from some dependencies
>> (cl+ssl, cl-cffi-gtk, etc.). So we would need to scan the source
>> code of all the Lisp dependencies recursively to get the full list of
>> store refrences.
>
> I don't think there is need to scan recursively: if package A depends on
> B which depends on C, then A can lists the dependency on B in a file,
> and B can do the same for C. This way the relationship between A and C
> is properly stored.
>
> Am I missing something?
Oh, you say this file would be created for every Lisp package. I thought
it would only be for the standalone binary case, because the "regular"
asdf-build-system/sbcl used for Lisp libraries ships the sources and its
make-asdf-configuration phase creates links to the required Lisp
dependencies in '/gnu/store/...', so there should not be missing
references.
>> And as Mark wrote below, with the current grafting code, this list of
>> store references will not solve grafting for paths that are in UTF-32 or
>> both in ASCII and UTF-32 in the binary file.
>
> Indeed, and that's the core of the issue here I believe, since grafting
> is what breaks Nyxt in practice.
>
> Cheers!
I just wondered: does the grafting code take '.fasl' files into
consideration?
If yes, I guess a Lisp library could also end up in a strange
half-grafted state if the grafting code modifies ASCII references and
not UTF-32 ones...
Guillaume Le Vaillant <glv@posteo.net> writes:
> Oh, you say this file would be created for every Lisp package. I thought
> it would only be for the standalone binary case, because the "regular"
> asdf-build-system/sbcl used for Lisp libraries ships the sources and its
> make-asdf-configuration phase creates links to the required Lisp
> dependencies in '/gnu/store/...', so there should not be missing
> references.
Right.
The only case where there could be a missing reference is if the source
code contains an FFI reference stored in non-ASCII / UTF-8.
So we need to parse other encodings too as Mark suggested if I
understand correctly.
> I just wondered: does the grafting code take '.fasl' files into
> consideration?
> If yes, I guess a Lisp library could also end up in a strange
> half-grafted state if the grafting code modifies ASCII references and
> not UTF-32 ones...
Absolutely, .fasls suffer from the same problem since they may encode
strings as UTF-32.
--
Pierre Neidhardt
https://ambrevar.xyz/
I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)
might be able to fix this much quicker than me! :)
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Guillaume Le Vaillant <glv@posteo.net>, Mark H Weaver <mhw@netris.org>,
33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 01 Apr 2021 17:24:37 +0200
Pierre Neidhardt <mail@ambrevar.xyz> skribis:
> I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)
> might be able to fix this much quicker than me! :)
There’s ‘%graft-hooks’ in (guix build graft). One could add a hook that
would do nothing except for SBCL packages; for SBCL packages, it would
to the UCS-4 rewriting “somehow”.
The other option, which might be easier, would be to arrange to not use
UCS-4 in the first place, by choosing a bytevector data type for string
literals known to contain a store reference. It can be done if we know
where those references come from—e.g., they’re introduced by
‘substitute*’ on the source.
I hope this makes sense!
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 01 Apr 2021 17:26:01 GMT) (full text, mbox, link).
To: Ludovic Courtès <ludo@gnu.org>, Pierre Neidhardt
<mail@ambrevar.xyz>
Cc: Guillaume Le Vaillant <glv@posteo.net>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 01 Apr 2021 13:23:51 -0400
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> What could have been nice is if there’s a way to mark specific strings
> as being ASCII, or if there’s a “byte vector” data type compatible with
> strings, for instance.
Do we know that all strings containing store references will be
representable in ASCII?
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 01 Apr 2021 17:36:02 GMT) (full text, mbox, link).
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 01 Apr 2021 13:33:55 -0400
Pierre Neidhardt <mail@ambrevar.xyz> writes:
> I'm not familiar with the grafting code, so anyone who is (Mark? Ludo?)
> might be able to fix this much quicker than me! :)
I'll think about what would be required to modify our grafting code to
support UCS-4.
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Fri, 02 Apr 2021 15:14:02 GMT) (full text, mbox, link).
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Fri, 02 Apr 2021 17:13:23 +0200
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> Ludovic Courtès <ludo@gnu.org> writes:
>> What could have been nice is if there’s a way to mark specific strings
>> as being ASCII, or if there’s a “byte vector” data type compatible with
>> strings, for instance.
>
> Do we know that all strings containing store references will be
> representable in ASCII?
The basename part of /gnu/store/* is guaranteed to be pure ASCII. Now,
you’re right that file names that come after could be non-ASCII, though
that’s very unlikely. We’d have to look at concrete examples I guess.
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Fri, 02 Apr 2021 22:49:02 GMT) (full text, mbox, link).
Here's a preliminary draft patch to add support for UTF-32 and UTF-16
references to our grafting code. I haven't yet measured the efficiency
impact of these changes, but I suspect it's not too bad.
I'd be curious to know whether it fixes the Nyxt graft.
Mark
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Sat, 03 Apr 2021 16:10:18 -0400
Pierre Neidhardt <mail@ambrevar.xyz> writes:
> Wow, that was fast, thank you Mark!
>
> Any idea how I can test this, i.e. how I can force a graft?
Just apply the patch to a git checkout of Guix, build it, and then use
it to build anything you like, e.g. "./pre-inst-env guix build nyxt".
With this patch applied, all graft derivations will be rebuilt, but
*only* grafts. When it's ready (i.e. when it has better comments,
docstrings, etc), this change is perfectly appropriate for 'master'.
FYI, I've already applied this new grafting code to my private branch of
Guix, switched to a system and user profile built using it, rebooted
into the new system, and I'm typing this email on it. I've also checked
that none of the processes running on my system include executables or
shared libraries from ungrafted store items (after removing my ibus
.cache files, see <https://bugs.gnu.org/47576>).
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 05 Apr 2021 19:46:01 GMT) (full text, mbox, link).
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 05 Apr 2021 21:45:43 +0200
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> Pierre Neidhardt <mail@ambrevar.xyz> writes:
>
>> Wow, that was fast, thank you Mark!
Yup, thumbs up!
>> Any idea how I can test this, i.e. how I can force a graft?
>
> Just apply the patch to a git checkout of Guix, build it, and then use
> it to build anything you like, e.g. "./pre-inst-env guix build nyxt".
>
> With this patch applied, all graft derivations will be rebuilt, but
> *only* grafts. When it's ready (i.e. when it has better comments,
> docstrings, etc), this change is perfectly appropriate for 'master'.
The GC’s scanner still gets it wrong though. I wonder whether having
the grafting code more capable than the scanner could lead to bad
surprises. WDYT?
Thank you for looking into this!
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Mon, 05 Apr 2021 23:07:02 GMT) (full text, mbox, link).
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Mon, 05 Apr 2021 19:04:44 -0400
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> Mark H Weaver <mhw@netris.org> skribis:
>
>> With this patch applied, all graft derivations will be rebuilt, but
>> *only* grafts. When it's ready (i.e. when it has better comments,
>> docstrings, etc), this change is perfectly appropriate for 'master'.
>
> The GC’s scanner still gets it wrong though. I wonder whether having
> the grafting code more capable than the scanner could lead to bad
> surprises. WDYT?
I've thought about it, and I've been unable to think of any
disadvantage to making the grafter more capable than the scanner.
It seems to me a pure win.
That the scanner fails to find all references is clearly an important
problem that should be fixed ASAP, but as far as I can tell, improving
the grafter would not make that problem any worse or create any new
problems.
Improving the grafter should have the following effects:
(1) Reducing the number of cases where ungrafted code with security
flaws is being used on our systems.
(2) Fixing problems in our Fish, Nyxt, and Common Lisp packages.
Improving the scanner, or adding phases to selected packages or build
systems to copy hidden references to an ASCII file, should have the
following effects:
(1) Reducing the number of cases where run-time dependencies are not
known to Guix, which could lead to dependencies being prematurely
GC'd or excluded from things like "guix pack".
So, it seems to me that we should persue both of these improvements
concurrently, and I see no practical advantage to postponing one for the
sake of rolling them both out at the same time. Of course, I welcome
other opinions on this.
What do you think?
Thanks,
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Tue, 06 Apr 2021 08:17:02 GMT) (full text, mbox, link).
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Tue, 06 Apr 2021 10:16:20 +0200
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> That the scanner fails to find all references is clearly an important
> problem that should be fixed ASAP, but as far as I can tell, improving
> the grafter would not make that problem any worse or create any new
> problems.
>
> Improving the grafter should have the following effects:
>
> (1) Reducing the number of cases where ungrafted code with security
> flaws is being used on our systems.
>
> (2) Fixing problems in our Fish, Nyxt, and Common Lisp packages.
These are cases where the scanner may get the references wrong in the
first place though.
OTOH, there are also cases where the scanner gets the references right.
For example, earlier Pierre mentioned that grafting breaks the reference
of nyxt to glib:
https://issues.guix.gnu.org/33848#26
It turns out that the scanner does find a reference to glib “by chance”:
--8<---------------cut here---------------start------------->8---
$ guix gc --references $(guix build nyxt --no-grafts) | grep glib-2
/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6
$ grep -r 4vmhbc31cpbnlw3c96kcc094ihmaf7dv $(guix build nyxt --no-grafts)
Duuma dosiero /gnu/store/5pgh0cn1kzyajaanj7f1iyp91hd917d6-nyxt-2-pre-release-6/bin/.nyxt-real kongruas
--8<---------------cut here---------------end--------------->8---
So in this case, the fixed grafter is a net win.
After applying the patch you sent, I confirm that Nyxt starts just fine
when running:
./pre-inst-env guix environment --ad-hoc nyxt -CN -E ^DISPLAY --share=/tmp/.X11-unix -- nyxt
… whereas on master it fails to start with:
--8<---------------cut here---------------start------------->8---
Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING
{10010D0103}>:
Problem running initialization hook GLIB::RUN-INITIALIZERS:
Unable to load any of the alternatives:
("/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so.0"
"/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so")
--8<---------------cut here---------------end--------------->8---
> Improving the scanner, or adding phases to selected packages or build
> systems to copy hidden references to an ASCII file, should have the
> following effects:
>
> (1) Reducing the number of cases where run-time dependencies are not
> known to Guix, which could lead to dependencies being prematurely
> GC'd or excluded from things like "guix pack".
>
> So, it seems to me that we should persue both of these improvements
> concurrently, and I see no practical advantage to postponing one for the
> sake of rolling them both out at the same time. Of course, I welcome
> other opinions on this.
As always I worry about added complexity. In this case, I think you’re
right: the UTF-{16,32}-capable grafter would most likely fix a number of
issues right away, including this Nyxt issue.
Thanks!
Ludo’.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Tue, 06 Apr 2021 08:24:02 GMT) (full text, mbox, link).
Hi Mark, hi Ludo,
> After applying the patch you sent, I confirm that Nyxt starts just fine
> when running:
>
> ./pre-inst-env guix environment --ad-hoc nyxt -CN -E ^DISPLAY --share=/tmp/.X11-unix -- nyxt
>
> … whereas on master it fails to start with:
>
> --8<---------------cut here---------------start------------->8---
> Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING
> {10010D0103}>:
> Problem running initialization hook GLIB::RUN-INITIALIZERS:
> Unable to load any of the alternatives:
> ("/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so.0"
> "/gnu/store/4vmhbc31cpbnlw3c96kcc094ihmaf7dv-glib-2.62.6/lib/libglib-2.0.so")
> --8<---------------cut here---------------end--------------->8---
Fantastic, this looks like it's fixing this grafting issue indeed!
I also agree this looks like a net win even though the `guix gc` issue
is not fix.
The latter seems to be relatively easier to fix, doesn't it?
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Mark H Weaver <mhw@netris.org>, Pierre Neidhardt <mail@ambrevar.xyz>,
33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Tue, 6 Apr 2021 13:23:58 -0400
On Mon, Apr 05, 2021 at 09:45:43PM +0200, Ludovic Courtès wrote:
> The GC’s scanner still gets it wrong though. I wonder whether having
> the grafting code more capable than the scanner could lead to bad
> surprises. WDYT?
I'm going off-topic, but I've wished we had a generic fast string-search
(and replace?) procedure.
The go-build-system has a slow "one byte a time" implementation because
I couldn't figure out how to re-use the code in (guix build grafts):
https://git.savannah.gnu.org/cgit/guix.git/tree/guix/build/go-build-system.scm?h=v1.2.0#n269
There are probably some other places we could use a fast procedure. It
might even be something to add to Guile.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Tue, 06 Apr 2021 23:24:02 GMT) (full text, mbox, link).
To: Leo Famulari <leo@famulari.name>, Ludovic Courtès
<ludo@gnu.org>
Cc: Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Tue, 06 Apr 2021 19:21:30 -0400
Hi Leo,
Leo Famulari <leo@famulari.name> writes:
> On Mon, Apr 05, 2021 at 09:45:43PM +0200, Ludovic Courtès wrote:
>> The GC’s scanner still gets it wrong though. I wonder whether having
>> the grafting code more capable than the scanner could lead to bad
>> surprises. WDYT?
>
> I'm going off-topic, but I've wished we had a generic fast string-search
> (and replace?) procedure.
Guile has several functions to help with this, e.g. 'string-contains',
'string-replace', 'string-replace-substring', and of course the regexp
functions.
The grafting code can't use these things because (1) we are not
searching for a single string, but rather for arbitrary nix hashes, and
(2) we can't easily use the regexp functions because there could be NULs
present.
> The go-build-system has a slow "one byte a time" implementation because
> I couldn't figure out how to re-use the code in (guix build grafts):
>
> https://git.savannah.gnu.org/cgit/guix.git/tree/guix/build/go-build-system.scm?h=v1.2.0#n269
From a glance, I would think that 'replace-store-references' from
(guix build grafts) is directly applicable here. Do you remember what
the difficulty was in using it?
Alternatively, since you are only searching for a single string to
replace, I think you could read the entire file into a single string
(e.g. using 'get-string-all' with "ISO-8859-1" encoding) and use
'string-replace-substring'.
What do you think?
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 08 Apr 2021 10:14:02 GMT) (full text, mbox, link).
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> Please add a “Fixes” line in the commit log.
Done, thanks.
> I’m not reviewing the code in depth and I trust your judgment.
>
> The risks of bugs I can think of are: missed ASCII references (a
> regression, whereby some ASCII references would not get rewritten),
This is indeed a risk whenever we modify the grafting code, which is
written for efficiency, not clarity. I've tried to be careful, and have
checked that my newly grafted system and user profiles do not retain
references to ungrafted code, modulo the following pre-existing issues:
<https://bugs.gnu.org/47576>
(ibus-daemon launches ungrafted subprocesses)
<https://bugs.gnu.org/47614>
(Chunked store references in .zo files in Racket 8)
> and unrelated UTF-{16,32}-base32-looking references getting rewritten.
>
> I guess the latter is very unlikely because only sequences found in the
> replacement table may be rewritten, right?
Yes, that's correct.
> The former should be caught by ‘tests/grafts.scm’ but we could also
> check the closure of real-world systems, for instance, to make sure it
> doesn’t refer to ungrafted things.
As I mention above, I've already done so for my own GNOME-based Guix
system.
> Do you know how this affects performance?
I have not yet measured it, but subjectively, I do not notice any
obvious difference in speed.
I expect that the main performance impact is that large blocks of NULs
will be processed more slowly by the current draft version of the new
grafting code. That's because NULs can now be part of a nix hash, and
therefore the new grafting code can only advance 1 byte position when
seeing a NUL, whereas previously it would skip ahead 33 positions in
that case.
If desired, the handling of NULs could be made more efficient, at the
cost of a bit more complexity. When seeing a NUL, we could check the
adjacent bytes to see if this is part of a nix-base32 character in
UTF-16 or UTF-32 encoding. If not, we could skip ahead.
> Perhaps add short docstrings for clarity.
Done.
>> +(for-each
>> + (lambda (char-size1)
>> + (for-each
>> + (lambda (char-size2)
>> + (for-each
>> + (lambda (gap)
>> + (for-each
>> + (lambda (offset)
>> + (test-equal (format #f "replace-store-references, char-sizes ~a ~a, gap ~s, offset ~a"
>> + char-size1 char-size2 gap offset)
[...]
>> + ;; offsets to test
>> + (map (lambda (i) (- buffer-size (* 40 char-size1) i))
>> + (iota 30))))
>> + ;; gaps
>> + '("" "-" " " "a")))
>> + ;; char-size2 values to test
>> + '(1 2)))
>> + ;; char-size1 values to test
>> + '(1 2 4))
>
> For clarity, perhaps you can define a top-level procedure for the test
> and call it from ‘for-each’.
Good idea. I'd like to optimize the tests a bit before pushing this.
They take a fairly long time to run, and lead to a huge 1.5G grafts.log
file. It might be sufficient to avoid 'test-equal' here.
> Modulo these very minor issues, it looks like it’s ready to go!
Sounds good. Thanks for the review!
Below, I've attached my current revision of this draft patch, which
incorporates your suggestions regarding the main code. What remains is
to improve the tests.
Regards,
Mark
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Wed, 14 Apr 2021 12:55:43 +0200
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
>> The risks of bugs I can think of are: missed ASCII references (a
>> regression, whereby some ASCII references would not get rewritten),
>
> This is indeed a risk whenever we modify the grafting code, which is
> written for efficiency, not clarity. I've tried to be careful, and have
> checked that my newly grafted system and user profiles do not retain
> references to ungrafted code, modulo the following pre-existing issues:
>
> <https://bugs.gnu.org/47576>
> (ibus-daemon launches ungrafted subprocesses)
>
> <https://bugs.gnu.org/47614>
> (Chunked store references in .zo files in Racket 8)
OK.
>> and unrelated UTF-{16,32}-base32-looking references getting rewritten.
>>
>> I guess the latter is very unlikely because only sequences found in the
>> replacement table may be rewritten, right?
>
> Yes, that's correct.
>
>> The former should be caught by ‘tests/grafts.scm’ but we could also
>> check the closure of real-world systems, for instance, to make sure it
>> doesn’t refer to ungrafted things.
>
> As I mention above, I've already done so for my own GNOME-based Guix
> system.
Yes, we should be safe.
>> Do you know how this affects performance?
>
> I have not yet measured it, but subjectively, I do not notice any
> obvious difference in speed.
>
> I expect that the main performance impact is that large blocks of NULs
> will be processed more slowly by the current draft version of the new
> grafting code. That's because NULs can now be part of a nix hash, and
> therefore the new grafting code can only advance 1 byte position when
> seeing a NUL, whereas previously it would skip ahead 33 positions in
> that case.
>
> If desired, the handling of NULs could be made more efficient, at the
> cost of a bit more complexity. When seeing a NUL, we could check the
> adjacent bytes to see if this is part of a nix-base32 character in
> UTF-16 or UTF-32 encoding. If not, we could skip ahead.
Let’s keep it this way for now; we can always revisit later if we feel
the need for it.
>> For clarity, perhaps you can define a top-level procedure for the test
>> and call it from ‘for-each’.
>
> Good idea. I'd like to optimize the tests a bit before pushing this.
> They take a fairly long time to run, and lead to a huge 1.5G grafts.log
> file. It might be sufficient to avoid 'test-equal' here.
Ah yes, rather use ‘test-assert’ or some such to avoid the huge log
file. :-)
> Below, I've attached my current revision of this draft patch, which
> incorporates your suggestions regarding the main code. What remains is
> to improve the tests.
LGTM! Feel free to push this version or an improved one. I think it’s
good to have it in the upcoming release, and if it’s pushed sooner,
we’ll have more time to react in case something’s wrong.
Thank you!
Ludo’.
Added indication that bug 33848 blocks47297
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Wed, 14 Apr 2021 15:20:01 GMT) (full text, mbox, link).
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Wed, 14 Apr 2021 22:38:02 GMT) (full text, mbox, link).
Cc: Mark H Weaver <mhw@netris.org>, Pierre Neidhardt <mail@ambrevar.xyz>,
33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Wed, 14 Apr 2021 18:37:45 -0400
On Wed, Apr 14, 2021 at 12:55:43PM +0200, Ludovic Courtès wrote:
> LGTM! Feel free to push this version or an improved one. I think it’s
> good to have it in the upcoming release, and if it’s pushed sooner,
> we’ll have more time to react in case something’s wrong.
Just FYI to everyone reading, we aim to release 1.2.1 this Sunday, April
18.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Thu, 15 Apr 2021 07:29:02 GMT) (full text, mbox, link).
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Thu, 15 Apr 2021 03:26:33 -0400
Ludovic Courtès <ludo@gnu.org> writes:
> LGTM! Feel free to push this version or an improved one. I think it’s
> good to have it in the upcoming release, and if it’s pushed sooner,
> we’ll have more time to react in case something’s wrong.
I pushed an improved version of the patch to 'master' as commit
1bab9b9f17256a9e4f45f5b0cceb8b52e0a1b1ed.
Thanks,
Mark
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Fri, 16 Apr 2021 09:45:02 GMT) (full text, mbox, link).
Just tested like Ludo did: Nyxt works now without the --no-grafts
option!
Thank you so much, Mark, this is a huge step forward for Nyxt (and Guix
for course)! :)
--
Pierre Neidhardt
https://ambrevar.xyz/
Added tag(s) fixed.
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Thu, 22 Apr 2021 13:37:02 GMT) (full text, mbox, link).
bug closed, send any further explanations to
33848@debbugs.gnu.org and Ludovic Courtès <ludo@gnu.org>
Request was from Ludovic Courtès <ludo@gnu.org>
to control@debbugs.gnu.org.
(Thu, 22 Apr 2021 13:37:02 GMT) (full text, mbox, link).
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Fri, 30 Apr 2021 20:05:01 GMT) (full text, mbox, link).
To: Pierre Neidhardt <mail@ambrevar.xyz>, Ludovic Courtès
<ludo@gnu.org>
Cc: Guillaume Le Vaillant <glv@posteo.net>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Fri, 30 Apr 2021 16:03:18 -0400
Hi Pierre, and Ludovic,
Pierre Neidhardt <mail@ambrevar.xyz> writes:
> I also agree this looks like a net win even though the `guix gc` issue
> is not fix.
> The latter seems to be relatively easier to fix, doesn't it?
Yes. I now have a plan to finally fix this bug for good.
I intend to write a scanner in Scheme that is able to find Nix hashes
encoded in ASCII, UTF-16, or UTF-32. Using that, I'll write a procedure
that, for each package output, finds all store references that are found
encoded in UTF-16 or UTF-32 but never in ASCII, and write those
references to a file (if that set is nonempty). This procedure can then
be used by selected packages and/or build-systems.
However, there's one thing I will need: the set of all transitive inputs
(and native-inputs, including implicit inputs) available to the build,
i.e. the set of possible hashes that might legitimately be found by the
scanner.
Ludovic: what's the best way to get that list from the build-side code?
Thanks,
Mark
--
Disinformation flourishes because many people care deeply about injustice
but very few check the facts. Ask me about <https://stallmansupport.org>.
Information forwarded
to bug-guix@gnu.org: bug#33848; Package guix.
(Sat, 01 May 2021 09:23:03 GMT) (full text, mbox, link).
Hi Mark,
thanks for the update and thanks for getting your hands dirty, this is a
very valuable contribution to the core of Guix!
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/
Cc: Guillaume Le Vaillant <glv@posteo.net>,
Pierre Neidhardt <mail@ambrevar.xyz>, 33848@debbugs.gnu.org
Subject: Re: bug#33848: Store references in SBCL-compiled code are "invisible"
Date: Tue, 11 May 2021 10:46:05 +0200
Hi Mark,
Mark H Weaver <mhw@netris.org> skribis:
> I intend to write a scanner in Scheme that is able to find Nix hashes
> encoded in ASCII, UTF-16, or UTF-32. Using that, I'll write a procedure
> that, for each package output, finds all store references that are found
> encoded in UTF-16 or UTF-32 but never in ASCII, and write those
> references to a file (if that set is nonempty). This procedure can then
> be used by selected packages and/or build-systems.
>
> However, there's one thing I will need: the set of all transitive inputs
> (and native-inputs, including implicit inputs) available to the build,
> i.e. the set of possible hashes that might legitimately be found by the
> scanner.
>
> Ludovic: what's the best way to get that list from the build-side code?
You can use #:references-graphs for that.
Sorry for the delay!
Ludo’.
bug archived.
Request was from Debbugs Internal Request <help-debbugs@gnu.org>
to internal_control@debbugs.gnu.org.
(Tue, 08 Jun 2021 11:24:06 GMT) (full text, mbox, link).
Debbugs is free software and licensed under the terms of the
GNU Public License version 2. The current version can be
obtained from https://bugs.debian.org/debbugs-source/.