guix-daemon run as non-root sets up /etc/group incorrectly in build container

  • Open
  • quality assurance status badge
Details
2 participants
  • keinflue
  • Ludovic Courtès
Owner
unassigned
Submitted by
keinflue
Severity
important

Debbugs page

K
K
keinflue wrote on 17 Apr 04:20 -0700
(address . bug-guix@gnu.org)(address . ludo@gnu.org)
86b5c54e8412686790b6bf50525a6231@posteo.net
When using the new ability of guix-daemon to run as non-root with the
help of user namespaces, the testsuite of coreutils fails.

This is because the daemon incorrectly uses the host GID instead of the
guest GID in the build container's /etc/group, which the testsuite uses
to lookup the group's name via id -gn.
L
L
Ludovic Courtès wrote on 17 Apr 06:30 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
878qny530h.fsf@gnu.org
Hi,

keinflue <keinflue@posteo.net> writes:

Toggle quote (3 lines)
> When using the new ability of guix-daemon to run as non-root with the
> help of user namespaces, the testsuite of coreutils fails.

Could you include a build log snippet? (Also useful to have it inline
so that someone searching for discussions about the bug can find it.)

Toggle quote (4 lines)
> This is because the daemon incorrectly uses the host GID instead of
> the guest GID in the build container's /etc/group, which the testsuite
> uses to lookup the group's name via id -gn.

I believe the fix you suggest is this:
Toggle diff (13 lines)
diff --git a/nix/libstore/build.cc b/nix/libstore/build.cc
index 4ee4a1ae5f..a1f39d9a8b 100644
--- a/nix/libstore/build.cc
+++ b/nix/libstore/build.cc
@@ -1854,7 +1854,7 @@ void DerivationGoal::startBuilder()
view of the system (e.g., "id -gn"). */
writeFile(chrootRootDir + "/etc/group",
(format("nixbld:!:%1%:\n")
- % (buildUser.enabled() ? buildUser.getGID() : getgid())).str());
+ % (buildUser.enabled() ? buildUser.getGID() : guestGID)).str());
/* Create /etc/hosts with localhost entry. */
if (!fixedOutput)
Correct?

Thanks,
Ludo’.
K
K
keinflue wrote on 17 Apr 08:36 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
936405d1bcbed15df2266c30cfc4ca33@posteo.net
Here are excerpts from the build log:

Toggle quote (5 lines)
> ERROR: tests/chown/separator
> ============================
>
> ++ initial_cwd_=/tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1

[...]

Toggle quote (37 lines)
> ++ id -u
> + id_u=30001
> + test -n 30001
> ++ id -un
> + id_un=nixbld
> + test -n nixbld
> ++ id -g
> + id_g=30000
> + test -n 30000
> ++ id -gn
> id: cannot find name for group ID 30000
> + id_gn=30000
> + framework_failure_
> + warn_ 'separator.sh: set-up failure: '
> + case $IFS in
> + printf '%s\n' 'separator.sh: set-up failure: '
> separator.sh: set-up failure:
> + test 9 = 2
> + printf '%s\n' 'separator.sh: set-up failure: '
> + sed 1q
> + Exit 99
> + set +e
> + exit 99
> + exit 99
> + remove_tmp_
> + __st=99
> + cleanup_
> + :
> + test '' = yes
> + cd /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1
> + chmod -R u+rwx
> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-separator.sh.Fk4W
> + rm -rf
> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-separator.sh.Fk4W
> + exit 99
> ERROR tests/chown/separator.sh (exit status: 99)

[...]

Toggle quote (7 lines)
> error: in phase 'check': uncaught exception:
> srfi-34 #<condition &invoke-error [program: "make" arguments: ("check"
> "-j" "16") exit-status: 2 term-signal: #f stop-signal: #f] 2df6100> >
> phase `check' failed after 15.2 seconds
> command "make" "check" "-j" "16" failed with status 2
> build process 2 exited with status 256

Yes, I believe the patch as suggested is correct (with my limited
understanding given that the lines above were changed in the same way).

Unfortunately I made a mistake and accidentally lost the container in
which I tried this, so I can not verify right now whether the patch
actually resolves the issue.

It might take me a day or two to restore it.

This happened either during or shortly after bootstrap builds, so I
don't know whether this was the final coreutils package or one from
commencement.scm.

Best,
keinflue
L
L
Ludovic Courtès wrote on 17 Apr 09:51 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
87a58e3f4q.fsf@gnu.org
keinflue <keinflue@posteo.net> writes:

Toggle quote (2 lines)
> Here are excerpts from the build log:

Thanks.

Toggle quote (6 lines)
> Unfortunately I made a mistake and accidentally lost the container in
> which I tried this, so I can not verify right now whether the patch
> actually resolves the issue.
>
> It might take me a day or two to restore it.

No worries, I’ll wait for your feedback.

Toggle quote (4 lines)
> This happened either during or shortly after bootstrap builds, so I
> don't know whether this was the final coreutils package or one from
> commencement.scm.

OK.

If you have a setup for full rebuilds (no substitutes) running in a
container, I’m curious to learn more about it!

Ludo’.
L
L
Ludovic Courtès wrote on 18 Apr 13:31 -0700
control message for bug #77862
(address . control@debbugs.gnu.org)
87sem5i53w.fsf_-_@gnu.org
severity 77862 important
quit
K
K
keinflue wrote on 19 Apr 04:18 -0700
Re: guix-daemon run as non-root sets up /etc/group incorrectly in build container
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
8c2080a3681e7d2e1d38bb4d3e1463d0@posteo.net
I can confirm that the patch resolves the particular failing test.

However I overlooked that there are other failing tests:

Toggle quote (6 lines)
> FAIL: tests/chgrp/default-no-deref.sh
> FAIL: tests/chgrp/no-x.sh
> FAIL: tests/chgrp/posix-H.sh
> FAIL: tests/chgrp/recurse.sh
> FAIL: tests/chgrp/basic.sh

Here is an example of the failures:

Toggle quote (43 lines)
> + require_membership_in_two_groups_
> + test 0 = 0
> + groups='30000 65534'
> + case "$groups" in
> + require_local_dir_
> + require_mount_list_
> + local 'mount_list_fail=cannot read table of mounted file systems'
> + df --local
> + grep -F 'cannot read table of mounted file systems'
> + is_local_dir_ .
> + test 1 = 1
> + df --local .
> + set _ 30000 65534
> + shift
> + g2=65534
> + mkdir d
> + touch f
> + ln -s ../f d/s
> ++ stat --printf=%g f
> + g_init=30000
> + chgrp -R 65534 d
> chgrp: changing group of 'd/s': Invalid argument
> chgrp: changing group of 'd': Invalid argument
> + fail=1
> ++ stat --printf=%g f
> + test 30000 = 30000
> + Exit 1
> + set +e
> + exit 1
> + exit 1
> + remove_tmp_
> + __st=1
> + cleanup_
> + :
> + test '' = yes
> + cd /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1
> + chmod -R u+rwx
> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-default-no-deref.sh.AEHe
> + rm -rf
> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-default-no-deref.sh.AEHe
> + exit 1
> FAIL tests/chgrp/default-no-deref.sh (exit status: 1)

I think this happens if the user running guix-daemon has supplementary
groups. These are not mapped via /proc/gid_map in the build container
and therefore are reported as the overflow gid (65534) by getgroups.

The test cases assume that they can change ownership to this additional
group but that is not permitted on the overflow gid.

I think supplementary groups should be dropped in the user namespace for
the build container to make the behavior reproducible. Unfortunately
this may be impossible if the parent namespace has set
/proc/[...]/setgroups to "deny".

Best,
keinflue

On 17.04.2025 18:51, Ludovic Courtès wrote:
Toggle quote (24 lines)
> keinflue <keinflue@posteo.net> writes:
>
>> Here are excerpts from the build log:
>
> Thanks.
>
>> Unfortunately I made a mistake and accidentally lost the container in
>> which I tried this, so I can not verify right now whether the patch
>> actually resolves the issue.
>>
>> It might take me a day or two to restore it.
>
> No worries, I’ll wait for your feedback.
>
>> This happened either during or shortly after bootstrap builds, so I
>> don't know whether this was the final coreutils package or one from
>> commencement.scm.
>
> OK.
>
> If you have a setup for full rebuilds (no substitutes) running in a
> container, I’m curious to learn more about it!
>
> Ludo’.
K
K
keinflue wrote on 19 Apr 04:48 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
6bbd118c38ab49593cce3749029569c8@posteo.net
On 17.04.2025 18:51, Ludovic Courtès wrote:
Toggle quote (23 lines)
> keinflue <keinflue@posteo.net> writes:
>
>> Here are excerpts from the build log:
>
> Thanks.
>
>> Unfortunately I made a mistake and accidentally lost the container in
>> which I tried this, so I can not verify right now whether the patch
>> actually resolves the issue.
>>
>> It might take me a day or two to restore it.
>
> No worries, I’ll wait for your feedback.
>
>> This happened either during or shortly after bootstrap builds, so I
>> don't know whether this was the final coreutils package or one from
>> commencement.scm.
>
> OK.
>
> If you have a setup for full rebuilds (no substitutes) running in a
> container, I’m curious to learn more about it!

I basically just used "guix shell -CN -D guix" plus some extra packages
and shares. Inside the container I built and ran guix from git with
--with-store-dir and NIX_STORE set to a different path than /gnu/store.
Initially I forgot to add a share for /var which is why I unfortunately
broke the container once I existed it.

Toggle quote (1 lines)
> Ludo’.
K
K
keinflue wrote on 19 Apr 07:36 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
9b324ff4015e176164829814dfe5cd43@posteo.net
I just realized that there is also need for special handling of
/dev/kvm. When running with privileges it was possible to add the build
users to the kvm group to get access to /dev/kvm in the build container.

To have this continue to work in unprivileged mode, the kvm group (or I
guess more specifically the group of /dev/kvm) should be mapped in the
user namespace and presumably should not be dropped from supplementary
groups.

There is also /dev/tty in the container which has unmapped group
ownership, although that doesn't prevent access to it.

Alternatively I guess all supplementary groups could be preserved, but
they should then also all be mapped into the user namespace. Then the
user would have to drop supplementary groups manually (if they are able
to) before running guix-daemon if they do not want any of them to
propagate into the build environment and can't create a new user
specifically for that purpose (e.g. because they do not have root access
on the machine).

On 19.04.2025 13:18, keinflue wrote:
Toggle quote (95 lines)
> I can confirm that the patch resolves the particular failing test.
>
> However I overlooked that there are other failing tests:
>
>> FAIL: tests/chgrp/default-no-deref.sh
>> FAIL: tests/chgrp/no-x.sh
>> FAIL: tests/chgrp/posix-H.sh
>> FAIL: tests/chgrp/recurse.sh
>> FAIL: tests/chgrp/basic.sh
>
> Here is an example of the failures:
>
>> + require_membership_in_two_groups_
>> + test 0 = 0
>> + groups='30000 65534'
>> + case "$groups" in
>> + require_local_dir_
>> + require_mount_list_
>> + local 'mount_list_fail=cannot read table of mounted file systems'
>> + df --local
>> + grep -F 'cannot read table of mounted file systems'
>> + is_local_dir_ .
>> + test 1 = 1
>> + df --local .
>> + set _ 30000 65534
>> + shift
>> + g2=65534
>> + mkdir d
>> + touch f
>> + ln -s ../f d/s
>> ++ stat --printf=%g f
>> + g_init=30000
>> + chgrp -R 65534 d
>> chgrp: changing group of 'd/s': Invalid argument
>> chgrp: changing group of 'd': Invalid argument
>> + fail=1
>> ++ stat --printf=%g f
>> + test 30000 = 30000
>> + Exit 1
>> + set +e
>> + exit 1
>> + exit 1
>> + remove_tmp_
>> + __st=1
>> + cleanup_
>> + :
>> + test '' = yes
>> + cd /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1
>> + chmod -R u+rwx
>> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-default-no-deref.sh.AEHe
>> + rm -rf
>> /tmp/guix-build-coreutils-9.1.drv-0/coreutils-9.1/gt-default-no-deref.sh.AEHe
>> + exit 1
>> FAIL tests/chgrp/default-no-deref.sh (exit status: 1)
>
> I think this happens if the user running guix-daemon has supplementary
> groups. These are not mapped via /proc/gid_map in the build container
> and therefore are reported as the overflow gid (65534) by getgroups.
>
> The test cases assume that they can change ownership to this
> additional group but that is not permitted on the overflow gid.
>
> I think supplementary groups should be dropped in the user namespace
> for the build container to make the behavior reproducible.
> Unfortunately this may be impossible if the parent namespace has set
> /proc/[...]/setgroups to "deny".
>
> Best,
> keinflue
>
> On 17.04.2025 18:51, Ludovic Courtès wrote:
>> keinflue <keinflue@posteo.net> writes:
>>
>>> Here are excerpts from the build log:
>>
>> Thanks.
>>
>>> Unfortunately I made a mistake and accidentally lost the container in
>>> which I tried this, so I can not verify right now whether the patch
>>> actually resolves the issue.
>>>
>>> It might take me a day or two to restore it.
>>
>> No worries, I’ll wait for your feedback.
>>
>>> This happened either during or shortly after bootstrap builds, so I
>>> don't know whether this was the final coreutils package or one from
>>> commencement.scm.
>>
>> OK.
>>
>> If you have a setup for full rebuilds (no substitutes) running in a
>> container, I’m curious to learn more about it!
>>
>> Ludo’.
L
L
Ludovic Courtès wrote on 25 Apr 11:39 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
87selwccgu.fsf@gnu.org
Hi,

I committed the /etc/group fix in
0d3bc50b0cffeae05beb12d0c270c6599186c0d7 together with a test.

keinflue <keinflue@posteo.net> writes:

Toggle quote (12 lines)
> I think this happens if the user running guix-daemon has supplementary
> groups. These are not mapped via /proc/gid_map in the build container
> and therefore are reported as the overflow gid (65534) by getgroups.
>
> The test cases assume that they can change ownership to this
> additional group but that is not permitted on the overflow gid.
>
> I think supplementary groups should be dropped in the user namespace
> for the build container to make the behavior
> reproducible. Unfortunately this may be impossible if the parent
> namespace has set /proc/[...]/setgroups to "deny".

I came up with this test:

Toggle snippet (23 lines)
(use-modules (guix)
(gcrypt hash)
(gnu packages bootstrap))

(computed-file "kvm-access"
#~(begin
(pk '#$(gettimeofday))
(let ((st (stat "/dev/kvm")))
(pk '/dev/kvm st)
(pk '/dev/kvm:owner (stat:uid st) (stat:gid st))
(pk 'getgroups (getgroups))
;; XXX: When running the daemon as root, /dev/kvm is
;; owned by UID 0, which has no entry in /etc/passwd.
;; (pk 'kvm-user (getpwuid (stat:uid st)))
;; xxx: /etc/group never contained an entry to the "kvm"
;; group so the thing below always failed.
;; (pk 'kvm-group (getgrgid (stat:gid st)))
)
(when (open-fdes "/dev/kvm" O_RDWR)
(mkdir #$output)))
#:guile %bootstrap-guile)

Privileged:

Toggle snippet (22 lines)
$ guix build -f ~/src/guix-debugging/dev-kvm-access.scm
substitute: looking for substitutes on 'http://192.168.1.48:8123'... 0.0%guix substitute: warning: 192.168.1.48: connection failed: Connection timed out
substitute:
substitute: looking for substitutes on 'https://ci.guix.gnu.org'... 100.0%
substitute: looking for substitutes on 'https://bordeaux.guix.gnu.org'... 100.0%
substitute: looking for substitutes on 'https://guix.bordeaux.inria.fr'... 100.0%
The following derivation will be built:
/gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
substitute: looking for substitutes on 'http://192.168.1.48:8123'... 0.0%
building /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv...

;;; ((1745606160 . 233876))

;;; (/dev/kvm #(6 483 8624 1 0 984 2792 0 1745359386 1745359386 1745359386 4096 0 char-special 432 382791307 382791307 1745359386))

;;; (/dev/kvm:owner 0 984)

;;; (getgroups #(984 30000))
successfully built /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
/gnu/store/36fin1iw2fh9066jg0y2fjd78j9wyjwp-kvm-access

Unprivileged:

Toggle snippet (21 lines)
$ ./test-env guix build -f ~/src/guix-debugging/dev-kvm-access.scm
accepted connection from pid 2591, user ludo
accepted connection from pid 2601, user ludo
substitute: guix substitute: warning: ACL for archive imports seems to be uninitialized, substitutes may be unavailable
substitute: guix substitute: warning: authentication and authorization of substitutes disabled!
The following derivation will be built:
/home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
substitute: guix substitute: warning: authentication and authorization of substitutes disabled!
building /home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv...

;;; ((1745606200 . 636919))

;;; (/dev/kvm #(6 483 8624 1 65534 65534 2792 0 1745359386 1745359386 1745359386 4096 0 char-special 432 382791307 382791307 1745359386))

;;; (/dev/kvm:owner 65534 65534)

;;; (getgroups #(65534 65534 65534 65534 65534 65534 65534 30000 65534))
successfully built /home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
/home/ludo/src/guix/test-tmp/store/ffh8zaw279dgdsh6q54mlldh4nikxiqp-kvm-access

In both cases, /dev/kvm is accessible.

In both cases, only the primary group has an entry in /etc/group;
supplementary groups are lacking.

So:

1. I don’t think we need to map the “kvm” UID/GID into the user
namespace;

2. I’m confused as to what makes the Coreutils test suite fail.

It would still be good to drop any supplementary group other than “kvm”
though.

WDYT?

Thanks,
Ludo’.
K
K
keinflue wrote on 25 Apr 17:23 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
f1345bf03b9170036f9c9dcc3fa80467@posteo.net
On 25.04.2025 20:39, Ludovic Courtès wrote:
Toggle quote (123 lines)
> Hi,
>
> I committed the /etc/group fix in
> 0d3bc50b0cffeae05beb12d0c270c6599186c0d7 together with a test.
>
> keinflue <keinflue@posteo.net> writes:
>
>> I think this happens if the user running guix-daemon has supplementary
>> groups. These are not mapped via /proc/gid_map in the build container
>> and therefore are reported as the overflow gid (65534) by getgroups.
>>
>> The test cases assume that they can change ownership to this
>> additional group but that is not permitted on the overflow gid.
>>
>> I think supplementary groups should be dropped in the user namespace
>> for the build container to make the behavior
>> reproducible. Unfortunately this may be impossible if the parent
>> namespace has set /proc/[...]/setgroups to "deny".
>
> I came up with this test:
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (guix)
> (gcrypt hash)
> (gnu packages bootstrap))
>
> (computed-file "kvm-access"
> #~(begin
> (pk '#$(gettimeofday))
> (let ((st (stat "/dev/kvm")))
> (pk '/dev/kvm st)
> (pk '/dev/kvm:owner (stat:uid st) (stat:gid st))
> (pk 'getgroups (getgroups))
> ;; XXX: When running the daemon as root, /dev/kvm
> is
> ;; owned by UID 0, which has no entry in
> /etc/passwd.
> ;; (pk 'kvm-user (getpwuid (stat:uid st)))
> ;; xxx: /etc/group never contained an entry to the
> "kvm"
> ;; group so the thing below always failed.
> ;; (pk 'kvm-group (getgrgid (stat:gid st)))
> )
> (when (open-fdes "/dev/kvm" O_RDWR)
> (mkdir #$output)))
> #:guile %bootstrap-guile)
> --8<---------------cut here---------------end--------------->8---
>
> Privileged:
>
> --8<---------------cut here---------------start------------->8---
> $ guix build -f ~/src/guix-debugging/dev-kvm-access.scm
> substitute: looking for substitutes on 'http://192.168.1.48:8123'...
> 0.0%guix substitute: warning: 192.168.1.48: connection failed:
> Connection timed out
> substitute:
> substitute: looking for substitutes on 'https://ci.guix.gnu.org'...
> 100.0%
> substitute: looking for substitutes on
> 'https://bordeaux.guix.gnu.org'... 100.0%
> substitute: looking for substitutes on
> 'https://guix.bordeaux.inria.fr'... 100.0%
> The following derivation will be built:
> /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
> substitute: looking for substitutes on 'http://192.168.1.48:8123'...
> 0.0%
> building /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv...
>
> ;;; ((1745606160 . 233876))
>
> ;;; (/dev/kvm #(6 483 8624 1 0 984 2792 0 1745359386 1745359386
> 1745359386 4096 0 char-special 432 382791307 382791307 1745359386))
>
> ;;; (/dev/kvm:owner 0 984)
>
> ;;; (getgroups #(984 30000))
> successfully built
> /gnu/store/vc5p6bfrzr7khgp9jha8h6kplixcl5h6-kvm-access.drv
> /gnu/store/36fin1iw2fh9066jg0y2fjd78j9wyjwp-kvm-access
> --8<---------------cut here---------------end--------------->8---
>
> Unprivileged:
>
> --8<---------------cut here---------------start------------->8---
> $ ./test-env guix build -f ~/src/guix-debugging/dev-kvm-access.scm
> accepted connection from pid 2591, user ludo
> accepted connection from pid 2601, user ludo
> substitute: guix substitute: warning: ACL for archive imports seems to
> be uninitialized, substitutes may be unavailable
> substitute: guix substitute: warning: authentication and authorization
> of substitutes disabled!
> The following derivation will be built:
>
> /home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
> substitute: guix substitute: warning: authentication and authorization
> of substitutes disabled!
> building
> /home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv...
>
> ;;; ((1745606200 . 636919))
>
> ;;; (/dev/kvm #(6 483 8624 1 65534 65534 2792 0 1745359386 1745359386
> 1745359386 4096 0 char-special 432 382791307 382791307 1745359386))
>
> ;;; (/dev/kvm:owner 65534 65534)
>
> ;;; (getgroups #(65534 65534 65534 65534 65534 65534 65534 30000
> 65534))
> successfully built
> /home/ludo/src/guix/test-tmp/store/5p4qn8d3bgnj60a2kwpliiwk81bvrcjp-kvm-access.drv
> /home/ludo/src/guix/test-tmp/store/ffh8zaw279dgdsh6q54mlldh4nikxiqp-kvm-access
> --8<---------------cut here---------------end--------------->8---
>
> In both cases, /dev/kvm is accessible.
>
> In both cases, only the primary group has an entry in /etc/group;
> supplementary groups are lacking.
>
> So:
>
> 1. I don’t think we need to map the “kvm” UID/GID into the user
> namespace;

For the purpose of the passive permission checks that is not necessary,
yes. There are no uids or gids being translated between the user
namespaces. However if all supplementary groups would be dropped, that
would include the kvm group and then this test will fail to access
/dev/kvm. That was the problem I saw with that first suggestion.

Toggle quote (2 lines)
> 2. I’m confused as to what makes the Coreutils test suite fail.

The result from getgroups includes both the primary gid 30000 and a
supplementary gid 65534 (where the repeated 65534 are the overflow gid
produced by viewing supplementary gids that aren't mapped into the user
namespace via /proc/[pid]/gid_map).
Coreutils sees this and so assumes that it can do the equivalent of

touch testfile
chgrp 65534 testfile

to create a file owned by group 30000 initially and to then change group
ownership of that file to 65534. Normally an unprivileged user is
allowed to change group ownership of files they own between groups that
they are member of, so this would always succeed outside a user
namespace context.

However, any uid/gid used inside the user namespace is translated back
to the host namespace via the uid/gid_map before permission checks. But
in this case because 65534 doesn't map back to any gid in the host
namespace, the syscall will fail.

If there is no supplementary group reported by getgroups at all, then
coreutils just skips the test and it is ok again. Probably the coreutils
test case should remove any gid reported by getgroups that is equal to
the overflow gid before making that decision.

Dropping all supplementary groups from the build process (after unshare
and before writing "deny" to /proc/pid/setgroups) would make it so that
this test case is always skipped by having getgroups only report 30000,
however that would also drop the kvm group as mentioned above and is
also not permitted in all environments (e.g. when the parent namespace
already set /proc/[pid]/setgroups to "deny").

So I think that instead either all supplementary groups of the user or
at least the kvm group specifically needs to be mapped via
/proc/[pid]/gid_map. When doing so getgroups would report 30000 and 984
(assuming identity gid map for 984) in your test case above and the
coreutils test case would work again, because

chgrp 984 testfile

would then succeed with 984 mapping back to the host namespace to a
supplementary group of the process.

From a point of reproducibility and information leakage into the build
container I think however that it would be preferable to not retain
supplementary groups if possible. In contrast to the privileged build
with a distinct build user that the can be given desired supplementary
groups at will, the unprivileged environment may be one where the
supplementary groups of the user running the daemon can't easily be
changed to what is supposed to be seen in the build environment.

The contents of /etc/group are not relevant for this test case failure,
they are never consulted.

But a few other asides (for which I don't necessarily think anything
should be changed):

- I also noticed that the build container /etc/group is written with
65534 assumed as overflow gid. I am not sure whether anyone actually
does this, but the overflow uid/gid are technically configurable (and
retrievable) via sysctl entries (/proc/sys/kernel/overflow(uid|gid)).
65534 is just the default value.

- I also noticed that the operating-system defaults do not write an
entry for the overflow gid to /etc/group (while they do for the overflow
uid to /etc/passwd). I think such an entry should exist by default as
well. The entry for /etc/passwd also assumes the default overflow uid of
65534. This isn't only relevant for a user namespace context, but also
file systems that can't map the whole range of Linux uids/gids.

Toggle quote (7 lines)
> It would still be good to drop any supplementary group other than “kvm”
> though.
>
> WDYT?
>
> Thanks,
> Ludo’.
L
L
Ludovic Courtès wrote on 26 Apr 01:50 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
87h62b9uhg.fsf@gnu.org
Hi,

keinflue <keinflue@posteo.net> writes:

Toggle quote (22 lines)
>> 2. I’m confused as to what makes the Coreutils test suite fail.
>
> The result from getgroups includes both the primary gid 30000 and a
> supplementary gid 65534 (where the repeated 65534 are the overflow gid
> produced by viewing supplementary gids that aren't mapped into the
> user namespace via /proc/[pid]/gid_map).
> Coreutils sees this and so assumes that it can do the equivalent of
>
> touch testfile
> chgrp 65534 testfile
>
> to create a file owned by group 30000 initially and to then change
> group ownership of that file to 65534. Normally an unprivileged user
> is allowed to change group ownership of files they own between groups
> that they are member of, so this would always succeed outside a user
> namespace context.
>
> However, any uid/gid used inside the user namespace is translated back
> to the host namespace via the uid/gid_map before permission
> checks. But in this case because 65534 doesn't map back to any gid in
> the host namespace, the syscall will fail.

Oh right, got it.

Toggle quote (23 lines)
> If there is no supplementary group reported by getgroups at all, then
> coreutils just skips the test and it is ok again. Probably the
> coreutils test case should remove any gid reported by getgroups that
> is equal to the overflow gid before making that decision.
>
> Dropping all supplementary groups from the build process (after
> unshare and before writing "deny" to /proc/pid/setgroups) would make
> it so that this test case is always skipped by having getgroups only
> report 30000, however that would also drop the kvm group as mentioned
> above and is also not permitted in all environments (e.g. when the
> parent namespace already set /proc/[pid]/setgroups to "deny").
>
> So I think that instead either all supplementary groups of the user or
> at least the kvm group specifically needs to be mapped via
> /proc/[pid]/gid_map. When doing so getgroups would report 30000 and
> 984 (assuming identity gid map for 984) in your test case above and
> the coreutils test case would work again, because
>
> chgrp 984 testfile
>
> would then succeed with 984 mapping back to the host namespace to a
> supplementary group of the process.

Having reread user_namespaces(7), I don’t think we can change the set of
supplementary groups at all: that effectively requires root privileges.

So the best we can do is map the “kvm” group inside the user namespace.

Toggle quote (8 lines)
> From a point of reproducibility and information leakage into the build
> container I think however that it would be preferable to not retain
> supplementary groups if possible. In contrast to the privileged build
> with a distinct build user that the can be given desired supplementary
> groups at will, the unprivileged environment may be one where the
> supplementary groups of the user running the daemon can't easily be
> changed to what is supposed to be seen in the build environment.

I agree, though in practice, the daemon will usually run under a
dedicated user anyway: this is what ‘guix-install.sh’ does on other
distros, and this is what happens with (privileged? #t) on Guix System.
In these cases, there’d be no observable difference.

The observable difference (namely getgroups(2) returning a list of
unmapped GIDs) would be when people run the daemon as their own user,
which is currently inconvenient.

Toggle quote (7 lines)
> - I also noticed that the build container /etc/group is written with
> 65534 assumed as overflow gid. I am not sure whether anyone actually
> does this, but the overflow uid/gid are technically configurable
> (and retrievable) via sysctl entries
> (/proc/sys/kernel/overflow(uid|gid)). 65534 is just the default
> value.

Note that the “nobody” UID in /etc/passwd dates back to before the
unprivileged daemon implementation. It just happens to match the
default overflow UID, but I agree we should use the right one here.

Toggle quote (8 lines)
> - I also noticed that the operating-system defaults do not write an
> entry for the overflow gid to /etc/group (while they do for the
> overflow uid to /etc/passwd). I think such an entry should exist by
> default as well. The entry for /etc/passwd also assumes the default
> overflow uid of 65534. This isn't only relevant for a user namespace
> context, but also file systems that can't map the whole range of
> Linux uids/gids.

I’m not sure this needs to be changed because it’s not all that
different from the preexisting situation where “kvm” would not have an
entry in /etc/group.

Thanks,
Ludo’.
K
K
keinflue wrote on 26 Apr 04:08 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
657fe5f89e0b1fd4792028ae2d55bbc5@posteo.net
Hi,

On 26.04.2025 10:50, Ludovic Courtès wrote:

Toggle quote (29 lines)
>> If there is no supplementary group reported by getgroups at all, then
>> coreutils just skips the test and it is ok again. Probably the
>> coreutils test case should remove any gid reported by getgroups that
>> is equal to the overflow gid before making that decision.
>>
>> Dropping all supplementary groups from the build process (after
>> unshare and before writing "deny" to /proc/pid/setgroups) would make
>> it so that this test case is always skipped by having getgroups only
>> report 30000, however that would also drop the kvm group as mentioned
>> above and is also not permitted in all environments (e.g. when the
>> parent namespace already set /proc/[pid]/setgroups to "deny").
>>
>> So I think that instead either all supplementary groups of the user or
>> at least the kvm group specifically needs to be mapped via
>> /proc/[pid]/gid_map. When doing so getgroups would report 30000 and
>> 984 (assuming identity gid map for 984) in your test case above and
>> the coreutils test case would work again, because
>>
>> chgrp 984 testfile
>>
>> would then succeed with 984 mapping back to the host namespace to a
>> supplementary group of the process.
>
> Having reread user_namespaces(7), I don’t think we can change the set
> of
> supplementary groups at all: that effectively requires root privileges.
>
> So the best we can do is map the “kvm” group inside the user namespace.

I also had another look and I missed that effectively CAP_SETGID is
required in the _parent_ namespace in order to use setgroups (because
otherwise writing "deny" to /proc/[pid]/setgroups is essentially
forced).

But the same seems to also be required to map more than the own
effective uid/gid of the process into the namespace.

Mapping more uids/gids is otherwise usually handled by a setuid utility
(newuidmap, newgidmap) I think.

So I guess neither solution of dropping or mapping supplementary groups
will work completely unprivileged and the only solution is to modify or
disable the coreutils test case.

And mapping only kvm wouldn't be sufficient either, unless only that
specific group would be supported. All supplementary groups of the
process must be mapped for the test case to succeed (or at least one of
them, I haven't checked by which rule the test cases chooses the gid for
the test scenario from among the supplementary gids).

Best,
keinflue
L
L
Ludovic Courtès wrote on 2 May 08:38 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
875xijc99o.fsf@gnu.org
Hello,

keinflue <keinflue@posteo.net> writes:

Toggle quote (8 lines)
> I also had another look and I missed that effectively CAP_SETGID is
> required in the _parent_ namespace in order to use setgroups (because
> otherwise writing "deny" to /proc/[pid]/setgroups is essentially
> forced).
>
> But the same seems to also be required to map more than the own
> effective uid/gid of the process into the namespace.

Right, user_namespaces(7) makes it clear:

• The data written to uid_map (gid_map) must consist of a sin‐
gle line that maps the writing process's effective user ID
(group ID) in the parent user namespace to a user ID (group
ID) in the user namespace.

Toggle quote (4 lines)
> So I guess neither solution of dropping or mapping supplementary
> groups will work completely unprivileged and the only solution is to
> modify or disable the coreutils test case.

Yes, I came to this conclusion as well.

I believe the attached Coreutils patch should fix that (yet to be
tested). Would be worth reporting upstream as well because in a way
it’s a failure of the test framework.

Thanks,
Ludo’.
Toggle diff (18 lines)
diff --git a/init.cfg b/init.cfg
index 856aa2ee7..e19ec5a31 100644
--- a/init.cfg
+++ b/init.cfg
@@ -488,7 +488,12 @@ require_membership_in_two_groups_()
{
test $# = 0 || framework_failure_
- groups=${COREUTILS_GROUPS-$( (id -G || /usr/xpg4/bin/id -G) 2>/dev/null)}
+ # Always pretend this user account is not a member of any
+ # supplementary group. This avoids wrong expectations from tests
+ # when the supplementary group is the overflow GID as is the case
+ # when 'guix-daemon' runs as an unprivileged user that is part of
+ # supplementary groups such as 'kvm'.
+ groups=
case "$groups" in
*' '*) ;;
*) skip_ 'requires membership in two groups
K
K
keinflue wrote on 2 May 19:16 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
22a7f1de383bd8bd4521c8a4b78993f3@posteo.net
Hi,

On 02.05.2025 17:38, Ludovic Courtès wrote:
Toggle quote (29 lines)
> Hello,
>
> keinflue <keinflue@posteo.net> writes:
>
>> I also had another look and I missed that effectively CAP_SETGID is
>> required in the _parent_ namespace in order to use setgroups (because
>> otherwise writing "deny" to /proc/[pid]/setgroups is essentially
>> forced).
>>
>> But the same seems to also be required to map more than the own
>> effective uid/gid of the process into the namespace.
>
> Right, user_namespaces(7) makes it clear:
>
> • The data written to uid_map (gid_map) must consist of a sin‐
> gle line that maps the writing process's effective user ID
> (group ID) in the parent user namespace to a user ID (group
> ID) in the user namespace.
>
>> So I guess neither solution of dropping or mapping supplementary
>> groups will work completely unprivileged and the only solution is to
>> modify or disable the coreutils test case.
>
> Yes, I came to this conclusion as well.
>
> I believe the attached Coreutils patch should fix that (yet to be
> tested). Would be worth reporting upstream as well because in a way
> it’s a failure of the test framework.

I tried to test the patch:

I had trouble with the normal origin patches mechanism. The first
failing coreutils package is coreutils-final from commencement.scm and I
didn't manage to properly replace that package without a dependency on
the unpatched package remaining. Instead I wrote an equivalent
substitute* in a post-unpack phase.

The test cases mentioned earlier now succeeded, but another testsuite
for gnulib (also as part of coreutils) failed afterwards. The failure
cause is the same, except this time written in C sources. I also fixed
that via substitute* clauses. See attached patch. This unfortunately
causes rebuilds of the coreutils-mesboot (and coreutils-boot0) packages
as well, although those do not perform the tests.

It seems that now the system build proceeds much further (still
running).

I will see whether I can report the issue(s) upstream to coreutils and
gnulib. I noticed that in coreutils 9.2 (guix is currently 9.1) a
similar fix was applied to handle special gids on MacOS. Unfortunately
the default Linux overflow gid is not included in that list. In any
case, the patch needs to be adjusted for newer coreutils versions.

Toggle quote (3 lines)
>
> Thanks,
> Ludo’.
Toggle diff (16 lines)
diff --git a/gnu/packages/base.scm b/gnu/packages/base.scm
index 4c96ffa1a4..5b014dbf01 100644
--- a/gnu/packages/base.scm
+++ b/gnu/packages/base.scm
@@ -532,6 +532,11 @@ (define-public coreutils
" test-utimensat")))
'())
#:phases (modify-phases %standard-phases
+ (add-after 'unpack 'patch
+ (lambda _
+ (substitute* "gnulib-tests/test-chown.h" (("gids_count =.*") "gids_count = 1;\n"))
+ (substitute* "gnulib-tests/test-lchown.h" (("gids_count =.*") "gids_count = 1;\n"))
+ (substitute* "init.cfg" (("groups=.*") "groups=\n"))))
(add-before 'build 'patch-shell-references
(lambda _
;; 'split' uses either $SHELL or /bin/sh. Set $SHELL so
K
K
keinflue wrote on 3 May 04:00 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
3d2f28f5fa7d133f97988a9f05cf3942@posteo.net
Toggle quote (1 lines)
> It seems that now the system build proceeds much further (still
running).

Unfortunately the python package also fails with equivalent test
failures. It also has another failure mode where it expects a syscall to
change ownership to the overflow uid to result in EPERM, while it will
produce EINVAL (which happens even if there are no supplementary
groups). Should I post the details here or open a new issue?

Toggle quote (6 lines)
> I will see whether I can report the issue(s) upstream to coreutils and
> gnulib. I noticed that in coreutils 9.2 (guix is currently 9.1) a
> similar fix was applied to handle special gids on MacOS. Unfortunately
> the default Linux overflow gid is not included in that list. In any
> case, the patch needs to be adjusted for newer coreutils versions.

coreutils already responded and fixed the issue

I still have to report to gnulib, but wanted to try building the
standalone gnulib package first, which caused me to trip over the python
issues.

Toggle quote (3 lines)
>>
>> Thanks,
>> Ludo’.
L
L
Ludovic Courtès wrote on 3 May 09:14 -0700
(name . keinflue)(address . keinflue@posteo.net)(address . 77862@debbugs.gnu.org)
87r0158ye6.fsf@gnu.org
Hi,

keinflue <keinflue@posteo.net> writes:

Toggle quote (6 lines)
> Unfortunately the python package also fails with equivalent test
> failures. It also has another failure mode where it expects a syscall
> to change ownership to the overflow uid to result in EPERM, while it
> will produce EINVAL (which happens even if there are no supplementary
> groups). Should I post the details here or open a new issue?

I think you can post it here. Perhaps we should eventually keep all the
issues in this category together in a text file somewhere, with log
excerpts: that would allow us to better assess the packages affected by
this difference between the privileged and the unprivileged daemon is.

Toggle quote (9 lines)
>> I will see whether I can report the issue(s) upstream to coreutils and
>> gnulib. I noticed that in coreutils 9.2 (guix is currently 9.1) a
>> similar fix was applied to handle special gids on MacOS. Unfortunately
>> the default Linux overflow gid is not included in that list. In any
>> case, the patch needs to be adjusted for newer coreutils versions.
>
> coreutils already responded and fixed the issue
> (https://debbugs.gnu.org/cgi/bugreport.cgi?bug=78225).

That was fast!

Toggle quote (4 lines)
> I still have to report to gnulib, but wanted to try building the
> standalone gnulib package first, which caused me to trip over the
> python issues.

Alright.

Thanks a lot for this very important work.

I wonder if we should set up a separate Cuirass instance or something
building everything with the unprivileged daemon.

Thanks,
Ludo’.
K
K
keinflue wrote on 3 May 12:05 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)(address . 77862@debbugs.gnu.org)
e153f8b1b38b01c93bc29d11091091a3@posteo.net
On 03.05.2025 18:14, Ludovic Courtès wrote:
Toggle quote (16 lines)
> Hi,
>
> keinflue <keinflue@posteo.net> writes:
>
>> Unfortunately the python package also fails with equivalent test
>> failures. It also has another failure mode where it expects a syscall
>> to change ownership to the overflow uid to result in EPERM, while it
>> will produce EINVAL (which happens even if there are no supplementary
>> groups). Should I post the details here or open a new issue?
>
> I think you can post it here. Perhaps we should eventually keep all
> the
> issues in this category together in a text file somewhere, with log
> excerpts: that would allow us to better assess the packages affected by
> this difference between the privileged and the unprivileged daemon is.

It seems that the "chown to overflowgid" issue is somewhat widespread. I
also see the testsuite for go (bootstrap) failing in the same way. I'd
guess most implementations of "chown" system call wrappers in various
languages will have test cases like this that fail to anticipate user
namespaces. I will let my system build keep running a bit longer and
will then post the list of packages I found with log excerpts here.

Toggle quote (4 lines)
>
> I wonder if we should set up a separate Cuirass instance or something
> building everything with the unprivileged daemon.

That would probably help since I am going to only test the packages that
I am using myself in order to evaluate switching to the unprivileged
guix-daemon. I don't have the resources to do more.

Toggle quote (2 lines)
> Thanks,
> Ludo’.
?
Your comment

Commenting via the web interface is currently disabled.

To comment on this conversation send an email to 77862@patchwise.org

To respond to this issue using the mumi CLI, first switch to it
mumi current 77862
Then, you may apply the latest patchset in this issue (with sign off)
mumi am -- -s
Or, compose a reply to this issue
mumi compose
Or, send patches to this issue
mumi send-email *.patch