Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’

  • Open
Details
6 participants
  • Andreas Enge
  • Janneke Nieuwenhuizen
  • Ludovic Courtès
  • Ludovic Courtès
  • Timothy Sample
  • Z572
Owner
unassigned
Submitted by
Ludovic Courtès
Severity
important

Ludovic Courtès wrote on 18 Jan 14:08 -0800
Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’
(address . bug-guix@gnu.org)
87msfnsrli.fsf@inria.fr
Hello,

I stumbled upon this interesting non-deterministic failure while
building ‘gcc-mesboot-4.9.4.drv’ on current ‘core-packages-team’ (which
is unchanged compared to ‘master’):

source directory: "/tmp/guix-build-gcc-mesboot-4.9.4.drv-0/gcc-4.9.4" (relative from build: ".")
build directory: "/tmp/guix-build-gcc-mesboot-4.9.4.drv-0/gcc-4.9.4"
configure flags: ("CONFIG_SHELL=/gnu/store/bhmkf29xki04mmydpm0axpbh35md4vfb-gash-boot-0.3.0/bin/bash" "SHELL=/gnu/store/bhmkf29xki04mmydpm0axpbh35md4vfb-gash-boot-0.3.0/bin/bash" "--prefix=/gnu/store/mgbd56zvid129vkk8l9zir7pf46r5038-gcc-mesboot-4.9.4" "--enable-fast-install" "--build=x86_64-unknown-linux-gnu" "--prefix=/gnu/store/mgbd56zvid129vkk8l9zir7pf46r5038-gcc-mesboot-4.9.4" "--build=i686-unknown-linux-gnu" "--host=i686-unknown-linux-gnu" "--with-host-libstdcxx=-lsupc++" "--with-native-system-header-dir=/gnu/store/qxp7icgwbn1hqqwvkan7aljgzfn439zh-glibc-mesboot-2.16.0/include" "--with-build-sysroot=/gnu/store/qxp7icgwbn1hqqwvkan7aljgzfn439zh-glibc-mesboot-2.16.0/include" "--disable-bootstrap" "--disable-decimal-float" "--disable-libatomic" "--disable-libcilkrts" "--disable-libgomp" "--disable-libitm" "--disable-libmudflap" "--disable-libquadmath" "--disable-libsanitizer" "--disable-libssp" "--disable-libvtv" "--disable-lto" "--disable-lto-plugin" "--disable-multilib" "--disable-plugin" "--disable-threads" "--enable-languages=c,c++" "--enable-static" "--enable-shared" "--enable-threads=single" "--disable-libstdcxx-pch" "--disable-build-with-cxx")
Backtrace:
In gash/eval.scm:
221: 19 [eval-sh (<sh-set!> ("ac_useropt" (<sh-cmd-sub> #)))]
In srfi/srfi-1.scm:
642: 18 [for-each #<procedure 1502320 at gash/eval.scm:221:17 (name word)> # #]
In gash/eval.scm:
222: 17 [#<procedure 1502320 at gash/eval.scm:221:17 (name word)> "ac_useropt" #]
131: 16 [eval-word (<sh-cmd-sub> (<sh-pipeline> # #)) #:output string ...]
121: 15 [expand-word (<sh-cmd-sub> (<sh-pipeline> # #)) #:output string ...]
In gash/shell.scm:
289: 14 [sh:substitute-command #<procedure 15022a0 at gash/eval.scm:129:35 ()>]
270: 13 [%subshell #<procedure v ()>]
In ice-9/boot-9.scm:
157: 12 [catch quit #<procedure v ()> ...]
In ice-9/r4rs.scm:
176: 11 [with-output-to-port #<variable 13a02e0 value: #<output: file 39>> ...]
In srfi/srfi-1.scm:
619: 10 [for-each #<procedure eval-sh (exp)> ((<sh-pipeline> # #))]
In gash/shell.scm:
344: 9 [sh:pipeline #<procedure 1506f40 at gash/eval.scm:149:6 ()> ...]
310: 8 [plumb #<input: #{read pipe}# 36> #f ...]
270: 7 [%subshell #<procedure thunk* ()>]
In ice-9/boot-9.scm:
157: 6 [catch quit #<procedure thunk* ()> ...]
In gash/shell.scm:
316: 5 [thunk*]
129: 4 [sh:exec-let () "sed" "s/[-+.]/_/g"]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
616: 2 [for-each #<procedure ec3b20 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<input: file 38> "r" 7]
In unknown file:
?: 0 [fdopen 7 "r"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor
Backtrace:
In gash/eval.scm:
221: 19 [eval-sh (<sh-set!> ("ac_useropt" (<sh-cmd-sub> #)))]
In srfi/srfi-1.scm:
642: 18 [for-each #<procedure 1502320 at gash/eval.scm:221:17 (name word)> # #]
In gash/eval.scm:
222: 17 [#<procedure 1502320 at gash/eval.scm:221:17 (name word)> "ac_useropt" #]
131: 16 [eval-word (<sh-cmd-sub> (<sh-pipeline> # #)) #:output string ...]
121: 15 [expand-word (<sh-cmd-sub> (<sh-pipeline> # #)) #:output string ...]
In gash/shell.scm:
289: 14 [sh:substitute-command #<procedure 15022a0 at gash/eval.scm:129:35 ()>]
270: 13 [%subshell #<procedure v ()>]
In ice-9/boot-9.scm:
157: 12 [catch quit #<procedure v ()> ...]
In ice-9/r4rs.scm:
176: 11 [with-output-to-port #<variable 13a02e0 value: #<output: file 39>> ...]
In srfi/srfi-1.scm:
619: 10 [for-each #<procedure eval-sh (exp)> ((<sh-pipeline> # #))]
In gash/shell.scm:
347: 9 [sh:pipeline #<procedure 1506f40 at gash/eval.scm:149:6 ()> ...]
310: 8 [plumb #f #<output: #{write pipe}# 38> ...]
270: 7 [%subshell #<procedure thunk* ()>]
In ice-9/boot-9.scm:
157: 6 [catch quit #<procedure thunk* ()> ...]
In gash/shell.scm:
316: 5 [thunk*]
129: 4 [sh:exec-let () "printf" "%s\\n" "libsanitizer"]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
616: 2 [for-each #<procedure ec3b20 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<input: file 36> "r" 7]
In unknown file:
?: 0 [fdopen 7 "r"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor
checking build system type... i686-unknown-linux-gnu
checking host system type... i686-unknown-linux-gnu
checking target system type... i686-unknown-linux-gnu
checking for a BSD-compatible install... ./install-sh -c
checking whether ln works... yes
checking whether ln -s works... yes
checking for a sed that does not truncate output... /gnu/store/i61mvrw30k8ng8hxym8s180nydnsbji6-gash-utils-boot-0.2.0/bin/sed
checking for gawk... gawk
checking for libsanitizer support... yes

What happens is that Gash crashes in the middle of a substitution on
$ac_useropt. As a result, ‘--disable-libsanitizer’ (and other options,
it seems) are discarded, hence the “libsanitizer support... yes” line.
Hours later, the build fails while trying to build libsanitizer.
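
For context, the failing substitution belongs to the option-parsing loop of the Autoconf-generated ‘configure’ script. The sketch below is a simplified approximation of that logic, not a copy of GCC's script: ‘record_disable’, ‘broken_sed’ and the ‘SED’ variable are made-up names, and it assumes that a crashed command substitution simply yields an empty string.

```shell
#!/bin/sh
# Simplified sketch of how a configure script records --disable-FEATURE.
# 'record_disable', 'broken_sed' and SED are hypothetical names.

# Stand-in for a pipeline stage that produces no output at all.
broken_sed() {
    cat >/dev/null
}

record_disable() {
    ac_option=$1
    # Strip the leading "--disable-" to get the feature name.
    ac_useropt=$(expr "x$ac_option" : 'x-*disable-\(.*\)')
    # The substitution seen in the backtrace: printf piped into sed.
    ac_useropt=$(printf '%s\n' "$ac_useropt" | "$SED" 's/[-+.]/_/g')
    # With an empty $ac_useropt this sets 'enable_' instead of
    # 'enable_libsanitizer', so the option is silently lost.
    eval "enable_$ac_useropt=no"
}

SED=sed
record_disable --disable-libsanitizer
echo "normal run: enable_libsanitizer=${enable_libsanitizer-unset}"

unset enable_libsanitizer
SED=broken_sed
record_disable --disable-libsanitizer
echo "failed substitution: enable_libsanitizer=${enable_libsanitizer-unset}"
```

In the second call the variable that later disables libsanitizer is never set, which is consistent with the “libsanitizer support... yes” line in the log.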

Any idea what could cause EBADF?

Thanks,
Ludo’.
Ludovic Courtès wrote on 19 Jan 10:24 -0800
Re: bug#75658: Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’
(address . 75658@debbugs.gnu.org)
87wmeqpspy.fsf@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> skribis:

> I stumbled upon this interesting non-deterministic failure while
> building ‘gcc-mesboot-4.9.4.drv’ on current ‘core-packages-team’ (which
> is unchanged compared to ‘master’):

Just got another one:

checking for struct sigaction.sa_sigaction... yes
checking for volatile sig_atomic_t... yes
checking for sighandler_t... yes
checking for sigprocmask... (cached) yes
checking whether sleep is declared... yes
checking for working sleep... yes
checking for socklen_t... Backtrace:
In gash/shell.scm:
129: 19 [sh:exec-let () "ac_fn_c_try_compile" "2817"]
In gash/environment.scm:
215: 18 [save-variables-excursion () ...]
292: 17 [with-arguments # #<procedure 2210f00 at gash/shell.scm:145:25 ()>]
389: 16 [call-with-return #<procedure 2210e40 at gash/shell.scm:147:28 ()>]
In srfi/srfi-1.scm:
619: 15 [for-each #<procedure eval-sh (exp)> ((<sh-begin> # # # ...))]
619: 14 [for-each #<procedure eval-sh (exp)> (# # # # ...)]
In gash/shell.scm:
441: 13 [sh:cond # #]
55: 12 [without-errexit #<procedure 13185e0 at gash/eval.scm:149:6 ()>]
372: 11 [sh:and #<procedure 1318560 at gash/eval.scm:149:6 ()> ...]
55: 10 [without-errexit #<procedure 1318560 at gash/eval.scm:149:6 ()>]
372: 9 [sh:and #<procedure 1318500 at gash/eval.scm:149:6 ()> ...]
55: 8 [without-errexit #<procedure 1318500 at gash/eval.scm:149:6 ()>]
In srfi/srfi-1.scm:
616: 7 [for-each #<procedure eval-sh (exp)> (# # # # ...)]
619: 6 [for-each #<procedure eval-sh (exp)> (# # #)]
In gash/shell.scm:
245: 5 [#<procedure 1f63030 at gash/shell.scm:239:17 ()>]
129: 4 [sh:exec-let () "grep" "-v" "^ *+" "conftest.err"]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
616: 2 [for-each #<procedure ea9a60 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<input: file 20> "r" 7]
In unknown file:
?: 0 [fdopen 7 "r"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor
yes
checking whether symlink handles trailing slash correctly... yes
checking whether <sys/ioctl.h> declares ioctl... yes
checking for unsetenv... yes
checking for unsetenv() return type... int

That one likely doesn’t change the build outcome since it still
determines that ‘socklen_t’ is defined, but it sounds a bit like a dice
roll.

Ludo’.
Ludovic Courtès wrote on 6 Feb 07:17 -0800
control message for bug #75518
(address . control@debbugs.gnu.org)
877c636r2f.fsf@gnu.org
block 75518 by 75658
quit
Ludovic Courtès wrote on 17 Feb 13:17 -0800
control message for bug #75658
(address . control@debbugs.gnu.org)
87frkcl18h.fsf@gnu.org
severity 75658 important
quit
Ludovic Courtès wrote on 11 Mar 14:42 -0700
Re: bug#75658: Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’
(address . 75658@debbugs.gnu.org)
87bju7i6rr.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> Ludovic Courtès <ludovic.courtes@inria.fr> skribis:
>
>> I stumbled upon this interesting non-deterministic failure while
>> building ‘gcc-mesboot-4.9.4.drv’ on current ‘core-packages-team’ (which
>> is unchanged compared to ‘master’):
>
> Just got another one:

A few more, obtained by running the start of the ‘configure’ script in a
loop (added an ‘exit’ on line 2562, which is after the first 4 lines of
output).

while ./configure CONFIG_SHELL=/gnu/store/98bd49rhyia49y0b9d7sk8phsq14g3nk-gash-boot-0.3.0/bin/bash SHELL=/gnu/store/98bd49rhyia49y0b9d7sk8phsq14g3nk-gash-boot-0.3.0/bin/bash --prefix=/gnu/store/awkbdj5j41pv5kiy9ifs0zl40jamwfw4-gcc-mesboot-4.9.4 --enable-fast-install --build=x86_64-unknown-linux-gnu --prefix=/gnu/store/awkbdj5j41pv5kiy9ifs0zl40jamwfw4-gcc-mesboot-4.9.4 --build=i686-unknown-linux-gnu --host=i686-unknown-linux-gnu --with-host-libstdcxx=-lsupc++ --with-native-system-header-dir=/gnu/store/gc91zbacrk6prhvm91cj3x9rr3v2k17q-glibc-mesboot-2.16.0/include --with-build-sysroot=/gnu/store/gc91zbacrk6prhvm91cj3x9rr3v2k17q-glibc-mesboot-2.16.0/include --disable-bootstrap --disable-decimal-float --disable-libatomic --disable-libcilkrts --disable-libgomp --disable-libitm --disable-libmudflap --disable-libquadmath --disable-libsanitizer --disable-libssp --disable-libvtv --disable-lto --disable-lto-plugin --disable-multilib --disable-plugin --disable-threads --enable-languages=c,c++ --enable-static --enable-shared --enable-threads=single --disable-libstdcxx-pch --disable-build-with-cxx ; do : ;done

warning: failed to install locale: Invalid argument
Backtrace:
In gash/environment.scm:
371: 19 [call-with-break #<procedure 2dda9450 at gash/shell.scm:400:6 ()>]
In srfi/srfi-1.scm:
619: 18 [for-each #<procedure 2dda9420 at gash/shell.scm:401:18 (value)> #]
In gash/environment.scm:
353: 17 [call-with-continue #<procedure 2de13460 at gash/eval.scm:158:14 ()>]
In srfi/srfi-1.scm:
616: 16 [for-each #<procedure eval-sh (exp)> (# # #)]
619: 15 [for-each #<procedure eval-sh (exp)> ((<sh-set!> ("ac_optarg" #)))]
In gash/eval.scm:
221: 14 [eval-sh (<sh-set!> ("ac_optarg" (<sh-cmd-sub> #)))]
In srfi/srfi-1.scm:
642: 13 [for-each #<procedure 2da0f5e0 at gash/eval.scm:221:17 (name word)> # #]
In gash/eval.scm:
222: 12 [#<procedure 2da0f5e0 at gash/eval.scm:221:17 (name word)> "ac_optarg" #]
131: 11 [eval-word (<sh-cmd-sub> (<sh-exec> "expr" # ":" ...)) #:output string ...]
121: 10 [expand-word (<sh-cmd-sub> (<sh-exec> "expr" # ...)) #:output ...]
In gash/shell.scm:
289: 9 [sh:substitute-command #<procedure 2da0f560 at gash/eval.scm:129:35 ()>]
270: 8 [%subshell #<procedure v ()>]
In ice-9/boot-9.scm:
157: 7 [catch quit #<procedure v ()> ...]
In ice-9/r4rs.scm:
176: 6 [with-output-to-port #<variable 2de5dc00 value: #<output: file /dev/pts/19>> ...]
In srfi/srfi-1.scm:
619: 5 [for-each #<procedure eval-sh (exp)> ((<sh-exec> "expr" # ":" ...))]
In gash/shell.scm:
129: 4 [sh:exec-let () "expr" ...]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
616: 2 [for-each #<procedure 2d60f0a0 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<input: file /dev/pts/19> "r0" 7]
In unknown file:
?: 0 [fdopen 7 "r0"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor

And:

Backtrace:
In ice-9/boot-9.scm:
157: 17 [catch #t #<catch-closure 25cdf0a0> ...]
In unknown file:
?: 16 [apply-smob/1 #<catch-closure 25cdf0a0>]
In ice-9/boot-9.scm:
63: 15 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
432: 14 [eval # #]
In ice-9/boot-9.scm:
793: 13 [call-with-input-file "./configure" ...]
In gash/gash.scm:
121: 12 [#<procedure 262f7700 at gash/gash.scm:120:19 (port)> #<input: ./configure 5>]
In gash/repl.scm:
38: 11 [run-repl #<input: ./configure 5> #f]
In gash/environment.scm:
371: 10 [call-with-break #<procedure 26335c00 at gash/shell.scm:400:6 ()>]
In srfi/srfi-1.scm:
616: 9 [for-each #<procedure 26335bd0 at gash/shell.scm:401:18 (value)> #]
In gash/environment.scm:
353: 8 [call-with-continue #<procedure 26315260 at gash/eval.scm:158:14 ()>]
In srfi/srfi-1.scm:
619: 7 [for-each #<procedure eval-sh (exp)> (# # #)]
In gash/shell.scm:
441: 6 [sh:cond #]
55: 5 [without-errexit #<procedure 26861c80 at gash/eval.scm:149:6 ()>]
129: 4 [sh:exec-let () "test" "-n" ""]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
619: 2 [for-each #<procedure 26272b60 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<output: file /dev/pts/19> "w0" 6]
In unknown file:
?: 0 [fdopen 6 "w0"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor

And:

Backtrace:
In ice-9/boot-9.scm:
157: 13 [catch #t #<catch-closure 1879d00> ...]
In unknown file:
?: 12 [apply-smob/1 #<catch-closure 1879d00>]
In ice-9/boot-9.scm:
63: 11 [call-with-prompt prompt0 ...]
In ice-9/eval.scm:
432: 10 [eval # #]
In ice-9/boot-9.scm:
793: 9 [call-with-input-file "./configure" ...]
In gash/gash.scm:
121: 8 [#<procedure 1e905e0 at gash/gash.scm:120:19 (port)> #<input: ./configure 5>]
In gash/repl.scm:
38: 7 [run-repl #<input: ./configure 5> #f]
In gash/shell.scm:
441: 6 [sh:cond #]
55: 5 [without-errexit #<procedure 2192680 at gash/eval.scm:149:6 ()>]
129: 4 [sh:exec-let () "test" "xi686-unknown-linux-gnu" "!=" "x"]
92: 3 [exec-utility () ...]
In srfi/srfi-1.scm:
619: 2 [for-each #<procedure 1e06920 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
In ice-9/boot-9.scm:
1473: 1 [dup->port #<output: file /dev/pts/19> "w0" 6]
In unknown file:
?: 0 [fdopen 6 "w0"]

ERROR: In procedure fdopen:
ERROR: In procedure scm_fdes_to_port: Bad file descriptor

All these happen before the line:

checking build system type... i686-unknown-linux-gnu

Good news: I was able to reproduce with Gash over Guile 3.0.9:

ludo@ribbon /tmp/guix-build-gcc-mesboot-4.9.4.drv-0/gcc-4.9.4$ guix build gash
/gnu/store/mz5swdf35iwplrgdvm4z256py585nxi6-gash-0.3.0
ludo@ribbon /tmp/guix-build-gcc-mesboot-4.9.4.drv-0/gcc-4.9.4$ while /gnu/store/mz5swdf35iwplrgdvm4z256py585nxi6-gash-0.3.0/bin/gash ./configure CONFIG_SHELL=/gnu/store/98bd49rhyia49y0b9d7sk8phsq14g3nk-gash-boot-0.3.0/bin/bash SHELL=/gnu/store/98bd49rhyia49y0b9d7sk8phsq14g3nk-gash-boot-0.3.0/bin/bash --prefix=/gnu/store/awkbdj5j41pv5kiy9ifs0zl40jamwfw4-gcc-mesboot-4.9.4 --enable-fast-install --build=x86_64-unknown-linux-gnu --prefix=/gnu/store/awkbdj5j41pv5kiy9ifs0zl40jamwfw4-gcc-mesboot-4.9.4 --build=i686-unknown-linux-gnu --host=i686-unknown-linux-gnu --with-host-libstdcxx=-lsupc++ --with-native-system-header-dir=/gnu/store/gc91zbacrk6prhvm91cj3x9rr3v2k17q-glibc-mesboot-2.16.0/include --with-build-sysroot=/gnu/store/gc91zbacrk6prhvm91cj3x9rr3v2k17q-glibc-mesboot-2.16.0/include --disable-bootstrap --disable-decimal-float --disable-libatomic --disable-libcilkrts --disable-libgomp --disable-libitm --disable-libmudflap --disable-libquadmath --disable-libsanitizer --disable-libssp --disable-libvtv --disable-lto --disable-lto-plugin --disable-multilib --disable-plugin --disable-threads --enable-languages=c,c++ --enable-static --enable-shared --enable-threads=single --disable-libstdcxx-pch --disable-build-with-cxx ; do : ;done
[…]
Backtrace:
In ice-9/boot-9.scm:
1752:10 18 (with-exception-handler _ _ #:unwind? _ #:unwind-for-type _)
In unknown file:
17 (apply-smob/0 #<thunk 7f8e78d15300>)
In ice-9/boot-9.scm:
724:2 16 (call-with-prompt _ _ #<procedure default-prompt-handler (k proc)>)
In ice-9/eval.scm:
619:8 15 (_ #(#(#<directory (guile-user) 7f8e78d18c80>)))
In ice-9/ports.scm:
433:17 14 (call-with-input-file _ _ #:binary _ #:encoding _ #:guess-encoding _)
In gash/gash.scm:
121:27 13 (_ _)
In gash/repl.scm:
38:14 12 (run-repl _ _)
In gash/environment.scm:
375:8 11 (call-with-break _)
In srfi/srfi-1.scm:
634:9 10 (for-each #<procedure 7f8e75612420 at gash/shell.scm:401:18 (value)> _)
In gash/environment.scm:
357:8 9 (call-with-continue _)
In srfi/srfi-1.scm:
634:9 8 (for-each #<procedure eval-sh (exp)> _)
634:9 7 (for-each #<procedure eval-sh (exp)> _)
In gash/shell.scm:
55:39 6 (sh:and _ #<procedure 7f8e75656da0 at gash/eval.scm:149:6 ()>)
245:24 5 (_)
159:10 4 (sh:exec-let _ "expr" . _)
92:9 3 (exec-utility _ "/run/current-system/profile/bin/expr" "expr" ("xliba…" …))
In srfi/srfi-1.scm:
634:9 2 (for-each #<procedure 7f8e760654c0 at gash/shell.scm:70:12 (i)> _)
In ice-9/ports.scm:
317:17 1 (dup->port _ _ _)
In unknown file:
0 (fdopen 6 "w0")

ERROR: In procedure fdopen:
In procedure scm_fdes_to_port: Bad file descriptor

Enough backtraces for now. To be continued…

Ludo’.
Ludovic Courtès wrote on 13 Mar 17:00 -0700
(address . 75658@debbugs.gnu.org)
871pv0wkfu.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> In gash/shell.scm:
> 289: 9 [sh:substitute-command #<procedure 2da0f560 at gash/eval.scm:129:35 ()>]
> 270: 8 [%subshell #<procedure v ()>]
> In ice-9/boot-9.scm:
> 157: 7 [catch quit #<procedure v ()> ...]
> In ice-9/r4rs.scm:
> 176: 6 [with-output-to-port #<variable 2de5dc00 value: #<output: file /dev/pts/19>> ...]
> In srfi/srfi-1.scm:
> 619: 5 [for-each #<procedure eval-sh (exp)> ((<sh-exec> "expr" # ":" ...))]
> In gash/shell.scm:
> 129: 4 [sh:exec-let () "expr" ...]
> 92: 3 [exec-utility () ...]
> In srfi/srfi-1.scm:
> 616: 2 [for-each #<procedure 2d60f0a0 at gash/shell.scm:70:12 (i)> (0 1 2 ...)]
> In ice-9/boot-9.scm:
> 1473: 1 [dup->port #<input: file /dev/pts/19> "r0" 7]
> In unknown file:
> ?: 0 [fdopen 7 "r0"]
>
> ERROR: In procedure fdopen:
> ERROR: In procedure scm_fdes_to_port: Bad file descriptor

I was able to capture an strace log of this:

15837 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb10dad7850) = 15838
15838 set_robust_list(0x7fb10dad7860, 24) = 0
15837 wait4(15838, <unfinished ...>
15838 close(3) = 0
15838 close(4) = 0
15838 pipe2([3, 4], O_CLOEXEC) = 0
[...]
15838 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fb10beaa990, parent_tid=0x7fb10beaa990, exit_signal=0, stack=0x7fb10b51b000, stack_size=0x98ef80, tls=0x7fb10beaa6c0} => {parent_tid=[15839]}, 88) = 15839
15839 rseq(0x7fb10beaafe0, 0x20, 0, 0x53053053 <unfinished ...>
15838 rt_sigprocmask(SIG_SETMASK, [], <unfinished ...>
[...]
15838 lseek(2, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
15839 close(10) = 0
15839 close(17 <unfinished ...>
15838 dup2(22, 6 <unfinished ...>
15839 <... close resumed>) = 0
15838 <... dup2 resumed>) = 6
15839 close(6 <unfinished ...>
15838 fcntl(6, F_GETFL <unfinished ...>
15839 <... close resumed>) = 0
15838 <... fcntl resumed>) = -1 EBADF (Bad file descriptor)
15839 close(7) = 0
15839 close(18) = 0
15839 close(15) = 0
15839 close(12) = 0
15839 close(9) = 0
15839 close(16) = 0
15838 write(2, "Backtrace:\n", 11) = 11

The sequence goes like this:

1. A child process (15838) corresponding to the subshell is created;

2. That process creates a finalization thread (15839);

3. The main thread does dup2(22, 6); the finalization thread does
close(6); the main thread then does fcntl(6, F_GETFL), which fails
with EBADF.
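
The interleaving in step 3 can be replayed deterministically from a POSIX shell. This only illustrates the failure mode, not Guile's internals: descriptors 8 and 9 are arbitrary stand-ins for 22 and 6, and the ‘close’ that the finalization thread performs is issued by hand.

```shell
#!/bin/sh
# Replay of: dup2(22, 6); close(6) from another thread; fcntl(6) -> EBADF.
# Descriptors 8 and 9 are arbitrary stand-ins for 22 and 6.

exec 8</dev/null      # some open descriptor (plays the role of 22)
exec 9<&8             # "main thread": dup2(8, 9)
exec 9<&-             # "finalization thread": close(9), arriving too late

# The next use of descriptor 9 fails because it is closed again.
# The subshell keeps a redirection error from terminating this script.
if ( : <&9 ) 2>/dev/null; then
    result="open"
else
    result="closed"
fi
echo "descriptor 9 is $result"

exec 8<&-             # clean up
```

The point is that the descriptor number was valid when ‘dup2’ returned; a stale ‘close’ of the same number is what invalidates it before its first use.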

I suspect something like a wrong revealed count on the relevant ports,
possibly those created in ‘install-current-ports!’.

Ludo’.
Timothy Sample wrote on 14 Mar 22:08 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)
87senex4m0.fsf@ngyro.com
Ludovic Courtès <ludo@gnu.org> writes:

> I was able to capture an strace log of this:
>
> 15837 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb10dad7850) = 15838
> 15838 set_robust_list(0x7fb10dad7860, 24) = 0
> 15837 wait4(15838, <unfinished ...>
> 15838 close(3) = 0
> 15838 close(4) = 0
> 15838 pipe2([3, 4], O_CLOEXEC) = 0
> [...]
> 15838 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fb10beaa990, parent_tid=0x7fb10beaa990, exit_signal=0, stack=0x7fb10b51b000, stack_size=0x98ef80, tls=0x7fb10beaa6c0} => {parent_tid=[15839]}, 88) = 15839
> 15839 rseq(0x7fb10beaafe0, 0x20, 0, 0x53053053 <unfinished ...>
> 15838 rt_sigprocmask(SIG_SETMASK, [], <unfinished ...>
> [...]
> 15838 lseek(2, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> 15839 close(10) = 0
> 15839 close(17 <unfinished ...>
> 15838 dup2(22, 6 <unfinished ...>
> 15839 <... close resumed>) = 0
> 15838 <... dup2 resumed>) = 6
> 15839 close(6 <unfinished ...>
> 15838 fcntl(6, F_GETFL <unfinished ...>
> 15839 <... close resumed>) = 0
> 15838 <... fcntl resumed>) = -1 EBADF (Bad file descriptor)
> 15839 close(7) = 0
> 15839 close(18) = 0
> 15839 close(15) = 0
> 15839 close(12) = 0
> 15839 close(9) = 0
> 15839 close(16) = 0
> 15838 write(2, "Backtrace:\n", 11) = 11
>
> The sequence goes like this:
>
> 1. A child process (15838) corresponding to the subshell is created;
>
> 2. That process creates a finalization thread (15839);
>
> 3. Main thread does dup2(22, 6); finalization does close(6); main
> thread does fcntl(6, F_GETFL), which fails with EBADF.
>
> I suspect something like a wrong revealed count on the relevant ports,
> possibly those created in ‘install-current-ports!’.

In “boot-9.scm”, we have

  (define dup->port
    (case-lambda
      ((port/fd mode)
       (fdopen (dup->fdes port/fd) mode))
      ((port/fd mode new-fd)
       (let ((port (fdopen (dup->fdes port/fd new-fd) mode)))
         (set-port-revealed! port 1)
         port))))

It looks like the system calls on the main thread correspond to this
code (which is called from ‘install-current-ports!’ via ‘dup’).
Specifically, ‘dup2’ is called from ‘dup->fdes’ and ‘fcntl’ is called
from ‘fdopen’.

The way that ‘dup->fdes’ works is that it first makes sure that no
existing port has the desired file descriptor (‘scm_evict_ports’), and
then calls ‘dup2’. This should mean that the requested file descriptor
is up for grabs.

Here’s my guess as to what’s happening. For brevity let’s call the port
with file descriptor 6 “P”.

1. The GC runs, nullifying the entry for P in the port table (weak key
hash table), and queuing its finalizer.

2. The evict ports loop runs, missing P because it was nullified (see
‘scm_internal_hash_fold’).

3. ‘dup2’ turns 22 to 6.

4. The finalizer for P runs, closing 6.

5. ‘fdopen’ calls ‘fcntl’ on 6, which results in EBADF.

And here’s a reproducer:

  (let loop ()
    (define fd #f)
    (let ((P (open-input-file "/dev/null")))
      ;; Does not change the revealed count of P.
      (set! fd (fileno P)))
    (let ((port (open-input-file "/dev/null")))
      (dup->port port "r" fd)
      (close-port port)
      (loop)))

This results in EBADF in seemingly exactly the same way. (I had to run
it a few times: sometimes it runs out of file descriptors first.) This
happens on bootstrap Guile (2.0.9) and modern Guile.

That’s all I have for now. I’m not sure how to avoid this without
resorting to calling “(gc)” to synchronously run the finalizers before
trying to mess with the file descriptors.


-- Tim
Ludovic Courtès wrote on 16 Mar 08:01 -0700
(name . Timothy Sample)(address . samplet@ngyro.com)
87wmcpnhol.fsf@gnu.org
Hello Timothy,

Thanks for chiming in.

Timothy Sample <samplet@ngyro.com> skribis:

> And here’s a reproducer:
>
>   (let loop ()
>     (define fd #f)
>     (let ((P (open-input-file "/dev/null")))
>       ;; Does not change the revealed count of P.
>       (set! fd (fileno P)))
>     (let ((port (open-input-file "/dev/null")))
>       (dup->port port "r" fd)
>       (close-port port)
>       (loop)))
>
> This results in EBADF in seemingly exactly the same way. (I had to run
> it a few times: sometimes it runs out of file descriptors first.) This
> happens on bootstrap Guile (2.0.9) and modern Guile.

Nice reproducer; I fully agree with your analysis.

Seeing that ‘install-current-ports!’ creates temporary ports (via ‘dup’)
for no reason since nobody captures their reference and they get GC’d
soon after, I rewrote it like this:

(define (install-current-ports!)
  "Install all current ports into their usual file descriptors. For
example, if @code{current-input-port} is a @code{file-port?}, make the
process file descriptor 0 refer to the file open for
@code{current-input-port}. If any current port is a @code{port?} but
not a @code{file-port?}, its corresponding file descriptor will refer
to @file{/dev/null}."
  ;; XXX: Input/output ports? Closing other FDs?
  (for-each (lambda (i)
              (gc) ;to trigger bugs
              (let ((current-port (fd->current-port i)))
                (match (current-port)
                  ((? file-port? port)
                   (dup->fdes port i))
                  (#f #t))))
            (iota *fd-count*)))

But this illustrates another problem: in the child process, right before
‘execve’, the finalization thread may be restarted, in which case it
creates a new pipe.

In the example below, the finalization pipe is on FDs 9 and 7, but
‘install-current-ports!’ blindly dups to FD 7, thereby closing one end
of the finalization pipe that was just created:

23647 pipe2([7, 9], O_CLOEXEC) = 0
23647 rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
23647 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f84204b3990, parent_tid=0x7f84204b3990, exit_signal=0, stack=0x7f841fb24000, stack_size=0x98ef80, tls=0x7f84204b36c0} => {parent_tid=[23648]}, 88) = 23648
[…]
23647 write(9, "\0", 1) = 1
23648 <... read resumed>"\0", 1) = 1
23648 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23648 read(7, <unfinished ...>
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=35845839}) = 0
23647 dup2(12, 7) = 7
23647 fcntl(7, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
23647 lseek(7, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
23647 dup2(12, 7) = 7
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=35899320}) = 0
23647 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23647 madvise(0x7f842207c000, 12288, MADV_DONTNEED) = 0
23647 write(9, "\0", 1) = 1
23648 <... read resumed>"\0", 1) = 1
23648 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23648 read(7, <unfinished ...>
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=39539830}) = 0
23647 clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=39555997}) = 0
23647 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
23647 madvise(0x7f842207c000, 12288, MADV_DONTNEED) = 0
23647 madvise(0x7f8421d74000, 8192, MADV_DONTNEED) = 0
23647 write(9, "\0", 1) = -1 EPIPE (Broken pipe)
23647 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=23647, si_uid=1000} ---

After that dup2(12, 7) call, writing to the finalization pipe yields
SIGPIPE, which terminates the process (here it corresponds to a subshell
running ‘expr’).
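
The SIGPIPE itself is ordinary pipe behaviour: once nothing holds the read end any more, the next write terminates the writer. Here is a minimal shell sketch of that, unrelated to Guile's actual finalization code; the temporary file is only a way to recover the writer's exit status, and the one-second sleep is there to let the reader exit first.

```shell
#!/bin/sh
# Writing to a pipe that has lost its reader.
status_file=$(mktemp)

# The writer sleeps so that the reader (':') is gone before it writes.
# Its exit status is saved to a file because the pipeline's own status
# is that of the last command.
{
    status=0
    sh -c 'sleep 1; echo wake-up' 2>/dev/null || status=$?
    echo "$status" >"$status_file"
} | :

writer_status=$(cat "$status_file")
rm -f "$status_file"

# Typically 141 (128 + SIGPIPE); a plain write error (EPIPE) is
# reported instead when SIGPIPE is ignored in the environment.
echo "writer exit status: $writer_status"
```

In the trace above the reader did not exit; its end of the pipe was replaced by dup2(12, 7), which has the same effect on the writer.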

Since we’re going to exec right after fork, we could turn off
finalization around ‘primitive-fork’ such that the child doesn’t attempt
to restart the finalization thread before exec. The Shepherd has code
like this:

(define %set-automatic-finalization-enabled?!
  ;; When using a statically-linked Guile, for instance in the initrd, we
  ;; cannot resolve this symbol, but most of the time we don't need it
  ;; anyway. Thus, delay it.
  (let ((proc (delay
                (pointer->procedure int
                                    (dynamic-func
                                     "scm_set_automatic_finalization_enabled"
                                     (dynamic-link))
                                    (list int)))))
    (lambda (enabled?)
      "Switch on or off automatic finalization in a separate thread.
Turning finalization off shuts down the finalization thread as a side effect."
      (->bool ((force proc) (if enabled? 1 0))))))

(define-syntax-rule (without-automatic-finalization exp ...)
  "Turn off automatic finalization within the dynamic extent of EXP."
  (let ((enabled? #t))
    (dynamic-wind
      (lambda ()
        (set! enabled? (%set-automatic-finalization-enabled?! #f)))
      (lambda ()
        exp ...)
      (lambda ()
        (%set-automatic-finalization-enabled?! enabled?)))))

Problem is, we cannot use the FFI on the statically-linked Guile.

We could implement fork+exec in C, but we don’t have a C compiler at
this early bootstrap stage.

Thoughts?

Ludo’.
Ludovic Courtès wrote on 16 Mar 14:32 -0700
(name . Timothy Sample)(address . samplet@ngyro.com)
87sencmzl8.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> skribis:

> But this illustrates another problem: in the child process, right before
> ‘execve’, the finalization thread may be restarted, in which case it
> creates a new pipe.
>
> In the example below, the finalization pipe is on FDs 9 and 7, but
> ‘install-current-ports!’ blindly dups to FD 7, thereby closing one end
> of the finalization pipe that was just created:

The hack below addresses that (mostly) by reserving low-number file
descriptors before the signal and finalization threads create their
pipe. (In practice, we can only reserve FDs above 5; FDs 3 and 4 are
the “sleep pipe” I believe.)

It seems to be good enough though.
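
As background on why a shell needs those low-numbered descriptors to be free: Autoconf-generated scripts conventionally use descriptor 5 for ‘config.log’, 6 for their “checking…” messages and 7 for a saved copy of standard input, which would explain why descriptors 6 and 7 show up in the backtraces. The following is a reduced sketch of that convention, with a temporary file standing in for ‘config.log’; it is not an excerpt from GCC's ‘configure’.

```shell
#!/bin/sh
# Reduced sketch of Autoconf's descriptor conventions (5, 6 and 7).
log=$(mktemp)                 # stands in for config.log

exec 5>>"$log"                # 5: log file
exec 7<&0 </dev/null 6>&1     # 7: original stdin, 6: user-visible messages

echo "checking for something... yes" >&6
echo "configure: details that only go to the log" >&5

# Restore stdin and release the descriptors.
exec 0<&7 7<&- 6>&- 5>&-

logged=$(cat "$log")
rm -f "$log"
echo "log contents: $logged"
```

If the interpreter itself already holds internal pipes on 6 or 7, redirections like these collide with them, hence the idea of occupying those numbers up front.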

Thoughts?

Ludo’.
diff --git a/gash/shell.scm b/gash/shell.scm
index 3611067..68e74e7 100644
--- a/gash/shell.scm
+++ b/gash/shell.scm
@@ -68,14 +68,13 @@ not a @code{file-port?}, its corresponding file descriptor will refer
to @file{/dev/null}."
;; XXX: Input/output ports? Closing other FDs?
(for-each (lambda (i)
- (match ((fd->current-port i))
- ((? file-port? port)
- (dup port i))
- ((? input-port? port)
- (dup (open-file "/dev/null" "r") i))
- ((? output-port? port)
- (dup (open-file "/dev/null" "w") i))
- (_ #t)))
+ (gc)
+ (let ((current-port (fd->current-port i)))
+ (match (current-port)
+ ((? file-port? port)
+ (let ((new (dup port i)))
+ (redirect-port port new)))
+ (#f #t))))
(iota *fd-count*)))
(define (exec-utility bindings path name args)
@@ -89,8 +88,14 @@ to @file{/dev/null}."
;; the buffer) produces its output.
(flush-all-ports)
(match (primitive-fork)
- (0 (install-current-ports!)
- (apply execle path utility-env name args))
+ (0
+ (dynamic-wind
+ (lambda ()
+ (install-current-ports!))
+ (lambda ()
+ (apply execle path utility-env name args))
+ (lambda ()
+ (primitive-exit 127))))
(pid (match-let (((pid . status) (waitpid pid)))
(set-status! (status:exit-val status)))))))
@@ -182,7 +187,10 @@ if it is our responsibility to close the port."
(define* (make-processed-redir fd target #:optional (open-flags 0))
(let ((port (match target
((? port?) target)
- ((? string?) (open target open-flags))
+ ((? string?)
+ (let ((port (open target open-flags)))
+ (set-port-revealed! port 10)
+ port))
;; TODO: Verify open-flags.
((? integer?) ((fd->current-port target)))
(#f #f))))
@@ -213,6 +221,7 @@ if it is our responsibility to close the port."
(make-processed-redir fd #f))
(('<< (? integer? fd) text)
(let ((port (tmpfile)))
+ (set-port-revealed! port 10)
(display text port)
(seek port 0 SEEK_SET)
(make-processed-redir fd port)))))
@@ -264,6 +273,7 @@ process."
(lambda () #t)
(lambda ()
(restore-signals)
+ (gc)
(set-atexit! #f)
;; We need to preserve the status given to 'exit', so we
;; catch the 'quit' key here.
diff --git a/scripts/gash.in b/scripts/gash.in
index f851c1d..57506ba 100644
--- a/scripts/gash.in
+++ b/scripts/gash.in
@@ -21,5 +21,13 @@
 ;;; along with Gash. If not, see <http://www.gnu.org/licenses/>.

 (define (main args)
+  ;; Reserve file descriptors 5 to 12 (roughly) before the signal and
+  ;; finalization threads grab them so that a script willing to use
+  ;; them can do so without breaking Guile.
+  (let loop ((i 3))
+    (when (<= i 10)
+      (open-fdes "/dev/null" (logior O_RDONLY O_CLOEXEC))
+      (loop (+ i 1))))
+
   (setenv "SHELL" ((compose canonicalize-path car command-line)))
-  ((@ (gash gash) main) (command-line)))
+  ((module-ref (resolve-interface '(gash gash)) 'main) (command-line)))
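The comment in the ‘scripts/gash.in’ hunk explains that a script may want low-numbered file descriptors for itself, and that this must not clobber descriptors Guile uses internally. The following is only an analogy in plain POSIX shell, not a reproduction of the Gash bug: it pretends that fd 7 is an "internal" descriptor and shows how a later ‘exec 7>…’ silently replaces its target. The choice of fd 7 and the scratch file are assumptions of this sketch.

```shell
# Scratch file standing in for whatever an "internal" descriptor points at.
tmp=$(mktemp)

(
  # Pretend fd 7 belongs to the runtime (in Guile's case, a pipe end).
  exec 7>"$tmp"
  echo "internal message 1" >&7

  # A script now claims fd 7 for its own use; the old target is silently lost.
  exec 7>/dev/null
  echo "internal message 2" >&7
)

# Count how many of the two messages actually reached the scratch file.
delivered=$(grep -c 'internal message' "$tmp")
echo "messages delivered: $delivered"
rm -f "$tmp"
```

Reserving the low numbers up front, as the hunk does with ‘open-fdes’, means the runtime's own descriptors end up at higher numbers, where a script's ‘exec N>…’ is less likely to land on them.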
Ludovic Courtès wrote on 19 Mar 14:20 -0700
[PATCH 0/4] Fixes for subshells and redirections
(address . gash-devel@nongnu.org)
20250319212033.4643-1-ludo@gnu.org
Hello,

This fixes issues reported at https://issues.guix.gnu.org/75658
and related ones I noticed while looking at the code.

Feedback welcome!

Thanks,
Ludo'.

Ludovic Courtès (4):
shell: Exit child process when ‘execle’ fails.
shell: Remove dead code in ‘install-current-ports!’.
shell: ‘install-current-ports!’ opens file descriptors, not ports.
Open low-numbered file descriptors for use by the shell.

gash/shell.scm | 29 +++++++++++++++++++++--------
scripts/gash.in | 14 +++++++++++++-
tests/exiting.org | 27 +++++++++++++++++++++++++++
3 files changed, 61 insertions(+), 9 deletions(-)


base-commit: ec9f0313190e380687da387b4207469a0a0a8cd8
--
2.48.1
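The first patch in the series makes the forked child exit when ‘execle’ fails, and the earlier diff uses ‘(primitive-exit 127)’ for that. As a rough illustration of the convention involved (this is ordinary shell behaviour, not Gash's code), a POSIX shell reports status 127 when ‘exec’ cannot find its target. The path below is hypothetical and chosen so that the exec is guaranteed to fail.

```shell
# Hypothetical path chosen so that exec is guaranteed to fail.
missing=/nonexistent/utility

# Run the failing exec in a child shell so the current shell survives.
status=0
sh -c 'exec "$1"' sh "$missing" 2>/dev/null || status=$?

echo "child exited with status $status"
```

The point of exiting immediately is that the child must not fall back into the parent's interpreter loop after a failed exec; otherwise two processes would continue running the same script.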
Ludovic Courtès wrote on 19 Mar 14:27 -0700
Re: bug#75658: Non-deterministic Gash error in ‘gcc-mesboot-4.9.4’
(name . Timothy Sample)(address . samplet@ngyro.com)
87v7s4emoc.fsf@gnu.org
Hello Timothy,

Ludovic Courtès <ludo@gnu.org> writes:

> The hack below addresses that (mostly) by reserving low-number file
> descriptors before the signal and finalization threads create their
> pipe. (In practice, we can only reserve FDs above 5; FDs 3 and 4 are
> the “sleep pipe” I believe.)

I’ve just sent cleaned-up patches to gash-devel including this
fix/workaround.

It passes my tests, meaning that I cannot reproduce the original bug in
a timely fashion when running:

./pre-inst-env gash -c 'exec 2>/dev/null; while true; do echo $(sh --version) > /dev/null; done'

or when running part of the GCC 4.9.4 ‘configure’ script in a loop
(attached is the helper script I used for that; not shown here is a
manual modification of said script so that it exits after “checking for
a sed that does not truncate output”, which was sufficient to reproduce
the bug, possibly after many iterations).
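For unattended runs, the endless one-liner above can be given a fixed iteration count. This is a minimal sketch under stated assumptions: the variable names ‘SHELL_UNDER_TEST’ and ‘ITERATIONS’ are invented here, the default shell is plain ‘sh’ rather than a Gash build, and a failure is assumed to surface as a non-zero exit status, which may not match how the original bug manifested.

```shell
# Shell under test; defaults to plain "sh". Point it at a Gash build to
# exercise the fix (both variable names are assumptions of this sketch).
SHELL_UNDER_TEST=${SHELL_UNDER_TEST:-sh}
ITERATIONS=${ITERATIONS:-50}

# Same loop body as the one-liner, but with a fixed iteration count
# passed in as $1 of the inner script.
repro_status=0
"$SHELL_UNDER_TEST" -c '
  exec 2>/dev/null
  i=0
  while [ "$i" -lt "$1" ]; do
    echo $(sh --version) > /dev/null || exit 1
    i=$((i + 1))
  done
' repro "$ITERATIONS" || repro_status=$?

echo "reproducer exited with status $repro_status"
```

Since the failure is non-deterministic, a clean run with a small iteration count says little; the count would need to be raised considerably before drawing conclusions.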

It would be great to cut a Gash release soonish as this bug has been
blocking the ‘core-packages-team’ branch for a while already.

Thanks,
Ludo’.
#!/bin/sh
set -x
export COLUMNS=200
#STRACE="strace -s 100 -f -o log.strace"
PATCH=--with-patch=gash=$PWD/gash-redirect-EBADF.patch
export SHELL=$(guix build gash $PATCH)/bin/gash
export CONFIG_SHELL=$SHELL
OPTIONS="--prefix=/wherever --disable-bootstrap --disable-decimal-float --disable-libatomic --disable-libcilkrts --disable-libgomp --disable-libitm --disable-libmudflap --disable-libquadmath --disable-libsanitizer --disable-libssp --disable-libvtv --disable-lto --disable-lto-plugin --disable-multilib --disable-plugin --disable-threads --enable-languages=c,c++ --enable-static --enable-shared --enable-threads=single --disable-libstdcxx-pch --disable-build-with-cxx"

cd /data/src/gcc-4.9.4
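while $STRACE $SHELL -e ./configure $OPTIONS $OPTIONS $OPTIONS
do
    # Stop as soon as the trace shows fcntl failing with EBADF.
    # (This requires the STRACE line above to be uncommented.)
    grep 'fcntl.*EBADF' log.strace && break
done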
Ludovic Courtès wrote on 20 Mar 07:18 -0700
Re: bug#75658: [PATCH 0/4] Fixes for subshells and redirections
(address . gash-devel@nongnu.org)
87msdfbxb2.fsf@gnu.org
Ludovic Courtès <ludo@gnu.org> writes:

> shell: Exit child process when ‘execle’ fails.
> shell: Remove dead code in ‘install-current-ports!’.
> shell: ‘install-current-ports!’ opens file descriptors, not ports.
> Open low-numbered file descriptors for use by the shell.

For the record, I also built this series with Guile 2.0.9, by modifying
‘guix.scm’ to refer to it instead of ‘guile-3.0’ and turning off tests
(since they require (srfi srfi-64), which 2.0.9 doesn’t have).

It appears to work fine and passes this test:

timeout 10m \
/gnu/store/3ylfablfwsdaapgk2y3x8yjchmapasxs-gash-0.3.0.6-f988cb-dirty/bin/gash -c 'exec 7>/dev/null; while true; do echo $(sh --version) > /dev/null; done'

Ludo’.
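A note on reading the result of a ‘timeout’-wrapped endless loop like the one above: with GNU coreutils, ‘timeout’ exits with status 124 when it had to stop the command because the time limit was reached, so 124 is the "good" outcome here and any other status means the loop ended on its own. The sketch below uses a one-second limit and a trivial busy loop as stand-ins; neither is part of the original test.

```shell
# A stand-in for the endless loop: it never exits on its own.
timeout_status=0
timeout 1 sh -c 'while true; do :; done' || timeout_status=$?

if [ "$timeout_status" -eq 124 ]; then
  echo "loop was still running when the time limit hit"
else
  echo "loop stopped early with status $timeout_status"
fi
```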
Janneke Nieuwenhuizen wrote on 27 Mar 00:16 -0700
Re: [PATCH 0/4] Fixes for subshells and redirections
(name . Ludovic Courtès)(address . ludo@gnu.org)
87zfh72bay.fsf@gnu.org
Ludovic Courtès writes:

Hi!

> This fixes issues reported at https://issues.guix.gnu.org/75658
> and related ones I noticed while looking at the code.
>
> Feedback welcome!

That's awesome, what a terrible puzzle that was!

I'm hoping Timothy finds the time to review/merge/release Gash. We
could carry these patches in Guix, but yeah.

Greetings,
Janneke

--
Janneke Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com| Avatar® https://AvatarAcademy.com
Ludovic Courtès wrote on 2 Apr 07:28 -0700
Re: bug#75658: [PATCH 0/4] Fixes for subshells and redirections
(name . Janneke Nieuwenhuizen)(address . janneke@gnu.org)
87y0wibpsx.fsf@gnu.org
Hi there!

Janneke Nieuwenhuizen <janneke@gnu.org> writes:

> That's awesome, what a terrible puzzle that was!
>
> I'm hoping Timothy finds the time to review/merge/release Gash. We
> could carry these patches in Guix, but yeah.

Yup, it would be great if one of you could do that. :-)

Especially since ‘core-packages-team’ has been queued for a while now
and the latest attempts to evaluate the branch have all failed due to
this bug (perhaps we were lucky on the previous ‘core-updates’ cycle, or
just retried until it would eventually work!).

Cheers,
Ludo’.
Ludovic Courtès wrote on 22 Apr 08:23 -0700
Re: [PATCH 0/4] Fixes for subshells and redirections
(address . gash-devel@nongnu.org)
87tt6gqkxs.fsf@gnu.org
Hello comrades,

Ludovic Courtès <ludo@gnu.org> writes:

> This fixes issues reported at https://issues.guix.gnu.org/75658
> and related ones I noticed while looking at the code.

[...]

> shell: Exit child process when ‘execle’ fails.
> shell: Remove dead code in ‘install-current-ports!’.
> shell: ‘install-current-ports!’ opens file descriptors, not ports.
> Open low-numbered file descriptors for use by the shell.

Timothy, Janneke: could you review/apply these Gash patches and cut a
release? The ‘core-packages-team’ branch has been blocked on this issue for
months.

We cannot easily apply patches to the package definitions because
‘patch’ isn’t available at this early bootstrapping phase. So if there’s
no “official” Gash release, we would have to host a copy somewhere,
which is not ideal. But perhaps we can take that route if a Gash
release cannot be made by, say, May 5th?

Let me know what you think!

Ludo’.
Ludovic Courtès wrote on 6 May 01:26 -0700
(address . gash-devel@nongnu.org)
871pt216xo.fsf@gnu.org
Hello,

"Ludovic Courtès" <ludo@gnu.org> writes:

> Timothy, Janneke: could you review/apply these Gash patches and cut a
> release? The ‘core-packages-team’ branch has been blocked on this issue for
> months.
>
> We cannot easily apply patches to the package definitions because
> ‘patch’ isn’t available at this early bootstrapping phase. So if there’s
> no “official” Gash release, we would have to host a copy somewhere,
> which is not ideal. But perhaps we can take that route if a Gash
> release cannot be made by, say, May 5th?

I cloned the repo and pushed the branch here (I had to adjust
‘tests/exiting.org’ because the new test would fail in VPATH builds, as
shown by ‘make distcheck’):


I tagged it as “3.0.1”. Feel free to eventually merge it. (In the
future, we should also make sure several people have commit rights to
the official repo.)

I verified with:

./pre-inst-env guix build -e '(@@ (gnu packages commencement) gcc-mesboot)'

(This took ~18h on a beefy machine!) The build log does not show any
funky backtrace during ./configure runs.

I pushed an update of the ‘gash’ package in commit
b4693b9d4e131a96e8491651914d6c47d7eca7af of ‘core-packages-team’.

So I think this unlocks the ‘core-packages-team’ branch.

Next up: rebasing on current ‘master’.

Ludo’.
Andreas Enge wrote on 6 May 01:38 -0700
(name . Ludovic Courtès)(address . ludo@gnu.org)
aBnKjmtULPj2kgSm@jurong
On Tue, May 06, 2025 at 10:26:11AM +0200, Ludovic Courtès wrote:
> Next up: rebasing on current ‘master’.

I tried yesterday, but made a mistake (forgot to push the unrebased
branch as core-packages-team-old3 first). Then when reverting the rebase
and trying to push, I realised I could not push a branch without
compiling it first. And this took me one hour on my old two core
machine! Much of this was spent in PO4 and POXREF.

I will try again, but cannot say how many hours this will take...

Andreas
Ludovic Courtès wrote on 6 May 02:22 -0700
(name . Andreas Enge)(address . andreas@enge.fr)
87a57qxfdt.fsf@gnu.org
Hi,

Andreas Enge <andreas@enge.fr> writes:

> I tried yesterday, but made a mistake (forgot to push the unrebased
> branch as core-packages-team-old3 first). Then when reverting the rebase
> and trying to push, I realised I could not push a branch without
> compiling it first. And this took me one hour on my old two core
> machine! Much of this was spent in PO4 and POXREF.

I don’t think you necessarily need to push the old branch first.

Anyhow, let me know if you need help (or support :-)).

Thanks,
Ludo’.
Z572 wrote on 8 May 02:36 -0700
Re: bug#75658: [PATCH 0/4] Fixes for subshells and redirections
(name . Ludovic Courtès)(address . ludo@gnu.org)
87frhfla0e.fsf@z572.online
Ludovic Courtès <ludo@gnu.org> writes:

> Hi,
>
> Andreas Enge <andreas@enge.fr> writes:
>
>> I tried yesterday, but made a mistake (forgot to push the unrebased
>> branch as core-packages-team-old3 first). Then when reverting the rebase
>> and trying to push, I realised I could not push a branch without
>> compiling it first. And this took me one hour on my old two core
>> machine! Much of this was spent in PO4 and POXREF.
>
> I don’t think you necessarily need to push the old branch first.
>
> Anyhow, let me know if you need help (or support :-)).

on ci.guix:

building of `/gnu/store/nbb9svpwh2zzqx92xcjbpjznia2mwvbb-gcc-mesboot-4.9.4.drv' timed out after 21600 seconds

I think it might be necessary to set larger timeout properties for gcc-mesboot.



>
> Thanks,
> Ludo’.
Andreas Enge wrote on 8 May 02:50 -0700
(name . Z572)(address . z572@z572.online)
aBx-T-EVcH99Dg-H@jurong
On Thu, May 08, 2025 at 05:36:17PM +0800, Z572 wrote:
> on ci.guix:
> building of `/gnu/store/nbb9svpwh2zzqx92xcjbpjznia2mwvbb-gcc-mesboot-4.9.4.drv' timed out after 21600 seconds
> I think it might be necessary to set larger timeout properties for gcc-mesboot.

My suspicion is rather that something is wrong. On my own machine (with
4 cores), I saw the same timeout, but after 3600s; previous iterations
went through without a problem.

QA also has trouble evaluating this branch, but with a test failure in
git-minimal:
https://data.qa.guix.gnu.org/job/9798

Andreas
Z572 wrote
(name . Andreas Enge)(address . andreas@enge.fr)
87a57nl67x.fsf@z572.online
Andreas Enge <andreas@enge.fr> writes:

> On Thu, May 08, 2025 at 05:36:17PM +0800, Z572 wrote:
>> on ci.guix:
>> building of `/gnu/store/nbb9svpwh2zzqx92xcjbpjznia2mwvbb-gcc-mesboot-4.9.4.drv' timed out after 21600 seconds
>> I think it might be necessary to set larger timeout properties for gcc-mesboot.
>
> My suspicion is rather that something is wrong. On my own machine (with
> 4 cores), I saw the same timeout, but after 3600s; previous iterations
> went through without a problem.
>
> QA also has trouble evaluating this branch, but with a test failure in
> git-minimal:
> https://data.qa.guix.gnu.org/job/9798

In fact, the python-minimal test failed.

Rolling back openssl to 3.0 can lead to successful compilation.

build with openssl 3.4, but my test was not successful.

I'm going to roll back openssl first.

>
> Andreas
Ludovic Courtès wrote on 8 May 07:35 -0700
(name . Andreas Enge)(address . andreas@enge.fr)
87wmaruq58.fsf@gnu.org
Andreas Enge <andreas@enge.fr> writes:

> On Thu, May 08, 2025 at 05:36:17PM +0800, Z572 wrote:
>> on ci.guix:
>> building of `/gnu/store/nbb9svpwh2zzqx92xcjbpjznia2mwvbb-gcc-mesboot-4.9.4.drv' timed out after 21600 seconds
>> I think it might be necessary to set larger timeout properties for gcc-mesboot.
>
> My suspicion is rather that something is wrong.

Yes, it looks like a problem with offloading on berlin: the build seems
to stall when offloaded to build machines.

I’ve started a ‘--no-offload’ build manually so we can at least move
forward.

Ludo’.
Ludovic Courtès wrote on 8 May 07:36 -0700
(name . Z572)(address . z572@z572.online)
87r00zuq2q.fsf@gnu.org
Z572 <z572@z572.online> writes:

> I'm going to roll back openssl first.

BTW, I’d like to ungraft everything on this branch.

We should avoid touching packages like OpenSSL anyway since they are
outside the scope of ‘core-packages’ and doing so could cause conflicts
with ungrafting etc.

Thanks,
Ludo’.
Andreas Enge wrote on 14 Jun 06:50 -0700
Unblock
(address . control@debbugs.gnu.org)
aE1-MSPdv-Gw88uh@jurong
unblock 75518 by 76311,76654,76640,75676,75658
thanks

# unblock by closed bugs; this makes no practical difference,
# but clarifies the situation