diff options
author | Paulo Zanoni <paulo.r.zanoni@intel.com> | 2014-09-03 14:47:21 -0300 |
---|---|---|
committer | Daniel Vetter <daniel.vetter@ffwll.ch> | 2014-09-04 10:35:21 +0200 |
commit | a16ebccaee426ef96b0894e3af733360d88ce4d7 (patch) | |
tree | e3ee28c9fe1cb1ae92b034e12e42a16b552745e9 | |
parent | 081887504a72878fc5f54778af68ebd9404b4e04 (diff) | |
download | intel-gpu-tools-a16ebccaee426ef96b0894e3af733360d88ce4d7.tar.gz intel-gpu-tools-a16ebccaee426ef96b0894e3af733360d88ce4d7.tar.bz2 intel-gpu-tools-a16ebccaee426ef96b0894e3af733360d88ce4d7.zip |
igt_core: zero exit_handler_count before forking
If we don't reset exit_handler_count before forking, we may have a
case where the forked process is killed before it even does
"exit_handler_count = 0": in that case, it is still finishing forking.
When that happens, we may end up calling our exit handlers. On the
specific bug I'm investigating, we call igt_reset_connnectors(), which
ends up in a deadlock inside malloc_atfork. If we attach gdb to the
forked process and get a backtrace, we have:
(gdb) bt
0 __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
1 0x00007f15634d36bf in _L_lock_10524 () from /lib/x86_64-linux-gnu/libc.so.6
2 0x00007f15634d12ef in malloc_atfork (sz=139729840351352, caller=<optimized out>) at arena.c:181
3 0x00007f15640466a1 in drmMalloc () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
4 0x00007f1564049ad7 in drmModeGetResources () from /usr/lib/x86_64-linux-gnu/libdrm.so.2
5 0x0000000000408f84 in igt_reset_connectors () at igt_kms.c:1656
6 0x00000000004092dc in call_exit_handlers (sig=15) at igt_core.c:1130
7 fatal_sig_handler (sig=15) at igt_core.c:1154
8 <signal handler called>
9 0x00007f15634cce60 in ptmalloc_unlock_all2 () at arena.c:298
10 0x00007f156350ca3f in __libc_fork () at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:188
11 0x000000000040a029 in __igt_fork_helper (proc=proc@entry=0x610fc4 <signal_helper>) at igt_core.c:910
12 0x000000000040459d in igt_fork_signal_helper () at igt_aux.c:110
13 0x0000000000402ab7 in __real_main63 () at bug.c:76
14 0x000000000040296e in main (argc=<optimized out>, argv=<optimized out>) at bug.c:63
After doing some searches for "stuck at malloc_atfork", it seems to me
we probably shouldn't be doing any malloc calls at this point of the
code, so the best way to do that is to make sure we can't really run
the exit handlers.
So on this patch, instead of resetting the exit handlers after
forking, we reset them before forking, and then restore the original
value on the parent process.
I can reproduce this problem by running "./kms_flip --run-subtest
2x-flip-vs-modeset" under an infinite loop. Usually after a few
hundred calls, we end up stuck on the deadlock mentioned above. QA
says this problem happens every time, but I'm not sure what is the
difference between our environments that makes the race condition so
much easier for them.
The kms_flip.c problem can be considered a regression introduced by:
commit eef768f283466b6d7cb3f08381f72ccf3951dc99
Author: Thomas Wood <thomas.wood@intel.com>
Date: Wed Jun 18 14:28:43 2014 +0100
tests: enable extra connectors in kms_flip and kms_pipe_crc_basic
even though this commit is not the one that introduced the real
problem.
It is also possible to reproduce this problem with a few modifications
to template.c:
- Add a call to igt_enable_connectors() inside the first fixture.
- Add igt_fork_signal_helper() and igt_stop_signal_helper() calls
around subtest B.
Note that the crucial piece is that the parent actively kills helper
children, and if we skip tests this can happen _really_ fast. See e.g.
commit a031a1bf93b828585e7147f06145fc5030814547
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Fri Sep 13 16:43:22 2013 +0200
lib/drmtest: ducttape over fork race
for past hilarity in this area.
Cc: Thomas Wood <thomas.wood@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81367
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-rw-r--r-- | lib/igt_core.c | 11 |
1 files changed, 10 insertions, 1 deletions
diff --git a/lib/igt_core.c b/lib/igt_core.c index 7f14b17a..5cd7d8f8 100644 --- a/lib/igt_core.c +++ b/lib/igt_core.c @@ -895,6 +895,7 @@ bool __igt_fork_helper(struct igt_helper_process *proc) { pid_t pid; int id; + int tmp_count; assert(!proc->running); assert(helper_process_count < ARRAY_SIZE(helper_process_pids)); @@ -904,16 +905,24 @@ bool __igt_fork_helper(struct igt_helper_process *proc) igt_install_exit_handler(fork_helper_exit_handler); + /* + * Avoid races when the parent stops the child before the setup code + * had a chance to run. This happens e.g. when skipping tests wrapped in + * the signal helper. + */ + tmp_count = exit_handler_count; + exit_handler_count = 0; switch (pid = fork()) { case -1: + exit_handler_count = tmp_count; igt_assert(0); case 0: - exit_handler_count = 0; reset_helper_process_list(); oom_adjust_for_doom(); return true; default: + exit_handler_count = tmp_count; proc->running = true; proc->pid = pid; proc->id = id; |