Bug 1512 - Elf-loader falls into deadlock during relocation (Ubuntu1204-64bits/Fedora16-64bits)
Status: REOPENED
Product: dce
Classification: Unclassified
Component: other
Version: unspecified
Hardware: All All
Importance: P5 normal
Assigned To: Hajime Tazaki
Depends on:
Blocks:
Reported: 2012-10-16 02:56 EDT by Hajime Tazaki
Modified: 2014-07-28 19:58 EDT

Description Hajime Tazaki 2012-10-16 02:56:38 EDT
When libm.so.6 is loaded via elf-loader, the loader deadlocks.


#0  0x00007ffff7deee2c in machine_syscall6 (name=202, a1=140737354081064, a2=0, a3=2, a4=0, a5=0, 
    a6=0)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/x86_64/machine.c:352
#1  0x00007ffff7dea1ec in system_futex_wait (uaddr=0x7ffff7ff2328, val=2)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/system.c:101
#2  0x00007ffff7ded0c8 in futex_lock (futex=0x7ffff7ff2328)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/futex.c:34
#3  0x00007ffff7deb1d2 in vdl_reloc_index_jmprel (file=0x7ffff7ff5a98, index=3)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-reloc.c:290
#4  0x00007ffff7deee5c in machine_resolve_trampoline () from ../ldso
#5  0x0000000000000202 in ?? ()
#6  0x0000000000000012 in ?? ()
#7  0x00007ffff7ff61e8 in ?? ()
#8  0x00007ffff7df8100 in ?? () from ../ldso
#9  0x0000000000000001 in ?? ()
#10 0x00007ffff7df7f40 in ?? () from ../ldso
#11 0x00000000002f90a8 in ?? ()
#12 0x000000000000000c in ?? ()
#13 0x00007ffff753b580 in ?? ()
#14 0x00007ffff7ff5a98 in ?? ()
#15 0x0000000000000003 in ?? ()
#16 0x00007ffff753b589 in ?? ()
#17 0x00007fffffffe2f0 in ?? ()
#18 0x00007ffff7dee8d5 in machine_lazy_reloc (file=0x7ffff7ff5a98)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/x86_64/machine.c:158
#19 0x00007ffff7deb5f3 in do_reloc (file=0x7ffff7ff5a98, now=0)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-reloc.c:391
#20 0x00007ffff7deb6c4 in vdl_reloc (files=0x7ffff7ff37d8, now=0)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-reloc.c:415
#21 0x00007ffff7df5545 in dlopen_with_context (context=0x7ffff7ff2408, 
    filename=0x40081b "libm.so.6", flags=1)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-dl.c:241
#22 0x00007ffff7df56aa in vdl_dlopen (filename=0x40081b "libm.so.6", flags=1)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-dl.c:299
#23 0x00007ffff7df67fc in vdl_dlopen_public (filename=0x40081b "libm.so.6", flag=1)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/vdl-dl-public.c:6
#24 0x00007ffff7be4b00 in dlopen (filename=0x40081b "libm.so.6", flag=1)
    at /home/furbani/buildbot/bb/sandbox/slave/full-10/build/dceandco/elf-loader/libvdl.c:15
#25 0x00000000004006a4 in main (argc=1, argv=0x7fffffffe5b8) at test22.c:8


This seems to happen when the relocation type is R_X86_64_IRELATIVE.


% readelf -a -Wt /lib64/libm.so.6 (on Fedora16-64)
0000003add483030  0000000900000007 R_X86_64_JUMP_SLOT     0000000000000000 strtof + 0
0000003add483038  0000000000000025 R_X86_64_IRELATIVE                              0000003add21a9d0
0000003add483040  0000000000000025 R_X86_64_IRELATIVE                              0000003add22f0b0
0000003add483048  0000000000000025 R_X86_64_IRELATIVE                              0000003add22f6d0

strtof causes this issue.

I added a test case to reproduce this:
http://code.nsnam.org/thehajime/elf-loader-outgoing/rev/d688cc7ec69e
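
For readers without the changeset at hand, the backtrace suggests a reproduction roughly like the sketch below: frames #21 to #25 show dlopen being called on "libm.so.6" with flags=1, which is RTLD_LAZY on Linux/glibc. This is not the actual test22.c, and the function name load_libm_lazily is made up for illustration.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical reproduction sketch, not the actual test22.c.
 * Returns 0 if libm.so.6 loads and unloads cleanly, 1 otherwise. */
int load_libm_lazily(void)
{
    /* RTLD_LAZY corresponds to flags=1 in the backtrace (Linux/glibc). */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    return dlclose(handle) == 0 ? 0 : 1;
}
```

With the stock glibc loader this call is expected to return normally; the hang reported here would only show up when the program runs with elf-loader as its dynamic loader. On glibc older than 2.34 the program has to be linked with -ldl.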
Comment 1 Hajime Tazaki 2013-02-22 10:36:46 EST
Fixed (by Mathieu).
changeset 646 f160f9a83aee
Comment 2 Hajime Tazaki 2014-07-28 19:58:37 EDT
When dce-runner uses RTLD_LAZY with dlopen/dlmopen, elf-loader deadlocks again.

The reason is almost the same: when the symbol 'floor' in libm.so is resolved on the lazy path (via machine_lazy_reloc()), the loader re-enters relocation, resulting in a double futex acquisition.
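
The double acquisition can be modelled with a toy, non-recursive lock. The sketch below is only an illustration of the re-entry pattern seen in the backtrace (relocation holds the lock, then a call goes through the lazy-binding path, which tries to take the same lock). All toy_* names are hypothetical and none of this is elf-loader code; where the real futex_lock() would block forever, the toy lock returns -1 so the re-entry is visible. Releasing the lock before the callback is shown only as one conceivable way out, and is not claimed to be what changeset 646 does.

```c
#include <assert.h>

/* Toy stand-in for the loader's futex: non-recursive, no owner tracking. */
struct toy_lock { int held; };

/* Returns 0 on success, -1 where a real futex_lock() would block forever. */
static int toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;
    l->held = 1;
    return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
    assert(l->held); /* releasing an unheld lock would be a bug in the model */
    l->held = 0;
}

static struct toy_lock g_lock;

/* Models the lazy-binding path (the trampoline calling back into the
 * loader's relocation code), which takes the loader lock itself. */
static int toy_lazy_resolve(void)
{
    if (toy_lock_acquire(&g_lock) != 0)
        return -1; /* second acquisition by the same thread: the deadlock */
    toy_lock_release(&g_lock);
    return 0;
}

/* Models relocation work that triggers a lazily bound call.
 * lock_held_during_callback != 0 reproduces the problematic ordering. */
int toy_relocate(int lock_held_during_callback)
{
    int rc;

    if (toy_lock_acquire(&g_lock) != 0)
        return -1;
    if (!lock_held_during_callback)
        toy_lock_release(&g_lock);

    rc = toy_lazy_resolve();

    if (lock_held_during_callback)
        toy_lock_release(&g_lock);
    return rc;
}
```

Calling toy_relocate(1) reports the re-entry (-1), while toy_relocate(0) completes (0), which mirrors why a path that never re-enters the loader with the lock held does not hang.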

The RTLD_NOW path has no problem (as the original fix intended).

Changeset 1dbf1c14ee16 of ns-3-dce is a workaround that avoids using RTLD_LAZY.

http://code.nsnam.org/ns-3-dce/rev/1dbf1c14ee16
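
As a rough illustration of that kind of workaround (not the contents of the changeset itself), a caller could rewrite its flags so that lazy binding is never requested. The helper name force_eager_binding is hypothetical.

```c
#include <dlfcn.h>

/* Hypothetical helper: drop RTLD_LAZY, request RTLD_NOW, and keep any
 * other flags (e.g. RTLD_GLOBAL) unchanged. */
int force_eager_binding(int flags)
{
    return (flags & ~RTLD_LAZY) | RTLD_NOW;
}
```

A caller would then pass force_eager_binding(flags) to dlopen or dlmopen, so all symbols are bound at load time and the lazy trampoline path is never taken.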