It looks worse than you can imagine! I can imagine some pretty bad things! That's why I said *worse*!
Terry Pratchett, Moving Pictures
The fancy output format in "The address of main" was chosen for a reason: it is valid input for /bin/sh. Let's check whether the program from "The magic of the Elf" has main at the same offset.
The disassembly filters below are based on TEVWH_ASM_RETURN, a regular expression in perl syntax. Unfortunately plain sed on FreeBSD 4.7 has no "\|" or anything equivalent, while option -E switches to modern regular expressions with an incompatible syntax. The matter is further complicated by branch delay slots. On Sparc the instruction following a ret is executed while the jump is under way. A typical instruction to put there is restore, but triggering on that is not a clean solution.
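One way around the sed limitation is awk, whose regular expressions include alternation on POSIX systems. The following is only a sketch, not the filter used in this document: the function name until_return is made up, whether the awk of every target platform behaves identically is an assumption, and the final nop line of the sample is invented to show where the output stops. It does nothing about the Sparc delay slot.

```sh
#!/bin/sh
# Print lines up to and including the first one that contains "ret" or
# "hlt" as a whole word. until_return is a hypothetical name.
until_return() {
  awk '{ print } /(^|[^A-Za-z0-9_])(ret|hlt)([^A-Za-z0-9_]|$)/ { exit }'
}

# Sample input: three lines taken from the objdump listing in this
# section plus one invented trailing line.
printf '%s\n' \
  '8048414: b8 00 00 00 00 mov $0x0,%eax' \
  '8048419: c9 leave' \
  '804841a: c3 ret' \
  '804841b: 90 nop' \
| until_return
```

The explicit character ranges stand in for perl's \b, so that a word like "retain" does not end the listing early.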
Command: pre/i386-redhat7.3-linux/magic_elf/ndisasm.sh
#!/bin/bash
. out/i386-redhat7.3-linux/magic_elf/addr_of_main
/usr/bin/ndisasm -e ${ofs} -o 0x${main_l} -U \
tmp/i386-redhat7.3-linux/magic_elf/magic_elf \
| /usr/bin/perl -ne 'print $_; exit if m/\b(ret|hlt)\b/;'
Output: out/i386-redhat7.3-linux/magic_elf/ndisasm.asm
The output of objdump includes function labels. Filtering the complete disassembly can yield the desired code without prior knowledge of the function address. But since we already have the value we use --start-address for symmetry with ndisasm. That option accepts only numeric values, not symbol names.
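Filtering by label could look roughly like the sketch below. It assumes the layout GNU objdump prints, where each function starts with a header line such as "08048400 <main>:" and ends at a blank line. The helper name extract_function and the function named other are made up for illustration, and the sample uses single spaces where objdump prints tabs.

```sh
#!/bin/sh
# Extract one function from "objdump -d" style text by its label line.
# extract_function is a hypothetical helper; $1 is the symbol name.
extract_function() {
  awk -v label="<$1>:" '
    $2 == label       { inside = 1; next }  # header: "08048400 <main>:"
    inside && NF == 0 { exit }              # a blank line ends the function
    inside            { print }
  '
}

# A real run would pipe "objdump -d FILE" into the function instead of
# this shortened sample.
printf '%s\n' \
  '08048400 <main>:' \
  ' 8048400: 55 push %ebp' \
  ' 804841a: c3 ret' \
  '' \
  '0804841c <other>:' \
  ' 804841c: 90 nop' \
| extract_function main
```

This prints only the two lines that belong to main.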
Command: pre/i386-redhat7.3-linux/magic_elf/objdump.sh
#!/bin/bash
. out/i386-redhat7.3-linux/magic_elf/addr_of_main
/usr/bin/objdump -d --start-address=0x${addr_main_x} \
tmp/i386-redhat7.3-linux/magic_elf/magic_elf \
| pre/i386-redhat7.3-linux/magic_elf/objdump_format.pl -start_address=${addr_main_x}
Command: pre/i386-redhat7.3-linux/magic_elf/objdump_format.pl
#!/usr/bin/perl -sw
# Perl 5.005_03 (part of FreeBSD 4.7) does not have [:xdigit:]
$::start_address='[0-9a-fA-F]+' if (!defined($::start_address));
# skip to start address
my $pattern = '^\s*' . $::start_address . ':';
while (<>) { last if m/$pattern/; }
for(;;)
{
s/\s+$//;
my $comment = s/\s+(;)\s*(.*)// ? "$1 $2" : '';
my ( $addr, $hexdump, $asm ) = split(/ *\t/);
my $line = sprintf("%-11s %-19s ", $addr, $hexdump);
$asm = sprintf('%-7s %s', $1, $2) if ($asm =~ m/^(\S+)\s+(.*)/);
$line = sprintf("%-11s %-19s %s", $addr, $hexdump, $asm);
$line = sprintf("%-59s %s", $line, $comment) if (length($comment) > 0);
print $line . "\n";
last if ($asm =~ m/\b(ret|hlt)\b/);
last if (!($_ = <>));
}
Output: out/i386-redhat7.3-linux/magic_elf/objdump.asm
8048400: 55 push %ebp
8048401: 89 e5 mov %esp,%ebp
8048403: 83 ec 0c sub $0xc,%esp
8048406: 6a 03 push $0x3
8048408: 68 01 80 04 08 push $0x8048001
804840d: 6a 01 push $0x1
804840f: e8 bc fe ff ff call 80482d0 <_init+0x38>
8048414: b8 00 00 00 00 mov $0x0,%eax
8048419: c9 leave
804841a: c3 ret
This looks like a real main. So both programs indeed have main at the same offset. Unfortunately a brief look through /bin proves this to be pure chance. And instead of a real system call for write(2) we see something strange. It resolves to a location in a shared library. But what function in what library?
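The address in that call is not stored in the code as such. The bytes "e8 bc fe ff ff" hold a 32-bit displacement relative to the instruction that follows the call. A small sketch of the arithmetic, assuming a shell whose integers are wider than 32 bits (the helper name call_target is made up):

```sh
#!/bin/sh
# Target of a 5-byte "e8 rel32" call: address of the next instruction
# plus the displacement, truncated to 32 bits.
# call_target is a hypothetical helper; $1 = address of the call,
# $2 = displacement bytes read as a little-endian number.
call_target() {
  printf '0x%x\n' $(( ($1 + 5 + $2) & 0xffffffff ))
}

# "bc fe ff ff" read as little-endian is 0xfffffebc, that is -0x144.
call_target 0x804840f 0xfffffebc    # prints 0x80482d0
```

The result matches the address shown in the listing, 80482d0. Only the label attached to that address is in question.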
Command: pre/i386-redhat7.3-linux/magic_elf/gdb_core.sh
#!/bin/bash
/usr/bin/gdb ${1} -q <<EOF 2>&1
set disassembly-flavor intel
disassemble ${2}
EOF
Command: pre/i386-redhat7.3-linux/magic_elf/gdb_format.pl
#!/usr/bin/perl -nw
if (m/([^:]+):\s+(\S+)\s+(.*)/)
{
printf "%-26s%-13s ", $1 . ':', $2;
my $opcode = $2;
my $rest = $3;
if ($rest =~ s/\s+;\s*(.*)//)
{ printf "%-20s; %s\n", $rest, $1; }
else
{ print $rest . "\n"; }
exit(0) if ($opcode =~ m/(ret|hlt)/);
}
Command: pre/i386-redhat7.3-linux/magic_elf/gdb.sh
#!/bin/bash
file=${1:-tmp/i386-redhat7.3-linux/magic_elf/magic_elf}
func=${2:-main}
/bin/echo "[func=${func}]"
pre/i386-redhat7.3-linux/magic_elf/gdb_core.sh ${file} ${func} \
| pre/i386-redhat7.3-linux/magic_elf/gdb_format.pl
Output: out/i386-redhat7.3-linux/magic_elf/gdb
[func=main]
0x8048400 <main>: push ebp
0x8048401 <main+1>: mov ebp,esp
0x8048403 <main+3>: sub esp,0xc
0x8048406 <main+6>: push 0x3
0x8048408 <main+8>: push 0x8048001
0x804840d <main+13>: push 0x1
0x804840f <main+15>: call 0x80482d0 <write>
0x8048414 <main+20>: mov eax,0x0
0x8048419 <main+25>: leave
0x804841a <main+26>: ret
Looks better. We need a way to retrieve the function name, write, from this output. Then we can feed gdb this argument for disassembly.
Command: pre/i386-redhat7.3-linux/evil_magic/first_gdb_func.sed
#!/bin/sed -nf
/.*<\(.*\)>$/ {
s//\1/
p
q
}
Command: pre/i386-redhat7.3-linux/evil_magic/gdb_write.sh
#!/bin/bash
file=${1:-tmp/i386-redhat7.3-linux/magic_elf/magic_elf}
func=$( pre/i386-redhat7.3-linux/evil_magic/first_gdb_func.sed \
< out/i386-redhat7.3-linux/magic_elf/gdb )
/bin/echo "[func=${func}]"
pre/i386-redhat7.3-linux/magic_elf/gdb_core.sh ${file} ${func} \
| pre/i386-redhat7.3-linux/magic_elf/gdb_format.pl
Output: out/i386-redhat7.3-linux/evil_magic/write.gdb
[func=write]
0x80482d0 <write>: jmp ds:0x8049584
0x80482d6 <write+6>: push 0x8
0x80482db <write+11>: jmp 0x80482b0 <_init+24>
Oops. Shared libraries don't share their secrets with everyone.
We can now search for a fine manual explaining how to debug shared libraries. Or just compile the bugger static.
Command: pre/i386-redhat7.3-linux/magic_elf/cc_static.sh
#!/bin/bash
/usr/bin/gcc -Wall -O1 -I . -I out/i386-redhat7.3-linux -D NDEBUG -static \
-o tmp/i386-redhat7.3-linux/magic_elf/magic_elf_static \
pre/i386-redhat7.3-linux/magic_elf/magic_elf.c \
&& /bin/ls -l tmp/i386-redhat7.3-linux/magic_elf \
&& tmp/i386-redhat7.3-linux/magic_elf/magic_elf_static
Output: out/i386-redhat7.3-linux/magic_elf/magic_elf_static
total 452
-rwxrwxr-x 1 alba alba 13499 Jan 8 23:08 magic_elf
-rwxrwxr-x 1 alba alba 441498 Jan 8 23:08 magic_elf_static
ELF
Seems we found an easy way to fill up the hard disk. Anyway, what does gdb(1) have to say about it?
Output: out/i386-redhat7.3-linux/evil_magic/static_main.gdb
[func=main]
0x80481e0 <main>: push ebp
0x80481e1 <main+1>: mov ebp,esp
0x80481e3 <main+3>: sub esp,0xc
0x80481e6 <main+6>: push 0x3
0x80481e8 <main+8>: push 0x8048001
0x80481ed <main+13>: push 0x1
0x80481ef <main+15>: call 0x804cd10 <write>
0x80481f4 <main+20>: mov eax,0x0
0x80481f9 <main+25>: leave
0x80481fa <main+26>: ret
The function was called write before, it is called write now. Let's look at what is behind the name.
Command: pre/i386-redhat7.3-linux/evil_magic/static_write.sh
#!/bin/bash
file=${1:-tmp/i386-redhat7.3-linux/magic_elf/magic_elf_static}
func=$( pre/i386-redhat7.3-linux/evil_magic/first_gdb_func.sed \
< out/i386-redhat7.3-linux/evil_magic/static_main.gdb )
/bin/echo "[func=${func}]"
pre/i386-redhat7.3-linux/magic_elf/gdb_core.sh ${file} ${func} \
| pre/i386-redhat7.3-linux/magic_elf/gdb_format.pl
Output: out/i386-redhat7.3-linux/evil_magic/static_write.gdb
[func=write]
0x804cd10 <write>: push ebx
0x804cd11 <write+1>: mov edx,DWORD PTR [esp+16]
0x804cd15 <write+5>: mov ecx,DWORD PTR [esp+12]
0x804cd19 <write+9>: mov ebx,DWORD PTR [esp+8]
0x804cd1d <write+13>: mov eax,0x4
0x804cd22 <write+18>: int 0x80
0x804cd24 <write+20>: pop ebx
0x804cd25 <write+21>: cmp eax,0xfffff001
0x804cd2a <write+26>: jae 0x804d530 <__syscall_error>
0x804cd30 <write+32>: ret
The disassembly above is not guaranteed to work. The names of symbols imported from libraries differ from one platform to another, and from one compiler to another. A more rational approach is to search the listing of all symbols for similar names and identical addresses.
Command: pre/i386-redhat7.3-linux/evil_magic/nm.sh
#!/bin/bash
# -p produces same output format on SunOS and GNU
/usr/bin/nm -p tmp/i386-redhat7.3-linux/magic_elf/magic_elf_static \
| /bin/grep '[^[:alnum:]]write\>' \
| /bin/sort
Output: out/i386-redhat7.3-linux/evil_magic/nm
0804cd10 T __libc_write
0804cd10 W write
0804cd10 W __write
0804f7c4 T _IO_default_write
0805eeb0 T _IO_wdo_write
08061130 T _IO_new_do_write
08061130 W _IO_do_write
080613d0 T _IO_new_file_write
080613d0 W _IO_file_write
08061434 t new_do_write
I suspect there is actually order behind the chaos. The symbol __write, with a varying number of leading underscores, seems to be "the real thing" on all platforms. The aliases for the value, 0x804cd10, differ a lot.
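The "identical addresses" half of that search can be sketched with awk. The helper name aliases_of is made up, and the sketch assumes the three-column "address type name" layout of the nm listing above. The sample lines are copied from that listing.

```sh
#!/bin/sh
# List every symbol that shares its address with the given name.
# aliases_of is a hypothetical helper; input is "nm -p" style text.
aliases_of() {
  awk -v name="$1" '
    { addr[NR] = $1; sym[NR] = $3 }   # remember every line
    $3 == name { want = $1 }          # address of the requested symbol
    END {
      if (want == "") exit            # name not found: print nothing
      for (i = 1; i <= NR; i++)
        if (addr[i] == want) print sym[i]
    }
  '
}

printf '%s\n' \
  '0804cd10 T __libc_write' \
  '0804cd10 W write' \
  '0804cd10 W __write' \
  '0804f7c4 T _IO_default_write' \
| aliases_of write
```

For this sample the output is __libc_write, write and __write, one per line. Undefined symbols, which nm prints without an address, never match because their third column is empty.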
Command: pre/i386-redhat7.3-linux/evil_magic/gdb_nm.sh
#!/bin/bash
file=${1:-tmp/i386-redhat7.3-linux/magic_elf/magic_elf_static}
# \< and \> don't work on i386-freebsd4.7
func=$( /usr/bin/nm -p ${file} \
| /bin/sed -ne '/.*[tTwW] \(__*write\)/ {
s//\1/
p
q
}' )
/bin/echo "[func=${func}]"
pre/i386-redhat7.3-linux/magic_elf/gdb_core.sh ${file} ${func} \
| pre/i386-redhat7.3-linux/magic_elf/gdb_format.pl
Output: out/i386-redhat7.3-linux/evil_magic/gdb_nm
[func=__write]
0x804cd10 <write>: push ebx
0x804cd11 <write+1>: mov edx,DWORD PTR [esp+16]
0x804cd15 <write+5>: mov ecx,DWORD PTR [esp+12]
0x804cd19 <write+9>: mov ebx,DWORD PTR [esp+8]
0x804cd1d <write+13>: mov eax,0x4
0x804cd22 <write+18>: int 0x80
0x804cd24 <write+20>: pop ebx
0x804cd25 <write+21>: cmp eax,0xfffff001
0x804cd2a <write+26>: jae 0x804d530 <__syscall_error>
0x804cd30 <write+32>: ret
There are two man pages giving some overview of system calls, intro(2) and syscalls(2). /usr/include/unistd.h declares a traditional general purpose interface called syscall. Not all Linux systems have a man page for syscall(2), though. Anyway, the instruction mov eax,0x4 corresponds to the value of __NR_write in /usr/include/asm/unistd.h.
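The number can be looked up mechanically. The sketch below assumes the header uses plain "#define __NR_name number" lines. The helper name syscall_nr is made up, and a two-line stand-in file replaces the real header so that the example runs on systems that keep the definitions elsewhere.

```sh
#!/bin/sh
# Print the number a header assigns to a system call.
# syscall_nr is a hypothetical helper: $1 = call name, $2 = header file.
syscall_nr() {
  awk -v macro="__NR_$1" \
    '$1 == "#define" && $2 == macro { print $3; exit }' "$2"
}

# Stand-in for /usr/include/asm/unistd.h with the i386 values.
hdr=$(mktemp)
printf '%s\n' '#define __NR_exit 1' '#define __NR_write 4' > "$hdr"
syscall_nr write "$hdr"    # prints 4
rm -f "$hdr"
```

On the system described here the call would be syscall_nr write /usr/include/asm/unistd.h.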