Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke
This document tries to cover multiple platforms through conditional compilation. A script, configure.pl, determines the host type and sets up a Makefile. The Makefile uses a separate sub-directory for each platform and exports the names of these directories (and some other platform-specific values) as environment variables. Most of the shell scripts invoked by make(1) are shown here, and the following table should help in understanding them.
Table 1. Environment variables exported by Makefile
Variable | Value on this platform |
---|---|
${ARCH} | i386 |
${CFLAGS} | -Wall -O2 -march=i586 |
${ELF_ALIGN} | 1000 |
${ELF_ADDRSIZE} | 32 |
${ASM_STYLE} | intel |
${ELF_BASE} | 08048000 |
${OBJDUMP} | /usr/bin/objdump |
${OUT} | out/i386-redhat-linux |
${READELF} | /usr/bin/readelf |
${TMP} | tmp/i386-redhat-linux |
${UNAME} | Linux |
For the first example I'll present the simplest piece of code that still gives sufficient feedback. Our aim is to implant it into /bin/sh. On practically every recent installation of Linux/i386 the following code will emit three magic letters instead of just dumping core.
Source: out/i386-redhat-linux/magic_elf/magic_elf.c
#include <unistd.h>
int main() { write(1, (void*)0x08048001, 3); return 0; }
It is not an error that a file called magic_elf.c is located in a directory called out/i386-redhat-linux: the Makefile building this document did some trivial pre-processing on the original source file. ELF is used on many architectures, and each has a different default base address, so the magic address in the source differs from platform to platform.
Command: src/magic_elf/cc.sh
#!/bin/sh
gcc ${CFLAGS} ${OUT}/magic_elf/magic_elf.c \
-o ${TMP}/magic_elf/magic_elf \
&& ${TMP}/magic_elf/magic_elf
Output: out/i386-redhat-linux/magic_elf/magic_elf
ELF
The three letters are part of the signature of ELF files. By default, executables created by ld(1) are mapped into the same memory region every time. That's why the program can find its own header at a predictable virtual address.
RTFM. [1] Just read the whole Executable and Linkable Format specification.
0x8048000 is not a law of nature; it just happens to be the default base address of i386 ELF executables produced by ld(1). The options -Ttext ORG and --section-start SECTIONNAME=ORG of GNU ld should allow changing it, but I didn't get them working. Anyway, the layout of executables produced by ld(1) is straightforward:
One ELF header - Elf32_Ehdr
Program headers - Elf32_Phdr
Program interpreter (not if statically linked)
Code
Data
Section headers - Elf32_Shdr
Everything from the start of the file to the last byte of code is mapped into one segment (colloquially named "code" or "text") that begins at the base address. A whole chapter, readelf, describes a command that displays all these details. In the meantime I will show some fancy ways to get by without it.
What would you do if you knew nothing about ELF and just asked yourself how that example works? How can you make sure that the executable file really contains those three letters?
A good start for finding text in binary files is strings(1).
Command: src/magic_elf/strings.sh
#!/bin/sh
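# -a: scan the whole file; older GNU strings otherwise look only at loaded data sections
# -n 3: accept strings of 3 characters (the default minimum is 4)
# without "-a -n 3" we don't get any output
strings -a -n 3 ${TMP}/magic_elf/magic_elf | grep -n ELF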
Output: out/i386-redhat-linux/magic_elf/strings
1:ELF
The leading 1: is written by grep(1) and tells us that our three-letter word is the first string found. This gives a hint where to look for it in a hex dump. Searching for strings in such a dump is difficult because they may be split across line breaks. Interactive tools like hexedit(1) or khexedit(1) might be useful.
The traditional tool for dumps is od(1), short for "octal dump". The classic version does provide hexadecimal output, but unfortunately not for single bytes. Option -x outputs "words", which are defined to be two bytes. This does not matter on big-endian machines like the SPARC, but on little-endian ones like i386 and Alpha it is quite confusing, because the two bytes of each word appear swapped.
On both Linux and SunOS, od features option -tx1 to get byte-wise hexadecimal output. FreeBSD's od has no equivalent. Another interesting option is -N count, which reads no more than count bytes of input. Again, it is not available on FreeBSD.
Command: src/magic_elf/od/Linux.sh
#!/bin/sh
od -N 16 -c ${TMP}/magic_elf/magic_elf | head -1
od -N 16 -x ${TMP}/magic_elf/magic_elf | head -1
od -N 16 -tx1 ${TMP}/magic_elf/magic_elf | head -1
od -N 16 -ta ${TMP}/magic_elf/magic_elf | head -1
Output: out/i386-redhat-linux/magic_elf/od
0000000 177 E L F 001 001 001 \0 \0 \0 \0 \0 \0 \0 \0 \0
0000000 457f 464c 0101 0001 0000 0000 0000 0000
0000000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
0000000 del E L F soh soh soh nul nul nul nul nul nul nul nul nul
hexdump(1) supports user-defined formats. It is available on Linux and FreeBSD, but not on SunOS.
Source: src/format.hex
"%04.4_ax " 8/1 "%02x " " " 8/1 "%02x "
" " 16/1 "%_p" "\n"
Source: src/magic_elf/hexdump.sh
#!/bin/sh
hexdump -f src/format.hex \
< ${TMP}/magic_elf/magic_elf \
| head
Output: out/i386-redhat-linux/magic_elf/hexdump
0000 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 .ELF............
0010 02 00 03 00 01 00 00 00 00 83 04 08 34 00 00 00 ............4...
0020 5c 29 00 00 00 00 00 00 34 00 20 00 06 00 28 00 \)......4. ...(.
0030 1d 00 1a 00 06 00 00 00 34 00 00 00 34 80 04 08 ........4...4...
0040 34 80 04 08 c0 00 00 00 c0 00 00 00 05 00 00 00 4...À...À.......
0050 04 00 00 00 03 00 00 00 f4 00 00 00 f4 80 04 08 ........ô...ô...
0060 f4 80 04 08 13 00 00 00 13 00 00 00 04 00 00 00 ô...............
0070 01 00 00 00 01 00 00 00 00 00 00 00 00 80 04 08 ................
0080 00 80 04 08 88 04 00 00 88 04 00 00 05 00 00 00 ................
0090 00 10 00 00 01 00 00 00 88 04 00 00 88 94 04 08 ................
At this point we can guess that finding "ELF" both at file offset 1 and at address 0x8048000 + 1 is no coincidence. A test program might help.
Source: out/i386-redhat-linux/magic_elf/addr_of_main.c
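#include <stdio.h>
int main()
{
  /* %zu is the conversion for size_t, the type of sizeof */
  printf("# sizeof(unsigned long)=%zu\n", sizeof(unsigned long));
  /* first bytes of the ELF header, mapped at ld(1)'s default base */
  printf("# 08048000=%#02x\n", *(unsigned char*)0x08048000);
  printf("# 08048001=%.3s\n", (char*)0x08048001);
  /* %p expects a void*, so convert the function pointer explicitly */
  printf("main=%p\n", (void*)main);
  /* distance of main() from the base address */
  printf("ofs=%lu\n", (unsigned long)main - 0x08048000);
  return 0;
}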
Output: out/i386-redhat-linux/magic_elf/addr_of_main
# sizeof(unsigned long)=4
# 08048000=0x7f
# 08048001=ELF
main=0x8048400
ofs=1024
Looks good. The byte at address 0x8048000 + 0 equals the one at file offset 0, and 0x8048400 is a plausible address for function main.
Source: out/i386-redhat-linux/magic_elf/other_perl.pl
#!/usr/bin/perl -w
# system call 4 is write(2) on Linux/i386
syscall 4, 1, 0x08048001, 3;
Output: out/i386-redhat-linux/magic_elf/other_perl
ELF
Source: out/i386-redhat-linux/magic_elf/other_exe.sh
#!/bin/sh
# /proc/self/exe is the executable of the running process, here dd(1) itself
dd if=/proc/self/exe bs=1 skip=1 count=3 2>/dev/null
Output: out/i386-redhat-linux/magic_elf/other_exe
ELF
Command: out/i386-redhat-linux/magic_elf/other_mem.sh
#!/bin/sh
# /proc/self/mem is indexed by virtual address; bc converts the hex address to decimal for dd
skip=$( echo "ibase=16; 08048001" | bc )
dd if=/proc/self/mem bs=1 skip=${skip} count=3 2>/dev/null
Output: out/i386-redhat-linux/magic_elf/other_mem
ELF
[1]