| Go not to the elves for counsel, for they will say both yes and no. | |
| J.R.R. Tolkien | 
This chapter introduces a framework for both scanners and first stage infectors, written in plain C. The code is split into many parts that manipulate a central data structure. The idea is to replace some of these parts in later chapters to implement improvements and different infection methods. Here you will find only the most basic parts shared by all programs. One step closer to the edge continues with generic but ELF-specific parts. Actual infection methods accompany their descriptions in separate chapters.
The code is not insertable itself. Doing it in C[ABC] describes what would be necessary. In the meantime the infectors insert and activate one chunk of code, an array of bytes prepared by Dressing up binary code.
Parts are shown in random order and without any #include statements or prototypes. Identifiers prefixed with TEVWH are defined in generated file config.h. See Variables and packages[ABC]. If you need full details I can only recommend a look into the sources of this document. Mirrors shows where to get it.
This code really wants to be C++. Some hardware I'm working on is just too slow for big compilers, though. The heavy use of macros is a consequence of not using C++, not the reason.
A simple replacement for perror(3). If the first argument is -1 instead of a value of errno(3), the function works just like printf(3), except that it writes to stderr.
Source: src/one_step_closer/print_errno.inc
| void print_errno(int error, const char* msg, ...)
{
  va_list va;
  if (error != -1)
    fprintf(stderr, "(%d) %s\n", error, strerror(error));
  va_start(va, msg); 
  vfprintf(stderr, msg, va);
  va_end(va); 
} | 
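A minimal usage sketch, not part of the sources. It repeats print_errno from above so that it compiles on its own, and adds a hypothetical caller, can_open, to show both modes. With a real errno value the error line is written first and the message second; with -1 only the message is written.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* copied from print_errno.inc above */
void print_errno(int error, const char* msg, ...)
{
  va_list va;
  if (error != -1)
    fprintf(stderr, "(%d) %s\n", error, strerror(error));
  va_start(va, msg);
  vfprintf(stderr, msg, va);
  va_end(va);
}

/* hypothetical caller, returns 1 if path can be opened for reading */
int can_open(const char* path)
{
  int fd = open(path, O_RDONLY);
  if (fd < 0)
  {
    /* errno mode: error number and description, then the message */
    print_errno(errno, "cannot open %s\n", path);
    return 0;
  }
  /* -1 mode: just the formatted message */
  print_errno(-1, "opened %s\n", path);
  close(fd);
  return 1;
}
```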
printf(3) is a great debugging tool. I use it much like assert(3): a single pre-processor definition deactivates all calls, or rather groups of them. The conventional way to completely remove function calls through the pre-processor is to define a parameterized macro that evaluates to nothing. But printf takes a variable number of arguments. The recent C99 standard provides a mechanism to handle this. The gcc extension for the same purpose is much older but different (pronounce as "incompatible").
Example: Macros with variable number of arguments
| /* gcc extension */
#define eprintf(format, args...) fprintf(stderr, format , ## args)
/* new C99 standard */
#define eprintf(...) fprintf(stderr, __VA_ARGS__) | 
I chose obscure syntax instead.
A type name surrounded by brackets is a cast. Casting an expression to void discards the result. It is not even evaluated if the expression is free of side effects and the compiler is reasonably smart.
A more common use of brackets is to surround an expression without changing its value.
Commas are also used for a lot of things. One of the strangest is the comma operator. It evaluates both the expression to its left and the one to its right, and returns the value of the right one.
Example:
| printf("The answer is %d\n", 42);
(void)("The answer is %d\n", 42); | 
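One consequence is worth a small sketch. The names TRACE_ON, TRACE_OFF, next_value and calls are made up for this illustration. The comma operator still evaluates its right operand, so an argument with side effects runs even when the trace is switched off. Only the call to the print function disappears.

```c
#include <stdio.h>

#define TRACE_ON  printf   /* active: a real function call */
#define TRACE_OFF (void)   /* inactive: cast of a comma expression */

static int calls = 0;      /* counts how often next_value() ran */

static int next_value(void) { return ++calls; }

int count_evaluations(void)
{
  /* prints the value, the argument is evaluated */
  TRACE_ON("value %d\n", next_value());
  /* prints nothing, but the argument is still evaluated */
  TRACE_OFF("value %d\n", next_value());
  return calls;
}
```

So keep expensive or state-changing expressions out of trace arguments; a switched-off trace removes the output, not the evaluation.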
Source: src/one_step_closer/trace_infector.h
| #define TRACE_ERROR	print_errno
#define TRACE_SCAN	(void)
#define TRACE_INFECT	print_errno
#define TRACE_DEBUG	(void) | 
Source: src/one_step_closer/trace_scanner.h
| #define TRACE_ERROR	print_errno
#define TRACE_SCAN	print_errno
#define TRACE_INFECT	(void)
#define TRACE_DEBUG	(void) | 
Using the comma operator to ignore code is rarely a desired behavior. For this reason gcc issues a special warning if either -W or -Wall is specified.
Superfluous gcc warning:
warning: left-hand operand of comma expression has no effect
I found no way to selectively switch this off. Instead the complete output of gcc is filtered with perl.
Source: src/one_step_closer/gcc-filter.pl
| #!/usr/bin/perl -w
my $queue = '';
while (<>)
{
  $queue .= $_;
  next if (m/^In file included from /
  || m/^\s+from /
  || m/^\S+: In function /);
  if (m/: left-hand operand of comma expression has no effect$/)
  { $queue = ''; next; }
  print $queue; $queue = '';
} | 
This script is used to compile all infection examples. See Off we go[ABC].
Command: src/one_step_closer/cc.sh
| #!/bin/sh
project=${1:-one_step_closer}
entry_addr=${2:-e1}
infection=${3:-i1}
main=${4}
${TEVWH_PATH_CC} ${TEVWH_CFLAGS} \
	-I ./src/one_step_closer/${entry_addr} \
	-I ${TEVWH_OUT}/one_step_closer/${infection} \
	-o ${TEVWH_TMP}/${project}/${entry_addr}${infection}/infector \
	${main} 2>&1 \
| ./src/one_step_closer/gcc-filter.pl \
| ${TEVWH_PATH_FMT} -s | 
This is the central data structure used by all examples based on this framework. Not all struct members are used by all programs, though. And obviously scanners have no use for a byte array called infection.
Source: src/one_step_closer/target.h
| /* align up to multiple of 16, will take at most 15 bytes */
#define ALIGN_UP(n)	(((n) + 15) & ~15)
#define SELF		((TEVWH_ELF_EHDR*)TEVWH_ELF_BASE)
#define SELF_PHDR	((TEVWH_ELF_PHDR*)((char*)SELF + SELF->e_phoff))
#define SELF_SHDR	((TEVWH_ELF_PHDR*)((char*)SELF + SELF->e_shoff))
typedef enum { false, true } bool;
/* the chunk of code to insert */
extern const unsigned char infection[];
typedef struct
{
  char src_file[PATH_MAX]; /* PATH_MAX is from limits.h */
  const char* clean_src; /* pointer into src_file */
  int fd_src; /* opened read-only */
  int fd_dst; /* opened write-only */
  off_t filesize;
  off_t aligned_filesize;
  /* start of memory-mapped image, b means byte */
  union { void* v; unsigned char* b; TEVWH_ELF_EHDR* ehdr; } image;
  /* pointer to first program header (in image) */
  TEVWH_ELF_PHDR* phdr;
  /* pointer to first program header of type LOAD (in image) */
  /* must not be null, and phdr_code[1] is the data segment */
  TEVWH_ELF_PHDR* phdr_code;
  /* pointer to program header of type DYNAMIC (in image, can be null) */
  TEVWH_ELF_PHDR* phdr_dynamic;
  /* pointer to program header of type NOTE (in image, can be null) */
  TEVWH_ELF_PHDR* phdr_note;
  
  /* offset to first byte after code segment (in file) */
  TEVWH_ELF_OFF end_of_cs;
  TEVWH_ELF_OFF aligned_end_of_cs;
  /* start of host code (in file) */
  TEVWH_ELF_OFF original_entry;
} Target; | 
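The arithmetic behind ALIGN_UP is easy to get wrong by one, so here is a small sketch. The helper padding_for is hypothetical and not part of the framework; it only makes the "at most 15 bytes" from the comment visible.

```c
/* same definition as in target.h */
#define ALIGN_UP(n)	(((n) + 15) & ~15)

/* hypothetical helper: filler bytes up to the next multiple of 16 */
long padding_for(long size)
{
  return ALIGN_UP(size) - size;
}
```

A size of 17 is rounded up to 32 and needs 15 filler bytes. A size of 32 is already aligned and needs none.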
None of these programs are ready for end-users. They are intended as a base for experiments and have a tendency to accumulate weird bugs. If everything is well, terse output is efficient. And when things get rough, exact details, including location in the source, are more important than nice presentation.
Macro CHECK evaluates an expression built from a relational operator and two operands. If this condition is false then a diagnostic message is written. The first argument of CHECK is prefixed with TRACE_ to build the name of the function that actually writes the message. Expected effective values are either print_errno or (void), just as defined by the TRACE_ macros in trace_infector.h.
The message itself consists of four lines, all prefixed with "CHECK:". First comes the name of the target file. This relies on having a variable t point to a structure of type Target. This is followed by the source file and the line number of the CHECK statement that failed. The third line is the source code of the condition. And the fourth line shows the actual values of the left and right operands, formatted both as %ld and %#lx. The operands are evaluated only once and then assigned to local variables of type long inside a separate block.
In the body of a macro the character # works as the "stringification" operator on macro arguments; it surrounds its operand with double quotes. Unfortunately # does not accept brackets around the operand. To maintain the precaution that arguments in macros are evaluated only inside brackets we need an intermediate macro, QUOTE_EXP.
The predefined macro __LINE__ evaluates to the current line number. Since # works only on macro arguments and not arbitrary definitions we need an intermediate macro again. Unfortunately a simple QUOTE_EXP(__LINE__) evaluates to "__LINE__". The intended result requires another level of mediation through QUOTE_NUM. It is not possible to stringify the value of symbols declared through enum or const int.
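The difference between the two levels shows in a few lines. ANSWER and LIMIT are hypothetical names chosen for this sketch; the two QUOTE_ macros are the ones described above.

```c
#define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)

#define ANSWER 42	/* a macro, visible to the pre-processor */
enum { LIMIT = 7 };	/* an enum constant, invisible to the pre-processor */

const char direct[]   = QUOTE_EXP(ANSWER);	/* "ANSWER" */
const char indirect[] = QUOTE_NUM(ANSWER);	/* "42" */
const char constant[] = QUOTE_NUM(LIMIT);	/* "LIMIT", not "7" */
```

The argument of QUOTE_NUM is expanded before it is handed on to QUOTE_EXP, which is why only the indirect form yields the number.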
Source: src/one_step_closer/check.h
| #define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)
#define CHECK_BEGIN(trace, left, op, right, errno, type) \
  { type _left = (left); type _right = (right); \
    if (!(_left op _right)) { \
      TRACE_##trace(errno, \
      "CHECK: %s\n" \
      "CHECK: " __FILE__ "#" QUOTE_NUM(__LINE__) "\n" \
      "CHECK: (" QUOTE_EXP(left) ") " QUOTE_EXP(op) \
        " (" QUOTE_EXP(right) ")\n" \
      "CHECK: %ld " QUOTE_EXP(op) " %ld; %#lx " QUOTE_EXP(op) " %#lx\n", \
      t->clean_src, _left, _right, _left, _right);
#define CHECK_END	} }
#define CHECK(trace, left, op, right) \
  CHECK_BEGIN(trace, left, op, right, -1, long) return false; CHECK_END
#define CHECK_ERRNO(left, op, right) \
  CHECK_BEGIN(ERROR, left, op, right, errno, long) return false; CHECK_END
/*
 * wrappers for write(2) and lseek(2) that handle errors
 * used only in infectors and only on fd_dst
 */ 
#define CHECK_WRITE(buf, count) \
  CHECK_BEGIN(INFECT, count, ==, write(t->fd_dst, buf, count), \
    errno, long) return false; CHECK_END
#define CHECK_LSEEK(offset, whence) \
  CHECK_BEGIN(INFECT, -1, !=, lseek(t->fd_dst, offset, whence), \
    errno, long) return false; CHECK_END | 
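To see the technique at work without the rest of the framework, here is a cut-down sketch. SKETCH_CHECK and big_enough are hypothetical names. The sketch always writes to stderr instead of selecting a TRACE_ function, leaves out the target file name, and uses bool from stdbool.h in place of the enum in target.h.

```c
#include <stdbool.h>
#include <stdio.h>

#define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)

/* simplified stand-in for CHECK: fixed output to stderr, no file name */
#define SKETCH_CHECK(left, op, right) \
  { long _left = (left); long _right = (right); \
    if (!(_left op _right)) { \
      fprintf(stderr, \
      "CHECK: " __FILE__ "#" QUOTE_NUM(__LINE__) "\n" \
      "CHECK: (" QUOTE_EXP(left) ") " QUOTE_EXP(op) \
        " (" QUOTE_EXP(right) ")\n" \
      "CHECK: %ld " QUOTE_EXP(op) " %ld\n", \
      _left, _right); \
      return false; } }

/* hypothetical caller: a failed check reports itself and returns false */
bool big_enough(long size, long minimum)
{
  SKETCH_CHECK(size, >, minimum);
  return true;
}
```

Because the macro contains a return statement it can only be used inside functions that return bool, which is the convention all target_ functions in this framework follow.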
Target file names are specified through stdin, not as command line arguments. Scanning crowded places like /usr/bin would hit the limit of xargs.
The differences between infectors and scanners show in target_action #1 and print_summary #1, but not in main. On success each implementation of target_action will output a single line per target. The file name used in that case is a bit simplified. If the value of environment variable TEVWH_TMP is the prefix of a target file name then this prefix is cut off from success reports. Anyway, this is relevant only to scanners since infectors don't read files in TEVWH_TMP.
Array stat is an ugly hack to gather arbitrary statistics without having to define a different interface for every program.
Source: src/one_step_closer/main.inc
| int main()
{
  Target t;
  unsigned stat[5] = { 0 }; /* ugly hack to gather statistics */
  const char* tmp = getenv("TEVWH_TMP");
  size_t len_tmp = (tmp == 0) ? 0 : strlen(tmp);
  while(0 != fgets(t.src_file, sizeof(t.src_file), stdin))
  {
    size_t len = strlen(t.src_file);
    if (len <= 0)
      continue; /* ignore empty lines */
    /* check that all lines fit in buffer and are terminated with '\n' */
    if (t.src_file[len - 1] != '\n')
      return -1;
    t.src_file[len - 1] = 0; /* we don't need the '\n', though */
    /* cut off the TEVWH_TMP prefix and the '/' that follows it */
    t.clean_src = (len_tmp > 0 && 0 == strncmp(tmp, t.src_file, len_tmp))
    ? t.src_file + len_tmp + 1
    : t.src_file;
    stat[0]++; /* total number of files */
    if (target_open_src(&t) &&
        target_is_elf(&t) &&
        target_get_seg(&t) &&
        target_action(&t, stat))
    { stat[1]++; /* number of successful files */ }
    target_close(&t);
  }
  return print_summary(stat);
} | 
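The prefix handling in main can be isolated into a small function. This is only a sketch: clean_name is a hypothetical helper that does not exist in the sources. Unlike main it also checks that a '/' really follows the prefix before skipping it.

```c
#include <string.h>

/* hypothetical helper: skip prefix and one '/' if path starts with both */
const char* clean_name(const char* path, const char* prefix)
{
  size_t len = (prefix == 0) ? 0 : strlen(prefix);
  if (len > 0 && 0 == strncmp(prefix, path, len) && path[len] == '/')
    return path + len + 1; /* points into path, nothing is copied */
  return path;
}
```

With a prefix of /tmp/out the name /tmp/out/bin/ls is reported as bin/ls, while /usr/bin/ls is left alone. The result points into the original buffer, just like clean_src points into src_file.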
Modifying a file in place, as opposed to writing a copy, is possible but difficult. And between the first and the final modification the contents of the target are invalid. Imagine a worst-case scenario of a virus infecting /bin/sh being interrupted by a power failure (or an emergency shutdown by a hectic admin).
There are a few approaches to change a file while copying.
Use lseek(2), read(2) and write(2) to load pieces of the source into memory, patch them, and write them to destination. A lot of work. Can be really inefficient.
Use read(2) to get the whole source file in one go. This requires more memory, but even the largest executable files are only a few MB in size.
Use mmap(2). In my humble opinion obviously the best way.
Source: src/one_step_closer/open_src.inc
| bool target_open_src(Target* t)
{
  TRACE_DEBUG(-1, "target_open_src %s\n", t->src_file);
  /* target_close() needs a few clean values, initialize them first  */
  t->fd_dst = -1;
  t->image.v = 0;
  CHECK_ERRNO(0, <=, t->fd_src = open(t->src_file, O_RDONLY));
  CHECK_ERRNO((off_t)-1, !=, t->filesize = lseek(t->fd_src, 0, SEEK_END));
  CHECK(ERROR, t->filesize, >, sizeof(TEVWH_ELF_EHDR));
  t->aligned_filesize = ALIGN_UP(t->filesize);
  t->image.v = mmap(0, t->filesize, PROT_READ | PROT_WRITE, MAP_PRIVATE,
    t->fd_src, 0);
  CHECK_BEGIN(SCAN, t->image.v, !=, MAP_FAILED, errno, void*)
    return false;
  CHECK_END
  return true;
} |
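Why PROT_WRITE works on a file opened read-only deserves a small sketch. With MAP_PRIVATE, writes go to a private copy-on-write copy in memory and never reach the file, so mmap(2) needs only read access to the descriptor. The helper map_private is hypothetical and does far less error reporting than target_open_src.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical helper: map a whole file copy-on-write */
/* returns MAP_FAILED on any error, stores the file size in *size */
void* map_private(const char* path, off_t* size)
{
  void* image = MAP_FAILED;
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return MAP_FAILED;
  *size = lseek(fd, 0, SEEK_END);
  if (*size > 0)
    image = mmap(0, (size_t)*size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
      fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  return image;
}
```

Patching bytes in the returned image changes only the copy in memory. Getting the result onto disk still requires writing the image to a destination file, which is what fd_dst in Target is for.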