4. Dual use technology


Go not to the elves for counsel, for they will say both yes and no.

 J.R.R. Tolkien

This chapter introduces a framework for both scanners and first-stage infectors, written in plain C. The code is split into many parts that manipulate a central data structure. The idea is to replace some of these parts in later chapters to implement improvements and different infection methods. This chapter contains only the most basic parts shared by all programs. The chapter One step closer to the edge continues with generic but ELF-specific parts. Actual infection methods accompany their descriptions in separate chapters.

The code itself is not insertable; Doing it in C describes what would be necessary to make it so. Until then, the infectors insert and activate a single chunk of code: an array of bytes prepared as described in Dressing up binary code.

Parts are shown in no particular order and without any #include statements or prototypes. Identifiers prefixed with TEVWH are defined in the generated file config.h; see Variables and packages. If you need full details, I can only recommend a look at the sources of this document. Mirrors shows where to get them.

This code really wants to be C++. Some hardware I'm working on is just too slow for big compilers, though. The heavy use of macros is a consequence of not using C++, not the reason.

4.1. print_errno

A simple replacement for perror(3). If the first argument is -1 instead of an errno(3) value, the function works just like printf(3).
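The body of print_errno is not shown here. A minimal sketch, assuming a hypothetical signature `int print_errno(int err, const char* format, ...)` (the int return value is an assumption made so the result can be checked) and assuming output goes to stderr:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: err is an errno value, or -1 for "no error".
   With -1 it behaves like printf (but writes to stderr); otherwise it
   appends ": <strerror text>\n" the way perror(3) does. */
static int print_errno(int err, const char* format, ...)
{
  va_list ap;
  va_start(ap, format);
  int n = vfprintf(stderr, format, ap);   /* printf-style part */
  va_end(ap);
  if (err != -1)
    n += fprintf(stderr, ": %s\n", strerror(err));  /* perror-style suffix */
  return n;
}
```

A call like `print_errno(errno, "open %s", name)` then reads like perror(3) output with a formatted prefix.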

4.2. Conditional output

printf(3) is a great debugging tool. I use it much like assert(3): a single pre-processor definition deactivates all calls, or rather groups of them. The conventional way to remove function calls completely through the pre-processor is to define a parameterized macro that expands to nothing. But printf takes a variable number of arguments. The C99 standard provides a mechanism for this, variadic macros with __VA_ARGS__. The gcc extension for the same purpose is much older but different (pronounce: "incompatible").

I chose obscure syntax instead.
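The obscure syntax relies on replacing the macro name alone with a cast to void. A sketch of the idea follows; TRACE_DEBUG_ON is a hypothetical switch, and printf stands in for print_errno so the example stays self-contained (the chapter's own TRACE_ macros also take an errno value as first argument):

```c
#include <stdio.h>

/* Hypothetical switch: set to 1 to turn the debug trace on. */
#define TRACE_DEBUG_ON 0

#if TRACE_DEBUG_ON
#  define TRACE_DEBUG printf
#else
   /* TRACE_DEBUG("step %d\n", x) becomes (void)("step %d\n", x):
      a parenthesized comma expression cast to void. No function is
      called, but every argument is still evaluated. */
#  define TRACE_DEBUG (void)
#endif

static int counter = 0;

static void step(void)
{
  TRACE_DEBUG("step %d\n", ++counter);  /* silent, but ++counter still runs */
}
```

Unlike an empty variadic macro, the disabled form keeps side effects of the arguments. With -Wall, gcc reports the discarded string literal as a comma operand without effect, which is the warning section 4.5 deals with.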

4.3. trace_infector.h

4.4. trace_scanner.h

4.5. gcc-filter.pl

Using the comma operator to discard an expression is rarely intended. For this reason gcc issues a special warning if either -W or -Wall is specified.

Superfluous gcc warning:

warning: left-hand operand of comma expression has no effect

I found no way to selectively switch this off. Instead the complete output of gcc is filtered with perl.

Source: src/one_step_closer/gcc-filter.pl
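#!/usr/bin/perl -w
use strict;

my $queue = '';
while (<>)
{
  $queue .= $_;

  # context lines: hold them back until we know what follows
  next if (m/^In file included from /
  || m/^\s+from /
  || m/^\S+: In function /);

  # drop the superfluous warning together with its queued context;
  # no trailing $ since newer gcc versions append " [-Wunused-value]"
  if (m/: left-hand operand of comma expression has no effect/)
  { $queue = ''; next; }

  print $queue; $queue = '';
}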

4.6. cc.sh

This script is used to compile all infection examples. See Off we go.

Command: src/one_step_closer/cc.sh

	-I ./src/one_step_closer/${entry_addr} \
	-I ${TEVWH_OUT}/one_step_closer/${infection} \
	-o ${TEVWH_TMP}/${project}/${entry_addr}${infection}/infector \
	${main} 2>&1 \
| ./src/one_step_closer/gcc-filter.pl \

4.7. target.h

This is the central data structure used by all examples based on this framework. Not all struct members are used by all programs, though, and scanners obviously have no use for a byte array called infection.
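The definition of Target is not reproduced here. The following is a hypothetical reconstruction covering only the members this chapter uses (src_file, clean_src, fd_src, fd_dst, filesize, aligned_filesize, image.v). The member types, the buffer size TARGET_NAME_MAX and the extra union member b are assumptions, and infector-specific members are omitted:

```c
#include <sys/types.h>

/* Hypothetical buffer size; the real value is not shown in this chapter. */
#define TARGET_NAME_MAX 4096

typedef struct Target
{
  char src_file[TARGET_NAME_MAX]; /* file name as read by fgets() in main */
  const char* clean_src;          /* src_file without the TEVWH_TMP prefix */
  int fd_src;                     /* opened read-only by target_open_src */
  int fd_dst;                     /* output file, used only by infectors */
  off_t filesize;                 /* result of lseek(fd_src, 0, SEEK_END) */
  off_t aligned_filesize;         /* ALIGN_UP(filesize); type assumed */
  union
  {
    void* v;                      /* start of the MAP_PRIVATE mapping */
    unsigned char* b;             /* hypothetical: same address, as bytes */
  } image;
} Target;
```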

4.8. check.h

None of these programs are ready for end users. They are intended as a base for experiments and tend to accumulate weird bugs. When everything works, terse output is efficient. When things get rough, exact details, including the location in the source, matter more than nice presentation.

Macro CHECK evaluates an expression built from a relational operator and two operands. If this condition is false, a diagnostic message is written. The first argument of CHECK is prefixed with TRACE_ to build the name of the function that actually writes the message. Its expansion is expected to be either print_errno or (void), as defined by the TRACE_ macros in trace_infector.h.

The message itself consists of four lines, all prefixed with "CHECK:". The first line is the name of the target file; this relies on a variable t pointing to a structure of type Target. The second line gives the source file and line number of the failed CHECK statement. The third line is the source code of the condition. The fourth line shows the actual values of the left and right operands, formatted both as %ld and %#lx. Each operand is evaluated only once; the results are stored in local variables of type long inside a separate block.
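The evaluate-once technique can be shown in isolation. CHECK_SIMPLE below is a hypothetical, simplified variant: it prints directly with fprintf, omits the target-name line, and applies # to the arguments directly instead of going through QUOTE_EXP:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical simplified CHECK: each operand is evaluated exactly once
   and stored in a local long inside its own block. On failure a
   diagnostic is printed and the enclosing function returns false. */
#define CHECK_SIMPLE(left, op, right) \
  { long _left = (left); long _right = (right); \
    if (!(_left op _right)) { \
      fprintf(stderr, "CHECK: %s#%d\n" \
        "CHECK: (%s) %s (%s)\n" \
        "CHECK: %ld %s %ld; %#lx %s %#lx\n", \
        __FILE__, __LINE__, #left, #op, #right, \
        _left, #op, _right, \
        (unsigned long)_left, #op, (unsigned long)_right); \
      return false; } }

static long calls = 0;
static long next_value(void) { return ++calls; }  /* has a side effect */

static bool greater_than_one(void)
{
  CHECK_SIMPLE(next_value(), >, 1);
  return true;
}
```

Because _left is a copy, the macro can use the operand several times while next_value() runs only once per check.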

In the body of a macro, the character # works as the "stringification" operator on macro arguments: it surrounds its operand with double quotes. Unfortunately # does not accept parentheses around the operand. To keep the precaution that macro arguments are used only inside parentheses, we need an intermediate macro, QUOTE_EXP.

The predefined macro __LINE__ expands to the current line number. Since # works only on macro arguments and not on arbitrary definitions, we need an intermediate macro again. Unfortunately a simple QUOTE_EXP(__LINE__) evaluates to "__LINE__", because the operand of # is not macro-expanded. The intended result requires another level of indirection through QUOTE_NUM, which expands its argument before passing it on. It is not possible to stringify the value of symbols declared through enum or const int, since the pre-processor does not know them.
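The effect of the two quoting levels can be checked with string comparisons. ANSWER and ENUM_ANSWER are hypothetical names chosen for illustration:

```c
#include <string.h>

#define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)

#define ANSWER 42          /* hypothetical object-like macro */
enum { ENUM_ANSWER = 42 }; /* hypothetical enum constant */

/* QUOTE_EXP(ANSWER)      -> "ANSWER"       # sees the argument unexpanded
   QUOTE_NUM(ANSWER)      -> "42"           argument expanded first, then #
   QUOTE_NUM(ENUM_ANSWER) -> "ENUM_ANSWER"  enums are not pre-processor symbols
   QUOTE_EXP(__LINE__)    -> "__LINE__"     same problem as ANSWER */
static const char* const quoted[] = {
  QUOTE_EXP(ANSWER), QUOTE_NUM(ANSWER),
  QUOTE_NUM(ENUM_ANSWER), QUOTE_EXP(__LINE__)
};
```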

Source: src/one_step_closer/check.h
#define QUOTE_EXP(n)	#n
#define QUOTE_NUM(n)	QUOTE_EXP(n)

#define CHECK_BEGIN(trace, left, op, right, errno, type) \
  { type _left = (left); type _right = (right); \
    if (!(_left op _right)) { \
      TRACE_##trace(errno, \
      "CHECK: %s\n" \
      "CHECK: " __FILE__ "#" QUOTE_NUM(__LINE__) "\n" \
      "CHECK: (" QUOTE_EXP(left) ") " QUOTE_EXP(op) \
        " (" QUOTE_EXP(right) ")\n" \
      "CHECK: %ld " QUOTE_EXP(op) " %ld; %#lx " QUOTE_EXP(op) " %#lx\n", \
      t->clean_src, _left, _right, _left, _right);
#define CHECK_END	} }

#define CHECK(trace, left, op, right) \
  CHECK_BEGIN(trace, left, op, right, -1, long) return false; CHECK_END

#define CHECK_ERRNO(left, op, right) \
  CHECK_BEGIN(ERROR, left, op, right, errno, long) return false; CHECK_END

/*
 * wrappers for write(2) and lseek(2) that handle errors
 * used only in infectors and only on fd_dst
 */
#define CHECK_WRITE(buf, count) \
  CHECK_BEGIN(INFECT, count, ==, write(t->fd_dst, buf, count), \
    errno, long) return false; CHECK_END
#define CHECK_LSEEK(offset, whence) \
  CHECK_BEGIN(INFECT, -1, !=, lseek(t->fd_dst, offset, whence), \
    errno, long) return false; CHECK_END

4.9. main

Target file names are read from stdin, not taken from command line arguments. Scanning crowded directories like /usr/bin would otherwise exceed the command line length limit, the problem xargs exists to work around.

The differences between infectors and scanners show in target_action #1 and print_summary #1, but not in main. On success each implementation of target_action will output a single line per target. The file name used in that case is a bit simplified. If the value of environment variable TEVWH_TMP is the prefix of a target file name then this prefix is cut off from success reports. Anyway, this is relevant only to scanners since infectors don't read files in TEVWH_TMP.

Array stat is an ugly hack to gather arbitrary statistics without having to define a different interface for every program.

Source: src/one_step_closer/main.inc
int main(void)
{
  Target t;
  unsigned stat[5] = { 0 }; /* ugly hack to gather statistics */
  const char* tmp = getenv("TEVWH_TMP");
  size_t len_tmp = (tmp == 0) ? 0 : strlen(tmp);

  while (0 != fgets(t.src_file, sizeof(t.src_file), stdin))
  {
    size_t len = strlen(t.src_file);

    /* check that all lines fit in buffer and are terminated with '\n' */
    if (len == 0 || t.src_file[len - 1] != '\n')
      return -1;
    t.src_file[--len] = 0; /* we don't need the '\n', though */
    if (len == 0)
      continue; /* ignore empty lines */

    /* cut off the TEVWH_TMP prefix (and the following '/') */
    t.clean_src = (len_tmp > 0 && len > len_tmp
      && 0 == memcmp(tmp, t.src_file, len_tmp))
    ? t.src_file + len_tmp + 1
    : t.src_file;

    stat[0]++; /* total number of files */
    if (target_open_src(&t) &&
        target_is_elf(&t) &&
        target_get_seg(&t) &&
        target_action(&t, stat))
    { stat[1]++; /* number of successful files */ }
  }
  return print_summary(stat);
}

4.10. target_open_src

Modifying a file in place, as opposed to writing a copy, is possible but difficult. Between the first and the final modification, the contents of the target are invalid. Imagine the worst case: a virus infecting /bin/sh is interrupted by a power failure (or by the emergency shutdown of a hectic admin).

There are a few approaches to changing a file while copying it.

Passing MAP_PRIVATE in the flags argument of mmap(2) activates copy-on-write semantics. You can read and write as if you had chosen the read-in-one-go method, but the implementation is more efficient: unmodified pages are loaded directly from the file, and in low-memory conditions they can be discarded without being saved to swap space.
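A small experiment shows the copy-on-write behavior. The sketch assumes a POSIX system with a writable /tmp; private_mapping_is_copy_on_write is a hypothetical helper. Like target_open_src, it maps a file opened with O_RDONLY using PROT_READ | PROT_WRITE and MAP_PRIVATE, modifies the mapping, and then reads the file back with pread(2):

```c
#define _XOPEN_SOURCE 700 /* for mkstemp() and pread() */
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns true if a write through a MAP_PRIVATE mapping is visible in
   memory while the file itself stays unchanged. */
static bool private_mapping_is_copy_on_write(void)
{
  static const char original[] = "ELF";
  char path[] = "/tmp/cow-demo-XXXXXX"; /* assumes a writable /tmp */
  bool ok = false;

  int fd_tmp = mkstemp(path);
  if (fd_tmp < 0)
    return false;
  int fd_src = -1;
  if (write(fd_tmp, original, sizeof original) == (ssize_t)sizeof original)
    fd_src = open(path, O_RDONLY);      /* same mode as target_open_src */
  unlink(path);                         /* data lives on while fds are open */

  if (fd_src >= 0)
  {
    char* image = mmap(0, sizeof original, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE, fd_src, 0);
    if (image != MAP_FAILED)
    {
      char on_disk[sizeof original];
      image[0] = 'X';                   /* page is copied, file untouched */
      ok = image[0] == 'X'
        && pread(fd_src, on_disk, sizeof on_disk, 0) == (ssize_t)sizeof on_disk
        && memcmp(on_disk, original, sizeof original) == 0;
      munmap(image, sizeof original);
    }
    close(fd_src);
  }
  close(fd_tmp);
  return ok;
}
```

A writable MAP_SHARED mapping, by contrast, would require a descriptor opened for writing and would change the file itself.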

Source: src/one_step_closer/open_src.inc
bool target_open_src(Target* t)
{
  TRACE_DEBUG(-1, "target_open_src %s\n", t->src_file);

  /* target_close() needs a few clean values, initialize them first */
  t->fd_dst = -1;
  t->image.v = 0;

  CHECK_ERRNO(0, <=, t->fd_src = open(t->src_file, O_RDONLY));
  CHECK_ERRNO((off_t)-1, !=, t->filesize = lseek(t->fd_src, 0, SEEK_END));

  CHECK(ERROR, t->filesize, >, sizeof(TEVWH_ELF_EHDR));
  t->aligned_filesize = ALIGN_UP(t->filesize);

  t->image.v = mmap(0, t->filesize, PROT_READ | PROT_WRITE, MAP_PRIVATE,
    t->fd_src, 0);
  CHECK_BEGIN(SCAN, t->image.v, !=, MAP_FAILED, errno, void*)
    return false;
  CHECK_END

  return true;
}

4.11. target_close