Showing posts with label mathematics. Show all posts
Showing posts with label mathematics. Show all posts

Sunday, September 09, 2007

Reverse engineering a compiler-produced artifact

In our training, Pedram and I deal with some very simple compiler optimizations or artifacts. Although they represent the same semantics that the programmer defined in the original source code, those optimizations are sometimes cumbersome to convert back to a meaningful high-level representation.

The other day I was just studying a piece of code and bumped into a relatively common pattern. The code I was looking at was supposed to represent a division of a function's argument by a constant. But in the disassembled code I was studying I could only see a multiplication. This can be slightly confusing unless one has seen a bit more assembly than what is healthy and remembers some of the compiler-produced fun that goes on...

A couple of things to remember:
  • Compilers love to work with multiples of 2. The processor can can just shift registers left and right (shifting is incredibly fast, that is moving the contents of a register left or right padding with o or 1 as appropriate). Shifting to the left for multiplication by 2 and towards the right for division by 2 (this is akin to having a number in base 10 and multiplying by 10 by adding zeros to the right and dividing by by removing the rightmost digit).
  • Compilers hate to use the division instruction. The division takes a lot of steps, or cycles, for the CPU to complete. Hence they will avoid to use it at all cost.

The code looked like this:
(irrelevant interleaved code left out)

mov ecx, [esp+4+arg_4]
mov eax, 66666667h
imul ecx
sar edx, 3


  • In the snippet we can see function argument being multiplied by 0x66666667, and the result being stored as a 64 bit value in EDX:EAX (topmost 32 bits in EDX, the lower 32 in EAX)
  • Then the top 32 bits are shifted ("arithmetically") to the right. That is, divided by 2 thrice, same as 2^3 = 8. Effectively dividing the value by 8.
  • But the division is applied only to the top 32 bits, ignoring the lower 32. That could be understood to also mean that, by taking the topmost 32 bits and ignoring the bottom ones, the result of the multiplication is implicitly being divided by 2^32. (That's only guessed by the subsequent usage of the value just obtained, there's never again a reference to the lower 32bits, so I assume that they are discarded)


What do we have so far?

[ (Value * 0x66666667) / 2^32 ] / 2^3 ]

But, what's that 0x66666667? why to multiply by something so large and then divide?
The reason is that such computation allows the processor to keep most of the precision of the division it is trying to perform, still obtaining an integer in the end but without having to resort to using floating point arithmetic (which is far slower)

Let's do an example in base 10. Imagine that you only can multiply and divide by 10 (shifting numbers left and right) and we want to divide a number by 30. By shifting we can only divide by 10, 100, 1000, etc

But we have that: Value/30 = value * 1/3 * 1/10

Given that, represented as an integer, 1/3 would produce 0 we can "scale" it by multiplying by a large constant that later, once we are done, we divide by to get the value we're after. Given that the easiest for us is to multiply/divide by 10, we can "scale" 1/3 and make it 100000/3 which approximately equals 33333, which is a nice integer value. We would want to make this value as large as it fits in our registers in order to be as precise as possible. The bigger it is the more precision it will retain for subsequent operations.

Value/30 = ( Value*33333 ) / 1000000

Hence, we now have a clue now of where that 0x66666667 value might be coming from. Given that the processor works in base 2. We can assume that it's going to prefer multiples of 2. Also, given that it will try to obtain the largest value that fits in a 32bit register, that gives us an idea of the range of the power-of-two in use. We can get there with a bit of trial and error (We want to obtain an integer as a result of dividing a power of two by 0x66666667).

2.0^33/0x66666667 = 4.9999999982537702 ~= 5

Therefore:

0x66666667 ~= 2^33/5

So, in the end we get to

( [ (Value * 2^33)/5] /2^32 ) / 2^3

And with some algebra it simplifies to:

Value / (5*2^2) = Value/20

Effectively dividing the value by 20, without actually using the division instruction. That's to the extent that compilers will go to avoid using the division instruction...

Reverse engineering is fun isn't it?

Update: Given that this is a relatively old and well known optimization strategy it's only natural that it had been discussed before. It was just brought to my attention that Ilfak had blogged about a similar optimization and this chapter (PDF) from Hacker's Delight provides more details.

Friday, July 06, 2007

Scanning data for entropy anomalies II

Recently Phantal (aka Brian) left some comments on my blog and in OpenRCE on some calculations he did following up on my post Scanning data for entropy anomalies.

He develops the algorithm aiming at improving the execution speed of the entropy "scanner" example I had shown. I ran through his steps and arrived to the same conclusions he did on his latest comment. I just thought it'd be worth showing his work as a separate post rather than just a comment.

His idea is, by looking into the standard definition of entropy , to isolate all that doesn't change in the expression when the window slides and just update the entropy, instead of blindly recalculating it's value from scratch for each offset of the scan window.

Shannon' s entropy, usually represented as H, takes the following form if we work with the 256 possible byte values as the symbols :



where p(b) is the probability of the occurrence of a given byte.

H, the entropy, will tend towards its maximum value, 8, if the data has the maximum possible entropy. In such case the probability of each byte occurring would be the same which produces

Note that, although this is usually thought of as measuring the "amount of randomness", it is not that much the case. A sequence of bytes starting at 0 and increasing until 255 going through all the values in order would reach the maximum entropy value 8, even that it is all but random.

The probability of a given byte appearing in our window can also be expressed as . being the number of times the byte appears within the window and the width of the window.

The expression for the entropy can be expanded as follows


The entropy after sliding the window, , will have the same sum expansion except for two terms, the ones of the bytes going out and entering the window. We can then just update those and recalculate the expression by first removing the old values for the incoming and outgoing bytes and then adding the new values for both, after updating their count.



and that's all. Now on to some implementations in Mathematica and Python (but creating a Mathematica function with Pythonika). His implementation in C can be found in the comments of the previous post.



EntropyScan = Function[{Data, WindowScanSize},

  SummationTerm [Prob_] := If[Prob > 0, Prob Log[2, Prob], 0];

  (* Get the initial chunk and calculate the entropy *)
  CurrentChunk = Data[[ Range[1, WindowScanSize] ]];

  (* Calculate initial byte count and probabilities *)
  ByteCounts = Table[Count[CurrentChunk, i - 1], {i, 1, 256}];
  ByteProbs = Table[ByteCounts[[i]]/WindowScanSize, {i, 1, 256}];

  FilteredByteProbs = Select[ByteProbs, # > 0 &];
  H = - Total[
    Table[FilteredByteProbs[[i]] Log[2, FilteredByteProbs[[i]]],
    {i, 1, Length[FilteredByteProbs]}]];
  Entropies = {H};

  (* Slide the window and recalculate for incoming and outgoing bytes *)
  For[offset = 1, offset + WindowScanSize <= Length[Data], offset++,

    (* Get incoming and outgoing bytes *)
    ByteOut = Data[[offset]] + 1;
    ByteIn = Data[[offset + WindowScanSize]] + 1;

    (* Get the old probabilities *)
    OldValByteOut = SummationTerm[ByteProbs[[ByteOut]]];
    OldValByteIn = SummationTerm[ByteProbs[[ByteIn]]];

    (* Update counters and values *)
    ByteCounts[[ByteOut]]--;
    ByteCounts[[ByteIn]]++;
    ByteProbs[[ByteOut]] = ByteCounts[[ByteOut]]/WindowScanSize;
    ByteProbs[[ByteIn]] = ByteCounts[[ByteIn]]/WindowScanSize;

    (* Get the new probabilities *)
    ValByteOut = SummationTerm[ByteProbs[[ByteOut]]];
    ValByteIn = SummationTerm[ByteProbs[[ByteIn]]];

    (* Update the entropy *)
    H = H + OldValByteOut + OldValByteIn - ValByteIn - ValByteOut;

    Entropies = Append[Entropies, H];
  ];

  Entropies
];





Py["import math"]

EntropyScanPython = PyFunction["\<
def entropy_scan(args):
  data = args[0]
  window_size = float(args[1])

  summation_term = lambda p: p*math.log(p,2) if p>0 else 0

  current_chunk = data[:int(window_size)]
  byte_counts = [
    len(filter(lambda a:a==i, current_chunk))
    for i in range(256)]
  byte_probs = [byte_counts[i]/window_size for i in range(256)]

  H = -sum(
    [byte_probs[i]*math.log(byte_probs[i], 2)
      for i in range(256) if byte_probs[i]>0])
  entropies = [H]

  for offset in range(len(data)-window_size):
    byte_out, byte_in = data[offset], data[int(offset+window_size)]

    old_val_byte_out = summation_term(byte_probs[byte_out])
    old_val_byte_in = summation_term(byte_probs[byte_in])

    byte_counts[byte_out] -= 1;
    byte_counts[byte_in] += 1;
    byte_probs[byte_out] = byte_counts[byte_out]/window_size;
    byte_probs[byte_in] = byte_counts[byte_in]/window_size;

    val_byte_out = summation_term(byte_probs[byte_out])
    val_byte_in = summation_term(byte_probs[byte_in])

    H = H + old_val_byte_out + old_val_byte_in - val_byte_out - val_byte_in
    entropies.append(H)
  return entropies
\>"];