One possible way of achieving it is to use the TLS directory entry of the PE file format's headers. TLS stands for Thread Local Storage and it's meant to be used to allocate storage for thread-specific data. The TLS structure, IMAGE_TLS_DIRECTORY, pointed to by the TLS directory entry has a small number of fields. The one of special interest is the one pointing to a list of callbacks, AddressOfCallBacks.
During the class I got asked how one would implement the functionality required to get code to run, called from the TLS callbacks. I had a rough idea of how the implementation would go but had never tried implementing it myself. So before the training was over I started to look into it and finally, a few days later, got it to work.
So, in order to put this together the first thing I did was to dig up the definition of the IMAGE_TLS_DIRECTORY structure.
    typedef struct _IMAGE_TLS_DIRECTORY {
        UINT32 StartAddressOfRawData;
        UINT32 EndAddressOfRawData;
        PUINT32 AddressOfIndex;
        PIMAGE_TLS_CALLBACK *AddressOfCallBacks;
        UINT32 SizeOfZeroFill;
        UINT32 Characteristics;
    } IMAGE_TLS_DIRECTORY, *PIMAGE_TLS_DIRECTORY;
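When staring at these fields in a hex editor it helps to decode them mechanically. The following is a minimal sketch, assuming a 32-bit PE file where each of the six fields occupies 4 little-endian bytes; the names `TlsDirectory32`, `read_le32` and `parse_tls_directory32` are made up for this illustration and are not part of any Windows header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit TLS directory as it sits in the file: six 4-byte fields.
   The address fields hold virtual addresses (image base included). */
typedef struct {
    uint32_t start_address_of_raw_data;
    uint32_t end_address_of_raw_data;
    uint32_t address_of_index;
    uint32_t address_of_callbacks;
    uint32_t size_of_zero_fill;
    uint32_t characteristics;
} TlsDirectory32;

#define TLS_DIRECTORY32_SIZE 24u

/* Read one little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 24 bytes at buf into out.
   Returns 0 on success, -1 if the buffer is too short. */
int parse_tls_directory32(const uint8_t *buf, size_t len, TlsDirectory32 *out)
{
    assert(buf != NULL && out != NULL);
    if (len < TLS_DIRECTORY32_SIZE)
        return -1;
    out->start_address_of_raw_data = read_le32(buf + 0);
    out->end_address_of_raw_data   = read_le32(buf + 4);
    out->address_of_index          = read_le32(buf + 8);
    out->address_of_callbacks      = read_le32(buf + 12);
    out->size_of_zero_fill         = read_le32(buf + 16);
    out->characteristics           = read_le32(buf + 20);
    return 0;
}
```

Feeding it the 24 bytes copied out of the hex editor gives back the value of `AddressOfCallBacks`, which is the field that matters for the rest of this post.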
Then I started hacking it together with a hexeditor, editing a harmless test PE file in order to have a TLS directory entry that would point to my manually hex-crafted structure.
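For anyone repeating the exercise, the spot to patch can be computed rather than counted by hand. This is a rough sketch, assuming a 32-bit (PE32) file: the TLS entry is data directory number 9, and each directory entry is 8 bytes (an RVA followed by a size). The function name `tls_directory_entry_offset` and the helpers are hypothetical, and PE32+ files are deliberately not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_SIGNATURE        0x00004550u  /* "PE\0\0" */
#define PE32_MAGIC          0x10Bu       /* optional header magic for PE32 */
#define DIRECTORY_ENTRY_TLS 9u           /* IMAGE_DIRECTORY_ENTRY_TLS */

uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the 8-byte TLS data directory entry
   (RVA, then Size) in a PE32 file, or 0 if the headers look wrong. */
size_t tls_directory_entry_offset(const uint8_t *file, size_t len)
{
    assert(file != NULL);
    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    size_t pe = rd32(file + 0x3C);                 /* e_lfanew */
    /* signature (4) + file header (20) + optional header up to the
       directories (96) + the first ten directory entries (10 * 8) */
    if (pe > len || len - pe < 4 + 20 + 96 + 10 * 8)
        return 0;
    if (rd32(file + pe) != PE_SIGNATURE)
        return 0;
    size_t opt = pe + 4 + 20;                      /* optional header start */
    if (rd16(file + opt) != PE32_MAGIC)
        return 0;
    if (rd32(file + opt + 92) <= DIRECTORY_ENTRY_TLS)  /* NumberOfRvaAndSizes */
        return 0;
    return opt + 96 + DIRECTORY_ENTRY_TLS * 8;
}
```

The RVA written at the returned offset is what has to point at the hand-crafted structure.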

I then added some placeholder code
| 90 90 90 C2 0C 00 | nop nop nop retn 0x0c |
which would handle the stack as a TLS callback is expected to
| typedef void (MODENTRY *PIMAGE_TLS_CALLBACK) ( PTR DllHandle, UINT32 Reason, PTR Reserved ); |
and pointed to it from the callback list I created. I then pointed the callback field, AddressOfCallBacks, in the TLS structure to my callback list.
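The pieces described above can be sketched as one small blob: the TLS directory, a callback list with a single entry followed by a zero terminator, and the 6-byte stub. The `retn 0x0c` pops 12 bytes because the callback receives three 4-byte arguments. Everything about the layout here is an assumption for illustration: the offsets, the `region_va` parameter (the virtual address where the blob ends up once the image is mapped), and the extra 4-byte slot for `AddressOfIndex`, which I am assuming the loader wants to be able to write to. The raw data start and end fields are simply left at zero, since the post does not say which values were used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the hand-made region; offsets are relative
   to the start of the region. */
enum {
    OFF_DIRECTORY = 0x00,  /* IMAGE_TLS_DIRECTORY, 24 bytes         */
    OFF_INDEX     = 0x18,  /* 4-byte slot for AddressOfIndex        */
    OFF_CALLBACKS = 0x1C,  /* callback list: one entry + terminator */
    OFF_STUB      = 0x24,  /* placeholder callback code, 6 bytes    */
    BLOB_SIZE     = 0x2A
};

/* Store a 32-bit value in little-endian byte order. */
void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill blob (BLOB_SIZE bytes) for a region mapped at region_va. */
void build_tls_blob(uint8_t *blob, uint32_t region_va)
{
    /* nop; nop; nop; retn 0x0c */
    static const uint8_t stub[6] = { 0x90, 0x90, 0x90, 0xC2, 0x0C, 0x00 };

    assert(blob != NULL);
    memset(blob, 0, BLOB_SIZE);

    /* Directory fields hold virtual addresses, not RVAs. */
    put_le32(blob + OFF_DIRECTORY + 8,  region_va + OFF_INDEX);     /* AddressOfIndex     */
    put_le32(blob + OFF_DIRECTORY + 12, region_va + OFF_CALLBACKS); /* AddressOfCallBacks */

    /* First list entry points at the stub; the second stays 0 and
       terminates the list. */
    put_le32(blob + OFF_CALLBACKS, region_va + OFF_STUB);

    memcpy(blob + OFF_STUB, stub, sizeof stub);
}
```

Calling `build_tls_blob(buf, 0x00403000)` would produce the bytes for a region mapped at that (made-up) address, ready to be pasted in with the hex editor.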

And everything should be fine according to the plans... but nope! This is where I was stuck for a few days. The best I could do was to get my TLS callback code to run on program unload, but never before the entry point was given control. Puzzling...
I dug out a file with a working TLS. Ilfak wrote a while ago about TLS here and had a nice, small example. (And, by the way, IDA does read and handle TLS just fine and marks them as entry points as well. Very convenient!)
I was set on getting mine working, so I started looking at what his was doing differently from mine (besides him just being sane and not doing it with a hex editor ;-) ).
I took a look at all the PE headers but none of the differences seemed to have anything to do with my sample not working.
I started to grow slightly uncomfortable and decided to bring in the artillery. Taking a look at how the Windows loader (residing in NTDLL.DLL) handles both files, and seeing what keeps my TLS callback from being called, should help. So I brought up BinNavi and traced the execution path of both binaries being loaded by Windows.
The first thing was to trace the execution of Ilfak's example. I wanted to see all the functions visited in the Windows loader as his executable was being loaded. The TLS callbacks would have to be called by one of these.

I then recorded the execution path of my test executable and took a look at what functions were being visited in both traces. (All the nodes in the following graph are visited by the working example, the green ones are the ones visited by mine, so there's a lot of superfluous code I can skip looking at)

Eventually I spotted a function called when processing both binaries, _LdrpRunInitializeRoutines, that looked like a good candidate to be the one calling the TLS callbacks, and I took a look at the execution traces within that specific function.
In the following graph each node represents a basic block, the red one is where the TLS callbacks are called from. That's the node reached in the working example but not in mine. The green nodes are all the ones visited in the case the execution flow reaches the red basic block. The darker ones are the execution trace of my test. Hence I need to figure out which conditions are diverting the execution flow and how they are related to things I could change in my test program.

Now I could see the common parts of the execution path and a couple of branches that were taken differently. Given the visual output, it's extremely easy to see what branches were different and I could now check what affected the flow.
The TLS callbacks were run immediately after the following condition

Which, tracing it back, comes from an initial check at the beginning of the function

The following article helped me when I was trying to figure out what was going on. According to it, the function _LdrpClearLoadInProgress returns the number of DLLs currently loaded. That's the value that gets assigned to the variable that is compared to zero, and that comparison is what makes the flow of my test program diverge from Ilfak's working example. Therefore TLS callbacks only get run on load when that DLL count is non-zero, and that was the reason my test didn't run on load... I only needed to add one more DLL to the import table for it to work. Fortunately it was easy to spot with BinNavi.
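To make the observed behaviour concrete, here is a toy model of it in portable C. It is emphatically not the code in NTDLL.DLL: it only mirrors what the traces showed, namely that the zero-terminated callback list is walked on load only when the DLL count is non-zero, while my callback still ran on unload. The names `model_run_tls_callbacks`, `sample_callback` and `last_reason` are invented for this sketch; the two reason values match the Windows DLL_PROCESS_DETACH and DLL_PROCESS_ATTACH constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REASON_PROCESS_DETACH 0u  /* DLL_PROCESS_DETACH */
#define REASON_PROCESS_ATTACH 1u  /* DLL_PROCESS_ATTACH */

/* Same three arguments as PIMAGE_TLS_CALLBACK, minus the calling
   convention, which plain portable C cannot express. */
typedef void (*TlsCallback)(void *module, uint32_t reason, void *reserved);

/* Remembers the reason passed to the most recent callback call. */
uint32_t last_reason = 0xFFFFFFFFu;

void sample_callback(void *module, uint32_t reason, void *reserved)
{
    (void)module;
    (void)reserved;
    last_reason = reason;
}

/* Toy model of the behaviour seen in the traces: on load, the list is
   skipped when no DLLs were counted. Returns how many callbacks ran. */
size_t model_run_tls_callbacks(const TlsCallback *list, void *module,
                               uint32_t reason, unsigned loaded_dll_count)
{
    size_t called = 0;

    assert(list != NULL);
    if (reason == REASON_PROCESS_ATTACH && loaded_dll_count == 0)
        return 0;

    /* The list ends at the first NULL entry. */
    for (; *list != NULL; ++list) {
        (*list)(module, reason, NULL);
        ++called;
    }
    return called;
}
```

With a list of `{ sample_callback, NULL }`, an attach with a count of zero calls nothing, while the same attach with a count of one calls the callback once, which is the difference one extra import made in my test file.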
Thank you to cailin for the proofreading.


8 comments:
FYI: virii use TLS' entry points too.
"comparing execution paths..." respect.
Could you elaborate a bit more on that please?
According to Symantec the virus W32.Shrug was the first known to use TLS.
I don't know if you have read this but here's an interesting article discussing the Shrug aka Chifton Virus LINK
http://www.amazon.com/Computer-Virus-Research-Defense-Symantec/dp/0321304543/ref=sr_1_1/002-1556081-5220863?ie=UTF8&s=books&qid=1179727152&sr=8-1
using TLS as entry point. just a quick mention - not sure if it is worth to buy the book just to read about it :)
Also, FYI, the Windows loader (which IMHO is as brain-dead as it can be and tries to load anything that even looks like a PE without performing validation, probably in the name of "backwards compatibility") executes TLS callbacks even if the size in the Data Directory is specified as zero, but IDA doesn't show them in this case (of course you can always patch it with a hex editor ;-)).
Thanks for the info. I didn't know that the windows loader was so forgiving in that case.
Thanks for sharing this information. I was searching for the explanation of this behavior for 2 days. I thought that my program had the bug.