Hiding Your Process from Sysinternals

 

Malicious.exe is running but does not show up in Procexp.exe

ProcExp_POC.gif

I was researching ways not just to do anti-analysis, but to keep an executable running under analysis while evading it. I found ways to become invisible to a few common analysis tools, but the final boss was the Sysinternals Suite, with no Administrator rights or SeDebugPrivilege allowed for this challenge. I figured I'd share what I found here, which also ended up being a privilege escalation.

ProcExp’s “HiddenProc” Easter Egg

I started with Procexp. Sifting through it in IDA, I thought I had instantly found a promising section of code and could call it a day.

hideprocs

This routine looks for a MULTI_SZ registry value named “HiddenProcs”. If it exists, it parses the list of process names (one per line in Regedit, stored as null-separated strings) and, I assume, would later filter them out of the view. Unfortunately, the routine that would actually hide these processes doesn't exist (probably compiled out of the release build?). This is a dead registry value. Moving on.
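For reference, a MULTI_SZ value is a run of null-terminated strings ending in an extra null. Since the hiding routine doesn't exist, here is a minimal, hypothetical sketch of the lookup such a filter would need. It uses narrow chars so it compiles anywhere (the real value read with RegQueryValueExW would be UTF-16), and multi_sz_contains is my name, not ProcExp's.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive ASCII comparison; Windows process names are case-insensitive. */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 1 if `name` appears in the MULTI_SZ buffer `buf` of `len` bytes
   (e.g. "a.exe\0b.exe\0\0"), 0 otherwise. Stops at the empty string that
   ends the list, or at the end of the buffer if a terminator is missing,
   so a malformed value is never over-read. */
int multi_sz_contains(const char *buf, size_t len, const char *name)
{
    size_t pos = 0;
    while (pos < len && buf[pos] != '\0') {
        const char *entry = buf + pos;
        const char *end = memchr(entry, '\0', len - pos);
        if (end == NULL)
            return 0;                   /* unterminated entry: stop here */
        if (ascii_ieq(entry, name))
            return 1;
        pos = (size_t)(end - buf) + 1;  /* skip past this entry's terminator */
    }
    return 0;
}
```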

ProcExp Image Hijack 1 (failed)

If we can hijack Procexp, we can make it display whatever we want to the user. When you run Procexp32.exe (or other 32-bit Sysinternals tools) on a 64-bit OS, it will often drop a 64-bit version of itself to disk and run that instead. Is there a way to image hijack it after the drop?

redgreen

This is the drop-and-execute code from ProcExp32. The routine writes out a 64-bit version of itself and executes it when running on a 64-bit OS, going from the top green node (the Drop64bitProcExp function) to the bottom green node (CreateProcessW). There is a gap between them. If we can keep ProcExp32.exe in the red nodes as long as possible while repeatedly trying to write to the dropped exe, can we swap in our own image before CreateProcess is called?

Below I try this with a simple POC that sets my thread to THREAD_PRIORITY_TIME_CRITICAL while repeatedly writing my own code to the dropped exe. The goal is to have my thread's quantum land between the two green nodes.

timepri.png

With this program running, when a user opens ProcExp we get the error below. It looks like the hijacked image was only partially written. This is not deterministic enough, and too sci-fi.

ProcError

 

ProcExp Image Hijack 2 (success)

Looking at the file drop code, we see that ProcExp doesn't bail out if _wfopen_s on “ProcExp64.exe” with mode “wb” fails.

wbWRite.png

This is its flaw: it ignores the fopen error as long as GetFileAttributes succeeds.

fopenFail

So the image hijack is very simple. We write our own “ProcExp64.exe” to the temp directory and mark it read-only. That way, opening “ProcExp64.exe” with “wb” fails, but GetFileAttributes still succeeds, and execution flows right to CreateProcess.

 

Here my process drops a fake Process Explorer into %TEMP% as “PROCEXP64.exe” and marks it read-only (you can actually try this at home). The fake is a simple program that just prints “Hijacked” to the console.

readOnly.png

Now when we try to run Procexp.exe, it hits this flaw and instead runs our fake “PROCEXP64.exe”.

Hijacked.png

 

This shows a hijack is possible, but I think we can do better, since it only works when 32-bit Sysinternals tools run on a 64-bit OS.

 

The DLL Hijack (final solution)

This is the final method I chose for the POC. Looking at the Sysinternals registry keys, we see a value called “DbghelpPath”. Any process running as the user can write it, since it lives in the user's registry hive (HKEY_CURRENT_USER).

DbgHelp.png

DbghelpPath tells the tool where to find the dbghelp.dll it loads at runtime, and the tool trusts that location. Since I can modify this value, a DLL hijack is just a step away. My binary points the value at the %TEMP% directory and drops its own DbgHelp.dll there (%TEMP%\DbgHelp.dll). When Procexp runs, it loads the DLL from that path. Once it loads my DLL, I hook ProcExp's routines to hide my process. You can see the main code for this in:

https://github.com/RISCYBusiness/Jadoube/blob/master/AntiSysInternals/AntiSysInternals/Procexp.cpp

This involved reversing ProcExp's logic. After doing so, I found the best hook target: the OpenProcess API.

We patch the OpenProcess API because ProcExp calls it for every process it displays. Hooking here is better than hooking a hardcoded offset, which would be fragile across ProcExp versions, since code gets added or removed and offsets shift. There is also a convenient side effect: when ProcExp calls OpenProcess, a leftover value in the r14 register happens to point to ProcExp's PROCESS linked list, and that structure is less likely to change between versions.

Here we patch:

Patch.png

In the hook, we hide our process by removing it from ProcExp's linked list. This took a bit of reversing of ProcExp's process list structure, but once that was done, we had a general-purpose process filter. The POC below filters out processes named “Malicious.exe”.

ProcHook.png
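The unlinking idea can be sketched in C. The real layout of ProcExp's PROCESS list was reversed by hand and isn't reproduced here, so the ProcNode struct below is a hypothetical stand-in (a doubly linked list with a name pointer), and unlink_by_name is an illustrative name. In the actual hook, the list pointer would come from r14 inside the OpenProcess detour.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical node layout; NOT ProcExp's real PROCESS structure. */
typedef struct ProcNode {
    struct ProcNode *next;
    struct ProcNode *prev;
    unsigned long pid;
    const char *image_name;
} ProcNode;

static int ascii_ieq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Unlinks every node whose image name matches `hidden` (case-insensitive)
   and returns the possibly new head. Nodes are unlinked, not freed, since a
   hook shouldn't release memory owned by the host process. */
ProcNode *unlink_by_name(ProcNode *head, const char *hidden)
{
    ProcNode *cur = head;
    while (cur != NULL) {
        ProcNode *next = cur->next;      /* save before we clear cur's links */
        if (ascii_ieq(cur->image_name, hidden)) {
            if (cur->prev != NULL)
                cur->prev->next = cur->next;
            else
                head = cur->next;        /* removing the head node */
            if (cur->next != NULL)
                cur->next->prev = cur->prev;
            cur->next = cur->prev = NULL;
        }
        cur = next;
    }
    return head;
}
```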

End Result

ProcExp_POC.gif

One cool thing about this is that nearly all Sysinternals tools have this user-writable DbghelpPath value, so in theory you can do this with most tools in the suite.

 

 


Side channeling user DNS cache in Chrome

One day I wondered whether one could side-channel DNS lookup times from browser JS. In theory, this could reveal whether a domain is in the local cache and give some insight into a user's browsing, or even network, history. It would be a privacy violation I was eager to try. Off the bat I thought of something simple, like this:

goodPort

While I'm using a WebSocket for the DNS test, the site doesn't need to support WebSockets. In fact, I'm banking on it not supporting them, so I can simply measure how long the attempt takes to fail. I expect the first attempt to take a few milliseconds (DNS resolution plus WebSocket failure), but will the next one be faster, since the domain should now be cached?

failfail

No. Understandably, a lot of overhead muddies the signal: constructing the WebSocket, the request travelling to the site through a bunch of unpredictable routers, the failed WebSocket response, and so on.

So let's clean up the signal by making it fail fast. I found we can make the WebSocket fail fast by supplying a bogus port, and it worked out beautifully: Chrome does the DNS lookup for the domain before it validates the port, and only then fails.

fastfail.png

dnsLooked

No WebSocket connection attempt, but the DNS request for the domain was sent.

This means we have basically turned Chrome into a pure DNS resolver with little or no overhead on our side-channel signal. Now that I'm getting clean DNS lookups, surely we should see a difference between cached and uncached lookups? Yes and no.

wtfSocket.png

Going from the initial “not-in-cache” timing to the “in-cache” timing looked promising, but then things took an illogical turn. I expected all subsequent attempts to be fast. So what's going on here? I admit I quit, but came back to it because it didn't make sense. I did some behind-the-scenes checks to confirm just how little sense it made, which I won't go over here, but on returning to the project I dug further into the Chromium source (the open-source code Chrome is built on). I found that Chromium consults a “back-off” algorithm on each WebSocket connection attempt, in WebSocketDispatcherHost (websocket_dispatcher_host.cc).

backoff

Perhaps it exists to prevent side channels like this one, or as an anti-DDoS measure? It returns how many milliseconds the WebSocket should wait before completing, which throws a major wrench into this side channel, since it punishes us with a seemingly arbitrary delay. Actually, it isn't arbitrary: it's something we can control.

If we keep the ratio of failed to successful WebSocket connections at 1 for the page session, with no pending connections, the back-off algorithm won't hurt us, because the value std::min picks as the “punishment weight” stays small instead of growing toward its cap.

ratio1.png

We are still subject to the RandInt call, but as long as we control our side of the formula, RandInt won't hurt us. Below I calculated the worst-case time drift we can have.

worstcase

not bad at all…
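To see why interleaving works, here is my reading of the delay formula in WebSocketDispatcherHost::CalculateDelay from the Chromium source of that era, ported to C. Treat the exact formula as an assumption to verify against the Chromium revision you're targeting. The random term RandInt(1000, 5000) is passed in as a parameter so the function is deterministic.

```c
#include <stdint.h>

/* Model of Chromium's WebSocket throttle delay (assumed formula):
     delay_ms = RandInt(1000, 5000) * (1 << min(p + f / (s + 1), 16)) / 65536
   f = failed connections, s = successful connections, p = pending connections.
   `rand_1000_5000` stands in for base::RandInt(1000, 5000). */
int64_t websocket_delay_ms(int64_t rand_1000_5000, int64_t f, int64_t s, int64_t p)
{
    int64_t exponent = p + f / (s + 1);   /* integer division, as in the C++ */
    if (exponent > 16)
        exponent = 16;                    /* std::min(..., INT64_C(16)) */
    return rand_1000_5000 * ((int64_t)1 << exponent) / 65536;
}
```

With strict interleaving, f never exceeds s, so f / (s + 1) is 0 and the delay rounds down to 0 ms even in the worst RandInt case. With no successful connections, the exponent grows by one per failure, so about 16 back-to-back failures push the delay to a full 1 to 5 seconds.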

Now all I do is interleave my failed WebSocket requests with successful ones (to a WebSocket test server such as websocket.org).

interleave

Clean signal with interleaved sockets:

POC1

*Cleared the OS and Chrome DNS caches*

POC2

Running the function with the domain not in cache: 41 ms

POC3

The domain is now in cache (right, chrome://net-internals): 4 ms

Pretty darn cool. It looks like we can assume a visitor has visited a site (or at least the domain) if the failure time is under 10 ms; that seems to be a safe cutoff from my repeated tests and research. Defeating the back-off algorithm wasn't just about understanding what was happening during my tests: it also allows multiple domain lookups per session without being punished.

Questions:

10 ms is an arbitrary cutoff for deciding whether a domain is cached. How can it be universal if some machines are faster or slower?

If you were to use this, you should calibrate with a true positive and a true negative for the cache test. First measure the domain the visitor is currently on (guaranteed to be cached); let's say it takes 45 ms (true positive). Then look up a domain they can't possibly have cached, like XMks8732asdm.com; let's say it takes 180 ms (true negative). Then fold that data into your decision using the standard deviation for each case. This should add some certainty to the mix.
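One way to do that calibration, sketched in C: build a mean and variance from the true-positive and true-negative samples, then assign a new measurement to whichever model it is fewer standard deviations from. The squared z-score comparison is my choice of decision rule, not something from the original tests, and timing_model and looks_cached are illustrative names.

```c
#include <stddef.h>

typedef struct {
    double mean;  /* ms */
    double var;   /* population variance, ms^2 */
} TimingModel;

/* Mean and population variance of `n` timing samples in ms. Assumes n >= 1. */
TimingModel timing_model(const double *ms, size_t n)
{
    TimingModel m = {0.0, 0.0};
    for (size_t i = 0; i < n; i++)
        m.mean += ms[i];
    m.mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = ms[i] - m.mean;
        m.var += d * d;
    }
    m.var /= (double)n;
    return m;
}

/* Squared z-score of t under m. The tiny variance floor avoids dividing by
   zero when every calibration sample was identical. */
static double z_squared(double t, TimingModel m)
{
    double d = t - m.mean;
    double var = m.var > 1e-6 ? m.var : 1e-6;
    return d * d / var;
}

/* 1 if t is closer, in standard deviations, to the cached model. */
int looks_cached(double t, TimingModel cached, TimingModel uncached)
{
    return z_squared(t, cached) < z_squared(t, uncached);
}
```

For example, with illustrative cached samples of 44, 45 and 46 ms and uncached samples of 170, 180 and 190 ms, a 100 ms measurement is classified as uncached even though it is nearer 45 ms, because the cached samples are so tightly clustered.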

One obvious shortcoming of this technique you may already have wondered about: aren't we putting the domain in the cache when we attempt a lookup?

Yes, we are, so it's a single-shot, musket-type weapon. However, false positives can be prevented by fingerprinting the browser and only attempting another lookup on that user once you believe the record's TTL has expired (whether that's hours or days).

Violating Windows’ file naming rules

I'm going to share an interesting behavior of the CreateFileW function that I found this week. It's quite simple, but it results in an undeletable file with some other interesting traits.

The trick is to create a file whose name Windows does not accept, so operations on it fail once it exists. As we know, Windows does not allow certain characters in file names; the ones we all know are:

2016-05-01_12-52-25.png

So CreateFileW fails if you even try to create a file containing one of the characters above. However, Windows also follows naming rules that are less black and white. One of them is that a file name may not end with a space. You can see this when you create a file normally, for example through Explorer: Win32 path normalization strips the trailing space before the file is created. That normalization applies to ordinary paths, but it is skipped when CreateFileW is given a path prefixed with \\?\. Using the syntax below, Windows creates a Unicode file name with a trailing space. The \\?\ prefix sends the path straight to the file system with no preprocessing; it is typically used to get around the MAX_PATH limit of 260 characters.

2016-05-01_12-33-55
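The rule can be modeled with a small C function. This is a simplified assumption about Win32 path normalization, not the real ntdll routine: it only covers stripping trailing spaces and periods from a path that doesn't end in a separator, plus the \\?\ bypass, and it ignores relative segments such as “.” and “..”. normalize_trailing is an illustrative name.

```c
#include <string.h>

/* Simplified model of one Win32 normalization rule. Ordinary paths that
   don't end in a separator lose their trailing spaces and periods; paths
   starting with \\?\ are passed through untouched, which is how a name
   ending in a space reaches the file system. Edits `path` in place. */
char *normalize_trailing(char *path)
{
    size_t n;
    if (strncmp(path, "\\\\?\\", 4) == 0)
        return path;                    /* \\?\ prefix: no preprocessing */
    n = strlen(path);
    if (n > 0 && (path[n - 1] == '\\' || path[n - 1] == '/'))
        return path;                    /* ends in a separator: rule doesn't apply */
    while (n > 0 && (path[n - 1] == ' ' || path[n - 1] == '.'))
        path[--n] = '\0';               /* drop trailing spaces and periods */
    return path;
}
```

Under this model, a tool that goes through normal path handling and asks for “file ” ends up asking for “file”, which doesn't exist; that would plausibly explain why the delete and rename attempts below fail.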

When trying to delete the file:

2016-05-01_14-44-31.png

Trying to rename the file to remove the trailing space: no luck.

2016-05-01_13-43-37.png

Checking the properties of the file:

2016-05-01_13-15-41.png

2016-05-01_13-16-31

Even from a CMD prompt, naming the file directly, I was unable to delete it. I ended up having to use “del *” (which likely enumerates the directory and passes each exact name to the Unicode version of DeleteFile).
This whole thing led to some other very interesting behavior that I may share another time.

Windows vs Linux vs Android Vulnerabilities

This weekend I pulled the CVE data for the best-known operating systems and plotted the vulnerability distribution over the past ~20 years.

vuln.png

Make whatever sense of it you want using the interactive graph at http://www.riscy.business/OSVulns.html

 

  • “Linux” represents Linux kernel vulnerabilities, not the various Linux distributions or their applications.
  • “Windows” represents the OS itself, not its applications, which would add a much larger attack surface if included.

 

 

Buffer Overflow in CFF Explorer

First off, I love CFF Explorer; it's a solid Windows PE viewer I use extensively when researching malware. Naturally, when you use a program extensively you may discover a bug or two. I found a buffer overflow vulnerability in a MultiByteToWideChar call this app makes. Two days into my exploit development, I found it had already been patched (a while ago), which is another reason to always use the latest version of things: don't waste your exploit-dev time. Since this probably isn't going anywhere, I'm cashing in the work as a blog post to help people who want to learn exploit development, or just understand low-level Windows.

A while back, I noticed odd behavior when opening an exe dumped from memory with invalid import tables: the app crashed. I made a mental note to go back and investigate once I had more time (next month). Next month came, and being terribly disorganized, I no longer had the binary that caused it, so I figured this was also a good opportunity to write a fuzzer to reproduce the crash. I already had an idea of the cause, a strange import table, so I tailored the fuzzer to that, making it what's called a “smart fuzzer”.

Planning the fuzzer, I first needed a way to start CFF Explorer with the path of an exe to load as its argument. This was harder than I thought. Looking at the shell registry entries, I can see how it normally loads exes/DLLs when you right-click a file and open it:

reg

But no! It's not that simple. CFF Explorer uses 8.3 file names (https://en.wikipedia.org/wiki/8.3_filename) and my file had a space in its path; once I corrected that, I stopped banging my head against the wall. Now let's code the first few lines of the fuzzer:

createProc.png

I'm starting CFF Explorer with a command-line argument pointing to a binary in my user folder. Which binary doesn't matter right now, because we will later modify it to create new input.

To keep the fuzzer fast, we can't create and terminate a process every time we want to feed in new input, so we need to reverse engineer CFF Explorer to build some sort of binding for fuzzing input. Let's see how CFF Explorer loads binaries:

virtalloc.png

This is the code that runs after it opens the exe passed as an argument. It basically allocates space (creating a new memory segment) and reads the exe's bytes into it by passing the allocated buffer to ReadFile. The takeaway for the fuzzer binding is the new memory segment: it gives us something to look for in CFF Explorer's memory and then modify, since it is essentially the application's input.

I use VirtualQueryEx to enumerate CFF Explorer's memory segments and find where the exe is located. I can identify it as the segment that starts with “MZ” and is marked RW but not executable (i.e. a PE file sitting in a non-executable segment).
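The segment search can be sketched portably. Region is a hypothetical, trimmed-down stand-in for the MEMORY_BASIC_INFORMATION fields that VirtualQueryEx returns, plus the region's first two bytes (which the real code would fetch with ReadProcessMemory); the protection constants use the same values as winnt.h. A real loop would also skip regions whose State isn't MEM_COMMIT. Requiring RW also helps because the headers of normally loaded modules are typically mapped read-only.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef PAGE_READWRITE                 /* same values as winnt.h */
#define PAGE_READWRITE         0x04
#define PAGE_EXECUTE           0x10
#define PAGE_EXECUTE_READ      0x20
#define PAGE_EXECUTE_READWRITE 0x40
#define PAGE_EXECUTE_WRITECOPY 0x80
#endif

/* Hypothetical summary of one region from VirtualQueryEx. */
typedef struct {
    uintptr_t     base;
    size_t        size;
    uint32_t      protect;
    unsigned char first_bytes[2];
} Region;

static int is_executable(uint32_t protect)
{
    return (protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                       PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) != 0;
}

/* Index of the first RW, non-executable region starting with "MZ", or -1. */
long find_loaded_pe(const Region *regions, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        const Region *r = &regions[i];
        if ((r->protect & PAGE_READWRITE) && !is_executable(r->protect) &&
            r->first_bytes[0] == 'M' && r->first_bytes[1] == 'Z')
            return (long)i;
    }
    return -1;
}
```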

 

Next I need a window handle for CFF Explorer, so I use EnumWindows to find its main window. This is necessary because, after reverse engineering CFF Explorer, it turns out the “import analysis” code (which is what I suspect crashes) only runs when the user clicks a module name in the GUI. Understandable.

importclick.png

This means that if I change the loaded file while it's running, I need to trigger the “import analysis” code every time I change something. I originally looked at getting a “raw” binding to the “import analysis” function and calling it directly, but I avoided this for two reasons:

  1. I would have had to fully understand and hunt down all the object pointers and other fluff needed to call the function correctly (greatly increasing the potential for bugs introduced by me).
  2. Contrary to what developers typically strive for, fuzzing is often (not always) better automated at a higher level (in this case, through the Windows GUI). When you bypass too much fluff, you miss out on exploitable fluff.

So I just need a way to send window messages to this window that emulate “clicking” or “selecting” a module name, so the “import analysis” code runs.

 

I do this by enumerating windows and finding the “CFF Explorer.exe” instance that I started.

enumwindow

I then enumerate its child windows to get a window handle on the “Import Table” window (the view in the application that shows the module names you click on).

enumchild.png

 

We now have a handle to this child window, which lets us send it window messages and make it think a user is clicking or selecting module names. (Window messages are how the OS tells a window that input is happening to it; we will spoof them.)

win32.png

We are sending “down arrow” commands to the child window, making it select each module in the grid layout one by one (triggering the “import analysis” code of CFF Explorer for each module name selected).

Now we can locate the memory segment holding the exe that CFF Explorer loaded, and trigger “import analysis” quickly and automatically. Now we need to FUZZ.

We read the bytes of the loaded exe from CFF Explorer's memory segment using ReadProcessMemory, at the address we obtained from our VirtualQueryEx search. We read the entire segment, so we now have the exact file CFF Explorer is looking at. Since our area of interest is the import tables, I just randomize the import tables in the newly read buffer.

fuzz

I randomize each IMAGE_IMPORT_DESCRIPTOR in the buffer I read and then write the buffer back into the CFF Explorer memory that holds the exe it's analyzing.

This code runs in a loop with the window-message spoofing, so CFF Explorer is constantly analyzing the new input I write into its process memory.
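The randomization step might look like the following C sketch. ImportDescriptor mirrors the 20-byte IMAGE_IMPORT_DESCRIPTOR layout from winnt.h. Locating the import directory (parsing the PE headers and converting the directory's RVA to an offset in the buffer) is left out, and xorshift32 is my choice of a seeded RNG so a crashing input can be regenerated, not necessarily what the original fuzzer used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 20-byte IMAGE_IMPORT_DESCRIPTOR layout from winnt.h. */
typedef struct {
    uint32_t OriginalFirstThunk;  /* a union with Characteristics in winnt.h */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;                /* RVA of the imported DLL's name string */
    uint32_t FirstThunk;
} ImportDescriptor;

/* xorshift32: tiny seeded PRNG; the state must start non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Overwrites up to `count` descriptors starting at byte offset `off` of the
   file image in `buf` with random values, stopping at the end of the buffer.
   Returns how many were randomized. memcpy avoids unaligned writes; values
   are stored in host byte order (little-endian on x86/x64, as PE expects). */
size_t fuzz_import_descriptors(unsigned char *buf, size_t len, size_t off,
                               size_t count, uint32_t *seed)
{
    size_t done = 0;
    while (done < count && off <= len && len - off >= sizeof(ImportDescriptor)) {
        ImportDescriptor d;
        d.OriginalFirstThunk = xorshift32(seed);
        d.TimeDateStamp      = xorshift32(seed);
        d.ForwarderChain     = xorshift32(seed);
        d.Name               = xorshift32(seed);
        d.FirstThunk         = xorshift32(seed);
        memcpy(buf + off, &d, sizeof d);
        off += sizeof d;
        done++;
    }
    return done;
}
```

In the loop described above, the mutated buffer would then presumably be written back with WriteProcessMemory before the arrow-key messages are sent.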

That's it. Just run it and see what we catch.

An hour later, we got something: an app crash.

crash.png

Now we reverse engineer the bug to see what it is.

int3.png

An actual buffer overflow. We have the crashing file saved off by the fuzzer, so let's load it and see what went wrong. The culprit was the “Name” field of the IMAGE_IMPORT_DESCRIPTOR (an RVA pointing to the imported DLL's name). CFF Explorer runs MultiByteToWideChar on the string this field points to, but if that string is too long, the destination buffer can't hold it. The name was too long because my random bytes pointed the Name field at another part of the exe that wasn't null terminated soon enough, which causes a buffer overflow when processed. Let's cause it intentionally now.

AAAA.png

Load the file, crash, debug:

AAoverflow.png

This is looking good. All we need now is the classic “overwrite EIP” (on x64, the saved return address that gets loaded into RIP), which sits on the stack below (technically above) all my 41414141s. Unfortunately, we run into:

canary.png

Looking at this, I can tell this is a “stack canary” aka “stack cookie” aka “/GS”.

Brief explanation: when a program is compiled with /GS, the compiler instruments functions that have local buffers with an integrity check. In the prologue, a global random cookie (__security_cookie) is XORed with the stack pointer and the result is stored on the stack between the local buffers and the return address. Before the function returns, the stored value is XORed with the stack pointer again and compared with the global cookie (via __security_check_cookie); a mismatch terminates the process. So overwriting the stored value at [rsp+360] gets flagged. This is a problem for us, because to overwrite the return address on the stack we need to reach [rsp+370]. Here are some ideas for getting past canaries:

  1. Check the entropy of the stack canary. If the cookie XORed with rsp is predictable, we might be able to overwrite [rsp+360] with the correct canary. Let's log the entropy with some C code. First we need to find where the cookie is calculated: debugging the binary, I found the global variable that is initialized with the calculated cookie near program start. I just need to read that location each time I create and kill the CFF Explorer process to see whether the value has weak entropy across runs.

cookies

That's a good enough amount of entropy. While the first 2 bytes look predictable, the rest are not, although they do follow somewhat of a pattern (getting larger). Looking further into this is out of scope. Looking at how the cookie is calculated, it uses GetTickCount and similar values as inputs, which I can't predict at runtime.

 

  2. Corrupt things BEFORE [rsp+360]. A stack canary only catches corruption that runs through the cookie. It doesn't know if you corrupt things before it, and with 360 bytes of stack space, that's a lot of local variables I can modify and overwrite. Even though we are at the end of the function (meaning its locals are about to be discarded), corruption could still matter. I have seen functions that don't initialize their local variables and just use whatever is already on the stack. If we control those bytes, a following function might build its stack frame over our leftover buffer data and use it, resulting in unexpected behavior. It is also possible I could modify memory that gets passed back to the caller (either by reference or by value).

I can quickly check possibility #2 by making my buffer of ‘A’s end just short of [rsp+360]. That overwrites the stack but not the canary, so IF anything we overwrote gets used, it may crash again. Do it... run it... no crash 😦
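A toy C model of the /GS check described above (a simplification, not actual compiler output) shows both why a linear overflow gets caught and why idea #2's shorter write goes unnoticed: the check only sees bytes that actually reach the stored cookie. ToyFrame and the function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Toy frame laid out in the order an overflow hits it:
   local buffer, stored cookie, return address. */
typedef struct {
    char     buffer[16];
    uint64_t guard;           /* __security_cookie ^ rsp, written by the prologue */
    uint64_t return_address;  /* the real target of a classic overflow */
} ToyFrame;

/* Prologue: store the global cookie XORed with the stack pointer. */
void gs_prologue(ToyFrame *f, uint64_t cookie, uint64_t sp)
{
    f->guard = cookie ^ sp;
}

/* Epilogue: XOR the stored value with the stack pointer again and compare
   with the global cookie; a mismatch is where real code would terminate. */
int gs_epilogue_ok(const ToyFrame *f, uint64_t cookie, uint64_t sp)
{
    return (f->guard ^ sp) == cookie;
}

/* Models a linear overflow: writes `n` 'A' bytes from the start of the frame. */
void overflow_frame(ToyFrame *f, size_t n)
{
    if (n > sizeof *f)
        n = sizeof *f;
    memset(f, 'A', n);
}
```

Filling exactly sizeof buffer bytes leaves the check passing, while filling the whole frame overwrites the return address but fails the check before that address is ever used.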

I have also heard of alternatives such as overwriting the SEH record. If that works, then when the stack cookie violation occurs, the system looks for an exception handler, and if you managed to register your own before triggering the overflow, you can make execution go wherever your handler points (sounds more like a CTF challenge). Note that on x64, exception handlers are described by tables in the image rather than by records on the stack, so the classic SEH overwrite is mostly a 32-bit technique.

 

At face value, this may not be RCE, but you can put the flaw to use in other ways, such as hiding an executable's import table from analysis, or even DoSing CFF Explorer so it can't be used on your binary. It's also a good reminder to be cautious even when opening files you aren't executing on your host machine. You never know what's possible.
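For developers, the root cause boils down to trusting the Name RVA. I don't know CFF Explorer's exact code, so this is a hedged sketch of checks that would prevent it: find the terminator within the image bounds and confirm the name fits the destination before converting it (with MultiByteToWideChar on Windows, passing the real destination size). copy_import_name is a hypothetical helper that works on narrow chars so it compiles anywhere, and it assumes the RVA has already been converted to a file offset.

```c
#include <stddef.h>
#include <string.h>

/* Copies the import's DLL name found at `name_off` in the file image into
   `out` (capacity `out_cap` bytes, terminator included). Returns the name
   length, or -1 instead of overflowing when the offset is outside the
   image, no terminator exists before the end of the image, or the name
   doesn't fit the destination. */
int copy_import_name(const unsigned char *img, size_t img_len, size_t name_off,
                     char *out, size_t out_cap)
{
    const unsigned char *start;
    const unsigned char *nul;
    size_t n;

    if (name_off >= img_len || out_cap == 0)
        return -1;                          /* Name points outside the file */
    start = img + name_off;
    nul = memchr(start, 0, img_len - name_off);
    if (nul == NULL)
        return -1;                          /* not null terminated inside the image */
    n = (size_t)(nul - start);
    if (n + 1 > out_cap)
        return -1;                          /* would overflow the fixed buffer */
    memcpy(out, start, n + 1);              /* copy including the terminator */
    return (int)n;
}
```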

Can't decode a malicious Base64 string? Check for custom quirks!

No fluff: I found malware starting itself with this command-line argument:

aHR0cDovL5Rvd52sb5FkLmxldHYuY55FwaXhtbC9iYXJTcHJIYWQvYmFyU=ByZWFkXzMwMTAuaHRtbA33

I figured it was Base64, but no decoder would give me plain text. Analyzing the argument parser... oh, that's why:

trickyBase64.png

It performs character substitution involving the ASCII characters ‘5’, ‘2’, ‘=’, and ‘3’ before Base64-decoding. Apparently this ‘is a thing’. With that info, you should be able to decode the string. Till next time.
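If the substitution turns out to be a simple character swap, undoing it before a standard Base64 decode can be sketched in C as below. The swap table is a parameter: the malware's actual table is in the screenshot above and is not reproduced here, and swap_chars and base64_decode are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* Swaps each character found in a (hypothetical) pair table with its partner.
   A swap is its own inverse, so the same table obfuscates and de-obfuscates. */
void swap_chars(char *s, const char (*pairs)[2], size_t npairs)
{
    for (; *s; s++) {
        for (size_t i = 0; i < npairs; i++) {
            if (*s == pairs[i][0]) { *s = pairs[i][1]; break; }
            if (*s == pairs[i][1]) { *s = pairs[i][0]; break; }
        }
    }
}

/* Value of a standard (RFC 4648) Base64 character, or -1 if invalid. */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes padded Base64 into `out` (at least 3 * strlen(in) / 4 bytes).
   Returns the number of bytes written, or -1 on malformed input. */
long base64_decode(const char *in, unsigned char *out)
{
    size_t len = strlen(in);
    long n = 0;
    if (len % 4 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 4) {
        int v[4];
        int pad = 0;
        for (int k = 0; k < 4; k++) {
            char c = in[i + k];
            if (c == '=') {
                v[k] = 0;
                pad++;
            } else {
                if (pad) return -1;           /* data after padding */
                v[k] = b64_value(c);
                if (v[k] < 0) return -1;
            }
        }
        if (pad > 2 || (pad && i + 4 != len))
            return -1;                        /* padding only allowed at the end */
        unsigned triple = ((unsigned)v[0] << 18) | ((unsigned)v[1] << 12) |
                          ((unsigned)v[2] << 6) | (unsigned)v[3];
        out[n++] = (unsigned char)((triple >> 16) & 0xFF);
        if (pad < 2) out[n++] = (unsigned char)((triple >> 8) & 0xFF);
        if (pad < 1) out[n++] = (unsigned char)(triple & 0xFF);
    }
    return n;
}
```

For example, with a hypothetical table that swaps G with x and = with 3, the input “axVsbx83” becomes “aGVsbG8=”, which decodes to “hello”.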