One day I wondered whether DNS lookup times could be used as a side channel from browser JS. In theory this could reveal whether a domain is in the local cache, and so give us some insight into a user’s browsing or even network history. That would be a privacy violation, and I was eager to try it. Off the bat I thought of something simple: open a websocket to the domain and time how long it takes.
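A minimal sketch of that first idea, not necessarily the exact snippet I ran. The function name `timeWebSocketAttempt` and the injectable `WebSocketImpl` and `now` parameters are made up for illustration; they only exist so the timing logic can be exercised outside a browser.

```javascript
// Sketch only: measure how long a websocket attempt to `domain` takes to settle.
// WebSocketImpl and now are hypothetical injection points, defaulting to the
// browser's WebSocket constructor and performance.now().
function timeWebSocketAttempt(
  domain,
  WebSocketImpl = globalThis.WebSocket,
  now = () => performance.now()
) {
  return new Promise((resolve) => {
    const start = now();
    let settled = false;

    // Resolve once, with whichever outcome arrives first.
    const finish = (outcome) => {
      if (settled) return;
      settled = true;
      resolve({ domain, outcome, elapsedMs: now() - start });
    };

    let socket;
    try {
      socket = new WebSocketImpl(`ws://${domain}/`);
    } catch (err) {
      // Some browsers reject a bad URL synchronously instead of firing an event.
      finish("threw");
      return;
    }

    socket.onopen = () => {
      finish("open");
      socket.close();
    };
    socket.onerror = () => finish("error");
    socket.onclose = () => finish("close");
  });
}
```

Calling `timeWebSocketAttempt("example.com").then(console.log)` twice in a row and comparing the two `elapsedMs` values is the whole experiment.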
While I’m using a websocket for the DNS test, the site doesn’t need to support websockets. In fact, I’m banking on it not supporting them, so I can simply measure the time it takes to fail. I expect the first websocket attempt to take a few milliseconds (DNS resolve + websocket failure), but the next time, since the domain should be in cache, will it be faster?
No. Understandably, there is a lot of overhead dirtying the signal: a websocket being constructed, an attempted request to the site through a bunch of unpredictable routers, a failed websocket response, etc.
So let’s clean up this signal by getting it to fail fast. I found we can fast-fail the websocket by supplying a bogus port, and it worked out beautifully, because Chrome does a DNS lookup of the domain before it actually checks the port for validity, and only then fails.
No Websocket connection attempt, but DNS request for domain was sent.
This means we have basically turned Chrome into a pure DNS resolver with little to no overhead for our side channel signal. So now that I’m getting clean DNS lookups, surely we should see a difference between cached and non-cached DNS lookups? Yes and no.
Going from the initial “not-in-cache” timing to the “in-cache” timing looked promising, but then it took an illogical turn. I expected all subsequent attempts to be fast. So what’s going on here? I admit I quit, but then I came back to it, because it didn’t make sense. I checked a number of things behind the scenes to confirm that it really didn’t make sense, which I will not go over here, but upon returning to the project I looked more closely at the open source Chromium code (which is what Chrome is built on). I found that it consults a “back off” algorithm on each websocket lookup in WebsocketDispatchHost.cc
Perhaps this exists to prevent side channels like this one, or as an anti-DDoS measure? What it does is return how many milliseconds the websocket should sleep before finishing the websocket function, which throws a major wrench into this side channel, since it punishes us with a seemingly arbitrary delay. Actually, it’s not arbitrary; it’s something we can control.
If we keep the ratio of failed websocket connections to successful websocket connections at 1 for the page session, with no pending connections, then the backoff algorithm won’t hurt us, because the “std::min” will take our 1 as the “punishment weight”.
We are still subject to the “RandInt” function, but if we control our end, the RandInt function won’t hurt us. Below I calculated the worst time drift we can have.
not bad at all…
Now all I do is interleave my failed websocket requests with successful websocket requests (to a websocket test server: websocket.org/etc).
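The interleaving can be sketched roughly as follows. This is an illustration under assumptions, not the exact code: `probeFn` stands for whatever function performs the fast-failing lookup and `succeedFn` for whatever opens and closes a connection to a working websocket server; both names are hypothetical. Awaiting each step before starting the next also keeps the number of pending connections at zero.

```javascript
// Sketch: follow every failing probe with one successful connection so that
// failures never outnumber successes and nothing is left pending.
// probeFn(domain) and succeedFn() are hypothetical caller-supplied async functions.
async function probeInterleaved(domains, probeFn, succeedFn) {
  const results = [];
  for (const domain of domains) {
    // Expected to fail fast; its timing is the measurement we care about.
    results.push(await probeFn(domain));
    // Expected to connect and close cleanly, balancing the failure above.
    await succeedFn();
  }
  return results;
}
```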
Clean signal with interleaved sockets:
*cleaned OS and Chrome’s DNS cache*
Run function with domain not in cache – 41ms time
We see the domains in cache now (right: chrome://net-internals) – 4ms
Pretty darn cool. It looks like we can reasonably assume a visitor has visited a site (or at least a domain) if the failure time is under 10ms, which seemed to be a safe cutoff in my repeated tests and research. The point of defeating the backoff algorithm is not just to understand what was going on during my testing; it also allows multiple domain lookups per session without being punished.
10 ms is an arbitrary cutoff for determining whether a domain is cached. How can that be universal if some machines are faster or slower?
If this were to be used, you should add a true negative and a true positive for the cache test. This can be done by first measuring the domain the visitor is currently on (guaranteed to be cached); let’s say it’s 45ms (true positive). Then fetch a domain there is no way they have in cache, like XMks8732asdm.com; let’s say it’s 180ms (true negative). Then incorporate that data into your decision, using the standard deviation for each case. This should add some certainty to the mix.
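One way this calibration could look, as a rough sketch rather than a tested detector. It assumes several timing samples have been collected for each baseline, and it simply picks whichever baseline the new measurement is closer to in standard deviations. The function names and the 1 ms floor on the deviation are my own assumptions.

```javascript
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Population standard deviation of the samples.
function stdDev(xs) {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)));
}

// cachedSamples: timings for the current domain (true positive baseline).
// uncachedSamples: timings for random, never-visited domains (true negative baseline).
function classifyLookup(elapsedMs, cachedSamples, uncachedSamples) {
  const floorMs = 1; // assumed floor so near-identical samples don't divide by ~0
  const distance = (samples) =>
    Math.abs(elapsedMs - mean(samples)) / Math.max(stdDev(samples), floorMs);
  return distance(cachedSamples) <= distance(uncachedSamples)
    ? "cached"
    : "not-cached";
}
```

With baselines around 45ms and 180ms, a measurement of 50ms lands far closer to the cached baseline, while 160ms lands closer to the uncached one, regardless of how fast or slow the machine is overall.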
One obvious shortcoming of this technique, which you may have already wondered about: aren’t we putting the domain in cache when we attempt a lookup?
Yes, we are, and in that sense it’s kind of a single-shot musket type of weapon. However, false positives can be prevented by fingerprinting the browser and only attempting another lookup on that user once you feel the TTL has expired (whether that is hours or days).
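A small sketch of that bookkeeping, assuming some fingerprint string is already available and that the TTL is a guess you supply yourself. `shouldProbe`, the `lastProbed` map and `assumedTtlMs` are all hypothetical names; where the map is stored (server side, most likely) is left open.

```javascript
// lastProbed: Map from "fingerprint|domain" to the timestamp (ms) of our last probe.
// Returns true, and records the probe time, only if we have never probed this
// fingerprint/domain pair or the assumed TTL has passed since we did.
function shouldProbe(lastProbed, fingerprint, domain, assumedTtlMs, nowMs = Date.now()) {
  const key = `${fingerprint}|${domain}`;
  const last = lastProbed.get(key);
  if (last !== undefined && nowMs - last < assumedTtlMs) {
    // Our own earlier lookup may still be in the cache, so a fast result
    // would tell us nothing about the user.
    return false;
  }
  lastProbed.set(key, nowMs);
  return true;
}
```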