Posted by Deliverator on 29th August 2008
Almost a month ago, I posted about a major new attack which Dan Kaminsky had found that would allow an attacker to poison a DNS server’s cache and thereby redirect traffic for an arbitrary domain. When Kaminsky first discovered the attack, a target server could be compromised in as little as 10 seconds. Patches to every major DNS server package have been released, but essentially all they do is increase the number of packets (and hence time) needed to compromise a server. A physicist named Evgeniy Polyakov in Moscow decided to find out just how much time the new patches bought. He connected two computers to a fully patched DNS server running the *nix standard BIND DNS server software over gigabit ethernet and let her rip. The answer, which serves as a sort of worst case scenario for a successful attack, was a mere 10 hours. While most attackers don’t exactly have a gigabit ethernet interface to an ISP’s DNS server, it is possible that an attacker could plant a trojan within an ISP where it would be able to attack the DNS server at LAN speeds. The combined bandwidth of a modern botnet is not to be underestimated either. In short, even a fully patched system is potentially vulnerable.
While Polyakov explored a worst case scenario against a properly patched BIND system, not all the patches for major DNS server software provide even 10 hours of protection. Notably, Microsoft’s patch doesn’t do nearly as good a job as others. One of the chief protections that the patch should provide is to randomize the UDP port used by the DNS server to request information from top level servers. From what I’ve read, Microsoft’s patched DNS server only makes requests from a range of 2500 some odd UDP ports, vs 64k+ possible ports for a patched BIND server. This makes it far easier for an attacker to compromise a Microsoft DNS server than a *nix one. Let’s take a step back and explain why, by doing a simplified rough outline of how the attack works.
For an attack to succeed, an attacker has to cause the DNS server to issue a request for information pertaining to whatever domain the attacker wishes to poison. The attacker induces the DNS server to make such a request by requesting information regarding a non-existent subdomain of the domain to be poisoned. The subdomain doesn’t exist, so the DNS server couldn’t possibly already have information stored in its cache about it. So, the attacker issues a request for somerandomfoo.google.com, for example. The attacker, knowing that the DNS server has just passed the request on, switches roles and pretends to be the responder and floods the DNS server with response packets. The forged response packets contain revised DNS servers for the domain that the attacker wishes to hijack. The packets would presumably set a very high TTL (time to live) on the bogus information, so that the ISP’s DNS server keeps it in its cache for a VERY long time. In my admittedly brief research, it seems like the maximum time as per RFC is 68 years, although it is unlikely that information would be cached for that length of time in any real world scenario. At the rate that basic underlying services like DNS are showing their cracks, I would be very impressed if the current DNS system lasts the next 6.8 years, much less 68. Still, being able to cause all of a major ISP’s subscribers to visit a fake Google when they type google.com in their browsers would be extremely valuable to a malicious hacker, even if the cache poisoning only lasted for a few hours.
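As a quick sanity check on that 68 year figure, here is the arithmetic, assuming the TTL ceiling is the largest 31 bit value, 2147483647 seconds. That number is my reading of RFC 2181, not something stated above, and the names in the snippet are just for illustration.

```python
# Assumed TTL ceiling: the largest 31 bit value, in seconds.
MAX_TTL_SECONDS = 2**31 - 1

# Average year length, counting leap years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def ttl_in_years(ttl_seconds):
    """Convert a TTL expressed in seconds into (fractional) years."""
    return ttl_seconds / SECONDS_PER_YEAR


print(f"Maximum TTL is about {ttl_in_years(MAX_TTL_SECONDS):.1f} years")
```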
In order for a forged response packet to be accepted by the ISP’s DNS server, it has to have the correct transaction id. The transaction id is a pseudo-random 16 bit number that is generated by the DNS server when it makes a request. The server only accepts the response as genuine if the response packet contains the same transaction id it used when it issued the request. It used to be that these transaction ids were issued in a decidedly non-random fashion…1, 2, 3, 4, etc., which made it decidedly easy to guess what transaction id the server would want next. DNS hijacking is far from new. Eventually, DNS server software wised up to the fact that it might be a good idea to randomize the transaction id to make it harder to guess. Still, there are only 2^16 possible values it can take on and you can flood a server with a LOT of packets. Another factor was needed to make DNS harder to hijack, WITHOUT fundamentally breaking the way DNS works. One way to add a factor is to issue each request from a unique UDP port. Then an attacker not only needs to hit upon the right transaction id, but he has to deliver it to the right port. Up until recently, many DNS servers issued their requests from a static, unchanging port, or issued requests in sequence or in some other way which would allow an attacker to guess what port was going to be used to issue the next request. The primary thing the recent batch of patches to DNS software was designed to do is to increase the number of possible ports from which a DNS server might issue a request and increase the randomness by which a DNS server decides which port to use. Microsoft’s failure when compared to BIND’s implementation is that they only add 2500 ports worth of randomness while BIND adds some 64 thousand.
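To put rough numbers on that, here is a back-of-the-envelope sketch of how many (port, transaction id) combinations an attacker has to guess among. The port counts are the approximate figures quoted above (2500 and roughly 64 thousand), and the function names are made up for illustration. It also assumes every combination is equally likely, which a real server may not quite achieve.

```python
TXID_BITS = 16  # the transaction id is a 16 bit number


def search_space(port_count, txid_bits=TXID_BITS):
    """Number of (source port, transaction id) pairs to guess among."""
    return port_count * (2 ** txid_bits)


def hit_probability(distinct_guesses, port_count):
    """Chance that one burst of distinct forged packets contains the right pair."""
    return min(1.0, distinct_guesses / search_space(port_count))


# Approximate port counts taken from the figures quoted in the post.
for label, ports in [("fixed port", 1), ("~2500 ports", 2500), ("~64k ports", 64000)]:
    print(f"{label:>12}: {search_space(ports):,} combinations")
```

Going from 2500 ports to roughly 64 thousand makes the haystack about 25 times bigger, which is the gap between the two patches in a nutshell.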
One problem that has cropped up with this source port randomization approach is that many DNS servers are behind some form of NAT router/firewall device, in order to shield them from attack. The problem is that many NAT routers de-randomize the outbound requests and issue them from sequential port numbers instead, effectively making a patched server no better protected against the attack than an unpatched server. Dan Kaminsky’s site has a neat video created by Clarified Networks from his source data, which graphically shows the patching of servers over time. One of many interesting trends which can be seen in the video is that while the number of patched servers goes up markedly over time to about 65% at the end of the video on 08/03/08, the number of patched servers behind some form of derandomizing NAT also goes up with time, so it appears that some ISPs are taking action in applying the patch, but don’t understand exactly what it does or how their infrastructure is effectively making the patch worthless. You can check out the video at the end of this post or download a much higher quality version.
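If you can observe the source ports your resolver’s queries actually arrive from on the far side of the NAT, a crude check for this derandomizing behaviour might look like the sketch below. The function name and the step threshold are my own assumptions, and a real test would want many more samples and a proper statistical measure.

```python
def looks_derandomized(ports, max_step=4):
    """Return True if successive source ports only ever creep upward by a tiny step.

    `ports` is a list of observed UDP source ports in the order the queries
    were seen. `max_step` is an arbitrary illustrative threshold.
    """
    if len(ports) < 3:
        return False  # not enough samples to say anything
    # Differences between neighbours, wrapping around the 16 bit port space.
    steps = [(later - earlier) % 65536 for earlier, later in zip(ports, ports[1:])]
    return all(0 < step <= max_step for step in steps)


print(looks_derandomized([1024, 1025, 1026, 1027]))
print(looks_derandomized([51234, 2048, 40001, 9]))
```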
I think the upshot of all this is that the patches currently going around might buy your ISP some time, but that they are not meant as a final solution to DNS cache poisoning. At best, I think the patch helps make it clear to a network admin at the ISP that an attack is underway (suddenly receiving tens of thousands of bad response packets is kinda a dead giveaway) and hopefully gives them enough time (assuming they have planned for it in advance) to do something about it.
Near term, people far smarter than I are hard at work trying to figure out additional techniques that can be used to foil a Kaminsky style attack on a DNS server, without breaking the existing DNS standard and putting a severe lurch in the internet as we know it. Unfortunately, many of the proposed ideas that have the virtue of staying within the standard would also increase DNS traffic, particularly on the root name servers, to such an extent that you are back to that minor issue of BREAKING THE INTERNET.
One such idea, known as debouncing, stays within the DNS specification. The idea of debouncing is to issue any query twice and if you get back two different responses (indicating that one of the responses was bogus), to keep on issuing the request (with a new transaction id and port #) until you have reached some statistical confidence level that the answer is correct. There is simply no way an attacker is going to be able to get two transaction id + source port hits in that rapid a sequence. Unfortunately, this would at minimum double the load on the internet’s root nameservers as well as increasing query latency. Apparently, there simply isn’t that degree of excess capacity in the system (which instantly makes me ask why such a critical piece of internet infrastructure runs that close to capacity).
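A minimal sketch of the debouncing idea follows. It assumes a hypothetical `query_once` callable that sends one fresh query (new transaction id, new port) and returns whatever answer arrives first. The "seen the same answer twice" rule is a stand-in for a real statistical confidence test, not how any particular DNS server does it.

```python
from collections import Counter


def debounced_lookup(query_once, name, agree=2, max_queries=8):
    """Repeat a query until some answer has been seen `agree` times.

    query_once  -- hypothetical callable: name -> answer for one fresh query
    agree       -- how many matching answers count as "confident" (assumed)
    max_queries -- give up after this many attempts and return None
    """
    seen = Counter()
    for _ in range(max_queries):
        answer = query_once(name)
        seen[answer] += 1
        if seen[answer] >= agree:
            return answer
    return None  # never reached agreement; treat the lookup as failed


# Usage with a canned sequence of answers: one forged, then two genuine.
canned = iter(["203.0.113.9", "192.0.2.1", "192.0.2.1"])
print(debounced_lookup(lambda name: next(canned), "www.example.com"))
```

Note that even in the best case this costs two queries per lookup, which is exactly the doubling of load described above.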
Another idea, which has long been suggested as a way to eliminate DNS hijacking skulduggery is to switch DNS from using UDP to TCP. UDP is highly efficient for simple queries because it is a connectionless protocol. You blast out a request and the other end hopefully gets the message and blasts back a response. Total number of packets sent with UDP = 2. TCP on the other hand requires a minimum of 6 or possibly 7 packets, depending on how willing you are to trust that all parties have implemented the RFC defined version of TCP. With UDP, it is possible to forge the source IP address, as no attempt is made by the recipient to verify the sender of the packet before acting on the data. The ability to spoof arbitrary UDP packets is ultimately what makes an attack like this possible. An attacker can blast out UDP packets all day long, without the receiving server being able to easily block them, because the attacker’s true IP isn’t in a single one of those packets. With TCP, however, the client and server must do a “three way handshake” before exchanging data, so it is pretty much impossible for joe random hacker to easily pretend to be someone else. Unfortunately, this method would increase traffic on the root nameservers even more than debouncing.
The neatest near term fix is being termed the 0x20 hack. It has the virtue of staying within existing DNS specifications AND not breaking the internet. Yay! The 0x20 hack relies on the fact that while the case of the characters in a domain name is not deemed to be significant, it is preserved during queries and replies. I.e. www.google.com is not considered to be different from wWw.GOOgle.COM. The idea as to how this applies in our case to spoil Joe Hacker’s day is as follows.
1. Hacker makes a request to the ISP’s DNS server for somebogussubdomain.google.com
2. ISP’s cleverly patched DNS server randomly changes the case of some of the letters in the request so that it now reads something like SomEboGussuBDOMaiN.GoOgle.cOm (bear with me here, I know it looks goofy) and passes the request on to the root server. Root server responds with the information your ISP’s DNS server requested and preserves the goofy casing.
3. Your ISP examines the flood of incoming responses and only accepts the one with the goofy casing that matches what it sent out, plus port number and transaction id.
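The steps above can be sketched in a few lines. This is only an illustration of the case-mixing and matching logic, with made-up function names, and it doesn’t touch the network at all.

```python
import secrets


def randomize_case(name):
    """Flip each letter to upper or lower case using one random bit per character."""
    return "".join(
        ch.upper() if secrets.randbits(1) else ch.lower()
        for ch in name
    )


def response_matches(sent_name, echoed_name):
    """Accept a response only if it echoes the exact casing we sent (case-sensitive)."""
    return sent_name == echoed_name


sent = randomize_case("somebogussubdomain.google.com")
print(sent)
# A genuine response echoes the casing back; a blind forgery has to guess it.
print(response_matches(sent, sent))
print(response_matches(sent, sent.lower()))
```

Digits, dots and hyphens come out unchanged either way, since they have no upper or lower case form.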
Essentially, by messing with the casing of the letters, the ISP DNS server has introduced another random factor that an attacker would have to guess. It adds one bit of randomness for each letter. Unfortunately, you don’t get any benefit for numbers in a domain, as there is no “case” to numbers. The effectiveness of this method goes up with each additional letter, so the longer a domain name the better it works. Unfortunately, a lot of the most likely targets for cache poisoning attacks are heavily trafficked domains, which tend to have short names. Just take a look at some of the top sites in the world:
By and large, the top trafficked sites only gain 6-9 bits of protection from this method.
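Counting the bits is simple enough to do yourself: one bit per letter, nothing for digits, dots or hyphens. A small sketch follows; the function name is made up, and apart from google.com the domains are just placeholders of my own choosing.

```python
def case_bits(name):
    """Bits of 0x20 randomness a name can carry: one per ASCII letter."""
    return sum(1 for ch in name if ch.isascii() and ch.isalpha())


for domain in ["google.com", "example.com", "123.example"]:
    bits = case_bits(domain)
    print(f"{domain}: {bits} bits, {2 ** bits} possible casings")
```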
The 0x20 hack is certainly not the final solution, but is one of the few proposed methods which might actually mitigate the problem without breaking much in the process. Many of the Internet standards we have now, such as DNS, BGP and TCP/IP, were designed literally decades ago in an inherently more trustful time. It is a small miracle that they have remained resilient and adaptable over their lifetimes, but the cracks are definitely showing. Longer term, there are going to be some real growing pains in migrating to new, incompatible Internet standards which will have security and risk mitigation as some of their main design criteria.