Thank you nscd
May 3rd, 2010
Although many blog posts are available on this topic, I still struggled with this for a couple of hours.
At Netlog, my employer, we’re interfacing with Gatcha, our new game-distribution platform, which runs entirely on Amazon EC2. Amazon has a great Elastic Loadbalancer which you can easily CNAME your DNS records to for public use. Our loadbalancer is currently gatchalb-154894459.eu-west-1.elb.amazonaws.com (which is no secret), and www.gatcha.com is CNAME’d to that record.
It’s a pity that you don’t get your own IPs to use with the loadbalancer. I’m not quite happy with the situation, for two reasons:
- You can’t CNAME the root of a domain without CNAME’ing it completely: a CNAME can’t coexist with other records (such as MX) on the same name. We wanted to keep our own MX records, so the whole domain is not CNAME’d to Amazon. As a result we couldn’t get gatcha.com (without the www.) CNAME’d correctly, and we had to point it to our server park in Brussels, where we forward it to www.gatcha.com.
- Amazon LB uses a TTL of 60 seconds, and they DO switch IPs regularly. If the LB gets more traffic, it is scaled up with more instances (and thus more IPs). When the traffic drops it is scaled down again, and this is where the tricky part begins.
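To watch the loadbalancer's address set change over time, something like the following could be run periodically. It is only a rough sketch, assuming Python 3; the helper names resolve_ipv4 and diff_addresses are made up for this post. Note that it asks the system resolver, so it sees whatever any local cache in front of it hands back.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (ip, port) pair.
    return {info[4][0] for info in infos}


def diff_addresses(previous, current):
    """Return (added, removed) between two sets of addresses."""
    return current - previous, previous - current
```

Calling resolve_ipv4 on the loadbalancer hostname every minute and feeding consecutive results into diff_addresses would show which IPs joined or left the pool.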
We use nscd on all our servers to cache DNS, and we run Debian Lenny (which ships with glibc version 2.7-18). That version of nscd has a flaw that makes it ignore the TTL of DNS records (see http://sourceware.org/bugzilla/show_bug.cgi?id=4428). What happened is that we kept sending requests to IPs that were no longer part of our loadbalancer, because it had been scaled down during a low-traffic period, and we suddenly saw different content than we expected. (The IPs had been assigned to a new loadbalancer instance.) It took us a while to figure that out, and the only thing we could do was get rid of nscd and install dnscache (which is more configurable).
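For illustration, this is roughly what honouring the TTL means for a cache. It is a minimal sketch in Python 3 and not how nscd or dnscache are actually implemented; the class TTLDnsCache and the lookup callable (which is assumed to return the addresses together with the record's TTL in seconds) are hypothetical.

```python
import time


class TTLDnsCache:
    """Toy DNS cache that drops an answer once its TTL has passed."""

    def __init__(self, lookup, clock=time.monotonic):
        # lookup is a hypothetical callable: hostname -> (addresses, ttl_seconds)
        self._lookup = lookup
        self._clock = clock
        self._entries = {}  # hostname -> (addresses, expires_at)

    def resolve(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            # Still within the TTL: serve the cached answer.
            return entry[0]
        # Missing or expired: ask upstream again and remember the new expiry.
        addresses, ttl = self._lookup(hostname)
        self._entries[hostname] = (addresses, now + ttl)
        return addresses
```

With a 60 second TTL, a cache like this serves a stale IP for at most a minute after the loadbalancer changes. A cache that ignores the TTL keeps serving the old IP for as long as its own expiry says, which is how we ended up talking to someone else's loadbalancer.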
A messy bug, which annoyed us for a few hours.