Netflix balances traffic across multiple Amazon cloud regions to prevent outages

Netflix has suffered outages that left users unable to watch content on its streaming service. Previously, the company relied on a single Amazon Web Services (AWS) cloud region, so a failure in that region left users unable to connect. The last major outage happened on Christmas Eve of last year.

Netflix hopes to prevent this sort of outage from happening again and is now balancing its traffic across multiple AWS regions. The goal is to keep the service available if one of the cloud regions goes down. Rather than taking all streaming traffic with it, an outage would affect only the area served by the failed region.

Previously, Netflix hoped to prevent similar outages by configuring its streaming service to fail over from one AWS region to another if a region went down. The company has since changed its approach and now uses multiple AWS regions for streaming at all times. Netflix engineers announced this week that the company is streaming data across the AWS US East-1 and US West-2 regions and balancing user traffic between the two.

If one of those regions goes down, Netflix can route all the traffic through the other. The move to using multiple AWS regions all the time did require some work on Netflix's end. The company had to move away from the Asgard cloud management tool because it could only work with a single region, and it built a new tool called Mimir to handle multi-region deployments. It also added a tool called Chaos Kong to its Simian Army suite for testing cross-region failover scenarios.
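
The active-active arrangement described above can be illustrated with a minimal Python sketch. This is not Netflix's code: the RegionRouter class, its method names, and the random selection policy are all hypothetical, and only the two region names come from the announcement. It shows requests being spread across both regions while they are healthy, and sent entirely to the surviving region when one is marked down.

```python
import random

# Region identifiers for the two regions named in the article.
# Everything else in this sketch is a hypothetical illustration.
REGIONS = ["us-east-1", "us-west-2"]


class RegionRouter:
    """Toy active-active router that spreads requests over healthy regions."""

    def __init__(self, regions):
        # Every region starts out healthy and eligible for traffic.
        self.healthy = {region: True for region in regions}

    def mark_down(self, region):
        """Stop sending traffic to a region, e.g. after a failed health check."""
        self.healthy[region] = False

    def mark_up(self, region):
        """Resume sending traffic to a recovered region."""
        self.healthy[region] = True

    def route(self, rng=random):
        """Pick a healthy region at random for one request."""
        candidates = [r for r, ok in self.healthy.items() if ok]
        if not candidates:
            raise RuntimeError("no healthy region available")
        return rng.choice(candidates)


router = RegionRouter(REGIONS)
router.mark_down("us-east-1")   # simulate a regional outage
print(router.route())           # all traffic now goes to the other region
```

A real deployment would more likely steer traffic at the DNS or load-balancer layer and weight regions by capacity or user latency rather than choosing uniformly at random. The sketch only captures the idea that both regions serve traffic at all times and that either can absorb the full load.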

SOURCE: Gigaom