Having an automated deployment process, or Continuous Delivery, can hugely reduce the time it takes to get “done” software released. Blue-Green Deployment, one of the techniques used in Continuous Delivery, reduces downtime by keeping two identical production environments. Only one of the two environments receives live traffic at any time: for example, if Blue is live, Green is idle. A new release can be deployed to Green, and it only starts serving users once live traffic is switched to it via a router. Another benefit is that if something goes wrong on Green, traffic can easily be rolled back to Blue.
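The mechanics described above can be sketched as a tiny state model: two environments, a pointer to the live one, deploys that only ever touch the idle one, and a switch that doubles as a rollback. This is a toy illustration with hypothetical names (BlueGreen, deploy, switch), not part of the tool described in this post.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlueGreen:
    """Toy model of two identical environments behind one router."""
    live: str = "blue"
    releases: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})

    @property
    def idle(self) -> str:
        """The environment that currently receives no live traffic."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, release: str) -> str:
        """Install a release on the idle environment; live traffic is untouched."""
        target = self.idle
        self.releases[target] = release
        return target

    def switch(self) -> None:
        """Point the router at the other environment (go live, or roll back)."""
        self.live = self.idle


env = BlueGreen(releases={"blue": "v1", "green": None})
env.deploy("v2")  # green gets v2 while blue keeps serving v1
env.switch()      # green goes live
env.switch()      # green misbehaves: roll back to blue, which still runs v1
```

Because a deploy never touches the live side, rolling back is just another switch; nothing has to be redeployed.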

While working at MoJ Digital and Technology, we used our own infrastructure-as-code tool, written in Python, to manage infrastructure and resources on AWS. When I joined, the tool had no Blue-Green feature yet, so I added one using the same AWS Python library, “boto3”. Some pseudo code is included later in this blog to clarify the idea. Note that, to keep it clean, the boto3 API calls are left out of the code. My pull request for this can be found on MoJ’s GitHub.

Implementation

Identical environments can easily be achieved on AWS by using the same infrastructure configuration or CloudFormation template for both. As for the “switch”, Route53, AWS’s DNS management service, is sufficient for the job.
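As a sketch of the “same template, two stacks” idea, the helpers below create a blue and a green CloudFormation stack from one template body through boto3’s CloudFormation client (create_stack). The stack names (demoapp-blue, demoapp-green) and the StackColour template parameter are assumptions for illustration; the template would need to declare that parameter.

```python
def stack_requests(app, template_body, colours=("blue", "green")):
    """Build one create_stack request per colour, all from the same template.

    The "{app}-{colour}" naming and the StackColour parameter are hypothetical.
    """
    return [
        {
            "StackName": f"{app}-{colour}",
            "TemplateBody": template_body,  # identical template => identical infrastructure
            "Parameters": [{"ParameterKey": "StackColour", "ParameterValue": colour}],
        }
        for colour in colours
    ]


def create_identical_stacks(cf_client, app, template_body):
    """Create the stacks; cf_client is expected to be boto3.client("cloudformation")."""
    return {
        req["Parameters"][0]["ParameterValue"]: cf_client.create_stack(**req)["StackId"]
        for req in stack_requests(app, template_body)
    }


# Usage (needs AWS credentials; "stack.yaml" is a hypothetical template file):
#   import boto3
#   ids = create_identical_stacks(boto3.client("cloudformation"), "demoapp",
#                                 open("stack.yaml").read())
```

create_stack returns as soon as creation starts; waiting for each stack to finish (for example with the stack_create_complete waiter) is left out of the sketch.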

Shifting traffic between the two environments is essentially switching between DNS endpoints. For example, in the graph below, Blue is live and Green is idle. Once Green is ready, Route53 can be switched to Green, and if Green then fails, it can be switched back to Blue.
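Concretely, a switch with boto3 could look like the sketch below: a single Route53 change batch that UPSERTs both of active’s records, so the TXT stack ID and the traffic-carrying A record move together. Two details are assumptions rather than what the post’s tool does: the A record is written as a Route53 alias record targeting the load balancer (Route53’s way of pointing an A record at an ELB), and the hosted zone ID, ELB DNS name and ELB hosted zone ID are placeholders you would look up.

```python
def switch_change_batch(app, domain, stack_id, elb_dns_name, elb_zone_id, ttl=60):
    """Route53 ChangeBatch that repoints "active" at one stack.

    elb_zone_id is the load balancer's canonical hosted zone ID (not your own zone).
    """
    return {
        "Comment": f"switch {app} to stack {stack_id}",
        "Changes": [
            {   # TXT record: which stack ID is live
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": f"stack.active.{app}.{domain}",
                    "Type": "TXT",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": f'"{stack_id}"'}],  # TXT values are quoted
                },
            },
            {   # A record: alias to the stack's load balancer (alias records take no TTL)
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": f"{app}.{domain}",
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": elb_zone_id,
                        "DNSName": elb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            },
        ],
    }


def switch_to(route53_client, hosted_zone_id, **batch_args):
    """Send the batch; route53_client is expected to be boto3.client("route53")."""
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=switch_change_batch(**batch_args))
```

Route53 applies a change batch as a unit (all of the changes or none), which keeps the two records from disagreeing. Resolvers and clients may keep using a cached answer until its TTL expires, so the previously live stack should stay up for a while after the switch.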

 

In our implementation, each stack has a set of two DNS records: one records its stack ID, and the other points to its load balancer endpoint. To manage all the DNS endpoints, we introduced an independent DNS record set, active, which acts as the Route53 switch and points to either blue or green. Incoming traffic first goes through active and then on to whichever stack its records point to. Below is what the records may look like in AWS Route53, depending on how you format the record names.

STACK   DNS NAME                                   TYPE  RECORD VALUE
blue    stack.blue.demoapp.automationlogic.com     TXT   819273
        demoapp-819273.automationlogic.com         A     [elb-endpoint-819273]
green   stack.green.demoapp.automationlogic.com    TXT   123456
        demoapp-123456.automationlogic.com         A     [elb-endpoint-123456]
active  stack.active.demoapp.automationlogic.com   TXT   123456
        demoapp.automationlogic.com                A     demoapp-123456.automationlogic.com

The numbers in the TXT record values are the stack IDs we use to identify stacks; we also append them to each stack’s load balancer name. In the example above, Blue and Green have their own load balancer DNS endpoints, demoapp-819273.automationlogic.com and demoapp-123456.automationlogic.com respectively, and the top-level endpoint owned by active is demoapp.automationlogic.com. Active points at green, so green is live and blue is idle. The overall blue-green logic is presented below in pseudo code. Note that the code does not create any stacks or talk to AWS; it only shows the logic.


import random

# Hypothetical stand-ins for the boto3/Route53 calls left out of this post.
# DNS_RECORDS plays the role of the hosted zone: (name, type) -> value.
DNS_RECORDS = {}


def generate_uuid():
    '''Return a new stack ID (6-digit numbers, as in the records above).'''
    return str(random.randint(100000, 999999))


def get_elb_endpoints_from_aws(stack_id):
    '''Stand-in for looking up the DNS name of the stack's load balancer.'''
    return "[elb-endpoint-{}]".format(stack_id)


def UpdateDNSRecord(dnsname, dnsvalue, dnstype):
    '''If dnsname exists and dnstype matches, update its value to dnsvalue
    (this stand-in simply creates or overwrites the record).'''
    DNS_RECORDS[(dnsname, dnstype)] = dnsvalue


class StackDNS:
    def __init__(self, appname, stackname="active"):
        '''
        Constructing two sets of DNS records:
        ---------------------------------------
          stack_dns_name  |TXT|   stack_id
          elb_dns_name    |A|     elb_dns_value
        ---------------------------------------
        '''
        self.stack_dns_name = "stack.{0}.{1}.automationlogic.com".format(stackname, appname)  # stack host
        if stackname == "active":
            # "active" starts empty; dns_switch() fills it in
            self.stack_id = None
            self.elb_dns_name = "{}.automationlogic.com".format(appname)
            self.elb_dns_value = None
        else:
            self.stack_id = generate_uuid()  # stack dns value
            self.elb_dns_name = "{}-{}.automationlogic.com".format(appname, self.stack_id)  # load balancer host
            self.elb_dns_value = get_elb_endpoints_from_aws(self.stack_id)


def dns_switch(stack, active):
    '''
    Route53 shifts traffic to blue or green
    by updating the "active" DNS records to point at one of them.
    :param stack: one of your deployment stacks (blue or green)
    :param active: the DNS records that accept incoming traffic
    '''
    active.stack_id = stack.stack_id
    active.elb_dns_value = stack.elb_dns_value
    UpdateDNSRecord(active.stack_dns_name, active.stack_id, "TXT")  # active points at stack id
    UpdateDNSRecord(active.elb_dns_name, active.elb_dns_value, "A")  # and at its load balancer


def deploy_and_switch():
    '''
    Records before any switch (stack IDs are examples):
    ------------------------------------------------------------------------------
    blue   | stack.blue.demoapp.automationlogic.com    |TXT| 819273
           | demoapp-819273.automationlogic.com        |A|   [elb-endpoint-819273]
    ------------------------------------------------------------------------------
    green  | stack.green.demoapp.automationlogic.com   |TXT| 123456
           | demoapp-123456.automationlogic.com        |A|   [elb-endpoint-123456]
    ------------------------------------------------------------------------------
    active | stack.active.demoapp.automationlogic.com  |TXT| -
           | demoapp.automationlogic.com               |A|   -
    ------------------------------------------------------------------------------

    After the first switch, active is updated to blue:
    ------------------------------------------------------------------------------
    active | stack.active.demoapp.automationlogic.com  |TXT| 819273
           | demoapp.automationlogic.com               |A|   [elb-endpoint-819273]
    ------------------------------------------------------------------------------

    After the second switch, active becomes green:
    ------------------------------------------------------------------------------
    active | stack.active.demoapp.automationlogic.com  |TXT| 123456
           | demoapp.automationlogic.com               |A|   [elb-endpoint-123456]
    ------------------------------------------------------------------------------
    '''
    blue = StackDNS("demoapp", "blue")
    active = StackDNS("demoapp")
    dns_switch(blue, active)
    green = StackDNS("demoapp", "green")
    dns_switch(green, active)


if __name__ == "__main__":
    deploy_and_switch()
 Pseudo code implementation
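Since the TXT records hold stack IDs, the same record layout also tells you which stack is live: read active’s TXT value and see whether it matches blue’s or green’s. The sketch below does that with boto3’s Route53 list_resource_record_sets; the function names are hypothetical, and it assumes the stack.<colour>.<app>.automationlogic.com naming used in this post.

```python
def read_txt(route53_client, hosted_zone_id, name):
    """Return the unquoted value of a TXT record, or None if it does not exist."""
    fqdn = name.rstrip(".") + "."  # Route53 reports names with a trailing dot
    resp = route53_client.list_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        StartRecordName=fqdn,
        StartRecordType="TXT",
        MaxItems="1",  # boto3 expects this as a string
    )
    for rrset in resp["ResourceRecordSets"]:
        # The listing starts at fqdn but may return the next record if fqdn is missing.
        if rrset["Name"] == fqdn and rrset["Type"] == "TXT":
            return rrset["ResourceRecords"][0]["Value"].strip('"')
    return None


def live_stack(route53_client, hosted_zone_id, app, domain="automationlogic.com"):
    """Return "blue" or "green" depending on where active points, or None."""
    active_id = read_txt(route53_client, hosted_zone_id, f"stack.active.{app}.{domain}")
    if active_id is None:
        return None
    for colour in ("blue", "green"):
        if read_txt(route53_client, hosted_zone_id, f"stack.{colour}.{app}.{domain}") == active_id:
            return colour
    return None
```

Run before a deploy, a check like this tells the tool which colour is idle and therefore safe to replace.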

Next Steps

The solution presented in this blog only gets the switch working. For version two, it would be nice to build in Weighted Routing, a Route53 routing policy that controls how traffic is weighted across DNS endpoints. With this policy, we could test a pre-production stack with a small amount of live traffic and increase that amount gradually until it reaches 100%. For example, if blue is live and green is idle, we could direct 90% of the traffic to blue and 10% to green. That way, if green goes wrong, it should not have a disastrous impact on the overall experience, and we can quickly roll back to blue. If green behaves well, we could then increase its weight to 30%, 60% and finally 100%, so that it goes live.
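A weighted version of the switch could look roughly like this. Route53 weighted records share one name and type, are told apart by SetIdentifier, and receive traffic in proportion to Weight divided by the sum of the weights, so weights of 90 and 10 give the 90%/10% split above. Using CNAME records that point at each stack’s own name (such as demoapp-819273.automationlogic.com) is an assumption of this sketch; a CNAME cannot coexist with other records at the same name, so the existing A record at demoapp.automationlogic.com would have to be replaced. ROLLOUT_STEPS and the function name are hypothetical.

```python
ROLLOUT_STEPS = (10, 30, 60, 100)  # hypothetical percentages of traffic sent to green


def weighted_change_batch(app, domain, targets, green_percent, ttl=60):
    """Route53 ChangeBatch splitting traffic between blue and green.

    targets maps "blue"/"green" to each stack's own DNS name.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    weights = {"blue": 100 - green_percent, "green": green_percent}
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": f"{app}.{domain}",
                    "Type": "CNAME",
                    "SetIdentifier": colour,    # tells the weighted records apart
                    "Weight": weights[colour],  # share = weight / sum of weights
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": targets[colour]}],
                },
            }
            for colour in ("blue", "green")
        ]
    }
```

Each step in ROLLOUT_STEPS would be one change_resource_record_sets call, with a health check on green before moving on; rolling back is the same call with green_percent=0. When a record’s weight is 0, Route53 stops returning it as long as another record in the group has a non-zero weight.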

In summary, to reduce downtime, we used two AWS services to implement Blue-Green Deployment: a) CloudFormation to create identical environments, blue and green; and b) Route53 record sets to shift traffic between the blue and green stacks.