Backing up MySQL with AWS Lambda

I was asked to set up a logical backup system for a MySQL database hosted on AWS RDS. There were a few options for how to set this up; I've used EC2 in the past, but this time I decided to utilise AWS Lambda.

Here are the steps I took to set this up:

Step 1 - Dependencies

There were two tools I needed to get this up and running: the AWS CLI and mysqldump.

The Layers feature in Lambda allows you to bundle dependencies into zip files and upload them separately from the actual function code.

Preparing the dependencies

Firstly I needed to prepare some dependencies on my local machine.

MySQL Client

I needed the MySQL CLI to be able to connect to and back up the database.

  1. I downloaded the MySQL CLI generic Linux binaries from dev.mysql.com. These were provided in a .tar file.
  2. Extracted the tar, which gave me three further .tar.xz files: mysql-8.0.16-linux-x86_64-minimal.tar.xz, mysql-test-8.0.16-linux-x86_64-minimal.tar.xz and mysql-router-8.0.16-linux-x86_64-minimal.tar.xz.
  3. I extracted mysql-8.0.16-linux-x86_64-minimal.tar.xz, which gave me a whole heap of files and folders, including bin and lib. Initially I thought the answer would be to zip up this whole directory, however Lambda layers have a 250 MB (unzipped) limit and this directory was larger than that.
  4. I zipped up the bin directory: zip -r /mnt/c/Users/Jake/Desktop/mysql_cli bin (see the sketch after this list). NOTE: be careful to preserve any symlinks in that folder (I unknowingly broke my relative symlinks as I moved the bin folder into its own folder).
  5. I double-checked that the zip had a bin folder in the root with the mysql and mysqldump binaries inside.
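
Putting those steps together, a rough sketch of the commands (the outer tarball name is assumed from the version above, and the Desktop path is just my setup):

tar -xf mysql-8.0.16-linux-x86_64.tar              # assumed download name; yields the .tar.xz archives
tar -xf mysql-8.0.16-linux-x86_64-minimal.tar.xz   # yields bin/, lib/ and friends
cd mysql-8.0.16-linux-x86_64-minimal

# -y stores symlinks as symlinks rather than following (and breaking) them
zip -ry /mnt/c/Users/Jake/Desktop/mysql_cli.zip bin

# Sanity check: bin/ must sit at the root of the zip
unzip -l /mnt/c/Users/Jake/Desktop/mysql_cli.zip | head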

This dependency is good to go. Now to prepare the next one.

AWS CLI

I needed to be able to send the backups to an Amazon S3 bucket.

  1. I ensured pip was installed on my local machine
  2. Created a new folder 'aws-cli'
  3. Ran pip3 install awscli --target aws-cli. This installed the package into the new target folder.
  4. Zipped up the contents of the folder from inside it (cd aws-cli && zip -r /mnt/c/Users/Jake/Desktop/aws_cli.zip .), ensuring that the zip had a bin folder at its root with the binaries inside (see the sketch after this list).
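
The whole preparation as a sketch (again, the Desktop path is just mine):

mkdir aws-cli
pip3 install awscli --target aws-cli

# Zip from inside the folder so bin/ and the awscli package sit at the zip root
cd aws-cli
zip -r /mnt/c/Users/Jake/Desktop/aws_cli.zip .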

Uploading Dependencies as Layers

To make these dependencies available to the Lambda function, I had to upload them as Layers.

NOTE: It's important to double-check that the folders have been zipped correctly and have a bin folder at the root. Lambda mounts layers under the /opt/ directory and, by default, has /opt/bin on the PATH.

  1. Navigate to the Layers section
  2. Create a new layer and mark it as being compatible with the Ruby 2.5 runtime
  3. Depending on the size of the zip, upload it directly or via Amazon S3 (a CLI sketch follows this list)
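
The same thing can be done from the CLI; a sketch, with layer names of my own choosing:

# Publish each zip as a layer (zips under ~50 MB can be uploaded directly;
# larger ones must be referenced from S3 via --content S3Bucket=...,S3Key=...)
aws lambda publish-layer-version \
  --layer-name mysql-cli \
  --compatible-runtimes ruby2.5 \
  --zip-file fileb://mysql_cli.zip

aws lambda publish-layer-version \
  --layer-name aws-cli \
  --compatible-runtimes ruby2.5 \
  --zip-file fileb://aws_cli.zip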

I needed to create two layers, one for each dependency.

Step 2 - Lambda Setup

I created the new Lambda function and chose the Ruby 2.5 runtime. I also created a brand new IAM role for this Lambda, as I knew there would be a fair bit of permissions work that I wanted to keep nice and neat in a single role.

Using the Layers menu, I associated the two new layers I created in the previous step.

Testing

Next I tested that these tools were accessible with a simple Ruby script that I expected to print out the version of each.

require 'json'

def lambda_handler(event:, context:)
  # Backticks capture stdout; print the captured output so it lands in the logs
  puts `mysql --version`.inspect
  puts `aws --version`.inspect
end

I received the following output, though:

START RequestId: 241600f0-8f5f-470c-9c0c-5a700488805c Version: $LATEST
Traceback (most recent call last):
  File "/opt/bin/aws", line 19, in <module>
    import awscli.clidriver
ImportError: No module named 'awscli'
END RequestId: 241600f0-8f5f-470c-9c0c-5a700488805c
REPORT RequestId: 241600f0-8f5f-470c-9c0c-5a700488805c  Duration: 922.71 ms Billed Duration: 1000 ms    Memory Size: 128 MB Max Memory Used: 57 MB  

This indicated that the awscli Python module couldn't be found. The layer zip has the awscli package folder at its root, which ends up at /opt/awscli, but the function couldn't see it. Layers are mounted in the /opt/ directory, so I actually had to tell Python to look in that directory for the library code.

After some digging I found that I needed to set a PYTHONPATH environment variable with the value of /opt/.
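
This can be set in the console, or via the CLI; a sketch, assuming the function is called mysql-backup:

aws lambda update-function-configuration \
  --function-name mysql-backup \
  --environment "Variables={PYTHONPATH=/opt/}"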

After running again I received a successful response.

START RequestId: 3f71c4b1-e26c-46d9-b5f1-5137defaeb97 Version: $LATEST
"mysql  Ver 8.0.16 for Linux on x86_64 (MySQL Community Server - GPL)\n"
"aws-cli/1.16.173 Python/3.4.3 Linux/4.14.114-93.126.amzn2.x86_64 exec-env/AWS_Lambda_ruby2.5 botocore/1.12.163\n"
END RequestId: 3f71c4b1-e26c-46d9-b5f1-5137defaeb97
REPORT RequestId: 3f71c4b1-e26c-46d9-b5f1-5137defaeb97  Duration: 20067.36 ms   Billed Duration: 20100 ms   Memory Size: 128 MB Max Memory Used: 101 MB 

Function Code


require 'json'
require 'date'

def lambda_handler(event:, context:)
  # Use the current timestamp to uniquely name this backup run
  run_id = DateTime.now.to_s

  # Stream the dump through gzip straight to S3 so nothing touches
  # the Lambda's limited /tmp storage
  system(
    "mysqldump --column-statistics=0 --single-transaction --quick " \
    "--host=#{ENV['RDS_HOST']} --user=#{ENV['RDS_USER']} --all-databases | gzip | " \
    "aws s3 cp - s3://#{ENV['BACKUP_BUCKET']}/#{run_id}.sql.gz " \
    "--region=ap-southeast-2 --acl 'bucket-owner-full-control'"
  )
end

This is a pretty simple script for taking backups. Initially it backed the database up to a file in /tmp and then uploaded that file to S3 in two separate commands, but I ran into disk space problems because the backup was over 512 MB (the size of a Lambda's /tmp storage). I opted to stream the backup straight up to S3 instead, so no data is persisted on the Lambda. There are a few environment variables that need to be passed in as well (these can be seen in the snippet above). The database password isn't in the snippet; mysqldump can read it from the MYSQL_PWD environment variable, for example.

Also, a special note on backing up MySQL databases: make sure you add the --single-transaction flag so the dump doesn't lock the tables on the database and bring your app down.

Step 3 - Permissions

VPC Access

The RDS instance I wanted to back up is in a VPC, which means it isn't accessible by default from my Lambda function. The answer is to set the function to run from inside the VPC. This can be done in the Network configuration of the function and is pretty straightforward: just specify the same VPC and subnets that the RDS instance lives in, along with an appropriate security group.
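
The equivalent CLI call, sketched with made-up IDs:

aws lambda update-function-configuration \
  --function-name mysql-backup \
  --vpc-config SubnetIds=subnet-0aaa1111,subnet-0bbb2222,SecurityGroupIds=sg-0ccc3333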

VPC Execution Policy

Seeing as the Lambda function needs access to a VPC, I also had to give the Lambda's IAM role the AWSLambdaVPCAccessExecutionRole managed policy.
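
Attaching it from the CLI looks something like this (the role name is assumed):

aws iam attach-role-policy \
  --role-name mysql-backup-lambda \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole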

RDS Access

I also ensured that the database's security group allowed inbound access on the database port (3306 for MySQL) when the traffic comes from the security group the Lambda function is using.
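
A sketch of that rule via the CLI, with hypothetical group IDs:

# Allow MySQL traffic from the Lambda's security group into the database's
aws ec2 authorize-security-group-ingress \
  --group-id sg-0ddd4444 \
  --protocol tcp \
  --port 3306 \
  --source-group sg-0ccc3333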

S3 Access

If the Lambda is in a VPC subnet that doesn't have access to S3 (if the subnet doesn't have a NAT gateway, for example), then you will need to create an endpoint inside the VPC that points to S3. This was the first time I'd used one, and it was very straightforward.
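
Creating a gateway endpoint for S3 from the CLI; the VPC and route table IDs here are placeholders:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0eee5555 \
  --service-name com.amazonaws.ap-southeast-2.s3 \
  --route-table-ids rtb-0fff6666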

S3

I had to set the following policy on the Lambda's role, granting it access to write to my 'backups' S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectRetention",
                "s3:PutReplicationConfiguration",
                "s3:PutObjectLegalHold",
                "s3:PutObjectAcl"
            ],
            "Resource": [
                "arn:aws:s3:::backups/*",
                "arn:aws:s3:::backups"
            ]
        }
    ]
}

Complete

And that's it! This gave me a function that I could set up to run on a schedule.
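
For the schedule itself, a CloudWatch Events rule can trigger the function; a sketch, with the rule name and the account ID in the ARNs made up:

# Run the backup daily at 16:00 UTC
aws events put-rule \
  --name nightly-mysql-backup \
  --schedule-expression "cron(0 16 * * ? *)"

aws events put-targets \
  --rule nightly-mysql-backup \
  --targets "Id"="1","Arn"="arn:aws:lambda:ap-southeast-2:123456789012:function:mysql-backup"

# The function also needs a resource-based permission so events can invoke it
aws lambda add-permission \
  --function-name mysql-backup \
  --statement-id nightly-mysql-backup \
  --action lambda:InvokeFunction \
  --principal events.amazonaws.com \
  --source-arn arn:aws:events:ap-southeast-2:123456789012:rule/nightly-mysql-backup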