Moving a Python app from AWS to GCP
TL;DR: A checklist to help you migrate a Python app from AWS to GCP. You will have to refactor how you access your cloud resources; I show how to point boto3 at Google Cloud Storage instead of S3 through its interoperability mode, with code in this gist.
I haven’t been coding in Python much lately, except to support old projects or to help out at my day job. On the few occasions I get to code nowadays I prefer Go, but at work I have to support applications on other platforms and in other languages.
In the process of migrating this Python 3/Tornado-based app, we made the following decisions to avoid extensive refactoring:
- Deployments would be made to a GKE cluster instead of writing automation around raw GCP instances
- Adopt the local build/remote registry workflow for Docker and GKE. The app was already packaged with Docker, so adding the proper workflow was a small Makefile change
- Data needed to kickstart the application lived on S3, and we would use Cloud Storage to hold it in the same way
- All communication with APIs that used to be reachable from inside an AWS VPC would now have to go through public endpoints over SSL, with the proper authentication tokens
With that in mind: GCP is organized around projects rather than starting from a VPC, so we created the project, enabled GKE, Cloud Storage and the container registry, and we were set. The initial GCP setup is easy too, as the gcloud CLI installs everything you need and sets up your credentials right away.
Patching the application Makefiles to build the proper image and ship it to the registry was a breeze.
The part where we got stuck was that Boto 3 was used to interact with S3 through instance profiles. That is a good fit for the AWS ecosystem, but it doesn't provide bindings to other clouds: Boto is an AWS-maintained project, and the few non-AWS products it once supported were dropped in the 2 to 3 transition.
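For contrast, this is roughly what the AWS side looked like: a minimal sketch assuming the app relied on boto3's default credential chain to pick up the instance profile (bucket and key names are illustrative, not the real ones):

```python
import boto3

# On AWS nothing is configured explicitly: running on an EC2 instance,
# boto3's default credential chain resolves the instance profile through
# the metadata service.
s3 = boto3.client("s3")

# Fetch the data the application needs at startup.
response = s3.get_object(Bucket="my-app-bootstrap", Key="seed/data.json")
seed = response["Body"].read()
```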
Scouring the internet and the Boto source code, I managed to find a solution for the specific S3 issue:
- Create your bucket in Cloud Storage and enable "Interoperability mode"
- Get the issued credentials from the GCP console or via gcloud
- Change your application to receive the issued credentials, which should be stored as GKE secrets
- Use the following code snippet to guide you through the minimal code changes:
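Here is a minimal sketch of those changes, assuming the credentials arrive as environment variables populated from GKE secrets; the variable, bucket and region names are illustrative and the gist may differ in detail:

```python
import os

import boto3
from botocore.client import Config
from botocore.handlers import set_list_objects_encoding_type_url

# Interoperability credentials issued by GCP, injected as GKE secrets and
# exposed to the app as environment variables (names are illustrative).
GCS_ACCESS_KEY = os.environ["GCS_ACCESS_KEY"]
GCS_SECRET_KEY = os.environ["GCS_SECRET_KEY"]

session = boto3.session.Session()

# Unregister the handler that adds the "encoding-type=url" query parameter
# to ListObjects requests; Cloud Storage's XML API does not accept it.
session.events.unregister(
    "before-parameter-build.s3.ListObjects",
    set_list_objects_encoding_type_url,
)

client = session.client(
    "s3",
    region_name="us-east1",                         # GCP region format, not AWS
    endpoint_url="https://storage.googleapis.com",  # Cloud Storage XML API
    aws_access_key_id=GCS_ACCESS_KEY,
    aws_secret_access_key=GCS_SECRET_KEY,
    config=Config(signature_version="s3"),  # v2-style signing; "s3v4" may also work
)

# From here on the boto3 S3 API is used as before.
for obj in client.list_objects(Bucket="my-app-bootstrap").get("Contents", []):
    print(obj["Key"])
```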
The important takeaway is that Cloud Storage requires you to indicate the region in GCP format and the protocol (signature) version, and to unregister the "set_list_objects_encoding_type_url" handler, which appends an encoding query parameter to your bucket URL and breaks GCP.
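With the client configured this way, the calls that fetch the kickstart data can stay on the plain boto3 S3 API. Continuing the sketch above (bucket and key names are still illustrative):

```python
# Same call the app already made against S3, now answered by Cloud Storage.
response = client.get_object(Bucket="my-app-bootstrap", Key="seed/data.json")
seed = response["Body"].read()
```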
I'd say it is next to impossible to avoid refactoring some parts of an application if you are using more than one cloud. Even following things like the twelve-factor principles and implementing good abstractions, some detail of the external resources always leaks and changes between providers. In this case there was an "interoperability mode", but that is not guaranteed to exist in the future.