
Database Migrations

We split migrations into “pre-deployment” and “post-deployment” migrations. Pre-deployment migrations ran before code changes were deployed and were only allowed to make backwards-compatible changes (e.g. adding a column). Post-deployment migrations ran after the code changes were deployed and were typically used to clean up after earlier migrations (e.g. removing a column that was no longer in use).
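The ordering this split implies can be sketched as follows. This is a minimal Python illustration, not GitLab's implementation. The `Migration` class, the `"pre"`/`"post"` phase labels and the `deploy` function are hypothetical, and the schema changes are only recorded in a log rather than executed against a database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Migration:
    # Hypothetical shape: a version, a phase label and a function to run.
    version: int
    phase: str  # "pre" (before the code deploy) or "post" (after it)
    name: str
    up: Callable[[List[str]], None]


def deploy(migrations: List[Migration], log: List[str]) -> None:
    """Run pre-deployment migrations, deploy the code, then run post-deployment ones."""
    for m in migrations:
        if m.phase not in ("pre", "post"):
            raise ValueError(f"unknown phase {m.phase!r} for {m.name}")
    ordered = sorted(migrations, key=lambda m: m.version)
    for m in ordered:
        if m.phase == "pre":
            m.up(log)  # must be backwards compatible: the old code is still running
    log.append("deploy application code")
    for m in ordered:
        if m.phase == "post":
            m.up(log)  # the new code is live, so nothing should use the old schema


# Hypothetical example: add a replacement column first, drop the old one afterwards.
migrations = [
    Migration(2, "post", "remove_legacy_name",
              lambda log: log.append("remove column users.legacy_name")),
    Migration(1, "pre", "add_full_name",
              lambda log: log.append("add column users.full_name")),
]
log: List[str] = []
deploy(migrations, log)
print(log)
```

Splitting by phase rather than only by version means a cleanup migration can never run while code that still reads the old column is deployed.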

Database migrations were no longer allowed to reuse application logic (e.g. Rails models) and instead had to define the classes and methods they needed themselves. As a result, migrations were essentially a snapshot of the code they needed to run, which made them more reliable and isolated them from changes elsewhere in the application.
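Below is a minimal sketch of such a self-contained migration, written in Python with `sqlite3` rather than Rails. The class `BackfillUserFullNames`, the `users` table and its columns are hypothetical. The point is that the migration carries its own copy of the table name and the formatting logic it needs, instead of importing an application model.

```python
import sqlite3


class BackfillUserFullNames:
    """Hypothetical migration that fills in users.full_name.

    Instead of importing the application's User model, it carries its own
    copy of the table name and the formatting logic it depends on.
    """

    TABLE = "users"  # snapshot of the table name at the time of writing

    @staticmethod
    def _full_name(first: str, last: str) -> str:
        # Copied into the migration on purpose: if the application later
        # changes how names are formatted, this migration behaves the same.
        return f"{first} {last}"

    def up(self, conn: sqlite3.Connection) -> None:
        rows = conn.execute(
            f"SELECT id, first_name, last_name FROM {self.TABLE}"
        ).fetchall()
        for user_id, first, last in rows:
            conn.execute(
                f"UPDATE {self.TABLE} SET full_name = ? WHERE id = ?",
                (self._full_name(first, last), user_id),
            )
        conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT,"
    " last_name TEXT, full_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)
BackfillUserFullNames().up(conn)
print(conn.execute("SELECT full_name FROM users ORDER BY id").fetchall())
```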

For large-scale data migrations (the kind that could take days or even weeks to complete), we scheduled background jobs using Sidekiq. A later deployment would then include a migration that checked whether all of the work had been performed (performing any remaining work if not) and then carried out the necessary cleanup.
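The schedule-then-finalize flow might look roughly like the following. This sketch rests on several assumptions. An in-memory `deque` stands in for Sidekiq, and a dict of rows stands in for the table. The functions `schedule_batches`, `run_worker` and `finalize` are hypothetical, as are the `name`/`display_name` fields.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Job = Tuple[int, int]  # inclusive (start_id, end_id) range of rows


def schedule_batches(max_id: int, batch_size: int, queue: Deque[Job]) -> None:
    """Enqueue one background job per ID range instead of migrating inline."""
    for start in range(1, max_id + 1, batch_size):
        queue.append((start, min(start + batch_size - 1, max_id)))


def migrate_range(rows: Dict[int, dict], start: int, end: int) -> None:
    """Per-batch data migration (hypothetical: copy `name` into `display_name`)."""
    for row_id in range(start, end + 1):
        row = rows.get(row_id)
        if row is not None and "display_name" not in row:
            row["display_name"] = row["name"].strip()


def run_worker(rows: Dict[int, dict], queue: Deque[Job], max_jobs: int) -> None:
    """Simulates a background worker that processes some, but not all, jobs."""
    for _ in range(min(max_jobs, len(queue))):
        start, end = queue.popleft()
        migrate_range(rows, start, end)


def finalize(rows: Dict[int, dict], queue: Deque[Job]) -> None:
    """Runs in a later deployment: finish any remaining work, then clean up."""
    while queue:  # jobs the workers have not picked up yet
        migrate_range(rows, *queue.popleft())
    # Safety net for rows whose jobs were lost or failed.
    if any("display_name" not in row for row in rows.values()):
        migrate_range(rows, min(rows), max(rows))
    for row in rows.values():  # cleanup: drop the old field
        del row["name"]


rows = {i: {"name": f"  user{i} "} for i in range(1, 11)}
queue: Deque[Job] = deque()
schedule_batches(max_id=10, batch_size=3, queue=queue)
run_worker(rows, queue, max_jobs=2)  # only part of the work is done in the background
finalize(rows, queue)
```

`finalize` checks for leftover rows instead of trusting that every job ran. As a result, lost or failed jobs cannot leave the data half-migrated when the cleanup step runs.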

While this setup allowed GitLab to migrate both small and large tables, as well as data stored outside the database, it also highlighted several problems with the migration system provided by Rails:

  • It only provides basic primitives for making structural changes, and nothing to help scale beyond them.
  • It does nothing to ensure migrations are timeless: nothing stops you from depending on application logic that may later change in unexpected ways, breaking the migration in the process.
  • It provides nothing for testing migrations, leaving you to roll your own solution.
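Since Rails leaves migration testing to you, one hand-rolled pattern has four steps. First rebuild the schema as it was before the migration, then seed some data, run only that migration, and assert on the result. The sketch below uses Python and `sqlite3` rather than Rails. The `projects` table and the `add_status_column` migration are hypothetical.

```python
import sqlite3


def add_status_column(conn: sqlite3.Connection) -> None:
    """Hypothetical migration under test: add a NOT NULL column with a default."""
    conn.execute(
        "ALTER TABLE projects ADD COLUMN status TEXT NOT NULL DEFAULT 'active'"
    )


def create_schema_before_migration(conn: sqlite3.Connection) -> None:
    """Recreate the schema as it looked before the migration (hand-written snapshot)."""
    conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def test_add_status_column() -> None:
    conn = sqlite3.connect(":memory:")
    create_schema_before_migration(conn)
    conn.execute("INSERT INTO projects (name) VALUES ('gitlab')")  # existing data
    add_status_column(conn)  # run only the migration under test
    rows = conn.execute("SELECT name, status FROM projects").fetchall()
    assert rows == [("gitlab", "active")], rows  # existing rows get the default
    conn.close()


test_add_status_column()
```

The pre-migration schema is written by hand rather than loaded from the current schema. The test therefore keeps exercising the same starting point even after later migrations change the `projects` table.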

https://yorickpeterse.com/articles/building-a-better-and-scalable-system-for-data-migrations