Database migrations

For the past few weeks I’ve been working on indexomator, a simple web application to track student entry and exit times during the student campus blockades. It’s the closest I’ve come working on a real project, almost all faculties in my city use it and it’s essential for them. Working on it gave me more experience and mostly confirmed my prejudices in regards to web development and programming as a whole. Today I’d like to discuss one particular aspect that gave my team a lot of trouble.

To set the scene, the application is written in typescript, uses svelte and svelte-kit for front end, PostgreSQL for the database, and drizzle-orm to access it. We use git for version control, to put it simply a new branch is created for every feature, and when the work on them is finished they are merged back into main.

Another thing I need to mention is that we get many feature requests for indexomator, and pretty much every single one requires changing the database schema. Essentially, that requires altering tables in the database on every update, and for that we generate database migrations, scripts that are run to apply appropriate changes. These migrations also have to be applied in a certain sequence, and old migrations mustn’t be rerun.

This is very annoying and finicky, but works as long as the changes are made and applied one after another. It fails hard however when you have multiple branches and they all change the schema.

What happens is that there are multiple features being worked on, each making its own separate changes to the schema. Drizzle generates a new file called something like 0009-random-name.sql, where the random name changes each time a migration is created. Drizzle also updates a file called _journal.json to keep track of the new migration. Let’s say there are 3 features being worked on, there were 6 migrations by this point, so each feature made a new 7th one, each calling the file something different. One feature gets finished and merged before the others, and now main has 7 migrations. Second feature now gets finished and is about to be merged in, but there’s a problem, you have 2 7th migrations. Git is ok with having 2 0007-...sql files, since they probably have different names, and it only sees a problem with the _journal.json file which is machine generated and not really possible to solve the conflict by hand (other than regenerating the migration as I’ll describe later). The changes to the schema itself aren’t in conflict, they’re almost always completely independent from one another. But they way drizzle manages them makes it impossible for them to be merged like other changes to the codebase.

The way we always fix it is that the developer on new branch resets the drizzle folder to main, regenerates their migration, and then the branch can be merged. But now they’re in trouble because their local database can’t be updated since the timeline of changes changed right under it. He now needs to poke around the database to make the migrations apply properly. This only gets worse as development history gets less linear. In one case we couldn’t rebase a branch onto another because an older migration that since got reset caused a conflict in the middle of the rebasing process.

Some blame lies on drizzle for the way it manages migrations, but this issue comes up no matter how one accesses the database or generates migrations. The problem is with the relational databases for making errors out of non-issues such as removing a column that doesn’t exist, adding a column that’s exactly the same as the one already in it, and further more one can’t recreate a table with some new columns and expect the database to do the right thing.¹

This sort of thing is absolutely infuriating and I’m honestly wondering how SQL databases became so popular. There’s so much infrastructure to handle changes to the database schema, something that’s as inevitable as death and taxes. And when it fails it fails drastically and the most one can do is try to keep the damage restricted to the people who know how to fix it. I’m honestly asking how so many applications get developed with SQL databases if the underlying technology is so allergic to schema changes. Pretty much any other way to store data doesn’t have this problem (except binary file formats).

Yes, after you waste an afternoon on this bullshit it becomes tolerable and doesn’t feel so bad, but it doesn’t make it go away. It’s just another software problem in a world that’s flooded with them. And like all others, nobody will fix it because it takes more effort than putting up with it over and over again.

Drizzle-kit has a command push that basically does the right thing, makes whatever changes are necessary to make the database match the schema. From what I can find by looking this can’t be done within the application, so it doesn’t work for our needs.↩︎