Robust data migrations in rails 6

4 min readFeb 15, 2021

As your application evolves your models change and so does your database. In Rails this is solved using migrations. If you aren’t familiar with the concept of migrations I would suggest you read the rails guides about migrations first.

Data migrations are just migrations that change data in the database.

Testing data migrations

Consider the following migration. It concatenates the first and last name into just a name. Silly example but it demonstrates the purpose.

Now this should work right? I mean the syntax seems correct and you have rolled back your database and ran it again it works perfectly. So time to move on, right? No not exactly.

You have forgotten to test your migration. It’s not very common and not perfect yet but you should still do it.

Firstly you can develop your migration test driven. There is no longer a need rollback and migrate the database a bunch of times. This is especially great when you add additional migrations after your data migration.
Secondly you can have the certainly of TDD that your migration is correct.

So let’s write a test for the the migration above. I will be using RSpec but it should work with most testing frameworks.

Create a new file in spec/migrations:

When your code is merged you can either delete the test or keep it for documentation purposes.

SQL for data migrations

There are some advantages and some disadvantages to writing your data migration in sql.

Advantages

First let’s look at the reason why this is even considered when you have the power of rails models.

After you developed your migration it will run fine with the current state of your model but after more changes by you or other developers you may run into issues.

Let’s say you took the above migration and used it in an actual project:

Again it works and it is even tested now.

A week into the future your coworker sees what you did and wants to correct your obvious mistake which is not separating them using a semicolon. His data migration will most likely look like this:

def change
  User.all.each do |user|
    user.update(name: user.name.gsub(' ', ';'))
  end
end

Additionally he creates a validation that ensures all names will be stored this way from now on:

validates :name, format: { with: /;/ }

An other week later all code changes are deployed and you sadly realise that the migration you wrote fails. The error message will be:

ActiveRecord::RecordInvalid (Validation failed: Name is invalid)

The reason the migration failed is because of a validation you didn’t know existed because it didn’t when you wrote the migration.

To prevent this from happening in the future you could use SQL in you data migration:

def change
  User.all.each do |user|
    execute "UPDATE users
             SET name='#{user.first_name} #{user.last_name}'
             WHERE user_id = #{user.id}"
  end
end

It won’t make your migration invincible from failing but it will be protected from model changes.

All statements should be written in SQL rather than using model. In the example above the User model is used. This can still cause issues. If you were to use a scope in the query and it changes it might fail or do something unexpected or the user table might get deleted all together and with it the User model. Then it will certainly fail. But SQL will still happily perform the queries as the database still exists in the same way as when you wrote the migration. So lets do it:

def change
  execute "UPDATE users 
           SET first_name =
           CONCAT(users.first_name, ' ', users.last_name)"
end

Even less code.

Disadvantages

Well to be honest. It’s kind of annoying. When using rails writing sql isn’t very common so you are most likely not very proficient in it and it will be slower than writing it using models.

Also it’s less readable. This is problematic when someone is reviewing your code or wants to understand what that one migration did 4 months ago.

Timestamps and timezones are tricky and normally rails would do all of this for you but here you have to figure it out on your own.

What to use now?

Well I can’t give you the definitive answer as with everything in live: It depends. But I can give you some advice on what you should consider in your decision:

Is your migration run before it gets out of sync with the model? If the code you are working on is frequently deployed it is less likely to be an issue as less changes happen at once.
Are many different people working concurrently on the code? If lots of people work on the same code as you it increases the likeliness of your code failing.
Do you care about non development and production environments Besides development and production you may have a testing environments and every developer has one of his own. The question here is whether you care about the database. If it is tedious and time intensive to setup then you want to prevent the migrations from becoming not runnable. Else you can just drop the database and load it from the schema. But your co-workers might not know this and it can cost them valuable time.
Do you care about the migrations running from start to finish? At some point a data migration will get out of sync if it isn’t written in sql and you won’t be able to run the migrations from start to finish.
What is the norm at your company / project? If everybody is already always using sql then i would recommend you do so too. You can always discuss it’s value but don’t start a war on it. The same goes the other way around.

Thanks for reading ❤