Performing large-scale software migrations with confidence

During large migrations, we always invest in correctness tests. I'll share how we apply this to compilers, interpreters, and generic data, plus some caveats.

by Toine Hartman on 18 Feb 2025

Roundtrip software changes visualized

Any substantial change to critical software should be backed by high confidence in its correctness. Testing can be difficult when working with existing software systems: automated tests might be sparse and hard to write effectively, while manual testing practices are often error-prone and labor-intensive.
During the reverse-engineering of legacy systems in particular, correctness can be challenging to define and components might interact in non-obvious ways. Simply “writing tests” is therefore not as simple as it sounds.

In this post, we discuss round-trip testing, a rigorous approach to testing the backwards compatibility and correctness of large migrations and refactorings.

Migrating with confidence

At Swat.engineering, we are often involved with large software migrations. Some examples:

  • Migrating from a legacy binary data format to a textual one (and migrating tens of gigabytes of data).
  • Replacing a large code base in a general-purpose language with an implementation in a DSL.
  • Extending a DSL with backwards-compatible language features (optionally rewriting old code).

Regardless of the nature and extent of the migration, there is a pre-migration (legacy) state and a post-migration (new) state. Before adopting the migration, we need strong evidence that it can be trusted, i.e. that the states before and after are equivalent. Two important properties need to be asserted here:

  1. Anything that could be expressed in the legacy format can be expressed in the new one (i.e. backwards compatibility).
  2. No information is lost during migration.

These are properties that need to be checked at a fairly high level. We present our approach to testing these large software migrations.

Migration validation

We use a specific testing approach to test for these properties. The idea is as follows: given the pre-migration state (be it legacy code, data, or otherwise - ‘data’ from here on), transform it to the post-migration state (new data format, a domain-specific code base, a new version of the DSL, etc). Check that the input and output are equivalent.

This requires two things:

  1. A transformation from pre to post. We need this anyway to migrate the existing data before we can start rolling out the migrated software to staging and production environments.
  2. A notion of equivalence, to compare the datasets, and an implementation of this equivalence.
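As a minimal sketch of these two ingredients (the names `migrate`, `equivalent`, and the toy record format are hypothetical, standing in for the project-specific transformation and equivalence relation):

```python
# Hypothetical sketch: `migrate` is the pre-to-post transformation,
# `equivalent` is the equivalence relation between the two states.

def migrate(legacy: dict) -> dict:
    """Transform a pre-migration record into the new format (toy example)."""
    return {"name": legacy["n"], "value": legacy["v"]}

def equivalent(legacy: dict, new: dict) -> bool:
    """Check that no information was lost in the migration."""
    return legacy["n"] == new["name"] and legacy["v"] == new["value"]

def validate(dataset: list[dict]) -> list[dict]:
    """Return the records for which the migration is not equivalence-preserving."""
    return [rec for rec in dataset if not equivalent(rec, migrate(rec))]

failures = validate([{"n": "a", "v": 1}, {"n": "b", "v": 2}])
assert failures == []  # every record survived the migration intact
```

The real transformation and equivalence check will of course be far more involved, but the validation harness keeps this shape: run the migration over the whole dataset and collect every record that breaks equivalence.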

The exact validation approach depends on which tools operating on the data are already available.

Using external tools

If we are extending an existing DSL with new language features, for example, we can use the existing compiler (or interpreter) in our testing procedure. Our changes to the language require us to develop an updated version of the compiler as well. By comparing the outputs of the two versions of the compilers (or interpreters), we can establish equivalence.

Roundtrip with comparison of compiler outputs

In cases where we cannot check the equivalence of the compiled executables directly, a custom equivalence relation (e.g. executing both executables and comparing their outputs) can provide a solution. If we establish equivalence, we know that our migration and our updated compiler can handle any input (given enough diversity in the test data, that is).
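The idea of comparing observable behaviour rather than binaries can be sketched as follows (the toy "compilers" `legacy_compile` and `new_compile` are hypothetical stand-ins for the two compiler versions):

```python
# Hypothetical sketch: two compiler versions with different internals.
# We compare the behaviour of their outputs, not the outputs themselves.

def legacy_compile(src: str):
    # Toy "compiler": turns "add 1 2" into a function computing the sum.
    _, a, b = src.split()
    return lambda: int(a) + int(b)

def new_compile(src: str):
    # Updated toy compiler: different implementation, same behaviour.
    parts = src.split()
    return lambda: sum(int(x) for x in parts[1:])

def behaviourally_equivalent(src: str) -> bool:
    # Execute both compiled artifacts and compare their results.
    return legacy_compile(src)() == new_compile(src)()

assert behaviourally_equivalent("add 1 2")
assert behaviourally_equivalent("add 40 2")
```

In a real setting the two artifacts would be executed as processes on a corpus of representative inputs, but the comparison has the same structure.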

Using reproduction

When the tools that consume the data are not suitable for use in the testing procedure (e.g. UI tools), we can use a slightly different approach. Instead of comparing new data, we compare legacy data. We enable this by implementing the inverse of the migration (i.e. ‘reproduction’) and comparing the original legacy data with the reproduced copy.

Roundtrip with data reproduction instead of compiler output

If we establish equivalence, we know that our migration and reproduction are correct and can handle any input (given enough diversity in the test data). A downside is that this requires implementing the reproduction step just to enable testing.
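As a minimal sketch of this reproduction roundtrip (assuming a toy legacy binary-ish format; `migrate` and `reproduce` are hypothetical names):

```python
# Hypothetical sketch: `migrate` converts legacy data to the new textual
# format, `reproduce` is its inverse, implemented only to enable testing.

def migrate(legacy: bytes) -> str:
    return legacy.decode("ascii")

def reproduce(new: str) -> bytes:
    return new.encode("ascii")

def roundtrip_ok(legacy: bytes) -> bool:
    # Compare the original legacy data with its reproduced copy.
    return reproduce(migrate(legacy)) == legacy

assert roundtrip_ok(b"record-1;42")
```

The key point is that equivalence is checked on the legacy side, so the UI tools (or other consumers of the new data) never need to participate in the test.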

Does this mean that everything is okay now? Well – in practice – there are some caveats.

Caveats

Some caution should be applied when using these testing approaches.

Malformed or stale data

If some of the test data is malformed, it might not be possible to transform and reproduce it. Similarly, if it contains duplicate or stale data that the migration cleans up, that data cannot be reproduced.

In both cases, equivalence might fail. Some possible solutions are to skip that chunk of data during equivalence checking or to adapt the target design to support malformation/duplication (for example, to add some kind of junk/legacy bucket).
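The "junk bucket" idea can be sketched like this (the record format and the `parse` helper are hypothetical):

```python
# Hypothetical sketch: records that fail to parse are routed to a junk
# bucket instead of aborting the equivalence check for the whole dataset.

def parse(record: str) -> "dict | None":
    try:
        key, value = record.split("=")
        return {key: int(value)}
    except ValueError:
        return None  # malformed record

def partition(records: list) -> tuple:
    parsed, junk = [], []
    for rec in records:
        result = parse(rec)
        if result is None:
            junk.append(rec)
        else:
            parsed.append(result)
    return parsed, junk

parsed, junk = partition(["a=1", "garbage", "b=2"])
assert parsed == [{"a": 1}, {"b": 2}]
assert junk == ["garbage"]
```

Equivalence is then checked only over the parsed records, while the junk bucket is reported separately so that malformed data is still accounted for rather than silently dropped.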

What is equivalence?

Equivalence cannot be checked unless we define what it is. We could use byte-wise or textual equality in many cases. However, when we are dealing with the aforementioned anomalies, or when our data contains metadata (e.g. time stamps), byte-wise equality will flag differences that are expected, and interpreting those differences is not straightforward.

A more robust solution is equality modulo expected changes – a custom implementation that ignores certain expected differences. Although this solves most of the issues above, it might compromise correctness claims if it is not designed carefully.
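One common shape for such an equivalence is to normalise both sides before comparing, stripping exactly the fields that are expected to differ (here a hypothetical `timestamp` field):

```python
# Hypothetical sketch: equality modulo expected changes, implemented by
# normalising away metadata that legitimately differs between versions.

def normalise(record: dict) -> dict:
    return {k: v for k, v in record.items() if k != "timestamp"}

def equal_modulo_expected(a: dict, b: dict) -> bool:
    return normalise(a) == normalise(b)

old = {"id": 7, "value": "x", "timestamp": "2024-01-01T00:00:00"}
new = {"id": 7, "value": "x", "timestamp": "2025-02-18T09:30:00"}
assert equal_modulo_expected(old, new)
assert not equal_modulo_expected(old, {"id": 8, "value": "x"})
```

The danger mentioned above lives entirely in `normalise`: every field it ignores is a field the test can no longer catch bugs in, so the ignore list should be as small and as well-reviewed as possible.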

Mirrored bugs in conversions

If bugs that are each other's inverse exist in multiple stages of the round-trip procedure (in the migration and in the reproduction or updated compiler), equivalence could succeed even though the migration or updated compiler is buggy.

We can detect these bugs automatically by introducing deliberate changes to the test data during the round-trip testing routine. Any change made should show up in both the migrated and reproduced/compiled data. If either of the conversions does not propagate the change, it indicates a bug.
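A minimal sketch of this deliberate-change check (with hypothetical `migrate`/`reproduce` stand-ins for the real pipeline):

```python
# Hypothetical sketch: inject a deliberate change into a test record and
# verify that it survives the full roundtrip. If either conversion drops
# the change, a pair of mirrored bugs is likely hiding the problem.

def migrate(legacy: dict) -> dict:
    return {"name": legacy["n"]}

def reproduce(new: dict) -> dict:
    return {"n": new["name"]}

def change_propagates(legacy: dict) -> bool:
    mutated = dict(legacy, n=legacy["n"] + "!")   # the deliberate change
    roundtripped = reproduce(migrate(mutated))
    # The roundtripped record must differ from the *original*, proving
    # that both conversions actually propagated the mutation.
    return roundtripped["n"] != legacy["n"]

assert change_propagates({"n": "record"})
```

This is essentially mutation testing applied to the roundtrip itself: the mutations exercise the conversions rather than the system under test.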

Do not extend while migrating

Often, these migrations are motivated by the desire to extend the feature set of the software. In this case, it is highly advisable to split the work into two distinct stages.

  1. Migrate to redesigned format.
  2. Extend the new format with new (backwards compatible) features.

For each of those stages, correctness should be verified separately. If migrating to a new format is combined with the addition of features, then any faults that come up during testing (or even worse, in production) will be very hard to track down.

Key takeaways

  • Round-trip testing can boost the confidence in large software migrations that are hard to test otherwise.
  • Think carefully about which method you use to determine equivalence.
  • High confidence in data migrations, refactorings, or DSL extensions can be obtained using a representative set of input data and proper testing routines.

Get in touch

Are you struggling with migrations because of concerns about correctness or completeness? Or are you curious about robust testing approaches? Then reach out to us. We look forward to discussing how our solutions could help you solve the challenges you’re facing.
