This week saw the publication of Slaughter & May’s long-awaited independent report into the issues that followed TSB’s 2018 migration to a new platform hosted by SABIS. Beyond setting out the facts behind the many issues faced during the migration, the report offers a number of key learnings. Here we’ve highlighted the most important ones for any large company looking to undertake, or already undertaking, a migration project of this scale.
Cheapest and fastest isn’t always best
TSB opted for a Big Bang migration. More often than not this is the cheapest and fastest option, but it is also the riskiest. Choosing a Big Bang isn’t necessarily the wrong decision, and “bumps in the road” were expected, but in this instance it involved a new platform with an incredibly complex technology stack. With this approach, bumps in the road always have the potential to become mountains. Pulling off such a migration would have required the highest possible confidence in the platform, earned through rigorous testing.
Some things are non-negotiable
This project was clearly massive in size. However, no matter the scale, some fundamental rules are simply non-negotiable. Firstly, a clear understanding of all the risks involved, and a plan for mitigating them, should always be established before a go-live date is set. According to the report, the initial timetable was ambitious and unrealistic, and it was estimated before all of the requirements were fully known. However tempting it is to fix a date early, doing so puts the whole project at risk, because teams start taking shortcuts as they scramble to meet an unrealistic goal.
Secondly, functional and non-functional testing is not a corner that can be cut.
The review found that the team carrying out functional testing was separate from the development team responsible for fixing the issues it flagged. This isn’t necessarily unusual in a platform migration of this size. However, it makes it really hard to keep all teams aligned, and it risks becoming a “throw it over the fence” job in which accountability is lost. The problem is compounded when assumptions are made about consistency and accuracy. In this case only one of the two data centers was tested, and it was assumed that both were configured consistently. Only after the migration had failed did it emerge that they were not. Data centers are incredibly complex, so even if you trust your teams implicitly to configure them correctly, failing to test every one of them is not good practice.
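To illustrate the point, a simple check along the following lines could flag configuration drift between data centers before go-live. This is a minimal sketch, assuming each data center’s settings can be exported as a flat dictionary of key/value pairs. The function name `config_drift` and the example settings are hypothetical and do not come from the report.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder shown when a key exists in only one data center


def config_drift(primary: Dict[str, Any], secondary: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return every key whose value differs between two data-center configs.

    Keys are compared in sorted order. A key present on only one side is
    reported with MISSING for the side that lacks it.
    """
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(primary) | set(secondary)):
        a = primary.get(key, MISSING)
        b = secondary.get(key, MISSING)
        if a != b:
            drift[key] = (a, b)
    return drift


# Hypothetical settings exported from each data center, for illustration only.
dc1 = {"db_pool_size": 200, "tls_version": "1.2", "cache_ttl_s": 300}
dc2 = {"db_pool_size": 50, "tls_version": "1.2"}

for key, (a, b) in config_drift(dc1, dc2).items():
    print(f"{key}: dc1={a!r} dc2={b!r}")
```

An empty result is necessary but not sufficient. Matching configuration is no substitute for actually running the tests against every data center.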
The report also found that, during performance testing, test targets were lowered after the initial tests failed to reach the original target load. Holding the platform to the original target, rather than moving the goalposts, really is non-negotiable for ensuring operational resilience.
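One way to make this rule hard to bend is to agree the target load before testing begins and have the go/no-go check compare every run against that fixed target. The sketch below is purely illustrative. The names `LoadTarget` and `passes_gate` and the throughput and latency figures are hypothetical, not taken from the report. Freezing the dataclass means that any attempt to edit the target in code raises an error instead of silently succeeding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadTarget:
    # Agreed up front from expected peak load; frozen so it can't be
    # quietly reassigned after a failing run.
    requests_per_second: float
    p99_latency_ms: float


def passes_gate(target: LoadTarget, achieved_rps: float, p99_ms: float) -> bool:
    """True only if a run meets the original target on both measures."""
    return achieved_rps >= target.requests_per_second and p99_ms <= target.p99_latency_ms


# Hypothetical target and test-run figures, for illustration only.
TARGET = LoadTarget(requests_per_second=1000.0, p99_latency_ms=500.0)

if passes_gate(TARGET, achieved_rps=650.0, p99_ms=900.0):
    print("go")
else:
    print("no-go: fix the platform, not the target")
```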
Operational resilience is essential
One of the biggest learnings that can be taken from this report is the utmost importance of having operational resilience before going live. In this instance, after the official “go live” date, the capacity to respond to incidents simply wasn’t there. Only once there is unanimous agreement that the platform is operationally resilient, and a full plan has been put in place to mitigate any genuine bumps in the road, should a project go live.
The TSB migration may not be the best example of how to execute a large platform migration, but the report’s findings still offer a number of valuable learnings. These range from ensuring absolute operational resilience before launch to standing firm on non-negotiable processes. By taking them into your next project, you can avoid many of the issues encountered in the TSB platform migration.
If you’d like any more information or to continue the conversation, please get in touch:
Mark Debney
Director, DevOps
mark.debney@6point6.co.uk
Originally published at https://6point6.co.uk.