13 January 2025 / Opinion

What is Data Version Tracking, and why do I need it?

Ben Kentzer / Head of Data Engineering

Change happens. 

Whether it’s a company policy changing because of new legislation, a new version of a campaign selection to pick up a new audience, or a new version of a credit scorecard, it can be critical to retain the details of previous versions. This allows you to look back and understand why particular decisions were made at the time.

Developers, engineers and data scientists all understand the importance of tracking changes to their code, for much the same reason: it helps future users see what was done and why.

Documents can use “Track Changes” or a similar feature, and previous versions can be retained either by saving renamed copies or by using a version-aware filing system such as SharePoint.

Program code is typically stored in a version control system such as Git, which records each change as it is committed and supports collaboration and review before new code is pushed into a live environment.

But what about data? 

Often it’s important to know why you made a particular decision about an individual, but the data has since moved on and the information held about that customer has changed.

Imagine selecting a customer for an email campaign, and they then complain that they haven’t given permission. You check the database, and they are right: there is a “No” in the email permissions box. GDPR means that you must keep the source and date of the change, but you don’t know what the value was before it, so you can’t show whether they had given permission when you selected them.

It’s not practical to keep a full snapshot of the database every time you perform analysis, score a model or select a list for mailing. So how do you deal with it?

Back in the 1980s, a chap called Ralph Kimball led a team that developed a standard method of updating data. This framework, known as “Slowly Changing Dimensions” and still used today, sets out rules for how data should be updated when it changes; the idea is that you pick the rule best suited to your specific use case.
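To show why the choice of rule matters, here is a minimal Python sketch of two of the best-known Slowly Changing Dimension rules: Type 1, which simply overwrites a value, and Type 2, which closes off the old row with an end date and adds a new “current” row. The row layout (`customer_id`, `email_permission`, `valid_from`, `valid_to`, `is_current`) and the function names are hypothetical, kept in memory for simplicity, and are not a description of Almanac’s internals.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


# Hypothetical customer dimension row, for illustration only.
@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    email_permission: str
    valid_from: date
    valid_to: Optional[date]  # None means the row is still valid
    is_current: bool


def scd_type1_update(rows, customer_id, new_value):
    """Type 1: overwrite the value in place, so the previous value is lost."""
    return [
        replace(r, email_permission=new_value) if r.customer_id == customer_id else r
        for r in rows
    ]


def scd_type2_update(rows, customer_id, new_value, change_date):
    """Type 2: close off the current row and append a new current row."""
    result = []
    for r in rows:
        if r.customer_id == customer_id and r.is_current:
            if r.email_permission == new_value:
                return rows  # value unchanged, so no new history row is needed
            # Keep the old value, but record the date it stopped applying.
            result.append(replace(r, valid_to=change_date, is_current=False))
        else:
            result.append(r)
    result.append(CustomerRow(customer_id, new_value, change_date, None, True))
    return result


# Hypothetical example: customer 42 withdraws email permission.
start = [CustomerRow(42, "Yes", date(2024, 6, 1), None, True)]
t1 = scd_type1_update(start, 42, "No")
t2 = scd_type2_update(start, 42, "No", date(2024, 10, 15))
```

After the Type 1 update only “No” survives, which is exactly the situation in the email permission example. After the Type 2 update the earlier “Yes” is still there, together with the dates it applied.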

The main advantage of this approach (and the way we use it within Almanac at Jaywing) is that we can retain the dates and values of every change to the data from the moment we start loading it. It introduces the concept of “current” and “historic” data. The “current” data serves most everyday uses of the data. The “historic” data is there to help show trends for individuals, but also to capture the value of the email permissions at any point in time, along with every other change.
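With history kept this way, the campaign question becomes a simple lookup. The sketch below uses hypothetical names and made-up sample rows rather than Almanac’s actual schema. It shows the “current” view (the row with no end date) alongside a point-in-time query that returns the email permission as it stood on a given date, treating each row as valid from `valid_from` up to, but not including, `valid_to`.

```python
from datetime import date
from typing import NamedTuple, Optional


class PermissionHistory(NamedTuple):
    """One hypothetical history row for a customer's email permission."""
    customer_id: int
    email_permission: str
    valid_from: date          # first day the value applied
    valid_to: Optional[date]  # first day it no longer applied; None = current


# Made-up sample history for illustration only.
history = [
    PermissionHistory(42, "Yes", date(2024, 6, 1), date(2024, 10, 15)),
    PermissionHistory(42, "No", date(2024, 10, 15), None),
]


def current_value(rows, customer_id):
    """The "current" view: the row with no end date."""
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None:
            return r.email_permission
    return None


def value_as_of(rows, customer_id, as_of):
    """The "historic" view: the value that applied on a given date."""
    for r in rows:
        if (
            r.customer_id == customer_id
            and r.valid_from <= as_of
            and (r.valid_to is None or as_of < r.valid_to)
        ):
            return r.email_permission
    return None  # no record for this customer on that date
```

With this sample history, `value_as_of(history, 42, date(2024, 9, 1))` returns `"Yes"`, so a selection made in September would have been legitimate, while `current_value(history, 42)` returns `"No"`.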

Jaywing’s Almanac solution allows us to retain these updates efficiently, while ensuring that users always have access to the latest data. Change tracking happens automatically as part of the data load, so once it’s in place it simply keeps adding to the history.

Almanac is a one-stop data gathering, storage and management system that allows us to store your data securely, whilst giving both you and our analysts the access to deep dive into the data, view top-level dashboards, or anything in between!

It is custom-built for you using our framework. Many sources are “plug and play”, and we can link to most other data sources with a little development.

Of course, it’s not just the data content that we track changes to: we also track the designs, the program code, the configuration information, and even the database structures themselves.

You may never need to check if that selected individual was eligible for that campaign three months ago. However, with version-controlled data, you are safe in the knowledge that you can.