Data migration

As I haven’t had much time to focus on the cool, non-work-related stuff, I figured I could share my view of the task at hand: data migration.

We have spent a lot of time waiting for the data, as neither we nor our customer control when it arrives. We eventually received some test data, and I drafted a few points for the team to work on.

  1. When the data truly comes from outside the organization, the first thing to ask for is the metadata. If the data is to be inserted into a database, the first question should be: how is the data structured? That means the fields, their lengths and types, and the volume of data. Once you know those things, you can start preparing the database and even create your own mock-up data.
  2. The second task is the data mapping. That always requires business knowledge, but the destination data structure is usually known, so you can do a first pass of the mapping without the business people if they are not available. Get this right before you continue with the actual development. It’s always costly to fix things later.
  3. Be aware of the lookup or reference data. That data should also be maintained centrally in an MDM (master data management) system or similar, as that helps when the migration runs as a continuous process.
  4. Separate the initial/staging data from the work-in-progress data and the result data. Preferably keep the different stages of data in different databases. This is also a security issue, as you don’t want to expose the initial or interim data to end users or other systems.
  5. Test, test and test. Unit test the data flow tasks that are small enough, and also test any components that transform or filter the data.
  6. Performance test the data flows with data sets that are as complete as possible. What runs fast with 5% of the data might collapse when you run it with the full data set. Again, getting this right as early as possible is better, faster and cheaper than trying to fix performance issues in production.
  7. Create data quality metrics/reporting. Even though you might not own the data, it’s your responsibility to address data quality issues by letting the data owners or source data extractors know about them. Clear and transparent DQ metrics will help the whole organization, even though some people might frown and see them as “name and shame”. In a data-driven company the metadata is always appreciated; it should guide the decisions and not be the news itself.
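
To make points 1 and 7 a bit more concrete, here is a minimal Python sketch of how the agreed metadata could double as a data quality check on incoming rows. Everything in it is an assumption made for illustration: the `FieldSpec` class, the field names, the lengths and the issue labels are hypothetical and not taken from our actual project.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical metadata for one source field."""
    name: str
    kind: type            # type the raw text should parse into, e.g. int or str
    max_length: int       # maximum length of the raw text value
    required: bool = True


def check_row(row, specs):
    """Return (field, issue) pairs for one raw row given as a dict of strings."""
    issues = []
    for spec in specs:
        raw = row.get(spec.name)
        if raw is None or raw == "":
            if spec.required:
                issues.append((spec.name, "missing"))
            continue
        if len(raw) > spec.max_length:
            issues.append((spec.name, "too_long"))
        try:
            spec.kind(raw)  # e.g. int("x2") raises ValueError
        except ValueError:
            issues.append((spec.name, "bad_type"))
    return issues


def quality_report(rows, specs):
    """Count every (field, issue) pair across all rows."""
    counts = Counter()
    total = 0
    for row in rows:
        total += 1
        counts.update(check_row(row, specs))
    return {"rows": total, "issues": dict(counts)}


# Assumed example metadata and mock-up data, for illustration only.
SPECS = [
    FieldSpec("customer_id", int, 10),
    FieldSpec("name", str, 5),
    FieldSpec("country", str, 2, required=False),
]

MOCK_ROWS = [
    {"customer_id": "1", "name": "Anna", "country": "FI"},
    {"customer_id": "x2", "name": "Bob", "country": ""},
    {"customer_id": "3", "name": "", "country": "FIN"},
]

if __name__ == "__main__":
    print(quality_report(MOCK_ROWS, SPECS))
```

The per-field counts are the kind of numbers you could hand back to the data owners. A real migration would more likely run such checks inside the database or the ETL tool, so treat this only as a sketch of the idea.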

Not sure if you agree, but I find these to be common things that I would plan into any data migration project.

