Which comes first: “the BI” or “the Data Cleansing”?

Nope, this is not a “chicken or egg?” question (and the answer is the egg was first by the way – email me if you want to know why).

Do you implement a BI system on top of a database that you know has terrible data? Or do you clean up the data first? These are the questions our clients face every day. Here is one story about a CIO who regretted trying to clean his data first.

"Fred"* and his team faced the task of delivering operational reporting through a yet-to-be-selected BI tool. Fred knew the operational systems had bad data. It wasn’t so much the numbers (the inventory amounts and sales amounts) as the naming conventions, IDs, locations, mixed-up SKUs, inconsistent categories, and more.

Fred hired a team of three to clean the data, and the project was scheduled for six weeks. 52 weeks later, they were still not done. Ouch! But the cost was justified: get this data right, and the business gets visibility into its progress, which helps growth. They can't rely on the P&L alone because it only comes out once a month. They need daily insights, so cleaning the data is crucial.

Coincidentally, the Finance team had selected CALUMO for financial reporting and budgeting. Fred knew about our project. One day, I asked if I could ‘play’ with the latest data set in CALUMO. Fred thought it would be interesting, and in his mind, he was testing me. So I went ahead and loaded the data as best I could. After 52 weeks of ‘cleaning’, I was shocked at the mess. By looking at the data in CALUMO, I could see all the issues in moments.

After only a few hours of playing with the data, I showed Fred. His jaw dropped. I had spent only five minutes with him when he called in his data cleaning team. The next two hours were spent reviewing their data in CALUMO. Fred said, “You just turned on the lights. This is like walking into a kitchen, flicking on the lights and seeing the roaches run. And naturally, you go for the big ones first. All this time, we had the lights off and were almost randomly spraying roach spray.”

Fred openly says he is still kicking himself for not putting a BI system on their data first, so they could see where the real issues were rather than making ‘educated’ guesses. Before CALUMO, Fred's team were running T-SQL statements without connecting the data or building a picture of the business outcome. The data cleansing project finished about 8 weeks after that, a little longer than the initial estimate 😊

If you want to know more about CALUMO, or comment about chickens/eggs, drop me an email at wleitch@calumo.com.

*name changed for this story