
The How and Why of Data Cleansing

insightsoftware -
July 26, 2021

insightsoftware is a global provider of reporting, analytics, and performance management solutions, empowering organizations to unlock business data and transform the way finance and data teams operate.


The value of data in business has grown exponentially as more business leaders understand its importance in informed decision-making. However, truly valuable insights can only be realized if the data relied on is accurate. Analysis and forecasts reflect the quality of the data that feeds them, so relying on “dirty” data can cause more harm than good.

In this blog, we explore how you can help cleanse your data by:

  • Using your “dirty” data sooner rather than later.
  • Shifting your organizational data culture.
  • Implementing a corporate performance management (CPM) solution.

How Much is “Dirty Data” Costing You?

According to The Data Warehousing Institute (TDWI), “dirty” data is costing US companies around $600 billion every year in lost revenue, missed opportunities, and ill-informed strategic decision-making. Data is “dirty” when it is incorrect, incomplete, irrelevant, duplicated, missing, or improperly formatted.

For many organisations, dirty data is caused by:

  • Collecting data haphazardly over the years through multiple sources.
  • Using older technology or systems that can’t keep up with current data demands.
  • Trying to condense large, complex datasets into a more manageable form.
  • Integrating systems with duplicated or mislabelled data.
  • Linking different data sets after a business merger or acquisition.
  • Users who lack understanding of data systems and how to use them.

Air Your Dirty Laundry, Don’t Wait for “Clean Data”

You may be thinking, “When is my data ‘clean enough’ to be useful?”

The hard truth is that your data is in a constant state of decay. Achieving clean data is a time-consuming and ongoing journey. There are many tactics to help you realise this goal, but it could take several months, if not years, for your data to be considered “clean.”

That’s why we believe in using your data now, and we aren’t the only ones.

According to research completed by the University of Texas, increasing data usability by 10% would boost annual revenue for Fortune 1000 companies by more than $2 billion.

Benefits of Publishing Your “Dirty Data” Early

Don’t wait to publish your data. A significant benefit of publishing your data early, through a platform like Calumo, is that it makes people more accountable. Share the data cleansing load with those responsible for collecting it, allowing you to use your data to better effect sooner.

Increase Staff Accountability

More often than not, finance teams find it difficult to engage the business and drive accountability in the data collection process. Publishing and sharing your data, even before it is 100% clean, helps educate employees as to the importance of accurate information and increases ownership of the numbers. Automated reminders from your CPM system can highlight missing data to ensure best practice across your organisation.

Identify and Correct Inconsistencies Sooner

The use of reports, visualisations, and dashboards in sharing information helps with quick distribution of data, so that the wider business can easily identify and investigate inconsistencies or anomalies.

Teams responsible for inputting data will see the roll-on effect sooner rather than later and be able to make changes to correct current and future mistakes.

Clean Data, First Time

In the longer term, those involved in data collection will be more deliberate in their data entry, because they’ll know what information is important. Six Sigma practitioners refer to this as “First Time Right,” and the benefits of even striving for it are immense.

Before putting the data to use, you need to develop a strategic framework to determine potential data security risks and applicable compliance requirements. Working with our professional consultants and a solution like Calumo can help you address these points.

Four Tactics to Keep Your Data Clean

1. Shift Organizational Data Culture Through Clear Communication and Leadership Buy-In

The most effective way to ensure clean data is to make it a priority across your business. Leadership buy-in is essential.

Clearly communicating why clean data is important to all staff will help shift your organisation’s data culture. Make ‘data management’ a recurring agenda item in weekly meetings or stand-ups, encouraging staff to raise issues or concerns.

Arrange meetings with data collectors to discuss best practices and provide regular updates on any process or system changes.

2. Provide User Training and Education

Providing training and education not only encourages system adoption among staff, it also ensures complete and accurate data entry from the outset.

Training should include practical skills for how and what data needs to be entered, but also information on what constitutes clean data.

Regular, periodic training of all staff, not just new staff, eliminates any bad data habits, like corner-cutting, that may have been learned. Here you can also teach best practices, such as updating any incorrectly formatted data and checking for existing data prior to creating new entries.

3. Configure Your Database or System

Limit potential errors by configuring your database or system to only accept data that is the required type and format. You can do this using the following methods where appropriate:

  • Set up mandatory data fields to ensure all critical information is complete and accurate.
  • Create a drop down with set entries so users can’t enter irrelevant content.
  • If you know that a field requires a certain number or type of characters, restrict the field to that type and maximum length, so users can’t enter additional information.
  • If multiple users access your data, keep it secure by providing access rights for users appropriate to their role.
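The first three rules above can be sketched as database constraints. This is a minimal illustration using Python's built-in sqlite3 module, not a description of how Calumo or any particular CPM product is configured. The table name, column names, allowed regions, and four-digit postcode rule are all hypothetical. Role-based access rights are left out because SQLite has no user roles.

```python
import sqlite3

# Hypothetical schema: the table, columns, and allowed values are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    id       INTEGER PRIMARY KEY,
    -- mandatory field: must be present and not blank
    name     TEXT NOT NULL CHECK (length(trim(name)) > 0),
    -- fixed list of entries, like a drop-down
    region   TEXT NOT NULL CHECK (region IN ('APAC', 'EMEA', 'AMER')),
    -- type and size limit: exactly four digits
    postcode TEXT NOT NULL CHECK (length(postcode) = 4 AND postcode NOT GLOB '*[^0-9]*')
);
"""

def try_insert(conn, name, region, postcode):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, region, postcode) VALUES (?, ?, ?)",
                (name, region, postcode),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

accepted = try_insert(conn, "Acme Pty Ltd", "APAC", "2000")        # valid row
missing_name = try_insert(conn, None, "APAC", "2000")              # mandatory field empty
bad_region = try_insert(conn, "Acme Pty Ltd", "Mars", "2000")      # not in the fixed list
long_postcode = try_insert(conn, "Acme Pty Ltd", "APAC", "20001")  # too many characters
```

Only the first row is stored. The other three are rejected at the point of entry, which is the behaviour the rules above aim for.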

4. Assign a Data Champion

Assigning a dedicated person to administer your processes will help to maintain data consistency and make your database easier to manage. As part of their role, the data champion not only monitors and cleans system data but improves data collection processes.

Your data champion can help empower staff to adopt best practices and ensure leadership buy-in.

Keep your data clean by upgrading your systems to a robust CPM solution, like Calumo, to support your data collection and analysis processes.

Which Comes First, Data Cleanse or CPM Solution?

A question we hear every day from our prospective customers is: “Is it worth implementing a CPM system if our database is not in good shape and needs a cleanse?”

Anyone who has undertaken a data cleanse in preparation for a database migration knows that it can be complicated, time-consuming, and expensive. It is no small task. It needs to be thorough and process-driven, or you will just end up back where you started.

Your team needs time to understand trends, isolate errors, control data entry points, cleanse the data, redevelop strong processes, and then test and repeat. What looks like a sequential process is, in fact, an iterative and ongoing one. But please don’t let that deter you. We’re here to help.

What many people don’t know is that the right CPM solution can significantly expedite your data cleanse process. This is especially true in the early stages, where oversight of your current data set can really help your team understand trends and isolate errors. This early insight is essential for long-term success.

How CPM Can Help Cleanse Your Data

A CPM solution can help at different stages of your data cleanse process.

  1. Inspection

At this stage, data is inspected to detect unexpected, incorrect, missing, and inconsistent data. A CPM solution brings together disparate data into a single platform, providing a single source of truth. Your team can inspect the full data set more easily, which makes it easier to identify trends, inconsistencies, and missing data.

Data Profiling

A statistical summary of your data helps assess its quality. Not only does the process of implementing a CPM solution help with data profiling, CPM solutions can also identify, flag, and report outliers and errors.
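To make the idea of a statistical summary concrete, here is a minimal sketch in Python using only the standard library. It is not how any CPM product profiles data. The function name, the sample figures, and the choice of the common 1.5 × IQR rule for flagging outliers are all assumptions made for illustration.

```python
from statistics import mean, median, quantiles

def profile_column(values):
    """Return a small statistical summary of a numeric column.

    None marks a missing value. Outliers are flagged with the
    1.5 * IQR rule, which is an assumption made for this sketch.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "median": median(present),
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical invoice amounts: one is missing and one looks mistyped.
invoices = [100, 102, 98, 101, 99, 103, 97, 5000, None]
summary = profile_column(invoices)
print(summary["missing"], summary["outliers"])  # prints: 1 [5000]
```

A flagged value is only a candidate for review. Whether 5000 is an error or a genuine large invoice still needs someone who knows the data.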

Visualizations

Visualisations present data in a way that is easy for everyone to understand, not just finance experts. Interactive reports, dashboards, and other tools allow your team to analyze data quickly to find values that are unexpected and therefore potentially erroneous.

  2. Cleaning

Cleaning data comes next and involves fixing or removing the anomalies discovered. A CPM solution can automate and standardize this process. By reducing manual manipulation of data, you also reduce the likelihood of errors. Configuring your system and standardizing data entry fields can prevent duplicate data and inconsistent formatting, both of which affect the quality of your data.
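As a rough illustration of what automated cleaning involves, the sketch below standardizes formatting and then removes duplicates. It is written in plain Python and is not taken from any CPM product. The record layout, and the decision to treat the email address as the key that identifies a duplicate, are assumptions.

```python
def clean_records(records):
    """Standardise formatting, then drop duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for record in records:
        # Collapse repeated whitespace and apply consistent capitalisation.
        name = " ".join(record["name"].split()).title()
        # Assumption: a normalised email address identifies a unique contact.
        email = record["email"].strip().lower()
        if email in seen:
            continue
        seen.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

# Hypothetical contact records: the first two are the same person entered twice.
raw = [
    {"name": "  jane   SMITH ", "email": "Jane.Smith@Example.com"},
    {"name": "Jane Smith", "email": " jane.smith@example.com "},
    {"name": "raj patel", "email": "raj.patel@example.com"},
]
cleaned = clean_records(raw)
```

Standardizing before comparing is what makes the duplicate visible. The two Jane Smith entries differ as raw text and only match once case and whitespace are normalised.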

A CPM solution ensures that any changes made to your data are reflected across the board, so everyone is working from the most up-to-date information. Real-time data entry also helps keep your data clean and current. Built-in Calumo features, like writeback, also allow you to update source systems, so that data is clean everywhere.

  3. Verifying

The resulting data needs to be inspected to verify correctness. A CPM solution allows you to publish reports and distribute information so that data can be analyzed and verified by your team.

Features like drill-through and drill-to-transactions make it easier for your team to verify data and investigate anomalies.

  4. Reporting

Using CPM tools, you can track and report on changes made to your data. Calumo features, including embedded commentary, allow system users to explain changes, anomalies, or their investigation findings within the one platform. You can also set automated reports to highlight missing data, which ensures that data is updated and increases the accountability of your teams.
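A report that highlights missing data can be quite simple in principle. The sketch below, in plain Python, groups incomplete rows by the person responsible for them. It does not represent Calumo's reporting features. The field names, the `owner` column, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical list of fields that every row is expected to fill in.
REQUIRED_FIELDS = ("cost_centre", "amount", "period")

def missing_data_report(rows):
    """Group missing required fields by the person who owns each row."""
    report = defaultdict(list)
    for row in rows:
        # A field counts as missing if it is absent, None, or an empty string.
        gaps = [f for f in REQUIRED_FIELDS if row.get(f) in (None, "")]
        if gaps:
            report[row["owner"]].append((row["id"], gaps))
    return dict(report)

rows = [
    {"id": 1, "owner": "alice", "cost_centre": "CC10", "amount": 250.0, "period": "2021-07"},
    {"id": 2, "owner": "bob", "cost_centre": "", "amount": None, "period": "2021-07"},
    {"id": 3, "owner": "bob", "cost_centre": "CC12", "amount": 0, "period": None},
]
report = missing_data_report(rows)
```

Here `report` lists two incomplete rows for bob and none for alice. An amount of 0 is treated as a real value rather than a gap, which is a design choice you would want to confirm against your own data.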

Data cleanses are not trivial tasks. But they need not be daunting. Get on the front foot and give yourself the best chance of long-term success with a CPM platform like Calumo.

 

Contact us to speak to one of our data specialists today.

 

Additional sources

https://www.dmnews.com/data/news/13086153/study-poor-data-quality-costs-600b-yearly

https://www.wsj.com/articles/ai-efforts-at-large-companies-may-be-hindered-by-poor-quality-data-11551741634

https://www.edq.com/resources/data-management-whitepapers/2019-global-data-management-research/

https://www.pwc.com/us/en/services/consulting/cybersecurity-privacy-forensics/library/pulse-survey.html

https://www.datascienceassn.org/sites/default/files/Measuring%20Business%20Impacts%20of%20Effective%20Data%20I.pdf