Clean Data – Part 3: Tactics to keep your data clean


According to The Data Warehousing Institute (TDWI), ‘dirty’ data costs US companies around $600 billion every year in lost revenue, missed opportunities, and ill-informed strategic decisions. Data is ‘dirty’ when it is incorrect, incomplete, irrelevant, duplicated, missing, or improperly formatted.

Part one of this series explored how CPM can help clean your data. Part two explained why you don’t need to wait for perfection before publishing your data. Now, in part three, we look at what causes dirty data and the tactics you can use to keep your data clean.

For many organisations, dirty data is caused by:

  • Collecting data haphazardly over the years through multiple sources.
  • Using older technology or systems that can’t keep up with current data demands.
  • Trying to condense large, complex datasets into a more manageable form.
  • Integrating systems with duplicated or mislabelled data.
  • Linking different data sets after a business merger or acquisition.
  • Users who lack understanding of data systems and how to use them.

Four tactics to keep your data clean

Shift organisational data culture through clear communication and leadership buy-in

The most effective way to ensure clean data is to make it a priority across your business. Leadership buy-in is essential.

Clearly communicating why clean data is important to all staff will help shift your organisation’s data culture. Make ‘data management’ a recurring agenda item in weekly meetings or stand-ups, encouraging staff to raise issues or concerns.

Arrange meetings with data collectors to discuss best practices and provide regular updates on any process or system changes.

Provide user training and education

Providing training and education not only encourages system adoption among staff, it also ensures complete and accurate data entry from the outset.

Training should not only include practical skills for how and what data needs to be entered, but also information on what constitutes clean data.

Regular training for all staff, not just new staff, helps eliminate bad data habits, such as corner-cutting, that may have crept in over time. It is also an opportunity to teach best practices, such as correcting incorrectly formatted data and checking for existing records before creating new entries.
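As a rough illustration of the "check before you create" habit, the sketch below refuses to add an entry when an equivalent one already exists. The function names and the matching rule (ignore case and extra spaces) are assumptions made for this example, not part of any particular system.

```python
def normalise(name: str) -> str:
    """Collapse runs of whitespace and ignore case so formatting variants compare equal."""
    return " ".join(name.split()).casefold()


def add_if_new(records: list[str], candidate: str) -> bool:
    """Append candidate unless an equivalent entry already exists.

    Returns True when a new entry was created, False otherwise.
    """
    # Tidy the formatting first: strip the ends and collapse inner whitespace.
    cleaned = " ".join(candidate.split())
    if not cleaned:
        return False  # reject blank entries outright

    key = normalise(cleaned)
    if any(normalise(existing) == key for existing in records):
        return False  # an equivalent record is already on file

    records.append(cleaned)
    return True


customers = ["Acme Pty Ltd"]
add_if_new(customers, "  acme  pty ltd ")  # duplicate, nothing is added
add_if_new(customers, "Beta Co")           # new, appended to the list
```

Real duplicate detection usually needs more than this (abbreviations, misspellings, changed addresses), so treat the matching rule as a starting point only.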

Configure your database or system

Limit potential errors by configuring your database or system to only accept data that is the required type and format. You can do this using the following methods where appropriate:

  • Set up mandatory data fields to ensure all critical information is captured.
  • Create a drop-down list with set entries so users can’t enter irrelevant content.
  • If you know that a field requires a certain number or type of characters, restrict the field to that type and maximum length so users can’t enter additional information.
  • If multiple users access your data, keep it secure by providing access rights for users appropriate to their role.
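
The first three methods above can be sketched with database constraints. The example below uses Python's built-in sqlite3 module. The customer table, its columns, and the list of allowed states are hypothetical and chosen only for illustration. Access rights are not shown, because SQLite has no user roles; in a multi-user database they would be set up separately.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    id       INTEGER PRIMARY KEY,
    -- Mandatory field: must be present and not blank.
    name     TEXT NOT NULL CHECK (length(trim(name)) > 0),
    -- Behaves like a drop-down: only the listed entries are accepted.
    state    TEXT NOT NULL CHECK (state IN ('NSW', 'VIC', 'QLD', 'WA', 'SA', 'TAS', 'ACT', 'NT')),
    -- Type and size limit: exactly four digits.
    postcode TEXT NOT NULL CHECK (postcode GLOB '[0-9][0-9][0-9][0-9]')
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the constrained table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, name, state, postcode) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (name, state, postcode) VALUES (?, ?, ?)",
                (name, state, postcode),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_db()
try_insert(conn, "Acme Pty Ltd", "NSW", "2000")     # accepted
try_insert(conn, None, "NSW", "2000")               # rejected: name is mandatory
try_insert(conn, "Acme", "New South Wales", "2000") # rejected: not in the allowed list
try_insert(conn, "Acme", "VIC", "30000")            # rejected: too many characters
```

Putting the rules in the database itself means they apply no matter which form, import, or integration the data arrives through.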

See our tech tip on Data Validation for tips on how to restrict data in Excel.

Assign a data champion

Assigning a dedicated person to administer your processes will help to maintain data consistency and make your database easier to manage. As part of their role, the data champion not only monitors and cleans system data, but also improves data collection processes.

Your data champion can help empower staff to adopt best practices and ensure leadership buy-in.

Keep your data clean by upgrading your systems to a robust CPM solution, like CALUMO, to support your data collection and analysis processes. Speak to one of our data specialists today on +61 2 8985 7777 (AUS) or +1 214 387 6030 (USA).

See part one and part two of this blog series for information about how a CPM solution can help cleanse your data and why you can publish your data now rather than waiting for it to be perfect.
