Data quality is the degree to which data meets defined requirements and expectations and is fit for its intended use. To use data effectively within an organization, high data quality is essential: low-quality data is unreliable, can lead to erroneous or inaccurate insights, and damages trust in the data analytics environment.

Data quality can be affected by factors such as erroneous or incomplete data, duplicates, inconsistencies and unexpected changes in the data.

To improve your data quality, you need to address the following areas:

  • Complete data

    This means that the data contains all required values and that no mandatory fields, such as a date of birth, are left empty.

  • Correct data

    This means that the data meets expectations and specifications, for example by being in the correct format and holding the expected values. A date of birth should always be in the past and be parseable in the format dd-mm-yyyy.

  • Consistent data

    This means that the data is stored consistently across systems. For example, the customer or citizen number, name and address should be the same across all connected systems.

  • Unique data

    This means that each record occurs only once and the data contains no duplicates where none should exist. For example, consider removing duplicate customers or orders.

  • Current data

    This means that the data is updated at the expected interval. An outdated data set on which monthly billing is based can have a serious financial impact.

  • Accurate data

    This means that the data reflects reality as closely as possible, for example without outdated data or misspelled names.

  • Authentic data

    This means that the data comes from a reliable source and has not been falsified. This area is increasingly topical now that AI applications generate more and more unvalidated information.
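
Some of the areas above lend themselves to automated checks. The following is a minimal Python sketch of how completeness, correctness and uniqueness could be tested, using the date of birth and duplicate customer examples from the list. The sample records, the field names (customer_id, name, date_of_birth) and the function names are assumptions made for illustration only; consistency, currency, accuracy and authenticity usually require comparison with other systems or sources and are not covered here.

```python
from datetime import date, datetime

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"customer_id": 1, "name": "Ann", "date_of_birth": "14-03-1985"},
    {"customer_id": 2, "name": "Bob", "date_of_birth": ""},            # missing mandatory field
    {"customer_id": 3, "name": "Cas", "date_of_birth": "1985/03/14"},  # wrong format
    {"customer_id": 1, "name": "Ann", "date_of_birth": "14-03-1985"},  # duplicate customer
    {"customer_id": 4, "name": "Dee", "date_of_birth": "01-01-2999"},  # not in the past
]

MANDATORY_FIELDS = ("customer_id", "name", "date_of_birth")


def check_record(record, today):
    """Return a list of quality issues found in a single record."""
    issues = []

    # Completeness: every mandatory field must hold a value.
    for field in MANDATORY_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"missing {field}")

    # Correctness: the date of birth must parse as dd-mm-yyyy and lie in the past.
    dob = record.get("date_of_birth")
    if dob:
        try:
            parsed = datetime.strptime(dob, "%d-%m-%Y").date()
        except ValueError:
            issues.append("date_of_birth not in dd-mm-yyyy format")
        else:
            if parsed >= today:
                issues.append("date_of_birth not in the past")
    return issues


def check_dataset(records, today=None):
    """Map the index of each problematic record to its list of issues."""
    today = today or date.today()
    report = {}
    seen_ids = set()
    for index, record in enumerate(records):
        issues = check_record(record, today)

        # Uniqueness: a customer_id may occur only once in the data set.
        customer_id = record.get("customer_id")
        if customer_id is not None:
            if customer_id in seen_ids:
                issues.append("duplicate customer_id")
            seen_ids.add(customer_id)

        if issues:
            report[index] = issues
    return report


# A fixed reference date keeps the example reproducible.
report = check_dataset(records, today=date(2024, 1, 1))
for index, issues in report.items():
    print(index, issues)
```

In this sketch a duplicate is defined purely by a repeated customer_id. In practice duplicates often differ slightly, for example in the spelling of a name, and then require a matching rule suited to the data.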