
Discover the ‘data iceberg’ and 11 tips to get around it

12/09/2023 Stefan Daelemans

As with many IT projects, you may encounter an “iceberg” when using data. Of course, this is not a real iceberg, but a metaphor for all the aspects that take place ‘underwater’ and are therefore often not immediately visible to end users or less technically savvy colleagues. These ‘hidden’ elements are often underestimated, just like the size of a real iceberg.

We all talk about the tremendous opportunities presented by data and are inspired daily by amazing data products such as intelligent AI applications, predictive Machine Learning models and interactive dashboards that can magically change opinions by underpinning them with actual data. Yet organizations that exploit these capabilities on a large scale are still scarce. Only 11% of Dutch organizations claim to be successful in increasing their data maturity. So there is a clear gap between what is possible with data and what actually happens with data.

As far as we’re concerned, it’s simple:

without addressing the topics that are ‘underwater,’ successfully deploying data at scale within an organization will remain a challenge. With this article, we want to make you aware of the ‘data iceberg,’ reveal its various elements and get you started on identifying the topics that are relevant to you.

Do you know what’s underwater in successfully deploying data?

The solution?!

Unfortunately, there is no single standard path or solution for deploying data successfully within every organization. It depends on several factors, such as your current level of data maturity, your ambitions regarding data and various strategic and organizational choices.

For example, you may choose to outsource certain aspects or organize them internally. It is also important to determine which issues are most relevant to your organization and to what extent. Although we all have to deal with privacy and security laws, their impact on your organization can vary based on the type of data you process or hold, the applicable regulations and the decisions you make as an organization.

This article provides an overview of the issues that are “underwater.” In separate blogs and videos, we will delve deeper into these topics and present concrete examples and solutions to help you address them.

Underwater

Below is a list of what exactly is underwater and why, with one tip per topic on how you might deal with it. Many of these topics can also be found in a data management framework such as DMBOK. The list is ordered deliberately, but that does not mean you always have to work through the points in this order. See it as a guideline: which topic has the highest priority differs per organization.

Your data organization: (creative) data heroes

“A team of people from the organization who will use & promote data”

Perhaps a surprising first point, but if you have not thought about who will be working with all the data you make available, all the steps below are meaningless. Many organizations assume that people will automatically know what data is needed to make certain decisions or improve certain processes. Unfortunately, nothing could be further from the truth. So make sure you have thought in advance about who is going to be the driving force behind your data initiatives.

Our tip: Make this a multidisciplinary team of people and facilitate creativity & out-of-the-box thinking. Compile a list of potential data products and carefully choose the one that will bring the most benefit to your organization. Look not only at value but also at feasibility.
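
One way to make the trade-off between value and feasibility explicit is a simple scoring sheet. The sketch below is only an illustration: the candidate data products, the 1 to 5 scores and the choice to multiply both scores are assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    value: int        # expected benefit, scored 1 (low) to 5 (high)
    feasibility: int  # ease of delivery, scored 1 (hard) to 5 (easy)

    @property
    def score(self) -> int:
        # Multiplying penalizes ideas that are weak on either axis.
        return self.value * self.feasibility


def rank(candidates: list[DataProduct]) -> list[DataProduct]:
    """Return the candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=lambda p: p.score, reverse=True)


# Hypothetical candidates, scored by the multidisciplinary team.
candidates = [
    DataProduct("Churn prediction model", value=5, feasibility=2),
    DataProduct("Sales dashboard", value=3, feasibility=5),
    DataProduct("Chatbot on internal documents", value=4, feasibility=1),
]

for product in rank(candidates):
    print(f"{product.score:>2}  {product.name}")
```

In this made-up example the dashboard ends up on top: it is less valuable than the churn model, but far easier to deliver.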

Data Governance

“All the processes around monitoring the data, from quality to findability.”

All the technology you deploy as an organization to support your data initiatives is only worth the investment if it is used. In addition, all the topics below involve a ‘maintenance’ aspect to some degree. So treating the deployment of data as a one-time project does not work. It is a continuous process in which you have to keep finding the link to the users & your organization. There is data governance tooling that can support you in setting up & controlling this continuous process.

Our tip: When making these processes and responsibilities transparent, always work with names recognizable to the organization. Think of already existing department names, product groups or the usual designation for your customers.

Data Architecture

“Just like with a house, the foundation is what you build on.”

Just like building a house, it’s important to think carefully about your data architecture. Here you don’t have to settle every issue right away, but there are certainly some principles you want to think about beforehand:

What types of data do I have (e.g. structured, unstructured, real-time (sensor) data)?
How do I make sure I stay in control of all the data I have & process (metadata management)?
For which data should I build history?
What are the requirements for data exchanges?

Our tip: Pick a data architecture framework such as DMBOK that provides this guidance for you. If you don’t have an in-house architect with experience in this area, get independent guidance. Just like with a house, you want the architecture to be well thought out. Small adjustments can be made afterwards, but drastic changes (to the construction, for example) are always difficult later on.

Data Platform (Storage & Processing)

“Buy or build: you don’t have to invent & build a modern data platform yourself”

If you want to get started with your data as an organization, it has to be stored somewhere so you can work with it. Often this is done at one central point in a modern data platform, which prevents data silos and keeps your data manageable. Setting up a data platform involves (design) choices that are often far-reaching, because everyone has to start using the platform. In addition, setting up a data platform yourself can involve considerable technical complexity.

Our tip: Determine if you really need to design & implement your data platform yourself. For 95% of organizations a standard data platform in the public cloud is sufficient.

Data links

“Make sure you unlock data from your source systems in a manageable way”

To move your data from your source system to your data platform, you need data links. Of course, using the data directly in your source systems is also possible, but this often creates more limitations than benefits. The effort of maintaining these links and keeping them operational is often underestimated. In particular, changes in source systems and changes to the underlying data can be a headache if you don’t take them into account properly.

Our tip: Make data links easy to maintain by using an integration platform, don’t make them too complex & assign clear responsibilities. Preferably purchase links as a service, so you are always guaranteed up-to-date data & support in case of any problems.
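
Changes in a source system are easier to handle when a link checks incoming data against the columns it expects before loading anything. Below is a minimal sketch of such a check, assuming a hypothetical customer source; the column names and types are invented for the example.

```python
# Hypothetical contract agreed with the owner of the source system.
EXPECTED_COLUMNS = {"customer_id": int, "name": str, "country": str}


def check_schema(record: dict) -> list[str]:
    """Compare one source record with the contract and list the differences."""
    problems = []
    for column, expected_type in EXPECTED_COLUMNS.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            actual = type(record[column]).__name__
            problems.append(f"wrong type for {column}: {actual}")
    # Columns the source started sending that nobody agreed on.
    for column in record.keys() - EXPECTED_COLUMNS.keys():
        problems.append(f"unexpected column: {column}")
    return sorted(problems)


# A record after a (made-up) change in the source system:
# the id became text and "country" was renamed to "region".
changed_record = {"customer_id": "42", "name": "Acme", "region": "EU"}

for problem in check_schema(changed_record):
    print(problem)
```

Running a check like this on every load turns a silent change in the source into a clear signal for whoever is responsible for the link.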

Data Modeling

“Use industry standards as a starting point for a future-proof & immediately deployable data model”

Bringing all your data together so you can access it is done with the help of a data model. A good data model helps you with the usability and maintainability of your business data. If you do not have a good data model, users will find it difficult to find and use the right data. In addition, maintenance becomes more difficult and complex as the amount of data grows. Creating a data model in which all of an organization’s data fits is a complex process in which many discussions will emerge about definitions. For example, consider “when is a customer a customer?” and “how exactly do we calculate our margin?”.

Our tip: Use a standard data model developed for your sector and adapt it to your specific organization. By doing so, you take a sector standard as a starting point where many discussions are already settled.

Data Quality

“Garbage in, garbage out: tackle the problem at the source & control quality”

Data must be of high quality so that users can trust the information provided. Unfortunately, in practice this turns out to be easier said than done. Many data quality issues arise because the source systems contain incorrect information or because incorrect assumptions are made when transforming data and/or preparing calculations. The mistake many data teams make is that they start solving these quality issues in the data platform itself instead of at the source. In addition, they often do not test whether the data quality is (still) correct, for example by setting up data quality rules and checking them every time the data is refreshed.

Our tip: Always address faulty data at the source and establish a data quality framework with concrete rules that allow you to test whether the data being offered to your data platform meets the conditions you set.
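
To make “concrete rules” tangible, here is a small sketch of data quality rules that are checked on every refresh. The rules, column names and sample batch are assumptions made up for the illustration; in practice you would agree on the rules with the owners of the source systems.

```python
Row = dict[str, object]


def has_customer_id(row: Row) -> bool:
    return row.get("customer_id") is not None


def amount_not_negative(row: Row) -> bool:
    amount = row.get("order_amount")
    return isinstance(amount, (int, float)) and amount >= 0


def country_is_two_letters(row: Row) -> bool:
    country = row.get("country")
    return isinstance(country, str) and len(country) == 2


# Hypothetical rule set; each rule returns True when the row passes.
RULES = {
    "customer_id is filled": has_customer_id,
    "order_amount is not negative": amount_not_negative,
    "country is a two-letter code": country_is_two_letters,
}


def validate(rows: list[Row]) -> dict[str, list[int]]:
    """Return, per rule, the indexes of the rows that fail it."""
    failures: dict[str, list[int]] = {name: [] for name in RULES}
    for index, row in enumerate(rows):
        for name, rule in RULES.items():
            if not rule(row):
                failures[name].append(index)
    return failures


batch = [
    {"customer_id": 1, "order_amount": 120.0, "country": "NL"},
    {"customer_id": None, "order_amount": 35.5, "country": "BE"},
    {"customer_id": 3, "order_amount": -10, "country": "Netherlands"},
]

report = validate(batch)
rejected = sorted({i for indexes in report.values() for i in indexes})
print(report)
print("rows to send back to the source:", rejected)
```

The report says which rule failed for which row, so the faulty records can be reported back to the source instead of being patched in the data platform.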

Data Privacy

“Always determine whether you are allowed to use data for the purpose you want to achieve. This is called purpose limitation”

Privacy-sensitive data is present within every organization, and so is the responsibility to process it with care. Yet many organizations do not know which data they may and may not process, especially when it comes to sensitive (personal) data. Data such as the BSN (the Dutch citizen service number), nationality or political affiliation is not data you may simply access at any time. So chances are you will have to pseudonymize or anonymize this data. Another option is to replace it with synthetic data.

Our tip: Scan your data platform for sensitive data and, together with your privacy officer & the users of this data, make a plan for how best to handle it. In some cases, it may actually be desirable or necessary for you to have this data.
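
As an illustration of scanning and pseudonymizing, the sketch below flags columns that contain values looking like a BSN and replaces them with a keyed hash. The detection rule (exactly nine digits) is a crude heuristic, and the column names, sample rows and key are hypothetical. A real scan would cover far more types of personal data, such as the names that this example leaves untouched.

```python
import hashlib
import hmac
import re

# Heuristic only: a value of exactly nine digits *might* be a BSN.
BSN_LIKE = re.compile(r"\d{9}")


def find_sensitive_columns(rows: list[dict[str, str]]) -> set[str]:
    """Flag every column in which at least one value looks like a BSN."""
    return {
        column
        for row in rows
        for column, value in row.items()
        if BSN_LIKE.fullmatch(value)
    }


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a value with a keyed hash (HMAC-SHA256).

    The same input always yields the same pseudonym, so joins keep working.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


SECRET_KEY = b"replace-me"  # hypothetical; store the real key outside the data platform

rows = [
    {"name": "A. Jansen", "citizen_number": "123456782"},
    {"name": "B. de Vries", "citizen_number": "111222333"},
]

flagged = find_sensitive_columns(rows)
safe_rows = [
    {col: pseudonymize(val, SECRET_KEY) if col in flagged else val
     for col, val in row.items()}
    for row in rows
]
print(flagged)
print(safe_rows[0])
```

Pseudonymized data is still personal data as long as someone holds the key, so what you do with the flagged columns remains a decision to take with your privacy officer.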

Data Compliance (Data Lineage)

“Build history of the (meta)data & logging so you can always trace how certain calculations came about. This is called Data Lineage”

Especially for financial service providers, it is important to be able to demonstrate where the data in, for example, a report comes from and how it was created (e.g. which calculations and transformations were applied). You should also be able to look back in time and explain how a report from two years ago came about.

Our tip: Use a data architecture that supports history building. We used to turn to Data Vault for this straight away, but today that is not always the best solution. If you use version control, enable soft-delete functionality on your data lake and store & process your data in Delta format, you meet almost all requirements.
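
The idea behind history building and soft deletes can be shown in a few lines of plain Python. This is a conceptual sketch only, not the API of Delta or any other product: the `HistoryTable` class, the margin figures and the dates are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: Optional[float]  # None marks a soft delete
    valid_from: datetime


class HistoryTable:
    """Append-only table: updates and deletes add a version, nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: list[Version] = []

    def upsert(self, key: str, value: float, at: datetime) -> None:
        self._versions.append(Version(key, value, at))

    def soft_delete(self, key: str, at: datetime) -> None:
        self._versions.append(Version(key, None, at))

    def as_of(self, moment: datetime) -> dict[str, float]:
        """Reconstruct the table as it looked at the given moment."""
        state: dict[str, float] = {}
        for version in sorted(self._versions, key=lambda v: v.valid_from):
            if version.valid_from > moment:
                break
            if version.value is None:
                state.pop(version.key, None)
            else:
                state[version.key] = version.value
        return state


margins = HistoryTable()
margins.upsert("product_a", 0.20, datetime(2021, 1, 1))
margins.upsert("product_a", 0.25, datetime(2022, 6, 1))
margins.soft_delete("product_a", datetime(2023, 3, 1))

# What did the report from two years ago see, and what does today's report see?
print(margins.as_of(datetime(2021, 9, 12)))
print(margins.as_of(datetime(2023, 9, 12)))
```

Because nothing is ever overwritten, the margin that fed an old report can still be reproduced after the record has been changed or deleted.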

Data Security & Sharing

“Gone are the days when data never leaves your data warehouse & corporate network. Prevent ‘shadow IT’ and facilitate secure data sharing.”

You want to share the data in your data platform, both internally within your organization and increasingly externally. In doing so, you need to take into account the security requirements that your organization, the receiving party or regulatory authorities impose on your platform. Sending data via e-mail, FTP servers or services such as WeTransfer is strongly discouraged, as you lose control over what happens to this data.

Our tip: Create your own data environment where you can have the data ready to go and stay in control of who has access to that environment. A service like Azure Data Share helps you do that securely.

Monitoring & DataOps

“A data platform has many moving parts. Facilitate your team with the right tools so you can be sure the data is current & accurate.”

With tools such as Power BI, it is relatively easy to extract data from a source and present it on a dashboard. Yet reloading this data regularly proves challenging for many organizations. It requires monitoring, for example, and an organization capable of acting on the signals that monitoring produces. Moreover, proactively tracking and maintaining all the moving parts in your data platform is essential to ensure that your data products continue to deliver relevant and up-to-date information.

Our tip: Make clear agreements with a multidisciplinary team about who is responsible for monitoring & maintaining the data products. Set up a monitoring tool and collect the signals in a central dashboard. Discuss this process with each other periodically to ensure that satisfaction remains high.
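
A freshness check is one of the simplest signals to start with. The sketch below compares the last successful refresh of each data product with the maximum age agreed with its users. The product names, thresholds and timestamps are hypothetical; in practice the refresh times would come from your platform’s logging.

```python
from datetime import datetime, timedelta

# Hypothetical data products with the maximum age agreed with their users.
MAX_AGE = {
    "sales_dashboard": timedelta(hours=24),
    "stock_levels": timedelta(hours=1),
}


def stale_products(last_refreshed: dict[str, datetime], now: datetime) -> list[str]:
    """Return the data products whose last successful refresh is too long ago."""
    stale = []
    for product, max_age in MAX_AGE.items():
        refreshed_at = last_refreshed.get(product)
        # A product that has never been refreshed counts as stale too.
        if refreshed_at is None or now - refreshed_at > max_age:
            stale.append(product)
    return sorted(stale)


now = datetime(2023, 9, 12, 9, 0)
last_refreshed = {
    "sales_dashboard": datetime(2023, 9, 12, 2, 0),  # 7 hours ago
    "stock_levels": datetime(2023, 9, 12, 6, 30),    # 2.5 hours ago
}

print(stale_products(last_refreshed, now))
```

The output of a check like this is what you would collect in the central dashboard, together with a name of who acts on each signal.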

No doubt there are other examples of data-related issues that are “underwater” when deploying data within your organization. Feel free to leave a comment and share your experiences with us & others.
