Data masking involves replacing certain data in a dataset with other data so that the real data is not visible & traceable. This is often done to avoid exposing confidential or personal information when the dataset is shared or used for testing or analysis purposes.

The substitutions used may be arbitrary values or other, non-confidential data that resemble the real data. The challenge is often in ensuring that the substitutions do remain realistic and that the relationships remain intact so that the dataset remains usable for analysis.