Data Accuracy and What You Can Do About It
The other day, I was scrolling and saw this photo in a Facebook cooking group.
One of the messages from the photo is: always measure, validate, and know the legitimacy of the content (weight), instead of trusting your quite possibly inconsistent "one cup".
Straight away, I thought this is also a brilliant illustration of what a Data Analyst, or any data professional, needs to be aware of and careful about when working with data.
When working with data, especially unfamiliar data, it's all too easy to hastily extract and analyze it without due diligence and a thorough validation process.
Let's take an example. Say the benchmark for banana sales for 2023 is 50 million, but your result sums up to 40 million because the February to March data was never loaded or was deleted somehow.
If you know the benchmark or have been given it, that's not an issue: the gap is obvious, and some validation will follow to find the root cause.
But without a benchmark, much like the two cups that look identical, you might think "that looks about right" and miss the blind spot. I will let you imagine the snowball effect that incorrect data can and will have.
So, first things first:
Always validate your value against the benchmark. If you don't have a benchmark, ask the relevant people for one.
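To make that concrete, here is a minimal sketch of a benchmark check using the banana figures from above. The function name `check_against_benchmark` and the 2% `tolerance` are my own assumptions for illustration; pick a threshold that makes sense for your metric.

```python
def check_against_benchmark(actual, benchmark, tolerance=0.02):
    """Compare a total against a benchmark.

    Returns (ok, relative_gap), where relative_gap is the difference
    expressed as a fraction of the benchmark. `tolerance` is an assumed
    threshold: the largest gap accepted without investigating.
    """
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero")
    relative_gap = (actual - benchmark) / benchmark
    return abs(relative_gap) <= tolerance, relative_gap


# Figures from the banana example: benchmark 50 million, result 40 million.
ok, gap = check_against_benchmark(actual=40_000_000, benchmark=50_000_000)
print(f"within tolerance: {ok}, gap: {gap:.0%}")
# prints: within tolerance: False, gap: -20%
```

A 20% shortfall is far outside the tolerance, so this is the point where you would start looking for the root cause instead of publishing the number.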
Of course, that's the easy way out. Sometimes, no one has a benchmark.
This doesn’t mean you have no way to sense check it. The following is what I would do to sense check the value I’m getting in that situation.
- 1. Look at historical trends to spot anomalies. If there are any, follow up with questions or apply some context (a holiday or a promotion?) in order to justify or escalate them.
- 2. Check availability (e.g. missing periods or missing stores).
- 3. Is there any other report covering the same or similar metrics?
- 4. Compare with other tables that record the same or similar metrics (any table that uses the same fact table, or the same metric at a different granularity).
- 5. Check other metrics (e.g. Goods-In / Stock / Loss / Goods-Out movements should correspond to each other in the big picture).
- 6. Manipulate the metrics to seek any inconsistency (e.g. is avg price per unit consistent? has margin changed significantly?).
- 7. Make sure you understand every aggregation/filter that happened before the table.
- 8. Make sure you document your findings and are ready to answer some questions!
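A few of these checks are easy to script. The sketch below covers checks 2, 5 and 6 with made-up data: the `monthly_sales` figures, the function names, the `max_ratio` threshold, and the stock identity (opening + goods in - goods out - loss = closing) are all assumptions for illustration, so adapt them to how your own tables are defined.

```python
from statistics import median

# Hypothetical monthly sales: (year, month) -> (revenue, units sold).
# February and March are deliberately missing to mimic a failed load.
monthly_sales = {
    (2023, 1): (4_100_000, 2_050_000),
    (2023, 4): (4_300_000, 2_150_000),
    (2023, 5): (4_200_000, 2_100_000),
    (2023, 6): (9_000_000, 2_200_000),  # unit price roughly doubles here
}


def missing_months(data, year, first_month, last_month):
    """Check 2: list the months in the expected range that have no rows."""
    return [m for m in range(first_month, last_month + 1) if (year, m) not in data]


def stock_reconciles(opening, goods_in, goods_out, loss, closing, tolerance=0):
    """Check 5: opening stock plus movements should land on closing stock."""
    expected_closing = opening + goods_in - goods_out - loss
    return abs(expected_closing - closing) <= tolerance


def unusual_unit_prices(data, max_ratio=1.5):
    """Check 6: flag periods whose avg price per unit is far from the median.

    max_ratio is an assumed threshold, not a standard value.
    """
    prices = {period: revenue / units for period, (revenue, units) in data.items()}
    typical = median(prices.values())
    return sorted(
        period for period, price in prices.items()
        if price > typical * max_ratio or price < typical / max_ratio
    )


gaps = missing_months(monthly_sales, 2023, 1, 6)
odd_prices = unusual_unit_prices(monthly_sales)
print("missing months:", gaps)              # [2, 3]
print("unusual unit prices:", odd_prices)   # [(2023, 6)]
print("stock reconciles:", stock_reconciles(100, 50, 40, 5, 105))  # True
```

None of these prove the data is right. They only tell you where to start asking questions, which is the point of a sense check.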
I understand you won't necessarily have time to do all of the above in every situation, but I hope it gives you some ideas. And of course, you might have a better checklist! Please share it in the comments.
I love the data visualization and reporting side, but from what I have seen, the most important thing and the absolute priority is the quality and accuracy of the data. It's worth your time and effort.
I’m not claiming to be an expert, just jotting down my thoughts as the photo reminded me of the blind spots I can have.
Have fun, all of you who love working with data!
*photo credit: Scott M Stallings