Building a trustworthy data(team) — Part 1

Santhosh Kumar M
3 min read · Dec 27, 2020

Data-driven businesses have become a necessity in the modern age because of the insights and the ROI that data can generate. The crucial part of a data team’s work is how it engineers data for analysts, data scientists, and top-level executives. Slowly, this has raised a question in the minds of data consumers: can they trust the data they are being fed? They can’t be blamed for asking. Instead, it is the responsibility of the data team to establish trust with its downstream consumers. This series of blog posts presents my personal view on how to run a data team, based on my experience.

Running a data team is the same as running a bank. Both operate on trust.

We’ll take a look at how to establish trust by following some data practices.

1. Never change data that has been delivered

It’s always better not to deliver any numbers than to deliver half-baked numbers and update them later. Violating this principle makes results impossible to reproduce when a system has read the data just before you updated it. Eventually, downstream consumers lose trust in the data being delivered, and end users go directly to the source system to collect data and report their metrics. This also has cost implications, and it is a clear sign that the data is not aligned with the end users’ requirements.

An exception to this rule is a change in the business definition of the metrics being computed. This situation demands reprocessing the data so that business can run as usual.
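
One way to honour this rule and still allow reprocessing is to treat deliveries as append-only: a corrected number becomes a new version instead of overwriting the old one. Here is a minimal Python sketch of that idea. The names `Delivery` and `DeliveryLog`, the `gmv` metric, and the values are all hypothetical, and a real team would back this with a table or a partitioned store rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a delivered row cannot be mutated
class Delivery:
    metric: str
    day: date
    value: float
    version: int
    reason: str


class DeliveryLog:
    """Append-only log: delivered rows are never updated in place."""

    def __init__(self):
        self._rows = []

    def publish(self, metric, day, value, reason="initial delivery"):
        # Republishing adds a new version; earlier versions stay readable.
        existing = [r for r in self._rows if r.metric == metric and r.day == day]
        row = Delivery(metric, day, value, len(existing) + 1, reason)
        self._rows.append(row)
        return row

    def read(self, metric, day, version=None):
        matches = [r for r in self._rows if r.metric == metric and r.day == day]
        if not matches:
            raise KeyError((metric, day))
        if version is None:
            return matches[-1]  # latest version by default
        return matches[version - 1]  # versions are numbered from 1


log = DeliveryLog()
log.publish("gmv", date(2020, 12, 1), 1000.0)
# Hypothetical reprocessing after a change in the business definition.
log.publish("gmv", date(2020, 12, 1), 950.0, reason="business definition changed")

latest = log.read("gmv", date(2020, 12, 1))
first = log.read("gmv", date(2020, 12, 1), version=1)
```

A consumer that read version 1 can still reproduce its result by asking for that version, while new consumers get the reprocessed value along with the reason it changed.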

2. Always speak in numbers

Whenever there is a loss in the data being delivered, always quantify it. What counts as a number here differs across business contexts. In a system that deals with payments, it is the amount being processed or settled. In a system that deals with clickstreams, it is the number of users and the number of click events. In a ride-hailing system, it may be a combination of both amount and number of orders.

The more numbers you come up with, the more clarity you get on the impact of any particular event.
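
To make this concrete, here is a small Python sketch of a loss report for a payments-style system. It assumes you can get the expected records from the source and the delivered records from your pipeline, both keyed by order id. The function name `loss_report` and the sample order ids and amounts are made up for illustration.

```python
def loss_report(expected, delivered):
    """Quantify data loss between source and delivered records.

    Both arguments map order_id -> amount.
    """
    missing_ids = sorted(set(expected) - set(delivered))
    missing_amount = sum(expected[i] for i in missing_ids)
    total_amount = sum(expected.values())
    return {
        "missing_orders": len(missing_ids),
        "missing_amount": missing_amount,
        # Guard against division by zero on an empty source.
        "pct_orders_missing": 100 * len(missing_ids) / len(expected) if expected else 0.0,
        "pct_amount_missing": 100 * missing_amount / total_amount if total_amount else 0.0,
    }


# Illustrative data: order "o2" never reached the delivered dataset.
expected = {"o1": 100, "o2": 250, "o3": 50, "o4": 100}
delivered = {"o1": 100, "o3": 50, "o4": 100}

report = loss_report(expected, delivered)
print(report)
```

In this example one order out of four is missing, yet it carries half of the total amount. Reporting both the count and the amount gives a much clearer picture of the impact than either number alone.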

3. Domain knowledge is your best friend

I’ve seen people say domain knowledge is the key for data scientists. I’d say this applies to anyone working on the data team (some might not agree with me). Without domain knowledge, you can’t prioritize the goals of your data team. In brief, you won’t know how much you are losing and how much you are gaining from data.

I’d suggest that companies involve the data team in any discussion that involves data, whether it is integrating a new source or rolling out a new feature.

In the next article, we’ll look at some tradeoffs in data processing and how to use them to our advantage.

What other data practices do you follow in your organisation? Do let me know in the comments.

Thanks to Thiyagarajan, Achyut Nayak, Balachandran and Vivek for reviewing this article.

Click on this link for the next part of this series.
