As data scientists, we work on many projects throughout the year, yet only a handful are ever used and implemented in the real world. Every project matters to the data scientist who pours expertise, skill, research, and resources into it, only to see it shelved. Come year-end review, the manager may not credit the hard work that went into those long-forgotten projects; only the projects that were actually used and implemented count.
Data science is a research-driven discipline, so this is not entirely fair: some studies will fail because of the design and nature of the work, and some hypotheses will rightly be rejected. Still, our approach as data scientists should be to deliver an impact, build projects we can be proud of, and add them to our resumes as accomplishments.
Let’s examine some of the key best practices for designing a successful data science project based on my own experience.
1. GET YOUR FOOT IN THE DOOR
We all want some quick wins and low-hanging fruit before taking the big plunge. The same is true of business stakeholders, who hold the key to sponsoring data science projects.
So, first things first, data scientists need to get stakeholders interested through ad hoc analysis: understanding the as-is process, finding and quantifying the problem statements, tracking how things have changed over time, and showing how macro-economic conditions or major business decisions reshaped the landscape, using dashboards, bi-variate analysis, boxplots, violin charts, and the like (see the sketch below). This earns data scientists a platform to explore avenues for bigger and deeper projects.
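For instance, a first exploratory pass might look something like the following sketch; the file name and column names (order_date, region, order_value) are hypothetical placeholders, not from any particular project.

```python
# A quick exploratory pass; "orders.csv" and the column names
# order_date, region, order_value are hypothetical placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("orders.csv", parse_dates=["order_date"])

# How has the metric moved over time?
monthly = df.groupby(df["order_date"].dt.to_period("M"))["order_value"].mean()
print(monthly.tail(12))

# Bi-variate view: how does the metric vary across segments?
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.boxplot(data=df, x="region", y="order_value", ax=axes[0])
sns.violinplot(data=df, x="region", y="order_value", ax=axes[1])
fig.tight_layout()
plt.show()
```

A handful of simple views like these are often enough to quantify a problem statement and start a conversation with stakeholders.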
2. THERE IS NO SUCH THING AS “OVER-COMMUNICATION”
It never hurts to communicate regularly and frequently with stakeholders across the length and breadth of the company. Too often, the importance of communication is realised only when a gap appears between requirements gathering and business understanding; robust communication channels prevent exactly that.
3. TAKE YOUR TIME TO SCOPE THE PROJECT TIGHTLY
Data scientists tend to jump straight into data collection, preparation, and aggregation without understanding the full scope of the project. We fail to ask crucial questions: what is the time horizon, should the model cover all divisions or just one segment, how is the dependent variable defined, is the target tightly specified, and how will the model's results actually be used?
These gaps lead to situations where a stakeholder asks a basic question about the model's scope and the data scientist has to say, “I shall double-check and get back to you”. That is a big missed opportunity to build trust and credibility.
4. ONBOARD ALL THE STAKEHOLDERS FROM THE PROJECT KICK-OFF STAGE
I have faced many situations where, after the model is finished, the tech team is called in and explains that they were not aware a model was being built and that it cannot be deployed for another two or three sprints.
Furthermore, the regulatory and data compliance/protection team might raise rules that conflict with the project at hand. It's best to get buy-in from almost all stakeholders at the very start of the project to avoid such surprises.
5. THINK THE PROBLEM STATEMENT THROUGH TO THE IMPLEMENTATION PHASE
Often, data scientists are only concerned with the modelling part of their project and do not think through the implementation challenges, the business rules overlaid on the model's predictions, or the process by which those predictions will be acted upon. Yet the model itself is only a small part of the overall solution; a simple example of overlaying business rules is sketched below.
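Here is a minimal sketch of what overlaying a business rule on raw model scores could look like; the churn_probability score, the 0.7 cut-off, and the 30-day contact rule are hypothetical assumptions for illustration only.

```python
import pandas as pd

def apply_business_rules(scored: pd.DataFrame) -> pd.DataFrame:
    """Turn raw model scores into actions, then overlay business rules."""
    scored = scored.copy()
    # Model decision at an agreed probability cut-off (hypothetical: 0.7).
    scored["action"] = (scored["churn_probability"] >= 0.7).map(
        {True: "contact", False: "no_action"}
    )
    # Business rule: never contact a customer reached within the last 30 days,
    # regardless of what the model says.
    scored.loc[scored["days_since_last_contact"] < 30, "action"] = "no_action"
    return scored
```

Agreeing on rules like these with the business before the build saves painful rework at implementation time.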
6. MAKE AN AGILE PROJECT PLAN
In this age, who can succeed without being agile? Requirements can change very quickly, team priorities can shift, and the team itself can change, so the project must be agile in both planning and execution.
7. DON’T STOP AFTER THE MODEL IS VALIDATED
I am guilty of celebrating once I have a good model and good statistics to go with it. But with experience, I have learned that this is just the tip of the iceberg. Unless the model results are tied to business KPIs, there are no takers for good model statistics.
After the model is calibrated and finalised, there is still a lot of analysis left to derive its potential business value. If a data scientist cannot show the potential impact in pounds or dollars, there is a high likelihood that C-level executives will not even bat an eyelid. A back-of-the-envelope translation might look like the sketch below.
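For example, a rough translation from a confusion matrix to expected monetary value could run as follows; every figure here is a made-up assumption to illustrate the arithmetic, not a benchmark.

```python
def expected_value_gbp(tp, fp, fn, tn,
                       value_per_tp=200.0,   # assumed: retained revenue per correctly flagged customer
                       cost_per_fp=20.0,     # assumed: cost of an unnecessary retention offer
                       cost_per_fn=200.0):   # assumed: revenue lost to a missed churner
    """Convert monthly confusion-matrix counts into an expected GBP impact."""
    # True negatives usually carry no direct cost or value in this framing.
    return tp * value_per_tp - fp * cost_per_fp - fn * cost_per_fn

# Hypothetical monthly validation counts.
print(f"Estimated monthly impact: £{expected_value_gbp(120, 300, 80, 9500):,.0f}")
```

Even a crude estimate like this gives executives a number they can weigh against the cost of building and running the model.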
8. KEEP THE MODEL MONITORING FRAMEWORK READY
The model may be built and finalised, but if the monitoring framework is still in its nascent phase, the model cannot go live. Models can cause real harm if left unmonitored: they drift over time and need to be refreshed regularly. A minimal drift check is sketched below.
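One lightweight drift check is the Population Stability Index (PSI) between the score distribution at build time and the distribution seen in production; the bin count and the 0.2 alert threshold below are common rules of thumb, not hard standards.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live distribution."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])  # keep extreme values in the end bins
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

baseline_scores = np.random.beta(2, 5, 10_000)  # scores at model build time (simulated)
live_scores = np.random.beta(2.5, 5, 10_000)    # scores observed in production (simulated)
if psi(baseline_scores, live_scores) > 0.2:
    print("Significant drift detected: investigate and consider recalibration.")
```

As a rule of thumb, a PSI below 0.1 is usually read as stable, 0.1 to 0.2 as worth watching, and above 0.2 as significant drift.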
9. INCORPORATE THE FEEDBACK FROM STAKEHOLDERS AND SHARE THE RESULT
Stakeholders usually share a wealth of insight about their business, and those insights can form the basis of the hypotheses to be tested. Even though many stakeholders are not data people, their understanding of the business should not be missed: data scientists need to connect the dots between those hypotheses and what the data shows.
10. BE DATA-DRIVEN AND DO NOT GET EMOTIONAL WHEN FACED WITH CRITICISM
Last but not least, data scientists need to stay data-driven in all scenarios, even when faced with criticism. After putting in hard work and research, some data scientists react badly to pushback. We must handle such situations with data-driven insights rather than taking them personally and losing sight of the bigger goals.
11. MODEL EXPLAINABILITY IS NON-NEGOTIABLE
Models are hard to get adopted when they are black boxes. Once explainability is built into the solution, it becomes far easier to trust and institutionalise the model. A minimal example is sketched below.
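For tree-based models, a library such as SHAP makes this straightforward; the public dataset and model choice below are illustrative assumptions, not a recommendation of any particular stack.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Fit a simple tree model on a public dataset purely for illustration.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# SHAP attributes each prediction to per-feature contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global view for stakeholders: which features drive predictions overall.
shap.summary_plot(shap_values, X)
```

A plot like this gives non-technical stakeholders a concrete answer to "why did the model say that?", which is often what unlocks adoption.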
Abhigya Chetna is a data science professional. Over the years, she has provided analytical solutions across a wide range of industries around the globe. This diverse experience gives her an edge in cross-pollinating her skills to deliver impactful solutions.
She enjoys being an analytical translator for non-data professionals and aims to promote data literacy.