Data Science vs. Software Engineering

The work of a Software Engineer is to analyze a problem, think of a good solution, design it, implement it and then test it. Software Engineers solve problems, and a Software Engineer's work ends once the solution to the problem has been implemented.

The work of a Data Scientist is to make good predictions or classifications based on past behavior. The Data Scientist does not create solutions; instead, he or she creates models that are able to generate such predictions or classifications. The Data Scientist then tries to optimize these models so that each new model generates better predictions or classifications than the previous ones. The work of the Data Scientist never ends: there is always more to do, especially when behavior changes over time and the existing models no longer reflect it.

It may take a long time before the Data Scientist is able to produce his or her first good model. The Data Scientist needs to analyze the data, clean the data, generate new features, select the best features, train models, try different Machine Learning algorithms, try different parameters for each algorithm and measure various offline metrics. Each of these steps may take weeks or even months of work.

For this reason, the work of the Data Scientist should not be bound by hard deadlines. It is very difficult for a Data Scientist to estimate when he or she will have a model that is good enough to be deployed in production.
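The cycle of trying different parameters and measuring an offline metric can be illustrated with a deliberately tiny sketch. It assumes a made-up one-dimensional dataset, a hand-written k-nearest-neighbors classifier, and accuracy on a held-out validation set as the offline metric; the data and the names `knn_predict`, `accuracy` and `run_experiments` are hypothetical and are not taken from the original post. Each candidate value of `k` plays the role of one experiment.

```python
from collections import Counter

# Hypothetical toy data: (feature value, label) pairs, invented for illustration.
# The point at 2.5 labeled "b" acts as noise inside the "a" region.
TRAIN = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (5.0, "b"), (6.0, "b"), (7.0, "b"), (2.5, "b")]
VALIDATION = [(2.4, "a"), (6.5, "b"), (1.5, "a"), (4.6, "b")]


def knn_predict(train, x, k):
    """Predict a label for x by majority vote among its k nearest training points."""
    # sorted() is stable, so equally distant points keep their order in `train`.
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common() orders equal counts by first appearance, so on a tie
    # the label of the nearest neighbor wins.
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Offline metric: fraction of validation points whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)


def run_experiments(train, validation, candidate_ks):
    """Treat each value of k as one experiment and keep the best-scoring one."""
    results = {k: accuracy(train, validation, k) for k in candidate_ks}
    # On ties, max() returns the candidate that was listed first.
    best_k = max(results, key=results.get)
    return best_k, results


if __name__ == "__main__":
    best_k, results = run_experiments(TRAIN, VALIDATION, [1, 3, 5])
    print(best_k, results)  # prints: 3 {1: 0.75, 3: 1.0, 5: 1.0}
```

Here the `k = 1` experiment is misled by the noisy point at 2.5, which refutes the hypothesis that a single nearest neighbor is enough for this data. Real projects would use an established Machine Learning library, far larger datasets and several metrics, and each round of such experiments can take much longer than this toy example suggests.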

Implementation vs. Experimentation

Software Engineering teams are mostly busy implementing new features. Data Science teams are mostly busy running experiments.

A feature provides clearly defined functionality, and in general it is a solution to a problem. Once it has been implemented, the new feature becomes part of the system. The goal of implementing features is to add functionality to the system.

An experiment is a way to test a hypothesis. Depending on its results, we may find that the hypothesis is true or false. If the experiment shows that the hypothesis is false, this does not mean that the experiment has failed. Our goal was to learn, and we have learned. Based on what we have learned, we can formulate a new hypothesis and plan a new experiment.

It is possible that most of our experiments will show the hypothesis to be false. This does not mean that we are wasting our time: when facing new and complex problems, it is natural to try many different approaches before finding one that works.

A Data Science team is like a laboratory. If we measure the progress of a Data Science team the same way we measure the progress of Software Engineering teams, we may wrongly conclude that the Data Science team is not productive enough.

About Hayim Makabee

Veteran software developer, enthusiastic programmer, author of a book on Object-Oriented Programming, co-founder and CEO at KashKlik, an innovative Influencer Marketing platform.
This entry was posted in Data Science, Machine Learning, Software Engineering.