What is data science?
The term data science is thrown around so frequently these days that its actual definition has become elusive. On sites like LinkedIn, it has become common for anyone with even the most tenuous statistics background, or any experience working with computer systems, to call themselves a data scientist. But is this an accurate description?
Many people who dedicate their lives to the study or furtherance of data science believe that the definition has expanded to the point of near meaninglessness. If someone who worked the night audit shift at a local motel can now call themselves a data scientist and get away with it, then anyone can.
But the truth is that real data scientists are still doing extremely important work. And the nature of that work has little to do with the work of most people who are more accurately described as data analysts than as data scientists. The core of data science is the use of machine learning and artificial intelligence techniques to discover causal relationships, to predict future outcomes, or to better understand relationships between things that have already occurred.
Although it is clear that some people are stretching the definition of data scientist to its breaking point in order to pad out their resumes, it is not always clear where the line should be drawn between data analyst and data scientist. For example, someone who performs genuinely sophisticated analyses using advanced machine learning techniques may still not qualify as a true data scientist if they are not themselves creating the underlying tools that they use.
One of the best ways to draw the line between data scientist and data analyst is to ask whether the person's work is primarily domain specific. A true data scientist generally looks at data sets and their analysis from a high level of abstraction. Under this definition, what the data scientist does is rarely domain specific. The solutions that data scientists develop are general and can therefore be applied to a wide range of domains and problem types, so long as the data conform to the rules established for using the specific tool in question.
While there are no hard-and-fast answers regarding who is a data scientist or a data analyst, these considerations should help clarify the distinction.