What is Data Science?
A statistician’s opinion on data science and what it takes for organizations to incorporate it.
The term data science is currently trendy, vague, and overused (in my opinion), so much so that it causes confusion. This blog post is my opinion on what data science means and how it affects organizations. I give a working definition of data science, then opine on the value, challenges, and skills required to implement good data science in an organization.
Data science definition
Data science is currently an overused and trendy term. As a term it is ill-defined and so general that it is not helpful. For example, Wikipedia’s Data science page (15-May-2026) stated…
Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms, and systems to extract or extrapolate knowledge from potentially noisy, structured, or unstructured data.
That definition, while technically accurate, includes too much jargon and is too vague to help. “Scientific visualization,” “algorithms,” and “systems” are very vague terms. Later, this same Wikipedia page gets closer to a useful definition, but it is still inadequate in my view:
A “data scientist” is a professional who creates programming code and combines it with statistical knowledge to summarize data.
Incorporating data science into an organization
In an ideal world, businesses would realize they are already doing data science. Businesses do data science every day (accounting involves data science, time cards are data science), and it is in their best interest to invest in and support it. The question is not whether businesses perform data science, but how efficiently they perform it.
The best way for businesses to incorporate good data science into their culture is to add a data science line item to every budget. Every project budget should, in my opinion, include time and money for data compilation and husbandry (i.e., getting cleaned data into a formal database), as well as analysis. It has been my experience that data science costs approximate 10% of total project costs on average (sometimes less, say 7% to 10%), and that they run higher when the work is done poorly. These costs are incurred whether data science tasks have a dedicated line item or not. It would be better to acknowledge data science honestly and add it to project budgets as a separate line item.
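As a rough illustration of the budgeting arithmetic, here is a minimal Python sketch that reserves a share of a project budget for data science. The function name data_science_line_item, the default share of 10%, and the budget figure are all hypothetical, chosen only to mirror the rule of thumb above; they are not a prescribed method.

```python
def data_science_line_item(total_budget: float, share: float = 0.10) -> float:
    """Return the amount to reserve for data science work.

    `share` is the assumed fraction of the total budget (0.10 mirrors
    the rough 10% rule of thumb; adjust it to your own experience).
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    # Round to cents so the figure can be dropped straight into a budget.
    return round(total_budget * share, 2)


# Hypothetical project budget, for illustration only.
project_budget = 250_000.00
typical = data_science_line_item(project_budget)        # 10% share
lean = data_science_line_item(project_budget, 0.07)     # 7% share

print(f"Typical line item: {typical:,.2f}")
print(f"Lean line item:    {lean:,.2f}")
```

Writing the share down as an explicit parameter is the point: the cost exists either way, and naming it makes it visible in the budget.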
Value
I see huge benefits when an organization cultivates an active and healthy data science culture. Project results are easily reproduced and hence more stable because the base data are more stable. Users place more faith in results because they trust that the data are clean and accurate. Data can be easily amalgamated across similar studies, which facilitates better conclusions (i.e., better science). Compliance and data requests are simple and quick because everyone knows the final data’s location. Analysis personnel (those in the trenches) are happier and remain with organizations longer when management formally acknowledges and rewards their work.
Challenges
Data inefficiencies result when data collection, data husbandry, and statistical analysis are not focal points of management. Business cultures that say to analysts, “Just get it done by Friday. We don’t care how,” breed data inefficiency, data errors, and inaccurate reports. That is to say, the biggest challenge businesses face in implementing good data science practices is lack of management buy-in. While I think buy-in is increasing, shortages of personnel, time, training, and infrastructure (database servers and networks) are the biggest things that prevent good data science practices in organizations. Clients and organizations rightly want to minimize costs, but these challenges all arise from the lack of dedicated line items in project budgets and defined space in project timelines.
Another challenge to implementing good data science is that data storage and analysis typically occur at the end of a project (right before report writing), when deadlines are compressed due to upstream timeline slippage. Again, dedicated timelines and proper upstream management will give data scientists adequate time to perform proper data science tasks.
Data science skills
To be a data scientist, under my definition, a professional needs skills in all three of the field’s components. From a practical point of view, a data scientist requires the following skills:
- Database skills, including Excel and SQL.
- Programming skills, R and/or Python.
- Statistical analysis skills, training in descriptive statistics and statistical reasoning at a minimum. For inferential analyses, a master’s degree in statistics (at least).
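To make the three skills above concrete, here is a minimal Python sketch that touches each one: it stores records in a small SQL database, pulls them back out with a query, and computes descriptive statistics. The table name time_cards, the function summarize_hours, and the sample rows are hypothetical and made up for illustration; the sketch uses only Python's standard sqlite3 and statistics modules, and a real project would use a persistent database rather than an in-memory one.

```python
import sqlite3
import statistics


def summarize_hours(rows):
    """Load (employee, hours) rows into SQLite and summarize the hours."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Database skill: define a table and load the raw records.
        conn.execute(
            "CREATE TABLE time_cards (employee TEXT NOT NULL, hours REAL NOT NULL)"
        )
        conn.executemany("INSERT INTO time_cards VALUES (?, ?)", rows)
        # Query the stored data back out with SQL.
        cursor = conn.execute("SELECT hours FROM time_cards ORDER BY hours")
        hours = [h for (h,) in cursor]
    finally:
        conn.close()
    # Statistical skill: descriptive statistics (stdev needs n >= 2).
    return {
        "n": len(hours),
        "mean": statistics.mean(hours),
        "median": statistics.median(hours),
        "stdev": statistics.stdev(hours),
    }


# Made-up weekly hours for four hypothetical employees.
sample_rows = [("a", 38.0), ("b", 40.0), ("c", 42.0), ("d", 40.0)]
summary = summarize_hours(sample_rows)
for name, value in summary.items():
    print(f"{name}: {value}")
```

This covers only descriptive summaries; inferential work (tests, models, intervals) is where the deeper statistical training comes in.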
