What is Data Science?

A statistician’s opinion on data science and what it takes for organizations to incorporate it.

Misc
Author

Trent McDonald

Published

May 20, 2026

Modified

May 15, 2026

The term data science is currently trendy, vague, and overused (in my opinion), so much so that it causes confusion. This blog post is my opinion on what data science means and how it affects organizations. I give a working definition of data science, then opine on the value, challenges, and skills required to implement good data science in an organization.


Data science definition

Data science is currently an overused and trendy term. As a term it is ill-defined and so general that it is not helpful. For example, Wikipedia’s Data science page (15-May-2026) stated…

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms, and systems to extract or extrapolate knowledge from potentially noisy, structured, or unstructured data.

That definition, while technically accurate, includes too much jargon and is too vague to help. “Scientific visualization”, “algorithms”, and “systems” are very vague terms. Later, the same Wikipedia page gets closer to a useful definition, but it is still inadequate in my view:

A “data scientist” is a professional who creates programming code and combines it with statistical knowledge to summarize data.

The author’s definition

My definition of data science is more pragmatic than Wikipedia’s. My internal definition includes three components:

  1. Data collection: Compiling data. Physically collecting data. Data can be collected on paper data sheets, automated sensors, video cameras, cell phones, satellites, etc.
  2. Data husbandry: QA/QC, getting the collected data into a database of some kind, and maintenance (updates and backup) of the collected data.
  3. Analysis: Involves everything we called statistics prior to 2000. Statistics, in turn, involves computation of descriptive statistics (pie charts, histograms, tabular summaries, etc.) and inferential statistical analyses (t-tests, regression, ANOVA, bootstrapping, etc.).

Many people (~85%) I talk to, and many places online, mean only (1) and (2) when they say “data science”. In my opinion, a large majority of “data scientists” focus on data collection and data husbandry and spend very little time on inferential statistical analyses.
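To make the three components concrete, here is a minimal Python sketch of one pass through collection, husbandry, and analysis. Everything in it is an assumption made for illustration: the readings are invented, the QA/QC range check and the `readings` table are hypothetical, and the percentile bootstrap is just one simple inferential technique among many. It is not a recommended pipeline, only a small picture of where each component sits.

```python
import random
import sqlite3
import statistics

# 1. Data collection: hypothetical sensor readings (made-up values).
raw_readings = [
    ("site_a", 12.1), ("site_a", 11.8), ("site_a", None), ("site_a", 12.6),
    ("site_b", 13.4), ("site_b", -999.0), ("site_b", 13.9), ("site_b", 14.2),
]


def passes_qaqc(value, low=0.0, high=50.0):
    """Assumed QA/QC rule: keep non-missing values inside a plausible range."""
    return value is not None and low <= value <= high


def store_readings(conn, readings):
    """2. Data husbandry: load only rows that pass QA/QC into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (site TEXT, value REAL)")
    clean = [(site, value) for site, value in readings if passes_qaqc(value)]
    conn.executemany("INSERT INTO readings (site, value) VALUES (?, ?)", clean)
    conn.commit()
    return len(clean)


def bootstrap_mean_ci(values, n_resamples=2000, alpha=0.05, seed=1):
    """3. Analysis (inferential): percentile bootstrap interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


conn = sqlite3.connect(":memory:")  # in-memory database; nothing is written to disk
n_stored = store_readings(conn, raw_readings)

values = [row[0] for row in conn.execute("SELECT value FROM readings")]
sample_mean = statistics.fmean(values)       # descriptive statistic
ci_low, ci_high = bootstrap_mean_ci(values)  # inferential statistic
print(f"{n_stored} clean rows, mean = {sample_mean:.2f}, "
      f"95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Note that the first two functions involve no statistics at all. Only the last few lines touch analysis, which is the part that, in my experience, gets the least attention.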

Incorporating data science into an organization

In an ideal world, businesses realize they are doing data science. Businesses do data science every day (accounting involves data science, time cards are data science) and it is in their best interest to invest in and support data science. The question is not whether businesses perform data science, but how efficiently they perform it.

The best way for businesses to incorporate good data science into their culture is to add a data science line item to every budget. Every project budget should, in my opinion, include time and money for data compilation and husbandry (i.e., getting cleaned data into a formal database), as well as analysis. In my experience, data science costs approximately 10% of total project costs on average (sometimes less, say 7% to 10%), and more when it is done poorly. These costs are realized whether data science tasks have a dedicated line item or not. It would be better to acknowledge data science honestly and add it to project budgets as a separate line item.

Value

I see huge benefits when an organization cultivates an active and healthy data science culture. Project results are easily reproduced and hence more stable because the base data are more stable. Users place more faith in results because they trust that the data are clean and accurate. Data can be easily amalgamated across similar studies, which facilitates better conclusions (i.e., better science). Compliance and data requests are simple and quick because everyone knows the final data’s location. Analysis personnel (those in the trenches) are happier and remain with organizations longer when management formally acknowledges and rewards their work.

Challenges

Data inefficiencies result when data collection, data husbandry, and statistical analysis are not focal points of management. Business cultures that say to analysts, “Just get it done by Friday. We don’t care how.” breed data inefficiency, data errors, and inaccurate reports. That is to say, the biggest challenge businesses face in implementing good data science practices is lack of management buy-in. While I think buy-in is increasing, shortages of personnel, time, training, and infrastructure (database servers and networks) are the biggest practical obstacles to good data science practices in organizations. Clients and organizations rightly want to minimize costs, but these challenges all arise from the lack of dedicated line items in project budgets and defined space in project timelines.

Another challenge to implementing good data science is that data storage and analysis typically occur at the end of a project (right before report writing), when deadlines are compressed due to upstream timeline slippage. Again, dedicated timelines and proper upstream management will give data scientists adequate time to perform proper data science tasks.

Data science skills

To be a data scientist, under my definition, a professional needs skills in all three of the field’s components. From a practical point of view, a data scientist requires the following skills:

  • Database skills, including Excel and SQL.
  • Programming skills, R and/or Python.
  • Statistical analysis skills, training in descriptive statistics and statistical reasoning at a minimum. For inferential analyses, a master’s degree in statistics (at least).