What is Data Science?

Misc

A statistician’s perspective on data science and what it takes for organizations to make it work.

Author

Trent McDonald

Published

May 20, 2026

Modified

May 15, 2026

Over the past decade I’ve had several conversations with clients, colleagues, and students about data science: what it is, who does it, and why it matters. I’ve noticed that the term means different things to different people. That ambiguity isn’t anyone’s fault; the field genuinely spans a wide range of activities and skill sets. But the confusion does create real problems for organizations trying to invest in it wisely.

In this post I’ll share my working definition of data science, along with some thoughts on the value it brings, the challenges organizations face in implementing it, and the skills required to do it well.

Data science definition

The term data science is broad and evolving, which makes it hard to pin down. Wikipedia’s Data science page (as of May 2026) defines it as:

Data science is an interdisciplinary academic field that uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms, and systems to extract or extrapolate knowledge from potentially noisy, structured, or unstructured data.

That’s technically reasonable, but the language is abstract enough that it still isn’t very helpful for day-to-day work. The same Wikipedia page offers a narrower take: a data scientist is someone who “creates programming code and combines it with statistical knowledge to summarize data.” That’s closer to something operational, though it still falls short in my view.

My definition

My working definition of data science breaks the field into three interconnected components:

1. Data collection: Compiling and physically gathering data, whether via paper data sheets, automated sensors, cameras, satellites, or other means.
2. Data husbandry: Quality assurance and quality control (QA/QC), getting data into a structured database, and maintaining that database over time through updates and backups.
3. Analysis: Everything we used to call statistics before the term “data science” took hold. This includes descriptive statistics (charts, histograms, summaries) and inferential analyses (regression, t-tests, ANOVA, bootstrapping, and so on).

In my experience, when most people say “data science,” they’re primarily thinking about components (1) and (2). Many practicing data scientists spend the bulk of their time on collection and husbandry, with less emphasis on formal inferential analysis. That’s not a criticism; those components are genuinely important. It’s worth being clear, though, about where the analytical piece fits in.
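To make the husbandry and analysis components concrete, here is a minimal Python sketch, not a prescription. The records, the `length_cm` field, the 0 to 100 cm plausibility range, and the helper names `qa_qc` and `bootstrap_ci` are all hypothetical and chosen only for illustration. It uses a percentile bootstrap, which is just one of several ways to bootstrap a confidence interval.

```python
import random
import statistics

# Hypothetical field measurements; None marks a missing value.
raw_records = [
    {"site": "A", "length_cm": 12.1},
    {"site": "A", "length_cm": 11.8},
    {"site": "B", "length_cm": None},    # missing measurement
    {"site": "B", "length_cm": 13.4},
    {"site": "C", "length_cm": 250.0},   # implausible, likely a data-entry error
    {"site": "C", "length_cm": 12.9},
    {"site": "C", "length_cm": 12.4},
]


def qa_qc(records, low=0.0, high=100.0):
    """Husbandry: separate usable rows from rows that need review.

    The plausibility range [low, high] is an assumed rule for this example.
    """
    clean, flagged = [], []
    for rec in records:
        value = rec["length_cm"]
        if value is None or not (low <= value <= high):
            flagged.append(rec)
        else:
            clean.append(rec)
    return clean, flagged


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=1):
    """Analysis: percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


clean, flagged = qa_qc(raw_records)
lengths = [rec["length_cm"] for rec in clean]

sample_mean = statistics.mean(lengths)   # descriptive summary
ci_low, ci_high = bootstrap_ci(lengths)  # inferential step

print(f"{len(clean)} clean rows, {len(flagged)} flagged for review")
print(f"mean = {sample_mean:.2f} cm, 95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Flagged rows are set aside rather than deleted, so someone can check them against the original data sheets before deciding what to do with them.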

Incorporating data science into an organization

Organizations of all kinds are doing data science every day, whether or not they call it that. Accounting involves data science. Time tracking is data science. The question isn’t really whether an organization does data science. The real question is how effectively it does it.

Understanding the value data science brings — and the obstacles that get in the way — is a useful starting point for any organization looking to improve.

Value

When an organization develops a healthy data science culture, the benefits are real and wide-ranging:

Reproducibility: Results are easier to verify and reproduce when base data are well-organized and well-documented.
Trust: Stakeholders are more confident in results when they know the underlying data are clean and accurate.
Synthesis: Well-managed data can be combined across studies, enabling better science and stronger conclusions.
Efficiency: Responding to data requests or compliance requirements becomes straightforward when everyone knows where the final data live.
Retention: Analysts tend to be more satisfied and more likely to stay when their work is formally acknowledged and supported by management.

Challenges

The biggest obstacle I see in organizations isn’t technical. It’s cultural. When management’s message to analysts is “just get it done,” without dedicated time or resources for data science tasks, the result is often inefficiency, errors, and unreliable outputs.

The most common gaps I observe are:

Personnel: Not enough people with the right skills.
Time: Data science work is frequently squeezed into the end of a project, after upstream delays have compressed the timeline.
Training: Teams may lack formal statistical training, even if they’re comfortable with data wrangling.
Infrastructure: Database servers, version control, and proper data storage often receive too little investment.

These challenges are all manageable, and thoughtful planning goes a long way. One of the most practical steps an organization can take is to add a data science line item to every project budget. In my experience, data science work (collection, husbandry, and analysis) tends to cost around 10% of total project costs. Those costs exist regardless of whether they’re formally budgeted, so making them explicit helps ensure the work gets the time and resources it actually needs. Dedicated budget line items and realistic project timelines give data scientists the space to do their work properly.
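As a rough illustration of the budgeting arithmetic, the sketch below carves an explicit data science line item out of a project total. The function name `add_data_science_line_item` and the $200,000 project total are hypothetical, and the 10% default is only the rule of thumb mentioned above, so adjust it to your own experience.

```python
def add_data_science_line_item(total_budget, share=0.10):
    """Split a total project budget into a data science line item and the rest.

    `share` is the assumed fraction of total cost spent on collection,
    husbandry, and analysis combined.
    """
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    data_science = round(total_budget * share, 2)
    return {
        "data_science": data_science,
        "other_work": round(total_budget - data_science, 2),
    }


budget = add_data_science_line_item(200_000)  # hypothetical project total
print(budget)
```

Under these assumptions, a $200,000 project would carry a $20,000 data science line item, leaving $180,000 for everything else.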

Data science skills

With a clearer picture of what data science involves and what it takes to support it organizationally, it’s worth thinking about the individual skills that make it possible. Under my three-part definition, a well-rounded data scientist needs competency across all three components. Practically speaking, that means:

Database skills: Proficiency in tools like Excel and SQL for storing and querying data.
Programming skills: Fluency in R and/or Python for data processing and analysis.
Statistical skills: A solid grounding in descriptive statistics and statistical reasoning at a minimum. For inferential work, graduate-level training in statistics is a real advantage.

No one is equally strong in all three areas, and that’s fine. Good data science teams often have complementary skill sets. The important thing is that all three components are represented somewhere in the organization, and that the people doing the work have the support they need to do it well.
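Here is a small sketch of how the three skill areas can meet in a single task, using Python’s built-in `sqlite3` and `statistics` modules. The `observations` table, its columns, and the counts are made up for illustration, and a real project would likely use a persistent database rather than an in-memory one.

```python
import sqlite3
import statistics

# Database skills: a structured table with basic integrity rules.
conn = sqlite3.connect(":memory:")  # in-memory, so no file is left behind
conn.execute(
    """
    CREATE TABLE observations (
        site TEXT NOT NULL,
        animals_seen INTEGER NOT NULL CHECK (animals_seen >= 0)
    )
    """
)
rows = [("north", 4), ("north", 6), ("south", 10), ("south", 12), ("south", 14)]
conn.executemany(
    "INSERT INTO observations (site, animals_seen) VALUES (?, ?)", rows
)
conn.commit()

# The CHECK constraint rejects an impossible value at entry time.
try:
    conn.execute("INSERT INTO observations VALUES ('north', -3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Database skills again: aggregate with SQL.
site_means = dict(
    conn.execute(
        "SELECT site, AVG(animals_seen) FROM observations "
        "GROUP BY site ORDER BY site"
    )
)

# Programming and statistical skills: pull the data into Python and summarize.
counts = [row[0] for row in conn.execute("SELECT animals_seen FROM observations")]
summary = {
    "n": len(counts),
    "median": statistics.median(counts),
    "stdev": statistics.stdev(counts),
}

print(site_means)
print(summary)
conn.close()
```

None of the steps is hard on its own. The point is that someone on the team needs to be comfortable with each one.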

Have comments or want to discuss this further? Feel free to email me (trent at mcdonalddatasciences dot com).