The Junior Data Science Reality: You Are a SQL Janitor

This article is written for the "Kaggle Grandmaster" wannabe.

You have spent the last 6 months living in Jupyter Notebooks. You know the mathematical difference between L1 and L2 regularization. You have fine-tuned a BERT model on a dataset you found on Reddit. You dream in PyTorch and Scikit-learn.

You believe that your first job will involve "Building Models", "Training AI", or "Solving AGI".

You believe you are entering the industry as a Scientist — a thinker who will be paid to experiment, hypothesize, and optimize.

If you think your daily life will resemble a DeepMind research paper, this article is your reality check.

P. Mishra · December 2025 · Data Science
5 min read · Reviewed by Editorial Desk · Correction path: Contact
Last Reality Check: December 23, 2025

Key Takeaways

  • This piece focuses on data science realities in India, not outlier narratives.
  • Compensation numbers should be interpreted with role scope, market cycle, and switching friction.
  • Use decision frameworks and evidence checks before acting on title or salary headlines.

The Expectation

The expectation is sold to you by EdTech influencers and Coursera certificates.

"Data is the new Oil."

You expect to walk into a company and be handed a perfectly clean, labeled dataset. You expect the Business Stakeholders to ask you for "Predictions" and "Insights".

You imagine your workflow like this:

  • Import Data
  • Train Model
  • Optimize Hyperparameters
  • Present cool 3D graphs to the CEO
  • Get promoted for increasing revenue by 20%

You think 80% of your time will be spent on Modelling and 20% on deployment.

You think SQL is "legacy tech" for backend engineers, and Excel is for finance guys.

The Reality

What Junior Data Scientists Actually Do:

📊 Data Science Job Reality

| What Courses Teach | What Juniors Actually Do |
| --- | --- |
| Machine learning algorithms | Clean messy SQL data |
| Neural networks | Build dashboards |
| Statistical modeling | Answer ad-hoc data requests |
| Research papers | Excel exports for business teams |
| Kaggle competitions | Debug data pipelines |

The SQL Janitor Reality:

70-80% of junior data science work is:

  • Writing SQL queries to pull data
  • Cleaning data that's never clean
  • Building reports and dashboards
  • Answering "can you pull this data?" requests
  • Diagnosing why numbers don't match
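
To make the list above concrete, here is a minimal sketch of a typical "pull sales by category" request, using Python's built-in sqlite3 module with an in-memory database. The `orders` table, its columns, and the sample rows are hypothetical and invented purely for illustration. The point is that inconsistent labels, duplicate rows, and missing values make a naive query disagree with a cleaned one, which is usually why the numbers don't match.

```python
import sqlite3

# Hypothetical sample data: table name, columns, and rows are invented for illustration.
ROWS = [
    (1, "Electronics", 1000.0),
    (2, "electronics ", 500.0),  # same category, inconsistent casing and trailing space
    (3, "Books", 200.0),
    (3, "Books", 200.0),         # duplicate row, e.g. from a double-loaded export
    (4, None, 50.0),             # missing category
]

def build_db():
    """Create an in-memory database holding the messy sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, category TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    return conn

# Naive pull: groups on the raw label, so dirty labels split into separate
# groups and the duplicate row is counted twice.
NAIVE_SQL = """
SELECT category, SUM(amount) AS revenue
FROM orders
GROUP BY category
"""

# Cleaned pull: drop exact duplicates, normalise labels, bucket missing values.
CLEAN_SQL = """
WITH deduped AS (
    SELECT DISTINCT order_id, category, amount FROM orders
),
cleaned AS (
    SELECT COALESCE(LOWER(TRIM(category)), 'unknown') AS category_clean, amount
    FROM deduped
)
SELECT category_clean, SUM(amount) AS revenue
FROM cleaned
GROUP BY category_clean
ORDER BY category_clean
"""

def sales_by_category(conn, sql):
    """Run a two-column query and return {category: revenue}."""
    return dict(conn.execute(sql).fetchall())

conn = build_db()
naive = sales_by_category(conn, NAIVE_SQL)
clean = sales_by_category(conn, CLEAN_SQL)
print("naive:", naive)
print("clean:", clean)
print("difference in total:", sum(naive.values()) - sum(clean.values()))
```

In this toy data the naive query reports four categories and a higher total than the cleaned query, purely because of the duplicate row. Tracking down that kind of gap is the "diagnosing" work in the list.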

The machine learning you studied? You'll use it on 5-10% of your tasks. And that's if you're lucky enough to have problems that need ML rather than simple analytics.

📈 Data Science Time Allocation

| Activity | What You Expected | Reality (Junior Role) |
| --- | --- | --- |
| Machine Learning | 50% | 5-10% |
| Data cleaning | 10% | 35% |
| SQL queries | 10% | 30% |
| Dashboards/reporting | 10% | 15% |
| Stakeholder requests | 5% | 10% |
| Meeting/communication | 5% | 10% |

Case Study - The ML Dreamer:

Priya, 25, Junior Data Scientist at an e-commerce startup:

  • Master's in ML from a good institute
  • Expectation: Building recommendation systems
  • Reality: "Can you pull last month's sales by category?"
  • ML projects worked on in 18 months: 1
  • SQL queries written: Hundreds
  • Dashboards built: 15+
  • Current feeling: "I'm a well-paid data analyst, not a data scientist"

Salary and Growth Reality

Data Role Salary Clarity:

💰 Detailed Data Role Comparison

| Role | Year 2 | Year 5 | Year 8 | ML Work % |
| --- | --- | --- | --- | --- |
| Business Analyst | Rs 8 LPA | Rs 14 LPA | Rs 22 LPA | 0% |
| Data Analyst | Rs 10 LPA | Rs 18 LPA | Rs 28 LPA | 0-5% |
| Data Scientist (typical) | Rs 12 LPA | Rs 24 LPA | Rs 40 LPA | 10-25% |
| ML Engineer | Rs 15 LPA | Rs 32 LPA | Rs 55 LPA | 50-70% |
| Applied Scientist | Rs 18 LPA | Rs 40 LPA | Rs 70 LPA | 70-90% |

If you want ML work AND high salary, target ML Engineer or Applied Scientist. "Data Scientist" at most companies is analytics with occasional modeling.
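
To compare these trajectories on a like-for-like basis, you can convert the Year 2 and Year 8 figures from the table above into an implied compound annual growth rate. The sketch below is a back-of-the-envelope calculation only: it assumes smooth compounding over the six years between the two columns, which real careers rarely follow, and the function name is illustrative.

```python
# Year 2 and Year 8 figures (in Rs LPA) copied from the comparison table above.
SALARY_LPA = {
    "Business Analyst": (8, 22),
    "Data Analyst": (10, 28),
    "Data Scientist (typical)": (12, 40),
    "ML Engineer": (15, 55),
    "Applied Scientist": (18, 70),
}

YEARS_BETWEEN = 8 - 2  # six years separate the Year 2 and Year 8 columns

def implied_annual_growth(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

growth = {
    role: implied_annual_growth(y2, y8, YEARS_BETWEEN)
    for role, (y2, y8) in SALARY_LPA.items()
}

for role, rate in growth.items():
    print(f"{role:<26} {rate:.1%} per year")
```

Under these figures the implied rates run from roughly 18% a year for Business Analyst to roughly 25% a year for Applied Scientist, so the roles with more ML work start higher and also compound faster.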

Where Real ML Work Exists:

  • Tech giants (Google AI, Meta FAIR, Amazon Science)
  • AI-first startups (core product is ML)
  • Research labs (slower, academic style)
  • Specialized teams at large companies

Most companies lack the data quality, infrastructure, or business problems that real ML requires. They hire "Data Scientists" and give them analyst work.

Cross-check your take-home with the CTC Decoder and compare ranges in Salary Reality.

Where Most People Get Stuck

Where Junior Data Scientists Get Stuck:

The Analytics Trap:

You're good at SQL and dashboards now, and you're valuable for that work. The company doesn't want to train you on ML; it needs the reports done. You become a specialist in precisely what you didn't want to do.

The Portfolio Gap:

Your Kaggle projects are from bootcamp. Your work projects are all internal dashboards. When you interview for "real" DS roles, you can't show ML production experience.

Escape Routes:

  1. Target ML Engineering: More engineering, less ambiguity. The work is what it claims to be.
  2. Join AI-First Companies: Startups where ML is the product, not a nice-to-have.
  3. Build Open Source/Side Projects: Create an ML portfolio outside of work. Prove you can do the interesting stuff.
  4. Research Roles: Academic or industry research labs. Lower pay, real ML work.
  5. Specialize in DS Infrastructure: MLOps, feature stores, model serving. Less glamorous, more real demand.

If this matches your current situation, run the Resignation Risk Analyzer before making your next move.

Who Should Avoid This Path

Data Science Is Wrong For You If:

  • You only want to build ML models: 70% of the job isn't that
  • You hate SQL and data cleaning: That's most of the work
  • You expect research-style work: Production constraints rule
  • You joined because of course hype: Reality doesn't match marketing
  • You want clear deliverables: DS projects are often ambiguous and fail

The Data Role Clarification:

📊 What Each Data Role Actually Does

| Title | Reality | ML Portion | Salary Trajectory |
| --- | --- | --- | --- |
| Data Analyst | SQL + dashboards + reporting | 0-5% | Rs 8-28 LPA |
| Data Scientist | SQL + analysis + occasional ML | 10-30% | Rs 12-50 LPA |
| ML Engineer | Building + deploying models | 50-70% | Rs 15-65 LPA |
| Data Engineer | Pipelines + infrastructure | 5-10% | Rs 12-50 LPA |

If you want ML, target ML Engineering. If you're okay with analytics + occasional ML, Data Scientist works. If you want pure analytics, save yourself the ML courses and own the Data Analyst identity.

Decision Framework

Use this quick framework before changing role, company, or specialization.

  • If your take-home is not compounding with experience, benchmark externally before accepting internal narratives.
  • If role expectations keep rising without title/pay movement, escalate with documented outcomes.
  • If your growth path is unclear beyond 6-9 months, run a switch-or-specialize decision cycle.

Common Mistakes Checklist

  • Treating outlier salaries as planning baselines.
  • Using title changes as a substitute for capability changes.
  • Delaying market benchmarking until after compensation stagnates.
  • Over-indexing on model demos without production deployment depth.

Real Scenario Snapshot

A professional stays in-role despite rising responsibility and flat pay. Growth recovers only after external benchmarking and a deliberate switch-or-specialize decision.

Originality Lens

Contrarian thesis: Career outcomes usually degrade from quiet trade-offs, not sudden failures.

Non-obvious signal: When responsibility rises but decision rights stay flat, stagnation risk rises even before pay slows.

Evidence By Section

Claim: Popular career narratives overweight edge cases and underweight base-rate outcomes.

Evidence: AmbitionBox Salary Insights, Glassdoor India Salaries

Claim: Observed market behavior diverges from social-media compensation storytelling.

Evidence: Glassdoor India Salaries, LinkedIn Jobs (India)

Claim: Salary and growth ranges vary by company type, leverage, and cycle timing.

Evidence: AmbitionBox Salary Insights, Glassdoor India Salaries, LinkedIn Jobs (India), Naukri Jobs (India)

Claim: Career plateaus are often linked to stale scope, weak mobility planning, and evidence gaps.

Evidence: LinkedIn Jobs (India), Naukri Jobs (India), Kaggle State of Data/AI

Final Verdict

The Data Science Truth:

The "Data Science" title covers a wide spectrum of work, most of which isn't machine learning. If you joined expecting research and models, you'll find SQL and dashboards. The mismatch causes disillusionment, but it's not the field's fault—it's expectations vs. reality.

The Uncomfortable Question:

How much of your current role is actual ML vs. data manipulation? If it's 80%+ data work, you're a Data Analyst with an inflated title. Accept that, or actively seek ML Engineering roles at ML-first companies.
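
One way to answer that question with evidence rather than impressions is to log a week or two of tasks and tally the hours. The sketch below is illustrative only: the task log, the category labels, and the split into "ML" and "data work" are hypothetical assumptions, so substitute your own entries and categories.

```python
from collections import defaultdict

# Hypothetical one-week task log: (description, category, hours).
TASK_LOG = [
    ("Monthly sales pull for category team", "sql", 6.0),
    ("Fix null customer IDs in orders feed", "cleaning", 9.0),
    ("Refresh revenue dashboard", "dashboard", 7.0),
    ("Ad-hoc export for finance", "requests", 5.0),
    ("Stand-ups and stakeholder syncs", "meetings", 5.0),
    ("Churn model feature experiments", "ml", 4.0),
    ("Reconcile dashboard vs finance totals", "cleaning", 4.0),
]

# Assumed grouping: which categories count as "actual ML" vs "data manipulation".
ML_CATEGORIES = {"ml"}
DATA_CATEGORIES = {"sql", "cleaning", "dashboard", "requests"}

def hours_share(log):
    """Return each category's share of total logged hours."""
    totals = defaultdict(float)
    for _description, category, hours in log:
        totals[category] += hours
    grand_total = sum(totals.values())
    return {category: hours / grand_total for category, hours in totals.items()}

shares = hours_share(TASK_LOG)
ml_share = sum(s for c, s in shares.items() if c in ML_CATEGORIES)
data_share = sum(s for c, s in shares.items() if c in DATA_CATEGORIES)
print(f"ML: {ml_share:.0%}, data work: {data_share:.0%}")
```

A few weeks of honest logging gives you a number to bring to a manager conversation or to your own switch-or-stay decision, instead of a general feeling.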

What Actually Works:

  1. Set realistic expectations—analytics first, ML maybe later
  2. Target companies where ML is the product, not a nice-to-have
  3. Consider ML Engineering if you want to build models professionally
  4. Build an independent ML portfolio if your job doesn't provide ML opportunities
  5. Embrace the Data Analyst role if that matches your actual work
  6. Specialize in ML infrastructure (MLOps) for better positioning

Last Updated: January 13, 2026

What Changed

  • January 13, 2026: Reviewed salary ranges, corrected stale assumptions, and tightened internal links for related reads.
  • December 23, 2025: Revalidated core claims against current hiring and compensation signals.
  • December 23, 2025: Initial publication with baseline market framing and trade-off analysis.

Sources