The Junior Data Science Reality: You Are a SQL Janitor
This article is written for the "Kaggle Grandmaster" wannabe.
You have spent the last 6 months living in Jupyter Notebooks. You know the mathematical difference between L1 and L2 regularization. You have fine-tuned a BERT model on a dataset you found on Reddit. You dream in PyTorch and Scikit-learn.
You believe that your first job will involve "Building Models", "Training AI", or "Solving AGI".
You believe you are entering the industry as a Scientist — a thinker who will be paid to experiment, hypothesize, and optimize.
If you think your daily life will resemble a DeepMind research paper, this article is your reality check.
Key Takeaways
- This piece focuses on data science realities in India, not outlier narratives.
- Compensation numbers should be interpreted with role scope, market cycle, and switching friction.
- Use decision frameworks and evidence checks before acting on title or salary headlines.
The Expectation
The expectation is sold to you by EdTech influencers and Coursera certificates.
"Data is the new Oil."
You expect to walk into a company and be handed a perfectly clean, labeled dataset. You expect the Business Stakeholders to ask you for "Predictions" and "Insights".
You imagine your workflow like this:
- Import Data
- Train Model
- Optimize Hyperparameters
- Present cool 3D graphs to the CEO
- Get promoted for increasing revenue by 20%
You think 80% of your time will be spent on Modelling and 20% on deployment.
You think SQL is "legacy tech" for backend engineers, and Excel is for finance guys.
The Reality
What Junior Data Scientists Actually Do:
📊 Data Science Job Reality
| What Courses Teach | What Juniors Actually Do |
|---|---|
| Machine learning algorithms | Clean messy SQL data |
| Neural networks | Build dashboards |
| Statistical modeling | Answer ad-hoc data requests |
| Research papers | Excel exports for business teams |
| Kaggle competitions | Debug data pipelines |
The SQL Janitor Reality:
70-80% of junior data science work is:
- Writing SQL queries to pull data
- Cleaning data that's never clean
- Building reports and dashboards
- Answering "can you pull this data?" requests
- Diagnosing why numbers don't match
The machine learning you studied? You'll use it on 5-10% of your tasks. And that's if you're lucky enough to have problems that need ML rather than simple analytics.
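"Diagnosing why numbers don't match" usually comes down to a join that quietly duplicates rows. Here is a minimal sketch of that situation using Python's built-in `sqlite3` module. The `orders` and `shipments` tables, their columns, and every value in them are invented for illustration and are not taken from any real system.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount REAL);
    CREATE TABLE shipments (shipment_id INTEGER, order_id INTEGER);
    INSERT INTO orders VALUES (1, 100.0), (2, 250.0), (3, 50.0);
    -- Order 2 shipped in two parcels, so it appears twice here.
    INSERT INTO shipments VALUES (10, 1), (11, 2), (12, 2), (13, 3);
""")

# The "finance" number: revenue straight from the orders table.
true_revenue = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# The "dashboard" number: the same sum after a join that repeats order 2.
joined_revenue = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN shipments AS s ON s.order_id = o.order_id
""").fetchone()[0]

# The diagnostic: which join keys occur more than once on the shipments side?
fan_out_keys = conn.execute("""
    SELECT order_id, COUNT(*) AS n
    FROM shipments
    GROUP BY order_id
    HAVING COUNT(*) > 1
""").fetchall()

print(true_revenue, joined_revenue, fan_out_keys)
```

The two totals differ only because order 2 is counted twice after the join. Checking for repeated keys before you aggregate is one common first step, not the only possible cause of a mismatch.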
📈 Data Science Time Allocation
| Activity | What You Expected | Reality (Junior Role) |
|---|---|---|
| Machine Learning | 50% | 5-10% |
| Data cleaning | 10% | 35% |
| SQL queries | 10% | 30% |
| Dashboards/reporting | 10% | 15% |
| Stakeholder requests | 5% | 10% |
| Meetings/communication | 5% | 10% |
Case Study - The ML Dreamer:
Priya, 25, Junior Data Scientist at E-commerce Startup:
- Master's in ML from a good institute
- Expectation: Building recommendation systems
- Reality: "Can you pull last month's sales by category?"
- ML projects worked on in 18 months: 1
- SQL queries written: Hundreds
- Dashboards built: 15+
- Current feeling: "I'm a well-paid data analyst, not a data scientist"
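A request like "Can you pull last month's sales by category?" sounds like a one-liner, but most of the effort goes into the cleaning. The sketch below shows one plausible version using Python's built-in `sqlite3` module. The `sales` table, its columns, the sample rows, and the `last_month_bounds` helper are all hypothetical, and the reference date is fixed so the example is reproducible.

```python
import sqlite3
from datetime import date

def last_month_bounds(today):
    """Return ISO date strings [start, end) for the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    if first_of_this_month.month == 1:
        first_of_last_month = first_of_this_month.replace(
            year=first_of_this_month.year - 1, month=12
        )
    else:
        first_of_last_month = first_of_this_month.replace(
            month=first_of_this_month.month - 1
        )
    return first_of_last_month.isoformat(), first_of_this_month.isoformat()

# Hypothetical table with the usual mess: stray whitespace, mixed case, NULLs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sold_on TEXT, category TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('2025-05-03', 'Electronics',  1200.0),
        ('2025-05-17', ' electronics', 800.0),
        ('2025-05-21', 'Apparel',      300.0),
        ('2025-05-29', NULL,           150.0),
        ('2025-06-02', 'Apparel',      999.0);
""")

# Fixed reference date (an assumption) instead of date.today().
start, end = last_month_bounds(date(2025, 6, 10))

# Normalise category labels before grouping so near-duplicates merge.
rows = conn.execute("""
    SELECT COALESCE(LOWER(TRIM(category)), 'unknown') AS category_clean,
           SUM(amount) AS total
    FROM sales
    WHERE sold_on >= ? AND sold_on < ?
    GROUP BY category_clean
    ORDER BY total DESC
""", (start, end)).fetchall()

for category, total in rows:
    print(category, total)
```

Without the `TRIM` and `LOWER` step, "Electronics" and " electronics" would show up as two separate categories, which is exactly the kind of discrepancy a stakeholder will send back to you.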
Salary and Growth Reality
Data Role Salary Clarity:
💰 Detailed Data Role Comparison
| Role | Year 2 | Year 5 | Year 8 | ML Work % |
|---|---|---|---|---|
| Business Analyst | Rs 8 LPA | Rs 14 LPA | Rs 22 LPA | 0% |
| Data Analyst | Rs 10 LPA | Rs 18 LPA | Rs 28 LPA | 0-5% |
| Data Scientist (typical) | Rs 12 LPA | Rs 24 LPA | Rs 40 LPA | 10-25% |
| ML Engineer | Rs 15 LPA | Rs 32 LPA | Rs 55 LPA | 50-70% |
| Applied Scientist | Rs 18 LPA | Rs 40 LPA | Rs 70 LPA | 70-90% |
If you want ML work AND high salary, target ML Engineer or Applied Scientist. "Data Scientist" at most companies is analytics with occasional modeling.
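To compare how these trajectories compound, you can convert the Year 2 and Year 8 figures in the table above into an implied annual growth rate. The sketch below assumes smooth compounding over the six years between those two columns, which is a simplification: real salaries tend to move in jumps at switches and promotions. The function name `implied_annual_growth` is just an illustrative choice.

```python
# Figures copied from the comparison table (Rs LPA at Year 2 and Year 8).
salaries = {
    "Business Analyst": (8, 22),
    "Data Analyst": (10, 28),
    "Data Scientist (typical)": (12, 40),
    "ML Engineer": (15, 55),
    "Applied Scientist": (18, 70),
}

def implied_annual_growth(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Year 2 to Year 8 spans six years of compounding.
growth = {
    role: implied_annual_growth(year2, year8, years=6)
    for role, (year2, year8) in salaries.items()
}

for role, rate in growth.items():
    print(f"{role}: {rate:.1%} per year")
```

Read the output as a rough way to rank the rows against each other, not as a forecast for any individual.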
Where Real ML Work Exists:
- Tech giants (Google AI, Meta FAIR, Amazon Science)
- AI-first startups (core product is ML)
- Research labs (slower, academic style)
- Specialized teams at large companies
Most companies lack the data quality, infrastructure, or business problems that real ML requires. They hire "Data Scientists" and give them analyst work.
Cross-check your take-home with the CTC Decoder and compare ranges in Salary Reality.
Where Most People Get Stuck
Where Junior Data Scientists Get Stuck:
The Analytics Trap:
You're good at SQL and dashboards now. You're valuable for that work. The company doesn't want to train you on ML; it needs the reports done. You become a specialist in precisely what you didn't want to do.
The Portfolio Gap:
Your Kaggle projects date from your bootcamp. Your work projects are all internal dashboards. When you interview for "real" DS roles, you can't show ML production experience.
Escape Routes:
- Target ML Engineering: More engineering, less ambiguity. The work is what it claims to be.
- Join AI-First Companies: Startups where ML is the product, not a nice-to-have.
- Build Open Source/Side Projects: Create an ML portfolio outside of work. Prove you can do the interesting stuff.
- Research Roles: Academic or industry research labs. Lower pay, real ML work.
- Specialize in DS Infrastructure: MLOps, feature stores, model serving. Less glamorous, more real demand.
If this matches your current situation, run the Resignation Risk Analyzer before making your next move.
Who Should Avoid This Path
Data Science Is Wrong For You If:
- You only want to build ML models: 70-80% of the job isn't that
- You hate SQL and data cleaning: That's most of the work
- You expect research-style work: Production constraints rule
- You joined because of course hype: Reality doesn't match marketing
- You want clear deliverables: DS projects are often ambiguous and fail
The Data Role Clarification:
📊 What Each Data Role Actually Does
| Title | Reality | ML Portion | Salary Trajectory |
|---|---|---|---|
| Data Analyst | SQL + dashboards + reporting | 0-5% | Rs 8-28 LPA |
| Data Scientist | SQL + analysis + occasional ML | 10-30% | Rs 12-50 LPA |
| ML Engineer | Building + deploying models | 50-70% | Rs 15-65 LPA |
| Data Engineer | Pipelines + infrastructure | 5-10% | Rs 12-50 LPA |
If you want ML, target ML Engineering. If you're okay with analytics + occasional ML, Data Scientist works. If you want pure analytics, save yourself the ML courses and own the Data Analyst identity.
Decision Framework
Use this quick framework before changing role, company, or specialization.
- If your take-home is not compounding with experience, benchmark externally before accepting internal narratives.
- If role expectations keep rising without title/pay movement, escalate with documented outcomes.
- If the growth path is unclear beyond 6-9 months, run a switch-or-specialize decision cycle.
Common Mistakes Checklist
- Treating outlier salaries as planning baselines.
- Using title changes as a substitute for capability changes.
- Delaying market benchmarking until after compensation stagnates.
- Over-indexing on model demos without production deployment depth.
Real Scenario Snapshot
A professional stays in-role despite rising responsibility and flat pay. Growth recovers only after external benchmarking and a deliberate switch-or-specialize decision.
Originality Lens
Contrarian thesis: Career outcomes usually degrade from quiet trade-offs, not sudden failures.
Non-obvious signal: When responsibility rises but decision rights stay flat, stagnation risk rises even before pay slows.
Evidence By Section
Claim: Popular career narratives overweight edge cases and underweight base-rate outcomes.
Evidence: AmbitionBox Salary Insights, Glassdoor India Salaries
Claim: Observed market behavior diverges from social-media compensation storytelling.
Evidence: Glassdoor India Salaries, LinkedIn Jobs (India)
Claim: Salary and growth ranges vary by company type, leverage, and cycle timing.
Evidence: AmbitionBox Salary Insights, Glassdoor India Salaries, LinkedIn Jobs (India), Naukri Jobs (India)
Claim: Career plateaus are often linked to stale scope, weak mobility planning, and evidence gaps.
Evidence: LinkedIn Jobs (India), Naukri Jobs (India), Kaggle State of Data/AI
Final Verdict
The Data Science Truth:
The "Data Science" title covers a wide spectrum of work, most of which isn't machine learning. If you joined expecting research and models, you'll find SQL and dashboards. The mismatch causes disillusionment, but the field isn't at fault. The problem is the gap between expectations and reality.
The Uncomfortable Question:
How much of your current role is actual ML vs. data manipulation? If it's 80%+ data work, you're a Data Analyst with an inflated title. Accept that, or actively seek ML Engineering roles at ML-first companies.
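One way to answer that question honestly is to log your hours for a week or two and compute the split. The sketch below is a minimal version of that arithmetic. The task categories and hours are invented, and the 80% cut-off is simply the threshold named in the question above.

```python
# Hypothetical week of logged hours; categories and numbers are invented.
task_log = [
    ("sql_pull", 12.0),
    ("data_cleaning", 10.0),
    ("dashboard", 8.0),
    ("stakeholder_request", 4.0),
    ("meetings", 3.0),
    ("ml_modeling", 3.0),
]

ML_CATEGORIES = {"ml_modeling"}
DATA_WORK_CATEGORIES = {"sql_pull", "data_cleaning", "dashboard", "stakeholder_request"}

def share(log, categories):
    """Fraction of total logged hours spent on the given categories."""
    total = sum(hours for _, hours in log)
    if total == 0:
        return 0.0
    return sum(hours for name, hours in log if name in categories) / total

ml_share = share(task_log, ML_CATEGORIES)
data_share = share(task_log, DATA_WORK_CATEGORIES)

# Threshold taken from the "80%+ data work" rule of thumb in the text.
analyst_in_practice = data_share >= 0.80

print(f"ML: {ml_share:.1%}, data work: {data_share:.1%}, analyst in practice: {analyst_in_practice}")
```

How you bucket tasks is a judgment call. Feature engineering for a model, for example, could reasonably count on either side, so treat the result as a prompt for reflection rather than a verdict.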
What Actually Works:
- Set realistic expectations: analytics first, ML maybe later
- Target companies where ML is the product, not a nice-to-have
- Consider ML Engineering if you want to build models professionally
- Build an independent ML portfolio if your job doesn't provide ML opportunities
- Embrace the Data Analyst role if that matches your actual work
- Specialize in ML infrastructure (MLOps) for better positioning
What Changed
- January 13, 2026: Reviewed salary ranges, corrected stale assumptions, and tightened internal links for related reads.
- December 23, 2025: Revalidated core claims against current hiring and compensation signals.
- December 23, 2025: Initial publication with baseline market framing and trade-off analysis.
Sources
- AmbitionBox Salary Insights (checked February 22, 2026)
- Glassdoor India Salaries (checked February 22, 2026)
- LinkedIn Jobs (India) (checked February 22, 2026)
- Naukri Jobs (India) (checked February 22, 2026)
- Kaggle State of Data/AI (checked February 22, 2026)