The Junior Data Science Reality: You Are a SQL Janitor

This article is written for the "Kaggle Grandmaster" wannabe.

You have spent the last 6 months living in Jupyter Notebooks. You know the mathematical difference between L1 and L2 regularization. You have fine-tuned a BERT model on a dataset you found on Reddit. You dream in PyTorch and scikit-learn.

You believe that your first job will involve "Building Models", "Training AI", or "Solving AGI".

You believe you are entering the industry as a Scientist — a thinker who will be paid to experiment, hypothesize, and optimize.

If you think your daily life will resemble a DeepMind research paper, this article is your reality check.

P. Mishra — December 2025

The Expectation

The expectation is sold to you by EdTech influencers and Coursera certificates.

"Data is the new Oil."

You expect to walk into a company and be handed a perfectly clean, labeled dataset. You expect the Business Stakeholders to ask you for "Predictions" and "Insights".

You imagine your workflow like this:

  • Import Data
  • Train Model
  • Optimize Hyperparameters
  • Present cool 3D graphs to the CEO
  • Get promoted for increasing revenue by 20%

You think 80% of your time will be spent on Modelling and 20% on deployment.

You think SQL is "legacy tech" for backend engineers, and Excel is for finance guys.

The Reality

The Reality: You are a glorified Plumber.

Real-world data is not a Kaggle dataset. It is a crime scene.

It lives in 50 disconnected Excel sheets, a legacy SQL database that crashes if you query more than a month of data, and a random PDF on a sales manager's desktop.

Companies do not have "Modelling" problems. They have "Data Quality" problems.

Your job is not to build Neural Networks. Your job is to write ugly, 500-line SQL joins to figure out why the "Total Revenue" column in the Sales Database doesn't match the "Bank Deposit" column in the Finance Database.
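A toy version of that reconciliation, sketched with Python's built-in `sqlite3` module so it runs anywhere. The table names (`sales_orders`, `bank_deposits`), the columns, and the numbers are all hypothetical; the point is the shape of the query: aggregate each side per day, join, and list the days where the totals disagree.

```python
import sqlite3

# Hypothetical schemas: one row per sale in the "Sales DB" and one row per
# deposit in the "Finance DB". Real systems will not look this tidy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_orders (order_id TEXT, sale_date TEXT, amount REAL);
CREATE TABLE bank_deposits (deposit_id TEXT, deposit_date TEXT, amount REAL);

INSERT INTO sales_orders VALUES
  ('A1', '2025-11-01', 100.0),
  ('A2', '2025-11-01', 250.0),
  ('A3', '2025-11-02', 80.0);

INSERT INTO bank_deposits VALUES
  ('D1', '2025-11-01', 350.0),
  ('D2', '2025-11-02', 60.0),
  ('D3', '2025-11-03', 40.0);
""")

# Aggregate each side per day BEFORE joining, so many orders vs. one deposit
# cannot fan out into duplicated rows. The UNION ALL half emulates a FULL
# OUTER JOIN by adding days that appear only on the finance side.
RECONCILE_SQL = """
WITH sales AS (
  SELECT sale_date AS day, SUM(amount) AS sales_total
  FROM sales_orders GROUP BY sale_date
),
bank AS (
  SELECT deposit_date AS day, SUM(amount) AS bank_total
  FROM bank_deposits GROUP BY deposit_date
)
SELECT s.day, s.sales_total, COALESCE(b.bank_total, 0) AS bank_total
FROM sales s LEFT JOIN bank b ON s.day = b.day
UNION ALL
SELECT b.day, 0, b.bank_total
FROM bank b LEFT JOIN sales s ON s.day = b.day
WHERE s.day IS NULL
ORDER BY 1
"""


def mismatched_days(conn, tolerance=0.01):
    """Return (day, sales_total, bank_total, gap) for days whose totals disagree."""
    rows = conn.execute(RECONCILE_SQL).fetchall()
    return [
        (day, sales, bank, round(sales - bank, 2))
        for day, sales, bank in rows
        if abs(sales - bank) > tolerance
    ]


for row in mismatched_days(conn):
    print(row)
```

Aggregating before joining is the design choice that matters: joining raw orders directly to raw deposits multiplies rows and inflates both totals. The manual FULL OUTER JOIN emulation is there because SQLite only added native support in version 3.39.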

You will spend 90% of your time cleaning data. Parsing dates that are formatted wrong. Fixing spelling mistakes in city names. Removing duplicates that shouldn't exist.

You will not touch an LLM. You will touch `DataFrame.dropna()` and regular expressions. And you will cry.
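A minimal pandas sketch of that daily routine: parsing mixed date formats, normalizing city spellings, pulling numbers out of free-text amounts with a regex, dropping unusable rows, and removing duplicates. The column names, date formats, city alias table, and regex are all hypothetical; real exports will break them in new ways.

```python
from datetime import datetime

import pandas as pd

# Hypothetical raw export: mixed date formats, inconsistent city spellings,
# a near-duplicate order, a missing date, and free-text currency amounts.
raw = pd.DataFrame({
    "order_id": ["A1", "A2", "A2", "A3", "A4"],
    "order_date": ["2025-11-01", "01/11/2025", "2025-11-01", "Nov 2, 2025", None],
    "city": ["Bengaluru", "Bangalore ", "bangalore", "mumbai", "Bombay"],
    "amount": ["₹1,200", "Rs. 850", "Rs 850", "INR 2,000", "₹300"],
})

# Assumed formats (the %d/%m/%Y entry assumes day-first dates) and an assumed
# alias table; in real life both lists keep growing.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d, %Y"]
CITY_ALIASES = {"bengaluru": "Bengaluru", "bangalore": "Bengaluru",
                "mumbai": "Mumbai", "bombay": "Mumbai"}
# Regex: first number in the string, allowing thousands commas and decimals.
AMOUNT_PATTERN = r"(\d[\d,]*(?:\.\d+)?)"


def parse_date(value):
    """Try each known format in order; return NaT if none matches."""
    if not isinstance(value, str):
        return pd.NaT
    for fmt in DATE_FORMATS:
        try:
            return pd.Timestamp(datetime.strptime(value.strip(), fmt))
        except ValueError:
            continue
    return pd.NaT


def clean(df):
    out = df.copy()
    out["order_date"] = out["order_date"].map(parse_date)
    # Unknown spellings map to NaN so they surface instead of passing silently.
    out["city"] = out["city"].str.strip().str.lower().map(CITY_ALIASES)
    digits = out["amount"].str.extract(AMOUNT_PATTERN, expand=False)
    out["amount"] = pd.to_numeric(digits.str.replace(",", "", regex=False), errors="coerce")
    # Drop rows still unusable after cleaning, then dedupe on the business key.
    out = out.dropna(subset=["order_date", "city", "amount"])
    out = out.drop_duplicates(subset=["order_id"], keep="first")
    return out.reset_index(drop=True)


print(clean(raw))
```

The order of steps is deliberate: deduplicating on `order_id` only after normalizing catches the near-duplicate `A2` rows, which an exact `drop_duplicates()` on the raw frame would miss.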

Most companies don't need AI. They need a dashboard that works.

Salary & Growth Reality

This misalignment shows up in the salary.

Unless you have a PhD or are in the top 1% of graduates from IISc or Old IITs, you are not getting the "AI Researcher" salary (₹30 LPA+).

You are getting the "Data Analyst" salary (₹6-12 LPA), even if your title says "Junior Data Scientist".

Companies know that the supply of Juniors who can "import sklearn" is infinite. The supply of people who can actually clean a dirty data warehouse is low.

Role Type       Salary Range (₹ LPA)   Actual Work
Cool AI Jobs    18.0 - 30.0            Research / LLMs
Real Jobs*      5.0 - 12.0             Cleaning CSVs

*90% of openings are mislabelled Data Analyst roles.

Where Most People Get Stuck

You get stuck because you refuse to accept your role.

You turn your nose up at Data Engineering. You think writing pipelines, configuring Airflow, and managing ETL jobs is "below you". You want to do the Math.

So you sit in your corner, building complex models on your local machine that never get deployed because the data infrastructure doesn't support them.

Meanwhile, the "Average" engineer who learned SQL, dbt, and Cloud Infrastructure is getting promoted because they are actually delivering value (clean data) to the business.

The market pays for Pipelines, not Notebooks. If you can't put your model in production, you are useless.

Who Should Avoid This Path

Avoid if: You hate cleaning up other people's messes. If you have a low tolerance for ambiguity and broken systems, you will burn out in 3 months.

This career works for: Detectives. People who enjoy the hunt. People who find satisfaction in taking a chaotic, broken mess and making it orderly.

Final Verdict

Learn SQL and MLOps.

Stop trying to be an "AI Architect" as a fresher. Be the person who can actually get clean data from Point A to Point B.

The "Sexy" part of Data Science is a luxury. The "Janitor" part is a necessity.

If you want to survive, become a Data Engineer who knows Statistics, not a Statistician who refuses to do Engineering.

Last Updated: December 2025