Job Description

Senior Data Scientist: AI Training Data (2-4 Months Contract)

Company: BespokeLabs (VC-backed; founded by IIT & Ivy League alumni)

Location: Remote

Role Type: Contract (2-4 Months)

Time Commitment: 40 hrs/week (Full-time availability required)

Compensation: Hyper-competitive hourly rate (matching top-tier Senior Data Scientist bands)

Experience: 6+ years


About BespokeLabs

BespokeLabs is a premier, VC-backed AI research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. We don’t just build tooling around AI—we build the massive-scale data systems and reasoning architectures that directly power next-generation models. Our research shapes the frontier of AI: we’ve published breakthroughs like GEPA, driven foundational datasets like OpenThoughts, and shipped state-of-the-art models including Bespoke-MiniCheck and Bespoke-MiniChart. Learn more at bespokelabs.ai.

Role Overview

We are looking for a high-impact Senior Data Scientist for an intensive 2-4 month sprint. You will leverage your deep expertise in production-grade machine learning and applied statistics to develop the algorithms and logic that curate and evaluate datasets for advanced AI model training.

This is not a traditional model-building or research role. We need a seasoned practitioner who has already owned the end-to-end DS lifecycle at scale. You will use your intuition for feature engineering, statistical validity, and large-scale data processing to programmatically generate, shape, and validate AI training data.


What You Will Do (The Contract)

  • Algorithm Design: Design and implement custom statistical models and programmatic logic (e.g., anomaly detection, active learning, similarity scoring) to evaluate data quality, complexity, and redundancy at scale.
  • Hands-on At-Scale Coding: Write scalable PySpark and Python (NumPy/Pandas) code to apply these algorithms across massive datasets, translating experimental logic into reliable, large-scale workflows.
  • Metric Formulation: Develop custom quantitative metrics and heuristic benchmarks to rigorously assess the fidelity and suitability of data subsets for specific AI training objectives.
  • Validation & Iteration: Run high-speed validation cycles, analyzing the output of data-curation algorithms to diagnose skew, bias, or noise, and iteratively refine the logic.
  • High-Level Curation: Apply senior-level domain expertise in predictive modeling and feature engineering to ensure the final training inputs meet the strict standards required for state-of-the-art ML systems.


What You Bring to the Table (Your Past Experience)

To be successful in this contract, you must have a track record of:

  • The End-to-End DS Lifecycle: Framing problems, modeling, validation, production, and iteration.
  • Production Ownership: Building and deploying ML and statistical models on large-scale datasets.
  • Large-Scale Data Processing: Working with Apache Spark to develop scalable feature pipelines and offline training workflows.
  • Experimentation: Designing and analyzing rigorous experiments (A/B tests, causal inference).
  • Impact: Translating complex model outputs into clear product and business decisions.


Required Qualifications (Non-Negotiable)

  • Experience: 6+ years as a Data Scientist or Applied Scientist.
  • Production Background: Proven ownership of models running in production environments.
  • Applied Statistics: Strong background in applied statistics and experimentation frameworks.

Core Technical Skills

  • Languages: Python (NumPy, Pandas, Scikit-learn, PyTorch / TensorFlow) and strong SQL.
  • Big Data: Apache Spark (PySpark or Spark SQL) for large-scale data processing.
  • Methodologies: Feature engineering, model evaluation, statistical modeling, and hypothesis testing.


Strong Signals (Highly Valued)

  • Scale: Models trained on TB-scale datasets.
  • Domain Specificity: Experience in high-complexity domains such as recommendations, pricing, fraud/risk, search/ranking, or growth & experimentation.
  • Collaboration: Experience deploying models alongside data engineering pipelines.


Out of Scope (Who Should Not Apply)

  • BI / reporting-only roles
  • SQL-only analysts
  • Research-only ML roles with no production ownership
  • Early-career profiles

Apply for this Position

Ready to join? Click the button below to submit your application.

Submit Application