Job Description
We are building the future of healthcare analytics. Join us to design, build, and scale robust data pipelines that power nationwide analytics and support our machine learning systems. Our goal: pipelines that are reliable, observable, and continuously improving in production.
This is a fully remote role, open to candidates based in Europe or India, with periodic team gatherings in Mountain View, California.
- Design, build, and maintain scalable ETL pipelines using Python (Pandas, PySpark) and SQL, orchestrated with Airflow (MWAA). (A minimal DAG sketch follows this list.)
- Develop and maintain the SAIVA Data Lake/Lakehouse on AWS, ensuring quality, governance, scalability, and accessibility.
- Run and optimize distributed data processing jobs with Spark on AWS EMR and/or EKS.
- Implement batch and streaming ingestion frameworks (APIs, databases, files, event streams).
- Enforce data validation and quality checks to keep analytics reliable and data ML-ready. (A validation sketch also follows below.)
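
To give a concrete flavor of the orchestration work in the first responsibility, here is a minimal sketch of a daily Airflow DAG wrapping a small Pandas transform. The `dag_id`, S3 paths, and column names are hypothetical placeholders rather than SAIVA's actual pipelines, and the `schedule` argument assumes Airflow 2.4+ (as shipped on recent MWAA versions).

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def run_etl():
    # Hypothetical flow: read raw records, derive one feature, write curated output.
    # Reading/writing S3 paths assumes s3fs (or an equivalent filesystem) is installed.
    df = pd.read_parquet("s3://example-raw/admissions/")
    df["length_of_stay_days"] = (df["discharge_date"] - df["admit_date"]).dt.days
    df.to_parquet("s3://example-curated/admissions.parquet", index=False)


with DAG(
    dag_id="admissions_daily_etl",   # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",               # assumes Airflow 2.4+
    catchup=False,
) as dag:
    PythonOperator(task_id="run_etl", python_callable=run_etl)
```

In practice a production DAG would split extract, transform, and load into separate tasks with retries and alerting; this sketch compresses them into one task to keep the example short.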
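Likewise, for the validation bullet, here is a minimal sketch of a fail-fast quality gate. The column names and rules are illustrative assumptions, not SAIVA's actual checks.

```python
import pandas as pd


def validate_admissions(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on a bad batch instead of letting it reach analytics or ML."""
    errors = []
    if df["patient_id"].isna().any():
        errors.append("null patient_id values")
    if (df["discharge_date"] < df["admit_date"]).any():
        errors.append("discharge_date earlier than admit_date")
    if df.duplicated(subset=["patient_id", "admit_date"]).any():
        errors.append("duplicate admission records")
    if errors:
        raise ValueError("validation failed: " + "; ".join(errors))
    return df
```

A check like this would typically run between ingestion and load, so a bad batch stops the pipeline instead of propagating downstream.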
Apply for this Position
Ready to join SAIVA AI? Submit your application.