Job Description
Job Title: Data Engineer – Medical Imaging & Health Data
Experience Required: Minimum 3 Years
Location: Chennai
Employment Type: Full-time
About the Role
We are seeking a skilled and experienced Data Engineer with a strong background in
medical imaging and healthcare data to join our growing team. The role involves building
scalable and secure data infrastructure to power clinical research, AI-based diagnostics,
and healthcare analytics.
This role is ideal for someone who is comfortable working with diverse data types, including
medical images, clinical records, and time-series data, and has experience with modern data
lakes, streaming platforms, and distributed databases.
Key Responsibilities
● Ingest, process, and manage large-scale medical imaging datasets (e.g., MRI, CT
scans, pathology slides) in DICOM and non-DICOM formats.
● Develop robust ETL/ELT pipelines to extract and transform healthcare data from EHRs,
PACS, RIS, and medical registries using tools like Apache NiFi, Airflow, or Dagster.
● Build streaming and event-driven pipelines using Kafka, RabbitMQ, and other
messaging systems.
● Design scalable storage systems for structured, unstructured, and semi-structured data
using data lakes (e.g., Apache Hudi, Iceberg, Delta Lake) over Amazon S3 or MinIO.
● Implement distributed databases (e.g., Cassandra, ClickHouse, MongoDB,
Elasticsearch) for various analytical workloads.
● Collaborate with clinicians, researchers, and ML teams to prepare datasets for
downstream AI/ML pipelines and analytics platforms.
● Integrate graph databases for modeling complex biomedical relationships.
● Ensure data security, governance, anonymization, and compliance with HIPAA, GDPR,
and related healthcare standards.
● Enable data observability, monitoring, and audit trails for all pipeline components.
● Work with query engines such as Trino for federated query access across systems.
● Support data versioning and reproducibility using DVC.
● Perform data migrations and query optimization across polyglot data systems.
Must-Have Skills & Experience
● Bachelor’s or Master’s in Computer Science, Biomedical Engineering, Health
Informatics, or a related field.
● Minimum 3 years of experience as a Data Engineer in the healthcare or biomedical
domain.
● Expertise in Python, SQL, and handling medical imaging with libraries like pydicom,
SimpleITK, or NiBabel.
● Solid understanding of healthcare interoperability standards (DICOM, HL7, FHIR,
OMOP).
● Hands-on experience with distributed systems and databases like Cassandra, MongoDB,
Elasticsearch, ClickHouse, TimescaleDB, and Redis.
● Experience with Apache Spark, Apache Kafka, and streaming/event-driven
architectures.
● Familiar with data lakes (Apache Hudi, Iceberg, Delta Lake) and cloud/object storage
(Amazon S3, MinIO).
● Proficient with ETL orchestration using Airflow, Dagster, or Apache NiFi.
● Comfortable with message queues (e.g., RabbitMQ).
● Strong foundation in data security, privacy, and regulatory compliance (HIPAA, GDPR).
Good-to-Have Skills
● Experience working with time-series databases: InfluxDB, TimescaleDB, QuestDB.
● Experience with graph databases like OrientDB, RavenDB, or Neo4j.
● Exposure to SQL/NoSQL ecosystems: MySQL, PostgreSQL, MariaDB, HBase,
Bytebase.
● Familiarity with Elasticsearch for indexing and search over medical datasets.
● Prior involvement in MLOps, feature stores, or AI/ML lifecycle integration.
● Understanding of data observability and monitoring tools for pipelines.
● Experience with data migration strategies and query performance tuning.
● Exposure to clinical registries, cohort builders, or clinical trial platforms.