Job Description
<div>Health Care Data Engineer Job Description<br /> Mission</div> <div>Systematic collection of annual public hospital statistics from official sources (websites, activity reports, national databases). The mission includes auditing, data conversion to the customer's standardized format, harmonization, and quality assurance.<br /> Timeline: Initial phase of 6 months; may be extended based on mission success.<br /> Responsibilities<br /> Data Collection and Analysis<br /> Perform structured web research on official hospital databases.<br /> Register on platforms and retrieve relevant national datasets.<br /> Audit available data (scope, completeness, frequency, format).<br /> Assess subscription costs and data licensing constraints.<br /> Evaluate data usability and perform source-to-model mapping analysis.<br /> Document collection processes and maintain methodological transparency.<br /> Perform manual scraping and data normalization when necessary.<br /> Map collected datasets to the customer's data model.<br /> Maintain and update project dashboards and sourcing trackers.</div> <div>Data Harmonization<br /> Harmonize data from different sources into a coherent dataset.<br /> Conduct consistency checks and rule-based quality audits.<br /> Apply data governance rules (e.g., replace n &lt; 5 with n = 5 for compliance).<br /> Enforce naming conventions and standardized variable structures.<br /> Coordinate with the Data Analyst for continuous QA feedback.</div> <div>Data Integration<br /> Review validated procedures and translate them into technical workflows.<br /> Develop or support automation for data retrieval (API connections, secure transfers, or structured downloads).<br /> Ensure compatibility of formats, metadata specifications, and export standards.<br /> Implement secure data transfer protocols aligned with the customer's infrastructure.<br /> Required Skills and Knowledge<br /> 1. 
Good understanding of the health care domain and its data types<br /> 2. Familiarity with ICD-10 and SNOMED CT<br /> 3. Web scraping skills using Python, JavaScript, and Selenium<br /> 4. Ability to use regex to extract ICD codes and numbers, and tools such as spaCy or Tesseract for text extraction from PDFs (if needed)<br /> 5. Ability to use workflow automation tools, e.g., Airflow, Prefect, AWS Step Functions, Metaflow<br /> 6. Good communication skills and innovative problem-solving skills</div>