Job Description

Key Responsibilities

● Audit & Validation: Conduct rigorous quality checks on scraped outputs from Streamlit applications to ensure high-fidelity extraction from source documents.

● Data Remediation: Utilize purpose-built data pipelines to manually or programmatically overwrite inaccurate data points discovered during auditing.

● Pipeline Monitoring: Collaborate with data engineering teams to identify systemic scraping errors and refine the logic within the ingestion layer.

● Governance Integration: Transition successful document auditing workflows into our broader enterprise data governance practices.

● Reporting: Maintain detailed logs of data discrepancies, "ground truth" comparisons, and error trends to inform future scraping strategies.


Required Skills & Qualifications

● Extreme Attention to Detail: You must have a passion for "hunting" for small discrepancies in large datasets.

● Snowflake Proficiency: Hands-on experience querying and managing data within Snowflake is required.

● Strong SQL Skills: Ability to write complex queries to validate data across multiple tables and identify outliers.

● Analytical Mindset: Experience auditing unstructured data (PDFs, images, or web scrapes) and comparing it against structured outputs.

● Communication: Ability to clearly document data issues and explain technical discrepancies to both engineers and stakeholders.


Preferred Qualifications

● Python Experience: Familiarity with Python for data manipulation (Pandas) or basic automation is a significant plus.

● Streamlit Familiarity: Understanding of how Streamlit apps function, in order to better troubleshoot how data is captured.

● Governance Background: Prior experience working within a formal Data Governance framework or using data cataloging tools.
