Job Description
Key Responsibilities
● Audit & Validation: Conduct rigorous quality checks on scraped outputs from
Streamlit applications to ensure high-fidelity extraction from source documents.
● Data Remediation: Use purpose-built data pipelines to manually or
programmatically overwrite inaccurate data points discovered during auditing (a
rough sketch of this loop appears after this list).
● Pipeline Monitoring: Collaborate with data engineering teams to identify systemic
scraping errors and refine the logic within the ingestion layer.
● Governance Integration: Transition successful document auditing workflows into
our broader enterprise data governance practices.
● Reporting: Maintain detailed logs of data discrepancies, "ground truth" comparisons,
and error trends to inform future scraping strategies.
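To give candidates a concrete feel for the audit-and-remediate loop described above, here is a
minimal sketch in Python with Pandas. The file names, the doc_id key, and the invoice_total
field are illustrative placeholders, not our actual schema; the real pipelines will differ.

    # Hypothetical sketch of the audit-and-remediate loop; all names are placeholders.
    import pandas as pd

    scraped = pd.read_csv("scraped_output.csv")      # output of the scraping pipeline
    truth = pd.read_csv("ground_truth_sample.csv")   # manually verified source values

    # Join on the document key and flag fields where the scrape disagrees.
    merged = scraped.merge(truth, on="doc_id", suffixes=("_scraped", "_truth"))
    mismatches = merged[merged["invoice_total_scraped"] != merged["invoice_total_truth"]]

    # Log discrepancies for the error-trend report...
    mismatches.to_csv("discrepancy_log.csv", index=False)

    # ...then programmatically overwrite the bad values with the verified ones.
    corrected = scraped.set_index("doc_id")
    corrected.loc[mismatches["doc_id"], "invoice_total"] = (
        mismatches.set_index("doc_id")["invoice_total_truth"]
    )
    corrected.reset_index().to_csv("remediated_output.csv", index=False)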
Required Skills & Qualifications
● Extreme Attention to Detail: You must have a passion for "hunting" for small
discrepancies in large datasets.
● Snowflake Proficiency: Hands-on experience querying and managing data within
Snowflake is required.
● Strong SQL Skills: Ability to write complex queries to validate data across multiple
tables and identify outliers (an illustrative query appears after this list).
● Analytical Mindset: Experience auditing unstructured data (PDFs, images, or web
scrapes) and comparing it against structured outputs.
● Communication: Ability to clearly document data issues and explain technical
discrepancies to both engineers and stakeholders.
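As an illustration of the kind of cross-table validation this role involves, here is a short
sketch using the Snowflake Python connector. The connection parameters, table names, and the
5% deviation threshold are all hypothetical stand-ins, not our environment.

    # Hypothetical cross-table validation against Snowflake; all names are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="YOUR_ACCOUNT", user="YOUR_USER", password="...",
        warehouse="AUDIT_WH", database="SCRAPES", schema="PUBLIC",
    )

    # Flag scraped totals that deviate more than 5% from the verified source table,
    # i.e. likely extraction errors worth auditing.
    VALIDATION_SQL = """
        SELECT s.doc_id, s.invoice_total AS scraped, t.invoice_total AS truth
        FROM scraped_invoices s
        JOIN verified_invoices t ON t.doc_id = s.doc_id
        WHERE ABS(s.invoice_total - t.invoice_total) > 0.05 * t.invoice_total
    """

    with conn.cursor() as cur:
        for doc_id, scraped, truth in cur.execute(VALIDATION_SQL):
            print(f"doc {doc_id}: scraped={scraped}, truth={truth}")
    conn.close()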
Preferred Qualifications
● Python Experience: Familiarity with Python for data manipulation (Pandas) or basic
automation is a significant plus.
● Streamlit Familiarity: An understanding of how Streamlit apps function, to better
troubleshoot how data is captured (a toy example follows this list).
● Governance Background: Prior experience working within a formal Data
Governance framework or using data cataloging tools.
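For a sense of the Streamlit surface you would be auditing, here is a toy viewer over the
discrepancy log from the earlier sketch. It is a hypothetical example only; the input file
and fields are placeholders, not one of our applications.

    # Hypothetical Streamlit audit view; input file and columns are placeholders.
    import pandas as pd
    import streamlit as st

    st.title("Scrape audit queue")

    df = pd.read_csv("discrepancy_log.csv")  # placeholder output of the audit step
    field = st.selectbox("Field to review", df.columns)
    st.dataframe(df.sort_values(field))
    st.metric("Open discrepancies", len(df))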