Job Description


Our Client, a Consumer Products and Software Services company, is looking for a Machine Learning Data Engineer for their Seattle, WA/Hybrid location.
 

Responsibilities:
  • Design and Build Data Pipelines: Create efficient, reliable, streamable, and scalable data pipelines using industry-standard tools and techniques, such as TorchData, WebDataset, Apache Parquet, Python, and SQL.
  • Data Ingestion: Develop strategies for ingesting data from data providers, ensuring data quality and consistency.
  • Data Pre-processing: Implement parallel pre-processing to clean, transform, de-duplicate, combine and normalize data.
  • Data Curation and Enrichment: Curate, augment, and enrich existing datasets to improve data quality and provide valuable insights to stakeholders.
  • Synthetic Data Generation: Collaborate with synthetic data teams to generate data and incorporate it into existing pipelines.
  • Collaboration with ML Teams: Work closely with ML scientists, engineers, and product teams to understand data requirements, and collaborate on data delivery.
  • Monitoring, Maintenance & Updating: Monitor data pipelines for performance, errors, and bottlenecks, and implement regular maintenance and updates. Stay updated with the latest trends and incorporate best practices into data pipelines.
  • Technical Documentation: Document data pipelines, settings, and procedures for easy maintenance and knowledge sharing.


Requirements:
  • Bachelor’s degree in Computer Science, Information Technology, or a related field.
  • At least years of experience as a Software Engineer or Data Engineer.
  • Strong software engineering skills and proficiency in Python.
  • Experience with data processing tools and formats such as Apache Parquet, WebDataset, TorchData, Pandas, shell scripting, Protobuf, and TFRecord.
  • Knowledge of data warehouse architectures and cloud-based systems (e.g., AWS S3).
  • Strong problem-solving and analytical skills.
  • Excellent communication and collaboration skills.
  • Master’s degree in Data Science or a related field.
  • Experience with data curation and enrichment techniques, particularly for large-scale text, image, and video data.
  • Familiarity with natural language processing (NLP) and machine learning (ML) concepts and frameworks (PyTorch).


Why Should You Apply?
  • Excellent growth and advancement opportunities
