Job Description

Data Engineering Manager – Web Crawling & Pipeline Architecture


Experience: 7 to 12 Years

Location: Remote / Bangalore

Engagement: Full-time

Positions: 2

Qualification: B.E / B.Tech / M.Tech / MCA / Computer Science / IT

Industry: IT / Data / AI / E-commerce / FinTech / Healthcare

Notice Period: Immediate


What We Are Looking For


  • Proven experience leading data engineering teams with strong ownership of web crawling systems and pipeline architecture.
  • Expertise in designing, building, and optimizing scalable data pipelines, preferably using workflow orchestration tools such as Airflow or Celery.
  • Hands-on proficiency in Python and SQL for data extraction, transformation, processing, and storage.
  • Experience working with cloud platforms such as AWS, GCP, or Azure for data infrastructure, deployments, and pipeline operations.
  • Deep understanding of web crawling frameworks, proxy rotation, anti-bot strategies, session handling, and compliance with global data protection regulations (GDPR/CCPA-compliant crawling).
  • Strong expertise in AI-driven automation, including integrating AI agents or frameworks like Crawl4ai into scraping, validation, and pipeline workflows.

Responsibilities


  • Lead and mentor data engineering and web crawling teams, ensuring high-quality delivery and adherence to best practices.
  • Architect, implement, and optimize scalable data pipelines that support high-volume data ingestion, transformation, and storage.
  • Build and maintain robust crawling systems using modern frameworks, handling IP rotation, throttling, and dynamic content extraction.
  • Establish pipeline orchestration using Airflow, Celery, or similar distributed processing technologies.
  • Define and enforce data quality, validation, and security measures across all data flows and pipelines.
  • Collaborate with product, engineering, and analytics teams to translate data requirements into scalable technical solutions.
  • Develop monitoring, logging, and performance metrics to ensure high availability and reliability of data systems.
  • Oversee cloud-based deployments, cost optimization, and infrastructure improvements on AWS/GCP/Azure.
  • Integrate AI agents or LLM-based automation for tasks such as error resolution, data validation, enrichment, and adaptive crawling.


Qualifications

  • Bachelor's or Master's degree in Engineering, Computer Science, or a related field.
  • 7–12 years of relevant experience in data engineering, pipeline design, or large-scale web crawling systems.
  • Strong expertise in Python, SQL, and modern data processing practices.
  • Experience working with Airflow, Celery, or similar workflow automation tools.
  • Solid understanding of proxy systems, anti-bot techniques, and scalable crawler architecture.
  • Hands-on experience with cloud data platforms (AWS/GCP/Azure).
  • Experience with AI/LLM frameworks (Crawl4ai, LangChain, LlamaIndex, AutoGen, OpenAI, or similar).
  • Strong analytical, architectural, and leadership skills.

Apply for this Position

Ready to join? Submit your application for this position.
