Job Description

About Us

HG Insights is the global leader in technology intelligence, delivering actionable, AI-driven insights through advanced data science and scalable big data solutions.


What You’ll Do

  • Design, build, and optimize large-scale distributed data pipelines that process billions of unstructured documents using Databricks, Apache Spark, and cloud-native big data tools.
  • Architect and scale enterprise-grade big data systems, including data lakes, ETL/ELT workflows, and syndication platforms for customer-facing Insights-as-a-Service (InaaS) products.
  • Collaborate with product teams to develop features across databases, backend services, and frontend UIs that expose actionable intelligence from complex datasets.
  • Implement cutting-edge solutions for data ingestion, transformation, and analytics using Hadoop/Spark ecosystems, Elasticsearch, and cloud services (AWS EC2, S3, EMR).
  • Drive system reliability through automation, CI/CD pipelines, containerization (Docker, Kubernetes), and infrastructure-as-code practices (Terraform).

What You’ll Be Responsible For

  • Leading the development of our Big Data Insights Platform, ensuring scalability, performance, and cost-efficiency across distributed systems.
  • Mentoring engineers, conducting code reviews, and establishing best practices for Spark optimization, data modeling, and cluster resource management.
  • Building data pipelines and troubleshooting complex pipeline issues, including Spark job performance tuning, query optimization, and data quality enforcement.
  • Collaborating in agile workflows (daily stand-ups, sprint planning) to deliver features rapidly while maintaining system stability.
  • Ensuring security and compliance across data workflows, including access controls, encryption, and governance policies.

What You’ll Need

  • BS/MS/Ph.D. in Computer Science or related field, with 7+ years of experience building production-grade big data systems.
  • Expertise in Scala/Java for Spark development, including optimization of batch/streaming jobs and debugging distributed workflows.
  • Proven track record with:
      • Databricks, Hadoop/Spark ecosystems, and SQL/NoSQL databases (MySQL, Elasticsearch).
      • Cloud platforms (AWS EC2, S3, EMR), infrastructure-as-code tools (Terraform), and container orchestration (Kubernetes).
      • RESTful APIs, microservices architectures, and CI/CD automation.
  • Leadership experience as a technical lead, including mentoring engineers and driving architectural decisions.
  • Strong understanding of agile practices, distributed computing principles, and data lake architectures.
  • Experience with Airflow orchestration (DAGs, operators, sensors) and its integration with Spark/Databricks.
  • 7+ years of experience designing, modeling, and building big data pipelines in an enterprise setting.

Nice-to-Haves

  • Experience with machine learning pipelines (Spark MLlib, Databricks ML) for predictive analytics.
  • Knowledge of data governance frameworks and compliance standards (GDPR, CCPA).
  • Contributions to open-source big data projects or published technical blogs/papers.
  • DevOps proficiency with monitoring tools (Prometheus, Grafana) and serverless architectures.
