Job Description
About Us
HG Insights is the global leader in technology intelligence, delivering actionable, AI-driven insights through advanced data science and scalable big data solutions.
What You’ll Do
- Design, build, and optimize large-scale distributed data pipelines for processing billions of unstructured documents using Databricks, Apache Spark, and cloud-native big data tools.
- Architect and scale enterprise-grade big data systems, including data lakes, ETL/ELT workflows, and syndication platforms for customer-facing Insights-as-a-Service (InaaS) products.
- Collaborate with product teams to develop features across databases, backend services, and frontend UIs that expose actionable intelligence from complex datasets.
- Implement cutting-edge solutions for data ingestion, transformation, and analytics using Hadoop/Spark ecosystems, Elasticsearch, and cloud services (AWS EC2, S3, EMR).
- Drive system reliability through automation, CI/CD pipelines (Docker, Kubernetes, Terraform), and infrastructure-as-code practices.
What You’ll Be Responsible For
- Leading the development of our Big Data Insights Platform, ensuring scalability, performance, and cost-efficiency across distributed systems.
- Mentoring engineers, conducting code reviews, and establishing best practices for Spark optimization, data modeling, and cluster resource management.
- Building and troubleshooting complex data pipelines, including performance tuning of Spark jobs, query optimization, and data quality enforcement.
- Collaborating in agile workflows (daily stand-ups, sprint planning) to deliver features rapidly while maintaining system stability.
- Ensuring security and compliance across data workflows, including access controls, encryption, and governance policies.
What You’ll Need
- BS/MS/Ph.D. in Computer Science or related field, with 7+ years of experience building production-grade big data systems.
- Expertise in Scala/Java for Spark development, including optimization of batch/streaming jobs and debugging distributed workflows.
- Proven track record with:
  - Databricks, Hadoop/Spark ecosystems, and SQL/NoSQL databases (MySQL, Elasticsearch).
  - Cloud platforms (AWS EC2, S3, EMR) and infrastructure-as-code tools (Terraform, Kubernetes).
  - RESTful APIs, microservices architectures, and CI/CD automation.
- Leadership experience as a technical lead, including mentoring engineers and driving architectural decisions.
- Strong understanding of agile practices, distributed computing principles, and data lake architectures.
- Airflow orchestration (DAGs, operators, sensors) and integration with Spark/Databricks.
- 7+ years of designing, modeling, and building big data pipelines in an enterprise setting.
Nice-to-Haves
- Experience with machine learning pipelines (Spark MLlib, Databricks ML) for predictive analytics.
- Knowledge of data governance frameworks and compliance standards (GDPR, CCPA).
- Contributions to open-source big data projects or published technical blogs/papers.
- DevOps proficiency in monitoring tools (Prometheus, Grafana) and serverless architectures.
Apply for this Position
Ready to join? Click the button below to submit your application.
Submit Application