Job Description
Role Overview
We are seeking a Senior Data / Streaming Engineer with 5–7 years of experience to design, build, and operate scalable real-time and batch data processing platforms. The role focuses on stream processing, cloud-native data pipelines, and analytics systems running on AWS.
You will work closely with data, platform, and product teams to deliver reliable, high-performance data solutions supporting real-time analytics, monitoring, and downstream consumption.
Key Responsibilities
- Design, develop, and maintain real-time stream processing applications using Apache Flink / PyFlink and Apache Spark, including state management and event-time processing with watermarks (an illustrative sketch follows this list).
- Build and optimize Python-based data pipelines using libraries such as Pandas, Polars, boto3, and PyArrow for data transformation and integration.
- Implement and manage Kafka-based streaming architectures (Apache Kafka / AWS MSK), including topic design, partitioning, and consumer/producer optimization (see the Kafka sketch after this list).
- Develop and operate cloud-native data platforms on AWS, leveraging services such as S3, Managed Flink, CloudWatch, MSK, and IAM.
- Write and optimize SQL-based transformations using Flink SQL, ensuring efficient query execution and scalable data processing (see the Flink SQL sketch after this list).
- Store, query, and analyze large datasets using ClickHouse, and build Grafana dashboards for observability, analytics, and system monitoring.
- Orchestrate batch and streaming workflows using Apache Airflow, including DAG design, scheduling, and operational monitoring (see the Airflow sketch after this list).
- Containerize applications using Docker and support deployments on Kubernetes, following best practices for scalability and resilience.
- Collaborate with DevOps, platform, and analytics teams to improve system reliability, performance, and cost efficiency.
- Participate in code reviews, technical design discussions, and production support activities.
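To give candidates a concrete feel for the stream-processing work above, here is a minimal PyFlink sketch of event-time processing with a bounded-out-of-orderness watermark. The sensor fields, timestamps, and window size are illustrative assumptions, not a description of our actual pipelines.

```python
from pyflink.common import Duration, Types
from pyflink.common.time import Time
from pyflink.common.watermark_strategy import TimestampAssigner, WatermarkStrategy
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows


class ReadingTimestampAssigner(TimestampAssigner):
    """Pull the event timestamp (epoch millis) out of each record."""

    def extract_timestamp(self, value, record_timestamp):
        return value[2]


env = StreamExecutionEnvironment.get_execution_environment()

# Illustrative events: (sensor_id, reading, epoch_millis). A production job
# would read these from Kafka/MSK rather than a static collection.
events = env.from_collection(
    [("sensor-1", 3.2, 1_700_000_000_000),
     ("sensor-1", 4.8, 1_700_000_005_000),
     ("sensor-2", 1.1, 1_700_000_007_000)],
    type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE(), Types.LONG()]))

# Event-time semantics: tolerate events arriving up to 5 seconds out of
# order before the watermark advances past them.
with_timestamps = events.assign_timestamps_and_watermarks(
    WatermarkStrategy
    .for_bounded_out_of_orderness(Duration.of_seconds(5))
    .with_timestamp_assigner(ReadingTimestampAssigner()))

# Sum readings per sensor over 10-second event-time tumbling windows.
(with_timestamps
 .key_by(lambda e: e[0])
 .window(TumblingEventTimeWindows.of(Time.seconds(10)))
 .reduce(lambda a, b: (a[0], a[1] + b[1], max(a[2], b[2])))
 .print())

env.execute("event_time_watermark_sketch")
```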
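The Kafka responsibility touches on consumer groups and offset management; the following hedged sketch, written against the kafka-python client, shows those concepts in miniature. The topic name, group id, and broker address are placeholder assumptions (an MSK cluster would supply its own bootstrap endpoints).

```python
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "events",                            # placeholder topic name
    bootstrap_servers="localhost:9092",  # MSK provides real broker endpoints
    group_id="analytics-consumers",      # consumers in one group share partitions
    enable_auto_commit=False,            # commit offsets only after processing
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    # message.partition / message.offset show the record's position in the topic
    print(message.partition, message.offset, message.value)
    consumer.commit()  # manual offset commit gives at-least-once delivery
```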
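For the Flink SQL work, here is a hedged sketch of a windowed transformation run through PyFlink's Table API. The table name, columns, and the use of Flink's built-in datagen connector (which synthesizes rows to keep the example self-contained) are assumptions; a real job would read from Kafka/MSK.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source table with an event-time column and a 5-second watermark.
t_env.execute_sql("""
    CREATE TABLE orders (
        order_id BIGINT,
        amount   DOUBLE,
        ts       TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '5')
""")

# Windowed aggregation in Flink SQL: total order value per 1-minute tumble.
result = t_env.execute_sql("""
    SELECT window_start, window_end, SUM(amount) AS total_amount
    FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTE))
    GROUP BY window_start, window_end
""")
result.print()
```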
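Finally, a minimal sketch of the kind of Airflow DAG this role would own, assuming Airflow 2.4+ (which accepts the `schedule` parameter). The DAG id, schedule, and task callables are placeholders, not a real pipeline.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: a real pipeline might pull files from S3 via boto3.
    print("extracting raw events")


def transform():
    # Placeholder: e.g. a Pandas/Polars transformation step.
    print("transforming events")


default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="events_batch_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",  # cron expressions also work here
    catchup=False,
    default_args=default_args,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # simple linear dependency
```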
Required Skills & Experience
- 5–7 years of professional experience in data engineering, streaming platforms, or distributed systems.
- Strong hands-on experience with Apache Flink / PyFlink and/or Apache Spark for stream and batch processing.
- Proficient in Python for data engineering and automation (Pandas, Polars, boto3, PyArrow).
- Solid experience with Apache Kafka or AWS MSK, including streaming concepts such as partitions, offsets, and consumer groups.
- Strong understanding of AWS cloud services, particularly S3, MSK, Managed Flink, CloudWatch, and IAM.
- Advanced SQL skills, including data transformation and query optimization (Flink SQL preferred).
- Experience with ClickHouse or similar OLAP databases, and Grafana for dashboards and monitoring.
- Working knowledge of Docker and Kubernetes fundamentals.
- Experience with Apache Airflow for pipeline orchestration and scheduling.
- Good understanding of distributed systems, fault tolerance, and performance tuning.
Apply for this Position
Ready to join? Click the button below to submit your application.
Submit Application