Job Description
**Key Responsibilities**:
Design, develop, and optimize big data processing pipelines using Apache Spark and Java.
Work on batch and real-time data processing frameworks to transform large datasets.
Write high-performance Spark jobs using RDDs, DataFrames, and Datasets.
Collaborate with data engineers, architects, and analysts to ensure seamless data integration.
Optimize Spark performance through tuning, partitioning, and efficient memory management.
Implement best practices for data governance, security, and compliance.
Work with CI/CD pipelines, version control (Git), and automation tools for continuous deployment.
**Required Skills**:
Strong experience in Java 5+, with expertise in functional programming and concurrency.
Hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming).
Good understanding of the Hadoop ecosystem, including HDFS, Hive, and YARN.
Experience with Big Data frameworks like Kafka, Flink, or Airf...
Ready to join Citi? Click the button below to submit your application.