Job Description
Responsibilities
- Design and develop highly scalable, real‑time systems using Hadoop ecosystem components such as Iceberg, Spark, Ozone, Trino, Hive, Ranger, Kafka, Flink, and NiFi.
- Build robust data ingestion and transformation frameworks using Java, Spark, Python, and shell scripting for ingesting multi‑model data (image, audio, video, unstructured documents) with both batch and real‑time processing.
- Develop full‑stack applications and internal engineering tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Perform performance tuning and optimization of data applications on Hadoop to ensure optimal resource utilization.
- Collaborate closely with data scientists to operationalize machine learning models using Cloudera Machine Learning (CML).
- Experience working with ML platforms such as CML, Spark MLlib, and Python ML libraries (scikit‑learn, XGBoost), including model deployment.
Apply for this Position
Ready to join Ampstek? Click the button below to submit your application.
Submit Application