**Spark Engineer**

**Req number:** R6280

**Employment type:** Full time

**Worksite flexibility:** Remote



**Who we are**



CAI is a global technology services firm with over 8,500 associates worldwide and annual revenue of more than $1 billion. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right—whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.



**Job Summary**



As a Spark Engineer, you will design, build, and optimize large-scale data processing systems using Apache Spark. You will collaborate with data scientists, analysts, and engineers to ensure scalable, reliable, and efficient data solutions.



**Job Description**



We are looking for a **Spark Engineer** with deep expertise in distributed data processing, ETL pipelines, and performance tuning for high-volume data environments. This position will be **full-time** and **remote**.



**What You'll Do:**



+ Design, develop, and maintain big data solutions using Apache Spark (batch and streaming).

+ Build data pipelines for processing structured, semi-structured, and unstructured data from multiple sources.

+ Optimize Spark jobs for performance and scalability across large datasets.

+ Integrate Spark with various data storage systems (HDFS, S3, Hive, Cassandra, etc.).

+ Collaborate with data scientists and analysts to deliver robust data solutions for analytics and machine learning.

+ Implement data quality checks, monitoring, and alerting for Spark-based workflows.

+ Ensure security and compliance of data processing systems.

+ Troubleshoot and resolve data pipeline and Spark job issues in production environments.



**What You'll Need**



Required:



+ Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred).

+ 3+ years of hands-on experience with Apache Spark (Core, SQL, Streaming).

+ Strong programming skills in Scala, Java, or Python (PySpark).

+ Solid understanding of distributed computing concepts and big data ecosystems (Hadoop, YARN, HDFS).

+ Experience with data serialization formats (Parquet, ORC, Avro).

+ Familiarity with data lake and cloud environments (AWS EMR, Databricks, GCP Dataproc, or Azure Synapse).

+ Knowledge of SQL; experience with data warehouses (Snowflake, Redshift, BigQuery) is a plus.

+ Strong background in performance tuning and Spark job optimization.

+ Experience with CI/CD pipelines and version control (Git).

+ Familiarity with containerization (Docker, Kubernetes) is an advantage.



Preferred:



+ Experience with stream processing frameworks (Kafka, Flink).

+ Exposure to machine learning workflows with Spark MLlib.

+ Knowledge of workflow orchestration tools (Airflow, Luigi).



**Physical Demands**



+ Ability to safely and successfully perform the essential job functions.

+ Sedentary work that involves sitting or remaining stationary most of the time, with occasional need to move around the office to attend meetings, etc.

+ Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor.



**Reasonable accommodation statement**



If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to [email protected] or (888) 824-8111.
