Job Description

We are looking for a PySpark developer (with an ETL background) to design and build solutions on one of our customer programs. The role involves building a standardized data and curation layer that will integrate data across internal and external sources, provide analytical insights, and integrate with the customer’s critical systems.

Roles and Responsibilities

  • Ability to design, build, and unit test applications in Spark/PySpark
  • In-depth knowledge of Hadoop, Spark, and similar frameworks
  • Ability to understand existing ETL logic and convert it into Spark/PySpark
  • Good implementation experience with OOP concepts
  • Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs
  • Experience in processing large amounts of structured and unstructured data, including integrating data from multiple sources
  • Experience working with Bitbucket and CI/CD processes
  • Knowledge of Agile methodology for delivering projects
  • Good communication skills

Skills

  • Minimum 2 years of experience in designing, building, and deploying PySpark-based applications
  • Expertise in handling complex large-scale Big Data environments
  • Minimum 2 years of experience with Hive, YARN, and HDFS
  • Experience working with ETL products, e.g. Ab Initio, Informatica, DataStage
  • Hands-on experience writing complex SQL queries and exporting/importing large volumes of data using utilities
  • Experience: 6 to 10 years (EARLY JOINERS ONLY)

    Location: Pune, Chennai, or Hyderabad

    Note: We will also consider candidates with strong hands-on experience in Spark and Scala.
