Job Description

  • Define, design, and build an optimal data pipeline architecture that collects data from a variety of sources, then cleanses and organizes it in SQL and NoSQL destinations (ELT and ETL processes).
  • Define and build use-case-specific data models that Data Scientists and Data Analysts can consume to explore data and surface business insights and patterns.
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
  • Build and deploy analytical models and tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
  • Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
  • Define, design, and build executive dashboards and report catalogs to support decision-making and insight generation.
  • Provide inputs to help keep data separated and secure across data centers: on-premises, private cloud, and public cloud environments.
  • Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
  • Work with data and analytics experts to strive for greater functionality in our data systems.
  • Implement scheduled data load processes, and maintain and manage the data pipelines.
  • Troubleshoot, investigate, and fix failed data pipelines, and prepare root cause analysis (RCA) reports.

Experience with a mix of the following Data Engineering Technologies

  • Python, Spark, Snowflake, Databricks, Hadoop (CDH), Hive, Sqoop, Oozie
  • SQL – Postgres, MySQL, MS SQL Server
  • Azure – ADF, Synapse Analytics, SQL Server, ADLS Gen2
  • AWS – Redshift, EMR cluster, S3


Experience with a mix of the following Data Analytics and Visualization toolsets

  • SQL, PowerBI, Tableau, Looker, Python, R
  • Python libraries – Pandas, scikit-learn, Seaborn, Matplotlib, TensorFlow, statsmodels, PySpark, Spark SQL
  • Other languages and statistical packages – R, SAS, Julia, SPSS
  • Azure – Synapse Analytics, Azure ML studio, Azure Auto ML
