Job Description
- Define, design, and build an optimal data pipeline architecture to collect data from a variety of sources, cleanse it, and organize it in SQL and NoSQL destinations (ELT and ETL processes).
- Define and build business use-case-specific data models that Data Scientists and Data Analysts can use to conduct discovery and uncover business insights and patterns.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS "big data" technologies.
- Build and deploy analytical models and tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Define, design, and build executive dashboards and report catalogs to serve decision-making and insight-generation needs.
- Provide inputs to help keep data separated and secure across data centers, both on-prem and in private and public cloud environments.
- Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
- Implement scheduled data load processes, and maintain and manage the data pipelines.
- Troubleshoot, investigate, and fix failed data pipelines, and prepare root cause analysis (RCA) reports.
Experience with a mix of the following Data Engineering Technologies
- Python, Spark, Snowflake, Databricks, Hadoop (CDH), Hive, Sqoop, Oozie
- SQL: Postgres, MySQL, MS SQL Server
- Azure: ADF, Synapse Analytics, SQL Server, ADLS Gen2
- AWS: Redshift, EMR cluster, S3
Experience with a mix of the following Data Analytics and Visualization toolsets
- SQL, PowerBI, Tableau, Looker, Python, R
- Python libraries: Pandas, Scikit-learn, Seaborn, Matplotlib, TensorFlow, Statsmodels, PySpark, Spark SQL
- Other languages and tools: R, SAS, Julia, SPSS
- Azure: Synapse Analytics, Azure ML Studio, Azure AutoML