Job Description
Job Title: Data Engineer
Location: Bangalore (Hybrid)
Experience: 4+ years
Key Responsibilities:
- Administer and manage Databricks clusters, workspaces, and resources.
- Monitor platform health, availability, and performance.
- Develop and maintain data pipelines for ingesting, transforming, and loading data into Databricks.
- Optimize ETL processes for efficiency and scalability.
- Collaborate with data scientists and analysts to build and optimize data processing and analytics workflows using Databricks notebooks.
- Develop and implement data transformation and data analysis solutions.
- Implement and manage data lakes within Databricks for structured and unstructured data storage.
- Ensure data lake security and access controls.
- Identify and address performance bottlenecks in Databricks workloads.
- Optimize queries, data pipelines, and resource allocation for efficient data processing.
- Implement security measures to protect data within Databricks, including access controls, encryption, and auditing.
- Ensure compliance with data privacy regulations and industry standards.
- Integrate Databricks with various data sources, databases, and external systems to enable seamless data flow.
- Ensure data source connectivity and data consistency.
- Manage cluster scaling and resource allocation to meet performance and cost requirements.
- Optimize cluster configurations for specific workloads.
- Identify and resolve technical issues, errors, and anomalies in Databricks workflows.
- Provide support to users encountering problems.
- Collaborate with data engineers, data scientists, and analysts to understand data requirements and provide technical guidance.
- Conduct training sessions and knowledge sharing to empower users with Databricks capabilities.
- Create and maintain documentation for Databricks workflows, configurations, and best practices.
- Promote and enforce coding and data engineering standards.
- Monitor and optimize Databricks costs, including cluster utilization and resource allocation.
- Ensure cost-efficient data processing.
- Stay current with Databricks updates and enhancements.
- Continuously improve Databricks workflows, processes, and infrastructure.
- Establish data governance practices within Databricks, including metadata management and data cataloging.
- Maintain data lineage and documentation for data assets.
- Implement data quality checks and validation processes to ensure data accuracy and reliability.
- Develop and enforce data quality standards.
- Plan for scalability and resilience in Databricks infrastructure to accommodate growing data volumes and ensure high availability.
- Additional duties as assigned.
Qualifications:
- Bachelor's degree in computer science, information technology, data engineering, or a related field; equivalent work experience or relevant certifications may be considered.
- A minimum of 4 years of hands-on experience with Databricks as a data engineer, data scientist, or in a similar role.
- Strong experience in data engineering, including designing and developing data pipelines, ETL processes, and data transformations.
- Familiarity with big data technologies and frameworks such as Apache Spark, Hadoop, or similar platforms commonly used in conjunction with Databricks.
- Proficiency in programming languages commonly used in Databricks, such as Python, Scala, or SQL; Scala knowledge preferred.
- Proven ability to identify and address performance bottlenecks in Databricks workloads, optimizing query performance and resource allocation.
- A commitment to staying updated with the latest Databricks updates, enhancements, and emerging technologies in the data engineering and analytics field.