Job Description

Job Title: Associate Data Architect – Master Data Management (MDM)  

 

Location:  

Pune - Hybrid  

Experience:  

10+ years of experience in Data Architecture and Data Engineering/Integration, with strong exposure to Data Modelling and Database (RDBMS) Management.

 

About the Role  

We are seeking an Associate Data/Database Architect to join our core product architecture team building an enterprise-grade, multi-domain Master Data Management (MDM) product platform.

You will play a key role in optimizing and extending the MDM data model, implementing efficient data ingestion and entity resolution mechanisms, and ensuring the system supports multiple domains such as Party (Individual/Organization), Product, Location, Policy, and Relationship in a cloud-native and scalable manner.

 

Key Responsibilities  

Data Modeling & Architecture  

  • Enhance and extend the existing Party-based data model into a multi-domain MDM schema (Party, Product, Location, Relationship, Policy, etc.). 
  • Design and maintain canonical data models and staging-to-core mappings for multiple source systems. 
  • Implement auditability, lineage, and soft-delete frameworks within the MDM data model. 
  • Contribute to the creation of golden records, trust scores, match/merge logic, and data survivorship rules (a brief survivorship sketch follows this list).
  • Ensure the model supports real-time and batch data mastering across multiple domains. 
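
As a minimal illustration of a survivorship rule, the PostgreSQL sketch below keeps, per party, the candidate record from the most trusted source, breaking ties by recency. The staging table and column names (stg_party_candidates, trust_score, updated_at) are hypothetical, not the platform's actual schema:

    -- Hypothetical staging table: one candidate row per (party, source system).
    -- Survivorship rule: highest trust score wins; most recent update breaks ties.
    SELECT DISTINCT ON (party_id)
           party_id,
           full_name,
           source_system,
           trust_score,
           updated_at
    FROM   stg_party_candidates
    ORDER  BY party_id, trust_score DESC, updated_at DESC;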

 

Data Engineering & Integration  

  • Support the optimization of data ingestion and ETL/ELT pipelines using Python, PySpark, SQL, and/or Informatica (or equivalent tools).
  • Design and implement data validation, profiling, and quality checks to ensure consistent master data. 
  • Work on data harmonization, schema mapping, and standardization across multiple source systems.
  • Help build efficient ETL mappings from canonical staging layers to MDM core data models in PostgreSQL.
  • Develop REST APIs or streaming pipelines (Kafka/Spark) for real-time data processing and entity resolution. 

 

Cloud & Platform Engineering  

  • Implement and optimize data pipelines on AWS or Azure using native services (e.g., AWS Glue, Lambda, S3, Redshift, Azure Data Factory, Synapse, Data Lake). 
  • Deploy and manage data pipelines and databases following cloud-native, cost-effective, and scalable design principles. 
  • Collaborate with DevOps teams on CI/CD, infrastructure-as-code, and automation of data pipeline and database deployment/migration.

 

Governance, Security & Compliance  

  • Implement data lineage, versioning, and stewardship processes. 
  • Ensure compliance with data privacy and security standards (GDPR, HIPAA, etc.). 
  • Partner with Data Governance teams to define data ownership, data standards, and stewardship workflows.



Requirements

Technical Skills Required  

Core Skills  

  • Data Modelling: Expert-level in Relational (3NF) and Dimensional (Star/Snowflake) modelling; hands-on with the Party data model, multi-domain MDM, and canonical models.
  • Database: PostgreSQL (preferred), or any enterprise RDBMS. 
  • ER Modelling Tools: Erwin or ER/Studio; Database Markup Language (DBML).
  • ETL / Data Integration: Informatica, Python, PySpark, SQL, or similar tools. 
  • Cloud Platforms: AWS or Azure. 
  • Programming: Advanced SQL, Python, PySpark, and/or UNIX/Linux scripting.
  • Data Quality & Governance: Familiarity with data quality rules, profiling, match/merge, and entity resolution.
  • DevOps - Version Control & CI/CD: Git, Azure DevOps, Jenkins, Terraform, Redgate Flyway (preferred).

 

Database Design & Optimization (PostgreSQL)  

  • Design and maintain normalized and denormalized models using advanced features (schemas, partitions, views, CTEs, JSONB, arrays). 
  • Build and optimize complex SQL queries , materialized views , and data marts for performance and scalability. 
  • Tune RDBMS (PostgreSQL) performance – indexes, query plans, vacuum/analyze, statistics, parallelism, and connection management. 
  • Leverage RDBMS (PostgreSQL) extensions such as: 
    • pg_trgm for fuzzy matching and probabilistic candidate search (a brief sketch follows this list).
    • fuzzystrmatch for phonetic name matching; pgvector for semantic similarity search.
    • hstore, jsonb for flexible attribute storage. 
  • Implement RBAC, row-level security, partitioning, and logical replication for scalable MDM deployment.
  • Work with stored procedures, functions, and triggers for data quality checks and lineage automation. 
  • Implement HA/DR, backup/restore, database-level encryption (at rest, in transit), and column-level encryption for PII/PHI data.
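
As a minimal sketch of the pg_trgm-based fuzzy matching mentioned above (the party table and its columns are hypothetical, used only for illustration):

    -- Enable trigram support (requires appropriate privileges).
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- A GIN trigram index keeps similarity lookups fast at scale.
    CREATE INDEX IF NOT EXISTS idx_party_full_name_trgm
        ON party USING gin (full_name gin_trgm_ops);

    -- Rank potential duplicates of an inbound name above the similarity threshold.
    SELECT party_id,
           full_name,
           similarity(full_name, 'Jon Smiht') AS score
    FROM   party
    WHERE  full_name % 'Jon Smiht'
    ORDER  BY score DESC
    LIMIT  10;

The % operator filters by the session's pg_trgm.similarity_threshold (0.3 by default), so candidate generation and ranking both stay inside the database.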

 

Good to Have  

  • Knowledge of Master Data Management (MDM) domains such as Customer, Product, etc.
  • Experience with graph databases like Neo4j for relationship and lineage tracking. 
  • Knowledge of probabilistic and deterministic matching, ML-based entity resolution, or AI-driven data mastering.
  • Experience in data cataloging, data lineage tools, or metadata management platforms.
  • Familiarity with data security frameworks and Well-Architected Framework principles. 

 

Soft Skills  

  • Strong analytical, conceptual and problem-solving skills. 
  • Ability to collaborate in a cross-functional, agile environment. 
  • Excellent communication and documentation skills. 
  • Self-driven, proactive, and capable of working with minimal supervision. 
  • Strong desire to innovate and build scalable, reusable data frameworks.

 

Education  

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related discipline.
  • Certifications in AWS/Azure, Informatica, or Data Architecture are a plus.



Benefits

Why Join Us  

  • Be part of a cutting-edge MDM product initiative blending data architecture, engineering, AI/ML, and cloud-native design.
  • Opportunity to shape the next-generation data mastering framework for multiple industry domains. 
  • Gain deep exposure to data mastering, lineage, probabilistic search, and graph-based relationship management.
  • Competitive compensation, flexible working and a technology-driven culture. 





Apply for this Position

Ready to join? Click the button below to submit your application.

Submit Application