Job Description
Role: Vision Analytics AI Engineer
Location: Technopark, TVM
Role: Lead AI Engineer (6–10 Years) | Senior AI Engineer (3–6 Years)
About the Role
We are seeking a specialized AI Engineer to lead the development of our next-generation Vision Analytics platform. You will be responsible for building intelligent systems that can "see," "understand," and "reason" about visual data. This role requires a unique blend of traditional Computer Vision expertise and cutting-edge Generative AI (Multi-modal LLMs/VLMs) to solve complex industrial and enterprise challenges.
Key Responsibilities
1. Advanced Vision Analytics & Deep Learning
- Architect & Deploy: Design and implement state-of-the-art Computer Vision models for object detection, segmentation, tracking, and activity recognition.
- Video Intelligence: Build real-time video analytics pipelines to extract actionable insights from CCTV and industrial camera feeds.
- Optimization: Harden and optimize models for edge deployment (NVIDIA Jetson, OAK-D) and cloud environments.
- Synthetic Data: Use Generative AI/GANs to create synthetic datasets for training vision models in data-scarce environments.
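To give a flavour of the detection work above: non-max suppression is the standard post-processing step behind detectors like YOLO, collapsing overlapping candidate boxes into one. A pure-NumPy sketch, not tied to any specific framework; boxes are `[x1, y1, x2, y2]`:

```python
import numpy as np

def iou(box, boxes):
    """Intersection-over-union of one box against an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # → [0, 2]: the two overlapping boxes collapse to one
```

In production this step usually runs inside the framework (e.g. as part of a YOLO or Detectron2 pipeline), but the logic is the same.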
2. Generative AI & Multi-modal Integration
- Vision-Language Models (VLMs): Implement and fine-tune models like CLIP, LLaVA, or GPT-4V for visual Q&A and image-to-text reasoning.
- Multi-modal RAG: Develop Retrieval Augmented Generation systems that can query both text and image/video metadata using vector databases (Milvus, Pinecone, or Weaviate).
- Visual Agents: Contribute to Agentic AI systems that use vision as a primary input to trigger automated workflows or alerts.
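The retrieval step at the heart of multi-modal RAG can be sketched in a few lines: text and image records share one embedding space, and a query pulls back the nearest entries by cosine similarity. The 4-dim vectors and payloads below are invented for the demo; a real system would use CLIP-style embeddings and a vector database such as Milvus or Weaviate.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# (modality, payload, embedding) — embeddings are hypothetical stand-ins.
store = [
    ("text",  "forklift safety procedure", np.array([0.9, 0.1, 0.0, 0.2])),
    ("image", "cam03_frame_001.jpg",       np.array([0.8, 0.2, 0.1, 0.1])),
    ("text",  "quarterly sales report",    np.array([0.0, 0.1, 0.9, 0.3])),
]

def retrieve(query_vec, k=2):
    """Rank the whole store by similarity to the query; return top-k payloads."""
    ranked = sorted(store, key=lambda r: cosine(query_vec, r[2]), reverse=True)
    return [(modality, payload) for modality, payload, _ in ranked[:k]]

query = np.array([1.0, 0.1, 0.0, 0.1])  # embedding of "forklift near loading dock"
print(retrieve(query))  # forklift text + camera frame rank above the sales report
```

Frameworks like LangChain or LlamaIndex wrap this loop with chunking, metadata filtering, and LLM synthesis, but retrieval itself reduces to this nearest-neighbour search.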
Required Skills & Experience
1. Computer Vision Core:
- Expertise in OpenCV, MediaPipe, and SOTA architectures (YOLOv8/v10, Detectron2, SAM (Segment Anything Model)).
- Strong proficiency in Python and ML frameworks: PyTorch, TensorFlow/Keras, OpenCV.
2. Generative AI & Multi-modality:
- Hands-on experience with Large Language Models (LLMs) and Vision Transformers (ViT).
- Experience with RAG frameworks (LangChain, LlamaIndex) specifically applied to multi-modal data.
- Proficiency in prompt engineering for both text and vision models.
- Proven experience building agentic workflows using popular frameworks such as LangGraph, AutoGen, CrewAI, or Pydantic AI.
3. Programming:
- Strong Python skills (NumPy, Pandas, Scikit-Learn).
- Familiarity with model quantization (TensorRT, ONNX, OpenVINO).
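As a back-of-the-envelope illustration of the quantization skill above: int8 post-training quantization, the idea underlying TensorRT, ONNX Runtime, and OpenVINO deployment, maps float weights to 8-bit integers via a scale factor. This NumPy sketch shows the symmetric variant and its round-trip error; real toolchains add calibration data, zero-points, and per-channel scales.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric int8 quantization: one scale for the whole tensor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(64, 64)).astype(np.float32)  # fake layer weights
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(q.dtype, f"max abs error {err:.5f}")  # rounding error bounded by scale/2
```

The 4x size reduction (float32 to int8) and integer arithmetic are what make edge targets like Jetson practical.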
Good to Have
- Experience with MLOps (MLflow, DVC) and CI/CD pipelines for Vision models.
- Knowledge of NVIDIA DeepStream or GStreamer for video pipeline orchestration.
- Cloud AI experience (Azure Vision, AWS Rekognition, or GCP Vertex AI).
Educational Qualification
- Bachelor’s or Master’s degree in Computer Science, AI, Robotics, or a related field.
Apply for this Position
Ready to join? Click the button below to submit your application.
Submit Application