Job Description

Role: Vision Analytics AI Engineer

Location: Technopark, TVM

Level: Lead AI Engineer (6–10 years) | Senior AI Engineer (3–6 years)

About the Role

We are seeking a specialized AI Engineer to lead the development of our next-generation Vision Analytics platform. You will be responsible for building intelligent systems that can "see," "understand," and "reason" about visual data. This role requires a unique blend of traditional Computer Vision expertise and cutting-edge Generative AI (Multi-modal LLMs/VLMs) to solve complex industrial and enterprise challenges.

Key Responsibilities

1. Advanced Vision Analytics & Deep Learning

  • Architect & Deploy: Design and implement state-of-the-art Computer Vision models for object detection, segmentation, tracking, and activity recognition.
  • Video Intelligence: Build real-time video analytics pipelines to extract actionable insights from CCTV and industrial camera feeds.
  • Optimization: Harden and optimize models for edge deployment (NVIDIA Jetson, OAK-D) and cloud environments.
  • Synthetic Data: Use Generative AI/GANs to create synthetic datasets for training vision models in data-scarce environments.

2. Generative AI & Multi-modal Integration

  • Vision-Language Models (VLMs): Implement and fine-tune models like CLIP, LLaVA, or GPT-4V for visual Q&A and image-to-text reasoning.
  • Multi-modal RAG: Develop Retrieval Augmented Generation systems that can query both text and image/video metadata using vector databases (Milvus, Pinecone, or Weaviate).
  • Visual Agents: Contribute to Agentic AI systems that use vision as a primary input to trigger automated workflows or alerts.

Required Skills & Experience

1. Computer Vision Core:

  • Expertise in OpenCV, MediaPipe, and SOTA architectures (YOLOv8/v10, Detectron2, SAM - Segment Anything Model).
  • Strong proficiency in Python and ML frameworks: PyTorch, TensorFlow/Keras, OpenCV.

2. Generative AI & Multi-modality:

  • Hands-on experience with Large Language Models (LLMs) and Vision Transformers (ViT).
  • Experience with RAG frameworks (LangChain, LlamaIndex) specifically applied to multi-modal data.
  • Proficiency in Prompt Engineering for both text and vision models.
  • Proven experience building agentic workflows using popular frameworks such as LangGraph, AutoGen, CrewAI, or PydanticAI.

3. Programming:

  • Strong Python skills (NumPy, Pandas, Scikit-Learn).
  • Familiarity with model quantization (TensorRT, ONNX, OpenVINO).

Good to Have

  • Experience with MLOps (MLflow, DVC) and CI/CD pipelines for Vision models.
  • Knowledge of NVIDIA DeepStream or GStreamer for video pipeline orchestration.
  • Cloud AI experience (Azure Vision, AWS Rekognition, or GCP Vertex AI).

Educational Qualification

  • Bachelor’s or Master’s degree in Computer Science, AI, Robotics, or a related field.

Apply for this Position

Ready to join? Click the button below to submit your application.

Submit Application