Job Description
Role Description
We are hiring a Senior Computer Vision Research Engineer to design and deploy scalable, low-latency video analytics systems for large-scale CCTV networks. Core focus: building best-in-class Vision-Language Models (VLMs) optimized for edge deployment, enabling multimodal reasoning (VQA, semantic search, event description) in resource-constrained environments.
Key Responsibilities:
- Architect end-to-end pipelines: MOT, Re-ID, action/anomaly detection, scene understanding.
- Develop and optimize sub-2B-parameter VLMs for edge deployment (e.g., surpassing Moondream2/Qwen2-VL benchmarks) using QAT, PTQ, pruning, distillation, and efficient architectures.
- Scale real-time processing of thousands of streams with sub-second latency.
- Profile and resolve bottlenecks in video analytics and multimodal systems.
- Optimize for edge hardware (Jetson, Coral, Hailo) via TensorRT/OpenVINO/TVM.
- Design...
Apply for this Position
Ready to join Sapien Robotics? Click the button below to submit your application.