Job Description
**Role Number:** 200633208-3337
**Summary**
The AIML Multimodal Foundation Model Team is pioneering next-generation intelligent agent technologies that combine multimodal reasoning, tool-use, and visual understanding. Our innovative features redefine how hundreds of millions of people utilize their computers and mobile devices for search and information retrieval. Our universal search engine powers search capabilities across a range of Apple products, including Siri, Spotlight, Safari, Messages, and Lookup. Additionally, we develop cutting-edge generative AI technologies based on multimodal large language models to enable innovative features in both Apple’s devices and cloud-based services. As a member of this team, you will design new architectures for multimodal agents, explore advanced training paradigms, and build robust agentic capabilities such as planning, grounding, tool-use, and autonomous task execution. You will collaborate closely with researchers and engineers to bring cutting-edge agent research into production, transforming Apple devices into intelligent partners that help users get things done.
**Description**
As a member of our fast-paced group, you’ll have the unique and rewarding opportunity to shape upcoming products from Apple. We are looking for people with strong applied experience in machine learning, computer vision, multimodal LLMs, and agent training, along with solid engineering skills.
This role will have the following responsibilities:
- Developing state-of-the-art multimodal foundation models for Apple Intelligence.
- Developing various agent capabilities for multimodal LLMs, including computer-use agents, visual tool use, thinking with images, and multimodal web search.
- Developing, fine-tuning, and evaluating domain-specific foundation models for various tasks and applications in Apple’s AI-powered products.
- Conducting applied research to transfer pioneering generative AI research into production-ready technologies.
- Understanding product requirements and translating them into modeling and engineering tasks.
**Minimum Qualifications**
+ PhD, MS, or equivalent experience
+ Experience in machine learning, deep learning, computer vision, or natural language processing
+ Proficiency in one of the following languages: Python, Go, Java, C++
**Preferred Qualifications**
+ Excellent data analysis skills
+ Good interpersonal skills and a team-player mindset
+ PhD preferred
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).