Machine Learning Research Intern, Multi-Modal Foundation Models (Robotics)

Toyota Research Institute

Software Engineering
Los Altos, CA, USA
Posted on Friday, November 3, 2023
At Toyota Research Institute (TRI), we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Robotics, Human-Centered AI, Human Interactive Driving, and Energy & Materials.
This is a paid, 12-week internship for Summer 2024. Please note that this is a hybrid in-office role.
The Team
Our Machine Learning Language team is embedded with our Robotics team and is looking for Research Interns for Summer 2024 in a variety of areas, such as large language models (LLMs), including training and fine-tuning, vision-language models (VLMs), vision-language-action models (VLAs), and other multi-modal foundation models. We are interested in better approaches to alignment, model architectures, distillation, and scalability, as well as approaches to new challenges in planning and compositionality, where current LLMs fail.
We are aiming to make progress on some of the hardest scientific challenges around multi-modal foundation models for downstream applications in assistive robotics and across Toyota. Multi-modal modeling is a core component of our robotics architecture as the team works towards Large Behavior Models.
The Internship
As a Research Intern, you will work with a multidisciplinary team proposing and conducting pioneering research in Machine Learning. You will use large amounts of text, image, and other data to solve open problems and work towards publications at top academic venues!
Responsibilities
  • Conduct daring research, primarily at the intersection of Natural Language Processing and Computer Vision, that solves open problems of high practical and/or ethical value, and validate it in real-world benchmarks and systems.
  • Push the boundaries of knowledge and the state of the art in areas including language and multi-modal models.
  • Partner with a multidisciplinary team including other research scientists and engineers across the Robotics Machine Learning teams.
  • Stay up to date on the state of the art in Machine Learning ideas and software.
  • Present results in verbal and written communications at international conferences, internally, and via open-source contributions to the community.
Qualifications
  • Currently pursuing a Ph.D. in Machine Learning, Natural Language Processing, Computer Vision, Robotics or related fields.
  • Publications, or a desire to publish, at high-impact conferences/journals (e.g., NAACL, ICLR, NeurIPS, ICML, COLM, TMLR, EMNLP, *ACL, etc.) on some of the aforementioned topics.
  • Proficiency with one or more coding languages and systems, preferably Python, Unix, and a Deep Learning framework (e.g., PyTorch).
  • Proficiency in engineering best practices for model and data scaling for large-scale model training.
  • Passion for modern natural language and multi-modal processing, including training, understanding, and aligning large language models with human values.
  • Ability to work in collaboration with other researchers and engineers to invent and develop interesting research ideas.
  • Ability to execute on research projects, working in collaboration with other members of the team.
  • A reliable teammate who loves to think big and go deeper, and who strives to deliver with integrity.
Please add a link to your Google Scholar profile and include a full list of publications when submitting your CV for this position.
The pay range for this position at commencement of employment is expected to be between $45 and $65/hour for California-based roles; however, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. Note that TRI offers a generous benefits package including vacation and sick time. Details of participation in these benefit plans will be provided if an employee receives an offer of employment.
Please reference this Candidate Privacy Notice to inform you of the categories of personal information that we collect from individuals who inquire about and/or apply to work for Toyota Research Institute, Inc. or its subsidiaries, including Toyota A.I. Ventures GP, L.P., and the purposes for which we use such personal information.
TRI is fueled by a diverse and inclusive community of people with unique backgrounds, education and life experiences. We are dedicated to fostering an innovative and collaborative environment by living the values that are an essential part of our culture. We believe diversity makes us stronger and are proud to provide Equal Employment Opportunity for all, without regard to an applicant’s race, color, creed, gender, gender identity or expression, sexual orientation, national origin, age, physical or mental disability, medical condition, religion, marital status, genetic information, veteran status, or any other status protected under federal, state or local laws.