A research project led by USC computer science student Sumedh A. Sontakke aims to open the door for robots to serve as caregivers for aging populations. The team claims the RoboCLIP algorithm, developed with help from Professor Erdem Biyik and Professor Laurent Itti, allows robots to perform new tasks after just one demonstration.
RoboCLIP only needs to see one video or textual demonstration of a task to perform it two to three times better than other imitation learning (IL) models, the team claimed.
“To me, the most impressive thing about RoboCLIP is being able to make our robots do something based on only one video demonstration or one language description,” said Biyik, a roboticist who joined USC Viterbi’s Thomas Lord Department of Computer Science in August 2023 and leads the Learning and Interactive Robot Autonomy Lab (Lira Lab).
The project began two years ago, when Sontakke realized how much data is required to have robots perform basic household tasks.
“I started thinking about household tasks like opening doors and cabinets,” Sontakke said. “I didn’t like how much data I needed to collect before I could get the robot to successfully do the task I cared about. I wanted to avoid that, and that’s where this project came from.”
How does RoboCLIP work?
Most IL models learn to complete tasks through trial and error: the robot attempts the task over and over until it finally receives a reward for completing it. While this can be effective, it requires massive amounts of time, data, and human supervision to get a robot to successfully perform a new task.
“The large amount of data currently required to get a robot to successfully do the task you want it to do is not feasible in the real world, where you want robots that can learn quickly with few demonstrations,” Sontakke said in a release.
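For contrast, that conventional trial-and-error setup can be sketched as a loop in which the agent collects whole episodes and only sees a non-zero reward once the task is finally solved. The environment and stub policy below are illustrative placeholders, not the team's code; they simply show why the learning signal is so sparse and data-hungry.

```python
# A minimal sketch of conventional trial-and-error learning with a sparse,
# task-completion reward. FrozenLake-v1 and StubPolicy are illustrative
# placeholders (not the RoboCLIP team's setup): reward stays 0 until the
# goal is finally reached, so useful feedback is rare.
import gymnasium as gym


class StubPolicy:
    """Stand-in for a learned policy: acts randomly and 'learns' as a no-op."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, obs):
        return self.action_space.sample()

    def update(self, episode_return):
        pass  # a real agent would update here, but non-zero returns are rare


env = gym.make("FrozenLake-v1")
policy = StubPolicy(env.action_space)

for episode in range(10_000):            # many rollouts are typically needed
    obs, _ = env.reset()
    episode_return, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy.act(obs))
        episode_return += reward          # stays 0 until the task is completed
        done = terminated or truncated
    policy.update(episode_return)
```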
RoboCLIP works differently from typical IL models because it incorporates the latest advances in generative AI and video-language models (VLMs). These systems are pre-trained on large amounts of video and textual demonstrations, according to Biyik.
The researchers claimed RoboCLIP performs well out of the box on household tasks, like opening and closing drawers or cabinets.
“The key innovation here is using the VLM to critically ‘observe’ simulations of the virtual robot babbling around while trying to perform the task, until at some point it starts getting it right – at that point, the VLM will recognize that progress and reward the virtual robot to keep trying in this direction,” Itti said.
According to Itti, the VLM can tell the robot is getting closer to success when the textual description it generates while observing the robot comes closer to what the user wants.
“This new kind of closed-loop interaction is very exciting to me and will likely have many more future applications in other domains,” Itti said.
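Conceptually, that closed loop amounts to scoring how similar the robot's rollout looks to the single demonstration the user provided, in the shared embedding space of a pretrained vision-language model, and feeding that score back as a reward. The sketch below illustrates the idea only: it substitutes a frame-by-frame image-text CLIP model and mean-pooled frame features for the video-and-language model the researchers describe, so the model choice, function names, and pooling step are assumptions rather than the authors' implementation.

```python
# Illustrative, simplified sketch of a RoboCLIP-style reward: embed the robot's
# rollout and the user's one demonstration in a shared vision-language space and
# use their similarity as an end-of-episode reward. An image-text CLIP model
# applied frame by frame stands in for the video-and-language model described
# in the paper; all names and choices here are assumptions for illustration.
import numpy as np
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


@torch.no_grad()
def rollout_embedding(frames: list[np.ndarray]) -> torch.Tensor:
    """Crude stand-in for a video embedding: mean of normalized frame features."""
    inputs = processor(images=frames, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)


@torch.no_grad()
def demo_embedding(description: str) -> torch.Tensor:
    """Embed the single language demonstration (a video demo would be analogous)."""
    inputs = processor(text=[description], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    return (feats / feats.norm(dim=-1, keepdim=True)).squeeze(0)


def roboclip_style_reward(frames: list[np.ndarray], description: str) -> float:
    """Reward the rollout by how closely it matches the demonstration."""
    return torch.cosine_similarity(
        rollout_embedding(frames), demo_embedding(description), dim=0
    ).item()


# Toy usage: score a (here, random) simulated rollout against the user's
# one-sentence task description.
fake_rollout = [np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
                for _ in range(16)]
print(roboclip_style_reward(fake_rollout, "a robot arm opens the drawer"))
```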
What’s next?
Sontakke hopes the system may someday help robots care for aging populations, or lead to other applications that could help anyone. The team says further research will be necessary before the system is ready to take on the real world.
The paper, titled “RoboCLIP: One Demonstration is Enough to Learn Robot Policies,” was presented by Sontakke at the 37th Conference on Neural Information Processing Systems (NeurIPS), held Dec. 10-16 in New Orleans.
Collaborating with Sontakke, Biyik, and Itti on the RoboCLIP paper were two USC Viterbi graduates: Sebastien M.R. Arnold, now at Google Research, and Karl Pertsch, now at UC Berkeley and Stanford University. Jesse Zhang, a fourth-year Ph.D. candidate in computer science at USC Viterbi, also worked on the RoboCLIP project.