Sight and touch are two of the most important senses humans use to explore the world, and we combine them effortlessly to recognize what we are holding and seeing. Robots and AI systems typically cannot. Researcher Yunzhu Li and his team at MIT are working on a system to help robots bridge this sensory gap: a robot is usually programmed either to see or to feel, but not to connect the two.
The team has built a system that generates tactile signals from visual inputs and, going the other way, predicts which object, and which part of it, is being touched from tactile input alone. The researchers tested the system on a KUKA robot arm fitted with GelSight, a tactile sensor designed by another MIT group.
Using a web camera, the team recorded nearly 200 objects, including tools, household products, and fabrics, being touched more than 12,000 times. They then broke those 12,000 video clips into static frames and compiled them into VisGel, a dataset of more than 3 million paired visual and tactile images.
The scientists say that by looking at a scene, their model can imagine the feeling of touching a flat surface or a sharp edge. Conversely, Li says, by touching blindly, the model can predict its interaction with the environment from tactile feedback alone. By bringing the two senses together, he notes, they can empower the robot and reduce the amount of data needed for manipulation and grasping tasks.
The system the team has developed uses generative adversarial networks, or GANs, which take images in one modality, visual or tactile, and generate corresponding images in the other. A GAN pits a generator against a discriminator: the generator tries to produce images realistic enough to fool the discriminator, and each time the discriminator correctly flags a generated image as fake, that feedback is used to improve the generator. In the future, the team aims to improve the system by collecting data in more unstructured environments.
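The adversarial setup described above can be sketched in miniature. This is not the team's actual model (which uses conditional GANs over full images); it is a hypothetical toy in which a linear "generator" maps visual feature vectors to tactile feature vectors, and a logistic "discriminator" tries to tell real tactile vectors from generated ones. All names, dimensions, and learning rates here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyGAN:
    """Toy sketch of the adversarial loop: linear generator vs. logistic discriminator."""

    def __init__(self, d_vis=8, d_tac=4, lr=0.05):
        self.lr = lr
        self.Wg = 0.1 * rng.normal(size=(d_vis, d_tac))  # generator weights (hypothetical)
        self.wd = 0.1 * rng.normal(size=(d_tac,))        # discriminator weights
        self.bd = 0.0                                    # discriminator bias

    def generate(self, v):
        # "Visual" vectors -> fake "tactile" vectors.
        return v @ self.Wg

    def discriminate(self, t):
        # Probability that a tactile vector is real.
        return sigmoid(t @ self.wd + self.bd)

    def step(self, v, t_real):
        n = len(v)
        # --- Discriminator update: reward catching fakes, trusting reals.
        t_fake = self.generate(v)
        p_real, p_fake = self.discriminate(t_real), self.discriminate(t_fake)
        d_loss = -(np.log(p_real + 1e-9).mean() + np.log(1 - p_fake + 1e-9).mean())
        gz_real, gz_fake = p_real - 1.0, p_fake          # dLoss/dlogit per sample
        self.wd -= self.lr * (t_real.T @ gz_real + t_fake.T @ gz_fake) / n
        self.bd -= self.lr * (gz_real.sum() + gz_fake.sum()) / n
        # --- Generator update: the discriminator's feedback flows back here.
        t_fake = self.generate(v)
        p_fake = self.discriminate(t_fake)
        g_loss = -np.log(p_fake + 1e-9).mean()           # non-saturating generator loss
        grad_t = np.outer(p_fake - 1.0, self.wd)         # dLoss/d(t_fake)
        self.Wg -= self.lr * (v.T @ grad_t) / n
        return d_loss, g_loss

# Demo on synthetic paired data (stand-ins for visual/tactile frames).
gan = TinyGAN()
vis = rng.normal(size=(64, 8))
tac = rng.normal(size=(64, 4)) + 1.0  # "real" tactile vectors, offset from fakes
for _ in range(200):
    d_loss, g_loss = gan.step(vis, tac)
```

The key point the sketch illustrates is the feedback loop: the generator never sees the real tactile data directly; it improves only through the gradient signal coming back through the discriminator's judgment.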