Neural networks can carry out demanding tasks with high accuracy, so many data-driven teams have been actively researching how best to apply them in their own domains.
Google's robotics team is one such team: it has adapted neural networks to a task called robotic grasping. Robotic grasping is now widely used in warehouses and factories to pick up objects and place them where required.
This article is based on Alex Irpan's talk on Real-World Robot Learning at the TensorFlow Dev Summit 2018. Here's the full video of the talk:
Learning to grasp objects using an arm farm
The goal of robot learning is to use machine learning to teach robots skills that work in general, unstructured environments.
One such task that Alex talks about is robotic grasping: commanding a robot arm to grasp objects from a bin using a single-viewpoint RGB image. The robot must learn hand-eye coordination, selecting motion commands that successfully pick up objects.
Since data collection with a single arm is slow, Alex explains that the team used an arm farm, with several robot arms collecting data in parallel.
The arm farm collected over a million grasp attempts across thousands of robot-hours.
Making data collection faster using simulation
Because collecting data on a physical arm farm still took a long time, Alex's team shifted to simulation, which scales data collection far more easily. With this setup, they were able to record millions of grasps in hours instead of weeks.
A neural network trained on this data hit 90% grasp success in simulation. When the same model was deployed in the real world, however, it succeeded only 23% of the time.
This was certainly not what they were hoping for.
Since the simulated model didn't perform up to par in the real world, the team turned to using simulated data to improve real-world sample efficiency, an approach called sim-to-real transfer.
Two approaches that can be taken for Sim-to-Real Transfer are:
- Add more randomization in simulation by varying textures, colours, lighting, and so on.
- Use domain adaptation on data from two domains that share a common structure but still differ.
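To make the first approach concrete, here is a minimal numpy sketch of appearance randomization. It is not code from the talk: the function name, the [0, 1] image convention, and the specific jitter ranges are all illustrative assumptions; real pipelines randomize far more (object meshes, camera pose, physics).

```python
import numpy as np

def randomize_sim_image(image, rng):
    """Illustrative appearance randomization for a simulated RGB image.

    image: float array in [0, 1] with shape (H, W, 3).
    Randomizes per-channel colour balance (a crude texture/colour proxy)
    and global brightness (a crude lighting proxy), then clips to [0, 1].
    """
    color_scale = rng.uniform(0.6, 1.4, size=3)  # per-channel tint
    brightness = rng.uniform(-0.2, 0.2)          # global lighting shift
    out = image * color_scale + brightness
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
sim_batch = rng.random((4, 64, 64, 3))  # stand-in for rendered frames
augmented = np.stack([randomize_sim_image(im, rng) for im in sim_batch])
```

The idea is that a model trained on many such randomized variations is forced to rely on cues that survive the randomization, which tend to transfer better to real images.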
Using Domain-Adversarial Neural Networks for feature-level domain adaptation
Domain-adversarial neural networks, or DANNs (Ganin et al., JMLR 2016), learn domain-invariant features end to end by training a model against an adversarial domain classifier.
Both the simulated and the real data are fed through the same model, and a similarity loss is attached to an intermediate feature layer. This loss pushes the feature distributions of the two domains to match.
DANNs implement the similarity loss as a small neural network that tries to predict the domain from the features it receives, while the rest of the model tries to confuse this domain classifier as much as possible.
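The "confuse the classifier" trick is usually implemented with gradient reversal: the domain classifier descends its own loss normally, but the gradient flowing back into the feature extractor is negated. Below is a toy numpy sketch with a linear feature extractor and a logistic domain classifier; all names, shapes, and the single-step update are illustrative assumptions, not the architecture from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy linear feature extractor and linear domain classifier.
rng = np.random.default_rng(0)
W_feat = rng.normal(size=(3, 2))  # maps 3-d input -> 2-d feature
w_dom = rng.normal(size=2)        # domain classifier on the feature

def dann_feature_update(x, domain_label, lr=0.1, lam=1.0):
    """One gradient-reversal step: the domain classifier descends its
    cross-entropy loss, while the feature extractor receives the NEGATED
    gradient, pushing the features toward being domain-invariant."""
    global W_feat, w_dom
    f = x @ W_feat                  # intermediate feature
    p = sigmoid(f @ w_dom)          # predicted prob. of the "real" domain
    dlogit = p - domain_label       # BCE gradient w.r.t. the logit
    grad_w_dom = dlogit * f         # classifier gradient (normal sign)
    grad_f = dlogit * w_dom         # gradient flowing into the feature
    w_dom -= lr * grad_w_dom                    # classifier improves
    W_feat -= lr * np.outer(x, -lam * grad_f)   # reversed gradient here
    return p

x_sim = np.array([1.0, 0.5, -0.3])
p = dann_feature_update(x_sim, domain_label=0.0)  # 0 = simulated domain
```

In a real implementation the reversal is a layer in the autodiff graph (identity forward, negation backward) rather than a hand-written update, but the sign flip is the whole mechanism.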
Using Generative Adversarial Networks for pixel-level domain adaptation
The basic idea is to make simulated images look realistic by transforming them with a Generative Adversarial Network trained against real-world images.
The GAN's generator translates simulated images into realistic-looking versions, while its discriminator, trained on real images, judges how convincing they are. The generator's output is then used to train the task model.
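The adversarial objective behind this can be written down compactly. The following numpy sketch shows the standard GAN losses as they would apply here (discriminator scores are probabilities that an image is real); it is a generic formulation, not the specific loss from the talk, and the function names are mine.

```python
import numpy as np

def generator_loss(d_fake):
    """Non-saturating generator loss: push the discriminator's score on
    adapted (sim -> realistic) images toward 'real' (probability 1)."""
    return -np.mean(np.log(d_fake + 1e-8))

def discriminator_loss(d_real, d_fake):
    """Discriminator loss: score real images as real and adapted
    simulated images as fake."""
    return -np.mean(np.log(d_real + 1e-8) + np.log(1.0 - d_fake + 1e-8))

# Example with hypothetical discriminator outputs for a small batch.
g_loss = generator_loss(np.array([0.5, 0.5]))
d_loss = discriminator_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
```

The task model then trains on the generator's outputs as if they were real images, so the better the generator closes the pixel-level gap, the less the grasping model has to bridge at deployment time.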
GraspGAN: Combining feature-level and pixel-level methods
According to the presentation, ‘feature-level methods can learn domain-invariant features on data from related domains that aren’t identical and pixel-level methods can transform data to look identical to real data but they do not work perfectly’.
Combining both of these methods, Alex's team came up with GraspGAN. The results from GraspGAN are shown below.
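Structurally, combining the two means training the task model with a weighted sum of objectives: the grasping task loss on GAN-adapted images, plus the feature-level domain-invariance (DANN-style) loss. The sketch below is only my reading of that combination; the weights and function name are hypothetical, not values from the paper.

```python
def combined_loss(task_loss, dann_loss, w_dann=1.0):
    """Hypothetical combined objective for the task model: fit the grasping
    task on pixel-adapted (GAN-translated) images while also keeping
    intermediate features domain-invariant via the DANN term.
    The GAN itself is trained separately with its own adversarial losses."""
    return task_loss + w_dann * dann_loss

total = combined_loss(task_loss=0.8, dann_loss=0.2, w_dann=1.0)
```

The pixel-level GAN closes most of the visual gap up front, and the feature-level term mops up whatever mismatch remains in the learned representation.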
The results show that GraspGAN reached roughly 80% grasp success. They also found that a model trained on 188k real-world samples plus simulated data performed as well as one trained on 9.4M real-world samples.
The full paper can be found here.
What do you think about this method for robotic grasping? Do you know of other methods like this you'd like to share? Let us know in the comments.