Human knowledge is cumulative. We learn the alphabet so that we can learn to read and write, and we then use those (theoretical) literacy skills to write blogs on the internet. By retaining the knowledge we’ve acquired before, we are capable of greater intellectual feats.
By contrast, machine learning models have historically been trained for a single, specific task and no more. But a technique called transfer learning is changing this status quo.
In transfer learning, a machine learning model is trained on one kind of problem, and then used on a different but related problem, drawing on the knowledge it already has while learning its new task. This could be as simple as training a model to recognize giraffes in images, and then making use of this pre-existing expertise to teach the same model to recognize pictures of sheep.
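The giraffe-to-sheep idea can be sketched with a tiny NumPy network. This is an illustrative toy, not a real vision pipeline: all data is synthetic, and the dimensions and learning rate are arbitrary choices. We pretrain a small network on a data-rich "task A," then freeze its hidden layer (the learned feature extractor) and train only a fresh output head on a data-poor but related "task B."

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Task A: the data-rich source task (the "giraffes") ---
# Synthetic binary labels driven by a hidden linear rule.
X_a = rng.normal(size=(500, 10))
w_true = rng.normal(size=10)
y_a = (X_a @ w_true > 0).astype(float)

# Pretrain a one-hidden-layer network with plain gradient descent.
W1 = rng.normal(scale=0.1, size=(10, 8))   # hidden layer: the feature extractor
w2 = rng.normal(scale=0.1, size=8)         # task-A output head
lr = 0.5
for _ in range(300):
    h = np.tanh(X_a @ W1)
    p = sigmoid(h @ w2)
    err = p - y_a
    w2 -= lr * (h.T @ err) / len(y_a)
    W1 -= lr * (X_a.T @ (np.outer(err, w2) * (1 - h**2))) / len(y_a)

# --- Task B: a related target task with very little data (the "sheep") ---
# Same underlying rule, but noisier, and only 20 labeled examples.
X_b = rng.normal(size=(20, 10))
y_b = (X_b @ w_true + 0.3 * rng.normal(size=20) > 0).astype(float)

# Transfer step: FREEZE W1 (reuse the pretrained features) and
# train only a brand-new output head on the small task-B dataset.
W1_frozen = W1.copy()
w_head = np.zeros(8)
for _ in range(300):
    h = np.tanh(X_b @ W1)          # frozen feature extractor, never updated
    p = sigmoid(h @ w_head)
    w_head -= lr * (h.T @ (p - y_b)) / len(y_b)

acc = np.mean((sigmoid(np.tanh(X_b @ W1) @ w_head) > 0.5) == y_b)
```

The key move is that the second training loop updates only `w_head`, never `W1`: the model reuses the representation it learned on task A rather than relearning it from 20 examples. In practice, the same pattern is applied by freezing the early layers of a large pretrained vision or language model and training only a small task-specific head.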
Though it sounds simple, transfer learning is a powerful tool. Machine learning’s potential is often stymied by its heavy reliance on large amounts of high-quality data (for supervised learning this data must be well-labeled to boot). Unfortunately, these sorts of data sets are increasingly proprietary or prohibitively expensive to access—and that’s when the necessary data exists at all.
Transfer learning allows developers to circumvent the need for lots of new data. A model that has already been trained on a task for which labeled training data is plentiful will be able to handle a new but similar task with far less data.
There are other benefits to transfer learning as well. Using a pre-trained model often speeds up the process of training the model on a new task, and can also result in a more accurate and effective model overall.
Transfer learning is an increasingly critical component for new innovations. Sophisticated models such as deep neural networks typically require enormous resources, data, time, and computing power to create; with transfer learning, they become far more accessible. Computer vision and natural language processing, two areas of machine learning notorious for their complexity, are also making increasing use of transfer learning.
Another growing usage of transfer learning is in learning from simulations. This is becoming the norm with self-driving cars, as allowing a completely untrained model to learn to drive with a real car poses obvious safety hazards. Transfer learning allows a model to first learn to drive in a virtual environment before ever handling an actual vehicle.
Like any technology, transfer learning is not without its challenges. Currently, one of the biggest limitations to transfer learning is the problem of negative transfer. Transfer learning only works if the initial and target problems are similar enough for the first round of training to be relevant. Developers can draw reasonable conclusions about what type of training counts as “similar enough” to the target, but the algorithm doesn’t have to agree.
If the first round of training is too far off the mark, the model may actually perform worse than if it had never been trained at all. Right now, there are still no clear standards on what types of training are sufficiently related, or how this should be measured.
Still, transfer learning holds massive potential going forward. Some have even posited that transfer learning will be the key to unlocking artificial general intelligence: flexible, human-like AI that is not confined to a single task.
They may well be right. So far, artificial intelligence has been successful only in specific, predefined tasks. But an AI that can learn cumulatively, generalizing its knowledge across domains—in other words, an AI that can learn like a human can—may be the first step towards the AI that populates our science fiction and our dreams.