For over five decades, Artificial Intelligence made promises it couldn’t keep. But that has all changed with the advent of Deep Learning, which has been delivering staggering results over the past few years.
The field of Deep Learning is accelerating as researchers try bolder and bolder experiments. Some of the applications of Deep Learning, while predictable, nevertheless shine in their now superhuman performance. Deep Learning networks are beating humans on standardized tests in automatic translation, image tagging and speech recognition.
This is just the start. More recent applications of Deep Learning give a taste of what is possible: colorizing black-and-white pictures, generating images or text in a certain style, discovering new drugs, interpreting medical images, mastering vintage Atari games and the much harder game of Go are all being accomplished successfully by Artificial Intelligence with Deep Learning.
At this point, the main limitation to the field of Artificial Intelligence is our own imagination.
In my previous post I covered the history and basic facts about Artificial Intelligence. In this post, we are diving into what you need to know to make Deep Learning work for you.
Real world problems
The first thing to understand is that Deep Learning can be applied to many real-world problems. Here are a few examples that lend themselves to Deep Learning solutions:
- You run a micro-lending site and have collected a lot of data about your users, and you are trying to predict which ones might default. Doing this with a set of hand-coded rules is getting harder and harder, and your success rate is not improving.
- Users ask questions in your help center, and you’d like to automatically address some of them with canned responses, while routing the difficult ones to human operators.
- You’d like to analyze the sentiment of tweets, comments and feedback about your company. Do users like your products and services? How many complain?
- Your company sells yet another baby monitor, and you would like to differentiate it from all the others on the market. You want to analyze the video feed from the monitor and return a continuous narration of what the baby is doing: sleeping, waking up, crying, tossing. You also want to estimate the breathing rate and raise alerts if something is wrong.
In all these cases you could switch to a Deep Learning model, feed in all the data indiscriminately, and see your predictions, satisfaction scores and understanding of your users improve! The last example, the baby monitor, is a real company that I know to be in stealth mode at the moment.
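To make the first example above concrete: predicting defaults is a supervised classification problem, features in, probability out. Here is a minimal sketch in plain Python/NumPy, with synthetic data and made-up features standing in for real user records (a real system would use a proper framework and far more data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic users: two invented features (say, income and debt ratio),
# and a toy rule standing in for real repayment history.
X = rng.normal(size=(1000, 2))
y = (X[:, 1] - X[:, 0] > 0).astype(float)  # 1 = defaulted

# Logistic regression trained by gradient descent: the simplest possible
# "learn from examples" model, and the building block of a neural network.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted default probability
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * (p - y).mean()

accuracy = ((p > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

No rules were hand-coded: the weights were learned entirely from the labeled examples, which is the shift the rest of this post is about.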
Choose your vertical
Many of the largest problems in computer science are being tackled with Deep Learning by some of the world’s biggest software companies: computer vision, speech recognition and face detection. Google has turned the photo album on its head with Google Photos, which does a great job of automatically tagging your photos. Baidu has poured a lot of effort into Deep Speech 2, probably the best voice recognition system at the moment, because it estimated the technology would positively affect the lives of so many people. Facebook has amazing face recognition, used to tag friends in over one billion user photos. All of these applications are possible because the models were trained on a lot of hardware using massive amounts of data. If you don’t have access to the same massive datasets, competing directly in these areas will be very difficult.
The secret is to go into more specialized verticals. These niches can be broken down by industry, by geography, by type of user… Each niche requires specific data, and this is your chance to shine: specialize, put together the data to solve exactly your problem, and use Deep Learning to get a leg up on any competition.
5 ingredients for Deep Learning
While the theory of connecting simplistic software neurons goes back to the 1950s, the math needed to train them only began to be understood in the 1980s and was gradually refined as more computing power became available to run larger and larger experiments. This math is now embedded in the various libraries that are used universally, so you don’t need any understanding of the theory to implement Deep Learning tasks. You can safely leave that part to the researchers.
The computing power needed to train a network is high: billions of additions and multiplications per second, over and over. Training even a modest network used to take weeks on a regular computer, because a processor like the one in a laptop, desktop or server can only perform a handful of these operations at a time. It turns out that the world of gaming has the exact same requirement for high-speed calculations, and a much better solution: Graphics Processing Units, or GPUs, the specialized chips that gamers have been buying for years in the form of dangerous-looking graphics cards. Because they are highly parallelized, these processors can handle thousands of operations at the same time. Around 2010 came the realization that these same processors could handle Deep Learning computations, and the field got an overnight performance boost of a couple of orders of magnitude. For extra convenience, cloud services like AWS now rent access to this type of machine, and you can be in business with the wave of a credit card.
Deep Learning networks are specialized pieces of software, but nobody writes this software from scratch anymore. Instead academia (the University of Montreal, Berkeley, …) and large companies (Google, Facebook, Baidu, …) are competing to release the best libraries. Some of them can be intimidating, but high-level frameworks like Keras turn it into a game of Lego. These frameworks come with examples of working neural networks to solve all kinds of problems, like sentiment analysis on text or image classification. Download a model and tinker with it until it solves your problem. All for free.
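As a taste of that Lego-style assembly, here is roughly what a complete, if tiny, network looks like in Keras. This sketch assumes the Keras bundled with TensorFlow, and the task (deciding which of two numbers is larger) is invented purely for illustration:

```python
import numpy as np
from tensorflow import keras

keras.utils.set_random_seed(0)

# Invented toy task: given two numbers, predict whether the first is larger.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)).astype("float32")
y = (X[:, 0] > X[:, 1]).astype("float32")

# A complete network in a handful of lines: stack layers, pick a loss, train.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=25, verbose=0)

# Probability that the first number is larger, for a clear-cut pair.
print(model.predict(np.array([[3.0, 1.0]], dtype="float32"), verbose=0))
```

Swap the layers, the loss and the data, and the same few lines cover image classification or sentiment analysis; that is the Lego quality of these frameworks.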
To get started, you simply need to be, or have access to, a person with moderate programming skills, say someone comfortable with Python (but no Ph.D. required). You don’t need much expertise because the Deep Learning community is vibrant, growing, and unique for its level of cross-pollination. Every day hundreds of meetups are held throughout the world to discuss just about any subject related to Deep Learning. Most research papers are immediately shared before they go through a peer-review process, and come with code and data so anyone can reproduce the results. In forums no question is too dumb, and experts who started just a few months ago jump in and take time to guide the newbies. This virtuous cycle causes newcomers to become fluent in a short amount of time and want to give back.
Data is the key ingredient, as attested by the often-repeated phrase “data is the new oil”. The central idea of Machine Learning is that we don’t code what we want; instead, the machine learns from examples. Most commercial applications of Deep Learning are of the supervised type and require millions of examples for training: a lot of text, many images or voice samples, or anything else that your application needs to manipulate. With more data comes more precision, more opportunities to model an arbitrarily complex problem, and coverage of more corner cases.
Say that you need an image detector for your new autonomous car start-up. You want to teach your vehicle to get out of the way of fire trucks (amongst other things). If you train a visual model on a few red fire trucks, it will assume that all fire trucks are red. But depending on where in the world you sell your cars, they will encounter green, yellow, blue, black and purple fire trucks. So instead you should collect many thousands of images of fire trucks, so that all colors, sizes and shapes are likely to be represented; the newly trained fire-truck detector will do a much better job. More data is always better.
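The fire-truck story can be reproduced in miniature. Below, a made-up “redness” feature stands in for color and a “ladder” feature for the true visual signal; the same simple learner is trained once on a narrow all-red dataset and once on a diverse one (everything here is synthetic and invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logistic(X, y, steps=300, lr=0.5):
    # The same plain gradient-descent learner for both experiments.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - y)) / len(y)
    return w

def make_data(n, trucks_always_red):
    ladder = rng.normal(size=n)                   # the real (noisy) signal
    label = (ladder + 0.5 * rng.normal(size=n) > 0).astype(float)
    # "Redness" is +1 or -1. In the narrow dataset every truck is red.
    red = 2 * label - 1 if trucks_always_red else rng.choice([-1.0, 1.0], size=n)
    return np.column_stack([red, ladder]), label

# The real world the model will face: trucks come in every color.
X_test, y_test = make_data(2000, trucks_always_red=False)

accs = {}
for name, narrow in [("narrow (all red)", True), ("diverse", False)]:
    X, y = make_data(2000, trucks_always_red=narrow)
    w = fit_logistic(X, y)
    p = 1 / (1 + np.exp(-(X_test @ w)))
    accs[name] = ((p > 0.5) == y_test).mean()
    print(f"{name}: test accuracy {accs[name]:.2f}")
```

The narrow model latches onto color because color never disagreed with the label during training; the diverse model is forced to find the real signal.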
So how do you get the data you need to train your model? It depends.
The easiest path is to find an already-trained model that does the job. Academia and large companies active in the Deep Learning space regularly release pre-trained models; just search for ‘model zoo’. One of these could work out of the box and solve your problem. Or it may require a bit more work: for example, if you are developing an image classifier, you could start from a complete model trained on millions of images over days or weeks, and customize it with a little more training to cover your own categories.
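Here is a structural sketch of that customization step, assuming TensorFlow’s bundled Keras: freeze a standard backbone, bolt a small head on top for your own categories, and train briefly. In practice you would pass weights="imagenet" to load the pretrained weights; weights=None appears here only to keep the sketch self-contained, and the data is random stand-in material.

```python
import numpy as np
from tensorflow import keras

# A standard image backbone. Pass weights="imagenet" in practice to reuse
# features learned on millions of images; weights=None keeps this offline.
base = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None, pooling="avg")
base.trainable = False  # freeze the (pre-trained) features

# A small new head for your own categories (say, 3 of them).
model = keras.Sequential([
    base,
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A short round of training on your own labeled images (random stand-ins here).
X = np.random.rand(8, 96, 96, 3).astype("float32")
y = np.random.randint(0, 3, size=8)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X, verbose=0).shape)  # (8, 3)
```

Only the small head is trained, which is why a few thousand of your own images can be enough where the original model needed millions.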
Another lucky break is when the data you need already exists as a publicly available dataset. If you need a large amount of text across just about any subject, Wikipedia and news articles are a great start. Common Crawl is another amazing resource: it is a large crawl of the Web freely available to anyone. There are many other freely available datasets, and your data could be just one download away. These datasets are usually not very large or recent, but they can be useful.
Larger and fresher datasets can sometimes be bought, but most are simply out of reach. Some domains, like healthcare or finance, don’t share data at all due to privacy concerns. Most of the large players in the Internet space have collected huge amounts of personal and historical data but are unlikely to share it, because it is their secret sauce. One nice exception is Google, which has been releasing very large datasets derived from the Web: the Google N-grams, released in 2006, capture statistics on words and groups of words from a trillion words of Web text; and in the past month Google has released Open Images (6 million images in 6,000 categories) and YouTube-8M (8 million videos with labels from 5,000 categories).
If none of this works for you, the next logical step is to tap into the most amazing source of data: the Web itself. A titanic amount of data is openly available, hidden behind a thin presentation layer. This is what Import.io is all about.
Deep Learning can be applied to a large number of real-life problems, and odds are good that your company could benefit from it. Most of what you need is freely available, and members of the community can help get you started. Pick your problem, think carefully about the data you will need, and collect it. The rest will follow.