The idea of a rewards-based learning system is something most people can likely relate to. Any dog owner has experienced how much more likely their pet is to perform a trick when it realizes it will get a treat. A reward for an A.I. is similar. A technique often used in designing artificial intelligence is reinforcement learning, which offers a way to teach machines to do things that would be difficult to achieve through explicit instruction.
With reinforcement learning, when the A.I. takes an action, it receives either positive or negative feedback based on the parameters you've given it, and then tries to optimize its actions to receive more positive rewards. However, the reward can’t just be programmed into the A.I. - it has to interact with its environment to learn which actions will be considered good, bad, or neutral. Again, the idea is similar to a dog learning that tricks can earn it treats or praise, but misbehaving could result in punishment.
The technique was employed by AlphaGo, a program developed by Google researchers to play the ancient board game Go. While the rules of Go are simple, it is hard to explain how to play well, and players normally develop an intuitive aptitude through many hours of practice. The system learned by giving the A.I. agent a positive reward every time it created a successful sequence.
What's even more interesting is when you marry these techniques to try to produce creative outcomes from A.I.s. “There is no reason why machines cannot be curious and creative,” says Jürgen Schmidhuber, a professor at the University of Lugano in Switzerland who performed pioneering research on the type of neural networks used by Google’s researchers, and who has experimented with creativity using reinforcement learning.
Magenta is a Google Brain / Tensorflow project that aims to answer questions around whether A.I. can produce compelling creative outcomes. Watching these experiments continue to unfold and seeing machine learning gain new layers of complexity is fascinating. How will you and your organisation prepare for the onset of evolving machine intelligence?
They use an approach known as reinforcement learning to add simple principles of music theory—avoid repeating a refrain too often, do not play too quickly or slowly, and so on—to the overall learning process. The network receives a positive reward every time it produces a sequence of notes that not only resembles the patterns seen in previous songs, but also adheres to the musical rules it has been given.