Software has been known to find shortcuts or loopholes that maximise the size of the reward it receives

Google hopes to prevent robot uprising with new AI training technique

Designed to discourage machines from cheating

Aatif Sulleyman
Monday 10 July 2017 12:39

Google is developing a new system designed to prevent artificial intelligence from going rogue and clashing with humans.

It’s an idea that has been explored by a multitude of sci-fi films, and has grown into a genuine fear for a number of people.

Google is now hoping to tackle the issue by encouraging machines to work in a certain way.

The company’s DeepMind division, which was behind the AI that recently defeated Ke Jie, the world’s number one Go player, has teamed up with OpenAI, a research group that’s part-funded by Elon Musk.

They’ve released a paper explaining how human feedback can be used to ensure machine-learning systems solve tasks the way their trainers want them to.

A technique called reinforcement learning, which is popular in AI research, challenges software to complete tasks, and rewards it for doing so.

However, the software has been known to cheat, by figuring out shortcuts or uncovering loopholes that maximise the size of the reward it receives.

In one instance, the software drove a boat around in circles in the racing game CoastRunners instead of completing the course, because circling still earned it a reward, reports Wired.
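The loophole is easy to reproduce in miniature. The sketch below is purely illustrative and is not the CoastRunners game or anything from the paper: the track, the bonus square at position 2, the reward values and the two policies are all invented for the example. It shows how a reward that pays out for hitting a respawning bonus can make circling score higher than finishing the race.

```python
def run_episode(policy, track_length=10, max_steps=50):
    """Run one episode on a toy 1-D track; return (total_reward, finished)."""
    position, total_reward = 0, 0
    for _ in range(max_steps):
        position = max(0, position + policy(position))  # policy returns +1 or -1
        if position == 2:               # hypothetical bonus that respawns every visit
            total_reward += 3
        if position == track_length:    # crossing the finish line ends the episode
            return total_reward + 10, True
    return total_reward, False


def race_to_finish(position):
    """Intended behaviour: always move forward."""
    return 1


def circle_the_bonus(position):
    """Loophole: shuttle back and forth over the bonus square."""
    return 1 if position < 2 else -1


print(run_episode(race_to_finish))    # (13, True)
print(run_episode(circle_the_bonus))  # (75, False)
```

Under these made-up numbers the circling policy collects far more reward without ever finishing, which is the kind of behaviour a reward-maximising learner would settle on.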

DeepMind and OpenAI are trying to solve the problem by using human input to recognise when an artificial intelligence completes a task in the “correct” way, and then rewarding it for doing so.
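One common way to turn such human judgements into a reward is to fit a model to pairwise comparisons. The sketch below is an assumption-laden illustration, not the researchers' implementation: it uses a simple logistic (Bradley-Terry style) preference model with a linear reward, and the two-number description of each behaviour (bonus hits scaled down, and whether the race was finished) is hypothetical.

```python
import math


def reward(weights, features):
    """Linear reward model: a weighted sum of behaviour features."""
    return sum(w * f for w, f in zip(weights, features))


def fit_reward(preferences, n_features, lr=0.1, epochs=500):
    """Fit weights from (preferred, rejected) feature pairs by gradient ascent
    on the log-likelihood of the human's choices."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in preferences:
            diff = reward(weights, preferred) - reward(weights, rejected)
            p = 1.0 / (1.0 + math.exp(-diff))  # model's P(human picks `preferred`)
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - rejected[i])
    return weights


# Hypothetical features: (bonus hits / 10, finished the race?)
finish = (0.1, 1.0)
circle = (2.5, 0.0)

# A human watches both behaviours and says finishing is the better one.
weights = fit_reward([(finish, circle)], n_features=2)
print(reward(weights, finish) > reward(weights, circle))  # True
```

After training, the learned reward scores finishing above circling, so an agent optimising it would no longer be paid for exploiting the loophole. A real system would need many comparisons and a far richer model than this.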

“In the long run it would be desirable to make learning a task from human preferences no more difficult than learning it from a programmatic reward signal, ensuring that powerful RL systems can be applied in the service of complex human values rather than low-complexity goals,” reads the report.

Unfortunately, the improved reinforcement learning system is too time-consuming to be practical right now, but it gives us an idea of how the development of increasingly advanced machines and robots could be controlled in the future.
