OpenAI, a nonprofit artificial intelligence research group, said their GPT-2 software is so good they are worried it could be misused.
The software generates coherent text, and can be prompted to write on certain subjects or in a certain style by feeding it paragraphs of source material.
The algorithm was trained on eight million web pages and the results are far better than any previous attempt at computer text-generation, where odd syntax changes and rambling nonsense have been difficult to iron out.
The success of the software has seen it dubbed “deepfakes for text”, and among the core concerns is that it could be used to generate unstoppable quantities of fabricated news or to impersonate people online.
In a blog post on the results, OpenAI provided examples of the prose the software generated.
Here is the human-written source text prompt they fed it: “In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.”
The software then carried on writing the piece, including its own invented quotes.
It wrote: “The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.
“Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.
“Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.”
The software reportedly took 10 attempts to produce this coherent example.
OpenAI said: “Overall, we find that it takes a few tries to get a good sample, with the number of tries depending on how familiar the model is with the context. When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50 per cent of the time.”
Worries over how the product could be used mean at this stage the company has only released a smaller version of the software.
“Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version,” the company said. “We are not releasing the dataset, training code, or GPT-2 model weights.”
OpenAI also suggested government policy could be required to address some of the issues, and thereby allow further progression in the field.
“Governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems,” they said.