OpenAI, a non-profit artificial intelligence research company, has hobbled the publicly available version of a new AI-based text generator due to “concerns about malicious applications of the technology”.
The group revealed yesterday it had been training a large-scale unsupervised language model which has the ability to generate “coherent paragraphs of text” using a human prompt as a starting point.
The model, called GPT-2, is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of eight million web pages.
“GPT-2 generates synthetic text samples in response to the model being primed with an arbitrary input,” OpenAI – which is backed by the likes of Elon Musk and Peter Thiel – explained.
The group shared a number of examples of the model’s capabilities. From the input “Miley Cyrus was caught shoplifting from Abercrombie and Fitch on Hollywood Boulevard today,” it continues a readable story that could easily have been written by a reporter. From the input “Legolas and Gimli advanced on the orcs, raising their weapons with a harrowing war cry,” it completes a short fantasy tale, complete with characters and dialogue.
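To illustrate what “priming” the model with an input looks like in practice, the sketch below loads the smaller released GPT-2 model through the third-party Hugging Face `transformers` library, which later packaged the public weights; the library, the model name `"gpt2"` and the sampling settings are illustrative assumptions, not part of OpenAI’s announcement.

```python
# Minimal sketch (assumes the third-party Hugging Face "transformers" library
# and PyTorch are installed); "gpt2" refers to the smaller, publicly released
# model, and the sampling parameters are illustrative, not from the article.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Prime the model with an arbitrary input, as in the examples above.
prompt = ("Miley Cyrus was caught shoplifting from Abercrombie and Fitch "
          "on Hollywood Boulevard today")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation conditioned on the prompt.
output_ids = model.generate(
    input_ids,
    max_length=200,                      # total tokens, including the prompt
    do_sample=True,                      # sample rather than greedy decode
    top_k=40,                            # restrict sampling to the 40 likeliest tokens
    temperature=0.9,
    pad_token_id=tokenizer.eos_token_id,
)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```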
“The model is chameleon-like – it adapts to the style and content of the conditioning text. This allows the user to generate realistic and coherent continuations about a topic of their choosing,” OpenAI said.
Despite some limitations – such as repetitive text and what researchers call ‘world modelling failures’, for example fires happening under water – the model is capable of generating results that “feel close to human quality and show coherence over a page or more of text”.
Its capability – which shows an improvement on a number of domain-specific language models – is so impressive that OpenAI has decided to release only a smaller, limited version of GPT-2.
“These samples have substantial policy implications: large language models are becoming increasingly easy to steer towards scalable, customised, coherent text generation, which in turn could be used in a number of beneficial as well as malicious ways,” the group said.
Potentially, it could be used to generate misleading news articles, impersonate others online, automate the production of abusive or faked content for social media, or automate the production of spam and phishing content, researchers argued.
“These findings, combined with earlier results on synthetic imagery, audio, and video, imply that technologies are reducing the cost of generating fake content and waging disinformation campaigns. The public at large will need to become more sceptical of text they find online, just as the ‘deep fakes’ phenomenon calls for more scepticism about images,” they added.
The use cases are not all bad, however. OpenAI anticipated near-term benefits for applications such as AI writing assistants, better dialogue agents, unsupervised translation between languages and better speech recognition systems.