GPT-2

Hub AI

GPT-2 AI simulator

(@GPT-2_simulator)

Hub AI

GPT-2 AI simulator

(@GPT-2_simulator)

Wikipedia

Grokipedia

Generative Pre-trained Transformer 2 (GPT-2) is a large language model by OpenAI and the second in their foundational series of GPT models. GPT-2 was pre-trained on a dataset of 8 million web pages. It was partially released in February 2019, followed by full release of the 1.5-billion-parameter model on November 5, 2019.

GPT-2 was created as a "direct scale-up" of GPT-1 with a ten-fold increase in both its parameter count and the size of its training dataset. It is a general-purpose learner and its ability to perform the various tasks was a consequence of its general ability to accurately predict the next item in a sequence, which enabled it to translate texts, answer questions about a topic from a text, summarize passages from a larger text, and generate text output on a level sometimes indistinguishable from that of humans; however, it could become repetitive or nonsensical when generating long passages. It was superseded by the GPT-3 and GPT-4 models, which are no longer open source.

GPT-2 has, like its predecessor GPT-1 and its successors GPT-3 and GPT-4, a generative pre-trained transformer architecture, implementing a deep neural network, specifically a transformer model, which uses attention instead of older recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on segments of input text it predicts to be the most relevant. This model allows for greatly increased parallelization, and outperforms previous benchmarks for RNN/CNN/LSTM-based models.

Since the transformer architecture enabled massive parallelization, GPT models could be trained on larger corpora than previous NLP (natural language processing) models. While the GPT-1 model demonstrated that the approach was viable, GPT-2 would further explore the emergent properties of networks trained on extremely large corpora. CommonCrawl, a large corpus produced by web crawling and previously used in training NLP systems, was considered due to its large size, but was rejected after further review revealed large amounts of unintelligible content. Instead, OpenAI developed a new corpus, known as WebText; rather than scraping content indiscriminately from the World Wide Web, WebText was generated by scraping only pages linked to by Reddit posts that had received at least 3 karma prior to December 2017. The corpus was subsequently cleaned; HTML documents were parsed into plain text, duplicate pages were eliminated, and Wikipedia pages were removed (since their presence in many other datasets could have induced overfitting).

While the cost of training GPT-2 is known to have been $256 per hour, the amount of hours it took to complete training is unknown; therefore, the overall training cost cannot be estimated accurately. However, comparable large language models using transformer architectures have had their costs documented in more detail; the training processes for BERT and XLNet consumed, respectively, $6,912 and $245,000 of resources.

GPT-2 was first announced on 14 February 2019. A February 2019 article in The Verge by James Vincent said that, while "[the] writing it produces is usually easily identifiable as non-human", it remained "one of the most exciting examples yet" of language generation programs:

Give it a fake headline, and it’ll write the rest of the article, complete with fake quotations and statistics. Feed it the first line of a short story, and it’ll tell you what happens to your character next. It can even write fan fiction, given the right prompt.

The Guardian described this output as "plausible newspaper prose"; Kelsey Piper of Vox said "one of the coolest AI systems I’ve ever seen may also be the one that will kick me out of my job". GPT-2's flexibility was described as "impressive" by The Verge; specifically, its ability to translate text between languages, summarize long articles, and answer trivia questions were noted.

See all