Generative pre-trained transformer
A generative pre-trained transformer (GPT) is a type of large language model (LLM) that is widely used in generative AI chatbots. GPTs are based on a deep learning architecture called the transformer. They are pre-trained on large datasets of unlabeled content and are able to generate novel content.
OpenAI was the first to apply generative pre-training (GP) to the transformer architecture, introducing the GPT-1 model in 2018. The company has since released many bigger GPT models. The popular chatbot ChatGPT, released in late 2022 (using GPT-3.5), was followed by many competitor chatbots using their own "GPT" models to generate text, such as Gemini, DeepSeek or Claude.
GPTs are primarily used to generate text, but can be trained to generate other kinds of data. For example, GPT-4o can process and generate text, images and audio. To improve performance on complex tasks, some GPTs, such as OpenAI o3, spend more time analyzing the problem before generating an output, and are called reasoning models. In 2025, GPT-5 was released with a router that automatically selects whether to use a faster model or slower reasoning model based on task.
According to The Economist, improved algorithms, more powerful computers, and an increase in the amount of digitized material fueled a revolution in machine learning during the 2010s. New techniques in the years before the AI boom resulted in "rapid improvements in tasks", including manipulating language. Modern software models are trained to learn by using millions of examples in artificial neural networks that are inspired by biological neural structures.
Separately, generative pre-training (GP) was a long-established technique in machine learning. GP is a form of self-supervised learning in which a model is first trained on a large, unlabeled dataset (the "pre-training" step) to learn to generate data points. The pre-trained model is then adapted to a specific task using a labeled dataset (the "fine-tuning" step).
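The self-supervised aspect of the pre-training step can be illustrated with a toy sketch: the training targets come from the unlabeled text itself, since each token serves as the label for the tokens that precede it. The example below is only an illustration under simplifying assumptions. It uses a bigram count table in place of a neural network, and the function names (`make_next_token_pairs`, `train_bigram`, `predict_next`) are hypothetical, not taken from any GPT implementation.

```python
from collections import Counter, defaultdict


def make_next_token_pairs(tokens):
    """Turn an unlabeled token sequence into (context, target) pairs.

    No human labeling is needed: the target is simply the next token.
    """
    return [(tuple(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


def train_bigram(tokens):
    """Toy 'pre-training': count which token follows which.

    A real GPT fits a transformer network to the same kind of
    next-token targets; counting is a stand-in for illustration.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the continuation seen most often after `prev` during training."""
    return counts[prev].most_common(1)[0][0]


tokens = "the cat sat on the mat".split()
pairs = make_next_token_pairs(tokens)
model = train_bigram(tokens)
print(pairs[0])                    # first (context, target) pair
print(predict_next(model, "cat"))  # most frequent token after "cat"
```

In the fine-tuning step described above, a model pre-trained in this way would then be trained further on a smaller labeled dataset for a specific task. That step is not shown here.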
The transformer architecture for deep learning is the core technology of a GPT. Developed by researchers at Google, it was introduced in the paper "Attention Is All You Need", which was published on June 12, 2017. The transformer architecture solved many of the performance issues that were associated with older recurrent neural network (RNN) designs for natural language processing (NLP). The architecture's use of an attention mechanism allows models to process entire sequences of text at once, enabling the training of much larger and more sophisticated models. Since 2017, numerous transformer-based NLP systems have been available that are capable of processing, mining, organizing, connecting, contrasting, and summarizing texts as well as correctly answering questions from textual input.
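The attention mechanism can be sketched in a few lines. The example below is a minimal, illustrative version of scaled dot-product attention as defined in "Attention Is All You Need", written in plain Python rather than with the matrix operations a real implementation would use. The `causal` flag, which stops a position from attending to later positions (as in GPT-style decoders), and the function names are choices made for this sketch. Learned projection matrices and multiple heads are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, causal=True):
    """Scaled dot-product attention over a whole sequence.

    Each argument is a list of vectors, one per position. Every
    output vector is a weighted average of the value vectors.
    """
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        if causal:
            # Mask out later positions so they receive zero weight.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three positions, two dimensions
for row in attention(x, x, x):
    print(row)
```

Although the loop here visits positions one at a time, no position's output depends on another position's output, so all of them can be computed together as matrix multiplications. This is the property that lets transformers process a sequence at once, in contrast to the step-by-step recurrence of an RNN.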
On June 11, 2018, OpenAI researchers and engineers published a paper called "Improving Language Understanding by Generative Pre-Training", which introduced GPT-1, the first GPT model. It was designed as a transformer-based large language model that used generative pre-training (GP) on BookCorpus, a diverse text corpus, followed by discriminative fine-tuning to focus on specific language tasks. This semi-supervised approach was seen as a breakthrough. Previously, the best-performing neural models in natural language processing (NLP) had commonly employed supervised learning from large amounts of manually labeled data – training a large language model with this approach would have been prohibitively expensive and time-consuming.
On February 14, 2019, OpenAI introduced GPT-2, a larger model that could generate coherent text. Created as a direct scale-up of its predecessor, it had both its parameter count and dataset size increased by a factor of 10. GPT-2 has 1.5 billion parameters and was trained on WebText, a 40-gigabyte dataset of 8 million web pages. Citing risks of malicious use, OpenAI opted for a "staged release", initially publishing smaller versions of the model before releasing the full 1.5-billion-parameter model in November 2019.