ChatGPT is everywhere. Where did it come from?

1980s–90s: Recurrent Neural Networks

ChatGPT is a version of GPT-3, a large language model developed by OpenAI. Language models are a type of neural network trained on huge amounts of text. (Neural networks are software inspired by the way neurons in an animal’s brain signal one another.) Because text is made up of sequences of letters and words of varying lengths, language models require a type of neural network that can make sense of that kind of data. Recurrent neural networks, invented in the 1980s, can handle sequences of words, but they are slow to train and can forget earlier words in a sequence.
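The core idea of a recurrent network can be sketched in a few lines: the network reads one word at a time and carries a hidden state forward as its "memory" of everything seen so far. This is a toy illustration with random placeholder weights, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, vocab_size = 4, 10
W_xh = rng.normal(size=(hidden_size, vocab_size)) * 0.1  # input-to-hidden weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights

def rnn_forward(token_ids):
    """Process a word sequence one token at a time, carrying a hidden state."""
    h = np.zeros(hidden_size)
    for t in token_ids:
        x = np.zeros(vocab_size)
        x[t] = 1.0  # one-hot encoding of the current word
        h = np.tanh(W_xh @ x + W_hh @ h)  # new state mixes input with memory
    return h

final_state = rnn_forward([3, 1, 4, 1, 5])  # a fixed-size summary of the sequence
```

Because each step squashes the old state through the same update, information from early words fades as sequences grow longer — the forgetting problem the article describes.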

In 1997, computer scientists Sepp Hochreiter and Jürgen Schmidhuber solved this with long short-term memory (LSTM) networks, recurrent neural networks with special components that let past information in an input sequence be retained for longer. LSTMs could handle strings of text several hundred words long, but their language abilities were still limited.

2017: Transformers

The breakthrough behind today’s generation of large language models came when a team of Google researchers invented transformers, a kind of neural network that can track where each word or phrase appears in a sequence. The meaning of a word often depends on the meaning of the words that come before or after it. By keeping track of this contextual information, transformers can handle longer strings of text and capture the meaning of words more accurately. For example, “hot dog” means very different things in the sentences “hot dogs should be given plenty of water” and “hot dogs should be eaten with mustard.”
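The mechanism that lets transformers track context is attention: every position in a sequence looks at every other position and takes a weighted average, so a word's representation can absorb context from the words around it. This is a minimal sketch of that step; real transformers add learned projections, multiple heads, and position encodings.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a sequence of vectors."""
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # how relevant is each position?
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, dim = 5, 8                     # 5 tokens, each an 8-dimensional vector
x = rng.normal(size=(seq_len, dim))
output, weights = attention(x, x, x)    # self-attention: Q = K = V
```

Each row of `weights` sums to 1 and says how strongly that token attends to every other token — this is how “dog” can end up represented differently next to “water” than next to “mustard.”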

2018–2019: GPT and GPT-2

OpenAI’s first two major language models came just a few months apart. The company wants to develop versatile, general-purpose AI and believes that large language models are a key step toward that goal. GPT (short for Generative Pre-trained Transformer) made a splash, beating the then state-of-the-art benchmarks for natural-language processing.

GPT combined transformers with unsupervised learning, a way to train machine-learning models on data (in this case, lots and lots of text) that hasn’t been annotated beforehand. This lets the software figure out patterns in the data by itself, without being told what it is looking at. Many previous successes in machine learning had relied on supervised learning and annotated data, but labeling data by hand is slow work, which limits the size of the datasets available for training.
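The "no labels needed" idea can be made concrete: the training targets come from the raw text itself, since each word acts as the label for the words that precede it. A tiny illustration:

```python
# Build (context, next_word) training pairs automatically from raw text.
# No human annotation is involved: the text supplies its own labels.
text = "the cat sat on the mat".split()

pairs = [(text[:i], text[i]) for i in range(1, len(text))]
for context, target in pairs:
    print(context, "->", target)
# e.g. ['the'] -> cat, then ['the', 'cat'] -> sat, and so on
```

Scale this recipe up from one sentence to a large slice of the internet and you have the self-supervised training signal behind GPT.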

But it was GPT-2 that created the bigger buzz. OpenAI said it was so concerned about people using GPT-2 to “generate misleading, biased or abusive language” that it would not release the full model. How times change.

2020: GPT-3

GPT-2 was impressive, but OpenAI’s follow-up, GPT-3, was jaw-dropping. Its ability to generate human-like text was a huge leap forward. GPT-3 can answer questions, summarize documents, generate stories in different styles, translate between English, French, Spanish, and Japanese, and more. Its mimicry is uncanny.

One of the most remarkable takeaways is that GPT-3’s gains came from supersizing existing techniques rather than inventing new ones. GPT-3 has 175 billion parameters (the values in a network that are adjusted during training), compared with GPT-2’s 1.5 billion. It was also trained on a lot more data.
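To get a rough sense of what a parameter count means: a parameter is just one adjustable number, and even a single fully connected layer accumulates them quickly. The helper below is a hypothetical illustration, not how GPT's parameters are actually tallied.

```python
def dense_layer_params(n_in, n_out):
    """Parameter count for one fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# One 1,000-to-1,000 layer already holds over a million parameters.
print(dense_layer_params(1000, 1000))    # 1001000

# GPT-3 (175 billion) is over a hundred times larger than GPT-2 (1.5 billion).
print(175_000_000_000 // 1_500_000_000)  # 116
```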
