A junior student cracked an incredible AI model on his own: "I just spent the time other people spend flirting with girls studying machine learning."
Wen Geng from Yanqi Lake
Produced by QuantumBit | Official Account: QbitAI
GPT-2, an incredible AI.
OpenAI first announced the existence of this model in February of this year. GPT-2 writes articles with a natural flow of ideas, and without requiring specialized training, it can master various domain-specific language modeling tasks. It also possesses capabilities such as reading comprehension, question answering, generating article summaries, and translation.
But unusually, this model was not actually open sourced. OpenAI explained that it was too powerful and they dared not release the complete model. Despite being ridiculed by the outside world for this, GPT-2 remains closed to this day.
Now, someone has single-handedly cracked the secret that OpenAI doesn't want to be known.
Moreover, he is a junior student.
Connor Leahy, a student from the Technical University of Munich, spent 200 hours and approximately 6,000 RMB in two months to reproduce the GPT-2 project.
This incident attracted a lot of attention on Twitter. Some praised it as awesome, some engaged in in-depth discussion, and even several senior researchers from OpenAI came to communicate.
Another admirable thing is that Connor Leahy learned all his knowledge about machine learning by himself in his free time. He described himself as a curious undergraduate.
"I just used the time other people would spend picking up girls to conduct AI experiments."
In a fit of anger
GPT-2 is OpenAI's best research achievement.
This model is an "evolution" of GPT, with the biggest difference being its size. GPT-2 has 1.5 billion parameters and was trained using a 40GB dataset containing 8 million web pages.
We train a Transformer on a large dataset in an unsupervised manner using language modeling as a training signal, and then fine-tune this model on a smaller supervised dataset to help it solve a specific task.
GPT model
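To make that two-stage description concrete, here is a minimal, hypothetical sketch in PyTorch. It is not OpenAI's or Connor Leahy's actual code (the real project is written in TensorFlow); all the sizes, names, and random data below are placeholder assumptions purely for illustration. A tiny Transformer is first pretrained with a next-token language-modeling loss, then the same weights are fine-tuned with a small classification head on labeled data.

```python
# Hypothetical sketch of the two-stage GPT recipe: unsupervised LM pretraining,
# then supervised fine-tuning. Toy sizes and random data, NOT the real GPT-2 code.
import torch
import torch.nn as nn

VOCAB, D_MODEL, SEQ_LEN, N_CLASSES = 100, 64, 32, 2

class TinyGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, VOCAB)       # used during pretraining
        self.clf_head = nn.Linear(D_MODEL, N_CLASSES)  # used during fine-tuning

    def forward(self, x, task="lm"):
        # Causal mask: each position may only attend to earlier positions.
        L = x.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.blocks(self.embed(x), mask=mask)
        if task == "lm":
            return self.lm_head(h)        # next-token logits at every position
        return self.clf_head(h[:, -1])    # classify from the last position

model = TinyGPT()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stage 1: language modeling as the training signal (predict token t+1 from tokens <= t).
tokens = torch.randint(0, VOCAB, (8, SEQ_LEN + 1))        # stand-in for web text
logits = model(tokens[:, :-1], task="lm")
loss = loss_fn(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
opt.step()
opt.zero_grad()

# Stage 2: fine-tune the same weights on a small labeled dataset for a specific task.
x = torch.randint(0, VOCAB, (8, SEQ_LEN))
y = torch.randint(0, N_CLASSES, (8,))
loss = loss_fn(model(x, task="clf"), y)
loss.backward()
opt.step()
opt.zero_grad()
```

The key point of the recipe is that the weights learned in stage 1 are reused unchanged as the starting point for stage 2; only a small task-specific head and a modest amount of labeled data are added.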
OpenAI researchers said that GPT-2 achieved excellent scores on language modeling tests across various domain-specific datasets. Even though it was never specially trained on any domain's data, its performance beat that of models built specifically for those domains.
In addition to being used for language modeling, GPT-2 can achieve excellent results in tasks such as question answering, reading comprehension, summary generation, and translation without fine-tuning.
After the release of GPT-2, Hinton, the father of deep learning, made his third comment since registering on Twitter: "This should allow Silicon Valley unicorns to generate better English."
For more on the model's powerful performance, please refer to QbitAI's previous coverage; I will not elaborate on it here.
In short, it’s just one word: strong.
Because of its strength, OpenAI made the difficult decision not to release the full model. They first released a small version, less than one-tenth the size, with 117 million parameters. After months of criticism, they released a medium version with 345 million parameters.
There's no doubt that GPT-2 piqued Connor Leahy's curiosity, but he was also angered by OpenAI's decision to keep the model private. "Information should be free."
So he decided to reproduce it himself.
He wasn't just acting on impulse. In his blog, Connor Leahy lays out a long and detailed reflection on why he wanted to replicate GPT-2, arguing that instead of fearing fake news fabricated by AI, it is better to act proactively: raise awareness of the problem, confront it head-on, and then find solutions.
Of course, there is another reason he did this:
Very cool.
Reproduced version of GPT-2
“How do you know you’ve successfully replicated the 1.5 billion parameter GPT-2 model?”
I'm afraid most people would like to know the answer to this question.
Connor Leahy responded that the two models were of the same size and number of parameters, were trained based on similar data sources, used similar computing resources, and had similar output quality.
He also pointed out some clear differences between the two, such as:
1. The dropout rate and learning rate were never officially disclosed, so his settings may differ.
2. The model was trained with Adafactor instead of Adam. Connor Leahy couldn't figure out how to fit 1.5 billion parameters plus Adam's optimizer state onto the TPU, even with 16-bit precision (a rough memory estimate is sketched below).
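To see why that matters, here is a back-of-the-envelope estimate, using my own rough assumptions rather than figures from his blog, of the memory Adam's optimizer state alone would demand for 1.5 billion parameters:

```python
# Rough, assumed numbers: why plain Adam is hard to fit for a 1.5B-parameter model.
params = 1.5e9
BYTES_FP32 = 4

weights    = params * BYTES_FP32       # ~6 GB of weights in float32
gradients  = params * BYTES_FP32       # ~6 GB of gradients
adam_state = params * 2 * BYTES_FP32   # Adam keeps two moment buffers per parameter: ~12 GB

print(f"Adam, fp32: ~{(weights + gradients + adam_state) / 1e9:.0f} GB before activations")
# -> ~24 GB, more than the 8-16 GB of HBM on a single TPU core

# Adafactor (by default) drops the first moment and factorizes the second moment
# into per-row and per-column statistics, so its extra state is tiny by comparison.
print(f"Adafactor, fp32: ~{(weights + gradients) / 1e9:.0f} GB before activations")
# -> ~12 GB, plus a comparatively negligible factored state
```

Even at 16-bit precision, Adam's two extra buffers roughly double the footprint of the weights themselves, which is consistent with his remark above.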
Eh? Wait...
How can an ordinary junior student get TPUs to conduct this kind of research?
Thanks to Google.
Google runs a TensorFlow Research Cloud (TFRC) program, which provides researchers with a cluster of 1,000 Cloud TPUs completely free of charge. It is designed to support research projects that require large amounts of computation and could not otherwise be carried out.
When Connor Leahy was researching the reproduction of GPT-2, he encountered a bottleneck in computing resources. He casually mentioned it to the TFRC team, and ended up receiving generous support from Google.
In fact, Connor Leahy had never used a TPU before starting this project, so he warmly thanked the Google team in his blog.
Even so, he still spent about 600-800 euros (roughly 6,000 yuan) on cloud services to build datasets, test code, and run experiments.
The laptop he used was an old ThinkPad.
Connor Leahy also expressed gratitude for his noise-canceling headphones: "They kept me calm."
The GPT-2 replica is now open source on GitHub, and the code can be run on GPUs, TPUs, and (not recommended) CPUs. The author has released two versions: a small one with 117 million parameters, and one called PrettyBig, which is slightly larger than the official 345-million-parameter medium version and is thus the largest publicly available GPT-2 model to date.
As for the full version with 1.5 billion parameters, the author plans to release it on July 1st.
Connor Leahy is currently inviting everyone to download and try out the GPT-2 replica, and to discuss with him how to improve it. In a blog post about this, he stated: "I'm 100% open to any errors you point out, so please contact me if you find any."
About the author, and the links
Connor Leahy was admitted to the Technical University of Munich in Germany in 2017 and is currently a third-year computer science student. On LinkedIn, he describes himself as passionate about artificial intelligence.
Since September 2018, he has been interning at the Max Planck Institute, where he also uses Google's TPUs to work on serious AI research topics.
Finally, here are the links.
Connor Leahy's thoughtful blog post:
https://medium.com/@NPCollapse/gpt2-counting-consciousness-and-the-curious-hacker-323c6639a3a8
The GPT-2 replica on GitHub:
https://github.com/ConnorJL/GPT2
— End —