[1/23/23] The secret sauce behind ChatGPT

InstructGPT: Training language models to follow instructions with human feedback, OpenAI

Introduction and Motivation

The standard method for training a large language model is “next token prediction.” There are two core problems with this:

  1. While we’re optimizing for next-token prediction, what we really want is for these models to follow instructions (like an assistant).

  2. This method doesn’t distinguish between important and unimportant mistakes. For example, swapping out “glass” for “mug” is fine, but swapping “non-flammable” for “inflammable” is very incorrect. During training, there is no difference between the two (see the sketch below).
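
A minimal sketch of what that objective looks like in practice (PyTorch written for this summary, not code from the paper): a plain cross-entropy loss over the vocabulary, which charges every wrong token the same amount regardless of how badly it changes the meaning.

```python
import torch.nn.functional as F

def next_token_loss(logits, tokens):
    """Standard language-modeling objective: predict token t+1 from tokens up to t.

    logits: (batch, seq_len, vocab_size) model outputs
    tokens: (batch, seq_len) integer token ids
    """
    pred = logits[:, :-1, :]   # position t predicts the token at position t+1
    target = tokens[:, 1:]     # the tokens that actually follow

    # Cross-entropy treats every mistake identically: confusing "glass" with
    # "mug" costs the same as confusing "non-flammable" with "inflammable".
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),
        target.reshape(-1),
    )
```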

The Problem: Together, these issues make models less aligned: they hallucinate, ignore instructions, and produce harmful or toxic content.

The Solution: To align the model with human intent, we must get humans involved in the process. The authors fine-tune GPT-3 using “Reinforcement Learning from Human Feedback” (RLHF), which improves model performance.

The Technique: Recruit a lot of data labelers. Show them several model responses to the same prompt and have them pick (or rank) the best one. Set up a reward function from these preferences so the model learns to answer more like the chosen responses and less like the discarded ones (sketched below).
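
A minimal sketch of the kind of preference loss this sets up (illustrative PyTorch, not the paper’s code; reward_chosen and reward_rejected are assumed to be scalar scores from a reward model): the reward model is trained so the labeler-preferred answer scores higher than the discarded one.

```python
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss for training a reward model.

    reward_chosen, reward_rejected: (batch,) scalar rewards assigned to the
    labeler-preferred and labeler-discarded responses for the same prompt.
    """
    # -log sigmoid(r_chosen - r_rejected) is minimized when the reward model
    # consistently ranks the preferred answer above the discarded one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```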

As the paper’s pipeline diagram shows, Step 1 also involves supervised learning: the model is first fine-tuned on demonstration answers written by the labelers, before any reinforcement learning happens.
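
Putting the pieces together, here is a rough sketch of the shaped reward used in the RL stage (a simplification of the paper’s PPO setup; the function and variable names are mine): the reward model’s score, minus a KL-style penalty that keeps the RL policy from drifting too far from the supervised fine-tuned model.

```python
def rl_reward(reward_score, logprob_policy, logprob_sft, beta=0.02):
    """Shaped reward for the reinforcement-learning (PPO) stage.

    reward_score:   scalar score from the trained reward model
    logprob_policy: log-probability of the sampled response under the RL policy
    logprob_sft:    log-probability of the same response under the SFT model
    beta:           KL penalty coefficient (illustrative value)
    """
    # Penalizing responses that the RL policy likes far more than the SFT model
    # discourages the policy from drifting into reward-hacking outputs.
    kl_penalty = logprob_policy - logprob_sft
    return reward_score - beta * kl_penalty
```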

Evaluation

The results, measured by human “win rate” (how often labelers prefer one model’s outputs over another’s), are staggering. A model that is 100x smaller (1.3B parameters) but trained using RLHF outperforms the standard 175B-parameter GPT-3.

Breaking the results down further: hallucinations are markedly lower, explicit constraints listed in the prompt are respected more often, and the model follows instructions more reliably.

The paper also claims that the model generalizes better to tasks under-represented in the fine-tuning data, like answering in French or programming, though it cites only examples rather than an in-depth analysis.

Criticism and Future Work

  1. This methodology is more scalable than standard supervised learning, but it still requires a great deal of human effort from labelers. That is expensive, and it becomes infeasible as the size and capabilities of our models grow; this method of training lives and dies by the data collected from the labelers.

  2. Models that are good at following instructions are also good at following bad instructions. If the premise of a question is false (ask “What are the benefits of eating socks after showering?”), the model will attempt to answer the absurd question anyway. It also shows more toxic behavior than previous models on biased prompts.

However, the results are certainly impressive, and they carry interesting implications for companies trying to improve their models!
