[1/16/23] DeepMind attempts to make AI that can do anything

A Generalist Agent

Introduction and Motivation

The future of AI is widely believed to be multi-modal: instead of acting on only text, or only images, or only sounds, future AI would interact with all of these in just one model. DeepMind tries to achieve that with Gato, “a generalist agent”.

Motivation: Given enough data, larger and more general models tend to outperform smaller, task-specific ones. Can there be a generalist agent that is capable of many tasks, and that can be adapted with only a little data to be capable of an even larger number of tasks?

Development Details

Core guiding principle: train Gato on the widest variety of data possible. Convert all data into a flat sequence of tokens and train on it, much like an LLM.
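To make the idea concrete, here is a minimal sketch of that flat-sequence serialization, not the paper's actual tokenizer. The 1024-bin discretization and mu-law companding follow the paper's description; the `CONTINUOUS_OFFSET` value, the toy `vocab`, and all function names are illustrative assumptions.

```python
import numpy as np

NUM_BINS = 1024            # discrete bins for continuous values (per the paper)
CONTINUOUS_OFFSET = 32000  # assumed offset so value tokens don't collide with text tokens

def mu_law(x, mu=100.0, m=256.0):
    # Mu-law companding squashes large-magnitude values before binning.
    return np.sign(x) * np.log(np.abs(x) * mu + 1.0) / np.log(m * mu + 1.0)

def tokenize_continuous(values):
    # Clip to [-1, 1], then map uniformly onto NUM_BINS discrete tokens.
    squashed = np.clip(mu_law(np.asarray(values, dtype=np.float64)), -1.0, 1.0)
    bins = ((squashed + 1.0) / 2.0 * (NUM_BINS - 1)).astype(int)
    return (bins + CONTINUOUS_OFFSET).tolist()

def tokenize_text(text, vocab):
    # Stand-in for a real subword tokenizer (the paper uses SentencePiece).
    return [vocab[word] for word in text.split()]

# One training example: a text instruction, an observation, and an action,
# flattened into a single sequence a decoder-only transformer can model.
vocab = {"pick": 1, "up": 2, "the": 3, "block": 4}
sequence = (
    tokenize_text("pick up the block", vocab)
    + tokenize_continuous([0.12, -0.53, 0.99])  # joint angles (observation)
    + tokenize_continuous([0.05, 0.40])         # torques (action)
)
print(sequence)
```

Once every modality lives in one token stream, training reduces to standard next-token prediction over that stream.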

One of the goals is to maximize “out-of-distribution transfer”: the ability to learn tasks outside the training distribution more quickly.

The agent is ultimately used as a control policy: it accepts a “fixed prompt” (unchanging environment data) plus the latest “observations”, predicts an action, and that action is used to update the environment and produce the next observation.
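Here is a hedged sketch of that loop. `DummyEnv`, `DummyModel`, `tokenize`, and `detokenize` are hypothetical stand-ins rather than the paper's API; the point is the loop structure, where the fixed prompt stays constant while observations and predicted actions accumulate behind it.

```python
import random

class DummyEnv:
    """Hypothetical environment with a trivial observation/action interface."""
    def reset(self):
        return [0.0]
    def step(self, action):
        obs, reward, done = [random.random()], 0.0, random.random() < 0.1
        return obs, reward, done

class DummyModel:
    """Hypothetical stand-in for the trained sequence model."""
    def predict_action_tokens(self, token_history):
        return [random.randrange(1024)]  # one discretized action token

def tokenize(obs):
    return [int((x + 1.0) / 2.0 * 1023) for x in obs]  # crude discretization

def detokenize(tokens):
    return [t / 1023.0 * 2.0 - 1.0 for t in tokens]

def run_episode(model, env, fixed_prompt_tokens, max_steps=100):
    history = list(fixed_prompt_tokens)  # unchanging task context
    obs = env.reset()
    for _ in range(max_steps):
        history += tokenize(obs)         # append the latest observation
        action_tokens = model.predict_action_tokens(history)
        history += action_tokens         # actions feed back into the context
        obs, _, done = env.step(detokenize(action_tokens))
        if done:
            break
    return history

run_episode(DummyModel(), DummyEnv(), fixed_prompt_tokens=[1, 2, 3])
```

Because the whole history lives in the model's context window, long episodes run into the 1024-token limit discussed under limitations below.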

The data distribution is heavily skewed toward 2D and 3D control environments; only about 15% of the data is text and images.

Evaluation

The biggest complaint levied against Gato is that its results feel somewhat subpar compared to the latest advances in each of the specific fields it attempts to generalize over.

Gato’s attempts to caption images, shown in the paper, are a good example: they are less than stellar.

But Gato is quite impressive on robotic tasks. The paper’s statistics show that Gato performs roughly 50% as well as expert demonstrations.

Also note that Gato is significantly larger than the baseline (BC-IMP) in terms of both parameters and training data, which makes this a somewhat unfair comparison.

The hope with Gato was that there would be positive transfer: seeing many different tasks would make learning any specific task easier. The data suggests this isn’t the result.

Instead, some of the data suggests that there could be negative transfer, with generalizing making it harder to perform specific tasks. Note that in the paper’s comparisons, “all data” sometimes does worse than “same domain only data”.

Limitations and Future Work

  • Gato does not yet output image tokens or non-textual observations, but there is no reason it could not do so.

  • Gato learns using imitation learning, and thus relies heavily on high-quality expert demonstration data for the vast majority of its tasks.

  • The model’s context is currently limited to 1024 tokens, which is often a bottleneck when the model is used as an RL agent.

  • In many of the comparisons to baselines, the size of the baseline model is not mentioned, which could make those comparisons unfair.

  • The paper also does not explicitly highlight the model’s potential “negative transfer”; the data suggesting such negative effects can only be found in the appendix of the paper.

While I appreciate the motivation of the paper, I think it makes the crucial error of brushing over weaknesses to highlight its strengths (something very common in CS research). I am not the biggest fan of such an approach.
