✍️
The Daily Ink

[2/8/23] The secret to good writing is editing

Lessons from InCoder: A Generative Model for Code Infilling and Synthesis



It seems like there’s a new model coming out every other day now. If you’ve ever watched a large language model answer a question, it does so left to right, autoregressively spitting out one token at a time. But writing something well isn’t just generation — it’s also editing.

This is especially true in programming, where tasks like adding comments, fixing bugs, or renaming variables necessitate editing. Moreover, programs are often not written top-down but have complex dependencies. Can a language model ever learn to capture this?

Introduction and Motivation

Enter InCoder.

The Core Idea: During training, the model randomly replaces a contiguous span of code with a special mask token and moves that span to the end of the sequence. The model is then trained to predict the code in this rearranged order. At edit time, a portion of the code is replaced with the mask token, and the model generates new code to fill the void. This lets the model make changes to a program without having to start over from scratch.

This clever approach casts editing as a next-token generation problem. And preliminary results look quite good.
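
The training-time transformation above can be sketched in a few lines of Python. This is a toy illustration, not the paper's actual preprocessing code, and the sentinel token names (`<MASK:0>`, `<EOM>`) are assumptions standing in for InCoder's real special tokens:

```python
import random

MASK = "<MASK:0>"   # mask sentinel (name is illustrative)
EOM = "<EOM>"       # end-of-mask sentinel (name is illustrative)

def causal_mask(tokens, rng=random):
    """Cut out a random contiguous span, replace it in place with a
    mask sentinel, and append the span (marked and terminated by the
    end-of-mask sentinel) to the end, so a left-to-right model can be
    trained to regenerate the missing span."""
    i = rng.randrange(len(tokens))
    j = rng.randrange(i + 1, len(tokens) + 1)
    span = tokens[i:j]
    return tokens[:i] + [MASK] + tokens[j:] + [MASK] + span + [EOM]
```

Because the masked span is merely relocated rather than deleted, splicing the trailing span back into the mask position recovers the original program exactly — which is what makes this a pure next-token-prediction setup.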

The model uses a “causal masked objective”, which allows it to conditionally mask tokens. InCoder-6.7B was trained on 159GB of code, using 248 V100 GPUs for 24 days.
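
At inference time, infilling reuses the same format: insert the mask where the edit goes, let the model continue left to right, and splice the continuation back in. A minimal sketch, where `generate` is a hypothetical stand-in for the actual LM sampling call and the sentinel names are again assumptions:

```python
MASK = "<MASK:0>"   # illustrative sentinel names, as before
EOM = "<EOM>"

def infill(generate, prefix, suffix):
    """Fill the gap between prefix and suffix: build a prompt in the
    causal-mask format, have a left-to-right model continue it, cut
    the continuation at the end-of-mask sentinel, and splice the
    generated span back into the gap."""
    prompt = prefix + [MASK] + suffix + [MASK]
    completion = generate(prompt)
    if EOM in completion:
        completion = completion[:completion.index(EOM)]
    return prefix + completion + suffix
```

For example, with a toy "model" that always emits `a + b <EOM>`, `infill(fake_lm, ["return"], [])` yields `["return", "a", "+", "b"]` — an ordinary left-to-right model editing the middle of a program.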

Evaluation

The paper does some interesting ablation studies and demonstrates the following:

  • A causal masked objective often does better than a purely causal objective or purely masked generation.

    • I must emphasize — often, but not always. The paper does not posit a theory for why left-to-right reranking sometimes does better, leaving this to future work.

In the appendix, we see comparisons to directly comparable models like Codex and CodeBERT. As the paper itself notes, the comparison could be unfair, since Codex might contain CodeSearchNet (the test set used here) in its training set, causing data leakage.

Limitations and Future Work

  • First, the paper compares itself to Codex only in the appendix, and comes up short, citing that Codex could have the test set in its training data. I have two problems with this:

    • CodeBERT, lacking a similar problem, outperforms InCoder in certain languages, demonstrating better generalization.

    • If CodeSearchNet was unsuitable as a fair test set, why not use another one?

  • The paper compares the model to itself quite a bit and demonstrates many capabilities — variable name generation, return type generation, docstring generation, code infilling, and so on. I would have appreciated more comparisons against SOTA models on these tasks, to properly evaluate the techniques.

  • An interesting observation from the model's data collection process — the model did much better when trained on data that included StackOverflow. We could theorize that the explanations alongside the code allowed it to draw better correlations between natural language and programming. I would love to see this explored further.

In Summary — moving negative evaluations of your model to the appendix feels so common I should probably make it its own section in this newsletter (“Appendix Secrets”, perhaps?). I particularly dislike this trend in CS, because it teaches researchers to hide their flaws, and in my opinion, goes against the spirit of scientific inquiry.

However, I don’t mean to be cynical or belittle the accomplishments of the paper. Infilling is a very important next step, and InCoder is a step in the right direction in thinking about how to evolve language models beyond scale.

👨‍💻