ChaoBro

Kaiming He's Team Releases ELF: Diffusion Language Models in Continuous Embedding Space

Diffusion models have firmly established themselves in image and video generation, but have struggled in language modeling.

The reason is straightforward: image and video data is naturally continuous, while language is discrete. Text is a sequence of tokens, and a token is either there or it isn't; there's no "half a token." Previous diffusion language models mostly operated in discrete token space, and their results lagged behind.

Kaiming He's team at MIT has submitted a paper that takes a direct approach: if discrete space doesn't work, don't work in discrete space.

ELF's Core Idea

The paper is called ELF (Embedded Language Flows), and its core recipe has three steps:

  1. Map text into continuous embedding space
  2. Run continuous-time Flow Matching in embedding space
  3. Only at the final step, map back to discrete tokens

This "staying in continuous space until the last step" is key. Previous methods moved back and forth between continuous and discrete representations during diffusion, like constantly switching between land and water. ELF stays in continuous space and only discretizes at output.
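To make the three steps concrete, here is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption for illustration: the four-word vocabulary, the 2-D embedding table, the `toy_velocity` function (an oracle that stands in for a trained network and is allowed to see the target), the plain Euler sampler, and nearest-embedding decoding. The point is only the shape of the pipeline: the sampler never touches tokens, and discretization happens once, at the end.

```python
import random

# Hypothetical toy vocabulary and 2-D embedding table (illustrative values only).
VOCAB = ["the", "cat", "sat", "down"]
EMBED = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [-1.0, 0.0],
    "down": [0.0, -1.0],
}

def embed(tokens):
    """Step 1: map discrete tokens to continuous vectors."""
    return [list(EMBED[tok]) for tok in tokens]

def toy_velocity(x, t, target):
    """Stand-in for a trained velocity network (assumption, not ELF's model).

    On the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries x_t to x_1 is (x_1 - x_t) / (1 - t). A real model would
    have to predict this without being handed `target`.
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]

def sample(targets, num_steps, rng):
    """Step 2: Euler-integrate from Gaussian noise (t=0) toward data (t=1)."""
    xs = [[rng.gauss(0.0, 1.0) for _ in vec] for vec in targets]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # always < 1, so the division in toy_velocity is safe
        for x, target in zip(xs, targets):
            v = toy_velocity(x, t, target)
            for i in range(len(x)):
                x[i] += dt * v[i]
    return xs

def decode(vectors):
    """Step 3: discretize once, by nearest embedding (squared distance)."""
    def nearest(vec):
        return min(
            VOCAB,
            key=lambda tok: sum((a - b) ** 2 for a, b in zip(vec, EMBED[tok])),
        )
    return [nearest(vec) for vec in vectors]

tokens = ["the", "cat", "sat"]
generated = decode(sample(embed(tokens), num_steps=8, rng=random.Random(0)))
print(generated)
```

Because the oracle velocity is exact, this toy run recovers the input tokens. A trained model would instead be conditioned on a prompt or nothing at all, and the decoding step in the actual paper may differ from nearest-neighbor lookup.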

The approach has another advantage: mature techniques from image diffusion models, such as classifier-free guidance (CFG), can be ported over directly. Doing CFG in discrete space requires various workarounds; in continuous embedding space, almost no modifications are needed.
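As a rough illustration of why CFG carries over so easily, the sketch below applies the usual guidance formula to two velocity vectors. It assumes one common convention (scale 0 gives the unconditional prediction, scale 1 the conditional one, larger values extrapolate); the function name and the numbers are hypothetical, and the paper may parameterize guidance differently.

```python
def cfg_velocity(v_cond, v_uncond, guidance_scale):
    """Blend conditional and unconditional velocity predictions.

    guided = v_uncond + guidance_scale * (v_cond - v_uncond)
    Both inputs are plain vectors in embedding space, so the formula is
    the same arithmetic used for image diffusion models.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]

# Hypothetical predictions for a single token position.
v_cond = [1.0, 2.0]
v_uncond = [0.0, 1.0]
print(cfg_velocity(v_cond, v_uncond, 2.0))
```

In a sampler, the guided velocity would simply replace the raw network output at each integration step. Nothing here depends on tokens, which is the point the article is making.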

Experimental Results

The paper's headline claim is straightforward: ELF substantially outperforms existing discrete and continuous diffusion language models (DLMs) in generation quality, while using fewer sampling steps.

Fewer sampling steps mean lower inference cost. Diffusion models have long been criticized for slow inference, and because every sampling step requires a forward pass through the network, cutting steps is a direct speedup.
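A back-of-the-envelope sketch of that cost argument follows. It assumes a simple Euler-style sampler with one network evaluation per step, doubled when CFG runs separate conditional and unconditional passes. The step counts are made-up placeholders, not figures from the paper.

```python
def forward_passes(num_steps, use_cfg=False):
    """Network evaluations for an Euler-style sampler.

    One evaluation per step; two per step when CFG evaluates the
    conditional and unconditional branches separately.
    """
    return num_steps * (2 if use_cfg else 1)

baseline = forward_passes(64)  # hypothetical step budget
reduced = forward_passes(16)   # hypothetical step budget
print(baseline / reduced)
```

Under these assumptions, cost scales linearly with the number of steps, so a sampler that needs a quarter of the steps needs a quarter of the network evaluations.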

Author Lineup

The authors are Keya Hu, Linlu Qiu, Yiyang Lu, Hanhong Zhao, Tianhong Li, Yoon Kim, Jacob Andreas, and Kaiming He. They come from Meta FAIR, MIT, and other institutions.

Since moving from computer vision (CV) to general AI research, He's team has been exploring non-autoregressive approaches to language modeling. ELF is another attempt in this direction.

