Research Paper Overview
How do we get machines to learn and act more like humans and animals, particularly in their ability to reason and plan?
Here’s our review of Yann LeCun’s 2022 technical paper A Path Towards Autonomous Machine Intelligence (LeCun, 2022).

Why is this paper important?
It’s quite rare for a single-author paper to grab so much attention. Yann LeCun is:
- Meta’s Chief AI Scientist
- a Turing Award winner
Meta has continued to develop this idea and has since published:
- I-JEPA, a foundation model for various kinds of image tasks (Assran et al., 2023)
- V-JEPA, a foundation model for video tasks (Bardes et al., 2024)
Despite all the hype around ChatGPT and large language models, today’s AI still falls far short of human-like learning.
Not because it doesn’t work—it clearly does. But compared to how a human child learns about the world, our most advanced AI systems are embarrassingly inefficient. A teenager learns to drive in 20 hours. We’ve poured billions into self-driving cars for over a decade. A 10-year-old can clear a dinner table and load a dishwasher. No robot can reliably do this. An 8-month-old discovers gravity by repeatedly dropping toys from a high chair. GPT-4 has processed more text than a human could read in 22,000 years, yet it doesn’t truly understand that objects fall.
This is Moravec’s Paradox in action.

Computers excel at what humans find hard (chess, calculus, generating fluent text) but fail spectacularly at what we find trivially easy (understanding that objects persist when hidden, navigating a cluttered room, having common sense).
The Core Problem: Why LLMs Can Never Truly Understand
Large Language Models like ChatGPT operate through what LeCun calls “auto-regressive generation”, predicting one token at a time with a fixed amount of computation per token. This creates several critical limitations:
- Exponential Divergence Problem
- Imagine all possible text sequences as a tree.
- Each token you generate has a small probability ε of being wrong.
- The probability of generating n correct tokens in a row is (1-ε)^n, an exponentially decaying function
- You can make ε smaller with more data and larger models, but you can’t fix the exponential decay (a small numerical illustration follows this list)
- No World Model
- LLMs are “purely trained on text” with “no knowledge of the underlying reality.”
- When asked if a vector multiplied by a positive semi-definite matrix can rotate more than 90 degrees, a human visualizes the transformation.
- An LLM has no such mental model—it can only pattern-match against text it’s seen.
- System 1 Thinking
- Using Kahneman’s framework, current AI is stuck in System 1—reactive, immediate responses.
- There’s no System 2—no deliberation, no planning, no reasoning through multiple possibilities.
- LLMs can’t pause to think harder about difficult problems; they use the same computation for easy and hard questions alike.
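
To make the exponential divergence argument concrete, here is a small numerical illustration of the (1-ε)^n formula. The error rates and sequence lengths are made-up values, and treating per-token errors as independent is the simplifying assumption behind the back-of-the-envelope argument, not a property of any particular model.

```python
# Illustrative only: the epsilon values and sequence lengths below are made up,
# and per-token errors are assumed independent, as in the back-of-the-envelope argument.
def p_all_correct(epsilon: float, n_tokens: int) -> float:
    """Probability that every one of n_tokens tokens is correct,
    given an independent per-token error rate epsilon."""
    return (1 - epsilon) ** n_tokens

for epsilon in (0.01, 0.001):
    for n in (100, 1_000, 10_000):
        print(f"epsilon={epsilon}, n={n}: P(all correct) = {p_all_correct(epsilon, n):.3g}")

# Roughly: epsilon=0.01 gives ~0.37 at n=100 but ~4e-05 at n=1,000;
# shrinking epsilon tenfold (to 0.001) only pushes the collapse out to n=10,000.
# A smaller epsilon delays the decay; it never removes it.
```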
JEPA: The Architecture
JEPA (Joint Embedding Predictive Architecture) represents a different way to think about machine intelligence.
Rather than generating outputs directly, JEPA:
- Encodes observations into abstract representations
- Predicts future representations in this latent space
- Uses energy functions instead of probabilities
- Employs regularization instead of contrastive learning
Given two compatible inputs x and y, for example two consecutive video frames or two parts of the same image:
- Encoder_x transforms x into a latent representation s_x
- Encoder_y transforms y into a latent representation s_y
- A Predictor uses s_x (and optionally a latent variable z) to predict ŝ_y
- An Energy function measures the compatibility between s_y and ŝ_y (a minimal sketch of this forward pass follows the list)
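
To make the four components above concrete, here is a minimal, illustrative sketch in PyTorch. The toy dimensions, the small MLP encoders, and the squared-distance energy are placeholder choices of ours; the paper describes the architecture abstractly and does not prescribe these particulars.

```python
import torch
import torch.nn as nn

class ToyJEPA(nn.Module):
    """Illustrative JEPA-style forward pass: encode x and y, predict s_y from s_x
    (plus a latent variable z), and score the pair with an energy function.
    All sizes and sub-networks here are arbitrary toy choices."""
    def __init__(self, x_dim=64, y_dim=64, repr_dim=16, z_dim=4):
        super().__init__()
        self.encoder_x = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, repr_dim))
        self.encoder_y = nn.Sequential(nn.Linear(y_dim, 32), nn.ReLU(), nn.Linear(32, repr_dim))
        self.predictor = nn.Sequential(nn.Linear(repr_dim + z_dim, 32), nn.ReLU(), nn.Linear(32, repr_dim))

    def forward(self, x, y, z):
        s_x = self.encoder_x(x)                                # abstract representation of x
        s_y = self.encoder_y(y)                                # abstract representation of y
        s_y_hat = self.predictor(torch.cat([s_x, z], dim=-1))  # predicted representation of y
        energy = ((s_y - s_y_hat) ** 2).sum(dim=-1)            # low energy = compatible (x, y) pair
        return energy

model = ToyJEPA()
x, y = torch.randn(8, 64), torch.randn(8, 64)   # e.g. two consecutive (flattened) video frames
z = torch.randn(8, 4)                           # latent variable covering what x alone cannot tell us about y
print(model(x, y, z).shape)                     # one energy value per (x, y) pair: torch.Size([8])
```

The point to notice is that the energy is computed between the representations s_y and ŝ_y, never between raw inputs.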

Let’s step back for a minute to understand how this is different. There are two fundamental ways a machine can learn to predict something:
- Prediction in Pixel Space (The Generative Approach)
- This is the more intuitive but deeply flawed method.
- Imagine you want to train an AI to understand videos.
- You show it the first 10 frames of a clip and ask it to generate the 11th frame, pixel by pixel.
- The real world is not perfectly predictable, so a generative model forced to predict the exact pixels of the next frame has two major problems:
- Blurry Predictions: Since it can’t know the single correct future, its best strategy is to predict the average of all possible futures (a tiny numerical illustration follows this list)
- Wasted Effort on Irrelevant Details: The model must predict every single detail, even the useless ones
- Prediction in Representation Space (The JEPA Approach)
- This approach argues that predicting every pixel is unnecessary and counterproductive
- The goal is instead to predict an abstract summary, or representation, of the future
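
To see the “blurry predictions” problem in numbers, here is a tiny hypothetical example. The two “futures” are made-up 1-D arrays standing in for frames where a bright pixel moves either left or right; under a mean-squared-error objective the single best prediction is their average, which matches neither future.

```python
import numpy as np

# Two equally likely futures: a bright pixel ends up on the left or on the right.
future_left  = np.array([1.0, 0.0, 0.0, 0.0])
future_right = np.array([0.0, 0.0, 0.0, 1.0])

def expected_mse(prediction):
    """Expected squared error when each future occurs with probability 0.5."""
    return 0.5 * np.sum((prediction - future_left) ** 2) + \
           0.5 * np.sum((prediction - future_right) ** 2)

blurry_average = (future_left + future_right) / 2   # [0.5, 0, 0, 0.5]: matches neither future
print(expected_mse(future_left))      # 1.0 -> committing to one sharp future is penalized
print(expected_mse(blurry_average))   # 0.5 -> the blurry average "wins" under MSE
```

So a pixel-level generative model trained with this kind of loss is actively rewarded for hedging toward blur.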
Encoder: This is a neural network that takes an input (like a video frame x) and “encodes” it into a compact, abstract representation (s_x). This representation is just a list of numbers (a vector) that captures the essential information while discarding the irrelevant noise.
Predictor: This module takes the representation of the current state (s_x) and predicts the representation of the future state (ŝ_y).
The Goal: The system is trained to make its prediction ŝ_y as close as possible to the actual representation of the future frame (s_y), which is generated by passing the real future frame (y) through the same encoder.
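
Putting the three pieces together, here is a minimal sketch of one training step under some explicit assumptions: a single shared linear encoder, a linear predictor, and squared error between predicted and actual representations. The stop-gradient on the target branch is one common trick, used here for brevity, to discourage the trivial collapse where every input maps to the same representation; the paper itself argues for regularized, non-contrastive objectives, which are omitted from this sketch.

```python
import torch
import torch.nn as nn

# Toy setup: dimensions, data, and the linear encoder/predictor are illustrative choices.
encoder   = nn.Linear(64, 16)   # shared encoder for the current frame x and the future frame y
predictor = nn.Linear(16, 16)   # predicts the representation of y from the representation of x
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3)

x = torch.randn(8, 64)          # batch of "current" frames (flattened, random stand-in data)
y = torch.randn(8, 64)          # batch of "future" frames

s_x = encoder(x)
with torch.no_grad():           # no gradient through the target branch (one way to avoid collapse)
    s_y = encoder(y)

loss = ((predictor(s_x) - s_y) ** 2).mean()   # pull the predicted representation toward the actual one
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(float(loss))
```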
Presentation Notes by the Author