Are we on the brink of debunking Moravec’s Paradox? In the 1980s, AI and robotics researcher Hans Moravec highlighted a counterintuitive aspect of AI: tasks requiring high-level reasoning, like chess or Go, are easier for AI to master than basic sensory and motor skills, such as walking or recognizing your mom’s face, which humans find instinctive. To deepen the irony, these “simpler” skills actually demand far more computational power. The insight helps explain why replicating human-like perception and dexterity, the product of millions of years of evolution, is so much harder than replicating logical reasoning, a far more recent development. In today’s AI and ML landscape, the paradox underscores the challenge of building robots and AI systems that can seamlessly navigate and interact with the physical world.

However, last week, Bernt Bornich, CEO and founder of 1X, a humanoid robotics company, wrote: “New progress update on the droids dropping in 4 weeks, looks like Moravec’s paradox might be debunked, and we just didn’t have the data.” I suspect this has something to do with advances in foundation models. Originally known for performing a wide range of tasks from a single type of data (like text for language models), these models become “multimodal” when they integrate and interpret information across different sensory inputs, more closely mirroring human understanding.
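To make “multimodal” concrete, here is a minimal sketch in PyTorch of the core idea: every input type, whether text tokens, camera frames, or joint angles, is projected into one shared embedding space so a single transformer can attend across all of them. The architecture, dimensions, and names (MultimodalBackbone, proprio_dim, and so on) are my illustrative assumptions, not 1X’s actual system.

```python
import torch
import torch.nn as nn

class MultimodalBackbone(nn.Module):
    def __init__(self, d_model=256, vocab_size=32000, image_dim=768, proprio_dim=14):
        super().__init__()
        # Each modality gets its own adapter into the shared embedding space.
        self.text_embed = nn.Embedding(vocab_size, d_model)   # language tokens
        self.image_proj = nn.Linear(image_dim, d_model)       # vision features
        self.proprio_proj = nn.Linear(proprio_dim, d_model)   # joint angles, IMU, etc.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, text_ids, image_feats, proprio):
        # Concatenate along the sequence axis: the transformer sees one
        # mixed sequence and can attend from words to pixels to joints.
        tokens = torch.cat([
            self.text_embed(text_ids),
            self.image_proj(image_feats),
            self.proprio_proj(proprio),
        ], dim=1)
        return self.trunk(tokens)

model = MultimodalBackbone()
fused = model(
    torch.randint(0, 32000, (1, 16)),  # 16 text tokens
    torch.randn(1, 8, 768),            # 8 image-patch features
    torch.randn(1, 4, 14),             # 4 timesteps of 14 joint readings
)
print(fused.shape)  # torch.Size([1, 28, 256])
```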

Could the embodiment of AI, with all its sensory inputs, plus reasoning-imitation algorithms like LLMs, be the pool of data that disproves Moravec’s paradox?

Another intriguing development caught my attention. Jensen Huang, Nvidia’s CEO, responded to a question from Wired about what current development could change everything. Huang replied: “There are a couple of things. One doesn’t really have a name, but it’s part of the work we’re doing in foundational robotics. If you can generate text and images, can you also generate motion? The answer is probably yes. And if you can generate motion, you can understand the intent and generate a generalized version of articulation. Therefore, humanoid robotics should be right around the corner.”
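As a hedged illustration of what “generating motion” could mean in practice (my sketch, not Nvidia’s method): quantize joint angles into discrete bins so a trajectory becomes a token sequence, then train a transformer with the same next-token objective used for text. The bin count, dimensions, and names here are assumptions.

```python
import torch
import torch.nn as nn

NUM_BINS = 256  # each joint angle quantized into 256 discrete levels

def quantize(angles, low=-3.14, high=3.14):
    """Map continuous joint angles to integer tokens in [0, NUM_BINS)."""
    scaled = (angles - low) / (high - low)
    return (scaled.clamp(0, 1) * (NUM_BINS - 1)).long()

class MotionLM(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(NUM_BINS, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, NUM_BINS)  # logits over the next motion token

    def forward(self, motion_tokens):
        x = self.embed(motion_tokens)
        # Causal mask: each timestep may only attend to the past,
        # exactly as in a text language model.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        return self.head(self.trunk(x, mask=mask))

# A 32-step trajectory for one joint becomes a 32-token "sentence".
angles = torch.randn(1, 32).clamp(-3.14, 3.14)
tokens = quantize(angles)
logits = MotionLM()(tokens)  # (1, 32, NUM_BINS)
# Standard next-token loss: predict token t+1 from tokens up to t.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, NUM_BINS), tokens[:, 1:].reshape(-1)
)
```

The appealing part of this framing is that the training objective stays the same as for text; only the tokenizer changes, which would make robot data the missing ingredient rather than new algorithms.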

Something to observe in the coming weeks!

In related news from the robotics universe, Figure AI, a humanoid robotics startup, made headlines by raising approximately $675 million in funding. What’s more impressive is the list of backers: Amazon, NVIDIA, Microsoft, OpenAI, Intel, LG, and Samsung. This indicates a strong belief in the potential of humanoid robotics to disrupt various sectors.

Yet, there are skeptical voices. Rodney Brooks, who coined the term Nouvelle AI, posted last week: “Tele-op robots presented as autonomous, like the Tesla Optimus humanoid folding a shirt, and 1X humanoid robots, are misrepresentations of what robots are actually doing, which can also be called LIES. Note that the Stanford robot cooking and cleaning videos are also tele-operated.”

If 2023 was the year of LLMs, are we ready to move on to embodied AI and make 2024 the year of robots?


🎁 Bonus: The freshest research papers from the week of Feb 19 — Feb 25

- Enhancing Large Language Models (LLMs)
- Multimodal and Multi-Agent Systems
- Advancements in Specific Domains
- Developer Tools and APIs
- Security and Adversarial Research
- Model Efficiency and Quantization
- Instruction Tuning and Data Quality

