Recently, Lex Fridman released a five-hour podcast with Dario Amodei, Amanda Askell, and Chris Olah of Anthropic.

https://www.youtube.com/watch?v=ugvHCXCOmm4&t=9578s&embedable=true

After publishing his lengthy article on the vision for AI development, “Machines of Loving Grace”, Dario has continued to expand on it. It was also the main focus of his conversation with Lex, and the other Anthropic members elaborated on the same topics.

For those less nerdy than me, I thought it’d be nice to summarise the key ideas this leading AI team had to share. Since the release of Claude 3.5 Sonnet (New) this autumn, it is clear that Anthropic’s progress in LLMs is on par with what OpenAI has achieved with its o1-preview model. They are amongst the leaders in this AI race, which gives them good authority on the topic.

Apart from restating what the Anthropic team said, I’d also like to fantasize about what each point implies for practical AI applications on two timelines: one year from now and five years from now. I expect my predictions to be wrong (there are simply too many factors at play), but I think it’s a fun bit of mental gymnastics, and it will be interesting to revisit this text once we actually reach that future.

  1. We don’t know when the scale-up effect will plateau, but it is NOT YET

    One of the key takeaways for me was Dario’s perspective on the future effects of continuing down the scaling hypothesis path (the idea that training on more and better data with more computation will make models correspondingly smarter). Dario seems to hint that simply applying all of the old techniques to more data may no longer be very effective at producing a significant boost in AI capability. The main focus of AI labs right now is to understand which part to scale.

    Some of the promising avenues, in his view, are synthetic data generation (applying the AlphaGo approach of trial-and-error training to complex tasks) and adding more guardrailed data, i.e. giving the models examples of good and bad answers in specific domains so they understand the general rules and apply them a bit better.

    • 2025 - the AlphaGo-style self-training approach will become more common, and models will surpass human ability in additional complex tasks that have a near-immediate feedback loop (maybe trading).

    • 2030 - the AlphaGo-style self-training approach could be generalized so that models self-improve on difficult practical tasks when given enough time to practise them.

  2. The approach to AI safety will develop alongside model development

    Autonomy and misuse pose the biggest risks.

    Dario says his team tests for both risks every time it trains a new model, so that safeguards can be put in place before release. Anthropic grades models by AI Safety Level (ASL):

    • ASL-1 (e.g. a chess bot) - does not pose any real risk.

    • ASL-2 (current AI models) - does not provide much risky information beyond what can simply be googled.

    • ASL-3 (can increase the capabilities of wrongdoers) - the potential of these systems to enable cyber, nuclear, or biological weapons will have to be seriously nerfed before such models can be released.

    • ASL-4+ (smarter than ASL-3 and autonomous) - it is not yet clear how these will be controlled; they will only be de-risked once a trained model shows signs of reaching this level.

Conclusion

Regardless of the predictions, it is going to be fun to observe the next phase of AI. And if no revolution actually happens in five years, it will at least be refreshing to reread this article as I finally choose to cancel my then-$300/month OpenAI subscription.