Top 5 AI resources for large language models
Since the arrival of ChatGPT in our lives six months ago, Artificial Intelligence seems to be at an inflection point, with announcements, demos, new startups, and new technological advances arriving daily. If you are like me, you are struggling to keep up with this firehose of information and new technologies, and you wonder what the impact is going to be on your day-to-day life, not only in your job and your company but in your life in general.
Some folks refer to what is happening as a "Cambrian Explosion", after a period more than 500 million years ago when life on Earth changed dramatically. To quote Wikipedia:
...As the rate of diversification subsequently accelerated, the variety of life became much more complex, and began to resemble that of today...
The following diagram certainly makes it look like it (source: [Mooler0410's LLMs Practical Guide](https://github.com/Mooler0410/LLMsPracticalGuide) on GitHub). The virtual explosion in the number of new Large Language Models (LLMs) feels like an AI Big Bang, much more so than Watson playing Jeopardy! in 2011, Deep Blue beating Kasparov in 1997, or AlphaGo beating Lee Sedol in 2016.
What we are seeing right now in AI, and its implications for our world, cannot be dismissed as a one-off. Instead of writing yet another blog post sharing some of the experiments we are doing at TriggerMesh to understand how this AI "moment" is going to help us or disrupt us, I will share five resources that I found particularly useful over the last couple of months.
AI Through a Cloud Lens
First is an article by Mitchell Hashimoto, the co-founder of HashiCorp, called AI Through a Cloud Lens. In it, Hashimoto describes the recent advancements in AI as a possible "platform shift" that offers immediate value and, despite some current impracticalities, could mature even faster than Cloud computing did. This definitely rings a bell with me. Even through the lens of very specific technologies like Docker and Kubernetes, this "AI moment" feels like one that cannot be dismissed, one that will affect not only how we build software products but also directly impact people's lives.
What is ChatGPT Doing?
The second resource I found very useful is this long explanation of ChatGPT by Stephen Wolfram, the famous creator of Mathematica. Wolfram has always had the ability to distill complex information for everyone to understand, and this article is no different. He goes back to the principles of neural networks and how to train them. I will put just one excerpt in this post, which to me highlights the even bigger potential of what is happening:
...But in the end, the remarkable thing is that all these operations—individually as simple as they are—can somehow together manage to do such a good “human-like” job of generating text. It has to be emphasized again that (at least so far as we know) there’s no “ultimate theoretical reason” why anything like this should work. And in fact, as we’ll discuss, I think we have to view this as a—potentially surprising—scientific discovery: that somehow in a neural net like ChatGPT’s it’s possible to capture the essence of what human brains manage to do in generating language.
Not only do we not know how these new LLMs and associated tools are going to be used, but they could ultimately help us understand ourselves better.
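At its core, the "human-like job of generating text" Wolfram describes is a loop that repeatedly predicts the next token and samples from the resulting probabilities. Here is a toy sketch of that loop, using a made-up bigram table in place of a real neural network (the vocabulary and probabilities are invented for illustration):

```python
import random

# Toy "language model": for each word, the possible next words and their
# probabilities. A real LLM computes a similar distribution over its whole
# vocabulary using billions of learned weights, but the sampling loop below
# is conceptually the same.
BIGRAMS = {
    "the": [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "dog": [("ran", 0.6), ("sat", 0.4)],
    "sat": [("down", 1.0)],
    "ran": [("away", 1.0)],
}

def generate(start, max_tokens=5, seed=42):
    """Repeatedly sample the next word until no continuation exists."""
    random.seed(seed)
    out = [start]
    for _ in range(max_tokens):
        choices = BIGRAMS.get(out[-1])
        if not choices:
            break
        words, probs = zip(*choices)
        out.append(random.choices(words, weights=probs)[0])
    return " ".join(out)

print(generate("the"))
```

Swapping the lookup table for a trained neural network, and words for sub-word tokens, gets you surprisingly close to what ChatGPT actually does at inference time.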
We Have No Moat
More recent than the previous two articles is the one that created a lot of buzz last week: a leaked Google memo called "We Have No Moat". The memo argues that Google may not have any differentiating technology in AI, and that the open source community is innovating so fast that Google would be well served to jump in before it is too late.
The main thought is summed up in one heading:
...Directly Competing With Open Source Is a Losing Proposition
The gist of it is that after Meta released LLaMA to researchers, the weights of the neural network leaked. This triggered rapid innovation in the open source community, with new models being created. Fine-tuning is now possible in a day, and the availability of datasets (including conversations) like RedPajama allows everyone to experiment and train their own models.
Yann LeCun
Fourth is not an article but a person: Yann LeCun. In addition to being French :), he is Chief AI Scientist at Meta and a professor at the Courant Institute of Mathematical Sciences. He received the Turing Award in 2018 for his work on deep learning. Quite handy for us mere mortals, he is a rather avid tweeter with a blue check mark. I now regularly read his tweets and find key information from him, like the evolutionary tree of LLMs above. Thank you.
An Open Source LLaMA Reproduction
Finally, the last resource I want to share is a GitHub repository from an AI research group at Berkeley. It is, plain and simple, an Apache License 2.0 reproduction of LLaMA. The README itself is a nice read to start getting familiar with all the lingo: tokens, weights, fine-tuning, checkpoints, etc. Perhaps more importantly, it means there is a truly open source LLM that can be the basis for your own models and tools, alleviating the dependency on closed models and paid APIs.
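If "tokens" in that lingo list is new to you: LLMs don't see words or characters, they see sub-word pieces drawn from a fixed vocabulary, each mapped to an integer id. The greedy longest-match tokenizer below is a deliberate simplification of schemes like BPE used by LLaMA-family models, with a tiny made-up vocabulary just for illustration:

```python
# Made-up vocabulary for the example: each sub-word piece gets an integer id.
# Real tokenizers learn tens of thousands of such pieces from training data.
VOCAB = {"fine": 0, "tun": 1, "ing": 2, "check": 3, "point": 4, "s": 5, " ": 6}

def tokenize(text):
    """Greedily match the longest vocabulary piece at each position."""
    ids = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in VOCAB:
                ids.append(VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

print(tokenize("finetuning checkpoints"))  # → [0, 1, 2, 6, 3, 4, 5]
```

These integer ids are what actually flow through the network's weights, and a "checkpoint" is just a saved snapshot of those weights during or after training.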
In short, don't dismiss this AI Cambrian explosion. Open source is going to win, and while it may not be obvious yet how your job and your life are going to change, you should definitely keep up to speed with what is happening before our eyes.