A former Nvidia researcher who left the chip giant for academia shares why AI models should be open source and free for all
- AI researcher Anima Anandkumar highlights the role of AI in weather, climate, and medical design.
- She formerly worked at Nvidia on machine learning research and now spearheads Caltech's AI research.
- She emphasizes the importance of open-source AI and regulating the technology.
While the tech industry races to build large language models, researcher Anima Anandkumar focuses on how AI can simulate the physical world, from predicting climate patterns to redesigning proteins in medicine.
The former Nvidia senior director of AI research and Amazon Web Services alum spent decades working at the forefront of AI algorithms and helped make Nvidia's FourCastNet weather simulator available as open source. She has served as the Bren Professor in the computing and mathematical sciences department at Caltech since 2017 and oversees the university's machine learning research efforts. Specifically, she focuses on tensors, multilinear functions that can help solve higher-order problems in AI, like modeling fluids.
AI has limits to what it can learn. If you ask ChatGPT about the weather, it will gather an answer from existing weather apps on the web. Anandkumar, by contrast, is researching how AI can simulate the weather more accurately and help predict how the climate crisis will unfold a decade from now.
In a conversation with Business Insider, Anandkumar spoke about her academic research with neural operators, a type of neural network that learns mappings between functions and that can be used to model physical processes. She also explained why it's important to have a culture of open-sourcing AI in Big Tech.
The following was edited for length and clarity.
BI: You've moved fluidly between industry and academia for most of your career. Why did you recently decide to leave Nvidia for academia full time when a lot of people are going in the other direction?
Over the last decade or so, I've had a foot both in industry and academia. I see that as the golden age of open research where even in industry, you could open source, you could publish, you could freely collaborate. Unfortunately, since ChatGPT has come out, a lot of that has changed because there is a lot more push toward closed research and closed models in many of the Big Tech [companies]. Even when models are getting open sourced, not a lot of the details are shown to the public, and so there are certainly big firewalls going up. To me, the goal of having a dual appointment was to really enable open research and do that for beneficial tasks. We built our first AI-based weather model. We built models for drug discovery and are designing medical devices. All of this requires some openness. Once that is lacking, I felt like my true goal is to enhance the impact and benefit humanity. As cheesy as it sounds, I felt like this was a better way to do that. The notion is that there is a lot of computational spend to train large models, and there is the question of how you monetize that. I don't believe that to be a wrong goal; it just does not align with how I felt the best impact could be made.
BI: What is the day-to-day of doing research for industry versus now with Caltech?
Certainly, the resources are much fewer. Both at AWS and Nvidia, we had a lot of freedom to pursue new research areas. The computational resources are nowhere near what Big Tech has. To me, necessity is the mother of invention. So how do we build frameworks and algorithms that can do more with less? I think there's a lot of interesting research that will come out of academia in the next few years because that's the only way we can keep up with innovation: academia won't be following the same approaches that the Big Techs are taking, but will really be thinking a lot more about how to make the algorithms more efficient.
BI: What are better, more 'efficient' algorithms, and what are their broader impacts?
Clever algorithms can come in many ways. One is being clever about how to make these algorithms efficient, not just having better hardware.
The other aspect is to reduce the memory used during training by using cleverer optimization algorithms. You can drastically reduce the amount of memory needed so you can fit even fairly big models on much smaller GPUs or GPUs with lower memory that are cheaper. Sometimes you need to go back to first principles and ask, are there some fundamental ways to completely rethink how we do this?
BI: In your TED Talk, you say, 'Language models hallucinate, and they have no physical grounding.' Does this relate to any misconceptions that people have about AI?
Text does not encode all the knowledge in the world. It's the knowledge that humans have collected, but it's not the knowledge of the physical world. You can have all kinds of theories about playing tennis well, but the language model by itself cannot execute the action of playing tennis. Similarly, for weather forecasting, you can ask ChatGPT what the weather is tomorrow, and it can come up with an answer, but it will do that by looking up a weather app. It does not internally have the ability to simulate what happens to weather. It does not internally understand what a hurricane is. It can look up some text information, but it cannot go and simulate the physical process. Same with designing a better aircraft wing or a better rocket. You can ask any of the image generators, and they may give you the most fantastic-looking rocket, but it most likely will not be able to fly in the real world because those designs are not grounded in physics.
BI: You talk a lot about neural operators. What are they, and how are they being used now in everyday life for the tech industry or other industries?
Think of it as extending the paradigm of deep learning to these continuous objects: the fact that now the Earth is not just a fixed set of pixels, but the Earth is really this continuous space. You can go and zoom in at any resolution and get answers at all those resolutions. That's how the whole physical world works. If you really think about the world, it happens at so many different scales. Think about even quantum effects that are needed for drug design and many very important practical problems. To really have the ability to model weather and climate, we also need to model how these tiny particles in the cloud interact with one another. How do these vortices move? With standard AI, you can do top-down. In the case of weather, there's a lot of historical information, but you can be very blind. What we do with neural operators is to really combine the two paradigms: you can both learn from the data and add any of the physical knowledge you have, and do that at multiple resolutions so you can very accurately model these physical phenomena.
Scientific research and engineering design in the past have been really about trial and error. Somebody, you know, comes up with an idea, but you still have to go test it in the lab, and the lab experiments can be very laborious and long, and not all the time.
Neural operators with an AI framework are a game changer because with this, you can build highly accurate digital twins. So instead of going to the lab and testing physically, now you have a digital twin that very accurately models very complex phenomena like fluid dynamics. And with that comes the ability to actually optimize design. So you not only use this as a digital twin just to try out different possibilities, but ask it to generate the best design.
BI: What are some of the things that we should still be focusing on when it comes to democratizing AI?
When I say democratization, that can be in terms of models like Llama being openly available, but also the knowledge of how to train models is something that we need more in the open so we can carefully test and further develop the techniques because there are lots of big challenges when it comes to both language and all the other classes of models, too. That's where, you know, we want to be very careful not to have regulations in place that harm open source and the small players.
I recently spearheaded a letter out of Caltech to stop the SB-1047 bill. (Note: SB-1047 passed in the California state legislature on Wednesday and will go to Gov. Gavin Newsom for final consideration.) Many other academics have made statements, and people in the industry have also come out against this bill. Even if there are good intentions, it's important to be careful, because there can be unintended consequences. There is the notion of regulatory capture, where big companies can still deal with regulation by hiring a lot of people. And when it comes to startups, they are at a severe disadvantage. If you're effectively asking a lot of open source developers, making them responsible for any harmful use downstream, then open source is effectively killed. That's not good for academia, either. AI has always been an open-source revolution from its early days because people created these data sets and held competitions, and everybody participated, did research, talked about it, shared the code. That's how the community just made these amazing developments in such a short amount of time. I think the right approach is not to think about regulating AI, but about regulating harmful effects no matter what was used.