By Jaime Sevilla, PhD at UNIABDN.
We have compiled information about the date of development and trainable parameter counts of n=139 machine learning systems between 1952 and 2021. This is, as far as we know, the biggest public dataset of its kind.
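For readers who want to poke at the data themselves, here is a minimal sketch of the kind of exploration the dataset supports; the file name and column names ("Publication date", "Parameters", "Domain") are placeholders for illustration, not the actual schema of the published spreadsheet.

```python
# Minimal exploration sketch. The file name and column names are assumptions
# made for illustration; adapt them to the actual schema of the public dataset.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("parameter_counts.csv")  # hypothetical CSV export of the dataset
df["Publication date"] = pd.to_datetime(df["Publication date"])

fig, ax = plt.subplots()
for domain, group in df.groupby("Domain"):
    ax.scatter(group["Publication date"], group["Parameters"], s=12, label=domain)

ax.set_yscale("log")  # parameter counts span many orders of magnitude
ax.set_xlabel("Publication date")
ax.set_ylabel("Trainable parameters")
ax.legend()
plt.show()
```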
The accompanying article, published in the Alignment Forum, presents the dataset together with some insights about the growth of model size across different domains of ML application.
The article has been well received by the AI community, and has been featured in ImportAI and the Alignment Newsletter.
This project is a collaboration between NL4XAI fellow Jaime Sevilla, and independent researchers Pablo Villalobos and Juan Felipe Cerón.
This work builds on recent research on scaling laws and large ML models, such as AI and Compute by Amodei et al., On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? by Bender et al., and Scaling Laws for Neural Language Models by Kaplan et al.
We chose to focus on parameter count because previous work indicates that it is an important variable for model performance [1], because it serves as a proxy for model complexity, and because it is usually readily available or easily estimable from descriptions of the model architecture.
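As an illustration of what we mean by "easily estimable from descriptions of the model architecture", here is a minimal sketch (not the exact procedure we followed for every system in the dataset) that counts the trainable parameters of a fully connected network from its layer sizes:

```python
# Back-of-the-envelope parameter count for a fully connected network.
# Illustrative only; the estimates in the dataset follow each paper's own description.

def dense_params(layer_sizes):
    """Each dense layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Example: a 784-1000-1000-10 image classifier
print(dense_params([784, 1000, 1000, 10]))  # 1,796,010 trainable parameters
```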
We hope our work will help AI researchers and forecasters understand one way in which models have become more complex over time, and ground their predictions of how the field will progress in the future. In particular, we hope this will help us tease apart how much of the progress in Machine Learning has been due to algorithmic improvements versus increases in model complexity.
It is hard to draw firm conclusions from our biased and noisy dataset. Nevertheless, our work seems to give weak support to two hypotheses:
- There was no discontinuity in the trend of model size growth in any domain around 2011-2012. This suggests that the Deep Learning revolution was not due to an algorithmic improvement, but rather was the point where the trend of improvement of Machine Learning methods caught up with the performance of other methods.
- In contrast, there seems to have been a discontinuity in model complexity for language models somewhere between 2016 and 2018. Returns to scale must have increased, shifting the trajectory of growth from a doubling time of ~1.5 years to one of 4 to 8 months (see the quick calculation after this list).
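To make the change in slope concrete, a doubling time of d months corresponds to a yearly growth factor of 2^(12/d). The quick calculation below simply translates the rounded doubling times quoted above into yearly factors; it is not an additional result.

```python
# Translate the doubling times quoted above into yearly growth factors: 2 ** (12 / d).
def yearly_growth(doubling_time_months: float) -> float:
    return 2 ** (12 / doubling_time_months)

print(f"18-month doubling: ~{yearly_growth(18):.1f}x per year")  # ~1.6x per year
print(f" 8-month doubling: ~{yearly_growth(8):.1f}x per year")   # ~2.8x per year
print(f" 4-month doubling: ~{yearly_growth(4):.1f}x per year")   # 8.0x per year
```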
The structure of this article is as follows. We first describe our dataset and point out some of its weaknesses. We then expand on these and other insights, raise some open questions, and finally discuss next steps and invite collaboration.
This blogpost is part of the NL4XAI project that has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 860621.
The article is available here. You can access the public dataset here. The code to produce an interactive visualization is available here.