The BigScience project, an international collaboration of over 1,000 researchers, has released BLOOM, the world’s largest open-access multilingual language model. Trained on the Jean Zay supercomputer in France, the model supports 46 natural languages and aims to democratize AI research by providing a transparent alternative to proprietary systems.
TLDR: A global team of 1,000 researchers has launched BLOOM, a 176-billion-parameter AI model designed for transparency and multilingual inclusivity. Hosted on France’s Jean Zay supercomputer, the project challenges the dominance of private tech firms by offering an open-source, ethically governed alternative for the global scientific community.
The landscape of artificial intelligence underwent a significant shift with the release of BLOOM, a massive multilingual large language model developed through an unprecedented international collaboration. Known as the BigScience project, this initiative brought together over 1,000 volunteer researchers from more than 60 countries and 250 institutions. Unlike the proprietary models developed by private corporations, BLOOM was designed from the ground up to be transparent, open-access, and representative of global linguistic diversity.
The model features 176 billion parameters, making it one of the most powerful computational engines ever created for natural language processing. Its training data encompasses 46 natural languages and 13 programming languages, a breadth that far exceeds many contemporary models that focus primarily on English. This multilingual focus ensures that the benefits of generative AI are accessible to speakers of languages that are often marginalized in the digital sphere, including several African and Indic languages. The data selection process was particularly rigorous, involving a year-long effort to curate the ROOTS dataset, which spans 1.6 terabytes of text from diverse sources like parliamentary records and scientific papers.
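The scale of 176 billion parameters can be made concrete with a quick back-of-the-envelope calculation. The sketch below uses the article’s parameter count together with generic per-parameter storage sizes (these precisions are standard conventions, not BLOOM-specific published deployment figures): even before accounting for activations or optimizer state, merely holding the weights requires hundreds of gigabytes of memory.

```python
# Back-of-the-envelope memory footprint for a 176-billion-parameter model.
# The parameter count comes from the article; the precisions below are
# generic assumptions, not official BLOOM deployment figures.

PARAMS = 176e9  # BLOOM's parameter count


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Gigabytes needed just to store the model weights."""
    return params * bytes_per_param / 1e9


for name, nbytes in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    print(f"{name}: ~{weights_gb(PARAMS, nbytes):,.0f} GB")
# float32: ~704 GB, bfloat16: ~352 GB, int8: ~176 GB
```

Numbers like these illustrate why inference on the full model is out of reach for consumer hardware, and why the distillation efforts mentioned later in the article matter for accessibility.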
Training such a massive model required immense computational resources, which were provided by the Jean Zay supercomputer near Paris, France. The facility, operated by the French National Centre for Scientific Research (CNRS), dedicated 384 NVIDIA A100 GPUs to the task for roughly three and a half months. The project was funded by public grants, signaling a move toward sovereign AI in which public institutions, rather than only private tech giants, hold the keys to foundational technology. During training, the researchers also prioritized environmental transparency, calculating the carbon footprint of the process and offsetting it through institutional programs.
One of the core pillars of the BigScience project was ethical data governance. The researchers meticulously documented the datasets used for training, allowing for a level of scrutiny impossible with black-box commercial models. They implemented a Responsible AI License (RAIL), which permits free use of the model while prohibiting its application in harmful contexts, such as illegal surveillance or the generation of medical misinformation. This framework attempts to balance the benefits of open science with the necessity of safety and accountability.
The collaborative nature of the project also addressed the compute divide in AI research. By making the model weights and the training code publicly available, the BigScience team enabled researchers at smaller institutions and in developing nations to study and build upon state-of-the-art technology without needing multi-million-dollar budgets. This democratization is seen as a vital step in preventing a handful of corporations from monopolizing foundational AI research. Furthermore, the project adopted a living governance model, in which decisions about the model’s future are made by a steering committee of researchers rather than a corporate board.
Beyond its technical capabilities, BLOOM serves as a case study in large-scale scientific cooperation. The project was organized into working groups focusing on data sourcing, model architecture, evaluation, and ethics. This decentralized approach allowed a diverse range of perspectives to influence the model’s development, resulting in a tool that is more culturally nuanced than its predecessors. The researchers also developed new evaluation benchmarks to test performance in non-English contexts, showing that BLOOM is competitive with English-centric models of comparable scale on multilingual tasks.
As the AI field continues to evolve, the legacy of the BigScience project remains a benchmark for transparency. Future research is now focusing on distilling the model into smaller, more efficient versions that can run on consumer-grade hardware. The success of BLOOM has prompted further international consortia to explore open-source alternatives for other domains, including climate modeling and drug discovery, ensuring that the most powerful tools of the 21st century remain a common good.

