8 April 2025
Recent advances in artificial intelligence are rapidly changing how we process information and generate content. In particular, Large Language Models (LLMs, such as ChatGPT, GPT-4 and LLaMA) and image-generating Diffusion Models (DMs, such as DALL-E 2, Midjourney and Stable Diffusion) have proven capable of producing content and responding to requests at levels comparable to those of a human being. These tools have the potential to profoundly change the way we manage, transform and produce information, content and, ultimately, work.
In this project, we will focus mainly on language models, which possess the greatest transformative potential for the labor market and, at the same time, are best suited to being used as tools for economic and social analysis.
Language models like OpenAI’s GPT-3, released in June 2020, have pushed the boundaries of natural language understanding and generation. With 175 billion parameters, GPT-3 was one of the largest language models at the time of its release and has demonstrated impressive capabilities in various tasks, such as translation, question answering and text completion. The GPT-3 architecture is based on the transformer model, which uses self-attention mechanisms to capture long-range dependencies in text. In early 2023 OpenAI released GPT-4, a multimodal model capable of working on both images and text, whose parameter count has not been publicly disclosed.
In the same months, several open-source LLMs were released: models whose parameters were made public, unlike OpenAI’s models, which remain proprietary. The main open-source model available at the moment is LLaMA. A key feature of these open-source models is that they can be fine-tuned for specific tasks, achieving state-of-the-art results in sentiment analysis, named entity recognition and text classification. This makes it possible to exploit the linguistic capabilities of these models in a huge variety of tasks.
Creative processes are closely related to the innovation dynamics of numerous systems studied in complexity theory. In both, one of the characterising elements is the expansion over time of the conceptual space, due both to the inclusion of new and unexpected elements and to autonomous processes linked to the exploration of the space of the possible. In general, modern artificial intelligence techniques have achieved extraordinary levels of recombination of the information in their knowledge base. ChatGPT and GPT-4 currently manage to produce extremely convincing texts thanks to an enormous amount of training data and technological infrastructures of considerable size. However, although the generative capabilities of these systems are convincing, to date these machines do not possess a specific ability to include new information, or even simply to anticipate its arrival, in a context of continuous learning. Attempts to address this limitation through mixed techniques, such as neural networks combined with reinforcement learning, have not led to solutions that can be used in applications on non-stationary systems. This problem is particularly relevant in all applications in which AI must connect with human creativity, whether expressed in an artistic sense or as a process of innovation and development of ideas, in order to be a useful and effective tool for supporting natural creative dynamics.
In the context of collaboration with AI tools, text production has a prominent role thanks to the results obtained in the field of Natural Language Processing (NLP), such as the suggestion of semantically correct alternatives, the analysis of text readability and the generation of texts from a user request. This last technology promises to radically change the writing process, providing an engine for producing texts, summarising them and proposing alternative rhetorical forms. Despite being opened to the public less than a year ago, it is already used on a wide scale and is already influencing the workflow of many writers.
However, the use of these tools remains confined to the analysis of text as a static form. When the NLP community talks about time, it most often refers to encoding the succession of textual elements in a work, not to the writing process itself.
In this project, we aim to analyze the transformative potential of these tools, articulating the research along multiple lines:
Analysis of technological and innovation dynamics. The process of knowledge exploration is widely documented through the production of scientific articles, patents and code repositories. The vastness and variety of this production make it impossible to trace the dynamics of innovation with traditional methods, except for macroscopic trends. A detailed knowledge of these dynamics, however, would make it possible to anticipate imminent discoveries. Using LLMs, we will build mathematical representations (embeddings) of the innovation concepts expressed in documents (scientific articles, patents, code repositories). We will study their dynamics and design prediction models based on semantic proximity.
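As a rough illustration of what a prediction based on semantic proximity could look like, the sketch below ranks pairs of concepts that are not yet linked by the cosine similarity of their embeddings. The three-dimensional vectors are hand-made toy values standing in for LLM-derived embeddings, and the names `cosine_similarity`, `rank_candidate_pairs` and the concept labels are illustrative assumptions, not the project's actual pipeline.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidate_pairs(embeddings, known_links):
    """Rank concept pairs that are not yet linked, most similar first."""
    candidates = []
    for a, b in combinations(sorted(embeddings), 2):
        if frozenset((a, b)) in known_links:
            continue  # already observed together, nothing to predict
        score = cosine_similarity(embeddings[a], embeddings[b])
        candidates.append((score, a, b))
    return sorted(candidates, reverse=True)


# Toy vectors (assumption): real LLM embeddings have hundreds of dimensions.
toy_embeddings = {
    "battery": [0.9, 0.1, 0.0],
    "electric vehicle": [0.8, 0.3, 0.1],
    "wind turbine": [0.1, 0.9, 0.2],
    "protein folding": [0.0, 0.1, 0.9],
}
known = {frozenset(("battery", "electric vehicle"))}

ranking = rank_candidate_pairs(toy_embeddings, known)
for score, a, b in ranking:
    print(f"{a} + {b}: {score:.3f}")
```

In a dynamic setting, the same ranking could be recomputed on embeddings built from documents up to a given year and compared with the links that actually appear afterwards.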
Mapping of green technological skills. Techniques for reducing polluting emissions are the subject of numerous regulations. In particular, the European legislation on the reduction of industrial emissions is considered a reference standard at a global level. We aim to use LLMs to connect the technical specifications detailed in European legislation with the production of patents. This will allow a detailed mapping of the technological footprint of these processes and of the geographical availability of the related skills.
Critical raw materials. We aim to integrate studies on the sustainable transition with innovative artificial intelligence methodologies, building on recent studies that connect the production of individual technologies (in particular for adaptation to and mitigation of climate change; De Cunzo et al., 2022) with the export of products (Pugliese et al., 2019). In particular, we intend to study critical raw materials (CRMs) in technologies for the adaptation to and mitigation of climate change by analysing the abstracts of patents in the Y02-Y04S technological fields with large language models: natural language processing (NLP) systems with billions of parameters that have demonstrated new abilities to answer reading comprehension questions, generate creative texts and solve mathematical problems. The diffusion of green technologies (e.g. solar panels, wind turbines, electric vehicles and energy-efficient lighting) represents a fundamental step towards achieving policy objectives on climate change mitigation. Still, it involves a significant expansion of the production and trade of the critical raw materials that are fundamental for their functioning and currently irreplaceable (International Energy Agency, 2021; Herrington, 2021; Hund et al., 2020; Kowalski and Legendre, 2023). CRMs are raw materials of strategic importance and high supply risk identified by the European Commission, which periodically updates a list of these resources linked to today’s technology (European Commission, 2011, 2020a, 2020b). Our work could provide a comprehensive descriptive empirical analysis of the presence of CRMs in green technologies, initially using text mining techniques on green patent descriptions and then more complex language models. Combining this result with information on the countries where patents are filed makes it possible to identify which green technologies rely most on CRMs and where they are employed.
Furthermore, taking into account the production data of critical raw materials, it will be possible to better articulate the dependence of green technologies on these materials, for example by including a concentration index of their production, and to geolocate the spatial distribution of these inputs and compare it with that of the countries where CRM-dependent green technologies are employed. Preliminary analyses indicate that materials such as lithium, silicon, rare earths, cobalt and graphite are critical components for the development of green technologies (in particular, energy generation and transmission technologies and those related to the production or processing of goods) and require careful monitoring because their production is poorly diversified. Finally, we see a clear divergence between the producers of the raw materials necessary for developing green technologies and the countries developing these technologies. More advanced AI methodologies will allow us to refine the preliminary results and to distinguish between materials used as inputs necessary for building the technologies and materials that merely appear in the descriptions of patents aimed at removing environmental pollution or at recycling CRMs.
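The two quantitative steps mentioned above, a first text-mining pass over patent abstracts and a concentration index of production, can be sketched as follows. This is only an illustration under stated assumptions: the material lexicon, the sample abstracts and the production figures are invented, and the Herfindahl-Hirschman index is used here as one common choice of concentration index, not necessarily the one adopted in the project.

```python
import re
from collections import Counter

# Illustrative lexicon (assumption); a real study would start from the
# European Commission list of critical raw materials.
MATERIALS = ["lithium", "silicon", "cobalt", "graphite", "rare earth"]


def count_material_mentions(abstracts):
    """Count, per material, how many abstracts mention it at least once."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        for material in MATERIALS:
            # Whole-word match, allowing a plural "s" (e.g. "rare earths").
            if re.search(r"\b" + re.escape(material) + r"s?\b", lowered):
                counts[material] += 1
    return counts


def herfindahl_index(production_by_country):
    """Herfindahl-Hirschman index: sum of squared production shares."""
    total = sum(production_by_country.values())
    if total <= 0:
        raise ValueError("total production must be positive")
    return sum((q / total) ** 2 for q in production_by_country.values())


# Invented sample abstracts (assumption).
sample_abstracts = [
    "A lithium-ion cell with a graphite anode and a cobalt-free cathode.",
    "Silicon photovoltaic module with improved efficiency.",
    "Permanent magnet generator using rare earths for wind turbines.",
    "Method for recycling lithium from spent batteries.",
]
mentions = count_material_mentions(sample_abstracts)

# Invented production figures for three hypothetical countries (assumption).
hhi = herfindahl_index({"Country A": 60, "Country B": 30, "Country C": 10})

print(dict(mentions))
print(round(hhi, 2))
```

The keyword pass counts "cobalt-free" and the recycling abstract as mentions, which is exactly the kind of ambiguity between use as an input and mere presence in the text that a language model would be needed to resolve.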
The impact of AI on the labour market. Job advertisement data is considered one of the best tools for observing the dynamics of employer demand for skills. These advertisements are collected in large databases of poorly structured data. Using language modelling tools, it is possible to extract information from these advertisements, such as the sector to which the advertisement relates, the job and the skills required. In this project, we will use these databases to study (also in real time) how the emergence and diffusion of AI tools is changing the demand for skills at a geographical and sectoral level.
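A minimal sketch of the kind of aggregate that such databases could yield: the share of advertisements that mention an AI-related skill, by year and sector. The keyword list is a simple stand-in for the language-model extraction step, and the field names (`year`, `sector`, `text`), the terms and the sample advertisements are all illustrative assumptions.

```python
from collections import defaultdict

# Illustrative skill terms (assumption); a language model would replace
# this lookup with a proper extraction of sector, job and skills.
AI_TERMS = (
    "machine learning",
    "deep learning",
    "large language model",
    "prompt engineering",
)


def mentions_ai_skill(text):
    """True if the advertisement text contains any of the AI terms."""
    lowered = text.lower()
    return any(term in lowered for term in AI_TERMS)


def ai_demand_share(ads):
    """Share of ads mentioning an AI term, keyed by (year, sector)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ad in ads:
        key = (ad["year"], ad["sector"])
        totals[key] += 1
        if mentions_ai_skill(ad["text"]):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Invented sample advertisements (assumption).
sample_ads = [
    {"year": 2022, "sector": "finance", "text": "Analyst with Excel and SQL skills."},
    {"year": 2023, "sector": "finance", "text": "Analyst with machine learning experience."},
    {"year": 2023, "sector": "finance", "text": "Accountant, knowledge of IFRS."},
    {"year": 2023, "sector": "media", "text": "Copywriter familiar with prompt engineering."},
]
shares = ai_demand_share(sample_ads)
for key in sorted(shares):
    print(key, shares[key])
```

Adding a region field to the key would give the geographical breakdown in the same way.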
Web scraping for services. Unlike data on trade in goods, data on trade in services are scarce, not harmonised and generally unreliable. This severely limits economic analysis, especially in the area of economic complexity. In collaboration with Translated.com, the world’s largest company in the translation market, based in Rome, we intend to use LLMs to develop an analytical pipeline that identifies and geolocates service providers through systematic scraping of large databases of web pages (Common Crawl). This activity will also involve the International Finance Corporation, with which CREF has signed a Memorandum of Understanding and which has expressed interest in analysing the service economy in emerging markets. A first pilot project will be carried out in Cape Verde.
Dreaming Learning. This AI learning technique is currently being developed and finalised. It endows a generative neural network with the ability to anticipate new information that may arrive at its input, allowing new features to be incorporated very easily into the existing knowledge base. Furthermore, its theoretical description will enable us to establish a precise link with Bayesian approaches for automatically defining appropriate priors. These characteristics allow the machine to generalise the training information very effectively while focusing attention on the innovations that are relevant in terms of their impact on the non-stationary temporal sequence under analysis.
Evolutionary neural systems. The advantage demonstrated in the application of Dreaming Learning leads us to believe that neural systems equipped with architectures less tied to the feed-forward model and in which information can circulate more freely within the network are more effective in understanding innovation dynamics flexibly and adaptively. In this direction, we intend to develop new AI systems based on evolutionary dynamics, in which the architecture can adapt effectively and naturally to the proposed problem.
WeWrite. An innovative collaborative non-linear writing platform that facilitates the translation of complex ideas into usable text forms, such as scientific articles and pieces of fiction, but also navigable text networks, as in the case of interactive games. The platform itself, or the skills developed in its construction, can form the basis for a study of the distribution of work in a hybrid writing environment, suggesting easily explorable alternatives and allowing us to reconstruct the history of the creative process and to credit the work of the various participants, human or AI.
Metrics for identifying creative phases in writing processes. New work on the writing process enables us to use time-ordered sequences of texts to quantify the amount of exploration, in terms of ideas and rhetorical resolutions, that the writing team undertakes between one version of the text and the next. Using these techniques in conjunction with new generative artificial intelligence tools, it is possible to develop a writing assistant that is aware of the creative process and able to provide guidance based on the writer’s path.
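To make the idea of measuring exploration between successive versions concrete, the sketch below scores each step in a time-ordered sequence of drafts by how much of the word sequence changed. It relies on `difflib.SequenceMatcher` from the Python standard library as a surface-level proxy; the function names and the sample drafts are illustrative assumptions, and this is not the metric developed in the work referred to above, which also considers ideas and rhetorical choices.

```python
import difflib


def exploration_between(previous, current):
    """Fraction of the word sequence that changed, between 0 and 1."""
    matcher = difflib.SequenceMatcher(
        None, previous.split(), current.split(), autojunk=False
    )
    # ratio() is 1.0 for identical sequences and 0.0 for disjoint ones.
    return 1.0 - matcher.ratio()


def exploration_profile(versions):
    """Exploration score for each consecutive pair of versions."""
    return [
        exploration_between(older, newer)
        for older, newer in zip(versions, versions[1:])
    ]


# Invented drafts (assumption): a small edit followed by a full rewrite.
drafts = [
    "the cat sat on the mat",
    "the cat sat on the red mat",
    "a dog ran across the field",
]
profile = exploration_profile(drafts)
print([round(step, 3) for step in profile])
```

A low score suggests local polishing, while a high score suggests that the writers moved to a different region of the space of possible texts.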
IFC – International Finance Corporation
Translated.com
Mamacrowd
Sony CSL Rome participates in the STARTS-AIR project at the crossroads of Art and Science.
Aterballetto, for the development of performative artistic interaction technologies between performers and AI systems to support natural creativity.
Linguistics/Centre for Behaviour and Evolution (Christine Cuskley)
CSL-Rome is part of the French “ScientIA” project, which studies the impact of Artificial Intelligence on other scientific disciplines.
Andrea Tacchella, Angelica Sbardella, Francesco De Cunzo (CREF)
Alessandro Londei, Vittorio Loreto, Ruggiero Lo Sardo (CREF – SONY)
De Cunzo, F., Petri, A., Zaccaria, A., & Sbardella, A. (2022). The trickle down from environmental innovation to productive complexity. Scientific Reports, 12(1), 22141.
Napolitano, L., Sbardella, A., Consoli, D., Barbieri, N., & Perruchas, F. (2022). Green innovation and income inequality: A complex system analysis. Structural Change and Economic Dynamics, 63, 224-240.
Caldarola, B., Grazzi, M., Occelli, M., & Sanfilippo, M. (2023). Mobile internet, skills and structural transformation in Rwanda. Accepted in Research Policy.
Bruno, M., Lambiotte, R., & Saracco, F. (2022). Brexit and bots: characterizing the behaviour of automated accounts on Twitter during the UK election. EPJ Data Science, 11(1), 17.
Patuelli, A., & Saracco, F. (2023). Sustainable development goals as unifying narratives in large UK firms’ Twitter discussions. Scientific Reports, 13(1), 7017.