Diego Antognini, PhD

I am Diego Antognini, a tech lead and senior AI researcher at Google DeepMind, specializing in multimodal large language models (LLMs). My work primarily involves enhancing these models through iterative feedback and refinement processes. I have 9 years of research experience in natural language processing, machine learning, and recommendation systems. Additionally, I am a lecturer and module head at the Lucerne University of Applied Sciences (HSLU) where I teach advanced generative AI and deep learning for NLP at the M.Sc. level.

Previously, I was a research scientist at IBM Research and collaborated with the MIT-IBM Watson AI Lab. My research focused on two areas: developing new methods to align LLMs, and efficient NLP for resource-constrained training and inference settings. In the first area, my work involved (1) teaching LLMs to transform multi-turn conversations into SQL queries for massive databases; (2) personalizing LLMs according to user preferences; (3) automatically augmenting prompts to steer their behavior; and (4) adapting retrieval-augmented LLMs for question-answering systems in the context of scientific literature. In terms of efficient NLP, I built models with a size on the order of a few megabytes and a latency of a couple of milliseconds, with performance similar to and throughput higher than those of large models. Finally, I am also experienced in interpretable models that generate personalized and actionable textual explanations.

I hold a Ph.D. degree in Computer Science from EPFL, where I conducted research in the AI laboratory under the supervision of Prof. Boi Faltings. My thesis is titled "Textual Explanations and Critiques in Recommendation Systems" (available here). I developed models to infer high-quality explanations from text documents in a scalable and data-driven manner through selective rationalization. Moreover, I designed models to make textual explanations actionable (referred to as critiquing) and explored two important applications in natural language processing and conversational recommendation systems. I also worked on multi-objective recommendation and multi-document summarization.

I periodically give talks, for example on efficient NLP at the MIT-IBM Watson AI Lab and at the NLP Meetup in Zürich, where I presented earlier work. I have also appeared in national media after taking part in challenges with students and winning a $10k prize at the IARPA Geopolitical Forecasting Challenge 2018 (press coverage: EPFL News, 24 Heures, RTS Radio (22:25)).

On this website, I present some of the publications and patents I have worked on, along with some of my most exciting projects from before my Ph.D. If you have any questions, feel free to contact me.

News

[June 2025]: Glad that multiple projects I led and co-led ended up in Gemini! Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities.
[Apr. 2025]: Going to ICLR 2025 in Singapore to discuss current research directions related to my research interests. Please feel free to reach out if you would like to chat with me!
[Mar. 2025]: A new patent has been published: Layer Normalization for Calibrated Uncertainty in Deep Learning.
[Feb. 2025]: Come and join us for our Google DeepMind workshop, Natural Interactions with Foundation Models, at AMLD - Applied Machine Learning Days 2025.
[Dec. 2024]: Discussing our accepted work Trans-LoRA at NeurIPS 2024 in Vancouver, Canada.
[Dec. 2024]: I'm hiring Student Researchers for 2025 at Google DeepMind in Zürich, Switzerland. Please feel free to contact me if you are interested in applying.
[Oct. 2024]: I've been promoted to Senior!
[Sep. 2024]: The paper Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning has been accepted at NeurIPS 2024.
[Sep. 2024]: Two patents have been published: Domain-specificity prediction for natural language processing and Determining specificity of text terms in application contexts.
[Aug. 2024]: Two patents have been published: Self-supervised term encoding with confidence estimation and Updating window representations of sliding window of text using rolling scheme.
[May 2024]: New paper Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning.
[Mar. 2024]: New paper Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models accepted at NAACL 2024.
[Feb. 2024]: New paper MC Layer Normalization for Calibrated Uncertainty in Deep Learning accepted at TMLR.
[Jan. 2024]: Joined Google DeepMind as a researcher in Zürich, Switzerland.
[Dec. 2023]: I am a panelist talking about LLMs and efficiency at the HSLU AI & Big Data Panel at Lucerne University of Applied Sciences (HSLU) (host: Dr. Guang Lu).
[Nov. 2023]: Our Demo talk FlowPilot: An LLM-powered system for enterprise data integration has been accepted at NeurIPS 2023.
[Nov. 2023]: Demo paper ESG Accountability Made Easy: DocQA at Your Service accepted at AAAI 2024.
[Oct. 2023]: Invited talk Conversational Critiquing: From Recommender Systems to Text Generation at Google Research (host: Dr. Claudiu Musat).
[July 2023]: Presenting and discussing our two accepted works at ACL 2023 in Toronto, Canada.
[June 2023]: Keynote Efficient Machine Learning in Low-Resource and Highly-Specific Domains at SwissText 2023 (hosts: Prof. Hatem Ghorbel and Prof. Mark Cieliebak).
[May 2023]: Paper pNLP-Mixer: an Efficient all-MLP Architecture for Language accepted at ACL 2023.
[May 2023]: Paper Extracting Text Representations for Terms and Phrases in Technical Domains accepted at ACL 2023.
[Mar. 2023]: Received an award at IBM: first plateau (i.e., 4 patents) invention achievement award.
[Mar. 2023]: One new patent filed with The United States Patent and Trademark Office (USPTO).
[Mar. 2023]: Invited talk Efficient Machine Learning in Low-Resource and Highly-Specific Domains at MIT-IBM Watson AI Lab (host: Dr. Leonid Karlinsky).
[Feb. 2023]: Two new patents filed with The United States Patent and Trademark Office (USPTO).
[Feb. 2023]: Received an award at IBM: patent application invention achievement award.
[Jan. 2023]: Paper Assistive Recipe Editing through Critiquing accepted at EACL 2023.
[Jan. 2023]: One new patent filed with The United States Patent and Trademark Office (USPTO).
[Sep. 2022]: Paper Unsupervised Term Extraction for Highly Technical Domains accepted at EMNLP 2022.
[Aug. 2022]: Workshop paper Active Learning for Imbalanced Civil Infrastructure Data accepted at CVCIE @ ECCV2022.
[May 2022]: Joined IBM Research AI as a Research Scientist in Zürich, Switzerland.
[Mar. 2022]: Officially graduated from the Swiss Federal Institute of Technology in Lausanne (EPFL).
[Feb. 2022]: Defended my Ph.D. dissertation: Textual Explanations and Critiques in Recommendation Systems. My doctoral committee was composed of Prof. Boi Faltings, Prof. Julian McAuley, Prof. Scott Sanner, Prof. Antoine Bosselut, and Prof. Robert West.

Selected Publications

For the full list, you can consult my Google Scholar profile.

29) Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper
..., Diego Antognini, ...
2025, Technical Report

TL;DR: We introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models.

28) Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning Paper
Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky
2024, NeurIPS

TL;DR: We propose a nearly data-free transfer of LoRA modules between models using synthetic data, eliminating the need for original training data. This approach shows effective transfer across various models and tasks, improving performance in many cases.

27) Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models Paper
Yue Zhou, Yada Zhu, Diego Antognini, Yoon Kim, Yang Zhang
2024, NAACL

TL;DR: We investigate how the presentation or surface form of a mathematical problem affects its solvability by large-scale language models. Then, we present Self-Consistency-over-Paraphrases (SCoP), which diversifies reasoning paths from specific surface forms.

26) MC Layer Normalization for Calibrated Uncertainty in Deep Learning Paper
Thomas Frick, Diego Antognini, Ioana Giurgiu, Benjamin Grewe, Cristiano Malossi, Rong Zhu, Mattia Rigotti
2023, TMLR

TL;DR: A drop-in replacement for Layer Normalization to endow neural networks with calibrated prediction uncertainty.

25) FlowPilot: An LLM-powered system for enterprise data integration
Enrico Toniato, Abdel Labbi, Katya Mirylenka, Christoph Miksovic Czasch, Thomas Gschwind, Paolo Scotton, Francesco Fusco, Diego Antognini
2023, NeurIPS Demo Talk

TL;DR: A system to generate training datasets and fine-tune LLMs tailored for customers to convert natural language into SQL queries for massive databases in a conversational manner. FlowPilot mitigates errors during both training and inference, and it can also generate Python code and charts.

24) ESG Accountability Made Easy: DocQA at Your Service Paper
Lokesh Mishra, Cesar Berrospi, Kasper Dinkla, Diego Antognini, Francesco Fusco, Benedikt Bothur, Maksym Lysak, Nikolaos Livathinos, Ahmed Nassar, Panagiotis Vagenas, Lucas Morin, Christoph Auer, Michele Dolfi, Peter Staar
2024, AAAI Demo

TL;DR: A question-answering system using large language models and retrieval-augmented generation, designed to generate accurate and relevant answers from a given corpus of domain-specific documents.

23) pNLP-Mixer: an Efficient all-MLP Architecture for Language Paper or Paper
Francesco Fusco, Damian Pascual, Peter Staar, Diego Antognini
2023, ACL

TL;DR: An embedding-free MLP-Mixer model for on-device NLP using a projection layer that relies on MinHash and counting bloom filters. Our model occupies merely one megabyte and achieves 99% of the performance of mBERT.

22) Extracting Text Representations for Terms and Phrases in Technical Domains Paper or Paper
Francesco Fusco*, Diego Antognini*
2023, ACL

TL;DR: Meaningful word embeddings can be achieved using character-based models that are 5x smaller and 10x faster than BERT-based counterparts and do not suffer from out-of-distribution problems.

21) Assistive Recipe Editing through Critiquing Paper or Paper
Diego Antognini, Shuyang Li, Boi Faltings, Julian McAuley
2023, EACL

TL;DR: A framework for generating recipes and enabling users to edit them using critiques in an iterative manner. The system coherently rewrites recipes to satisfy users’ feedback.

20) Towards Workflows for the Use of AI Foundation Models in Visual Inspection Applications Paper
Mattia Rigotti, Diego Antognini, Roy Assaf, Kagan Bakirci, Thomas Frick, Ioana Giurgiu, Klara Janouskova, Filip Janicki, Husam Jubran, Cristiano Malossi, Alexandru Meterez, Florian Scheidegger
2023, Journal (ce/papers)

TL;DR: A set of proof-of-concept workflows using foundation models in visual inspection applications for civil engineering.

19) Unsupervised Term Extraction for Highly Technical Domains Paper or Paper
Francesco Fusco, Peter Staar, Diego Antognini
2022, EMNLP

TL;DR: A fully unsupervised method for term extraction that generalizes across domains. Our setup improves predictive performance and decreases inference latency on both CPUs and GPUs.

18) Active Learning for Imbalanced Civil Infrastructure Data Paper or Paper
Thomas Frick, Diego Antognini, Mattia Rigotti, Ioana Giurgiu, Benjamin Grewe, Cristiano Malossi
2022, ECCV Workshop on Computer Vision for Civil and Infrastructure Engineering (CVCIE)

TL;DR: A method capable of operating on datasets suffering from heavy class imbalance, achieved by replacing the traditional active learning acquisition function with an auxiliary binary discriminator.

17) Textual Explanations and Critiques in Recommendation Systems Paper or Paper
Diego Antognini
2022, EPFL Ph.D. thesis

TL;DR: This dissertation focuses on two fundamental challenges. The first involves generating explanations: inferring high-quality explanations from text documents in a scalable and data-driven manner. The second challenge consists of making explanations actionable, which we refer to as critiquing. This dissertation examines two important applications in natural language processing and recommendation tasks.

16) Positive & Negative Critiquing for VAE-based Recommenders Paper or Paper
Diego Antognini, Boi Faltings
2022, CoRR

TL;DR: Fast negative and positive critiquing generalized for variational autoencoders, resulting in up to a 15% higher success rate compared to state-of-the-art models. The key lies in modeling positive and negative critiques as different modalities and employing a multi-modal VAE with weak supervision.

15) Interlock-Free Multi-Aspect Rationalization for Text Classification Paper or Paper
Shuangqi Li, Diego Antognini, Boi Faltings
2022

TL;DR: Addressing the interlocking dynamics of multi-aspect rationalization, utilizing a novel self-supervised contrastive loss and multi-stage training to generate more semantically diverse rationales.

14) Interacting with Explanations through Critiquing (T-RECS) Paper
Diego Antognini, Claudiu Musat, Boi Faltings
2021, IJCAI

TL;DR: How to extract explanations that humans significantly prefer over those produced by state-of-the-art models, and how to make them actionable, enabling users to interact with them iteratively to improve the recommendation.

13) Fast Multi-Step Critiquing for VAE-based Recommender Systems (M&Ms-VAE) Paper or Paper Video
Diego Antognini, Boi Faltings
2021, RecSys

TL;DR: Fast multi-step critiquing generalized for variational autoencoders, resulting in speeds up to 26x faster and a success rate 20% higher compared to state-of-the-art models. The key lies in modeling the problem using multi-modal VAE and weak supervision.

12) Multi-Step Critiquing User Interface for Recommender Systems Paper or Paper Video
Diana Petrescu*, Diego Antognini*, Boi Faltings
2021, RecSys Demo

TL;DR: We propose and demonstrate a new way of interacting with recommender systems to help users make decisions and find their ideal items.

11) Rationalization through Concepts (ConRAT) Paper or Paper Video
Diego Antognini, Boi Faltings
2021, ACL Findings

TL;DR: Generalization of MTM: how to extract interpretable multi-faceted concepts (i.e., rationales) for single-task classification problems. It generates concepts that align with human rationalization and outperforms state-of-the-art methods trained on each aspect label independently.

10) Multi-Dimensional Explanation of Target Variables from Documents (MTM) Paper or Paper Video
Diego Antognini, Claudiu Musat, Boi Faltings
2021, AAAI

TL;DR: One model that extracts interpretable, meaningful, and coherent multi-faceted rationales for multi-task text classification problems and performs better than individual rationalization models.

9) Addressing Fairness in Classification with a Model-Agnostic Multi-Objective Algorithm Paper or Paper Video
Kirtan Padh, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, UAI

TL;DR: A model-agnostic multi-objective architecture that optimizes multiple fairness notions and sensitive attributes using a novel differentiable relaxation that approximates fairness notions through the hyperbolic tangent function.

8) Multi-Gradient Descent for Multi-Objective Recommender Systems Paper or Paper
Nikola Milojkovic, Diego Antognini, Giancarlo Bergamin, Boi Faltings, Claudiu Musat
2020, AAAI Workshop on Interactive and Conversational Recommendation Systems (WICRS)

TL;DR: An efficient stochastic multi-gradient descent approach for multi-objective recommender systems.

7) HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset Paper or Paper
Diego Antognini, Boi Faltings
2020, LREC

TL;DR: A new dataset with 50 million hotel reviews with meta-attributes, user information, and multi-aspect ratings.

6) Recommending Burgers based on Pizza Preferences: Addressing Data Sparsity with a Product of Experts Paper or Paper
Martin Milenkoski, Diego Antognini, Boi Faltings
2021, RecSys Workshop on Cross-Market Recommendation

TL;DR: We address data sparsity and generate recommendations in domains where there is limited knowledge about the user preferences.

5) Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context Paper or Paper
Milena Filipovic*, Blagoj Mitrevski*, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, RecSys Workshop on Perspectives on the Evaluation of Recommender Systems

TL;DR: Omitting temporal context while evaluating recommender systems leads to false confidence. We propose an evaluation protocol and a model-agnostic training procedure to incorporate temporal context.

4) Momentum-based Gradient Methods in Multi-objective Recommender Systems Paper or Paper
Blagoj Mitrevski*, Milena Filipovic*, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, RecSys Workshop on Multi-Objective Recommender Systems

TL;DR: A coordinated multi-objective optimization method in which each objective is optimized using an algorithm similar to the Adam algorithm.

3) GameWikiSum: a Novel Large Multi-Document Summarization Dataset Paper or Paper
Diego Antognini, Boi Faltings
2020, LREC

TL;DR: A non-news domain-specific dataset for multi-document summarization that is 100 times larger than commonly used datasets.

2) Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization Paper or Paper
Diego Antognini, Boi Faltings
2019, EMNLP Workshop on New Frontiers in Summarization

TL;DR: How to leverage universal and domain-specific sentence embeddings using a graph structure for multi-document summarization.

1) Dataset Construction via Attention for Aspect Term Extraction with Distant Supervision Paper or Paper
Athanasios Giannakopoulos*, Diego Antognini*, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl
2017, ICDM Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (SENTIRE)

TL;DR: How to utilize large corpora for improved aspect term extraction using distant supervision.

Filed Patents

5) Layer Normalization for Calibrated Uncertainty in Deep Learning Paper
Thomas Frick, Mattia Rigotti, Diego Antognini, Ioana Giurgiu, Cristiano Malossi
2025

4) Domain-specificity prediction for natural language processing Paper
Diego Antognini*, Francesco Fusco*
2024

3) Determining specificity of text terms in application contexts Paper
Francesco Fusco*, Diego Antognini*
2024

2) Self-supervised term encoding with confidence estimation Paper
Francesco Fusco*, Diego Antognini*
2024

1) Updating window representations of sliding window of text using rolling scheme Paper
Francesco Fusco*, Diego Antognini*
2024

Projects (prior to Ph.D.)

From Relation Extraction to Knowledge Graphs - M.Sc. thesis

My master's thesis at Iprova: a system that extracts concepts from large corpora and builds interactive knowledge graphs to provide invention developers with new insights. View more

NeoBrain - B.Sc. thesis

A research project focused on optimizing neuronal activity maps treatment using massively parallel technologies. View more

Hurricane

A scalable, decentralized system that aggregates secondary storage devices in a cluster with the aim of supporting parallel scans of data stored across them. View more

Optimized flocking algorithm for e-pucks

I implemented, tested, analyzed, and optimized a flocking algorithm for e-pucks. The objective was for the robots to avoid obstacles within the arena while maintaining their collective formation. I worked in a multidisciplinary team. View more

PokerFace

Realization of a complete Texas Hold'em Poker game with artificial intelligence. View more

Starfighter 4K

Shoot 'em up game utilizing motion recognition with Kinect and Wiimotes for spaceship movement, inclination, and shooting. View more

CUDA

Several mini-projects are available for learning about GPGPU technologies, primarily CUDA. View more

Image classification

Classifier that recognizes the object present in an image using advanced models. The objects could be classified as a horse, airplane, car, or something else. View more

Social Recommendation System

A recommender system for events based on user data and Facebook profiles. View more

Facial recognition among profiles

Detect whether a person is wearing sunglasses using a collection of profile pictures of different individuals. Each person has pictures taken from different head angles, displaying different emotions, and with or without sunglasses. View more

Pattern classification and machine learning project 1

Project on regression and classification using linear models. One dataset is provided for each task without any accompanying information. View more

Recommender System challenge

Third task of the European Semantic Web Conference challenge (ESWC-14 Challenge): Top-N recommendation of books. GitHub Report

EPFL-IMDB

A movie directory with heavy database background using real data from IMDb. View more

Star²

Planetarium software that displays the current view of the sky at the present location. View more

Tech Lead & Senior AI/ML Researcher at Google DeepMind

Interests in generative AI, LLM alignment, multimodal, iterative refinement, efficient ML, NLP, self-supervision, conversational explainable recommendation.
