Open-Source NLP Research Projects

I'm a Machine Learning Research Engineer focused on NLP, XAI and Adversarial Robustness.

For up-to-date information, follow my Hugging Face profile!

Hackathon Somos NLP 2023: Los LLMs hablan español ("LLMs speak Spanish")

NLP
Somos NLP
The second edition of the largest open-source NLP hackathon in Spanish. This year's edition had 500+ participants, 17 speakers, and 7 mentors.
Check out the awarded projects and the recorded talks and keynotes!

Somos Mujeres NLP

Women in AI
NLP
Somos NLP
Organized two initiatives to promote both the work and research of women in NLP and projects that apply NLP to fight sexism.

NLP Course by Hugging Face

NLP
Somos NLP
Contributing to the Spanish translation of the NLP Course by Hugging Face.

BigCode Project: LLMs for Code

NLP
Research
Contributing to BigCode. Project in progress.

EleutherAI: Polyglot Romance

NLP
Research
BERTIN Project
Contributing to EleutherAI's research project "Polyglot Romance". Project in progress.

Hackathon of NLP in Spanish

NLP
Somos NLP
With more than 500 participants from 39 countries, it is the largest open-source NLP hackathon in Spanish. The recorded talks and workshops already have more than 5k views! Organized by Somos NLP and sponsored by Hugging Face, Platzi and Paperspace. Check out the awarded projects!

BigScience Research Workshop

NLP
Hugging Face
Research
A year-long international research workshop on large multilingual models and datasets. Among other cool things, we created ROOTS: A 1.6TB Composite Multilingual Dataset, which was then used to train BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.

BERTIN Project: Perplexity Sampling

NLP
Hugging Face
Research
BERTIN is a series of RoBERTa-based models in Spanish trained using a novel sampling technique that we call "perplexity sampling". More details can be found in the model card and in the paper BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling.

Course: NLP de 0 a 100 con Hugging Face ("NLP from 0 to 100 with Hugging Face")

NLP
Somos NLP
The first zero-to-hero NLP course in Spanish. It's open-source and was organized by Somos NLP with the support of Spain AI. I taught the classes on sequential models and the Transformer architecture.

Pre-training GPT-2, T5 & Wav2Vec2 models in Spanish

NLP
Hugging Face
HF Hackathon
A series of Spanish language models trained with Flax/JAX on TPUs sponsored by Google during the Flax/JAX Community Week organized by Hugging Face in June 2021. Here are the model cards: GPT-2 model, T5 model, and Wav2Vec2 model.

WaiACCELERATE Program

Entrepreneurship
Women in AI & Robotics
A program where we provide women entrepreneurs with the tools, knowledge, mentoring and network to successfully realize their startup/business idea in the AI sector.

Making Spanish NLP datasets available in the HF Hub

NLP
Hugging Face
HF Hackathon
Added 3 Spanish datasets to the huggingface/datasets library during the open sprint organized by Hugging Face in December 2020: HEAD-QA (a multiple-choice HEAlthcare Dataset), the dataset of the eHealth-KD Challenge at IberLEF 2020, and the Spanish Billion Words Corpus.

Quality Analysis of ML Models

Python Package
AI Performance
AI Robustness
A PyPI package to perform quality analyses of ML models. It focuses on three quality pillars: functionality, robustness and explainability.

Chatbot COVID-19

Conversational AI
Backend
Frontend
DevOps
Math Thesis
A chatbot that understands and answers questions about COVID-19: symptoms, prevention, regulations, and the situation in Spain. Don't hesitate to chat with AURORA!
The chatbot correctly understands 92% of requests on the first attempt and helped 1,500+ people during the first months of the pandemic. Collaboration with Accenture’s Gijón office.

Neural Network for the study of the Higgs Boson with data from the LHC (CERN)

Machine Learning
Physics Thesis
Implementation of a neural network that predicts characteristics of the Higgs boson produced in the particle collider, with a correlation coefficient of 0.778. Collaboration with the university's high-energy particle physics research team.