Gabriel Cretin
I build deep learning models for protein biology and the infrastructure to train them. From designing neural architectures to managing GPU clusters, I handle both the science and the systems.
Deep Learning
Computational Biologist - Ph.D.
Specialized in Protein Language Models and distributed GPU training with PyTorch, with 40k+ A100/H100 GPU hours on national HPC (IDRIS/CNRS). Published research on protein structure analysis, prediction, and generative modeling.
SRE & MLOps
Linux System Administrator
Running 15+ production web services including GitLab, JupyterHub, and structural biology tools. Managing a GPU cluster with 20+ workstations, 380+ CPUs, and 1 PB+ backup infrastructure.
Professional Profile
A rare hybrid profile combining 6 years of Linux infrastructure management with 4 years of deep learning research. I design novel AI architectures and deploy them on the infrastructure I build, from training on national supercomputers to production APIs serving 18K+ weekly requests.
Scientific Track
Deep Learning Research
Focus on Protein Language Models (ESM-2, Ankh, ProtTrans) and generative architectures. Developed Adversarial Autoencoders for embedding compression and contrastive learning for improved fold recognition, published in top-tier journals.
- 8 peer-reviewed publications
- 40k+ GPU hours on IDRIS/CNRS
- 4 production web tools (PYTHIA, SWORD2, ICARUS, PEGASUS)
Engineering Track
SRE & Infrastructure
Managing a complete Linux ecosystem: web servers, GPU clusters, HPC, and centralized authentication. Building reliable platforms for research teams with automated provisioning and monitoring.
- 14 web servers, 12 databases (~18K views/week)
- HPC cluster: 48 nodes, 708 cores
- 1 PB+ cumulative storage infrastructure
AI & Protein Science
Ph.D. Research (2021–2025)
Representation Learning & Generative Modeling
Designed Adversarial Autoencoder (AAE) architectures to compress high-dimensional pLM embeddings (ESM-2, Ankh) into fixed-size latent spaces. Implemented contrastive triplet learning to improve structural fold recognition, surpassing state-of-the-art structure-based methods. Explored de novo protein design through latent space interpolation.
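As a rough illustration of two ideas named above, the sketch below shows a triplet margin loss and a linear interpolation between two latent vectors in plain Python. It is not the thesis implementation: the function names, the Euclidean distance, the margin value, and the toy 2-D vectors are all assumptions chosen for readability, and a real pipeline would operate on batched PyTorch tensors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does.
    (Hypothetical helper; margin=1.0 is an illustrative default.)
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the segment z_start -> z_end.

    (Hypothetical helper; in a generative setting each point would be
    passed to a decoder, which is not shown here.)
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0 inclusive
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points


# Toy embeddings: same-fold pair (anchor, positive) and a different-fold negative.
easy = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
path = interpolate([0.0, 0.0], [2.0, 4.0], steps=3)
```

Here `easy` is zero because the negative is already far from the anchor, while `hard` is positive and would still contribute a gradient during training.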
Papers in preparation - currently writing manuscripts on representation learning, adversarial autoencoders, and contrastive learning.
Tech Stack
ML Engineering
- PyTorch / Lightning Expert
- Python / Bash Expert
- HPC / SLURM Advanced
- AlphaFold / Foldseek Advanced
SRE & Infrastructure
HPC & Compute
Cluster Management
5 GPU servers for deep learning & molecular dynamics. SGI HPC cluster: 48 nodes, 708 cores. 25 Linux workstations with 700+ CPUs.
Services
Self-Hosted Stack
14 web servers, 2 APIs, 12 databases serving ~18K views/week. GitLab, Mattermost, JupyterHub, centralized auth.
Automation & Networking
Infrastructure as Code
1 PB+ backup infrastructure (3 dedicated servers). Ansible playbooks, Docker Swarm orchestration, Samba/NFS storage.
Publications
Featured Publications
PEGASUS
Protein Science 2025
Protein flexibility is essential to its biological function. However, experimental methods for its assessment, such as X‐ray crystallography and nucle...
ATLAS
Nucleic Acids Research 2024
Dynamical behaviour is one of the most crucial protein characteristics. Despite the advances in the field of protein structure resolution and predicti...
SWORD2
Nucleic Acids Research 2022
Understanding the functions and origins of proteins requires splitting these macromolecules into fragments that could be independent in terms of folding, activity, or evolution. For that purpose, stru
MEDUSA
Journal of Molecular Biology 2021
MEDUSA: Prediction of Protein Flexibility from Sequence
Papers in Preparation
Currently writing manuscripts on my Ph.D. research — expected submission in 2025:
- Representation Learning with Adversarial Autoencoders — Compressing pLM embeddings into continuous latent spaces for protein generation
- Contrastive Learning for Fold Recognition — Triplet-based training to improve structural similarity detection beyond structure-based methods
Complete Bibliography
2025
Ragousandirane Radjasandirane, Gabriel Cretin, Julien Diharce, Alexandre G. de Brevern, Jean-Christophe Gelly. PATHOS: Predicting Variant Pathogenicity by Combining Protein Language Models and Biological Features.
Yann Vander Meersche, Gabriel Duval, Gabriel Cretin, Aria Gheeraert, Jean‐Christophe Gelly, Tatiana Galochkina. PEGASUS: Prediction of MD‐derived protein flexibility from sequence. Protein Science, 2025.
Aria Gheeraert, Thomas Bailly, Yani Ren, Ali Hamraoui, Julie Te, Yann Vander Meersche, Gabriel Cretin, Ravy Leon Foun Lin, Jean-Christophe Gelly, Serge Pérez, Frédéric Guyon, Tatiana Galochkina. DIONYSUS: a database of protein–carbohydrate interfaces. Nucleic Acids Research, 2025.
Charlotte Perin, Gabriel Cretin, Jean-Christophe Gelly. Hierarchical Analysis of Protein Structures: From Secondary Structures to Protein Units and Domains. Methods in Molecular Biology, 2025.
2024
Yann Vander Meersche, Gabriel Cretin, Aria Gheeraert, Jean-Christophe Gelly, Tatiana Galochkina. ATLAS: protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Research, 2024.
2023
Gabriel Cretin, Charlotte Périn, Nicolas Zimmermann, Tatiana Galochkina, Jean-Christophe Gelly, Lenore Cowen. ICARUS: flexible protein structural alignment based on Protein Units. Bioinformatics, 2023.
2022
Gabriel Cretin, Tatiana Galochkina, Yann Vander Meersche, Alexandre G de Brevern, Guillaume Postic, Jean-Christophe Gelly. SWORD2: hierarchical analysis of protein 3D structures. Nucleic Acids Research, 2022.
Nora El Jahrani, Gabriel Cretin, Alexandre G. de Brevern. CALR-ETdb, the database of calreticulin variants diversity in essential thrombocythemia. Platelets, 2022.
2021
Gabriel Cretin, Tatiana Galochkina, Alexandre de Brevern, Jean-Christophe Gelly. PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction. International Journal of Molecular Sciences, 2021.
Yann Vander Meersche, Gabriel Cretin, Alexandre G. de Brevern, Jean-Christophe Gelly, Tatiana Galochkina. MEDUSA: Prediction of Protein Flexibility from Sequence. Journal of Molecular Biology, 2021.
CV / Resume
Download the PDF and browse a short timeline snapshot.
-
PhD Thesis Defense
Université Paris Cité - "Deep learning approaches for protein analysis, prediction, and generation."
-
PhD Student
DSIMB Lab - Compression of Protein Language Model embeddings into a continuous latent space for generation, and protein structure analysis and prediction
-
Lead Linux System Administrator
Managing the full-stack infrastructure: 30+ GPU workstations, GPU cluster, 1 PB+ storage and containerized web services.
-
MSc (Master) - Biology, Computer Science, Bioinformatics
Université Paris Cité - with honors (rank 2/23).
-
BSc (Licence 3) - Biology, Computer Science, Bioinformatics
Université Paris Cité - rank 6/21.
-
Two-year degree (D.U.T) - Bioengineering & Bioinformatics
Université de Clermont-Ferrand (Campus Aurillac) - ranks 4/45 (1st year) and 5/34 (2nd year).
© Gabriel Cretin