I design and build production AI systems that solve real problems. At Lextel.ai, I lead the development of Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. My work spans the full stack of modern AI engineering: from training and deploying LLMs on Google Cloud, to building RAG pipelines and scalable Kubernetes infrastructure, to writing about it all on Medium.
I'm an AI Engineer at Lextel.ai (Tinexta Innovation Hub), where I lead the development of core AI features including Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. I build production systems that bring generative AI to the legal industry.
My expertise spans the full AI stack: from training NER models and building RAG pipelines, to deploying open-source LLMs on Kubernetes with autoscaling, to engineering high-performance Elasticsearch clusters for vector and hybrid search.
I hold a B.Sc. in Computer Engineering from Università degli Studi di Bergamo, and I regularly write on Medium about practical AI engineering topics.
From cloud infrastructure to production AI systems.
Open-source tools, production pipelines, and AI experiments.
Cloud Function that auto-generates LinkedIn content with GPT, web research, DALL-E images, and engagement analytics.
GPT-powered engine that translates natural language into Elasticsearch queries, using the index mapping as context.
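The core trick is to hand the model the index mapping so generated queries only reference fields that actually exist. A minimal sketch of the prompt-building step; the function name and mapping fields are illustrative, not the project's actual code:

```python
import json

def build_query_prompt(mapping: dict, question: str) -> str:
    """Render an LLM prompt pairing the index mapping with the user question."""
    fields = ", ".join(sorted(mapping.get("properties", {})))
    return (
        "You translate natural language into Elasticsearch Query DSL.\n"
        f"Available fields: {fields}\n"
        f"Index mapping:\n{json.dumps(mapping, indent=2)}\n"
        f"Question: {question}\n"
        "Respond with the JSON query only."
    )

# Toy mapping for illustration
mapping = {"properties": {"title": {"type": "text"}, "year": {"type": "integer"}}}
prompt = build_query_prompt(mapping, "papers about NER published after 2020")
```

The model's JSON response would then be validated and passed to the Elasticsearch search API.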
GKE pipeline for Lextel.ai: chunking, vectorizing and indexing at scale with KEDA, Celery, RabbitMQ and Grafana monitoring.
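A sketch of just the chunking step, with overlapping windows so context survives chunk boundaries. Window sizes and the whitespace tokenizer are assumptions; in the pipeline this work is distributed across Celery workers autoscaled by KEDA:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word windows of `size`, overlapping by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

Each chunk is then embedded and indexed; the overlap keeps sentences that straddle a boundary retrievable from either side.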
spaCy NER model extracting materials and components from technical texts with high precision.
Multi-step pipeline using Cloud Functions, VMs, Datastore, and Cloud Run to process PDFs into NoSQL documents.
Scraped and vectorized Italian recipe data, then built a RAG chatbot that answers cooking questions in context.
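The retrieval half of that chatbot boils down to ranking recipe chunks by similarity to the query embedding. A toy illustration with hand-made two-dimensional vectors; a real build would use an embedding model and a vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, k=1):
    """Return the k document texts whose vectors are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

# Hand-made vectors standing in for real embeddings
docs = [
    {"text": "Carbonara: eggs, guanciale, pecorino", "vec": [1.0, 0.1]},
    {"text": "Tiramisu: mascarpone, coffee, cocoa", "vec": [0.1, 1.0]},
]
```

The top-k texts are then stuffed into the LLM prompt so answers are grounded in the scraped recipes.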
Practical AI engineering articles on LLM deployment, RAG, NER, and cloud infrastructure.

Knowledge distillation techniques that use LLMs as teachers to train lightweight NER models, dramatically reducing manual labeling effort.
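The hand-off in that approach is converting entity strings proposed by the teacher LLM into the (start, end, label) character spans spaCy's NER trainer expects. A minimal sketch, with error handling and overlap resolution omitted; the function name is hypothetical:

```python
def to_spacy_example(text: str, llm_entities: list[tuple[str, str]]):
    """Map (surface string, label) pairs from an LLM to spaCy training spans."""
    spans = []
    for surface, label in llm_entities:
        start = text.find(surface)  # locate the entity in the source text
        if start != -1:
            spans.append((start, start + len(surface), label))
    return (text, {"entities": spans})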
Complete guide to deploying Qwen, Mistral, and Llama on GKE with vLLM, autoscaling and scale-to-zero.
Why persistent memory is the missing piece in the AGI race and how next-gen RAG architectures are evolving.
Self-host small language models with Docker and Ollama for full data privacy and pay-only-when-used economics.
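Once Ollama is running locally, generation is a single POST to its REST API. A minimal client sketch; the default URL and model name are assumptions, and nothing leaves your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "phi3") -> dict:
    # stream=False returns one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "phi3") -> str:
    """Send a prompt to a locally hosted Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swap the model name for any tag you have pulled with `ollama pull`; the same endpoint serves them all.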
Interested in AI engineering, cloud architecture, or the latest in LLMs? Reach out.