Federico Cesarini

AI Engineer

I design and build production AI systems that solve real problems. At Lextel.ai, I lead the development of Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. My work spans the full stack of modern AI engineering: from training and deploying LLMs on Google Cloud, to building RAG pipelines and scalable Kubernetes infrastructure, to writing about it all on Medium.

Engineering the next wave of AI systems

I'm an AI Engineer at Lextel.ai (Tinexta Innovation Hub), where I lead the development of core AI features including Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. I build production systems that bring generative AI to the legal industry.

My expertise spans the full AI stack: from training NER models and building RAG pipelines, to deploying open-source LLMs on Kubernetes with autoscaling, to engineering high-performance Elasticsearch clusters for vector and hybrid search.

I hold a B.Sc. in Computer Engineering from Università degli Studi di Bergamo, and I regularly write on Medium about practical AI engineering topics.

AI & Generative AI

RAG, Agentic AI, LLM / SLM Deployment, GPT, Gemini, OpenAI Agents SDK, Google ADK, SmolAgents, NER, Fine-tuning, vLLM

Cloud & Infrastructure

GCP, Kubernetes (GKE), KEDA, Cloud Run, Cloud Functions, App Engine, Datastore, Docker

Search & Data

Elasticsearch, Vector Search, Hybrid Queries, Eland, ES Rally, Text Vectorization

Development & Ops

Python, Django, FastAPI, Celery, RabbitMQ, Redis, Prometheus, Grafana, Linux / Bash

Professional Experience

From cloud infrastructure to production AI systems.

AI Engineer
Lextel.ai (Tinexta Innovation Hub)
Feb 2025 - Present

Spearheaded the development of the core AI features powering the legal AI platform.

Legal Deep Research, Document Generation, Agentic Legal Chat
  • Architected a Kubernetes ingestion pipeline with KEDA autoscaling, doubling processing throughput
  • Integrated Celery into the Kubernetes pipeline, enabling parallel processing of large data volumes
  • Developed and deployed generative AI solutions for legal and patent data analysis
  • Created and optimized RAG pipelines for improved retrieval from legal documents
  • Led Elasticsearch cluster operations, engineering queries to boost retrieval performance (a hybrid query sketch follows this list)
  • Deployed ML models to Elasticsearch using Eland
  • Benchmarked production Elasticsearch clusters using ES Rally
  • Mentored junior developers, accelerating onboarding
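
For illustration, a minimal sketch of the hybrid retrieval pattern behind that query engineering: BM25 full-text matching combined with kNN vector search in a single Elasticsearch request. The endpoint, index, field names, and embedding model are placeholders, not the production setup.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Placeholder endpoint, index, and embedding model -- not the production setup.
es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim sentence embeddings

query_text = "termination clause in a commercial lease agreement"
query_vector = model.encode(query_text).tolist()

# Hybrid retrieval: lexical BM25 match plus kNN over a dense_vector field.
# When both `query` and `knn` are sent, Elasticsearch combines the scores.
response = es.search(
    index="legal-documents",        # hypothetical index name
    query={"match": {"text": query_text}},
    knn={
        "field": "embedding",       # dense_vector field (384 dims here)
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    size=10,
)

for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))
```
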
Software Engineer
Warrant Hub
Dec 2021 - Feb 2025
  • Developed GCP infrastructure solutions for scalable data processing
  • Trained NER models for material extraction from patent data, achieving 85% accuracy
  • Created NLP text ingestion solutions for unstructured patent information
  • Designed Elasticsearch text and vector databases for efficient retrieval
  • Built Django backend systems for data processing applications
  • Created Kubernetes pipelines for automated data processing workflows
Software Developer
GestApp srl, Bergamo
Jun - Oct 2020
  • Developed backend solutions using C# and VB.NET
  • Created and maintained business logic components for enterprise applications
Retail Employee
Apple Inc
Aug 2019 - Jan 2020
  • Provided exceptional customer service in an Apple Store environment
  • Assisted customers with technical issues and product recommendations

Key Projects

Open-source tools, production pipelines, and AI experiments.

linkAIin

Cloud Function that auto-generates LinkedIn content with GPT, web research, DALL-E images, and engagement analytics.

Python, Cloud Functions, GPT, DALL-E
Repository
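
A hedged sketch of the core generation step such a function can perform: drafting the post with a GPT chat model and producing a matching image with the DALL-E images endpoint. The model names and prompts are illustrative choices, not necessarily what the repository uses.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

topic = "Deploying open-source LLMs on GKE with vLLM"  # example topic

# Draft the post text with a chat model (model choice is illustrative).
post = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You write concise, engaging LinkedIn posts."},
        {"role": "user", "content": f"Write a short LinkedIn post about: {topic}"},
    ],
).choices[0].message.content

# Generate a matching illustration with the images endpoint.
image_url = client.images.generate(
    model="dall-e-3",
    prompt=f"Minimalist illustration for a LinkedIn post about {topic}",
    size="1024x1024",
    n=1,
).data[0].url

print(post)
print(image_url)
```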

ES Natural Language Query

GPT-powered engine that translates natural language into Elasticsearch queries using index mapping as context.

Elasticsearch, GPT API, NLP
Repository
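
Roughly, the approach works like this: fetch the index mapping, hand it to the model as context, and execute the query DSL it returns. The sketch below uses a hypothetical `products` index and an illustrative model; the repository may differ in the details.

```python
import json

from elasticsearch import Elasticsearch
from openai import OpenAI

es = Elasticsearch("http://localhost:9200")
llm = OpenAI()

index = "products"  # hypothetical index

# 1. Ground the model with the index mapping.
mapping = es.indices.get_mapping(index=index)[index]["mappings"]

question = "the five cheapest items in the 'books' category"
prompt = (
    "Given this Elasticsearch index mapping:\n"
    f"{json.dumps(mapping)}\n\n"
    f"Return only the JSON body of a search request answering: {question}"
)

# 2. Ask the model for a query DSL body (model choice is illustrative).
answer = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
).choices[0].message.content

# 3. Execute the generated request against the index.
body = json.loads(answer)
results = es.search(index=index, **body)
print(results["hits"]["total"])
```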

Embeddings K8s Pipeline

GKE pipeline for Lextel.ai: chunking, vectorizing and indexing at scale with KEDA, Celery, RabbitMQ and Grafana monitoring.

Kubernetes, Celery, RabbitMQ, Grafana
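
A condensed sketch of one worker task in a pipeline like this: Celery pulls documents from a RabbitMQ queue, chunks and embeds them, and bulk-indexes the vectors into Elasticsearch, while KEDA scales the worker pods on queue depth. Broker URL, index name, and embedding model are placeholders.

```python
from celery import Celery
from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer

# Broker URL is a placeholder; KEDA scales worker pods on this queue's depth.
app = Celery("embeddings", broker="amqp://guest:guest@rabbitmq:5672//")

es = Elasticsearch("http://elasticsearch:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model


def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size character chunking with overlap."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text), 1), step)]


@app.task(acks_late=True)
def embed_document(doc_id: str, text: str) -> int:
    """Chunk a document, embed each chunk, and bulk-index into Elasticsearch."""
    chunks = chunk(text)
    vectors = model.encode(chunks)
    actions = (
        {
            "_index": "doc-chunks",  # hypothetical index
            "_id": f"{doc_id}-{i}",
            "_source": {"doc_id": doc_id, "text": c, "embedding": v.tolist()},
        }
        for i, (c, v) in enumerate(zip(chunks, vectors))
    )
    helpers.bulk(es, actions)
    return len(chunks)
```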

NER Material Model

spaCy NER model extracting materials and components from technical texts with high precision.

spaCy, NER, ML, Python
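
Using such a model at inference time takes a few lines of spaCy; the model path and the MATERIAL label below are assumptions based on the description.

```python
import spacy

# Load the trained pipeline from disk (path is a placeholder).
nlp = spacy.load("models/ner_material")

text = (
    "The housing is machined from 6061 aluminium alloy, while the seals "
    "are moulded in EPDM rubber and the bearings use stainless steel."
)

doc = nlp(text)

# Print every extracted entity with its label, e.g. MATERIAL spans.
for ent in doc.ents:
    print(f"{ent.text!r:30} -> {ent.label_}")
```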

GCP Ingestion Pipeline

Multi-step pipeline with Cloud Functions, VMs, Datastore, and Cloud Run that processes PDFs into a NoSQL store.

GCP, Cloud Run, NoSQL
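
As a rough illustration of one step in a pipeline like this, a Cloud Function can take the extracted text of a PDF and persist it as a Datastore entity; the entity kind, field names, and request shape are illustrative, not the real schema.

```python
import functions_framework
from google.cloud import datastore

client = datastore.Client()


@functions_framework.http
def ingest_pdf(request):
    """HTTP-triggered step: store extracted PDF text and metadata in Datastore."""
    payload = request.get_json(silent=True) or {}

    # "Document" kind and field names are illustrative only.
    key = client.key("Document", payload.get("doc_id", "unknown"))
    entity = datastore.Entity(key=key, exclude_from_indexes=("text",))
    entity.update(
        {
            "source": payload.get("source", ""),
            "pages": payload.get("pages", 0),
            "text": payload.get("text", ""),
        }
    )
    client.put(entity)
    return {"stored": entity.key.name}, 200
```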

Italian Recipes RAG Bot

Scraped and vectorized Italian recipe data, then built a RAG chatbot that answers cooking questions with retrieved context.

RAG, Vector Search, GenAI
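
The retrieve-then-generate loop behind a bot like this fits in a short function; the index, field names, and models below are placeholders.

```python
from elasticsearch import Elasticsearch
from openai import OpenAI
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
llm = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model


def ask(question: str) -> str:
    # 1. Retrieve the most similar recipe chunks via vector search.
    hits = es.search(
        index="recipes",  # hypothetical index of vectorized recipes
        knn={
            "field": "embedding",
            "query_vector": embedder.encode(question).tolist(),
            "k": 4,
            "num_candidates": 50,
        },
    )["hits"]["hits"]
    context = "\n\n".join(h["_source"]["text"] for h in hits)

    # 2. Generate an answer grounded in the retrieved recipes.
    return llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model
        messages=[
            {"role": "system", "content": "Answer using only the recipes provided."},
            {"role": "user", "content": f"Recipes:\n{context}\n\nQuestion: {question}"},
        ],
    ).choices[0].message.content


print(ask("How do I make a proper carbonara without cream?"))
```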

Medium Publications

Practical AI engineering articles on LLM deployment, RAG, NER, and cloud infrastructure.

December 2025

NER in the LLM Era: How I Used Giant Models to Train Tiny Ones

Knowledge distillation techniques to train lightweight NER models using LLMs, reducing manual labeling effort dramatically.

Read article
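
The core idea, sketched under assumptions about the prompt and output format: let a large "teacher" model propose entity spans, then convert them into spaCy training examples for a small student pipeline.

```python
import json

import spacy
from openai import OpenAI
from spacy.tokens import DocBin

llm = OpenAI()
nlp = spacy.blank("en")

texts = ["The bracket is made of carbon fibre and anodised aluminium."]  # toy corpus
doc_bin = DocBin()

for text in texts:
    # Ask the teacher LLM for character-offset entity spans (format is assumed).
    reply = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative teacher model
        messages=[{
            "role": "user",
            "content": (
                "Extract MATERIAL entities from the text below. Return JSON like "
                '{"entities": [{"start": 0, "end": 5, "label": "MATERIAL"}]}.\n\n'
                + text
            ),
        }],
        response_format={"type": "json_object"},
    ).choices[0].message.content

    doc = nlp.make_doc(text)
    spans = []
    for ent in json.loads(reply).get("entities", []):
        span = doc.char_span(ent["start"], ent["end"], label=ent["label"])
        if span is not None:  # skip spans that do not align with token boundaries
            spans.append(span)
    doc.ents = spans
    doc_bin.add(doc)

# Silver training data for `spacy train`, produced without manual labeling.
doc_bin.to_disk("train.spacy")
```
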
November 2025

Deploy Any Open-Source LLM on Google Cloud Without a Single GPU

Complete guide to deploying Qwen, Mistral, and Llama on GKE with vLLM, autoscaling and scale-to-zero.

Read article
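
Once such a deployment is up, calling it is just the OpenAI client pointed at vLLM's OpenAI-compatible endpoint; the service URL and model name below are placeholders for whatever is deployed on GKE.

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; the URL and model below are placeholders.
client = OpenAI(
    base_url="http://vllm.example.internal:8000/v1",
    api_key="not-needed-for-a-private-deployment",
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # whichever open-source model is served
    messages=[{"role": "user", "content": "Summarise what vLLM does in one sentence."}],
    max_tokens=128,
)

print(response.choices[0].message.content)
```
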
November 2025

The AI Memory Problem: Why RAG Needs to Evolve

Why persistent memory is the missing piece in the AGI race and how next-gen RAG architectures are evolving.

Read article
October 2025

Deploy an SLM on Cloud Run and Scale to Zero

Self-host small language models with Docker and Ollama for full data privacy and a pay-only-when-used cost model.

Read article
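
Calling an Ollama-served SLM on Cloud Run comes down to a single REST request; the service URL and model tag below are placeholders.

```python
import requests

# Placeholder Cloud Run URL; Ollama exposes /api/chat by default.
OLLAMA_URL = "https://slm-service-xyz.a.run.app"

response = requests.post(
    f"{OLLAMA_URL}/api/chat",
    json={
        "model": "qwen2.5:3b",  # illustrative small model tag
        "messages": [{"role": "user", "content": "Give me three uses for an on-prem SLM."}],
        "stream": False,  # return a single JSON response instead of a stream
    },
    timeout=120,
)
response.raise_for_status()

print(response.json()["message"]["content"])
```
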
Education

B.Sc. Computer Engineering
Università degli Studi di Bergamo
Sep 2017 - Aug 2021

Languages: Italian (Native), English (C1)

Let's work together

Interested in AI engineering, cloud architecture, or the latest in LLMs? Reach out.