I design and build production AI systems that solve real problems. At Lextel.ai, I lead the development of Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. My work spans the full stack of modern AI engineering: from training and deploying LLMs on Google Cloud, to building RAG pipelines and scalable Kubernetes infrastructure, to writing about it all on Medium.
I'm an AI Engineer at Lextel.ai (Tinexta Innovation Hub), where I lead the development of core AI features including Legal Deep Research, AI-powered Document Generation, and Agentic Legal Chat. I build production systems that bring generative AI to the legal industry.
My expertise spans the full AI stack: from training NER models and building RAG pipelines, to deploying open-source LLMs on Kubernetes with autoscaling, to engineering high-performance Elasticsearch clusters for vector and hybrid search.
I hold a B.Sc. in Computer Engineering from Università degli Studi di Bergamo, and I regularly write on Medium about practical AI engineering topics.
From cloud infrastructure to production AI systems.
Open-source tools, production pipelines, and AI experiments.
Cloud Function that auto-generates LinkedIn content with GPT, web research, DALL-E images, and engagement analytics.
GPT-powered engine that translates natural language into Elasticsearch queries, using the index mapping as context.
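The core trick is to hand the model the index mapping so generated queries only reference fields that actually exist. A minimal sketch of the prompt-building step; the function name and mapping fields are illustrative, not the project's actual code:

```python
import json

def build_query_prompt(mapping: dict, question: str) -> str:
    """Render an LLM prompt pairing the index mapping with the user question."""
    fields = ", ".join(sorted(mapping.get("properties", {})))
    return (
        "You translate natural language into Elasticsearch Query DSL.\n"
        f"Available fields: {fields}\n"
        f"Index mapping:\n{json.dumps(mapping, indent=2)}\n"
        f"Question: {question}\n"
        "Respond with the JSON query only."
    )

# Toy mapping for illustration
mapping = {"properties": {"title": {"type": "text"}, "year": {"type": "integer"}}}
prompt = build_query_prompt(mapping, "papers about NER published after 2020")
```

The model's JSON response would then be validated and passed to the Elasticsearch search API.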
GKE pipeline for Lextel.ai: chunking, vectorizing and indexing at scale with KEDA, Celery, RabbitMQ and Grafana monitoring.
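A sketch of just the chunking step, with overlapping windows so context survives chunk boundaries. Window sizes and the whitespace tokenizer are assumptions; in the pipeline this work is distributed across Celery workers autoscaled by KEDA:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word windows of `size`, overlapping by `overlap` words."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

Each chunk is then embedded and indexed; the overlap keeps sentences that straddle a boundary retrievable from either side.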
spaCy NER model extracting materials and components from technical texts with high precision.
Multi-step pipeline using Cloud Functions, VMs, Datastore, and Cloud Run to process PDFs into NoSQL documents.
Scraped and vectorized Italian recipe data, then built a RAG chatbot that answers cooking questions in context.
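The retrieval half of that chatbot boils down to ranking recipe chunks by similarity to the query embedding. A toy illustration with hand-made two-dimensional vectors; a real build would use an embedding model and a vector store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, docs, k=1):
    """Return the k document texts whose vectors are closest to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:k]]

# Hand-made vectors standing in for real embeddings
docs = [
    {"text": "Carbonara: eggs, guanciale, pecorino", "vec": [1.0, 0.1]},
    {"text": "Tiramisu: mascarpone, coffee, cocoa", "vec": [0.1, 1.0]},
]
```

The top-k texts are then stuffed into the LLM prompt so answers are grounded in the scraped recipes.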
Practical AI engineering articles on LLM deployment, RAG, NER, and cloud infrastructure.

Knowledge distillation techniques that use LLMs as teachers to train lightweight NER models, dramatically reducing manual labeling effort.
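The hand-off in that approach is converting entity strings proposed by the teacher LLM into the (start, end, label) character spans spaCy's NER trainer expects. A minimal sketch, with error handling and overlap resolution omitted; the function name is hypothetical:

```python
def to_spacy_example(text: str, llm_entities: list[tuple[str, str]]):
    """Map (surface string, label) pairs from an LLM to spaCy training spans."""
    spans = []
    for surface, label in llm_entities:
        start = text.find(surface)  # locate the entity in the source text
        if start != -1:
            spans.append((start, start + len(surface), label))
    return (text, {"entities": spans})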
Complete guide to deploying Qwen, Mistral, and Llama on GKE with vLLM, autoscaling and scale-to-zero.
Why persistent memory is the missing piece in the AGI race and how next-gen RAG architectures are evolving.
Self-host small language models with Docker and Ollama for full data privacy and pay-only-when-used economics.
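Once Ollama is running locally, generation is a single POST to its REST API. A minimal client sketch; the default URL and model name are assumptions, and nothing leaves your machine:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "phi3") -> dict:
    # stream=False returns one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "phi3") -> str:
    """Send a prompt to a locally hosted Ollama server and return its reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Swap the model name for any tag you have pulled with `ollama pull`; the same endpoint serves them all.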
Interested in AI engineering, cloud architecture, or the latest in LLMs? Reach out.