Geoffrey Meric | Applied AI Engineer

Applied AI Engineer

I help build & evaluate agentic systems, deploy them to production, and monitor & iterate on them.

LLMOps & MLOps (improving reliability and reducing latency)
Model development & evaluation, with a focus on LLM classification (intent & routing, named-entity recognition, tool usage, LLM-as-a-judge) and generation (RAG generation, agent planning)
A/B experimentation at scale
Improving programmatic answering capabilities (routing, retrieval, generation, guard-railing) and agent planning capabilities
Developing context engineering & experience personalization solutions
Digging into product usage & system performance data to identify issues & improvement opportunities and understand user wants

Experience

Applied AI Engineer

Autodesk

Mar 2025 - Present Montreal, Canada

Distributed Backend Engineering (AWS), Science and LLMOps for Autodesk's Commerce & Support Agent
Designing full-lifecycle feature improvement experiments and progressive rollout & success metric measurement strategies, configuring A/B tests on the LaunchDarkly platform, and performing causal inference analysis on outcomes
Developing a planning evaluation framework & a named-entity recognition tool for ReAct LangChain Deep Agents
Primary MLOps responder in US timezones: monitoring operations, handling error spikes/VDB outages/alerts, shipping hotfixes, and improving observability via CloudWatch, Dynatrace, and the Opik LLMOps/eval platform
Latency optimization: deploying ECS + DynamoDB context engineering infrastructure to parallelize context computation (e.g. conversation summarization), reducing answer latency by ~1s (~10%) across 15k daily conversations
Backfilling LLM-as-a-judge conversation evaluation data for ~1M past conversations using Airflow & the OpenAI Batch API; creating a dashboard that semantically clusters user requests & source documents, shows query performance (latency, errors, escalations, LLM-as-a-judge metrics) filtered by product & request type, and helps identify content gaps
Creating personalization PoCs using inferred user intent from activity & recommender outputs to steer answering
Developing a synthetic evaluation pipeline using LLMs to generate test queries and ground-truth answers from support docs, distill RAG outputs into fact-only representations with stylistic noise removed, and classify responses as semantically equivalent, incomplete, hallucinated, or contradictory while measuring document retrieval recall
Fixing repeated client reinitialization, which caused high latency & resource leaks, by using singletons (10x Weaviate error reduction); rewriting high-latency event instrumentation services to buffer events asynchronously and send them in batches; resolving data bugs that caused A/B test sample ratio mismatch (SRM), UX inconsistencies, and errors for all LATAM & Norway users

Machine Learning Engineer

Serifos Technologies

Sep 2024 - Dec 2024 Montreal, Canada

Developed online economic sentiment analysis tools for institutional clients using GPT and BART-NLI models
Scraped and analyzed online forum discussions (X, Reddit, etc.), using GPT and BART-NLI language models to identify salient topics and topic modeling to surface trending themes outside of predefined indicators
Built a Streamlit UI to visualize user sentiment over time across economic indicators and to search comments via a vector database

Machine Learning Operations Engineering Intern

Autodesk

May 2024 - Aug 2024 Montreal, Canada

Created a Python-based programmatic evaluation system for Retrieval-Augmented Generation (RAG) models, using LLMs & NLI models to diagnose model failures and identify trends in queries that yield poor-quality responses
Improved conversation summarization by designing evaluation methods to compare summarization prompts

Software Development Intern

Autodesk

May 2023 - Aug 2023 Montreal, Canada

Built live job logging & monitoring tools for a cloud-rendering Autodesk Maya plug-in & tripled its throughput

Software Engineering Intern

Procter & Gamble

May 2022 - Aug 2022 Geneva, Switzerland

Built customer data platform architecture visualization and CRM campaign operation automation dashboards

Other online presence

Substack: Writing on AI, financial systems, and other topics.
Lichess: Bullet chess account. 2500+ bullet rating on Lichess.