Applied AI Engineer
I help build & evaluate agentic systems, deploy them to production, and monitor & iterate on them.
- LLMOps & MLOps (improving reliability and reducing latency)
- Model development & evaluation, with a focus on LLM classification (intent & routing, named-entity recognition, tool usage, LLM-as-a-judge) and generation (RAG generation, agent planning)
- A/B experimentation at scale
- Improving programmatic answering capabilities (routing, retrieval, generation, guard-railing) and agent planning capabilities
- Developing context engineering & experience personalization solutions
- Digging into product usage & system performance data to identify issues & improvement opportunities and understand user wants
Experience
Applied AI Engineer
Autodesk
- Distributed Backend Engineering (AWS), Science and LLMOps for Autodesk's Commerce & Support Agent
- Designing full-lifecycle feature improvement experiments, progressive rollout & success metric measurement strategies, configuring A/B tests using the LaunchDarkly platform, performing causal inference analysis on outcomes
- Developing a planning evaluation framework & a named-entity recognition tool for ReAct LangChain Deep Agents
- Primary MLOps responder in US timezones: monitoring operations, handling error spikes/VDB outages/alerts, shipping hotfixes, and improving observability via CloudWatch, Dynatrace, and the Opik LLMOps/eval platform
- Latency optimization: deploying ECS + DynamoDB context engineering infrastructure to parallelize context computation (e.g. conversation summarization), reducing answer latency by ~1s (10%) across 15k daily conversations
- Backfilling LLM-as-a-judge conversation evaluation data for ~1M past conversations using Airflow & the OpenAI Batch API; creating a dashboard to semantically cluster user requests & source documents, inspect query performance (latency, errors, escalations, LLM-as-a-judge metrics) filtered by product & request type, and identify content gaps
- Creating personalization PoCs using inferred user intent from activity & recommender outputs to steer answering
- Developing a synthetic evaluation pipeline using LLMs to generate test queries and ground-truth answers from support docs, distill RAG outputs into fact-only representations with stylistic noise removed, and classify responses as semantically equivalent, incomplete, hallucinated, or contradictory while measuring document retrieval recall
- Fixing repeated client reinitialization, which caused high latency & resource leaks, by using singletons (10x Weaviate error reduction); rewriting high-latency event instrumentation services to buffer events asynchronously and send them in batches; resolving faulty-data bugs that caused A/B test sample ratio mismatch (SRM), UX inconsistencies, and errors for all LATAM & Norway users
Machine Learning Engineer
Serifos Technologies
- Developed online economic sentiment analysis tools for institutional clients using GPT and BART-NLI models
- Scraped and analyzed online forum discussions (X, Reddit, etc.), using GPT and BART-NLI language models to identify salient topics, and performed topic modeling to identify trending themes outside of predefined indicators
- Built a Streamlit UI to visualize user sentiment over time across economic indicators and search comments via a vector DB
Machine Learning Operations Engineering Intern
Autodesk
- Created a Python-based programmatic evaluation system for Retrieval-Augmented Generation (RAG) models to diagnose model failures and identify trends in queries that yield poor-quality responses, using LLMs & NLI models
- Improved conversation summarization by designing evaluation methods to compare summarization prompts
Software Development Intern
Autodesk
- Built live job logging & monitoring tools for a cloud rendering Autodesk Maya plug-in & tripled its throughput
Software Engineering Intern
Procter & Gamble
- Built customer data platform architecture visualization and CRM campaign operation automation dashboards