ML Performance Engineer at AWS optimizing large language model inference on custom ML accelerators. 6+ years of experience spanning ML systems, performance optimization, distributed systems, and data engineering. Proficient in ML frameworks, kernel-level optimization, and low-level systems analysis.
Experience
Amazon Web Services
January 2025 – Present
ML Performance Engineer
- Analyze and optimize inference performance of large language models, determining optimal sharding strategies across Trainium chips (AWS's custom ML accelerators)
- Lead development of end-to-end performance benchmarking and dashboarding infrastructure for monitoring model inference latency and throughput
- Profile model workloads and implement collective and kernel optimizations in MLP, MoE, and attention kernels to improve LLM inference performance
- Implement kernel integrations and framework improvements including collective scheme changes on the vLLM-Neuron framework (vLLM plugin for Neuron hardware backend)
- Organize a recurring paper reading group to grow the team's knowledge of ML systems and emerging research
Amazon
August 2021 – December 2024
Software Engineer
- Developed a real-time distributed platform and API for ad measurement using AWS CDK, EC2/ECS, ELB, and ElastiCache, enabling the onboarding of new advertiser groups not previously supported
- Developed and maintained systems producing attribution data for Modeled Attribution training
- Led system launches by performing data validation and load testing, building dashboards, and creating monitoring and alarming
- Worked cross-functionally with Product Management to develop new features, perform data investigations, address advertisers' concerns, and explain business logic
Royal Bank of Canada
February 2019 – August 2021
Data Engineer
- Developed a multicomponent streaming and cache-based application for address cleansing, leading to 30% cost savings compared with the previous process
- Created an end-to-end process for collecting, monitoring, and reporting application metrics across core components of the team's product
- Developed ETL pipelines with Spark and Airflow to merge multiple data sources for use as the back-end for an API
- Developed and managed various workflows with Airflow on OpenShift
Royal Bank of Canada
May 2018 – August 2018
Data Science Co-op
- Trained a machine learning model using PySpark and scikit-learn that reduced false positive rate by 40% for anti-money laundering/fraud screening of wire payments
- Used a design thinking process to understand the needs of our product's end users
- Demoed and explained the product to leaders of the AML department
Government of Canada
September 2017 – April 2018
Data Science Co-op, Ottawa
- Cleansed and preprocessed text data for NLP applications using pandas and NumPy
- Performed exploratory data analysis and evaluated predictive models
- Created an interactive visualization of results from a predictive model using Plotly
- Evaluated practical uses of Bayesian deep learning methods relevant to the team's needs
Centre for Vision Research — York University
May 2017 – September 2017
Research Intern
- Trained a convolutional neural network based on paper specifications using TensorFlow for particle detection within micrograph images
Technologies
Languages: Python, Java, Scala, SQL, TypeScript
ML & AI: PyTorch, vLLM, Neuron SDK, TensorFlow, scikit-learn, PySpark MLlib
Libraries & Frameworks: Spark, AWS CDK, Airflow, Hadoop, pandas, NumPy, Elasticsearch, OpenShift, Git, Maven
Cloud Services: AWS — Trainium/Inferentia, ECS, EC2, EMR, S3, SQS, CloudWatch, ElastiCache, DynamoDB
Education
York University
B.Sc., Honours in Computer Science