Andrew Cheung

LinkedIn | Github | Resume

Education

Stony Brook University 2022 - 2024
Master's in Applied Mathematics and Statistics
Cornell University 2015 - 2019
Bachelor's in Biological Engineering

Work Experience

Epic Systems 2024 - Present
Technical Solutions Engineer
Rockefeller University 2019 - 2022
Research Assistant

About Me

Hi! I am a Technical Solutions Engineer at Epic Systems. I focus on customer success for Ambulatory, which is used by clinics and outpatient groups. My role involves troubleshooting issues, escalating critical concerns, and implementing code fixes to improve functionality. I also take part in decisions about optimizing workflows for millions of users. This experience has strengthened my appreciation for well-designed software and its ability to solve real-world problems.

I am passionate about applying my background in computational biology and applied mathematics to software engineering and data science. I have completed advanced graduate coursework in algorithms (including string matching and computational geometry) and machine learning (covering statistical learning and NLP).

Andrew at beach

Interests

I grew up in New York City and love eating good food. Some of my hobbies include playing video games, reading books, and enjoying the outdoors. My favorite shows include Community and Attack on Titan.

Technical Skills

Languages: Python, MATLAB, JavaScript, R, C/C++, SQL, MUMPS (M)
Front-End & Back-End: HTML, CSS, MongoDB, React.js, Express.js, Node.js, Git, REST API

Projects


League of Legends Draft Generation


This project is a full-stack web application that uses machine learning to generate optimal League of Legends team drafts based on real match data. Using Riot's API, I collected and preprocessed 14,000 unique matches taken from the 100 most recent matches played by the top 300 Challenger players (patch 15.8). Built with Flask, the site lets users input an enemy draft, choose a region-specific model (NA, KR, EUW), and receive a recommended team composition with its predicted win probability.

The site integrates two machine learning models:

  • Outcome Predictor: A binary classifier that estimates win probability based on team compositions.
  • Draft Generator: A transformer model that generates team drafts countering the opponent’s composition.
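One common way to feed team compositions to a binary classifier is a multi-hot vector with one slot per champion per side. The sketch below illustrates that idea only; the champion list and the function name `encode_match` are hypothetical, and the project's actual feature encoding may differ.

```python
# Hypothetical champion pool for illustration; a real model would use the full roster.
CHAMPIONS = ["Ahri", "Garen", "Jinx", "Lee Sin", "Thresh", "Zed"]
INDEX = {name: i for i, name in enumerate(CHAMPIONS)}


def encode_match(ally_team, enemy_team):
    """Return a multi-hot vector: first half marks allies, second half marks enemies."""
    n = len(CHAMPIONS)
    vec = [0] * (2 * n)
    for name in ally_team:
        vec[INDEX[name]] = 1
    for name in enemy_team:
        vec[n + INDEX[name]] = 1
    return vec


features = encode_match(["Ahri", "Jinx"], ["Zed"])
```

A vector like `features` could then be passed to any binary classifier that outputs a win probability.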

Github

GIF of League Draft Generator

Moments


Moments is a social networking website I built with React.js on the front end and Node.js with the Express.js framework on the back end. User profiles and posts are stored securely in MongoDB Atlas. Users register with an email, username, and password. On the home page, they can view and like posts from other users. They can also create their own posts, each with a title, image, and caption. Customizable profiles allow for personalization with a profile picture and biography section.

Github

Picture of Moments

Mental Health Chatbot


In this project, I fine-tuned the open-source Llama 2 transformer model to act as a mental health assistant. My goal was to improve the accuracy and relevance of the model's responses in the context of mental health therapy. Three datasets were used: the COUNSELING dataset (real therapist-patient exchanges from online platforms), the PHR dataset (synthetically generated counseling data), and data scraped from mental health-related subreddits such as r/offmychest, r/advice, r/mentalhealth, r/confessions, and r/self. Three models were trained on different combinations of these datasets:

  • 1000 rows of COUNSELING dataset
  • 200 rows of COUNSELING + 1000 rows of PHR dataset
  • 200 rows of COUNSELING + 1000 rows of Reddit web-scraped data

These models were compared with the vanilla Llama 2 using the BLEU score on unseen data (the last 500 rows of the COUNSELING dataset). The fine-tuned models scored higher than the baseline, as measured by the 4-gram BLEU score.
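For readers unfamiliar with the metric, here is a minimal sketch of a sentence-level 4-gram BLEU score against a single reference, with uniform weights and no smoothing. It is an illustration only: the project's evaluation code may have used a library implementation, and the whitespace tokenization and the function name `bleu4` are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed BLEU with n = 1..4 against one reference (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / 4)
```

Because the score is a geometric mean of the four precisions, short responses with no matching 4-gram score zero, which is why smoothed variants are often preferred for short texts.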

    Github

    Retro-Doodle Jump


    I developed this game using Python's tkinter and time modules. Players control a character in an infinite level-climbing adventure, with stages growing progressively longer. Each level is packed with enemies to defeat and helpful power-ups.

    Here is how to play:

  • Use arrow keys to move left and right
  • Use space to jump
  • Reach the top black platform to win the stage
  • Green power-ups boost jump height when landed on
  • Only touch the red Enemy platforms from the top, otherwise you lose!

    Github

    Picture of Game

    Polygonal Face Detection and Graph Analysis


    This program analyzes user-drawn line segments to determine whether their union forms a closed face. I implemented the Bentley-Ottmann sweep line algorithm with a self-balancing tree structure to efficiently detect line intersections.

    Each line segment is represented as a node within a custom-built AVL tree that manages the Sweep Line Status (SLS): the set of segments currently crossed by the horizontal sweep line, kept in left-to-right order. The tree places nodes by performing left tests, which decide whether a segment should be inserted as a left or right child. As the sweep line moves downward, segments are inserted into or removed from the AVL tree, and intersections between adjacent segments are identified and processed. When an intersection occurs, the involved segments swap positions in the AVL tree, and their new neighbors are checked for potential intersections. At any moment, the event queue contains only the highest intersection that has yet to be processed.
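The left test and the adjacent-segment intersection check described above can both be built on a single cross product. The following is a minimal sketch under the usual mathematical convention (y grows upward); screen coordinates with y growing downward flip the sign. The function names are illustrative and are not taken from the project.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); positive if b lies left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_left(segment, point):
    """Left test: is `point` strictly to the left of the directed segment?"""
    p, q = segment
    return cross(p, q, point) > 0


def segments_cross(s1, s2):
    """True if the two segments cross at a single interior point."""
    (a, b), (c, d) = s1, s2
    return (cross(a, b, c) * cross(a, b, d) < 0
            and cross(c, d, a) * cross(c, d, b) < 0)


def intersection_point(s1, s2):
    """Intersection of the two supporting lines, or None if they are parallel."""
    (a, b), (c, d) = s1, s2
    rx, ry = b[0] - a[0], b[1] - a[1]
    sx, sy = d[0] - c[0], d[1] - c[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        return None  # parallel or collinear lines
    t = ((c[0] - a[0]) * sy - (c[1] - a[1]) * sx) / denom
    return (a[0] + t * rx, a[1] + t * ry)
```

In a sweep, only segments that are neighbors in the status structure need to be passed to `segments_cross`, which is what keeps the number of tests low.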

    Each intersection, once processed, is assigned a unique label, which is stored in the corresponding segment’s node. If two intersecting segments share the same label, it propagates through subsequent intersections along the structure. The key idea is that the topmost intersection label is passed down through all related segments. A closed face is identified when two intersecting segments already contain the same label, confirming that they have previously intersected and fully enclosed a region.

    Github

    Line Intersection

    String Matching


    I created this project to explore and test different approaches to the string-matching problem, a fundamental algorithmic challenge with important applications in bioinformatics, such as mapping sequencing reads to genes or protein sequences. My goal was to implement and compare key data structures, including the suffix array, Burrows-Wheeler Transform (BWT), and FM-index, to see how they perform in efficient pattern searching.

    Given a long text of length n and a short pattern of length m, the suffix array is constructed in O(n log n) time by sorting suffixes based on their first 2^i characters in round i (a suboptimal but sufficient approach). The FM-index enables efficient pattern searching in reverse order in O(m) time, leveraging the BWT to determine character ranks and locate positions in the suffix array.
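The pipeline above can be sketched in a few functions: prefix-doubling suffix array construction, the BWT read off the suffix array, and FM-index backward search. This is a simplified illustration, not the project's code. It assumes the text ends with a unique sentinel `$` that sorts before every other character, and it uses Python's built-in sort in each round, which gives O(n log^2 n) rather than the O(n log n) achievable with a radix sort per round. All function names are hypothetical.

```python
from collections import Counter


def suffix_array(text):
    """Prefix doubling: each round sorts suffixes by twice as many leading characters."""
    n = len(text)
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


def build_fm_index(text):
    """Return (sa, bwt, C, occ) for a text that ends with the sentinel '$'."""
    sa = suffix_array(text)
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each sorted suffix
    counts = Counter(text)
    C, total = {}, 0  # C[c]: number of characters in text smaller than c
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i] (a full table, O(n * alphabet) memory)
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (c == ch)
    return sa, bwt, C, occ


def fm_search(pattern, sa, C, occ):
    """Backward search: process the pattern right to left, narrowing a suffix array range."""
    lo, hi = 0, len(sa)
    for c in reversed(pattern):
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C, occ = build_fm_index("banana$")
positions = fm_search("ana", sa, C, occ)
```

Each pattern character costs two table lookups, which is where the O(m) search time comes from; practical implementations sample the `occ` table and the suffix array to save memory.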

    The Knuth-Morris-Pratt (KMP) algorithm serves as a control: it identifies all occurrences of a pattern of length m within a text of length n in O(n + m) time. Comparing against it helps assess the FM-index's accuracy and efficiency.
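For reference, a compact KMP search looks like the sketch below. It is a standard textbook formulation rather than the project's own implementation, and the function name `kmp_search` is illustrative.

```python
def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    # fail[i]: length of the longest proper prefix of pattern[:i + 1] that is also its suffix
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    matches, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading earlier text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping matches
    return matches
```

Running both this function and an index-based search on the same inputs and comparing the returned position lists is a simple way to check the index for correctness.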

    Github