Andrew Cheung

League of Legends Draft Generation

This project is a full-stack web application that uses machine learning to generate optimal League of Legends team drafts based on real match data. I used Riot's API to collect and preprocess 14,000 unique matches, drawn from the 100 most recent matches played by the top 300 Challenger players (patch 15.8). Built with Flask, the site lets users enter an enemy draft, choose a region-specific model (NA, KR, or EUW), and receive a recommended team composition with its predicted win probability.

The site integrates two machine learning models:

Outcome Predictor: A binary classifier that estimates win probability based on team compositions.
Draft Generator: A transformer model that generates team drafts countering the opponent’s composition.
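
As a rough illustration of how a draft could be turned into classifier input, here is a minimal sketch that encodes two teams as a multi-hot vector. The six-champion pool, the `encode_match` helper, and the encoding itself are assumptions made for this example; they are not taken from the project's code, which may represent drafts differently.

```python
from typing import List, Sequence

# Hypothetical champion pool; a real model would cover every champion in the patch.
CHAMPIONS = ["Ahri", "Garen", "Jinx", "Lee Sin", "Thresh", "Yasuo"]
INDEX = {name: i for i, name in enumerate(CHAMPIONS)}


def encode_match(blue: Sequence[str], red: Sequence[str]) -> List[int]:
    """Return a multi-hot vector: the first half marks blue picks, the second half red picks."""
    vec = [0] * (2 * len(CHAMPIONS))
    for name in blue:
        vec[INDEX[name]] = 1
    for name in red:
        vec[len(CHAMPIONS) + INDEX[name]] = 1
    return vec


# Shortened two-champion teams keep the example small.
features = encode_match(["Ahri", "Garen"], ["Jinx", "Thresh"])
```

A vector like `features` is the kind of input a binary classifier could map to a win probability.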

Github

Moments

Moments is a social networking website that I built with React.js for the front end and Node.js with the Express.js framework for the back end. I used MongoDB Atlas to securely store user profiles and posts. Users register with an email, username, and password. On the home page, they can view and like posts from other users. They can also create their own posts, each with a title, image, and caption, and personalize their profile with a profile picture and a biography section.

Github

Mental Health Chatbot

In this project, I fine-tuned the open-source Llama 2 transformer model to act as a mental health assistant. My goal was to improve the accuracy and relevance of the model's responses in the context of mental health therapy. I used three datasets: the COUNSELING dataset (real therapist-patient responses from online platforms), the PHR dataset (synthetically generated counseling data), and data scraped from mental health-related subreddits such as r/offmychest, r/advice, r/mentalhealth, r/confessions, and r/self. I trained three models using different combinations of these datasets:

1000 rows of COUNSELING dataset

200 rows of COUNSELING + 1000 rows of PHR dataset

200 rows of COUNSELING + 1000 rows of Reddit web-scraped data

I compared these models against vanilla Llama 2 using the BLEU score on unseen data (the last 500 rows of the COUNSELING dataset). The fine-tuned models scored higher than the baseline on the 4-gram BLEU score.
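
For readers unfamiliar with the metric, here is a minimal sketch of a sentence-level 4-gram BLEU score. It assumes a single reference per response, whitespace tokenization, and no smoothing. The `bleu4` function is written for this example and is not the evaluation code used in the project, which may rely on a library implementation.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: List[str], reference: List[str]) -> float:
    """Geometric mean of clipped 1- to 4-gram precisions times a brevity penalty."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each n-gram count by how often it appears in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty punishes candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / 4)


score = bleu4("it is okay to ask for help".split(),
              "it is okay to ask for help today".split())
```

Here every n-gram of the candidate appears in the reference, so the score is determined entirely by the brevity penalty.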

Github

Retro-Doodle Jump

I developed this game using Python’s tkinter and time modules. Players control a character in an endless series of climbing levels, with each stage longer than the last. Every level is packed with enemies to avoid and helpful power-ups.

Here is how to play:

Use arrow keys to move left and right

Use space to jump

Reach the top black platform to win the stage

Green power-ups boost jump height when landed on

Only touch the red enemy platforms from the top; otherwise, you lose!

Github

Polygonal Face Detection and Graph Analysis

This program analyzes user-drawn line segments to determine whether their union forms a closed face. I implemented the Bentley-Ottmann sweep line algorithm with a self-balancing tree structure to detect line intersections efficiently.

Each line segment is represented as a node in a custom-built AVL tree that manages the Sweep Line Status (SLS), which keeps the segments in left-to-right order as a horizontal sweep line moves through them. The tree places nodes by performing left tests, which decide whether a segment is inserted as a left or right child. As the sweep line moves downward, segments are inserted into or removed from the AVL tree, and intersections between adjacent segments are identified and processed. When an intersection occurs, the two segments involved swap positions in the AVL tree, and potential new intersections with their new neighbors are checked. At any moment, the event queue contains only the highest intersection that has yet to be processed.
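
The left test and the adjacent-segment intersection check both reduce to a cross product. The sketch below shows one common way to write them. The function names are assumptions made for this example, and the code uses mathematical coordinates with y pointing up, so the sign of the left test flips on a canvas where y grows downward.

```python
from typing import Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Signed area of the parallelogram spanned by (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def is_left(a: Point, b: Point, p: Point) -> bool:
    """True when p lies strictly to the left of the directed line a -> b."""
    return cross(a, b, p) > 0


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True when segments p1-p2 and q1-q2 cross at a single interior point.

    Touching endpoints and collinear overlaps are deliberately not counted.
    """
    d1 = cross(q1, q2, p1)
    d2 = cross(q1, q2, p2)
    d3 = cross(p1, p2, q1)
    d4 = cross(p1, p2, q2)
    # Each segment's endpoints must lie on opposite sides of the other segment.
    return d1 * d2 < 0 and d3 * d4 < 0
```

For example, `segments_cross((0, 0), (2, 2), (0, 2), (2, 0))` holds for the two diagonals of a square, while two parallel segments never satisfy the test.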

Each intersection, once processed, is assigned a unique label, which is stored in the corresponding segment’s node. If two intersecting segments share the same label, it propagates through subsequent intersections along the structure. The key idea is that the topmost intersection label is passed down through all related segments. A closed face is identified when two intersecting segments already contain the same label, confirming that they have previously intersected and fully enclosed a region.

Github

String Matching

I created this project to explore and test different approaches to the string-matching problem, a fundamental algorithmic challenge with important applications in bioinformatics, such as mapping sequencing reads to genes or protein sequences. My goal was to implement and compare key data structures, including the suffix array, Burrows-Wheeler Transform (BWT), and FM-index, to see how they perform in efficient pattern searching.

Given a long text of length n and a short pattern of length m, the suffix array is constructed in O(n log n) time by prefix doubling: in round i, suffixes are sorted by their first 2ⁱ characters (a suboptimal but sufficient approach). The FM-index then searches for the pattern in reverse order, from its last character to its first, in O(m) time, using the BWT to determine character ranks and locate positions in the suffix array.
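
The construction and the backward search can be sketched as follows. This is a simplified illustration rather than the project's implementation: it sorts with Python's built-in sort in each doubling round, which costs O(n log² n) instead of the O(n log n) achievable with a radix sort, and it stores a full occurrence table for clarity. The names `suffix_array`, `build_fm_index`, and `fm_search` are assumptions, and the text is assumed not to contain the `$` sentinel.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def suffix_array(text: str) -> List[int]:
    """Prefix doubling: sort suffixes by their first 1, 2, 4, ... characters."""
    n = len(text)
    rank = [ord(c) for c in text]
    sa = list(range(n))
    k = 1
    while True:
        # A suffix is keyed by the ranks of its two halves of length k.
        key = lambda i: (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2


def build_fm_index(text: str) -> FMIndex:
    text += "$"  # sentinel, assumed smaller than every character in the text
    sa = suffix_array(text)
    bwt = [text[i - 1] for i in sa]  # character preceding each sorted suffix
    first, total = {}, 0  # first[c]: number of characters smaller than c
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, ch in enumerate(bwt):
        for c in first:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def fm_search(index: FMIndex, pattern: str) -> List[int]:
    """Return sorted start positions of pattern, scanning it right to left."""
    sa, first, occ = index
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each loop iteration in `fm_search` does constant work, which is where the O(m) search bound comes from. For example, `fm_search(build_fm_index("banana"), "ana")` returns the two overlapping matches at positions 1 and 3.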

The Knuth-Morris-Pratt (KMP) algorithm serves as the control: it finds all occurrences of a pattern of length m within a text of length n in O(n + m) time. Comparing against it checks the FM-index's correctness and puts its search speed in context.
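
For reference, a standard formulation of KMP looks like the sketch below. The function names are chosen for this example and need not match the project's code.

```python
from typing import List


def prefix_function(pattern: str) -> List[int]:
    """fail[i] is the length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = prefix_function(pattern)
    matches, k = [], 0  # k: number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return matches
```

Because the text index never moves backward, the scan is linear in n after the O(m) preprocessing of the pattern. Running both this and the FM-index search on the same inputs and comparing the returned positions is a simple correctness check.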

Github
