Projects

Scaling Distributed GEMM on Cerebras Wafer-Scale Engine

In Large Language Models (LLMs), the fundamental operations of the transformer architecture are attention and multi-layer perceptron (MLP) computation, both of which are built on a massive amount of GEMM (General Matrix Multiply) and GEMV (General Matrix-Vector Multiplication). During inference, the decoding step (specifically GEMV) is memory-bandwidth bound because of the autoregressive nature of LLMs. That is, an accelerator such as an NVIDIA GPU spends most of its time loading data into its compute units for relatively little computation, since each step computes only one new token....
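To make the memory-bound claim concrete, here is a minimal sketch that compares the arithmetic intensity (FLOPs per byte moved) of a decode-step GEMV with a prefill-style GEMM. The 4096 hidden size, the 2048-token prefill length, fp16 (2-byte) elements, and ideal caching (each operand read once, the output written once) are illustrative assumptions, not figures from the post.

```python
def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte moved for C[m,n] = A[m,k] @ B[k,n].

    Assumes ideal caching: A and B are each read once and C is written once.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    flops = 2 * m * n * k  # one multiply and one add per (i, j, p) triple
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Decode step: one new token's activation (m=1) times a 4096x4096 weight -> GEMV.
gemv = arithmetic_intensity(m=1, n=4096, k=4096)
# Prefill: 2048 prompt tokens processed together (m=2048) -> GEMM.
gemm = arithmetic_intensity(m=2048, n=4096, k=4096)

print(f"GEMV: {gemv:.2f} FLOP/byte")  # prints "GEMV: 1.00 FLOP/byte"
print(f"GEMM: {gemm:.1f} FLOP/byte")  # prints "GEMM: 1024.0 FLOP/byte"
```

Under these assumptions the GEMV performs roughly one FLOP per byte of weights it loads, whereas a GPU's peak compute-to-bandwidth ratio is typically in the tens to hundreds of FLOPs per byte, so the decode step stalls on memory rather than on arithmetic.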

Note #2: Random Exploration on SGLang Kernel

This note details the SGLang rabbit hole I went down.

Note #1: How Computers Use Power

A brief history of computers: the precursor of modern computers, the Analytical Engine, was proposed by Charles Babbage in the 1830s. Although it was never fully built, it is considered the first design for a mechanical general-purpose computer. Intended to be powered by steam, it featured a CPU-like processor, memory, and programmable input via punched cards. This design laid the groundwork for key concepts in general-purpose computing, including the separation of processing and storage. Around the same time, Ada Lovelace was the first to recognize that the Analytical Engine had applications beyond pure calculation, and she is often regarded as the first computer programmer....

A survey on Inference Serving for Large Language Models

This blog post contains my review of Inference Serving for Large Language Models, written as part of the requirements for the R244 Large-scale data processing and optimisation course.

A review of TensorFlow by M. Abadi et al.

This blog post contains my review of TensorFlow, written as part of the requirements for the R244 Large-scale data processing and optimisation course.

Optimizing Scientific Applications on HPC Systems

This blog post offers an engineer’s perspective on optimizing the performance of scientific applications on heterogeneous HPC systems, drawing on experience from international HPC competitions and an internship at NSCC.

NTU Multidisciplinary Project (MDP)

MDP is a group-based project in which students design and develop a robot car. This blog post aims to be a starting guide for incoming students taking the course. The information is based on MDP AY24/25 Sem 2.