Scaling Distributed GEMM on Cerebras Wafer-Scale Engine

In large language models (LLMs), the fundamental operations of the transformer architecture are attention and multi-layer perceptron computation, both of which are built on a large number of GEMM (General Matrix Multiply) and GEMV (General Matrix-Vector Multiplication) operations. During inference, the decoding step (specifically GEMV) is memory-bandwidth bound because of the autoregressive nature of LLMs (i.e., a GPU, such as an NVIDIA accelerator, spends most of its time loading data into the compute units for relatively little computation, producing only one new token per step....

March 30, 2026 · 76 min

Note #2: Random Exploration on SGLang Kernel

This note details the SGLang rabbit hole I went down.

February 7, 2026 · 24 min

Note #1: How Computers Use Power

A brief history of computers: The precursor of modern computers, the Analytical Engine, was proposed by Charles Babbage in the 1800s. Although it was never fully built, it was the first design for a mechanical general-purpose computer, powered by steam, that featured a CPU-like processor, memory, and programmable input devices such as punched cards. This design laid the groundwork for concepts in general-purpose computing, including the separation of processing and storage. Around the same time, Ada Lovelace was the first to recognize that the Analytical Engine had applications beyond pure calculation, and she is often regarded as the first computer programmer....

January 11, 2026 · 9 min

A survey on Inference Serving for Large Language Models

This post contains my review of inference serving for large language models, written as part of the requirements for the R244 Large-scale data processing and optimisation course.

December 23, 2025 · 13 min

A review of TensorFlow by M Abadi et al.

This post contains my review of TensorFlow, written as part of the requirements for the R244 Large-scale data processing and optimisation course.

October 25, 2025 · 12 min

Optimizing Scientific Applications on HPC Systems

This post offers an engineer’s perspective on optimizing the performance of scientific applications on heterogeneous HPC systems, drawing on experience from international HPC competitions and an internship at NSCC.

June 23, 2025 · 23 min

NTU Multidisciplinary Project (MDP)

MDP is a group-based project in which students design and develop a robot car. This post aims to provide a starting guide for incoming students who will be taking the course. The information is based on MDP AY24/25 Sem 2.

May 25, 2025 · 17 min