SISA: A Scale-In Systolic Array for GEMM Acceleration
📰 ArXiv cs.AI
SISA is a novel systolic array architecture for accelerating General Matrix-Matrix Multiplication (GEMM) in AI/ML workloads.
Action Steps
- Understand the limitations of traditional square Systolic Arrays (SAs) for GEMM operations in LLMs
- Design a scale-in systolic array architecture that can efficiently handle input-dependent, highly skewed (e.g., tall-and-skinny) matrix shapes
- Implement SISA using Processing Elements (PEs) and evaluate its performance on various AI/ML workloads
- Optimize SISA for specific use cases, such as LLMs and DNNs, to maximize its acceleration benefits
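To ground the steps above: a systolic array computes GEMM by streaming operands through a grid of Processing Elements, each of which multiplies, accumulates, and forwards data to its neighbors. The paper's exact SISA microarchitecture is not detailed in this digest, so the sketch below simulates a generic output-stationary square systolic array (the traditional baseline SISA improves on), with A streaming rightward and B streaming downward under the usual diagonal skew:

```python
def systolic_gemm(A, B):
    """Cycle-level simulation of an output-stationary systolic array
    computing C = A @ B.

    A is M x K, B is K x N. One PE per output element C[i][j]. A's rows
    stream rightward, B's columns stream downward, and each edge stream
    is skewed by its row/column index so matching operands meet at the
    right PE on the right cycle.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])

    a_reg = [[0.0] * N for _ in range(M)]  # A value held in PE(i, j)
    b_reg = [[0.0] * N for _ in range(M)]  # B value held in PE(i, j)
    acc = [[0.0] * N for _ in range(M)]    # output-stationary accumulators

    # The skewed wavefront needs M + N + K - 2 cycles to fill and drain.
    for t in range(M + N + K - 2):
        new_a = [[0.0] * N for _ in range(M)]
        new_b = [[0.0] * N for _ in range(M)]
        for i in range(M):
            for j in range(N):
                # A[i][k] enters row i at the left edge on cycle k + i,
                # then hops one PE to the right per cycle.
                if j == 0:
                    new_a[i][j] = A[i][t - i] if 0 <= t - i < K else 0.0
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                # B[k][j] enters column j at the top edge on cycle k + j,
                # then hops one PE down per cycle.
                if i == 0:
                    new_b[i][j] = B[t - j][j] if 0 <= t - j < K else 0.0
                else:
                    new_b[i][j] = b_reg[i - 1][j]
                # Multiply-accumulate; zero padding contributes nothing.
                acc[i][j] += new_a[i][j] * new_b[i][j]
        a_reg, b_reg = new_a, new_b
    return acc
```

On a skewed workload such as a 1x3 by 3x2 product, only one row of PEs in a square array does useful work, which illustrates the underutilization that motivates a scale-in design.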
Who Needs to Know This
AI engineers and researchers working on Large Language Models (LLMs) and Deep Neural Networks (DNNs) can benefit from SISA's efficient GEMM acceleration, which improves model throughput and reduces computational cost
Key Insight
💡 SISA's scale-in design enables more efficient execution of GEMM operations in LLMs and DNNs, leading to improved model performance and reduced computational costs
Share This
🚀 SISA: A novel systolic array architecture for accelerating GEMM operations in AI/ML workloads! 🤖
DeepCamp AI