Category: Uncategorized
-
Aug 2025: Kernel Pilot V1.0 is online for free use.
Kernel Pilot V1.0, an automatic, fast, and efficient CUDA code generation tool, is now available for free use! Try it!
-
Aug 2025: Kernel Pilot V1.0 beats TensorRT in many cases.
Results indicate that FSR-generated CUDA kernels significantly accelerate inference workloads across a wide range of TensorRT layers, and several tasks show notably high speedups. For example, on the Normalization Layer, Ragged Softmax Layer, and Reduce Layer, our method achieves speedups of 38.7×, 17.2×, and 20.6×, respectively, over the hand-written baseline. Additionally, (Convolution Layer)…
-
July 2025: Kernel Pilot website is public.
The Kernel Pilot website is set up, and the tools are public and free to use. If you have any suggestions or comments, please send us an email. We will keep improving Kernel Pilot.
-
June 2025: CUDA-LLM Paper is available on arXiv.
Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation. However, generating code that is deeply hardware-specific, architecture-aware, and performance-critical, especially for massively parallel GPUs, remains a complex challenge. In this work, we explore the use of LLMs for the automated generation and optimization of CUDA programs, with the goal of producing…