Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
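For readers unfamiliar with Triton, here is a minimal sketch of the kind of kernel such a tool emits (a simple elementwise add; the kernel name, wrapper, and block size are illustrative, not taken from the project above):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # Launch enough program instances to cover all elements.
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```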
The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Extends TileLang as a unified DSL to enable high-performance kernel development for Near-Memory Computing, Distributed-Memory AI Accelerators, and Networked Accelerators.
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
CUDA kernel library for LLM inference: FlashAttention, HGEMM, and Tensor Core GEMM, with pybind11 Python bindings.