feat: add 50 SGLang operator baselines and accuracy tests #35

Open: factnn wants to merge 1 commit into main from dev/sglang

factnn (Collaborator) commented Jun 16, 2026

Summary

50 SGLang operators added to KernelGenBench as a new sglang namespace.

Structure

  • src/kernelgenbench/dataset/baseline/sglang/ — 50 thin wrapper baselines (follows vLLM pattern)
  • src/kernelgenbench/accuracy/sglang/ — 50 accuracy + speedup test files
  • scripts/test_sglang_baselines.py — batch verification script
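
As a rough illustration of what a batch verification script might do, the sketch below walks a package and tries to import every direct submodule, collecting failures instead of stopping at the first one. The function name `check_imports` is hypothetical, and the real `scripts/test_sglang_baselines.py` may work differently. The demo runs against the standard-library `json` package so that it is runnable without sglang installed.

```python
import importlib
import pkgutil


def check_imports(package_name):
    """Try to import every direct submodule of a package.

    Returns (ok, failed): ok is a list of module names that imported,
    failed maps module name -> error message.
    """
    package = importlib.import_module(package_name)
    ok, failed = [], {}
    for info in pkgutil.iter_modules(package.__path__, prefix=package_name + "."):
        try:
            importlib.import_module(info.name)
            ok.append(info.name)
        except Exception as exc:
            # Keep going so one broken wrapper does not hide the others.
            failed[info.name] = f"{type(exc).__name__}: {exc}"
    return ok, failed


if __name__ == "__main__":
    # Stand-in package. For this PR the target would presumably be
    # "kernelgenbench.dataset.baseline.sglang" (assumed from the directory layout).
    ok, failed = check_imports("json")
    print(f"{len(ok)} imported, {len(failed)} failed")
    for name, err in failed.items():
        print(f"  {name}: {err}")
```

Collecting errors per module keeps the report useful when a single dependency (for example a missing compiled kernel) breaks only a subset of the wrappers.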

Operators by SGLang source module

| Module | Count | Operators |
| --- | --- | --- |
| layers/activation.py | 5 | silu_and_mul, gelu_and_mul, quick_gelu, new_gelu, xielu |
| layers/layernorm.py | 6 | rms_norm, layer_norm, gemma_rms_norm, gemma3_rms_norm, gemma4_rms_norm, rms_norm_without_scale |
| layers/rotary_embedding/ | 12 | rotary_embedding, mrotary_embedding, dual_chunk_rope, deepseek_scaling_rope, llama3_rope, dynamic_ntk_rope, linear_scaling_rope, phi3_long_rope, triton_mrope_fused, triton_ernie45_rope, apply_interleaved_rope_triton, dynamic_ntk_alpha_rope |
| layers/moe/ | 3 | fused_moe, topk, moe_align_block_size |
| layers/attention/fla/ | 8 | l2norm, rms_norm_gated, fused_recurrent_gated_delta_rule + variants, fused_gdn_gating, layer_norm_gated_fwd |
| layers/attention/mamba/ | 5 | causal_conv1d_fn, causal_conv1d_update, selective_scan_update, mamba_chunk_scan_combined_fwd, mixer2_rms_norm_gated |
| layers/elementwise.py | 6 | fused_dual_residual_rmsnorm, softcap, silu_and_mul_triton, gelu_and_mul_triton, fused_rmsnorm, experts_combine_triton |
| layers/gemma4_fused_ops.py | 2 | gemma_rmsnorm_residual_scalar, gemma_qkv_rmsnorm |
| layers/conv.py | 2 | conv2d_layer, conv3d_layer |
| layers/quantization/ | 1 | per_token_quant_int8 |
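
A quick sanity check on the module breakdown: the per-module counts listed above should add up to the 50 operators named in the title. The snippet below only restates those counts and sums them.

```python
# Operator counts per SGLang source module, copied from the breakdown above.
counts = {
    "layers/activation.py": 5,
    "layers/layernorm.py": 6,
    "layers/rotary_embedding/": 12,
    "layers/moe/": 3,
    "layers/attention/fla/": 8,
    "layers/attention/mamba/": 5,
    "layers/elementwise.py": 6,
    "layers/gemma4_fused_ops.py": 2,
    "layers/conv.py": 2,
    "layers/quantization/": 1,
}

total = sum(counts.values())
print(total)  # 50
```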

Test plan

  • Run python scripts/test_sglang_baselines.py on a machine with sglang installed and a CUDA driver compatible with sgl_kernel
  • Verify that all 50 baselines import and can be called without errors
  • Run the accuracy tests for the simple operators (activation/norm) as a smoke test
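
For the activation smoke test, the expected behaviour is easy to state in code. The sketch below is a pure-Python reference for silu_and_mul, assuming the usual convention that the last dimension is split in half, SiLU is applied to the first half, and the result is multiplied elementwise by the second half. The helper names and tolerances are illustrative only. The accuracy tests in this PR presumably compare GPU tensors and are not reproduced here.

```python
import math


def silu(x):
    """SiLU (swish): x * sigmoid(x), written to avoid overflow for large |x|."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def silu_and_mul_ref(row):
    """Reference for one row: silu(row[:d]) * row[d:], with d = len(row) // 2."""
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even")
    d = len(row) // 2
    return [silu(a) * b for a, b in zip(row[:d], row[d:])]


def allclose(actual, expected, rtol=1e-3, atol=1e-3):
    """Elementwise |a - e| <= atol + rtol * |e|, the same criterion as torch.allclose."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    # d = 3: first half [0.0, 1.0, -2.0] goes through SiLU, second half [3.0, 0.5, 2.0] multiplies it.
    row = [0.0, 1.0, -2.0, 3.0, 0.5, 2.0]
    out = silu_and_mul_ref(row)
    print(out)
    # A candidate kernel's output would be compared against the reference like this.
    print(allclose(out, silu_and_mul_ref(row)))
```

A scalar reference like this is slow, but it has no GPU or sgl_kernel dependency, so it can run on a machine where the CUDA driver is not usable.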

Note

All signatures were verified against the SGLang source code. GPU tests could not be run on the development server because its CUDA driver version is incompatible with sgl_kernel.

50 SGLang operators organized by SGLang source modules:
- layers/activation.py (5): silu_and_mul, gelu_and_mul, quick_gelu, new_gelu, xielu
- layers/layernorm.py (6): rms_norm, layer_norm, gemma_rms_norm, gemma3_rms_norm, gemma4_rms_norm, rms_norm_without_scale
- layers/rotary_embedding/ (12): rotary_embedding, mrotary_embedding, dual_chunk, deepseek_scaling, llama3, dynamic_ntk_scaling, linear_scaling, phi3_long_rope, triton_mrope_fused, triton_ernie45_rope_fused, apply_interleaved_rope_triton, dynamic_ntk_alpha
- layers/moe/ (3): fused_moe, topk, moe_align_block_size
- layers/attention/fla/ (8): l2norm, rms_norm_gated, fused_recurrent_gated_delta_rule, fused_recurrent_gated_delta_rule_update, fused_sigmoid_gating_delta_rule_update, fused_sigmoid_gating_delta_rule_packed_decode, fused_gdn_gating, layer_norm_gated_fwd
- layers/attention/mamba/ (5): causal_conv1d_fn, causal_conv1d_update, selective_scan_update, mamba_chunk_scan_combined_fwd, mixer2_rms_norm_gated
- layers/elementwise.py (6): fused_dual_residual_rmsnorm, softcap, silu_and_mul_triton, gelu_and_mul_triton, fused_rmsnorm, experts_combine_triton
- layers/gemma4_fused_ops.py (2): gemma_rmsnorm_residual_scalar, gemma_qkv_rmsnorm
- layers/conv.py (2): conv2d_layer, conv3d_layer
- layers/quantization/ (1): per_token_quant_int8

Each operator has:
- Thin wrapper baseline (follows vLLM pattern)
- Accuracy + speedup test (follows vLLM test pattern)
- Signatures verified against SGLang source

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>