add matmul #820
Code Review
This pull request introduces TileLang DSL templates and comprehensive system tests (ST) for tgemv, tmatmul_acc, and tmatmul_bias operations, while also expanding the existing tmatmul test suite to support various data types and alignment configurations. Feedback on the changes highlights an accidental file copy leftover (tmatmul_template copy.py) that should be renamed. Additionally, two issues were identified in tmatmul/gen_data.py: a logic bug in the string-padding helper function check and an overly restrictive data type check that causes custom floating-point types like bfloat16 to incorrectly fall back to integer generation.
# INCLUDING BUT NOT LIMITED TO NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
# See LICENSE in the root of the software repository for the full text of the License.

"""TileLang DSL template for pto.tmatmul."""
def check(x, n):
    if len(x) < n:
        x = '0' * (n - len(x)) + x
    elif len(x) > n:
        x = x[1:]
    return x
The current implementation of check has a logic bug: if len(x) > n, it only slices off the first character (x = x[1:]), which does not guarantee the length is truncated to n if the original length was n + 2 or more. Using x[-n:].zfill(n) is a more robust and correct way to ensure the string is exactly n characters long, padded with zeros on the left if necessary.
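The difference in behavior can be sketched as follows. This is a minimal illustration, not code from the PR: the names `check_original` and `check_fixed` are hypothetical labels for the two variants, and the inputs are assumed to be unsigned digit strings with `n >= 1` (for `n == 0`, `x[-0:]` returns the whole string, and `zfill` treats a leading sign character specially).

```python
def check_original(x, n):
    """Helper as shown in the diff: pads short strings, trims only one character from long ones."""
    if len(x) < n:
        x = '0' * (n - len(x)) + x
    elif len(x) > n:
        x = x[1:]
    return x


def check_fixed(x, n):
    """Suggested replacement: keep the last n characters, then left-pad with zeros."""
    return x[-n:].zfill(n)


# Short input: both variants pad to the requested width.
print(check_original("101", 8), check_fixed("101", 8))

# Input one character too long: both variants drop the leading character.
print(check_original("1101", 3), check_fixed("1101", 3))

# Input three characters too long: the original still returns 5 characters,
# while the replacement returns exactly 3.
print(len(check_original("111101", 3)), len(check_fixed("111101", 3)))
```

Both variants agree whenever the input is at most one character longer than `n`; they diverge only for longer inputs, which is the case the comment describes.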
def check(x, n):
    return x[-n:].zfill(n)

if a_dtype in (np.float16, np.float32):
    a = np.random.uniform(-1.0, 1.0, size=(M, K)).astype(a_dtype)
    b = np.random.uniform(-1.0, 1.0, size=(K, N)).astype(b_dtype)
The condition if a_dtype in (np.float16, np.float32): does not match bfloat16 (from ml_dtypes) or other custom floating-point types, causing them to fall back to generating integer values. Using np.issubdtype(a_dtype, np.floating) is a more robust way to detect all floating-point types.
- if a_dtype in (np.float16, np.float32):
-     a = np.random.uniform(-1.0, 1.0, size=(M, K)).astype(a_dtype)
-     b = np.random.uniform(-1.0, 1.0, size=(K, N)).astype(b_dtype)
+ if np.issubdtype(a_dtype, np.floating):
+     a = np.random.uniform(-1.0, 1.0, size=(M, K)).astype(a_dtype)
+     b = np.random.uniform(-1.0, 1.0, size=(K, N)).astype(b_dtype)
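A small sketch of how the two checks differ on NumPy's built-in types. The helper names `is_float_by_membership` and `is_float_by_subdtype` are hypothetical and do not appear in the PR. Only built-in dtypes are exercised here. Whether `np.issubdtype` classifies an extension type such as `bfloat16` from `ml_dtypes` as floating depends on how that type is registered with NumPy, so that case is an assumption to verify in the target environment rather than something this example demonstrates.

```python
import numpy as np


def is_float_by_membership(dtype):
    """Original check: true only for the two listed types."""
    return dtype in (np.float16, np.float32)


def is_float_by_subdtype(dtype):
    """Suggested check: true for any type in NumPy's floating hierarchy."""
    return np.issubdtype(dtype, np.floating)


# float64 is rejected by the membership test but accepted by issubdtype;
# int8 is rejected by both.
for dt in (np.float16, np.float32, np.float64, np.int8):
    print(np.dtype(dt).name, is_float_by_membership(dt), is_float_by_subdtype(dt))
```

If an extension dtype turns out not to be recognized by `np.issubdtype`, one option is to keep the hierarchy check and add the extra types to an explicit allow-list.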
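Codex Review
This comment is updated automatically by the review bot.
Summary
Review failed at stage Findings. No structured findings were generated because the review process failed early.
Log Tail
A3 board test failed
Failed cases
A3 board test failure details: PR #820
orchestration_example_kernel_add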
vector_example_dag_kernel_add_scalar
paged_attention_example_kernel_pv_matmul
paged_attention_example_kernel_init_inplace
vector_example_dag_kernel_add
paged_attention_example_kernel_online_update
paged_attention_example_kernel_softmax_prepare
orchestration_example_kernel_add_scalar
paged_attention_example_kernel_qk_matmul
orchestration_example_kernel_mul
vector_example_dag_kernel_mul
prelu
plan_memory_bind_tile_alias_liveness
plan_memory_peak_exact_capacity
plan_memory_loop_no_reuse_outer_live
plan_memory_if_yield
plan_memory_loop_in_if
plan_memory_peak_8_overlapping
plan_memory_if_in_loop
plan_memory_fragmentation_hole_fit
plan_memory_for_iter_args_yield
plan_memory_no_reuse_overlap
plan_memory_reuse_sequential
plan_memory_nested_loops
plan_memory_fragmentation_two_holes
rems
partition_view_verify_rank_mismatch_valid
partition_view_verify_valid
partition5d_dynamic
partition5d
sparse_attn_test_incore_7
decode_hca_test_incore_54
attention_swa_test_incore_40
decode_swa_test_incore_40
decode_csa_test_incore_81
attention_hca_test_incore_54
attention_csa_test_refresh_incore_81
tensor_view_layout_dn
rope_kv_cache
qwen3_decode_incore_4
post_rmsnorm
qwen3_decode_incore_1
qwen3_decode_incore_10
qwen3_decode_incore_11
rmsnorm
qwen3_decode_incore_6
qwen3_decode_incore_2
qwen3_decode_incore_7
qwen3_decode_incore_5
qwen3_decode_incore_12
test_barrier_sync
matmul
add_double_dynamic
nested_loop_confliect
rar_optimization_test
test_dynamic_valid_shape
test_auto_sync_tail_hint
compensation_test
rem
No description provided.