I've already opened a discussion about the issue below on the HF model page: https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/discussions/11
Adding it here as a GitHub issue to ensure visibility, as it might be overlooked in the HF discussions section.
Issue
In the illustration of the model architecture (Figure 3 in the paper), the KDA block includes a component labeled "Kimi Delta Attention". However, the naming seems slightly confusing – it appears inside the KDA block but actually refers to the modified Gated Delta Rule itself.
Could you please clarify what the component in the KDA block is officially called? Is it intended to be “Kimi Delta Attention” or, for example, “Kimi (Gated) Delta Rule”?

Additionally, could you please explain what the different shape symbols in Figure 3 represent on the paths for alpha, beta, and the output gate (the single trapezoid and the pair of trapezoids stacked in opposite orientations)?
Thank you!