Model Architecture Naming: KDA #7

@d-kleine

I've already opened a discussion about the issue below on the HF model page: https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/discussions/11

Adding it here as a GitHub issue to ensure visibility, as it might be overlooked in the HF discussions section.

Issue

In the illustration of the model architecture (Figure 3 in the paper), the KDA block includes a component labeled "Kimi Delta Attention". However, the naming seems slightly confusing – it appears inside the KDA block but actually refers to the modified Gated Delta Rule itself.

Could you please clarify what the component in the KDA block is officially called? Is it intended to be “Kimi Delta Attention” or, for example, “Kimi (Gated) Delta Rule”?

[Image: KDA]

Additionally, could you please explain what the different shape symbols (the single trapezoid and the pair of trapezoids stacked in opposite orientations) represent on the paths for alpha, beta, and the output gate in Figure 3?

Thank you!
