
wzbxpy/codec


CoDec

This repository contains the code for CoDec: Prefix-Shared Decoding Kernel for LLMs.

Environment

For CoDec on NVIDIA GPUs: CUDA Toolkit 12.9
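To confirm the installed toolkit version before building, a small check like the following can help. It is a sketch, not part of the repository: it assumes `nvcc` is on `PATH` and parses the `release X.Y` field that `nvcc --version` prints. The function names `cuda_release` and `check_cuda` are hypothetical, and the check flags any release other than 12.9.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository): read the CUDA release
# reported by nvcc and compare it with the 12.9 toolkit listed above.

# Read `nvcc --version` output on stdin and print the "X.Y" release number.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

check_cuda() {
  local expected="12.9" found
  if ! command -v nvcc >/dev/null 2>&1; then
    echo "nvcc not found; is the CUDA Toolkit on PATH?" >&2
    return 1
  fi
  found="$(nvcc --version | cuda_release)"
  if [ "$found" = "$expected" ]; then
    echo "CUDA $found: OK"
  else
    echo "CUDA $found found; this README lists CUDA Toolkit $expected" >&2
    return 1
  fi
}
```

Calling `check_cuda` after installing the toolkit prints `CUDA 12.9: OK` on a matching setup and returns a non-zero status otherwise.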

For CoDec on Ascend, the project requires the following software environment:

OS                     CANN    gcc    cmake   Python
Ubuntu 20.04.5         8.5.0   9.3    3.16    3.10
Ubuntu 22.04.5         8.5.0   11.3   3.22    3.10
openEuler 22.03 SP4    8.5.0   10.3   3.22    3.10
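A quick way to compare a machine against the table is to check gcc, cmake, and Python versions in one go. This is a hedged sketch: it assumes the lowest versions in the table (gcc 9.3, cmake 3.16, Python 3.10) are acceptable minimums, although the table lists tested combinations rather than stated minimums. It relies on GNU `sort -V` and does not check the CANN version. `version_ge`, `report`, and `check_toolchain` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository): compare local tool
# versions with the lowest versions listed in the table above.
# Assumption: those lowest versions are acceptable minimums.

# Succeed if version $1 >= version $2 (uses GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print OK or WARN for one tool.
report() {
  local tool="$1" have="$2" need="$3"
  if [ -n "$have" ] && version_ge "$have" "$need"; then
    echo "OK    $tool $have (>= $need)"
  else
    echo "WARN  $tool ${have:-not found} (table lists >= $need)"
  fi
}

check_toolchain() {
  report gcc    "$(gcc -dumpfullversion 2>/dev/null)" 9.3
  report cmake  "$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')" 3.16
  report python "$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)" 3.10
}
```

Running `check_toolchain` prints one OK or WARN line per tool; a WARN is a hint to compare against the table, not a hard failure.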

Installation

For CoDec on GPU, install PyTorch first, then install this package in editable mode:

uv pip install torch
uv pip install -Ue . --no-build-isolation

For CoDec on Ascend, follow these steps:

  1. Install the Community Edition CANN toolkit package

Download the CANN toolkit package Ascend-cann-toolkit_{version}_linux-{arch}.run that matches your Ascend product; the download link is on the CANN toolkit page.

Then install the CANN toolkit package (for details, refer to the CANN Installation Guide).

# Ensure the installer is executable
chmod +x Ascend-cann-toolkit_{version}_linux-{arch}.run
# Install the CANN toolkit package
./Ascend-cann-toolkit_{version}_linux-{arch}.run --full --force --install-path=${install_path}
# Enable the CANN environment. For the default installation path (root user):
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# For non-root users or a custom path, source ${install_path}/ascend-toolkit/set_env.sh instead

where:
  • {version}: CANN package version.
  • {arch}: system architecture (e.g., x86_64 or aarch64).
  • {install_path}: installation path; defaults to /usr/local/Ascend.
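To avoid typos when filling in the placeholders, the install commands above can be wrapped in a small function. This is a hedged sketch: `cann_pkg_name` and `install_cann` are hypothetical names, and it assumes `{arch}` equals the output of `uname -m` and that the installer sits in the current directory. Sourcing `set_env.sh` afterwards is still done by hand, as described in this step.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the CANN install commands (not part of this repository).

# Build the installer file name from {version} and {arch}.
cann_pkg_name() {
  local version="$1" arch="$2"
  printf 'Ascend-cann-toolkit_%s_linux-%s.run\n' "$version" "$arch"
}

# Install CANN {version} into {install_path} (default /usr/local/Ascend).
# Assumption: {arch} matches `uname -m`, and the .run file is in the current directory.
install_cann() {
  local version="$1" install_path="${2:-/usr/local/Ascend}" pkg
  pkg="$(cann_pkg_name "$version" "$(uname -m)")"
  if [ ! -f "$pkg" ]; then
    echo "installer $pkg not found in $(pwd)" >&2
    return 1
  fi
  chmod +x "$pkg"
  "./$pkg" --full --force --install-path="$install_path"
}
```

For example, `install_cann 8.5.0` installs the 8.5.0 toolkit for the current architecture into the default path.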
  2. Download and install dependencies

Clone this repository, install its dependencies, and build the example with the following commands:

# Download the project source code
git clone https://github.com/wzbxpy/codec.git
cd codec
# Install the Python dependencies listed in the project's requirements file.
# Build the specified example
cd catlass-faInfer-shared-prefix
bash scripts/build.sh flash_attention_infer_tla

If the following message appears, the build is successful.

"[INFO] Target "{flash_attention_infer_tla}" built successfully."
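In scripts or CI, the success message can be checked automatically. A minimal sketch, assuming the message appears on the build's stdout or stderr and that matching the words `built successfully` is enough; `build_succeeded` and `build_example` are hypothetical helpers, run from `catlass-faInfer-shared-prefix` like the build command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# Succeed if the build output read on stdin contains the success message.
build_succeeded() {
  grep -q 'built successfully'
}

# Run the build for one example target, keep a log, and check for the message.
build_example() {
  local target="$1" log
  log="$(mktemp)"
  bash scripts/build.sh "$target" 2>&1 | tee "$log"
  if build_succeeded < "$log"; then
    echo "build of $target succeeded (log: $log)"
  else
    echo "build of $target failed (log: $log)" >&2
    return 1
  fi
}
```

Usage: `build_example flash_attention_infer_tla`. The check looks at the log rather than the script's exit status, since this README only documents the message.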
  3. Run the operator

We provide a script for running and testing the operator:

bash examples/flash_attention_infer_tla/run.sh

You can modify the following parameters in the script:

batch, qSeqlen, kvSeqlen, numHeads, kvHeads, headSize, dtype="bf16" (or "half"), device, accCheck
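When choosing `batch`, `kvSeqlen`, `kvHeads`, and `headSize`, a rough KV cache estimate helps keep a run within device memory. This is a back-of-the-envelope sketch, not the operator's actual allocation: it assumes separate K and V tensors of shape batch × kvSeqlen × kvHeads × headSize, 2 bytes per element for both "bf16" and "half", and no prefix sharing (every sequence holds its own copy), so the result is an upper bound. It also assumes grouped-query attention, where numHeads must be a multiple of kvHeads. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helpers for the run.sh parameters (not part of this repository).

# Bytes per element for the supported dtypes.
dtype_bytes() {
  case "$1" in
    bf16|half) echo 2 ;;
    *) echo "unknown dtype: $1" >&2; return 1 ;;
  esac
}

# Upper-bound KV cache size in bytes: K and V, no prefix sharing.
kv_cache_bytes() {
  local batch="$1" kv_seqlen="$2" kv_heads="$3" head_size="$4" dtype="${5:-bf16}" bytes
  bytes="$(dtype_bytes "$dtype")" || return 1
  echo $(( 2 * batch * kv_seqlen * kv_heads * head_size * bytes ))
}

# Assumption: grouped-query attention, so numHeads must be divisible by kvHeads.
check_heads() {
  local num_heads="$1" kv_heads="$2"
  [ "$kv_heads" -gt 0 ] && [ $(( num_heads % kv_heads )) -eq 0 ]
}
```

For example, `kv_cache_bytes 1 1024 8 128 bf16` prints 4194304 (4 MiB): 2 × 1 × 1024 × 8 × 128 × 2 bytes.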

Evaluation

# kernel evaluation
scripts/kernel.sh

# end-to-end evaluation
scripts/e2e.sh
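To keep the output of each evaluation run, the scripts can be wrapped so their logs are saved. A hedged sketch: `run_logged` is a hypothetical helper, the `logs` directory is an arbitrary default, and it assumes each script signals failure through its exit status. It uses bash's `PIPESTATUS` so the script's status, not `tee`'s, is returned.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run an evaluation script,
# save its combined output to <log_dir>/<script name>.log, and keep its exit status.
run_logged() {
  local script="$1" log_dir="${2:-logs}" log
  mkdir -p "$log_dir"
  log="$log_dir/$(basename "$script" .sh).log"
  bash "$script" 2>&1 | tee "$log"
  # Return the script's status, not tee's.
  return "${PIPESTATUS[0]}"
}
```

Usage: `run_logged scripts/kernel.sh && run_logged scripts/e2e.sh` writes `logs/kernel.log` and `logs/e2e.log`, and stops if the kernel evaluation fails.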
