gguf
This is a Python package for writing binary files in the GGUF (GGML Universal File) format.
See convert_hf_to_gguf.py for an example of its usage.
Installation
pip install gguf
Optionally, you can install gguf with the extra 'gui' to enable the visual GGUF editor.
pip install gguf[gui]
API Examples/Simple Tools
examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.
examples/reader.py — Extracts and displays key-value pairs and tensor details from a GGUF file in a readable format.
gguf/scripts/gguf_dump.py — Dumps a GGUF file's metadata to the console.
gguf/scripts/gguf_set_metadata.py — Allows changing simple metadata values in a GGUF file by key.
gguf/scripts/gguf_convert_endian.py — Allows converting the endianness of GGUF files.
gguf/scripts/gguf_new_metadata.py — Copies a GGUF file with added/modified/removed metadata values.
gguf/scripts/gguf_editor_gui.py — Allows for viewing, editing, adding, or removing metadata values within a GGUF file as well as viewing its tensors with a Qt interface.
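To give a rough idea of what the dump and reader tools look at first, here is a minimal standard-library sketch that parses only the fixed-size GGUF header. It does not use this package's API. It assumes the header layout of GGUF version 2 and later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit key-value count), and the function name parse_gguf_header is hypothetical. The byte-order guess (a version whose low 16 bits are zero suggests the opposite endianness) is a heuristic, not a guarantee.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file
HEADER_SIZE = 24      # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size GGUF header from the start of `data`.

    Hypothetical helper for illustration; assumes the GGUF v2+ layout.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 24 bytes for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("bad magic: not a GGUF file")

    # Read the version as little-endian first. If its low 16 bits are zero,
    # the file was probably written big-endian, so re-read it that way.
    byte_order = "<"
    (version,) = struct.unpack_from("<I", data, 4)
    if version & 0xFFFF == 0:
        byte_order = ">"
        (version,) = struct.unpack_from(">I", data, 4)

    tensor_count, kv_count = struct.unpack_from(byte_order + "QQ", data, 8)
    return {
        "byte_order": "little" if byte_order == "<" else "big",
        "version": version,
        "tensor_count": tensor_count,
        "kv_count": kv_count,
    }


if __name__ == "__main__":
    # Build a header in memory instead of reading a real file.
    sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
    print(parse_gguf_header(sample))
```

For real files, prefer examples/reader.py or gguf/scripts/gguf_dump.py, which also decode the key-value pairs and tensor details that follow the header.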
Development
Maintainers who participate in development of this package are advised to install it in editable mode:
cd /path/to/llama.cpp/gguf-py
pip install --editable .
Note: This may require upgrading your pip installation. An outdated pip reports that editable installation currently requires setup.py.
In that case, upgrade pip to the latest version:
pip install --upgrade pip
Automatic publishing with CI
There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.
- Bump the version in pyproject.toml.
- Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
- Push the tags.
git push origin --tags
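Because the workflow only reacts to tags in the expected format, it can help to check a tag name before pushing it. The following is a small sketch under assumptions: the pattern below is inferred from the gguf-vx.x.x description above, not copied from the workflow file, and parse_release_tag is a hypothetical helper.

```python
import re

# Assumed tag shape: "gguf-v" followed by MAJOR.MINOR.PATCH, all numeric.
TAG_PATTERN = re.compile(r"gguf-v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str):
    """Return (major, minor, patch) for a well-formed tag, else None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    for candidate in ("gguf-v1.0.0", "v1.0.0", "gguf-v1.0"):
        print(candidate, "->", parse_release_tag(candidate))
```

The parsed version can then be compared by hand with the version in pyproject.toml, so the tag and the package metadata do not drift apart.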
Manual publishing
If you want to publish the package manually for any reason, you need to have twine and build installed:
pip install build twine
Then, follow these steps to release a new version:
- Bump the version in pyproject.toml.
- Build the package:
python -m build
- Upload the generated distribution archives:
python -m twine upload dist/*
Run Unit Tests
From the root of this repository, run the following command to execute all the unit tests:
python -m unittest discover ./gguf-py -v
TODO
- Include conversion scripts as command line entry points in this package.