Files
shahondin1624 ddebb5ddf6 turboquant: squash-merge TheTom/llama-cpp-turboquant feature/turboquant-kv-cache
Squashes the entire TurboQuant KV-cache feature branch from
https://github.com/TheTom/llama-cpp-turboquant (tip 5aeb2fdbe) onto our master.

Includes: TurboQuant KV-cache types (turbo2_0, turbo3_0, turbo4_0, tq3_1s,
tq4_1s), GGML_OP_TURBO_WHT op, CUDA + Metal kernels (including TQ-rotated
mul_mm path), CPU reference paths, HIP template instances, perplexity tooling,
and 18 post-upstream-sync fixes (CVE-2026-21869 server clamp, HIP FA pool
retention, n_head_v reshape, sparse-V CUDA gating, etc.).

Conflict-resolution notes (review carefully before depending on these paths):

- common/arg.cpp, common/speculative.cpp: master's refactored speculative API
  kept (params.speculative.types / ngram_mod struct, per-sinfo n_low/i_last).

- ggml-cuda/fattn.cu: head-size exclusion lists unioned (now exclude both 192
  and 640 alongside other sizes).

- ggml-cuda/ggml-cuda.cu: both master's ADD/SUB/MUL/DIV F16 widening AND
  TurboQuant's GGML_OP_TURBO_WHT support cases kept.

- ggml-metal-device.h/.cpp: master's new get_pipeline_mul_mv_ext signature
  (const ggml_tensor * op) kept; TurboQuant's get_pipeline_turbo_wht added.

- ggml-metal-ops.cpp: TurboQuant's TQ-rotated mul_mm path preserved; non-TQ
  else-branch adapted to master's pipeline.nr0/nr1/nsg dispatch API.

- ggml-vulkan.cpp: master's spec-constant-driven flash_attn pipeline iteration
  taken (over TurboQuant's CREATE_FA-per-type macro approach). TURBO3_0 added
  to the fa_kv_ok lambda for type validation.

- ggml-vulkan/flash_attn_base.glsl, vulkan-shaders-gen.cpp: master's new
  spec-constant FA shader generation kept; TurboQuant's DATA_A_TURBO3_0 macro
  path NOT carried over. *** Vulkan TURBO3_0 flash-attention paths need
  re-implementation against the new spec-constant API. *** Vulkan TURBO3_0
  inference will likely fail until that work is redone.

Squash base: 7fc1c4ef78 (TheTom's last upstream merge point).
2026-05-19 15:13:49 +02:00
..
2026-05-19 15:32:58 +03:00