llama.cpp

shahondin1624/llama.cpp

shahondin1624 4c66df50ca

CI status (push):

Successful:
CI / build-cmake-pkg (15m57s)

Failing:
CI / android-arm64 (after 15s)
CI / ubuntu-latest-rpc (after 13s)
CI / ubuntu-latest-cuda (after 9s)
Release / android-arm64 (after 34s)
Server / server (default) (after 13s)
Server / server (backend-sampling) (after 17s)

Cancelled:
All remaining jobs in the CI, CI (self-hosted), Release, Server, Server (self-hosted), Code Style Checker, EditorConfig Checker, and HIP quality check workflows.

hip: fix HIP graph capture crash for FA quantized KV f16 dequant

The HIP branch in launch_fattn used raw hipMalloc / hipFree /
hipStreamSynchronize(main_stream) for the K/V f16 dequant temp buffers
(introduced to avoid pool retention OOM). These three calls are illegal
during HIP graph capture and make cudaStreamEndCapture fail with
hipErrorStreamCaptureUnsupported, which surfaces as the "ROCm error" at
ggml-cuda.cu:104 when running models like Qwen3.6-27B-Dense and
Qwen3.6-35B-A3B-Q8 with -fa 1 on gfx1151. The workaround until now was
setting GGML_CUDA_DISABLE_GRAPHS=1.

Probe cudaStreamIsCapturing on entry; when a capture is in progress
use ggml_cuda_pool_alloc<half> (legal in capture). Outside capture,
behavior is unchanged so the OOM-avoidance the raw-alloc branch was
added for is preserved.
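
The branch described above can be illustrated with a minimal sketch. It is
not the code from this commit: the runtime probe (cudaStreamIsCapturing) and
the pool (ggml_cuda_pool_alloc<half>) are replaced by stand-in types so the
example compiles without a GPU toolchain, and every name in it (fake_stream,
fake_pool, alloc_dequant_tmp, free_dequant_tmp) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Stand-in for the runtime's stream capture status (assumption, not the real enum).
enum class capture_status { none, active };

// Hypothetical stream handle: only records whether a graph capture is in progress.
struct fake_stream {
    capture_status status = capture_status::none;
};

// Hypothetical pool: hands out slices of memory reserved up front, so this
// path issues no allocator call while a capture is active.
struct fake_pool {
    std::vector<unsigned char> storage;
    std::size_t used = 0;

    explicit fake_pool(std::size_t bytes) : storage(bytes) {}

    void * alloc(std::size_t bytes) {
        if (used + bytes > storage.size()) {
            return nullptr; // pool exhausted
        }
        void * p = storage.data() + used;
        used += bytes;
        return p;
    }
};

enum class alloc_source { raw, pool };

struct temp_buffer {
    void *       ptr;
    alloc_source source;
};

// Probe the capture state on entry and pick the allocation path:
// pool while capturing, raw allocation (std::malloc stands in for hipMalloc) otherwise.
temp_buffer alloc_dequant_tmp(const fake_stream & stream, fake_pool & pool, std::size_t bytes) {
    assert(bytes > 0);
    if (stream.status == capture_status::active) {
        return { pool.alloc(bytes), alloc_source::pool };
    }
    return { std::malloc(bytes), alloc_source::raw };
}

// Only raw allocations are released here; pool slices stay owned by the pool.
void free_dequant_tmp(const temp_buffer & buf) {
    if (buf.source == alloc_source::raw) {
        std::free(buf.ptr);
    }
}
```

Recording the allocation source in the buffer keeps the release path
consistent with the path chosen at allocation time, even if the capture
state changes in between.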

Also: ggml_cuda_error wrote only via GGML_LOG_ERROR, which llama-bench
silences with llama_null_log_callback, so the actual hipError was
invisible. Mirror the message to stderr with fflush so failures stay
diagnosable from bench. Expand the inline CUDA_CHECK around
cudaStreamEndCapture / cudaGraphInstantiate / cudaGraphLaunch to print
which graph step failed plus the cgraph's first/last op for context.
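
As a rough illustration of the logging change, the sketch below sends an
error to a logger callback and also mirrors it to stderr with a flush, so
the text survives a no-op callback. The names (log_fn, null_log,
format_graph_error, report_error) and the message layout are assumptions
made for this example, not the identifiers or format used in ggml.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical logging hook; a no-op callback models a silenced logger.
using log_fn = void (*)(const char * text);

void null_log(const char *) {}

// Build one diagnostic line naming the failed graph step, the error text,
// and the first/last op of the graph for context.
std::string format_graph_error(const char * step, const char * err,
                               const char * first_op, const char * last_op) {
    assert(step && err && first_op && last_op);
    std::string out = "graph step '";
    out += step;
    out += "' failed: ";
    out += err;
    out += " (first op: ";
    out += first_op;
    out += ", last op: ";
    out += last_op;
    out += ")\n";
    return out;
}

// Send the message to the logger, then mirror it to stderr and flush so it
// is visible even when the logger discards it.
void report_error(log_fn log, const std::string & text) {
    if (log) {
        log(text.c_str());
    }
    std::fputs(text.c_str(), stderr);
    std::fflush(stderr);
}
```

Calling report_error(null_log, ...) still writes the message to stderr,
which is the property the commit relies on for bench runs.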

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-27 08:08:12 +02:00

cmake

ggml: backend-agnostic tensor parallelism (experimental) (#19378)

2026-04-09 16:42:19 +02:00

include

turboquant: squash-merge TheTom/llama-cpp-turboquant feature/turboquant-kv-cache

2026-05-19 15:13:49 +02:00

src

hip: fix HIP graph capture crash for FA quantized KV f16 dequant

2026-05-27 08:08:12 +02:00

.gitignore

vulkan : cmake integration (#8119)

2024-07-13 18:12:39 +02:00

CMakeLists.txt

ggml : bump version to 0.12.0 (ggml/1494)

2026-05-16 16:11:29 +03:00