llama.cpp

Author	SHA1	Message	Date
shahondin1624	4c66df50ca	hip: fix HIP graph capture crash for FA quantized KV f16 dequant The HIP branch in launch_fattn used raw hipMalloc / hipFree / hipStreamSynchronize(main_stream) for the K/V f16 dequant temp buffers (introduced to avoid pool retention OOM). These three calls are illegal during HIP graph capture and abort cudaStreamEndCapture with hipErrorStreamCaptureUnsupported, manifesting as the "ROCm error" at ggml-cuda.cu:104 when running models like Qwen3.6-27B-Dense and Qwen3.6-35B-A3B-Q8 with -fa 1 on gfx1151. Workaround was GGML_CUDA_DISABLE_GRAPHS=1. Probe cudaStreamIsCapturing on entry; when a capture is in progress use ggml_cuda_pool_alloc<half> (legal in capture). Outside capture, behavior is unchanged so the OOM-avoidance the raw-alloc branch was added for is preserved. Also: ggml_cuda_error wrote only via GGML_LOG_ERROR, which llama-bench silences with llama_null_log_callback, so the actual hipError was invisible. Mirror the message to stderr with fflush so failures stay diagnosable from bench. Expand the inline CUDA_CHECK around cudaStreamEndCapture / cudaGraphInstantiate / cudaGraphLaunch to print which graph step failed plus the cgraph's first/last op for context. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-27 08:08:12 +02:00
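The two ideas in the commit above (pick the allocator based on capture state, and mirror errors to stderr) can be sketched in plain C++ as follows. This is a minimal illustration, not the code from the commit: `capture_state`, `temp_alloc`, `pick_temp_alloc`, and `report_error` are hypothetical names, and the real change queries `cudaStreamIsCapturing` on the stream rather than taking an enum argument.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the status reported by cudaStreamIsCapturing.
enum class capture_state { none, active };

// Hypothetical label for where the K/V f16 dequant temp buffers come from.
enum class temp_alloc { raw_malloc, pool };

// During graph capture, raw malloc/free/stream-sync calls are illegal,
// so fall back to the pool. Outside capture, keep the raw path that was
// introduced to avoid pool retention OOM.
temp_alloc pick_temp_alloc(capture_state st) {
    return st == capture_state::active ? temp_alloc::pool : temp_alloc::raw_malloc;
}

// Build the error text once and mirror it to stderr, flushing immediately,
// so it stays visible even if the logging callback discards everything.
std::string report_error(const char * stmt, const char * msg, const char * file, int line) {
    std::string text = std::string(file) + ":" + std::to_string(line) + ": " + stmt + " failed: " + msg;
    std::fprintf(stderr, "%s\n", text.c_str());
    std::fflush(stderr);
    return text;
}
```

Keeping the decision in one small function makes the "unchanged outside capture" property easy to check in isolation.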
Copilot	a581eead32	hip: skip unsupported RDNA WMMA flash-attention cases Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-19 23:42:54 +02:00
shahondin1624	2907ee9830	turboquant: post-merge integration fixes from test validation Two fixes surfaced by running the full test suite against the squash-merged turboquant branch, plus one CMake registration. 1. ggml-cuda/ggml-cuda.cu (GET_ROWS supports_op) Removed TQ3_1S/TQ4_1S from the CUDA/HIP GET_ROWS supports_op switch. TheTom's branch advertised these as supported but never added the matching cases to getrows.cu, a latent bug present on both his branch and master. master's test-backend-ops triggers it; the scheduler will now route get_rows on TQ types to CPU. 2. ggml-cuda/fattn.cu (HIP head-size gate) Master's get_best_fattn_kernel falls through to BEST_FATTN_KERNEL_TILE as default. On HIP, fattn-tile.cu only instantiates head sizes 64, 128, 256, 320, 512 (576/640 exceed local memory limits per #ifndef GGML_USE_HIP). Without this gate, supports_op returns true for unsupported sizes and the dispatch aborts. Now returns BEST_FATTN_KERNEL_NONE on HIP for head sizes the tile kernel cannot compile, letting the scheduler fall back to CPU. 3. tests/CMakeLists.txt (test-turbo-quant registration) TheTom added tests/test-turbo-quant.c (CPU round-trip diagnostic for turbo3/turbo4 quant→dequant→inverse-WHT) but never wired it into the build. Registered as a ctest entry linked against ggml + libm. Test status with these fixes: - CPU (build-cpu): 51/51 ctest pass, including new test-turbo-quant. - HIP (build-hip, gfx1151): 50/50 ctest pass with GGML_CUDA_DISABLE_GRAPHS=1 and test-backend-ops excluded. test-backend-ops itself runs 13674/13677 internal cases; the 3 remaining failures (CLAMP f16 → inf, bf16 FA graph capture) are pre-existing master-side regressions on RDNA3.5+HIP that reproduce on plain master and are unrelated to TurboQuant.	2026-05-19 15:13:55 +02:00
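The HIP head-size gate described in item 2 above can be pictured roughly like this. It is a simplified sketch under stated assumptions: `best_kernel`, `hip_tile_supports_head_size`, and `pick_kernel` are hypothetical names standing in for the real `BEST_FATTN_KERNEL_*` values and `get_best_fattn_kernel`, the supported sizes are taken from the commit message, and the actual selection logic considers far more than a backend flag and a head size.

```cpp
#include <algorithm>
#include <array>

// Hypothetical stand-ins for BEST_FATTN_KERNEL_NONE / BEST_FATTN_KERNEL_TILE.
enum best_kernel { KERNEL_NONE, KERNEL_TILE };

// Head sizes the tile kernel instantiates on HIP, per the commit message.
bool hip_tile_supports_head_size(int head_size) {
    static constexpr std::array<int, 5> sizes = {64, 128, 256, 320, 512};
    return std::find(sizes.begin(), sizes.end(), head_size) != sizes.end();
}

// Simplified gate: on HIP, report "no kernel" for head sizes the tile kernel
// cannot compile, so the caller can fall back to another backend instead of
// aborting at dispatch. Everything else falls through to the tile kernel.
best_kernel pick_kernel(bool is_hip, int head_size) {
    if (is_hip && !hip_tile_supports_head_size(head_size)) {
        return KERNEL_NONE;
    }
    return KERNEL_TILE;
}
```

With this shape, `pick_kernel(true, 576)` yields `KERNEL_NONE` while `pick_kernel(true, 128)` still yields `KERNEL_TILE`, which is the behavior the commit message describes for supports_op.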
shahondin1624	ddebb5ddf6	turboquant: squash-merge TheTom/llama-cpp-turboquant feature/turboquant-kv-cache Squashes the entire TurboQuant KV-cache feature branch from https://github.com/TheTom/llama-cpp-turboquant (tip `5aeb2fdbe`) onto our master. Includes: TurboQuant KV-cache types (turbo2_0, turbo3_0, turbo4_0, tq3_1s, tq4_1s), GGML_OP_TURBO_WHT op, CUDA + Metal kernels (including TQ-rotated mul_mm path), CPU reference paths, HIP template instances, perplexity tooling, and 18 post-upstream-sync fixes (CVE-2026-21869 server clamp, HIP FA pool retention, n_head_v reshape, sparse-V CUDA gating, etc.). Conflict-resolution notes (review carefully before depending on these paths): - common/arg.cpp, common/speculative.cpp: master's refactored speculative API kept (params.speculative.types / ngram_mod struct, per-sinfo n_low/i_last). - ggml-cuda/fattn.cu: head-size exclusion lists unioned (now exclude both 192 and 640 alongside other sizes). - ggml-cuda/ggml-cuda.cu: both master's ADD/SUB/MUL/DIV F16 widening AND TurboQuant's GGML_OP_TURBO_WHT support cases kept. - ggml-metal-device.h/.cpp: master's new get_pipeline_mul_mv_ext signature (const ggml_tensor * op) kept; TurboQuant's get_pipeline_turbo_wht added. - ggml-metal-ops.cpp: TurboQuant's TQ-rotated mul_mm path preserved; non-TQ else-branch adapted to master's pipeline.nr0/nr1/nsg dispatch API. - ggml-vulkan.cpp: master's spec-constant-driven flash_attn pipeline iteration taken (over TurboQuant's CREATE_FA-per-type macro approach). TURBO3_0 added to the fa_kv_ok lambda for type validation. - ggml-vulkan/flash_attn_base.glsl, vulkan-shaders-gen.cpp: master's new spec-constant FA shader generation kept; TurboQuant's DATA_A_TURBO3_0 macro path NOT carried over. * Vulkan TURBO3_0 flash-attention paths need re-implementation against the new spec-constant API. * Vulkan TURBO3_0 inference will likely fail until that work is redone. Squash base: `7fc1c4ef78` (TheTom's last upstream merge point).	2026-05-19 15:13:49 +02:00
Georgi Gerganov	d14ce3dab4	llama : MTP clean-up (#23269) * llama : disable equal splits for recurrent memory with partial rollback * spec : re-enable p-min with MTP drafts * spec : re-enable ngram spec in combination with RS rollback * spec : fix ngram-map-* params * spec : fix acceptance logic in combined ngram + draft configs * graph : fix reuse for combined `token` + `embd` batches * spec : log parameters for each speculative implementation - add LOG_INF in each constructor with implementation type and parameters - extract device string logic into common_speculative_get_devices_str() - move 'adding speculative implementation' log from init into constructors Assisted-by: llama.cpp:local pi * spec : extend --spec-default with ngram-map-k4v Assisted-by: llama.cpp:local pi * minor : fix n_embd log * args : update draft.n_max == 3 + regen docs * spec : relax ngram-mod rejection thold to 0.25 @ 5 low * logs : improve * docs : update speculative decoding CLI argument documentation - Add missing draft model CPU scheduling and tensor override parameters - Update --spec-type to include all available types (excluding draft-eagle3 WIP) - Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0) - Remove deprecated options (spec-draft-ctx-size, spec-draft-replace) - Add environment variables for new parameters Assisted-by: llama.cpp:local pi * arg : step-back on adding k4v to the default spec config * cont : fix name	2026-05-19 15:32:58 +03:00
Aleksander Grygier	6db130445d	ui: Bump packages + address build warnings (#23300 ) * chore: Update vulnerable packages * chore: Formatting * refactor: Update Tailwind CSS imports * ci: Use `ubuntu-latest` for Unit/E2E UI tests * chore: Bump package * fix: Add missing tag * refactor: Enums files naming	2026-05-19 10:16:04 +02:00
Sigbjørn Skjæret	4b262ab662	ci : install libssl-dev (#23325 )	2026-05-19 11:11:04 +03:00
Sigbjørn Skjæret	00c461ce1a	ci : install server kleidiai runner dependencies (#23259 )	2026-05-19 09:06:56 +02:00
Pascal	ccee426426	server-context: guarantee there is at least 1 token to decode (#23280 )	2026-05-19 09:49:01 +03:00
Georgi Gerganov	3c81c8deea	server : print graphs reused in slot timings (#23279 ) Add graphs reused counter to the per-slot timing output, printed via llama_perf_context(). Assisted-by: llama.cpp:local pi Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>	2026-05-19 09:46:58 +03:00
Georgi Gerganov	cd963fee6a	save-load-state : refactor tests and improve readability (#23196 ) * save-load-state : refactor into separate phase functions - Split monolithic main() into 4 self-contained phase functions, each managing its own context/sampler/batch lifecycle - Each function tokenizes internally using its local ctx instance - main() is now a clean orchestrator: init -> run phases -> assert results - Proper resource cleanup on every exit path (return {} on error) Assisted-by: llama.cpp:local pi * save-load-state : use params.out_file instead of separate state_file - Remove state_file parameter from all phase functions - Each function accesses params.out_file directly - Initialize params.out_file in main alongside params.prompt Assisted-by: llama.cpp:local pi * save-load-state : use smart pointers for ctx and smpl - Replace raw llama_context* with llama_context_ptr - Replace raw llama_sampler* with llama_sampler_ptr - Remove all manual llama_free() and llama_sampler_free() calls - Keep llama_batch as raw (managed manually with llama_batch_free) Assisted-by: llama.cpp:local pi * save-load-state : add local llama_batch_ptr RAII wrapper - Add llama_batch_ptr struct holding llama_batch by value - Calls llama_batch_free() in destructor - Eliminates all manual llama_batch_free() calls Assisted-by: llama.cpp:local pi * save-load-state : replace printf/fprintf with logging macros - Add log.h include - Replace fprintf(stderr, ...) errors with LOG_ERR - Replace fprintf(stderr, ...) info with LOG_TRC - Replace printf output with LOG Assisted-by: llama.cpp:local pi * save-load-state : refactor tests to check results inline Each follow-up phase now accepts an expected result and performs the comparison internally instead of collecting results in main(). Assisted-by: llama.cpp:local pi * save-load-state : improve test output readability Add phase labels, remove redundant run prefixes, and show PASS after each test. Assisted-by: llama.cpp:local pi * pi : add rule about git signing * save-load-state : simplify llama_batch_ptr Change get() to return a reference and remove operator(). Use batch.get() throughout for consistency. Assisted-by: llama.cpp:local pi save-load-state : extract generate_tokens helper Factor out the repeated token generation loop into a shared helper function used by all phases. Assisted-by: llama.cpp:local pi * save-load-state : update comments to use test terminology Replace "Phase" with "Test" and list each test's steps as bullet points. Assisted-by: llama.cpp:local pi * save-load-state : rename test functions Rename to test_baseline, test_state_load, test_seq_cp_host, test_seq_cp_device. Update comments and logs accordingly. Assisted-by: llama.cpp:local pi * pi : add rule to never git push without confirmation Assisted-by: llama.cpp:local pi * common : add model_only option to common_init_from_params Add bool model_only parameter to skip context creation, sampler init, and context-dependent setup. Use in save-load-state to initialize only the model, with each test creating its own context. Assisted-by: llama.cpp:local pi --------- Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>	2026-05-19 09:46:34 +03:00
Georgi Gerganov	d2e179a477	llama-eval : add per-task summary stats (#23151 ) * llama-eval : add per-problem summary table to HTML reports - Add chunk_idx and problem_idx to TaskState and saved case dicts - Group completed cases by problem_idx in dump_html() - Render per-problem summary table before individual task table - Columns: Problem (zero-padded), Runs, Correct (n/r), Tokens (min/avg/max), T/s (min/avg/max), Gen s (min/avg/max) - Sorted by problem index, monospace font, right-aligned numbers - Colspan headers for grouped stats, auto width - Simulator: add /v1/models endpoint, timings in response, template-aware question matching, --dataset arg (aime/aime2025) Assisted-by: llama.cpp:local pi * llama-eval : add tabs for Detailed and Summary tables, apply monospace font globally - Wrap Detailed and Summary tables in switchable tabs (Detailed active by default) - Remove summary-section wrapper, use tab labels instead - Apply monospace font to all tables and the top bar Assisted-by: llama.cpp:local pi * llama-eval : redesign top bar as CSS grid label/value pairs - Replace flat span list with 4-column grid layout (2 pairs per row) - Labels in muted color (#888), values in dark (#222) - Bold dataset name and model name - Removed media query, always uses 4 columns Assisted-by: llama.cpp:local pi * llama-eval : use realistic token counts and throughput in simulator - comp_tokens: [30, 80] → [10000, 60000] - tps_gen: derived → uniform [90.0, 110.0] - t_gen_ms: now computed from tokens/tps Assisted-by: llama.cpp:local pi * llama-eval : color Answer column green/red based on correctness Use the same .correct/.incorrect CSS classes on the Answer column to make correct answers green and incorrect answers red. Assisted-by: llama.cpp:local pi * llama-eval : fix pyright errors from max(..., key=len) type inference Use key=lambda x: len(x) instead of key=len so the type checker infers the return type as str instead of Sized, fixing: - unresolved-attribute: Object of type Sized has no attribute lower - not-subscriptable: Cannot subscript object of type Sized Assisted-by: llama.cpp:local pi	2026-05-19 09:46:05 +03:00
Reese Levine	c85a242ed0	ggml-webgpu : extend GDN for K>1 (#23299 )	2026-05-19 09:45:41 +03:00
Neo Zhang	aabee047d8	[SYCL] add chapter for performance reference in SYCL.md (#23315 ) * add chapter for performance reference * rm unsupported GPU	2026-05-19 09:44:51 +03:00
Sigbjørn Skjæret	f1c1c5c057	convert : filter lora tensor names (#23077 )	2026-05-19 09:44:25 +03:00
Intel AI Get-to Market Customer Success and Solutions	439f1b193d	sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle (#22153 ) * sycl: add GGML_SYCL_USE_ASYNC_MEM_OP env toggle Signed-off-by: Chun Tao <chun.tao@intel.com> * Use async mem ops for correctness when SYCL graphs are explicitly on. Signed-off-by: Tao, Chun <chun.tao@intel.com> --------- Signed-off-by: Chun Tao <chun.tao@intel.com> Signed-off-by: Tao, Chun <chun.tao@intel.com> Co-authored-by: Chun Tao <chun.tao@intel.com>	2026-05-19 09:44:02 +03:00
Radoslav Gerganov	c3e9ade6dd	rpc : keep last_graph_uid in the device context (#23273 ) With the introduction of MTP we can have multiple compute contexts for the same RPC device. In this case last_graph_uid is not updated properly when contexts are being switched. This patch fixes this by moving last_graph_uid to the device context, making sure it is always updated. closes: #23242	2026-05-19 09:42:36 +03:00
Pranav Dhinakar	9a532ae4ba	hexagon: add support for TRI op (#22822 ) * Hexagon: TRI HVX Kernel addition to ggml hexagon HTP ops and context * addressed PR review comments for TRI op * hexagon: clang format * hex-unary: remove merge conflict markers * hex-ggml: remove duplicate op cases (merge conflict) * hex-ggml: fix editor config errors --------- Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com> Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-05-18 14:04:57 -07:00
Pranav Dhinakar	b7340443d4	ggml-hexagon: add PAD op HVX kernel (#23078 ) * ggml-hexagon: add PAD op HVX kernel Implements GGML_OP_PAD on the Hexagon HTP backend using HVX vectorized kernels. Supports zero-padding and circular padding across all 4 tensor dimensions. * hex-ggml: remove duplicate op cases (merge conflict) * hex-pad: fix editorconfig checks and macro alignment --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-05-18 13:39:36 -07:00
SamareshSingh	5cbaa5e69e	docker : add OCI image labels for version and build date (#21653 ) * docker: add OCI image labels to all published images * docker: propagate OCI labels as manifest and index annotations * docker: drop hardcoded org URL and revert accidental intel version bump The OCI image url and source are now driven by build args with a sensible default. The workflow passes the actual repository url so fork builds get labels pointing at the fork instead of upstream. Also restores the IGC, compute runtime, and IGDGMM versions in the intel Dockerfile labeled stage which I accidentally bumped in the first commit. * docker: add skip_s390x workflow_dispatch input for fast test runs Lets maintainers and PR authors trigger the docker workflow without the s390x build target, which depends on the IBM Z runner and is by far the slowest job in the matrix. The flag filters the s390x row out of the build matrix before merge_matrix is derived, so the merge job sees a consistent shape too. Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com> --------- Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com>	2026-05-18 22:14:45 +02:00
Adrien Gallouët	45b455e66f	common : remove hf cache migration (#23266 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-05-18 17:11:47 +02:00
Aleksander Grygier	3a9c1b854d	ui: Update KaTeX package and clean up logs from `sass` warnings (#23275 ) * ui: migrate katex imports to @use to resolve SCSS deprecation warnings * ci: Use `ubuntu-slim` for CI (UI) workflow	2026-05-18 16:26:01 +02:00
Aleksander Grygier	b9a2170fce	feat: add scroll-to-bottom button to chat + prevent forced scroll down (#23270 )	2026-05-18 16:17:21 +02:00
Aleksander Grygier	1ff0fc1384	ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (#23236 ) * refactor: Scope console logs to `DEV` + `VITE_DEBUG` env vars * refactor: skip MCP proxy probe when no server requires it * refactor: suppress expected disconnect errors during MCP client shutdown * refactor: Deduplicate requests * refactor: deduplicate model fetching across ROUTER and MODEL modes * refactor: Clean up models logic * chore: Add `.env.example` file * refactor: replace client-side CORS proxy probe with server status flag * refactor: Post-review fixes * test: add vitest client setup with API fetch mocks	2026-05-18 16:09:40 +02:00
Aleksander Grygier	a135ec0baa	ui: Centralize monospace font styles in app.css (#23272 )	2026-05-18 15:10:14 +02:00
Martin Andersson	232f466583	webui: fix Tailwind v4 utility classes missing when built via cmake (#23253 )	2026-05-18 14:08:02 +02:00
Andrei	49c21f97cd	llama: initialize pre-norm embedding mask flag (#23256 )	2026-05-18 14:20:49 +03:00
Sigbjørn Skjæret	77e38d68f2	add myself to conversion (#23261 )	2026-05-18 12:42:56 +02:00
Martin Klacer	053e01dff6	ci : added kleidiai-server to server-self-hosted workflow (#22435 ) * kleidiai: added kleidiai-server to server-self-hosted workflow * Added KleidiAI-enabled Arm64 Linux llama-server CI/integration test workflow into the server-self-hosted.yml configuration file Signed-off-by: Martin Klacer <martin.klacer@arm.com> Change-Id: I032e33c525b7e26bc5d53719f638bee610cec1ee * Added self-hosted executor for KleidiAI server workflow Signed-off-by: Martin Klacer <martin.klacer@arm.com> * Update .github/workflows/server-self-hosted.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Signed-off-by: Martin Klacer <martin.klacer@arm.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-18 11:14:57 +02:00
Georgi Gerganov	c3f95c1f06	scripts : allow wc2wt with an existing branch (#23189 )	2026-05-18 08:57:28 +03:00
Intel AI Get-to Market Customer Success and Solutions	0caf2a1d48	sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (#22156 ) Signed-off-by: Chun Tao <chun.tao@intel.com> Co-authored-by: Chun Tao <chun.tao@intel.com>	2026-05-18 08:12:21 +03:00
Intel AI Get-to Market Customer Success and Solutions	5511965b19	sycl: route small f32 matmuls to oneMKL, bypass oneDNN (#22150 ) Signed-off-by: Chun Tao <chun.tao@intel.com> Co-authored-by: Chun Tao <chun.tao@intel.com>	2026-05-18 08:11:51 +03:00
Neo Zhang	e98bcfec28	sycl : fix error when use -mg 1 error (#23140 )	2026-05-18 08:11:19 +03:00
Incarnas	1867a0c692	update bid to match each layers MTP source (#23237 ) * update bid to match each layers MTP source * Update conversion/qwen.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-05-18 12:37:12 +08:00
Sigbjørn Skjæret	dd7cad7197	cmake : do not check for bin install dir (#23234 )	2026-05-18 02:33:14 +02:00
Gabe Goodhart	726704a160	feat: Support d_conv=15 for ssm-conv.cu (#23017 ) Branch: ModalityConditionalAdapters AI-usage: none Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2026-05-17 23:05:11 +02:00
Aldehir Rojas	87589042ca	cmake : fix LLAMA_BUILD_UI logic (#23190 )	2026-05-17 14:42:26 -04:00
Sigbjørn Skjæret	e0de4c2419	cmake : do not install conversion script (#23204 )	2026-05-17 18:07:21 +02:00
Oliver Simons	84c678242a	CUDA: Continue directly including cuda/iterator (#23102 ) Cont of #22936, forgot to update one site	2026-05-17 18:00:10 +02:00
Aman Gupta	3e12fbdea5	llama: avoid copying logits during prompt decode in MTP (#23198 ) * llama: avoid copying logits during prompt decode in MTP * review: update comment * llama-graph: call set_output for t_h_pre_norm	2026-05-17 23:30:25 +08:00
Aldehir Rojas	39cf5d6191	common : delegate assistant continuation to underlying template handlers (#23089 ) * common : delegate assistant continuation to template handler * server : implement echo parameter to exclude assistant prefill in the response * server : fix tests for prefill * server : use existing llama template * cont : clean up	2026-05-17 13:36:05 +02:00
Jan Ekström	a6d6183dbc	ggml-vulkan/CMakeLists: add a check for SPIRV-Headers (#22009 ) * ci/run: set explicit SPIR-V Headers search path for macOS vulkan CI For whatever reason, the files are under additional sub-path `vulkan/` under the cmake directory, which does not match either current LunarG macOS Vulkan SDK structure (`lib/cmake/SPIRV-Headers`), nor what gets installed when you run the cmake build+install for SPIRV-Headers itself on at least Linux (`share/cmake/SPIRV-Headers`). This allows for SPIRV-Headers to be found, as currently the CI runner's setup does not seem to include the relevant path in list of search locations. * ggml-vulkan/CMakeLists: add a check for SPIRV-Headers This is installed by the project if it is built and installed. Receiving an error during the configuration step is generally preferred to receiving an error in the middle of a build.	2026-05-17 13:12:11 +02:00
Pascal	fcae601e44	vulkan: add cpy bf16 -> f32 pipelines (#22677 )	2026-05-17 11:31:20 +02:00
Jeff Bolz	7ba22c6a09	vulkan: Support unaligned tensors for ROPE (#22637 )	2026-05-17 11:30:16 +02:00
Aldehir Rojas	f4cc787b9f	common : enable streaming JSON argument values (#23173 ) * common : remove atomic from json arguments * common : remove parsing logic on JSON arguments	2026-05-17 03:44:34 -05:00
Jeff Bolz	3fbadb06dc	vulkan: fuse SSM_CONV + BIAS + SILU (#22653 )	2026-05-17 10:25:50 +02:00
Rares Vernica	1a68ec9378	server : honor --embd-normalize CLI arg (#23125 ) The --embd-normalize flag was registered only for the embedding and debug examples, so llama-server rejected it and the /embedding handler used a hard-coded default of 2 (L2). Add LLAMA_EXAMPLE_SERVER to the flag's example set and read params.embd_normalize as the handler's default. The per-request "embd_normalize" body field continues to override.	2026-05-17 09:39:04 +03:00
ddh0	a16cce81d3	ngram : reduce noisy logs (#23185 ) * ngram : reduce noisy logs * ngram : reduce noisy logs	2026-05-17 09:38:17 +03:00
Judd	4f13cb7424	webui: support video files as input (#22830 )	2026-05-17 02:13:44 +02:00
Xuan-Son Nguyen	b64739ea39	server: (router) alloc tmp buffer on heap (#23159 )	2026-05-16 23:42:16 +02:00

9239 Commits