=== SMEM M5 Benchmark: baseline ===
Model: Qwen3.5-35B-A3B-Q8_0.gguf
Date: Sat Mar 28 21:45:40 CDT 2026

--- turbo3 @ short ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105cffcb0 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105cfeb30 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 6.440 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |           tg128 |         78.47 ± 0.56 |

build: 13afec1 (178)

--- turbo3 @ 8192 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1040cfae0 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1040ce960 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |          pp8192 |      2144.16 ± 30.18 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |           tg128 |         78.90 ± 0.24 |

build: 13afec1 (178)

--- turbo3 @ 16384 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x10500fc00 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x10500ea80 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.008 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |         pp16384 |      1704.41 ± 21.63 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |           tg128 |         78.64 ± 0.44 |

build: 13afec1 (178)

--- turbo3 @ 32768 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x101c8fb00 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x101c8e980 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.013 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |         pp32768 |       1238.85 ± 6.06 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo3 | turbo3 |  1 |           tg128 |         78.17 ± 0.69 |

build: 13afec1 (178)

--- turbo4 @ short ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103c17f70 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103c16df0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.008 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |           tg128 |         80.40 ± 0.72 |

build: 13afec1 (178)

--- turbo4 @ 8192 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103e57d30 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103e56bb0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |          pp8192 |      2048.90 ± 43.42 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |           tg128 |         79.84 ± 0.95 |

build: 13afec1 (178)

--- turbo4 @ 16384 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1060bf740 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1060be5c0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.009 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |         pp16384 |      1605.18 ± 20.70 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |           tg128 |         79.45 ± 1.55 |

build: 13afec1 (178)

--- turbo4 @ 32768 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1040ef870 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1040ee6f0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |         pp32768 |       1157.30 ± 8.01 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 | turbo4 | turbo4 |  1 |           tg128 |         80.64 ± 0.72 |

build: 13afec1 (178)

--- q8_0 @ short ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1055e78c0 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1055e6740 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.008 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |           tg128 |         85.48 ± 1.34 |

build: 13afec1 (178)

--- q8_0 @ 8192 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105ac8540 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x105ac73c0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.010 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |          pp8192 |      2106.47 ± 64.66 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |           tg128 |         76.72 ± 2.13 |

build: 13afec1 (178)

--- q8_0 @ 16384 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103fefa70 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x103fee8f0 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.008 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |         pp16384 |      1723.71 ± 28.56 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |           tg128 |         78.09 ± 3.70 |

build: 13afec1 (178)

--- q8_0 @ 32768 ---
ggml_metal_device_init: testing tensor API for f16 support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1035f7b10 | th_max = 1024 | th_width = 32
ggml_metal_device_init: testing tensor API for bfloat support
ggml_metal_library_compile_pipeline: compiling pipeline: base = 'dummy_kernel', name = 'dummy_kernel'
ggml_metal_library_compile_pipeline: loaded dummy_kernel 0x1035f6990 | th_max = 1024 | th_width = 32
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: turbo3 sparse V dequant enabled
ggml_metal_library_init: loaded in 0.008 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: MTL0
ggml_metal_device_init: GPU family: MTLGPUFamilyApple10 (1010)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = true
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 115448.73 MB
| model                          |       size |     params | backend    | threads | type_k | type_v | fa |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | -----: | -----: | -: | --------------: | -------------------: |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |         pp32768 |      1216.99 ± 28.64 |
| qwen35moe 35B.A3B Q8_0         |  34.36 GiB |    34.66 B | MTL,BLAS   |       1 |   q8_0 |   q8_0 |  1 |           tg128 |         86.83 ± 0.34 |

build: 13afec1 (178)

=== Done: baseline ===