Ollama with Llama 3.2 3B Q4_K_M: the M1 got 66 t/s, the M4 got 116 t/s, and the 7900 XTX got 119 t/s.
MLX with Q4_K_M: the M1 got 104 t/s, the M4 got 192 t/s.
llama.cpp on the 7900 XTX, Q4_0: 109 t/s with ROCm, 112 t/s with Vulkan.
llama.cpp on the 7900 XTX, Q8_0: 105 t/s with ROCm, 103 t/s with Vulkan.
Running across the M4 Max (128GB) and the M1 Max Studio (32GB), it kinda worked, but the M1 kept disconnecting, and even when it held together it was extremely slow.