Craig St. Jean
craigstjean.com
Father, programmer, constant learner, OutSystems MVP, Pluralsight author
Some stats:

Ollama with Llama3.2 3B Q4_K_M, M1 got 66t/s, M4 got 116t/s, the 7900 XTX got 119t/s.

MLX Q4_K_M, M1 got 104t/s, M4 got 192t/s

llama.cpp Q4_0 for the 7900 XTX with ROCm got 109t/s, Vulkan got 112t/s
Q8_0 with ROCm got 105t/s, Vulkan got 103t/s
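
To put the Ollama and MLX numbers side by side, here is a rough sketch that turns the t/s figures above into speedup ratios. The dictionary layout and the helper names (`speedup`, `mlx_over_ollama`, `m4_over_m1`) are made up for illustration; only the t/s values come from the post.

```python
# Tokens per second as reported in the post (Llama 3.2 3B).
# The dictionary layout and helper names are illustrative assumptions.
tokens_per_sec = {
    "ollama": {"M1": 66, "M4": 116, "7900 XTX": 119},
    "mlx": {"M1": 104, "M4": 192},
}


def speedup(new: float, old: float) -> float:
    """Return how many times faster `new` is than `old`."""
    return new / old


def mlx_over_ollama(stats: dict) -> dict:
    """Per-chip ratio of MLX throughput to Ollama throughput."""
    return {
        chip: round(speedup(stats["mlx"][chip], stats["ollama"][chip]), 2)
        for chip in stats["mlx"]
    }


def m4_over_m1(stats: dict) -> dict:
    """Per-runtime ratio of M4 throughput to M1 throughput."""
    return {
        runtime: round(speedup(chips["M4"], chips["M1"]), 2)
        for runtime, chips in stats.items()
    }


if __name__ == "__main__":
    for chip, ratio in mlx_over_ollama(tokens_per_sec).items():
        print(f"{chip}: MLX is {ratio}x Ollama")
    for runtime, ratio in m4_over_m1(tokens_per_sec).items():
        print(f"{runtime}: M4 is {ratio}x M1")
```

These ratios only compare the single runs quoted here, so treat them as ballpark figures rather than a proper benchmark.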
March 13, 2025 at 1:27 AM
Sadly, the experiment did not work. With AMD=1, Exo loaded properly but wouldn't respond to any prompts. Also tried the amdfix branch.

When running with the M4 Max (128GB) and the M1 Max Studio (32GB), it ran, kinda, but the M1 kept disconnecting. When it did work, it was extremely slow.
March 13, 2025 at 12:14 AM