32 GB of VRAM for less than $1k sounds like a steal these days, and I’m sure it’s not getting cheaper any time soon.

Does anyone here use this GPU? Or any recent Arc Pros? I basically want someone to talk me out of driving to the nearest place that has it in stock and getting $1k poorer.

  • afk_strats@lemmy.world
    6 days ago

    I find llama.cpp with Vulkan EXTREMELY reliable. I can have it running for days at a time without a problem. As for tokens/sec, that’s a complicated question because it depends on model, quant, speculative decoding, KV cache quant, context length, and card distribution. Generally:
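    For anyone who hasn’t tried it, each of those factors maps to a llama.cpp flag. A rough sketch of a Vulkan build and a `llama-server` launch — model paths and the tensor split are placeholders, and exact flag behavior varies between llama.cpp versions, so check `llama-server --help` for your build:

```shell
# Build llama.cpp with the Vulkan backend
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Serve a model; each flag corresponds to one of the factors above:
#   -m    model file (the quant is baked into the GGUF you download)
#   -md   draft model for speculative decoding (optional)
#   -c    context length
#   -ngl  layers offloaded to GPU (99 = everything)
#   -ctk / -ctv  KV-cache quantization (V-cache quant may need flash attention enabled)
#   -ts   tensor split ratio across cards
./build/bin/llama-server -m ./models/main.gguf -md ./models/draft.gguf \
  -c 32768 -ngl 99 -ctk q8_0 -ctv q8_0 -ts 1,1
```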

    Models’ typical speeds at deep context for agentic use. Simple chats will be faster.

    | Model | Quant | Prompt Processing (tok/s) | Token Generation (tok/s) | Hardware | Quality |
    | --- | --- | --- | --- | --- | --- |
    | Qwen 3.5 397B | Q2_K_M | 100-120 | 18-22 | 2 x 7900 + 4 x MI50 | ★★★★★ |
    | Gemma4 31B or Qwen3.5 27B | Q8_0 | 400-800 | 20-25 | 2 x 7900xtx | ★★★★ |
    | Qwen 3.6 35B | Q5_K_M | 1000-2500 | 60-100 | 2 x 7900xtx | ★★★★ |
    | Qwen 3.5 122B | Q4_0 | 200-300 | 30-35 | 4 x MI50 | ★★★★ |
    | gpt-oss 120b | mxfp4 (native) | 500-800 | 50-60 | 3 x MI50 | ★★ |
    | Nemotron 3 Nano 30B | IQ3_K_XXS | 2500-3000 | 150-180 | 1 x 7900xtx | |
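    To put the prompt-processing column in perspective: at deep context, time-to-first-token is roughly the prompt size divided by the prompt-processing rate. A quick back-of-envelope sketch (rates are midpoints pulled from the table above):

```python
def ttft_seconds(context_tokens: int, pp_tok_per_s: float) -> float:
    """Rough seconds until generation starts: prompt size / ingest rate."""
    return context_tokens / pp_tok_per_s

ctx = 32_768  # a deep agentic context
print(f"{ttft_seconds(ctx, 250):.0f}s")   # Qwen 3.5 122B at ~250 tok/s  -> ~131s
print(f"{ttft_seconds(ctx, 1750):.0f}s")  # Qwen 3.6 35B at ~1750 tok/s -> ~19s
```

    So the same prompt that a small dense model ingests in under half a minute can take a couple of minutes on the big MoE, which is why the “deep context” caveat matters for agentic workloads.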