Background

NVIDIA offers free AI model APIs via integrate.api.nvidia.com. But are they reliable during peak hours? I ran two rounds of tests to find out.

Test Method

  • 3 calls per model, fixed prompt: “Reply OK only”
  • 30-second timeout
  • Tested at 2 AM (off-peak) and 5 PM (peak)

Results

ModelOff-peakPeakAvg ResponseVerdict
mistral-small-4-119b3/32/30.69s⭐ Fastest
nemotron-3-super-120b3/33/310.1s✅ Most stable
qwen3.5-122b3/32/36.9s✅ Usable
kimi-k2.52/30/3Timeout❌ Dead at peak
deepseek-v3.20/30/3Timeout❌ Always dead
glm-4.70/30/3404❌ Endpoint missing

Conclusion

Only 3 models survive peak hours:

  1. Mistral Small 4 — 0.69s response, blazing fast
  2. Nemotron 3 Super — 100% success rate, but slow (7-14s), has reasoning
  3. Qwen 3.5 — Works but occasionally times out

The other three (Kimi, DeepSeek, GLM) are completely unusable during peak hours.