A benchmark of 140 NVIDIA-hosted model IDs via chat-completion calls, ranked by response latency and grouped by outcome: success, error, timeout, or fail.
Errors usually mean the model is not compatible with the chat completions endpoint, requires a different endpoint, or is restricted. Timeouts usually mean the model is too slow, overloaded, or unstable for this setup.
Most usable models landed under 2 seconds, which is excellent for interactive development. Anything above 5 seconds should be reserved for quality-heavy tasks only.
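The per-model measurement reduces to a timed call plus outcome classification. Below is a minimal sketch, not the exact harness used for these numbers: `bench_model` and its `timeout_s` threshold are hypothetical names, and the zero-arg callable stands in for any OpenAI-compatible chat-completion request.

```python
import time

def bench_model(call, timeout_s=30.0):
    """Time one chat-completion attempt and classify the outcome.

    `call` is any zero-arg callable that performs the request, raises
    on API errors, and returns the response text. Statuses mirror the
    tables in this report: SUCCESS, ERROR, TIMEOUT, FAIL.
    """
    start = time.perf_counter()
    try:
        reply = call()
    except TimeoutError:          # client-side deadline exceeded
        return ("TIMEOUT", None)
    except Exception:             # wrong endpoint, restricted model, etc.
        return ("ERROR", None)
    latency = time.perf_counter() - start
    if latency > timeout_s:       # answered, but too slowly to count
        return ("TIMEOUT", None)
    if not reply:                 # request succeeded, body unusable
        return ("FAIL", None)
    return ("SUCCESS", round(latency, 2))
```

With the official OpenAI SDK pointed at an OpenAI-compatible base URL, `call` would wrap something like `client.chat.completions.create(model=model_id, messages=[...])`; the specific base URL and credentials are left as assumptions here.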
The best general-purpose picks were mistralai/ministral-14b-instruct-2512, qwen/qwen2.5-coder-32b-instruct, mistralai/mixtral-8x22b-instruct-v0.1, meta/llama-4-maverick-17b-128e-instruct, and nvidia/nemotron-mini-4b-instruct: fast enough for real-time use while being more general-purpose than the ultra-fast safety/PII/translation models.

| Provider | Total | Success | Error | Timeout | Fail | Avg latency | Median latency |
|---|---|---|---|---|---|---|---|
| nvidia | 46 | 20 | 26 | 0 | 0 | 0.95s | 0.82s |
| mistralai | 15 | 8 | 7 | 0 | 0 | 2.52s | 1.14s |
| google | 12 | 5 | 6 | 1 | 0 | 2.93s | 2.23s |
| meta | 12 | 9 | 2 | 1 | 0 | 3.34s | 2.24s |
| deepseek-ai | 7 | 1 | 1 | 5 | 0 | 9.87s | 9.87s |
| qwen | 6 | 5 | 0 | 1 | 0 | 1.78s | 1.29s |
| microsoft | 5 | 0 | 3 | 2 | 0 | — | — |
| ibm | 4 | 0 | 4 | 0 | 0 | — | — |
| moonshotai | 4 | 2 | 0 | 2 | 0 | 1.84s | 1.84s |
| openai | 4 | 4 | 0 | 0 | 0 | 1.80s | 1.69s |
| writer | 4 | 0 | 4 | 0 | 0 | — | — |
| z-ai | 3 | 0 | 0 | 2 | 1 | — | — |
| minimaxai | 2 | 1 | 0 | 0 | 1 | 1.85s | 1.85s |
| 01-ai | 1 | 0 | 1 | 0 | 0 | — | — |
| abacusai | 1 | 1 | 0 | 0 | 0 | 2.83s | 2.83s |
| adept | 1 | 0 | 1 | 0 | 0 | — | — |
| ai21labs | 1 | 0 | 1 | 0 | 0 | — | — |
| aisingapore | 1 | 0 | 1 | 0 | 0 | — | — |
| baai | 1 | 0 | 1 | 0 | 0 | — | — |
| bigcode | 1 | 0 | 1 | 0 | 0 | — | — |
| bytedance | 1 | 1 | 0 | 0 | 0 | 2.72s | 2.72s |
| databricks | 1 | 0 | 1 | 0 | 0 | — | — |
| nv-mistralai | 1 | 0 | 1 | 0 | 0 | — | — |
| sarvamai | 1 | 1 | 0 | 0 | 0 | 3.40s | 3.40s |
| snowflake | 1 | 0 | 1 | 0 | 0 | — | — |
| stepfun-ai | 1 | 1 | 0 | 0 | 0 | 1.02s | 1.02s |
| stockmark | 1 | 1 | 0 | 0 | 0 | 1.35s | 1.35s |
| upstage | 1 | 1 | 0 | 0 | 0 | 1.78s | 1.78s |
| zyphra | 1 | 0 | 1 | 0 | 0 | — | — |
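The provider table above can be derived mechanically from the per-model rows. A sketch, assuming a hypothetical `provider_summary` helper fed rows of `(provider, status, latency)` with `None` latency for non-successes:

```python
from collections import defaultdict
from statistics import mean, median

def provider_summary(results):
    """Aggregate (provider, status, latency) rows into per-provider stats.

    Returns {provider: {"total", "SUCCESS", "ERROR", "TIMEOUT", "FAIL",
    "avg", "median"}}, with avg/median of successful latencies only.
    """
    by_provider = defaultdict(lambda: {
        "total": 0, "SUCCESS": 0, "ERROR": 0, "TIMEOUT": 0, "FAIL": 0,
        "latencies": [],
    })
    for provider, status, latency in results:
        row = by_provider[provider]
        row["total"] += 1
        row[status] += 1
        if latency is not None:
            row["latencies"].append(latency)
    summary = {}
    for provider, row in by_provider.items():
        lats = row.pop("latencies")
        row["avg"] = round(mean(lats), 2) if lats else None
        row["median"] = round(median(lats), 2) if lats else None
        summary[provider] = row
    return summary
```

Fed the four openai rows from the detail table (2.59s, 3.12s, 0.72s, 0.79s), this reproduces the 1.80s average and 1.69s median shown above.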
Per-model results:

| # | Model | Provider | Status | Latency |
|---|---|---|---|---|
| 0 | 01-ai/yi-large | 01-ai | ERROR | — |
| 1 | abacusai/dracarys-llama-3.1-70b-instruct | abacusai | SUCCESS | 2.83s |
| 2 | adept/fuyu-8b | adept | ERROR | — |
| 3 | ai21labs/jamba-1.5-large-instruct | ai21labs | ERROR | — |
| 4 | aisingapore/sea-lion-7b-instruct | aisingapore | ERROR | — |
| 5 | baai/bge-m3 | baai | ERROR | — |
| 6 | bigcode/starcoder2-15b | bigcode | ERROR | — |
| 7 | bytedance/seed-oss-36b-instruct | bytedance | SUCCESS | 2.72s |
| 8 | databricks/dbrx-instruct | databricks | ERROR | — |
| 9 | deepseek-ai/deepseek-coder-6.7b-instruct | deepseek-ai | ERROR | — |
| 10 | deepseek-ai/deepseek-v3.1-terminus | deepseek-ai | TIMEOUT | — |
| 11 | deepseek-ai/deepseek-v3.2 | deepseek-ai | TIMEOUT | — |
| 12 | deepseek-ai/deepseek-v4-flash | deepseek-ai | TIMEOUT | — |
| 13 | deepseek-ai/deepseek-v4-flash | deepseek-ai | SUCCESS | 9.87s |
| 14 | deepseek-ai/deepseek-v4-pro | deepseek-ai | TIMEOUT | — |
| 15 | deepseek-ai/deepseek-v4-pro | deepseek-ai | TIMEOUT | — |
| 16 | google/codegemma-1.1-7b | google | ERROR | — |
| 17 | google/codegemma-7b | google | ERROR | — |
| 18 | google/deplot | google | ERROR | — |
| 19 | google/gemma-2-2b-it | google | SUCCESS | 0.44s |
| 20 | google/gemma-2b | google | ERROR | — |
| 21 | google/gemma-3-12b-it | google | TIMEOUT | — |
| 22 | google/gemma-3-27b-it | google | SUCCESS | 6.45s |
| 23 | google/gemma-3-4b-it | google | SUCCESS | 3.64s |
| 24 | google/gemma-3n-e2b-it | google | SUCCESS | 1.88s |
| 25 | google/gemma-3n-e4b-it | google | SUCCESS | 2.23s |
| 26 | google/gemma-4-31b-it | google | ERROR | — |
| 27 | google/recurrentgemma-2b | google | ERROR | — |
| 28 | ibm/granite-3.0-3b-a800m-instruct | ibm | ERROR | — |
| 29 | ibm/granite-3.0-8b-instruct | ibm | ERROR | — |
| 30 | ibm/granite-34b-code-instruct | ibm | ERROR | — |
| 31 | ibm/granite-8b-code-instruct | ibm | ERROR | — |
| 32 | meta/codellama-70b | meta | ERROR | — |
| 33 | meta/llama-3.1-405b-instruct | meta | TIMEOUT | — |
| 34 | meta/llama-3.1-70b-instruct | meta | SUCCESS | 2.24s |
| 35 | meta/llama-3.1-8b-instruct | meta | SUCCESS | 8.73s |
| 36 | meta/llama-3.2-11b-vision-instruct | meta | SUCCESS | 0.81s |
| 37 | meta/llama-3.2-1b-instruct | meta | SUCCESS | 9.35s |
| 38 | meta/llama-3.2-3b-instruct | meta | SUCCESS | 0.82s |
| 39 | meta/llama-3.2-90b-vision-instruct | meta | SUCCESS | 3.40s |
| 40 | meta/llama-3.3-70b-instruct | meta | SUCCESS | 3.42s |
| 41 | meta/llama-4-maverick-17b-128e-instruct | meta | SUCCESS | 0.83s |
| 42 | meta/llama-guard-4-12b | meta | SUCCESS | 0.42s |
| 43 | meta/llama2-70b | meta | ERROR | — |
| 44 | microsoft/kosmos-2 | microsoft | ERROR | — |
| 45 | microsoft/phi-3-vision-128k-instruct | microsoft | ERROR | — |
| 46 | microsoft/phi-3.5-moe-instruct | microsoft | ERROR | — |
| 47 | microsoft/phi-4-mini-instruct | microsoft | TIMEOUT | — |
| 48 | microsoft/phi-4-multimodal-instruct | microsoft | TIMEOUT | — |
| 49 | minimaxai/minimax-m2.5 | minimaxai | SUCCESS | 1.85s |
| 50 | minimaxai/minimax-m2.7 | minimaxai | FAIL | — |
| 51 | mistralai/codestral-22b-instruct-v0.1 | mistralai | ERROR | — |
| 52 | mistralai/devstral-2-123b-instruct-2512 | mistralai | SUCCESS | 1.18s |
| 53 | mistralai/magistral-small-2506 | mistralai | ERROR | — |
| 54 | mistralai/ministral-14b-instruct-2512 | mistralai | SUCCESS | 0.65s |
| 55 | mistralai/mistral-7b-instruct-v0.3 | mistralai | ERROR | — |
| 56 | mistralai/mistral-large | mistralai | ERROR | — |
| 57 | mistralai/mistral-large-2-instruct | mistralai | ERROR | — |
| 58 | mistralai/mistral-large-3-675b-instruct-2512 | mistralai | SUCCESS | 7.35s |
| 59 | mistralai/mistral-medium-3-instruct | mistralai | ERROR | — |
| 60 | mistralai/mistral-medium-3.5-128b | mistralai | SUCCESS | 5.03s |
| 61 | mistralai/mistral-nemotron | mistralai | SUCCESS | 1.06s |
| 62 | mistralai/mistral-small-4-119b-2603 | mistralai | SUCCESS | 1.09s |
| 63 | mistralai/mixtral-8x22b-instruct-v0.1 | mistralai | SUCCESS | 0.85s |
| 64 | mistralai/mixtral-8x22b-v0.1 | mistralai | ERROR | — |
| 65 | mistralai/mixtral-8x7b-instruct-v0.1 | mistralai | SUCCESS | 2.91s |
| 66 | moonshotai/kimi-k2-instruct | moonshotai | SUCCESS | 2.47s |
| 67 | moonshotai/kimi-k2-instruct-0905 | moonshotai | TIMEOUT | — |
| 68 | moonshotai/kimi-k2-thinking | moonshotai | TIMEOUT | — |
| 69 | moonshotai/kimi-k2.6 | moonshotai | SUCCESS | 1.21s |
| 70 | nv-mistralai/mistral-nemo-12b-instruct | nv-mistralai | ERROR | — |
| 71 | nvidia/ai-synthetic-video-detector | nvidia | ERROR | — |
| 72 | nvidia/cosmos-reason2-8b | nvidia | ERROR | — |
| 73 | nvidia/embed-qa-4 | nvidia | ERROR | — |
| 74 | nvidia/gliner-pii | nvidia | SUCCESS | 0.30s |
| 75 | nvidia/ising-calibration-1-35b-a3b | nvidia | SUCCESS | 0.96s |
| 76 | nvidia/llama-3.1-nemoguard-8b-content-safety | nvidia | SUCCESS | 0.50s |
| 77 | nvidia/llama-3.1-nemoguard-8b-topic-control | nvidia | SUCCESS | 0.39s |
| 78 | nvidia/llama-3.1-nemotron-51b-instruct | nvidia | ERROR | — |
| 79 | nvidia/llama-3.1-nemotron-70b-instruct | nvidia | ERROR | — |
| 80 | nvidia/llama-3.1-nemotron-nano-8b-v1 | nvidia | SUCCESS | 1.34s |
| 81 | nvidia/llama-3.1-nemotron-nano-vl-8b-v1 | nvidia | SUCCESS | 0.95s |
| 82 | nvidia/llama-3.1-nemotron-safety-guard-8b-v3 | nvidia | SUCCESS | 0.55s |
| 83 | nvidia/llama-3.1-nemotron-ultra-253b-v1 | nvidia | ERROR | — |
| 84 | nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1 | nvidia | ERROR | — |
| 85 | nvidia/llama-3.2-nemoretriever-300m-embed-v1 | nvidia | ERROR | — |
| 86 | nvidia/llama-3.2-nv-embedqa-1b-v1 | nvidia | ERROR | — |
| 87 | nvidia/llama-3.2-nv-embedqa-1b-v2 | nvidia | ERROR | — |
| 88 | nvidia/llama-3.3-nemotron-super-49b-v1 | nvidia | SUCCESS | 1.10s |
| 89 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | nvidia | SUCCESS | 2.75s |
| 90 | nvidia/llama-nemotron-embed-1b-v2 | nvidia | ERROR | — |
| 91 | nvidia/llama-nemotron-embed-vl-1b-v2 | nvidia | ERROR | — |
| 92 | nvidia/llama3-chatqa-1.5-70b | nvidia | ERROR | — |
| 93 | nvidia/mistral-nemo-minitron-8b-8k-instruct | nvidia | ERROR | — |
| 94 | nvidia/nemoretriever-parse | nvidia | ERROR | — |
| 95 | nvidia/nemotron-3-content-safety | nvidia | SUCCESS | 0.42s |
| 96 | nvidia/nemotron-3-nano-30b-a3b | nvidia | SUCCESS | 0.87s |
| 97 | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning | nvidia | SUCCESS | 0.91s |
| 98 | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning | nvidia | SUCCESS | 0.67s |
| 99 | nvidia/nemotron-3-super-120b-a12b | nvidia | SUCCESS | 1.59s |
| 100 | nvidia/nemotron-3-super-120b-a12b | nvidia | SUCCESS | 1.17s |
| 101 | nvidia/nemotron-4-340b-instruct | nvidia | ERROR | — |
| 102 | nvidia/nemotron-4-340b-reward | nvidia | ERROR | — |
| 103 | nvidia/nemotron-content-safety-reasoning-4b | nvidia | SUCCESS | 0.62s |
| 104 | nvidia/nemotron-mini-4b-instruct | nvidia | SUCCESS | 0.61s |
| 105 | nvidia/nemotron-nano-12b-v2-vl | nvidia | SUCCESS | 0.77s |
| 106 | nvidia/nemotron-nano-3-30b-a3b | nvidia | ERROR | — |
| 107 | nvidia/nemotron-parse | nvidia | ERROR | — |
| 108 | nvidia/neva-22b | nvidia | ERROR | — |
| 109 | nvidia/nv-embed-v1 | nvidia | ERROR | — |
| 110 | nvidia/nv-embedcode-7b-v1 | nvidia | ERROR | — |
| 111 | nvidia/nv-embedqa-e5-v5 | nvidia | ERROR | — |
| 112 | nvidia/nv-embedqa-mistral-7b-v2 | nvidia | ERROR | — |
| 113 | nvidia/nvclip | nvidia | ERROR | — |
| 114 | nvidia/nvidia-nemotron-nano-9b-v2 | nvidia | SUCCESS | 2.00s |
| 115 | nvidia/riva-translate-4b-instruct | nvidia | ERROR | — |
| 116 | nvidia/riva-translate-4b-instruct-v1.1 | nvidia | SUCCESS | 0.44s |
| 117 | openai/gpt-oss-120b | openai | SUCCESS | 2.59s |
| 118 | openai/gpt-oss-120b | openai | SUCCESS | 3.12s |
| 119 | openai/gpt-oss-20b | openai | SUCCESS | 0.72s |
| 120 | openai/gpt-oss-20b | openai | SUCCESS | 0.79s |
| 121 | qwen/qwen2.5-coder-32b-instruct | qwen | SUCCESS | 0.85s |
| 122 | qwen/qwen3-coder-480b-a35b-instruct | qwen | SUCCESS | 3.47s |
| 123 | qwen/qwen3-next-80b-a3b-instruct | qwen | SUCCESS | 0.85s |
| 124 | qwen/qwen3-next-80b-a3b-thinking | qwen | SUCCESS | 1.29s |
| 125 | qwen/qwen3.5-122b-a10b | qwen | SUCCESS | 2.45s |
| 126 | qwen/qwen3.5-397b-a17b | qwen | TIMEOUT | — |
| 127 | sarvamai/sarvam-m | sarvamai | SUCCESS | 3.40s |
| 128 | snowflake/arctic-embed-l | snowflake | ERROR | — |
| 129 | stepfun-ai/step-3.5-flash | stepfun-ai | SUCCESS | 1.02s |
| 130 | stockmark/stockmark-2-100b-instruct | stockmark | SUCCESS | 1.35s |
| 131 | upstage/solar-10.7b-instruct | upstage | SUCCESS | 1.78s |
| 132 | writer/palmyra-creative-122b | writer | ERROR | — |
| 133 | writer/palmyra-fin-70b-32k | writer | ERROR | — |
| 134 | writer/palmyra-med-70b | writer | ERROR | — |
| 135 | writer/palmyra-med-70b-32k | writer | ERROR | — |
| 136 | z-ai/glm-5.1 | z-ai | TIMEOUT | — |
| 137 | z-ai/glm4.7 | z-ai | FAIL | — |
| 138 | z-ai/glm5 | z-ai | TIMEOUT | — |
| 139 | zyphra/zamba2-7b-instruct | zyphra | ERROR | — |
1. The catalog includes many non-chat models, so a high error count is expected when every model is sent to /chat/completions.
2. DeepSeek and very large models were the biggest timeout risks in this run.
3. Smaller and mid-sized instruction models gave the best developer experience.
4. Re-run the test with the same prompt 3–5 times per model before making a final production decision.
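The repeat-before-deciding step in point 4 can be sketched as a repeat-and-summarize loop. `stable_latency` is a hypothetical helper, and the callable again stands in for one chat-completion request:

```python
import time
from statistics import median

def stable_latency(call, runs=5):
    """Repeat the same prompt `runs` times and summarize latency.

    Returns (median, best, worst) in seconds over the attempts that
    succeeded, or None if every attempt raised.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        try:
            call()
        except Exception:         # skip failed attempts entirely
            continue
        samples.append(time.perf_counter() - start)
    if not samples:
        return None
    return (median(samples), min(samples), max(samples))
```

Comparing the median against the best/worst spread shows whether a model's single-shot latency in the tables above was representative or a lucky (or unlucky) draw.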