Latency (LLM API)

Latency in an LLM API is the time between sending a request and receiving a response from the model. It depends on model size, prompt length, and decoding strategy. For streaming APIs, two measurements matter: time to first token (how long before any output arrives) and total latency (how long until the response is complete).
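A minimal sketch of how both measurements can be taken client-side, assuming a hypothetical streaming completions endpoint; the URL in API_URL and the request payload shape are placeholders, not a real provider's API:

```python
import time
import requests  # third-party package: pip install requests

API_URL = "https://api.example.com/v1/completions"  # hypothetical endpoint


def measure_latency(prompt: str) -> dict:
    """Time a single streaming LLM API call.

    Total latency runs from request send to the final byte; time to
    first token (TTFT) runs from request send to the first chunk.
    """
    start = time.perf_counter()
    first_chunk_at = None

    # stream=True exposes chunks as they arrive, so TTFT can be
    # separated from total generation time.
    with requests.post(
        API_URL, json={"prompt": prompt}, stream=True, timeout=60
    ) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=None):
            if chunk and first_chunk_at is None:
                first_chunk_at = time.perf_counter()
    end = time.perf_counter()

    return {
        "ttft_s": (first_chunk_at or end) - start,
        "total_s": end - start,
    }


if __name__ == "__main__":
    stats = measure_latency("Explain LLM latency in one sentence.")
    print(f"TTFT: {stats['ttft_s']:.3f}s, total: {stats['total_s']:.3f}s")
```

In practice, averaging over many requests and reporting percentiles (p50, p95) gives a more reliable picture than a single call, since latency varies with server load.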