Latency (LLM API)
Latency in an LLM API is the time between sending a request and receiving a response from the model. It depends on model size, prompt length, decoding strategy, the number of output tokens generated, and server load. For streaming APIs, latency is often split into time to first token (TTFT) and total generation time.
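A minimal sketch of measuring both metrics, assuming a streaming HTTP API roughly in the OpenAI chat-completions style; the endpoint URL, model name, and API key below are placeholders, not a real service:

```python
import time
import requests  # assumes the `requests` package is installed

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder credentials

payload = {
    "model": "example-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,            # stream tokens so the first one can be timed
}

start = time.perf_counter()
first_token_at = None

# Record when the first streamed chunk arrives (time to first token)
# and when the stream ends (total latency).
with requests.post(API_URL, headers=HEADERS, json=payload,
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line and first_token_at is None:
            first_token_at = time.perf_counter()

end = time.perf_counter()

if first_token_at is not None:
    print(f"Time to first token: {first_token_at - start:.3f}s")
print(f"Total latency:       {end - start:.3f}s")
```

Streaming matters here because TTFT is usually what users perceive as responsiveness, while total latency scales with the number of tokens generated.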