Latency (LLM API)
Latency in an LLM API is the time between sending a request and receiving a response from the model. It depends on model size, prompt length, decoding strategy, the number of output tokens generated, and server load. For streaming APIs, latency is often split into time to first token (TTFT) and total generation time.
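A minimal sketch of measuring both metrics, assuming a streaming HTTP API roughly in the OpenAI chat-completions style; the endpoint URL, model name, and API key below are placeholders, not a real service:

```python
import time
import requests  # assumes the `requests` package is installed

API_URL = "https://api.example.com/v1/chat/completions"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}       # placeholder credentials

payload = {
    "model": "example-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,            # stream tokens so the first one can be timed
}

start = time.perf_counter()
first_token_at = None

# Record when the first streamed chunk arrives (time to first token)
# and when the stream ends (total latency).
with requests.post(API_URL, headers=HEADERS, json=payload,
                   stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line and first_token_at is None:
            first_token_at = time.perf_counter()

end = time.perf_counter()

if first_token_at is not None:
    print(f"Time to first token: {first_token_at - start:.3f}s")
print(f"Total latency:       {end - start:.3f}s")
```

Streaming matters here because TTFT is usually what users perceive as responsiveness, while total latency scales with the number of tokens generated.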