Streaming LLM Requests with Python

Today I was playing around with an LLM called Mistral 7B by running it locally with Ollama. Once installed, Ollama provides both a chat interface and an HTTP API that you can call from anywhere:

```bash
curl http://localhost:11434/api/generate -d '{ "model": "mistral", "prompt": "tell me a joke?" }'
```

When making this API call, I noticed that the response is streamed back to the client in what appears to be a token-by-token fashion. Take a look at what happens when you run the command.
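To get the same streaming behaviour from Python, here's a minimal sketch using the `requests` library. It assumes Ollama is running locally on its default port and that each streamed line is a newline-delimited JSON chunk carrying a `response` field and a final `done` flag; the function and variable names are just illustrative.

```python
import json

import requests

# Assumes Ollama is running locally on its default port (11434).
URL = "http://localhost:11434/api/generate"


def stream_generate(prompt: str, model: str = "mistral") -> None:
    """Send a prompt to Ollama and print the reply as it streams back."""
    payload = {"model": model, "prompt": prompt}

    # stream=True tells requests not to buffer the whole body up front;
    # Ollama sends back newline-delimited JSON chunks as they are generated.
    with requests.post(URL, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            # Each chunk carries a small piece of the generated text.
            print(chunk.get("response", ""), end="", flush=True)
            if chunk.get("done"):
                print()
                break


if __name__ == "__main__":
    stream_generate("tell me a joke?")
```

Run it and the joke prints out piece by piece, just like the curl call above, instead of arriving as one big blob at the end.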

April 14, 2024 · Me