Ollama exposes a clean REST API on localhost:11434 that lets any developer integrate local LLMs into applications without cloud dependency, API costs, or privacy concerns. This tutorial covers the core endpoints, with working code examples in Python, JavaScript, and curl.
Quick Setup
Shell — Install & Start
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1

# Verify server running on :11434
curl http://localhost:11434/api/tags
Endpoint 1 — /api/generate
Single-turn text generation with optional streaming. With streaming enabled, the server returns newline-delimited JSON objects as tokens are produced, enabling real-time UI updates.
Python — Streaming Generate
import requests, json

def generate(prompt):
    r = requests.post(
        "http://localhost:11434/api/generate",
        json=dict(model="llama3.1", prompt=prompt, stream=True),
        stream=True,
    )
    for line in r.iter_lines():
        if line:
            c = json.loads(line)
            print(c["response"], end="", flush=True)
            if c.get("done"):
                break

generate("Explain transformers in 3 sentences.")
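Sampling parameters are not top-level request fields: they go in an optional `options` object. A minimal non-streaming sketch (the helper names here are ours; `temperature` and `num_predict` are real Ollama options):

```python
import requests

def build_payload(prompt, temperature=0.2, max_tokens=128):
    """Request body for /api/generate; sampling knobs live under `options`."""
    return {
        "model": "llama3.1",
        "prompt": prompt,
        "stream": False,  # one JSON object instead of an NDJSON stream
        "options": {
            "temperature": temperature,  # lower = more deterministic
            "num_predict": max_tokens,   # cap on generated tokens
        },
    }

def generate_once(prompt, **knobs):
    """Blocking call: returns the full completion as a single string."""
    r = requests.post(
        "http://localhost:11434/api/generate",
        json=build_payload(prompt, **knobs),
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["response"]
```

With `stream=False` the whole completion arrives in one JSON object under the `response` key, which is simpler when you don't need token-by-token updates.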
Endpoint 2 — /api/chat (Multi-Turn)
The server is stateless: the client sends the full message history with every request, and the model replies in the context of that history.
Python — Chat with History
msgs = [
dict(role="system", content="You are a concise coding assistant."),
dict(role="user", content="Write a Python quicksort.")
]
r = requests.post(
"http://localhost:11434/api/chat",
json=dict(model="llama3.1", messages=msgs, stream=False)
)
print(r.json()["message"]["content"])
OpenAI Drop-In Replacement
Python — OpenAI SDK → Ollama
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by Ollama
)
resp = client.chat.completions.create(
    model="llama3.1",
    messages=[dict(role="user", content="Hello!")],
)
print(resp.choices[0].message.content)
JavaScript Streaming Example
JavaScript — Fetch Streaming
const r = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    messages: [{ role: "user", content: "Hi!" }],
    stream: true,
  }),
});

// Each chunk may contain one or more newline-delimited JSON objects
// (or a partial one), so buffer and split instead of parsing raw chunks.
const reader = r.body.getReader();
const decoder = new TextDecoder();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    const c = JSON.parse(line);
    process.stdout.write(c.message?.content ?? "");
  }
}
All Endpoints Reference
| Method | Endpoint | Use Case |
|---|---|---|
| POST | /api/generate | Single-turn text generation, streaming |
| POST | /api/chat | Multi-turn conversation with history |
| GET | /api/tags | List locally available models |
| POST | /api/embeddings | Vector embeddings for RAG |
| POST | /api/pull | Pull model from Ollama registry |
| POST | /v1/chat/completions | OpenAI-compatible endpoint |
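The /api/embeddings endpoint takes a model and a prompt and returns a single vector under the `embedding` key. A minimal sketch for RAG-style similarity (the `cosine` helper is ours, not part of Ollama):

```python
import requests

def embed(text, model="llama3.1"):
    """POST /api/embeddings -> embedding vector (list of floats) for one prompt."""
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm
```

A dedicated embedding model such as nomic-embed-text typically produces better vectors than a chat model; pull one with `ollama pull nomic-embed-text` and pass it as `model`.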
Security Note
Never expose port 11434 directly to the public internet. For team or production use, add an nginx reverse proxy with HTTPS and authentication. Ollama's built-in server has no auth by default.
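A minimal nginx sketch of that setup (certificate paths, hostname, and the htpasswd file are placeholders for your own values):

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.com;               # placeholder hostname

    ssl_certificate     /etc/ssl/certs/ollama.pem;    # placeholder paths
    ssl_certificate_key /etc/ssl/private/ollama.key;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;    # create with htpasswd
        proxy_pass           http://127.0.0.1:11434;
        proxy_read_timeout   300s;                    # long generations
        proxy_buffering      off;                     # keep streaming responsive
    }
}
```

Disabling proxy buffering matters here: with it on, nginx would hold back the newline-delimited JSON chunks and defeat streaming.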