Ollama exposes a clean REST API on localhost:11434 that lets any developer integrate local LLMs into applications without cloud dependency, API costs, or privacy concerns. This tutorial covers the core endpoints, with working code examples in Python, JavaScript, and curl.
Quick Setup
Shell — Install & Start
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1

# Verify server running on :11434
curl http://localhost:11434/api/tags
Endpoint 1 — /api/generate
Single-turn text generation with optional streaming. With streaming enabled, the server returns newline-delimited JSON objects as tokens are produced, enabling real-time UI updates.
Python — Streaming Generate
import requests, json

def generate(prompt):
    r = requests.post(
        "http://localhost:11434/api/generate",
        json=dict(model="llama3.1", prompt=prompt, stream=True),
        stream=True,
    )
    for line in r.iter_lines():
        if line:
            c = json.loads(line)
            print(c["response"], end="", flush=True)
            if c.get("done"):
                break

generate("Explain transformers in 3 sentences.")
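Sampling parameters are not top-level request fields: they go in an optional `options` object. A minimal non-streaming sketch (the helper names here are ours; `temperature` and `num_predict` are real Ollama options):

```python
import requests

def build_payload(prompt, temperature=0.2, max_tokens=128):
    """Request body for /api/generate; sampling knobs live under `options`."""
    return {
        "model": "llama3.1",
        "prompt": prompt,
        "stream": False,  # one JSON object instead of an NDJSON stream
        "options": {
            "temperature": temperature,  # lower = more deterministic
            "num_predict": max_tokens,   # cap on generated tokens
        },
    }

def generate_once(prompt, **knobs):
    """Blocking call: returns the full completion as a single string."""
    r = requests.post(
        "http://localhost:11434/api/generate",
        json=build_payload(prompt, **knobs),
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["response"]
```

With `stream=False` the whole completion arrives in one JSON object under the `response` key, which is simpler when you don't need token-by-token updates.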
Endpoint 2 — /api/chat (Multi-Turn)
The server is stateless: the client sends the full message history with every request, and the model replies in the context of that history.
Python — Chat with History
msgs = [
dict(role="system", content="You are a concise coding assistant."),
dict(role="user", content="Write a Python quicksort.")
]
r = requests.post(
"http://localhost:11434/api/chat",
json=dict(model="llama3.1", messages=msgs, stream=False)
)
print(r.json()["message"]["content"])
OpenAI Drop-In Replacement
Python — OpenAI SDK → Ollama
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by Ollama
)
resp = client.chat.completions.create(
    model="llama3.1",
    messages=[dict(role="user", content="Hello!")],
)
print(resp.choices[0].message.content)
JavaScript Streaming Example
JavaScript — Fetch Streaming
const r = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    messages: [{ role: "user", content: "Hi!" }],
    stream: true,
  }),
});

// Each chunk may contain one or more newline-delimited JSON objects
// (or a partial one), so buffer and split instead of parsing raw chunks.
const reader = r.body.getReader();
const decoder = new TextDecoder();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    const c = JSON.parse(line);
    process.stdout.write(c.message?.content ?? "");
  }
}
All Endpoints Reference
| Method | Endpoint | Use Case |
|---|---|---|
| POST | /api/generate | Single-turn text generation, streaming |
| POST | /api/chat | Multi-turn conversation with history |
| GET | /api/tags | List locally available models |
| POST | /api/embeddings | Vector embeddings for RAG |
| POST | /api/pull | Pull model from Ollama registry |
| POST | /v1/chat/completions | OpenAI-compatible endpoint |
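The /api/embeddings endpoint takes a model and a prompt and returns a single vector under the `embedding` key. A minimal sketch for RAG-style similarity (the `cosine` helper is ours, not part of Ollama):

```python
import requests

def embed(text, model="llama3.1"):
    """POST /api/embeddings -> embedding vector (list of floats) for one prompt."""
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm
```

A dedicated embedding model such as nomic-embed-text typically produces better vectors than a chat model; pull one with `ollama pull nomic-embed-text` and pass it as `model`.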
Security Note
Never expose port 11434 directly to the public internet. For team or production use, add an nginx reverse proxy with HTTPS and authentication. Ollama's built-in server has no auth by default.
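A minimal nginx sketch of that setup (certificate paths, hostname, and the htpasswd file are placeholders for your own values):

```nginx
server {
    listen 443 ssl;
    server_name ollama.example.com;               # placeholder hostname

    ssl_certificate     /etc/ssl/certs/ollama.pem;    # placeholder paths
    ssl_certificate_key /etc/ssl/private/ollama.key;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;    # create with htpasswd
        proxy_pass           http://127.0.0.1:11434;
        proxy_read_timeout   300s;                    # long generations
        proxy_buffering      off;                     # keep streaming responsive
    }
}
```

Disabling proxy buffering matters here: with it on, nginx would hold back the newline-delimited JSON chunks and defeat streaming.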