Ollama · API · Tutorial

Ollama API Usage Examples for Developers 2026

Prashant Lalwani
April 16, 2026 · 13 min read
REST API · Python · JavaScript

Ollama exposes a clean REST API on localhost:11434 that lets any developer integrate local LLMs into applications without cloud dependency, API costs, or privacy concerns. This tutorial covers the core endpoints, with working code examples in Python, JavaScript, and curl.

Zero API cost (fully local) · 5 core endpoints · 100% OpenAI-compatible

Quick Setup

Shell — Install & Start
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1
# Verify server running on :11434
curl http://localhost:11434/api/tags
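The /api/tags check above returns JSON with a `models` array. A small sketch of extracting the installed model names in Python — the sample payload here is abridged, not a full response:

```python
import json

def list_models(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    data = json.loads(tags_json)
    return [m["name"] for m in data.get("models", [])]

# Abridged example of the /api/tags response shape
sample = '{"models": [{"name": "llama3.1:latest"}, {"name": "mistral:latest"}]}'
print(list_models(sample))  # ['llama3.1:latest', 'mistral:latest']
```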

Endpoint 1 — /api/generate

Single-turn text generation with optional streaming. With streaming enabled, the server returns newline-delimited JSON (one object per chunk) as tokens are produced, enabling real-time UI updates.

Python — Streaming Generate
import requests, json

def generate(prompt):
    # stream=True in the payload asks Ollama for NDJSON chunks;
    # stream=True on requests.post keeps the socket open for iter_lines()
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1", "prompt": prompt, "stream": True},
        stream=True,
    )
    for line in r.iter_lines():
        if line:
            c = json.loads(line)
            print(c["response"], end="", flush=True)
            if c.get("done"):
                break

generate("Explain transformers in 3 sentences.")
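The per-line parsing inside `generate` can be factored out and unit-tested without a running server. A sketch, assuming the NDJSON shape shown above — `collect_stream` is an illustrative helper, not part of the Ollama API:

```python
import json

def collect_stream(lines):
    """Join the 'response' fields of /api/generate NDJSON lines until done."""
    out = []
    for line in lines:
        if not line:
            continue
        chunk = json.loads(line)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)

# Simulated stream chunks, as iter_lines() would yield them
sample = [
    b'{"response": "Hello", "done": false}',
    b'{"response": " world", "done": true}',
]
print(collect_stream(sample))  # Hello world
```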

Endpoint 2 — /api/chat (Multi-Turn)

Python — Chat with History
import requests

msgs = [
    dict(role="system", content="You are a concise coding assistant."),
    dict(role="user", content="Write a Python quicksort.")
]
r = requests.post(
    "http://localhost:11434/api/chat",
    json=dict(model="llama3.1", messages=msgs, stream=False)
)
print(r.json()["message"]["content"])
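Multi-turn context works by resending the whole `messages` list each call and appending the assistant's reply before the next turn. A sketch of that loop with the HTTP call injected as a callable so it can be exercised offline — `chat_turn` and the stub are illustrative names, not Ollama API:

```python
def chat_turn(msgs, user_input, post):
    """Append a user turn, call the injected `post` function
    (e.g. a thin wrapper around requests.post that returns the
    reply text), and record the reply so history carries forward."""
    msgs.append({"role": "user", "content": user_input})
    reply = post({"model": "llama3.1", "messages": msgs, "stream": False})
    msgs.append({"role": "assistant", "content": reply})
    return reply

# Offline check with a stub instead of a live server
history = [{"role": "system", "content": "You are concise."}]
fake_post = lambda payload: f"echo: {payload['messages'][-1]['content']}"
print(chat_turn(history, "hi", fake_post))  # echo: hi
print(len(history))  # 3
```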

OpenAI Drop-In Replacement

Python — OpenAI SDK → Ollama
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # any non-empty string; Ollama ignores it
)
resp = client.chat.completions.create(
    model="llama3.1",
    messages=[dict(role="user", content="Hello!")]
)
print(resp.choices[0].message.content)

JavaScript Streaming Example

JavaScript — Fetch Streaming
const r = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.1",
    messages: [{ role: "user", content: "Hi!" }],
    stream: true
  })
});
const reader = r.body.getReader();
const decoder = new TextDecoder();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // A network chunk may contain several NDJSON lines, or end mid-line
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop(); // keep the trailing partial line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    const c = JSON.parse(line);
    process.stdout.write(c.message?.content ?? "");
  }
}

All Endpoints Reference

Method  Endpoint              Use Case
POST    /api/generate         Single-turn text generation, streaming
POST    /api/chat             Multi-turn conversation with history
GET     /api/tags             List locally available models
POST    /api/embeddings       Vector embeddings for RAG
POST    /api/pull             Pull a model from the Ollama registry
POST    /v1/chat/completions  OpenAI-compatible chat endpoint
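The /api/embeddings row is the one endpoint not demonstrated above. A stdlib-only sketch, assuming the documented request/response shape (`prompt` in, an `embedding` list out); the cosine helper is a plain formula and testable without a server:

```python
import json
import math
from urllib.request import Request, urlopen

def embed(text, model="llama3.1", host="http://localhost:11434"):
    """POST /api/embeddings; the response JSON carries an 'embedding' list."""
    req = Request(
        f"{host}/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a, b):
    """Cosine similarity between two embedding vectors (for RAG ranking)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# cosine() alone is verifiable offline:
print(round(cosine([1.0, 0.0], [0.0, 1.0]), 2))  # 0.0
```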
Security Note

Never expose port 11434 directly to the public internet. For team or production use, add an nginx reverse proxy with HTTPS and authentication. Ollama's built-in server has no auth by default.
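One way to follow this advice is an nginx reverse proxy with HTTP basic auth in front of the Ollama port. A minimal sketch — the hostname, certificate paths, and htpasswd file are placeholders, and real deployments need valid TLS certificates:

```nginx
# /etc/nginx/conf.d/ollama.conf — illustrative only
server {
    listen 443 ssl;
    server_name ollama.example.com;            # placeholder hostname

    ssl_certificate     /etc/ssl/ollama.crt;   # supply real certs
    ssl_certificate_key /etc/ssl/ollama.key;

    location / {
        auth_basic           "Ollama";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:11434;
        proxy_read_timeout 300s;               # long generations stream slowly
    }
}
```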