Ollama AI Guides

This is a single consolidated guide for running local models with Ollama and connecting tools and agents (especially Goose) to your local Ollama server. It merges the core Desktop, CLI, and Goose workflows into one practical "happy path + operations" reference.

Ollama Local Hosting — CLI + Desktop Guide


Index

  • Overview
  • Setup
  • Beginner usage
  • Fast path
  • Pro usage
  • Cost savings guide
  • Privacy guide

Overview

When to use this combined guide.

  • You want one local-only Ollama reference instead of separate desktop and terminal docs.
  • You use the desktop app sometimes but still want CLI control for scripting and diagnostics.
  • You want a single workflow for hosting models that local agents, editors, and scripts can reuse.

Mental model.

  • The desktop app is the easiest way to keep Ollama running in the background.
  • The CLI is the control surface for pulling models, inspecting them, building custom models, and testing the API.
  • The local API is the shared interface your other tools use.

Setup

1) Install Ollama

  • Install the Ollama desktop app or your platform’s preferred package.
  • Verify the install:
ollama --version

2) Choose how you run it

Pick one of these common local patterns:

  • Desktop-first: open the Ollama app and let it keep the local server running in the background.
  • CLI-first: run ollama serve yourself in a terminal or through a service manager.

For direct terminal hosting:

ollama serve

That starts the local API on http://localhost:11434.

3) Pull and run a starter model

ollama pull qwen3.5:3b
ollama run qwen3.5:3b

  • ollama pull downloads the model without opening a chat.
  • ollama run downloads it if needed and starts an interactive local session.
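
ollama run also accepts a prompt as an argument, which is handy for one-off scripted calls that should exit instead of opening a chat:

ollama run qwen3.5:3b "Say hello from local Ollama"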

4) Verify the local API

Check the server itself:

curl http://localhost:11434/api/version

Check inference:

curl http://localhost:11434/api/generate \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.5:3b","prompt":"Say hello from local Ollama","stream":false}'

If you use tools that expect OpenAI-compatible APIs, Ollama also exposes:

curl http://localhost:11434/v1/models

Beginner usage

  1. Start Ollama with either the desktop app or ollama serve.
  2. Pull a model: ollama pull qwen3.5:3b
  3. Sanity-check it locally with ollama run qwen3.5:3b
  4. Verify the API with curl http://localhost:11434/api/version
  5. Point your agent or tool at http://localhost:11434 or http://localhost:11434/v1
  6. Use the exact local model name, for example qwen3.5:3b

Fast path

ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b

Pro usage

Daily desktop workflow

  • Launch the Ollama app when you log in.
  • Let it keep the server running in the background.
  • Use the CLI when you need to pull, inspect, stop, or remove models.
  • Point your editors and agents at the same local API instead of running separate model hosts.

This is the simplest setup if you want Ollama always available but still want terminal control.

Daily CLI workflow

Use the terminal when you want explicit control over everything:

ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama ps

This is the better fit for headless machines, scripted workflows, and debugging.
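
For scripts, it helps to make sure the API is actually up before pulling or calling models. A minimal sketch, assuming the default port and the starter model used throughout this guide:

#!/bin/sh
# Start the server in the background only if it is not already answering.
if ! curl -sf http://localhost:11434/api/version > /dev/null; then
  ollama serve &
fi

# Give the API up to 30 seconds to come up.
for _ in $(seq 1 30); do
  curl -sf http://localhost:11434/api/version > /dev/null && break
  sleep 1
done

# Pre-pull the model this machine relies on.
ollama pull qwen3.5:3b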

Model lifecycle

ollama ls
ollama ps
ollama pull <model>
ollama show <model>
ollama stop <model>
ollama rm <model[:tag]>

Typical pattern (a scripted version follows this list):

  • ollama ls to see what is installed
  • ollama ps to see what is loaded in memory
  • ollama show to inspect size and family details
  • ollama stop to unload a model
  • ollama rm to reclaim disk space
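
A minimal scripted version of that cleanup, assuming the default ollama ps table layout with the model name in the first column:

# See what is installed, then unload anything still sitting in memory.
ollama ls
for model in $(ollama ps | tail -n +2 | awk '{print $1}'); do
  ollama stop "$model"
done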

Custom models with Modelfile

Use a Modelfile when you want a reusable local model preset:

FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.

Build and run it:

ollama create my-local-coder -f Modelfile
ollama run my-local-coder
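
To confirm the preset is what you expect, ollama show can print the stored Modelfile back:

ollama show --modelfile my-local-coder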

OpenAI-compatible clients

Many local tools can connect through Ollama’s OpenAI-compatible endpoint:

  • Base URL: http://localhost:11434/v1/
  • API key: often required by the client, but ignored by Ollama
  • Model: the exact local model name you installed

Example:

export OPENAI_BASE_URL=http://localhost:11434/v1/
export OPENAI_API_KEY=ollama
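
A quick way to confirm the compatibility endpoint responds, reusing those variables (the base URL already ends in a slash, so the path is appended directly):

curl "${OPENAI_BASE_URL}chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.5:3b","messages":[{"role":"user","content":"Say hello from local Ollama"}]}'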

Background service patterns

  • macOS/Windows: the desktop app is usually the easiest background service.
  • Linux: a system service is usually cleaner than a long-lived shell.

Linux example:

sudo systemctl enable ollama
sudo systemctl start ollama
journalctl -u ollama -f
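
If the service needs different environment variables (for example a different bind address), the usual systemd override pattern applies; a sketch, assuming the service is installed as ollama like above:

sudo systemctl edit ollama
# In the override that opens, add for example:
#   [Service]
#   Environment="OLLAMA_HOST=127.0.0.1:11434"
sudo systemctl restart ollama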

Docker hosting

Use Docker when you want isolation or a predictable local deployment:

docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it ollama ollama run qwen3.5:3b
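
If the machine has an NVIDIA GPU and the NVIDIA container toolkit installed, the common variant adds GPU access to the same container:

docker run -d --gpus=all --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama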

LAN access (use sparingly)

The safe default is local-only on 127.0.0.1:11434.

If you intentionally need LAN access:

OLLAMA_HOST=0.0.0.0:11434 ollama serve

Only do this on trusted networks, and preferably behind a proxy with auth.
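
If you do open it up, verify from another machine on the same network (replace the placeholder with your host's LAN address):

curl http://<host-ip>:11434/api/version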


Cost savings guide

  • Use smaller local models for routine tasks and larger ones only when needed.
  • Pre-pull the models you use often so you avoid cold-start delays.
  • Run one heavy model at a time on constrained machines.
  • Reuse one local Ollama host across your agents, editors, and scripts.

Privacy guide

The point of local hosting is that inference stays on your machine.

  • Prompts and responses go to the local server, not a hosted API.
  • Pulling a model downloads it from the Ollama registry; after that, generation runs entirely locally.
  • Keep the default local-only binding of 127.0.0.1:11434 unless you deliberately need LAN access.
  • If you do expose the server on a LAN, restrict it to trusted networks and put an authenticating proxy in front of it.
  • Reuse the same local host for your agents, editors, and scripts so your data stays in one place.
