Ollama Local Hosting — CLI + Desktop Guide
Index
- Overview
- Setup
- Beginner usage
- Pro usage
- Cost savings guide
- Privacy guide
- Security guide
- Appendix
- Troubleshooting quick hits
Overview
When to use this combined guide.
- You want one local-only Ollama reference instead of separate desktop and terminal docs.
- You use the desktop app sometimes but still want CLI control for scripting and diagnostics.
- You want a single workflow for hosting models that local agents, editors, and scripts can reuse.
Mental model.
- The desktop app is the easiest way to keep Ollama running in the background.
- The CLI is the control surface for pulling models, inspecting them, building custom models, and testing the API.
- The local API is the shared interface your other tools use.
Setup
1) Install Ollama
- Install the Ollama desktop app or your platform’s preferred package.
- Verify the install:
ollama --version
2) Choose how you run it
Pick one of these common local patterns:
- Desktop-first: open the Ollama app and let it keep the local server running in the background.
- CLI-first: run `ollama serve` yourself in a terminal or through a service manager.
For direct terminal hosting:
ollama serve
That starts the local API on http://localhost:11434.
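Scripts that start right after the server can race it before the API is ready. A small readiness check helps; this is a sketch, and `OLLAMA_URL` and `OLLAMA_WAIT_TRIES` are illustrative shell variables of this script, not Ollama settings:

```shell
# Poll the version endpoint until the server answers, then proceed.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
OLLAMA_WAIT_TRIES="${OLLAMA_WAIT_TRIES:-15}"

wait_for_ollama() {
  i=0
  while [ "$i" -lt "$OLLAMA_WAIT_TRIES" ]; do
    if curl -fsS "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      echo "ollama is up at $OLLAMA_URL"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ollama did not answer at $OLLAMA_URL" >&2
  return 1
}
```

Call `wait_for_ollama` at the top of any script that pulls models or sends API traffic.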
3) Pull and run a starter model
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
`ollama pull` downloads the model without opening a chat. `ollama run` downloads it if needed and starts an interactive local session.
4) Verify the local API
Check the server itself:
curl http://localhost:11434/api/version
Check inference:
curl http://localhost:11434/api/generate \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3.5:3b","prompt":"Say hello from local Ollama","stream":false}'
If you use tools that expect OpenAI-compatible APIs, Ollama also exposes:
curl http://localhost:11434/v1/models
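The `/v1` surface also covers chat completions. A minimal sketch, assuming the server from the steps above is running on the default port; `ollama_chat` is a hypothetical helper, and the prompt must be JSON-safe (no unescaped double quotes) in this simple version:

```shell
# Minimal chat call against the OpenAI-compatible endpoint.
ollama_chat() {
  curl -fsS http://localhost:11434/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"qwen3.5:3b\",\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
}

# Usage: ollama_chat "Say hello from local Ollama"
```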
Beginner usage
- Start Ollama with either the desktop app or `ollama serve`.
- Pull a model: `ollama pull qwen3.5:3b`
- Sanity-check it locally with `ollama run qwen3.5:3b`
- Verify the API with `curl http://localhost:11434/api/version`
- Point your agent or tool at `http://localhost:11434` or `http://localhost:11434/v1`
- Use the exact local model name, for example `qwen3.5:3b`
Fast path
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
Pro usage
Daily desktop workflow
- Launch the Ollama app when you log in.
- Let it keep the server running in the background.
- Use the CLI when you need to pull, inspect, stop, or remove models.
- Point your editors and agents at the same local API instead of running separate model hosts.
This is the simplest setup if you want Ollama always available but still want terminal control.
Daily CLI workflow
Use the terminal when you want explicit control over everything:
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama ps
This is the better fit for headless machines, scripted workflows, and debugging.
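For scripted workflows, the pull step can be batched so a machine pre-fetches every model it needs in one go. A sketch; `prepull` is a hypothetical helper, not an Ollama command:

```shell
# Pre-pull a list of models; stops at the first failure so errors stay visible.
prepull() {
  for model in "$@"; do
    echo "pulling $model"
    ollama pull "$model" || return 1
  done
}

# Usage: prepull qwen3.5:3b
```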
Model lifecycle
ollama ls
ollama ps
ollama pull <model>
ollama show <model>
ollama stop <model>
ollama rm <model[:tag]>
Typical pattern:
- `ollama ls` to see what is installed
- `ollama ps` to see what is loaded in memory
- `ollama show` to inspect size and family details
- `ollama stop` to unload a model
- `ollama rm` to reclaim disk space
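The stop-then-remove pattern can be wrapped so you never delete a model that is still loaded. A sketch; `safe_rm` is a hypothetical helper built on the commands above:

```shell
# Remove a model only when `ollama ps` no longer lists it.
safe_rm() {
  model="$1"
  if ollama ps | grep -q "$model"; then
    echo "$model is still loaded; run: ollama stop $model" >&2
    return 1
  fi
  ollama rm "$model"
}

# Usage: safe_rm qwen3.5:3b
```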
Custom models with Modelfile
Use a Modelfile when you want a reusable local model preset:
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
Build and run it:
ollama create my-local-coder -f Modelfile
ollama run my-local-coder
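The two steps above can be scripted so the preset is rebuilt from a checked-in Modelfile. A sketch; the `command -v` guard (an assumption of this script) simply skips the build on machines without the ollama CLI:

```shell
# Write the Modelfile from this section, then (re)build the preset.
cat > Modelfile <<'EOF'
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
EOF

# `ollama create` replaces an existing model of the same name,
# so rerunning this after editing the Modelfile is safe.
if command -v ollama >/dev/null 2>&1; then
  ollama create my-local-coder -f Modelfile
fi
```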
OpenAI-compatible clients
Many local tools can connect through Ollama’s OpenAI-compatible endpoint:
- Base URL: `http://localhost:11434/v1/`
- API key: often required by the client, but ignored by Ollama
- Model: the exact local model name you installed
Example:
export OPENAI_BASE_URL=http://localhost:11434/v1/
export OPENAI_API_KEY=ollama
Background service patterns
- macOS/Windows: the desktop app is usually the easiest background service.
- Linux: a system service is usually cleaner than a long-lived shell.
Linux example:
sudo systemctl enable ollama
sudo systemctl start ollama
journalctl -u ollama -f
Docker hosting
Use Docker when you want isolation or a predictable local deployment:
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it ollama ollama run qwen3.5:3b
LAN access (use sparingly)
The safe default is local-only on 127.0.0.1:11434.
If you intentionally need LAN access:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
Only do this on trusted networks, and preferably behind a proxy with auth.
Cost savings guide
- Use smaller local models for routine tasks and larger ones only when needed.
- Pre-pull the models you use often so you avoid cold-start delays.
- Run one heavy model at a time on constrained machines.
- Reuse one local Ollama host across your agents, editors, and scripts.
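The first bullet can even be automated: send short, routine prompts to the small model and reserve a larger one for big jobs. A sketch; `pick_model` is a hypothetical helper, `qwen3.5:14b` stands in for whatever larger tag you actually pulled, and the 200-word threshold is arbitrary:

```shell
# Pick a model tag by rough prompt size; both names are examples.
SMALL_MODEL="${SMALL_MODEL:-qwen3.5:3b}"
LARGE_MODEL="${LARGE_MODEL:-qwen3.5:14b}"   # hypothetical larger tag

pick_model() {
  words=$(printf '%s' "$1" | wc -w | tr -d ' ')
  if [ "$words" -le 200 ]; then
    echo "$SMALL_MODEL"
  else
    echo "$LARGE_MODEL"
  fi
}

# Usage: ollama run "$(pick_model "$PROMPT")"
```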
Privacy guide
- Everything runs on your machine: prompts, responses, and model weights stay local unless you expose the server.
- The default bind address is 127.0.0.1:11434, so other devices on your network cannot reach it.
- Pulling a model contacts the Ollama registry; after that, inference works fully offline.
- If you enable LAN access with OLLAMA_HOST=0.0.0.0, assume anyone on that network can send prompts and read responses.
- Reference: https://github.com/pleb-devs/ai-guides