Ollama Local Hosting — CLI + Desktop Guide
Index
- Overview
- Setup
- Beginner usage
- Pro usage
- Cost savings guide
- Privacy guide
- Security guide
- Appendix
- Troubleshooting quick hits
Overview
When to use this combined guide.
- You want one local-only Ollama reference instead of separate desktop and terminal docs.
- You use the desktop app sometimes but still want CLI control for scripting and diagnostics.
- You want a single workflow for hosting models that local agents, editors, and scripts can reuse.
Mental model.
- The desktop app is the easiest way to keep Ollama running in the background.
- The CLI is the control surface for pulling models, inspecting them, building custom models, and testing the API.
- The local API is the shared interface your other tools use.
Setup
1) Install Ollama
- Install the Ollama desktop app or your platform’s preferred package.
- Verify the install:
ollama --version
2) Choose how you run it
Pick one of these common local patterns:
- Desktop-first: open the Ollama app and let it keep the local server running in the background.
- CLI-first: run `ollama serve` yourself in a terminal or through a service manager.
For direct terminal hosting:
ollama serve
That starts the local API on http://localhost:11434.
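Scripts that start right after the server can race it before the API is ready. A small readiness check helps; this is a sketch, and `OLLAMA_URL` and `OLLAMA_WAIT_TRIES` are illustrative shell variables of this script, not Ollama settings:

```shell
# Poll the version endpoint until the server answers, then proceed.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"
OLLAMA_WAIT_TRIES="${OLLAMA_WAIT_TRIES:-15}"

wait_for_ollama() {
  i=0
  while [ "$i" -lt "$OLLAMA_WAIT_TRIES" ]; do
    if curl -fsS "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      echo "ollama is up at $OLLAMA_URL"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "ollama did not answer at $OLLAMA_URL" >&2
  return 1
}
```

Call `wait_for_ollama` at the top of any script that pulls models or sends API traffic.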
3) Pull and run a starter model
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
`ollama pull` downloads the model without opening a chat. `ollama run` downloads it if needed and starts an interactive local session.
4) Verify the local API
Check the server itself:
curl http://localhost:11434/api/version
Check inference:
curl http://localhost:11434/api/generate \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3.5:3b","prompt":"Say hello from local Ollama","stream":false}'
If you use tools that expect OpenAI-compatible APIs, Ollama also exposes:
curl http://localhost:11434/v1/models
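The `/v1` surface also covers chat completions. A minimal sketch, assuming the server from the steps above is running on the default port; `ollama_chat` is a hypothetical helper, and the prompt must be JSON-safe (no unescaped double quotes) in this simple version:

```shell
# Minimal chat call against the OpenAI-compatible endpoint.
ollama_chat() {
  curl -fsS http://localhost:11434/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"qwen3.5:3b\",\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
}

# Usage: ollama_chat "Say hello from local Ollama"
```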
Beginner usage
- Start Ollama with either the desktop app or `ollama serve`.
- Pull a model: `ollama pull qwen3.5:3b`
- Sanity-check it locally with `ollama run qwen3.5:3b`
- Verify the API with `curl http://localhost:11434/api/version`
- Point your agent or tool at `http://localhost:11434` or `http://localhost:11434/v1`
- Use the exact local model name, for example `qwen3.5:3b`
Fast path
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
Pro usage
Daily desktop workflow
- Launch the Ollama app when you log in.
- Let it keep the server running in the background.
- Use the CLI when you need to pull, inspect, stop, or remove models.
- Point your editors and agents at the same local API instead of running separate model hosts.
This is the simplest setup if you want Ollama always available but still want terminal control.
Daily CLI workflow
Use the terminal when you want explicit control over everything:
ollama serve
ollama pull qwen3.5:3b
ollama run qwen3.5:3b
ollama ps
This is the better fit for headless machines, scripted workflows, and debugging.
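For scripted workflows, the pull step can be batched so a machine pre-fetches every model it needs in one go. A sketch; `prepull` is a hypothetical helper, not an Ollama command:

```shell
# Pre-pull a list of models; stops at the first failure so errors stay visible.
prepull() {
  for model in "$@"; do
    echo "pulling $model"
    ollama pull "$model" || return 1
  done
}

# Usage: prepull qwen3.5:3b
```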
Model lifecycle
ollama ls
ollama ps
ollama pull <model>
ollama show <model>
ollama stop <model>
ollama rm <model[:tag]>
Typical pattern:
- `ollama ls` to see what is installed
- `ollama ps` to see what is loaded in memory
- `ollama show` to inspect size and family details
- `ollama stop` to unload a model
- `ollama rm` to reclaim disk space
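The stop-then-remove pattern can be wrapped so you never delete a model that is still loaded. A sketch; `safe_rm` is a hypothetical helper built on the commands above:

```shell
# Remove a model only when `ollama ps` no longer lists it.
safe_rm() {
  model="$1"
  if ollama ps | grep -q "$model"; then
    echo "$model is still loaded; run: ollama stop $model" >&2
    return 1
  fi
  ollama rm "$model"
}

# Usage: safe_rm qwen3.5:3b
```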
Custom models with Modelfile
Use a Modelfile when you want a reusable local model preset:
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
Build and run it:
ollama create my-local-coder -f Modelfile
ollama run my-local-coder
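The two steps above can be scripted so the preset is rebuilt from a checked-in Modelfile. A sketch; the `command -v` guard (an assumption of this script) simply skips the build on machines without the ollama CLI:

```shell
# Write the Modelfile from this section, then (re)build the preset.
cat > Modelfile <<'EOF'
FROM qwen3.5:3b
PARAMETER temperature 0.1
SYSTEM You are a terse local coding assistant.
EOF

# `ollama create` replaces an existing model of the same name,
# so rerunning this after editing the Modelfile is safe.
if command -v ollama >/dev/null 2>&1; then
  ollama create my-local-coder -f Modelfile
fi
```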
OpenAI-compatible clients
Many local tools can connect through Ollama’s OpenAI-compatible endpoint:
- Base URL: `http://localhost:11434/v1/`
- API key: often required by the client, but ignored by Ollama
- Model: the exact local model name you installed
Example:
export OPENAI_BASE_URL=http://localhost:11434/v1/
export OPENAI_API_KEY=ollama
Background service patterns
- macOS/Windows: the desktop app is usually the easiest background service.
- Linux: a system service is usually cleaner than a long-lived shell.
Linux example:
sudo systemctl enable ollama
sudo systemctl start ollama
journalctl -u ollama -f
Docker hosting
Use Docker when you want isolation or a predictable local deployment:
docker run -d --name ollama -p 11434:11434 -v ollama:/root/.ollama ollama/ollama
docker exec -it ollama ollama run qwen3.5:3b
LAN access (use sparingly)
The safe default is local-only on 127.0.0.1:11434.
If you intentionally need LAN access:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
Only do this on trusted networks, and preferably behind a proxy with auth.
Cost savings guide
- Use smaller local models for routine tasks and larger ones only when needed.
- Pre-pull the models you use often so you avoid cold-start delays.
- Run one heavy model at a time on constrained machines.
- Reuse one local Ollama host across your agents, editors, and scripts.
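The first bullet can even be automated: send short, routine prompts to the small model and reserve a larger one for big jobs. A sketch; `pick_model` is a hypothetical helper, `qwen3.5:14b` stands in for whatever larger tag you actually pulled, and the 200-word threshold is arbitrary:

```shell
# Pick a model tag by rough prompt size; both names are examples.
SMALL_MODEL="${SMALL_MODEL:-qwen3.5:3b}"
LARGE_MODEL="${LARGE_MODEL:-qwen3.5:14b}"   # hypothetical larger tag

pick_model() {
  words=$(printf '%s' "$1" | wc -w | tr -d ' ')
  if [ "$words" -le 200 ]; then
    echo "$SMALL_MODEL"
  else
    echo "$LARGE_MODEL"
  fi
}

# Usage: ollama run "$(pick_model "$PROMPT")"
```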
Privacy guide
- Everything runs on your machine: prompts, responses, and model weights stay local unless you expose the server.
- The default bind address is 127.0.0.1:11434, so other devices on your network cannot reach it.
- Pulling a model contacts the Ollama registry; after that, inference works fully offline.
- If you enable LAN access with OLLAMA_HOST=0.0.0.0, assume anyone on that network can send prompts and read responses.
- Reference: https://github.com/pleb-devs/ai-guides