Ollama
Run large language models locally. Pull, run, create, and manage AI models on your own hardware.
38 commands
Install Ollama (macOS)
Install Ollama using Homebrew on macOS
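A minimal sketch using the Homebrew formula:

```shell
# Install the Ollama CLI via Homebrew
brew install ollama
```

`brew services start ollama` can then run the server as a managed background service.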
Install Ollama (Linux)
Install Ollama on Linux using the official installer
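The official installer is a shell script fetched from ollama.com (inspect it first if you prefer):

```shell
# Download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
```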
Pull a model
Download the Llama 2 model for local inference
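For example, with the CLI installed:

```shell
# Download the Llama 2 weights from the Ollama library
ollama pull llama2
```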
Run chat
Start an interactive chat with Llama 2
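Once the model is pulled:

```shell
# Open an interactive chat with Llama 2; type /bye to exit
ollama run llama2
```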
Pull a model
Download a model from the Ollama library.
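For example (the model name is illustrative):

```shell
# Download a model by name from the Ollama library
ollama pull mistral
```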
Pull specific version
Download a specific model variant or size.
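Variants are selected with a tag after the colon, for example:

```shell
# Pull the 13B-parameter variant instead of the default
ollama pull llama2:13b
```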
List installed models
Show all locally available models with sizes.
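For example:

```shell
# List installed models with their tags, IDs, sizes, and modification times
ollama list
```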
Remove a model
Delete a model from local storage.
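For example (model name illustrative):

```shell
# Delete the model's weights from local storage
ollama rm llama2
```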
Show model info
Display model details including parameters and license.
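For example:

```shell
# Print model details: parameters, template, system prompt, and license
ollama show llama2
```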
Copy model
Create a copy of a model with a new name.
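A sketch (the new name is illustrative):

```shell
# Copy a model under a new name, e.g. as a base for further customization
ollama cp llama2 my-llama2
```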
List running models
Show models currently loaded in memory.
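For example:

```shell
# Show models currently loaded in memory
ollama ps
```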
Start chat session
Start interactive chat with a model.
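For example:

```shell
# Start an interactive session; /? lists the slash commands, /bye exits
ollama run llama2
```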
One-shot prompt
Run a single prompt and exit.
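Passing the prompt as an argument prints one reply and exits:

```shell
ollama run llama2 "Explain DNS in one sentence."
```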
Pipe input
Process file content through the model.
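A sketch (the filename is illustrative):

```shell
# Feed file content to the model via stdin
cat notes.txt | ollama run llama2 "Summarize the following text:"
```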
Run with verbose output
Show generation statistics and timing.
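For example:

```shell
# --verbose prints timing statistics (load time, eval rate) after the reply
ollama run llama2 --verbose "Why is the sky blue?"
```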
Run code model
Use code-specialized model for programming tasks.
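For example, with Code Llama:

```shell
# Code Llama is tuned for programming tasks
ollama run codellama "Write a shell function that counts lines in a file."
```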
Run with system prompt
Set a persistent system prompt for the session.
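Inside an interactive session, the /set slash command changes session settings; a sketch:

```shell
ollama run llama2
# Then, at the >>> prompt:
#   /set system "You are a terse assistant. Answer in one sentence."
```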
Run vision model
Analyze images with multimodal models.
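Multimodal models such as llava accept image paths in the prompt (path illustrative):

```shell
# The model reads the referenced image and answers about its contents
ollama run llava "Describe this image: ./photo.jpg"
```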
Start server
Start Ollama API server on default port 11434.
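For example:

```shell
# Run the API server in the foreground; it listens on 127.0.0.1:11434 by default
ollama serve
```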
Bind to all interfaces
Allow connections from other machines.
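The listen address is set with the OLLAMA_HOST environment variable:

```shell
# Bind to all interfaces so other machines on the network can reach the API
OLLAMA_HOST=0.0.0.0 ollama serve
```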
Custom models directory
Store models in a custom location.
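A sketch (the path is illustrative):

```shell
# Store model weights under a custom directory via OLLAMA_MODELS
OLLAMA_MODELS=/data/ollama/models ollama serve
```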
Check server status
Verify the server is running.
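For example:

```shell
# The root endpoint replies "Ollama is running" when the server is up
curl http://localhost:11434/
```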
Enable GPU layers
Configure the number of model layers offloaded to the GPU.
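Offloading is controlled per request (or per Modelfile) with the num_gpu option; a sketch, with the layer count illustrative:

```shell
# num_gpu sets how many layers are offloaded to the GPU; 0 forces CPU-only
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "options": { "num_gpu": 35 }
}'
```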
Set max loaded models
Limit concurrent models in memory.
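For example:

```shell
# Keep at most two models resident in memory at once
OLLAMA_MAX_LOADED_MODELS=2 ollama serve
```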
Generate completion
Generate text completion via REST API.
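A sketch against a locally running server:

```shell
# POST /api/generate; the response streams as JSON chunks by default
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```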
Chat completion
Multi-turn chat via REST API.
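For example:

```shell
# POST /api/chat takes a messages array with roles for multi-turn context
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
```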
Generate embeddings
Create vector embeddings for text.
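A sketch (the embedding model name is illustrative):

```shell
# POST /api/embeddings returns a vector for the given text
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'
```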
Non-streaming response
Get complete response without streaming.
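For example:

```shell
# "stream": false returns one complete JSON object instead of chunks
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```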
List models via API
Get list of available models via REST.
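For example:

```shell
# GET /api/tags returns the locally installed models as JSON
curl http://localhost:11434/api/tags
```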
OpenAI-compatible endpoint
Use OpenAI-compatible API format.
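A sketch of the OpenAI-style route:

```shell
# The /v1 routes accept OpenAI-style requests, so existing OpenAI clients
# can be pointed at Ollama by changing only the base URL
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
```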
Set generation parameters
Configure sampling parameters in API call.
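Sampling parameters go in the "options" object (values illustrative):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about autumn.",
  "options": { "temperature": 0.8, "top_p": 0.9, "top_k": 40 }
}'
```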
Create custom model
Build a custom model from a Modelfile.
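A sketch (the model name is illustrative):

```shell
# Build a model named my-model from a Modelfile in the current directory
ollama create my-model -f ./Modelfile
```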
Basic Modelfile
Simple Modelfile with system prompt and parameters.
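A minimal sketch that writes the Modelfile to disk (contents illustrative):

```shell
# Write a minimal Modelfile: base model, system prompt, one sampling parameter
cat > Modelfile <<'EOF'
FROM llama2
SYSTEM "You are a concise technical assistant."
PARAMETER temperature 0.7
EOF
```

Build it afterwards with `ollama create my-assistant -f Modelfile`.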
Set context length
Configure model context window size.
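In a Modelfile, num_ctx sets the context window in tokens; a sketch:

```shell
cat > Modelfile <<'EOF'
FROM llama2
# num_ctx sets the context window size in tokens
PARAMETER num_ctx 4096
EOF
```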
Custom template
Define custom prompt template format.
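Templates use Go template syntax; a sketch with an illustrative Llama-style format:

```shell
# {{ .System }} and {{ .Prompt }} are filled in per request
cat > Modelfile <<'EOF'
FROM llama2
TEMPLATE """{{ if .System }}<<SYS>>{{ .System }}<</SYS>>{{ end }}[INST] {{ .Prompt }} [/INST]"""
EOF
```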
Push to registry
Upload custom model to Ollama registry.
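A sketch, assuming an ollama.com account ("username" is a placeholder):

```shell
# Pushed models must be namespaced as <username>/<model>
ollama cp my-model username/my-model
ollama push username/my-model
```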
From quantized base
Create a model from a quantized variant to reduce memory use.
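A sketch, assuming a quantized tag such as q4_0 is published for the base model:

```shell
cat > Modelfile <<'EOF'
# Quantized tags (e.g. q4_0) trade some quality for much lower memory use
FROM llama2:7b-q4_0
SYSTEM "You are a helpful assistant."
EOF
```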
Set stop tokens
Define tokens that stop generation.
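Stop sequences are declared with repeated PARAMETER lines (values illustrative):

```shell
cat > Modelfile <<'EOF'
FROM llama2
# Generation halts whenever one of these strings is produced
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
EOF
```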