Ollama
Run large language models locally. Pull, run, create, and manage AI models on your own hardware.
38 commands
Install Ollama (macOS)
Install Ollama using Homebrew on macOS
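A minimal sketch using the Homebrew formula:

```shell
# Install the Ollama CLI via Homebrew
brew install ollama
```

`brew services start ollama` can then run the server as a managed background service.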
Install Ollama (Linux)
Install Ollama on Linux using the official installer
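The official installer is a shell script fetched from ollama.com (inspect it first if you prefer):

```shell
# Download and run the official install script
curl -fsSL https://ollama.com/install.sh | sh
```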
Pull a model
Download the Llama 2 model for local inference
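For example, with the CLI installed:

```shell
# Download the Llama 2 weights from the Ollama library
ollama pull llama2
```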
Run chat
Start an interactive chat with Llama 2
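Once the model is pulled:

```shell
# Open an interactive chat with Llama 2; type /bye to exit
ollama run llama2
```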
Pull a model
Download a model from the Ollama library.
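For example (the model name is illustrative):

```shell
# Download a model by name from the Ollama library
ollama pull mistral
```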
Pull specific version
Download a specific model variant or size.
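Variants are selected with a tag after the colon, for example:

```shell
# Pull the 13B-parameter variant instead of the default
ollama pull llama2:13b
```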
List installed models
Show all locally available models with sizes.
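For example:

```shell
# List installed models with their tags, IDs, sizes, and modification times
ollama list
```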
Remove a model
Delete a model from local storage.
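For example (model name illustrative):

```shell
# Delete the model's weights from local storage
ollama rm llama2
```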
Show model info
Display model details including parameters and license.
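For example:

```shell
# Print model details: parameters, template, system prompt, and license
ollama show llama2
```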
Copy model
Create a copy of a model with a new name.
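A sketch (the new name is illustrative):

```shell
# Copy a model under a new name, e.g. as a base for further customization
ollama cp llama2 my-llama2
```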
List running models
Show models currently loaded in memory.
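For example:

```shell
# Show models currently loaded in memory
ollama ps
```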
Start chat session
Start interactive chat with a model.
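For example:

```shell
# Start an interactive session; /? lists the slash commands, /bye exits
ollama run llama2
```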
One-shot prompt
Run a single prompt and exit.
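Passing the prompt as an argument prints one reply and exits:

```shell
ollama run llama2 "Explain DNS in one sentence."
```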
Pipe input
Process file content through the model.
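A sketch (the filename is illustrative):

```shell
# Feed file content to the model via stdin
cat notes.txt | ollama run llama2 "Summarize the following text:"
```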
Run with verbose output
Show generation statistics and timing.
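For example:

```shell
# --verbose prints timing statistics (load time, eval rate) after the reply
ollama run llama2 --verbose "Why is the sky blue?"
```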
Run code model
Use code-specialized model for programming tasks.
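For example, with Code Llama:

```shell
# Code Llama is tuned for programming tasks
ollama run codellama "Write a shell function that counts lines in a file."
```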
Run with system prompt
Set a persistent system prompt for the session.
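Inside an interactive session, the /set slash command changes session settings; a sketch:

```shell
ollama run llama2
# Then, at the >>> prompt:
#   /set system "You are a terse assistant. Answer in one sentence."
```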
Run vision model
Analyze images with multimodal models.
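Multimodal models such as llava accept image paths in the prompt (path illustrative):

```shell
# The model reads the referenced image and answers about its contents
ollama run llava "Describe this image: ./photo.jpg"
```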
Start server
Start Ollama API server on default port 11434.
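For example:

```shell
# Run the API server in the foreground; it listens on 127.0.0.1:11434 by default
ollama serve
```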
Bind to all interfaces
Allow connections from other machines.
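The listen address is set with the OLLAMA_HOST environment variable:

```shell
# Bind to all interfaces so other machines on the network can reach the API
OLLAMA_HOST=0.0.0.0 ollama serve
```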
Custom models directory
Store models in a custom location.
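A sketch (the path is illustrative):

```shell
# Store model weights under a custom directory via OLLAMA_MODELS
OLLAMA_MODELS=/data/ollama/models ollama serve
```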
Check server status
Verify the server is running.
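For example:

```shell
# The root endpoint replies "Ollama is running" when the server is up
curl http://localhost:11434/
```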
Enable GPU layers
Configure the number of model layers offloaded to the GPU.
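Offloading is controlled per request (or per Modelfile) with the num_gpu option; a sketch, with the layer count illustrative:

```shell
# num_gpu sets how many layers are offloaded to the GPU; 0 forces CPU-only
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello",
  "options": { "num_gpu": 35 }
}'
```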
Set max loaded models
Limit concurrent models in memory.
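For example:

```shell
# Keep at most two models resident in memory at once
OLLAMA_MAX_LOADED_MODELS=2 ollama serve
```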
Generate completion
Generate text completion via REST API.
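A sketch against a locally running server:

```shell
# POST /api/generate; the response streams as JSON chunks by default
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
```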
Chat completion
Multi-turn chat via REST API.
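For example:

```shell
# POST /api/chat takes a messages array with roles for multi-turn context
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ]
}'
```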
Generate embeddings
Create vector embeddings for text.
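A sketch (the embedding model name is illustrative):

```shell
# POST /api/embeddings returns a vector for the given text
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The sky is blue because of Rayleigh scattering."
}'
```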
Non-streaming response
Get complete response without streaming.
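For example:

```shell
# "stream": false returns one complete JSON object instead of chunks
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```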
List models via API
Get list of available models via REST.
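For example:

```shell
# GET /api/tags returns the locally installed models as JSON
curl http://localhost:11434/api/tags
```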
OpenAI-compatible endpoint
Use OpenAI-compatible API format.
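A sketch of the OpenAI-style route:

```shell
# The /v1 routes accept OpenAI-style requests, so existing OpenAI clients
# can be pointed at Ollama by changing only the base URL
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama2",
    "messages": [{ "role": "user", "content": "Hello!" }]
  }'
```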
Set generation parameters
Configure sampling parameters in API call.
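Sampling parameters go in the "options" object (values illustrative):

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Write a haiku about autumn.",
  "options": { "temperature": 0.8, "top_p": 0.9, "top_k": 40 }
}'
```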
Create custom model
Build a custom model from a Modelfile.
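A sketch (the model name is illustrative):

```shell
# Build a model named my-model from a Modelfile in the current directory
ollama create my-model -f ./Modelfile
```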
Basic Modelfile
Simple Modelfile with system prompt and parameters.
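A minimal sketch that writes the Modelfile to disk (contents illustrative):

```shell
# Write a minimal Modelfile: base model, system prompt, one sampling parameter
cat > Modelfile <<'EOF'
FROM llama2
SYSTEM "You are a concise technical assistant."
PARAMETER temperature 0.7
EOF
```

Build it afterwards with `ollama create my-assistant -f Modelfile`.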
Set context length
Configure model context window size.
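In a Modelfile, num_ctx sets the context window in tokens; a sketch:

```shell
cat > Modelfile <<'EOF'
FROM llama2
# num_ctx sets the context window size in tokens
PARAMETER num_ctx 4096
EOF
```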
Custom template
Define custom prompt template format.
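Templates use Go template syntax; a sketch with an illustrative Llama-style format:

```shell
# {{ .System }} and {{ .Prompt }} are filled in per request
cat > Modelfile <<'EOF'
FROM llama2
TEMPLATE """{{ if .System }}<<SYS>>{{ .System }}<</SYS>>{{ end }}[INST] {{ .Prompt }} [/INST]"""
EOF
```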
Push to registry
Upload custom model to Ollama registry.
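A sketch, assuming an ollama.com account ("username" is a placeholder):

```shell
# Pushed models must be namespaced as <username>/<model>
ollama cp my-model username/my-model
ollama push username/my-model
```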
From quantized base
Create a model from a quantized variant to reduce memory use.
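A sketch, assuming a quantized tag such as q4_0 is published for the base model:

```shell
cat > Modelfile <<'EOF'
# Quantized tags (e.g. q4_0) trade some quality for much lower memory use
FROM llama2:7b-q4_0
SYSTEM "You are a helpful assistant."
EOF
```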
Set stop tokens
Define tokens that stop generation.
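Stop sequences are declared with repeated PARAMETER lines (values illustrative):

```shell
cat > Modelfile <<'EOF'
FROM llama2
# Generation halts whenever one of these strings is produced
PARAMETER stop "USER:"
PARAMETER stop "ASSISTANT:"
EOF
```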