API Integration Rules
Intermediate
Standards for integrating Ollama's REST API into applications — health checks, error handling, timeout configuration, streaming patterns, and request parameter validation.
File Patterns
`**/*.ts`, `**/*.js`, `**/*.py`
This rule applies to files matching the patterns above.
Rule Content
# API Integration Rules
## Rule
All Ollama API integrations MUST implement health checks, proper error handling, configurable timeouts, and streaming for interactive use cases.
## Health Check (Required)
```typescript
// ALWAYS check server health before first request
async function checkOllamaHealth(): Promise<boolean> {
  try {
    const response = await fetch('http://localhost:11434/api/version', {
      signal: AbortSignal.timeout(5000),
    });
    return response.ok;
  } catch {
    return false;
  }
}
```
## Error Handling (Required)
```typescript
// Good — comprehensive error handling
// Minimal error type so failures are distinguishable from generic errors
class OllamaError extends Error {}

async function ollamaChat(model: string, messages: Message[]) {
  const healthy = await checkOllamaHealth();
  if (!healthy) {
    throw new OllamaError('Ollama server not running. Start with: ollama serve');
  }
  try {
    const response = await fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      signal: AbortSignal.timeout(60000), // 60s for model loading
      body: JSON.stringify({ model, messages, stream: false }),
    });
    if (!response.ok) {
      const error = await response.json();
      throw new OllamaError(`Ollama error: ${error.error}`);
    }
    return await response.json();
  } catch (error) {
    if (error instanceof DOMException && error.name === 'TimeoutError') {
      throw new OllamaError('Request timed out — model may be loading');
    }
    throw error;
  }
}

// Bad — no error handling
async function ollamaChatBad(prompt: string) {
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3.1', messages: [{ role: 'user', content: prompt }] }),
  });
  return res.json(); // No error handling, no timeout, no health check
}
```
## Timeout Configuration
| Operation | Recommended Timeout |
|-----------|-------------------|
| Health check | 5s |
| First request (model loading) | 60s |
| Subsequent requests | 30s |
| Streaming (per chunk) | 10s |
| Embedding generation | 15s |
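The table above can be captured in a small helper so callers never hardcode magic numbers. This is a sketch; the operation names and the `signalFor` helper are illustrative, not part of any Ollama API.

```typescript
// Timeouts in milliseconds, matching the recommendation table.
// Operation names here are illustrative conventions, not an API.
const TIMEOUTS = {
  healthCheck: 5_000,
  firstRequest: 60_000, // allows for model loading
  request: 30_000,
  streamChunk: 10_000,
  embedding: 15_000,
} as const;

type Operation = keyof typeof TIMEOUTS;

// Returns an AbortSignal preconfigured for the given operation,
// ready to pass as fetch's `signal` option.
function signalFor(op: Operation): AbortSignal {
  return AbortSignal.timeout(TIMEOUTS[op]);
}
```

A first request would then use `signalFor('firstRequest')` and subsequent requests `signalFor('request')`, keeping the policy in one place.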
## Streaming Rules
- ALWAYS use streaming (stream: true) for interactive UIs
- Parse each line as independent JSON
- Handle partial JSON chunks gracefully
- Implement a cancel mechanism for long-running generations
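The rules above can be sketched as a buffered line parser plus a cancellable reader. The helper names (`createLineParser`, `streamChat`, `onToken`) are illustrative; the assumption is that the server emits newline-delimited JSON, with each line an independent object.

```typescript
// Splits incoming text into complete JSON lines, buffering any
// partial trailing line until the next chunk arrives.
function createLineParser() {
  let buffer = '';
  return {
    push(chunk: string): unknown[] {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? ''; // keep the incomplete trailing line
      return lines.filter((l) => l.trim() !== '').map((l) => JSON.parse(l));
    },
  };
}

// Streaming chat sketch; cancel anytime via controller.abort().
async function streamChat(
  model: string,
  prompt: string,
  onToken: (token: string) => void,
  controller: AbortController,
) {
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    signal: controller.signal,
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content: prompt }],
      stream: true,
    }),
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  const parser = createLineParser();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    for (const msg of parser.push(decoder.decode(value, { stream: true }))) {
      onToken((msg as { message?: { content?: string } }).message?.content ?? '');
    }
  }
}
```

Because the parser keeps partial lines in its buffer, a JSON object split across two network chunks is reassembled before parsing, satisfying the "handle partial JSON chunks" rule.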
## Base URL Configuration
```typescript
// Good — configurable base URL
const OLLAMA_BASE_URL = process.env.OLLAMA_HOST || 'http://localhost:11434';
// Bad — hardcoded URL
const url = 'http://localhost:11434'; // Can't change without code edit
```
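Building on the configurable base URL, endpoint paths can be derived in one place so no request helper hardcodes a host. The `endpoint` helper below is an illustrative sketch, not an Ollama API.

```typescript
// Configurable base URL; OLLAMA_HOST is the conventional env var.
const OLLAMA_BASE_URL = process.env.OLLAMA_HOST ?? 'http://localhost:11434';

// Resolves an API path against the base URL using the WHATWG URL parser.
function endpoint(path: string): string {
  return new URL(path, OLLAMA_BASE_URL).toString();
}

// Usage: fetch(endpoint('/api/chat'), { ... })
```

Setting `OLLAMA_HOST=http://gpu-box:11434` then redirects every request to a remote server with no code change.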
## Anti-Patterns
- No health check before requests (silent failures)
- No timeout (requests hang forever if server is down)
- Hardcoded localhost:11434 (can't use remote Ollama)
- Not handling model loading delay on first request
- Using stream: false for interactive applications