Transformers.js
Transformers.js enables running state-of-the-art machine learning models directly in JavaScript, both in browsers and Node.js environments, with no server required.
When to Use This Skill
Use this skill when you need to:
- Run ML models for text analysis, generation, or translation in JavaScript
- Perform image classification, object detection, or segmentation
- Implement speech recognition or audio processing
- Build multimodal AI applications (text-to-image, image-to-text, etc.)
- Run models client-side in the browser without a backend
Installation
NPM Installation
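Install the library from npm (the current package name is `@huggingface/transformers`; older releases were published as `@xenova/transformers`):

```shell
npm install @huggingface/transformers
```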
Browser Usage (CDN)
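For browser use without a bundler, the library can be imported directly from a CDN inside a `<script type="module">` tag. The version number below is an example; pin whichever release you have tested against:

```javascript
// Inside <script type="module"> in your HTML page
// (example CDN URL and version; pin a tested release in production)
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers@3.3.1';
```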
Core Concepts
1. Pipeline API
The pipeline API is the easiest way to use models. It groups together preprocessing, model inference, and postprocessing:
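A minimal sketch of the full lifecycle (load, run, dispose). The default model and exact scores depend on the library version, so the output shown is only illustrative:

```javascript
import { pipeline } from '@huggingface/transformers';

// Allocate a pipeline; the model is downloaded on first use
const pipe = await pipeline('sentiment-analysis');

const out = await pipe('I love Transformers.js!');
// e.g. [{ label: 'POSITIVE', score: 0.99 }]

await pipe.dispose(); // free model memory when finished
```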
⚠️ Memory Management: All pipelines must be disposed with pipe.dispose() when finished to prevent memory leaks. See examples in Code Examples for cleanup patterns across different environments.
2. Model Selection
You can specify a custom model as the second argument:
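For example (the model ID below is one of many sentiment models on the Hub):

```javascript
import { pipeline } from '@huggingface/transformers';

// Second argument selects a specific Hub model instead of the task default
const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/bert-base-multilingual-uncased-sentiment'
);
```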
Finding Models:
Browse available Transformers.js models on Hugging Face Hub:
- All models: https://huggingface.co/models?library=transformers.js&sort=trending
- By task: add a `pipeline_tag` parameter to the URL
- Text generation: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending
- Image classification: https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending
- Speech recognition: https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending
Tip: Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.
3. Device Selection
Choose where to run the model:
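A sketch using the `device` option (the model ID is an example; `'wasm'` is the browser default, `'webgpu'` uses GPU acceleration where supported):

```javascript
import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct', {
  device: 'webgpu', // or 'wasm' (browser default), 'cpu', etc.
});
```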
4. Quantization Options
Control model precision vs. performance:
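The `dtype` option selects a quantization level, trading accuracy for download size and speed (availability of each level depends on the model):

```javascript
import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct', {
  dtype: 'q4', // 'fp32' | 'fp16' | 'q8' | 'q4' - smallest download, some accuracy loss
});
```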
Supported Tasks
Note: All examples below show basic usage.
Natural Language Processing
#### Text Classification
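A basic sketch; with no model argument, the task's default model is used, and the exact label/score depend on that model:

```javascript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('text-classification');
const result = await classifier('This movie was great!');
// e.g. [{ label: 'POSITIVE', score: 0.99 }]
```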
#### Named Entity Recognition (NER)
#### Question Answering
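Question-answering pipelines take the question and the context passage as separate arguments (example strings shown):

```javascript
import { pipeline } from '@huggingface/transformers';

const qa = await pipeline('question-answering');
const out = await qa(
  'Who maintains Transformers.js?',            // question
  'Transformers.js is maintained by Hugging Face.' // context
);
// e.g. { answer: 'Hugging Face', score: 0.9 }
```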
#### Text Generation
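Basic usage with a small instruct model (example model ID; output text will vary):

```javascript
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct');
const out = await generator('Once upon a time,', { max_new_tokens: 50 });
// e.g. [{ generated_text: 'Once upon a time, ...' }]
```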
For streaming and chat, see the [Text Generation Guide](./references/TEXT_GENERATION.md) for:
- Streaming token-by-token output with `TextStreamer`
- Chat/conversation format with system/user/assistant roles
- Generation parameters (temperature, top_k, top_p)
- Browser and Node.js examples
- React components and API endpoints
#### Translation
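With multilingual models such as NLLB, source and target languages are passed as options (language codes follow the model's convention):

```javascript
import { pipeline } from '@huggingface/transformers';

const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
const out = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn',
});
```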
#### Summarization
#### Zero-Shot Classification
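Zero-shot classification scores arbitrary candidate labels without task-specific training (example labels shown):

```javascript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('zero-shot-classification');
const out = await classifier(
  'I just booked a flight to Tokyo',
  ['travel', 'cooking', 'finance']
);
// out.labels is sorted by descending out.scores
```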
Computer Vision
#### Image Classification
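Image pipelines accept a URL, file path, or raw image data (the image URL below is a placeholder):

```javascript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('image-classification', 'Xenova/vit-base-patch16-224');
const out = await classifier('https://example.com/cat.jpg'); // placeholder URL
// e.g. [{ label: 'tabby, tabby cat', score: 0.9 }]
```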
#### Object Detection
#### Image Segmentation
#### Depth Estimation
#### Zero-Shot Image Classification
Audio Processing
#### Automatic Speech Recognition
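Basic transcription with a small Whisper model (example model ID; the audio URL is a placeholder):

```javascript
import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny.en');
const out = await transcriber('https://example.com/audio.wav'); // placeholder URL
// e.g. { text: '...' }
```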
#### Audio Classification
#### Text-to-Speech
Multimodal
#### Image-to-Text (Image Captioning)
#### Document Question Answering
#### Zero-Shot Object Detection
Feature Extraction (Embeddings)
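Sentence embeddings for semantic search or similarity, using mean pooling and normalization (example model; its embedding dimension is 384):

```javascript
import { pipeline } from '@huggingface/transformers';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
const embeddings = await extractor(['Hello world', 'Hi there'], {
  pooling: 'mean',
  normalize: true,
});
// embeddings.dims → [2, 384]; embeddings.tolist() yields plain nested arrays
```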
Finding and Choosing Models
Browsing the Hugging Face Hub
Discover compatible Transformers.js models on Hugging Face Hub:
Base URL (all models): https://huggingface.co/models?library=transformers.js&sort=trending
Filter by task using the pipeline_tag parameter:
| Task | URL |
|---|---|
| **Text Generation** | https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending |
| **Text Classification** | https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js&sort=trending |
| **Translation** | https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=trending |
| **Summarization** | https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending |
| **Question Answering** | https://huggingface.co/models?pipeline_tag=question-answering&library=transformers.js&sort=trending |
| **Image Classification** | https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending |
| **Object Detection** | https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js&sort=trending |
| **Image Segmentation** | https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js&sort=trending |
| **Speech Recognition** | https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending |
| **Audio Classification** | https://huggingface.co/models?pipeline_tag=audio-classification&library=transformers.js&sort=trending |
| **Image-to-Text** | https://huggingface.co/models?pipeline_tag=image-to-text&library=transformers.js&sort=trending |
| **Feature Extraction** | https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=trending |
| **Zero-Shot Classification** | https://huggingface.co/models?pipeline_tag=zero-shot-classification&library=transformers.js&sort=trending |
Sort options:
- `&sort=trending` - Most popular recently
- `&sort=downloads` - Most downloaded overall
- `&sort=likes` - Most liked by community
- `&sort=modified` - Recently updated
Choosing the Right Model
Consider these factors when selecting a model:
1. Model Size
- Small (< 100MB): Fast, suitable for browsers, limited accuracy
- Medium (100MB - 500MB): Balanced performance, good for most use cases
- Large (> 500MB): High accuracy, slower, better for Node.js or powerful devices
2. Quantization
Models are often available in different quantization levels:
- `fp32` - Full precision (largest, most accurate)
- `fp16` - Half precision (smaller, still accurate)
- `q8` - 8-bit quantized (much smaller, slight accuracy loss)
- `q4` - 4-bit quantized (smallest, noticeable accuracy loss)
3. Task Compatibility
Check the model card for:
- Supported tasks (some models support multiple tasks)
- Input/output formats
- Language support (multilingual vs. English-only)
- License restrictions
4. Performance Metrics
Model cards typically show:
- Accuracy scores
- Benchmark results
- Inference speed
- Memory requirements
Example: Finding a Text Generation Model
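For instance, after filtering the Hub for text-generation models with Transformers.js support, a small quantized instruct model is a reasonable starting point (example model ID):

```javascript
import { pipeline } from '@huggingface/transformers';

// Small model + q4 quantization keeps the download manageable for browsers
const generator = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct', {
  dtype: 'q4',
});
```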
Tips for Model Selection
1. Start Small: Test with a smaller model first, then upgrade if needed
2. Check ONNX Support: Ensure the model has ONNX files (look for onnx folder in model repo)
3. Read Model Cards: Model cards contain usage examples, limitations, and benchmarks
4. Test Locally: Benchmark inference speed and memory usage in your environment
5. Community Models: Look for models by Xenova (Transformers.js maintainer) or onnx-community
6. Version Pin: Use specific git commits in production for stability:
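A sketch using the `revision` option to pin a model version (replace `'main'` with a specific commit hash for full reproducibility):

```javascript
import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-generation', 'Xenova/gpt2', {
  revision: 'main', // use a specific commit hash in production
});
```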
Advanced Configuration
Environment Configuration (env)
The env object provides comprehensive control over Transformers.js execution, caching, and model loading.
Quick Overview:
Configuration Patterns:
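A few commonly used settings as a sketch (property names from the library's `env` object; defaults vary by runtime):

```javascript
import { env } from '@huggingface/transformers';

env.allowLocalModels = false;  // only fetch models from the Hub
env.useBrowserCache = true;    // cache model files via the browser Cache API
// env.localModelPath = '/models/'; // or serve models from your own host
```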
For complete documentation on all configuration options, caching strategies, cache management, pre-downloading models, and more, see:
→ [Configuration Reference](./references/CONFIGURATION.md)
Working with Tensors
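Pipeline outputs such as embeddings are `Tensor` instances; a minimal sketch of the class (constructor signature assumed from the library's `Tensor` export):

```javascript
import { Tensor } from '@huggingface/transformers';

// new Tensor(type, data, dims)
const t = new Tensor('float32', new Float32Array([1, 2, 3, 4]), [2, 2]);
console.log(t.dims);     // [2, 2]
console.log(t.tolist()); // nested plain arrays, e.g. [[1, 2], [3, 4]]
```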
Batch Processing
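Most pipelines accept an array of inputs and return an array of results, which is usually faster than looping one input at a time:

```javascript
import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline('text-classification');
const results = await classifier(['Great product!', 'Terrible service.']);
// one result object per input, in order
```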
Browser-Specific Considerations
WebGPU Usage
WebGPU provides GPU acceleration in browsers:
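A common pattern is to feature-detect WebGPU and fall back to WASM (model ID is an example):

```javascript
import { pipeline } from '@huggingface/transformers';

const device = navigator.gpu ? 'webgpu' : 'wasm';
const pipe = await pipeline('text-generation', 'onnx-community/Qwen2.5-0.5B-Instruct', { device });
```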
Note: WebGPU is experimental. Check browser compatibility and file issues if problems occur.
WASM Performance
Default browser execution uses WASM:
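The WASM backend can be tuned via `env.backends.onnx.wasm` (property names assumed from the library's ONNX Runtime Web integration):

```javascript
import { env } from '@huggingface/transformers';

env.backends.onnx.wasm.numThreads = navigator.hardwareConcurrency ?? 4;
env.backends.onnx.wasm.proxy = true; // run inference in a worker to keep the UI responsive
```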
Progress Tracking & Loading Indicators
Models can be large (ranging from a few MB to several GB) and consist of multiple files. Track download progress by passing a callback to the pipeline() function:
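A sketch of the callback (passing `null` as the model selects the task default; the `status` values shown are the commonly observed ones):

```javascript
import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text-classification', null, {
  progress_callback: (p) => {
    // p.status is e.g. 'initiate', 'progress', 'done'
    if (p.status === 'progress') {
      console.log(`${p.file}: ${p.progress.toFixed(1)}% (${p.loaded}/${p.total} bytes)`);
    }
  },
});
```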
Progress Info Properties:
For complete examples including browser UIs, React components, CLI progress bars, and retry logic, see:
→ [Pipeline Options - Progress Callback](./references/PIPELINE_OPTIONS.md#progress-callback)
Error Handling
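Wrap pipeline creation and inference in try/catch, and dispose in a `finally` block so resources are freed even on failure (model ID is an example):

```javascript
import { pipeline } from '@huggingface/transformers';

let pipe;
try {
  pipe = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english');
  const out = await pipe('Hello!');
} catch (err) {
  // Network failures, missing ONNX files, and out-of-memory errors surface here
  console.error('Inference failed:', err.message);
} finally {
  await pipe?.dispose();
}
```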
Performance Tips
1. Reuse Pipelines: Create pipeline once, reuse for multiple inferences
2. Use Quantization: Start with q8 or q4 for faster inference
3. Batch Processing: Process multiple inputs together when possible
4. Cache Models: Models are cached automatically (see [Caching Reference](./references/CACHE.md) for details on browser Cache API, Node.js filesystem cache, and custom implementations)
5. WebGPU for Large Models: Use WebGPU for models that benefit from GPU acceleration
6. Prune Context: For text generation, limit max_new_tokens to avoid memory issues
7. Clean Up Resources: Call pipe.dispose() when done to free memory
Memory Management
IMPORTANT: Always call pipe.dispose() when finished to prevent memory leaks.
When to dispose:
- Application shutdown or component unmount
- Before loading a different model
- After batch processing in long-running apps
Models consume significant memory and hold GPU/CPU resources. Disposal is critical for browser memory limits and server stability.
For detailed patterns (React cleanup, servers, browser), see [Code Examples](./references/EXAMPLES.md)
Troubleshooting
Model Not Found
- Verify the model exists on Hugging Face Hub
- Check the model name spelling
- Ensure the model has ONNX files (look for an `onnx` folder in the model repo)
Memory Issues
- Use smaller models or quantized versions (`dtype: 'q4'`)
- Reduce batch size
- Limit sequence length with `max_length`
WebGPU Errors
- Check browser compatibility (Chrome 113+, Edge 113+)
- Try `dtype: 'fp16'` if `fp32` fails
- Fall back to WASM if WebGPU is unavailable
Reference Documentation
This Skill
- [Pipeline Options](./references/PIPELINE_OPTIONS.md) - Configure `pipeline()` with `progress_callback`, `device`, `dtype`, etc.
- [Configuration Reference](./references/CONFIGURATION.md) - Global `env` configuration for caching and model loading
- [Caching Reference](./references/CACHE.md) - Browser Cache API, Node.js filesystem cache, and custom cache implementations
- [Text Generation Guide](./references/TEXT_GENERATION.md) - Streaming, chat format, and generation parameters
- [Model Architectures](./references/MODEL_ARCHITECTURES.md) - Supported models and selection tips
- [Code Examples](./references/EXAMPLES.md) - Real-world implementations for different runtimes
Official Transformers.js
- Official docs: https://huggingface.co/docs/transformers.js
- API reference: https://huggingface.co/docs/transformers.js/api/pipelines
- Model hub: https://huggingface.co/models?library=transformers.js
- GitHub: https://github.com/huggingface/transformers.js
- Examples: https://github.com/huggingface/transformers.js/tree/main/examples
Best Practices
1. Always Dispose Pipelines: Call pipe.dispose() when done - critical for preventing memory leaks
2. Start with Pipelines: Use the pipeline API unless you need fine-grained control
3. Test Locally First: Test models with small inputs before deploying
4. Monitor Model Sizes: Be aware of model download sizes for web applications
5. Handle Loading States: Show progress indicators for better UX
6. Version Pin: Pin specific model versions for production stability
7. Error Boundaries: Always wrap pipeline calls in try-catch blocks
8. Progressive Enhancement: Provide fallbacks for unsupported browsers
9. Reuse Models: Load once, use many times - don't recreate pipelines unnecessarily
10. Graceful Shutdown: Dispose models on SIGTERM/SIGINT in servers
Quick Reference: Task IDs
| Task | Task ID |
|---|---|
| Text classification | `text-classification` or `sentiment-analysis` |
| Token classification | `token-classification` or `ner` |
| Question answering | `question-answering` |
| Fill mask | `fill-mask` |
| Summarization | `summarization` |
| Translation | `translation` |
| Text generation | `text-generation` |
| Text-to-text generation | `text2text-generation` |
| Zero-shot classification | `zero-shot-classification` |
| Image classification | `image-classification` |
| Image segmentation | `image-segmentation` |
| Object detection | `object-detection` |
| Depth estimation | `depth-estimation` |
| Image-to-image | `image-to-image` |
| Zero-shot image classification | `zero-shot-image-classification` |
| Zero-shot object detection | `zero-shot-object-detection` |
| Automatic speech recognition | `automatic-speech-recognition` |
| Audio classification | `audio-classification` |
| Text-to-speech | `text-to-speech` or `text-to-audio` |
| Image-to-text | `image-to-text` |
| Document question answering | `document-question-answering` |
| Feature extraction | `feature-extraction` |
| Sentence similarity | `sentence-similarity` |
---
This skill enables you to integrate state-of-the-art machine learning capabilities directly into JavaScript applications without requiring separate ML servers or Python environments.