Hugging Face Dataset Viewer
Use this skill for Hugging Face Dataset Viewer API workflows that fetch subset/split metadata, paginate rows, search text, apply filters, download parquet URLs, and read size or statistics.
Use this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.
Core workflow
1. Optionally validate dataset availability with /is-valid.
2. Resolve config + split with /splits.
3. Preview with /first-rows.
4. Paginate content with /rows using offset and length (max 100).
5. Use /search for text matching and /filter for row predicates.
6. Retrieve parquet links via /parquet and totals/metadata via /size and /statistics.
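Step 2 of the workflow can be sketched as follows. This is a minimal illustration, assuming the /splits response body has the shape {"splits": [{"dataset": ..., "config": ..., "split": ...}]}; the helper names and the preference order for split names are hypothetical and not part of the API.

```python
from typing import Dict, List, Sequence, Tuple


def list_config_splits(splits_payload: Dict) -> List[Tuple[str, str]]:
    """Return (config, split) pairs from a parsed /splits response body.

    Assumes each entry under "splits" carries "config" and "split" keys.
    """
    return [(item["config"], item["split"]) for item in splits_payload.get("splits", [])]


def pick_split(
    splits_payload: Dict,
    preferred: Sequence[str] = ("train", "validation", "test"),
) -> Tuple[str, str]:
    """Pick one (config, split) pair, preferring common split names.

    The preference order is an illustrative choice, not an API rule.
    """
    pairs = list_config_splits(splits_payload)
    if not pairs:
        raise ValueError("no splits found in /splits response")
    for name in preferred:
        for config, split in pairs:
            if split == name:
                return config, split
    # Fall back to the first pair the API listed.
    return pairs[0]
```

The chosen pair can then be passed as config and split to /first-rows, /rows, /search, and /filter.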
Defaults
- Base URL: https://datasets-server.huggingface.co
- Default API method: GET
- Query params should be URL-encoded.
- offset is 0-based.
- length max is usually 100 for row-like endpoints.
- Gated/private datasets require Authorization: Bearer <HF_TOKEN>.
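These defaults can be captured in a small helper. The sketch below uses only the Python standard library; the function names (build_url, auth_headers, get_json) are hypothetical, and error handling is omitted for brevity.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"


def build_url(endpoint: str, **params) -> str:
    """Build a GET URL with URL-encoded query params, skipping None values."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    url = f"{BASE_URL}/{endpoint.lstrip('/')}"
    return f"{url}?{query}" if query else url


def auth_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header needed for gated/private datasets."""
    return {"Authorization": f"Bearer {token}"} if token else {}


def get_json(endpoint: str, token: Optional[str] = None, **params) -> Dict:
    """Issue a read-only GET request and parse the JSON response body."""
    request = urllib.request.Request(build_url(endpoint, **params), headers=auth_headers(token))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, build_url("/rows", dataset="ns/repo", config="default", split="train", offset=0, length=100) encodes the slash in the dataset name as %2F.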
Dataset Viewer
- Validate dataset: /is-valid?dataset=<namespace/repo>
- List subsets and splits: /splits?dataset=<namespace/repo>
- Preview first rows: /first-rows?dataset=<namespace/repo>&config=<config>&split=<split>
- Paginate rows: /rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>
- Search text: /search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>
- Filter with predicates: /filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>
- List parquet shards: /parquet?dataset=<namespace/repo>
- Get size totals: /size?dataset=<namespace/repo>
- Get column statistics: /statistics?dataset=<namespace/repo>&config=<config>&split=<split>
- Get Croissant metadata (if available): /croissant?dataset=<namespace/repo>
Pagination pattern:
When pagination is partial, use response fields such as num_rows_total, num_rows_per_page, and partial to drive continuation logic.
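One way to drive this continuation logic is sketched below. It assumes each page is a dict with a "rows" list and a "num_rows_total" count; fetch_page is a hypothetical callable supplied by the caller (for instance, a wrapper around a GET to /rows), so the loop itself makes no network calls.

```python
from typing import Callable, Dict, Iterator

MAX_LENGTH = 100  # usual per-request cap for row-like endpoints


def iter_rows(
    fetch_page: Callable[[int, int], Dict],
    length: int = MAX_LENGTH,
) -> Iterator[Dict]:
    """Yield every row item by advancing a 0-based offset page by page.

    fetch_page(offset, length) must return one parsed response body.
    """
    length = min(length, MAX_LENGTH)
    offset = 0
    while True:
        page = fetch_page(offset, length)
        rows = page.get("rows", [])
        for item in rows:
            yield item
        offset += len(rows)
        total = page.get("num_rows_total")
        # Stop on an empty page or once the reported total has been reached.
        if not rows or (total is not None and offset >= total):
            break
```

Stopping on an empty page as well as on num_rows_total keeps the loop from spinning if the total is missing from a response.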
Search/filter notes:
- /search matches string columns (full-text style behavior is internal to the API).
- /filter requires predicate syntax in where and optional sort in orderby.
- Keep filtering and searches read-only and side-effect free.
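A /filter request might be assembled as in the sketch below. The predicate style shown in the usage note (double-quoted column names, SQL-like comparison) is an assumption about the where syntax, and the column name "age" is hypothetical; check the API documentation for the exact grammar.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"
MAX_LENGTH = 100  # usual per-request cap for row-like endpoints


def filter_url(
    dataset: str,
    config: str,
    split: str,
    where: str,
    orderby: Optional[str] = None,
    offset: int = 0,
    length: int = MAX_LENGTH,
) -> str:
    """Build a URL-encoded /filter request; orderby is sent only when given."""
    params = {"dataset": dataset, "config": config, "split": split, "where": where}
    if orderby:
        params["orderby"] = orderby
    params["offset"] = offset
    params["length"] = min(length, MAX_LENGTH)
    return f"{BASE_URL}/filter?{urlencode(params)}"
```

For instance, filter_url("ns/repo", "default", "train", where='"age">30', orderby='"age" DESC') percent-encodes the quotes and comparison operator so the predicate survives transport intact.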
Querying Datasets
Use npx parquetlens with Hub parquet alias paths for SQL querying.
Parquet alias shape:
Derive <config>, <split>, and <shard> from Dataset Viewer /parquet:
Run SQL query:
SQL export
- CSV: --sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')"
- JSON: --sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)"
- Parquet: --sql "COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)"
Creating and Uploading Datasets
Use one of these flows depending on dependency constraints.
Zero local dependencies (Hub UI):
- Create dataset repo in browser: https://huggingface.co/new-dataset
- Upload parquet files in the repo "Files and versions" page.
- Verify shards appear in Dataset Viewer:
Low dependency CLI flow (npx @huggingface/hub / hfjs):
- Set auth token:
- Upload parquet folder to a dataset repo (auto-creates repo if missing):
- Upload as private repo on creation:
After upload, call /parquet to discover <config>/<split>/<shard> values for querying with @~parquet.
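The discovery step could look like the sketch below. It assumes the /parquet response body contains a "parquet_files" list whose entries carry "config", "split", "filename", and "url" keys; the helper name is hypothetical, and how a filename maps onto <shard> is left to the caller since that mapping is not spelled out here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_parquet_files(parquet_payload: Dict) -> Dict[Tuple[str, str], List[Dict[str, str]]]:
    """Group parquet file entries by (config, split), keeping API order.

    Each grouped entry keeps only the filename and download URL.
    """
    grouped: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for entry in parquet_payload.get("parquet_files", []):
        key = (entry["config"], entry["split"])
        grouped[key].append({"filename": entry["filename"], "url": entry["url"]})
    return dict(grouped)
```

Iterating over the result gives one list of files per config and split, which is enough to pick the values needed for a parquet alias path.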