Text Processing Pipeline with sed, awk, cut, and sort
Build text processing pipelines to transform, extract, and aggregate data from CSVs, TSVs, and structured text files using standard Unix tools.
Prerequisites
- Bash with GNU coreutils
- Structured text files (CSV, TSV, logs)
Steps
Extract columns from CSV/TSV data
Use cut or awk to extract specific columns from delimited files.
Use awk -F',' '{print $1, $3}' for more control over output formatting and delimiters.
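For example, given a hypothetical users.csv (the file name and sample data are illustrative), cut selects fields by position while awk also lets you choose the output delimiter:

```shell
# Sample data (hypothetical, for illustration only)
cat > users.csv <<'EOF'
name,age,city
alice,30,austin
bob,25,boston
EOF

# cut: extract columns 1 and 3 by position, keeping the comma delimiter
cut -d',' -f1,3 users.csv

# awk: same columns, but emit them tab-separated via OFS
awk -F',' -v OFS='\t' '{print $1, $3}' users.csv
```

cut is simpler for fixed positional extraction; awk wins as soon as you need reordering, computed fields, or a different output delimiter.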
Transform text with sed substitutions
Use sed to find and replace patterns, remove lines, or reformat text.
Use -E for extended regex (no need to escape parentheses). Use -i.bak to edit in place with a backup.
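A short sketch of those sed patterns, run against a hypothetical notes.txt:

```shell
# Sample data (hypothetical)
cat > notes.txt <<'EOF'
TODO: fix the parser
done: write tests
TODO: update docs
EOF

# Substitute a pattern on every line (g = all occurrences per line)
sed 's/TODO/PENDING/g' notes.txt

# Delete lines matching a pattern
sed '/^done:/d' notes.txt

# -E: extended regex, so ( ) | + need no backslashes
sed -E 's/(TODO|done): //' notes.txt

# Edit in place, keeping the original as notes.txt.bak (GNU sed syntax)
sed -i.bak 's/TODO/PENDING/' notes.txt
```

Note that -i.bak behaves slightly differently on BSD/macOS sed, where the backup suffix must be a separate argument (sed -i .bak ...).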
Sort and deduplicate data
Sort by specific columns and remove duplicate entries.
-t sets the field delimiter; -k2,2 restricts the sort key to column 2 (start field and end field); the n suffix in -k1,1n makes that key compare numerically rather than lexically.
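Applied to a hypothetical scores.csv (sample data is illustrative):

```shell
# Sample data (hypothetical); note the duplicate last row
cat > scores.csv <<'EOF'
3,carol,88
1,alice,95
2,bob,95
1,alice,95
EOF

# Sort numerically by column 1
sort -t',' -k1,1n scores.csv

# Sort by column 2, then drop adjacent duplicate lines
sort -t',' -k2,2 scores.csv | uniq

# sort -u sorts and deduplicates whole lines in one step
sort -t',' -u scores.csv
```

Remember that uniq only removes adjacent duplicates, which is why it is almost always preceded by sort.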
Aggregate and compute with awk
Calculate sums, averages, and counts from columnar data.
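A minimal sketch over a hypothetical tab-separated sales.tsv, showing a sum, an average, and a per-group total:

```shell
# Sample data (hypothetical): region<TAB>amount
printf 'north\t100\nsouth\t250\nnorth\t50\n' > sales.tsv

# Sum column 2 across all rows
awk -F'\t' '{sum += $2} END {print sum}' sales.tsv   # → 400

# Average of column 2
awk -F'\t' '{sum += $2; n++} END {print sum / n}' sales.tsv

# Per-group totals keyed on column 1, using an associative array
awk -F'\t' '{totals[$1] += $2} END {for (k in totals) print k, totals[k]}' sales.tsv
```

The END block runs once after all input is consumed, which is where accumulated totals get printed; note that awk's for (k in totals) does not guarantee key order, so pipe through sort if order matters.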
Build a multi-stage pipeline
Chain tools together to filter, transform, and summarize data in one command.
Read pipelines left to right: extract IPs, sort them, count unique occurrences, sort by count descending, show top 10.
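That description maps one-to-one onto a pipeline; here is a sketch against a hypothetical access.log where the IP is the first whitespace-separated field:

```shell
# Sample log (hypothetical)
cat > access.log <<'EOF'
10.0.0.1 GET /index.html
10.0.0.2 GET /about.html
10.0.0.1 GET /index.html
10.0.0.1 POST /login
10.0.0.3 GET /index.html
EOF

# extract IPs | sort them | count unique occurrences | sort by count desc | top 10
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
```

The intermediate sort is required because uniq -c only counts adjacent identical lines; the second sort -rn orders by the numeric count that uniq -c prepends.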
Full Script
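The original full script is not shown here; the following is one possible sketch that strings the steps above together, using hypothetical file and column names:

```shell
#!/usr/bin/env bash
# Sketch only: combines extract -> transform -> sort/dedup -> aggregate
# over a hypothetical orders.csv with columns id,region,amount.
set -euo pipefail

cat > orders.csv <<'EOF'
id,region,amount
1,north,100
2,south,250
3,north,50
4,south,250
EOF

# 1. Extract region and amount, skipping the header row
tail -n +2 orders.csv | cut -d',' -f2,3 > region_amount.csv

# 2. Transform: uppercase the region name (\U is GNU sed only)
sed -E 's/^([a-z]+)/\U\1/' region_amount.csv > normalized.csv

# 3. Sort by region and drop duplicate rows
sort -t',' -k1,1 normalized.csv | uniq > deduped.csv

# 4. Aggregate: total amount per region
awk -F',' '{t[$1] += $2} END {for (r in t) print r, t[r]}' deduped.csv | sort
```

Each stage writes an intermediate file for clarity; in practice the four stages would usually be chained into a single pipeline.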