Text Processing with awk in Zsh — Complete Guide for Beginners
Video: Text Processing with awk in Zsh — Complete Guide for Beginners, taught by Celeste AI - AI Coding Coach
Zsh Lesson 19: awk — Text Processing for Beginners
awk '{print $1}' prints the first field. -F, sets the field separator. $0 is the whole line, $1...$NF are the fields, NR is the line number, NF is the field count. Programs are built from BEGIN { ... } { ... } END { ... } blocks. The right tool for "do something with column N."
awk is a text processor. Each line is split into fields (default: whitespace-separated); you write small programs that act on each line.
Print whole line
cat employees.txt
# Alice Engineer 95000
# Bob Manager 110000
# Charlie Designer 75000
awk '{print $0}' employees.txt
# Alice Engineer 95000
# Bob Manager 110000
# Charlie Designer 75000
$0 is the whole line. awk '{print $0}' file is essentially cat.
Print specific fields
awk '{print $1, $3}' employees.txt
# Alice 95000
# Bob 110000
# Charlie 75000
$1, $2, ... $N are the fields. print joins its comma-separated arguments with the output field separator (default: a single space). So the comma in print $1, $3 produces a space; without the comma (print $1 $3) the two fields are concatenated with nothing between them.
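The comma-versus-concatenation difference is easy to see side by side. A minimal sketch, assuming a one-line sample in the same shape as employees.txt (the variable names are only for illustration):

```shell
# Hypothetical one-line sample in the same shape as employees.txt
line='Alice Engineer 95000'

# Comma: arguments are joined with the output separator (a space by default)
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $3}')

# No comma: the two fields are concatenated directly
no_comma=$(printf '%s\n' "$line" | awk '{print $1 $3}')

# A string literal between the fields acts as a custom separator
custom=$(printf '%s\n' "$line" | awk '{print $1 " earns " $3}')

printf '%s\n' "$with_comma" "$no_comma" "$custom"
# Alice 95000
# Alice95000
# Alice earns 95000
```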
Reorder columns
awk '{print $2, $1}' employees.txt
# Engineer Alice
# Manager Bob
# Designer Charlie
Fields can be in any order in print.
Custom separator: -F
echo "alice:1001:/home/alice" | awk -F: '{print $1, $3}'
# alice /home/alice
-F: sets the input field separator. For CSV: -F,. For tab: -F'\t'.
For /etc/passwd-style:
awk -F: '{print $1, $7}' /etc/passwd
# users and their default shells
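The tab separator matters most when a field itself contains spaces. A small sketch, assuming a made-up tab-separated record (not one of the lesson's files):

```shell
# Hypothetical tab-separated record; the name field contains a space
row=$(printf 'Alice Smith\tEngineer\t95000')

# Default splitting breaks on the space inside the name
default_first=$(printf '%s\n' "$row" | awk '{print $1}')

# -F'\t' splits on tabs only, so the full name stays in $1
tab_first=$(printf '%s\n' "$row" | awk -F'\t' '{print $1}')

printf '%s\n' "$default_first" "$tab_first"
# Alice
# Alice Smith
```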
NF: number of fields
awk '{print NF, $0}' employees.txt
# 3 Alice Engineer 95000
# 3 Bob Manager 110000
# 3 Charlie Designer 75000
NF is the number of fields on the current line. $NF is the last field.
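Because NF is recomputed for every line, $NF keeps pointing at the last field even when lines have different lengths. A short sketch on made-up input; $(NF-1), the second-to-last field, follows the same idea:

```shell
# Hypothetical input with a different number of fields on each line
data=$(printf 'a b c\nd e\nf')

# Field count followed by the last field of each line
last=$(printf '%s\n' "$data" | awk '{print NF, $NF}')

# Second-to-last field, only for lines that have at least two fields
second_last=$(printf '%s\n' "$data" | awk 'NF >= 2 {print $(NF-1)}')

printf '%s\n' "$last" "$second_last"
# 3 c
# 2 e
# 1 f
# b
# d
```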
NR: line number
awk '{print NR": "$0}' employees.txt
# 1: Alice Engineer 95000
# 2: Bob Manager 110000
# 3: Charlie Designer 75000
awk 'NR > 1' file.csv # skip header line
NR (record number) is incremented on each line.
Patterns: act on matching lines
awk '/Engineer|Designer/ {print $1}' employees.txt
# Alice
# Charlie
Without an action, the default action is {print}:
awk '/Engineer/' employees.txt
# Alice Engineer 95000
Same as grep "Engineer" — but with awk's field-aware power available when needed.
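One way to use that field awareness is to match a pattern against a single field rather than the whole line. The sketch below uses awk's standard ~ match operator, which this lesson does not otherwise cover, on made-up data where the pattern also appears in a name:

```shell
# Hypothetical data: "Man" appears in a job title and in a name
data=$(printf 'Alice Engineer 95000\nBob Manager 110000\nManny Designer 75000')

# Whole-line match: hits both the title "Manager" and the name "Manny"
whole=$(printf '%s\n' "$data" | awk '/Man/ {print $1}')

# Field match: only lines whose second field matches
field=$(printf '%s\n' "$data" | awk '$2 ~ /Man/ {print $1}')

printf '%s\n' "$whole" "--" "$field"
# Bob
# Manny
# --
# Bob
```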
Conditions
awk '$3 > 80000 {print $1, $3}' employees.txt
# Alice 95000
# Bob 110000
$3 > 80000 is the condition; {print ...} is the action. Combine with && and ||:
awk -F, 'NR > 1 && $2 > 20 {print $1, $2}' sales.csv
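The same combinators work on the whitespace-separated employee data. A minimal sketch, with the three sample rows inlined so it runs on its own:

```shell
# Same three rows as employees.txt, inlined so the example is self-contained
employees=$(printf 'Alice Engineer 95000\nBob Manager 110000\nCharlie Designer 75000')

# && : both conditions must hold (salary above 80000 AND not a manager)
and_result=$(printf '%s\n' "$employees" | awk '$3 > 80000 && $2 != "Manager" {print $1}')

# || : either condition may hold (salary below 80000 OR a manager)
or_result=$(printf '%s\n' "$employees" | awk '$3 < 80000 || $2 == "Manager" {print $1}')

printf '%s\n' "$and_result" "--" "$or_result"
# Alice
# --
# Bob
# Charlie
```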
BEGIN and END blocks
awk 'BEGIN {print "=== Report ==="} {print $1} END {print "=== Done ==="}' employees.txt
# === Report ===
# Alice
# Bob
# Charlie
# === Done ===
BEGIN { ... } runs once before any input. END { ... } runs once after all input. Use for headers, footers, summaries.
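In the END block, NR still holds the number of the last record read, which makes a row count nearly free. A sketch using the same employee rows, inlined:

```shell
# Same three rows as employees.txt, inlined so the example is self-contained
employees=$(printf 'Alice Engineer 95000\nBob Manager 110000\nCharlie Designer 75000')

# Header from BEGIN, one name per line, row count from END
report=$(printf '%s\n' "$employees" |
  awk 'BEGIN {print "Name"} {print $1} END {print NR " rows"}')

printf '%s\n' "$report"
# Name
# Alice
# Bob
# Charlie
# 3 rows
```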
Aggregating: sum a column
awk -F, 'NR > 1 {sum += $2} END {print "Total qty:", sum}' sales.csv
# Total qty: 120
One of the most useful awk patterns. Uninitialized variables act as 0 in numeric context and "" in string context, so sum needs no setup. Accumulate during processing, print in END.
Average:
awk -F, 'NR > 1 {sum += $2; n++} END {print sum/n}' sales.csv
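If the file holds only a header, n stays at 0 and sum/n is a division by zero. A hedged sketch of a guarded version, assuming sales.csv looks like item,qty with one header row (the rows below are invented for illustration):

```shell
# Hypothetical sales.csv contents: a header row, then item,qty
sales=$(printf 'item,qty\nMouse,30\nKeyboard,20\nMonitor,70')

# Sum and average of column 2, guarding against input with no data rows
stats_prog='NR > 1 {sum += $2; n++}
END {
    if (n > 0) {
        printf "total=%d avg=%.1f\n", sum, sum / n
    } else {
        print "no data rows"
    }
}'

stats=$(printf '%s\n' "$sales" | awk -F, "$stats_prog")
empty=$(printf 'item,qty\n' | awk -F, "$stats_prog")

printf '%s\n' "$stats" "$empty"
# total=120 avg=40.0
# no data rows
```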
Counting occurrences
awk '{count[$1]++} END {for (key in count) print key, count[key]}' access.log
Awk has built-in associative arrays. count[$1]++ increments the counter keyed by the first field. After processing, iterate keys.
# Top 10 IPs by request count
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log \
| sort -rn | head
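The order of for (key in array) is unspecified, which is why the pipeline sorts afterwards. A small runnable sketch with a made-up two-column log:

```shell
# Hypothetical log: client IP in the first field, path in the second
log=$(printf '%s\n' \
  '10.0.0.1 /a' '10.0.0.2 /b' '10.0.0.1 /c' \
  '10.0.0.3 /a' '10.0.0.1 /a' '10.0.0.2 /a')

# Count per IP, then sort numerically, highest first, and keep the top two
top=$(printf '%s\n' "$log" |
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
  sort -rn | head -n 2)

printf '%s\n' "$top"
# 3 10.0.0.1
# 2 10.0.0.2
```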
printf for formatting
awk '{printf "%-10s $%d\n", $1, $3}' employees.txt
# Alice      $95000
# Bob        $110000
# Charlie    $75000
printf works like C's: %s string, %d integer, %f float. %-10s left-aligns the value in a 10-character column (padding on the right); %10s right-aligns it.
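Brackets around each column make the padding visible. A minimal sketch on one sample row; the monthly figure is just salary divided by 12, to show %.2f:

```shell
# One row in the same shape as employees.txt
row='Alice Engineer 95000'

# %-8s pads on the right, %10s pads on the left, %.2f keeps two decimals
formatted=$(printf '%s\n' "$row" |
  awk '{printf "[%-8s][%10s][%.2f]\n", $1, $2, $3 / 12}')

printf '%s\n' "$formatted"
# [Alice   ][  Engineer][7916.67]
```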
If/else and ternary
awk -F, 'NR > 1 {print $1, ($2 >= 30 ? "HIGH" : "LOW")}' sales.csv
# Mouse HIGH
# Keyboard LOW
Ternary cond ? a : b works inside actions. Or use if/else:
awk '{
    if ($3 >= 90000) print $1, "senior"
    else if ($3 >= 70000) print $1, "mid"
    else print $1, "junior"
}' employees.txt
Multi-line awk programs
For long awk scripts, use a file:
# script.awk
BEGIN {
    print "Processing..."
}
NR == 1 { next } # skip header
$3 > 80000 {
    high++
    total += $3
}
END {
    print "High earners:", high
    print "Total HE salary:", total
}
Run:
awk -F, -f script.awk employees.csv
A real-world example: log analysis
# access.log format: IP user date method url status size
awk '$6 >= 400 {print $1, $5, $6}' access.log # 4xx/5xx requests: IP, url, status (status is field 6)
awk '{
    hits[$1]++
    bytes[$1] += $7
} END {
    for (ip in hits) printf "%-15s %5d hits %10d bytes\n", ip, hits[ip], bytes[ip]
}' access.log | sort -k2 -rn | head
Aggregate by IP, sort by hit count.
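To try the log analysis without a real server log, here is a sketch on three invented lines that follow the seven-field layout in the comment above (IP user date method url status size). With that layout the status is field 6 and the size is field 7:

```shell
# Hypothetical access.log lines: IP user date method url status size
log=$(printf '%s\n' \
  '10.0.0.1 alice 2024-01-01 GET /a 200 500' \
  '10.0.0.2 bob 2024-01-01 GET /b 404 100' \
  '10.0.0.1 alice 2024-01-01 POST /c 500 250')

# Requests whose status (field 6) is 400 or above: IP, url, status
errors=$(printf '%s\n' "$log" | awk '$6 >= 400 {print $1, $5, $6}')

# Per-IP hit count and byte total, sorted by hit count (field 2), highest first
summary=$(printf '%s\n' "$log" |
  awk '{hits[$1]++; bytes[$1] += $7}
       END {for (ip in hits) print ip, hits[ip], bytes[ip]}' |
  sort -k2,2nr)

printf '%s\n' "$errors" "--" "$summary"
# 10.0.0.2 /b 404
# 10.0.0.1 /c 500
# --
# 10.0.0.1 2 750
# 10.0.0.2 1 100
```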
In pipes
df -h | awk 'NR > 1 {print $5, $6}'
# 27% /
# 12% /System/Volumes/Data
# ...
ps aux | awk '$3 > 5' # processes using > 5% CPU
ls -la | awk '{print $9, $5}' # filename and size
awk chains naturally with other commands.
awk vs cut vs sed
# Print 3rd column (whitespace-separated)
awk '{print $3}' file
cut -d' ' -f3 file # cut requires a SINGLE delimiter character
# CSV
awk -F, '{print $3}' file
cut -d, -f3 file # works fine for clean CSV
# Sub
sed 's/old/new/g'
awk '{gsub(/old/, "new"); print}' # awk equivalent
cut is faster for simple "extract column N" with a single-char delimiter. awk handles whitespace-runs (any number of spaces/tabs) and complex logic.
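The whitespace-run difference shows up as soon as columns are aligned with extra spaces. A short sketch on a made-up row:

```shell
# Hypothetical row with three spaces after the name
row='Alice   Engineer 95000'

# awk treats any run of spaces or tabs as one separator
awk_col=$(printf '%s\n' "$row" | awk '{print $2}')

# cut treats every single space as a delimiter, so field 2 is empty here
cut_col=$(printf '%s\n' "$row" | cut -d' ' -f2)

printf 'awk: [%s]\ncut: [%s]\n' "$awk_col" "$cut_col"
# awk: [Engineer]
# cut: []
```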
Common stumbles
Single quotes inside the awk program. awk '{print "It's"}' breaks shell quoting, because the ' ends the shell string. Use \047 for ' inside awk strings, or use double quotes outside: awk "{print \"It's\"}" (uglier, and any $ must then be escaped from the shell).
Field separator confusion. -F sets INPUT separator. Output separator is OFS (default space). Set with -v OFS=,:
awk -F, -v OFS='|' '{print $1, $3}' file
# replaces commas with pipes in output
$1 is per-line, not per-call. Each record (line) is split into fields. $1 is the first field of the current line.
Zero-indexed? No: $1 is the first field and $0 is the whole line. Bash arrays are 0-indexed; awk fields are 1-indexed, like Zsh arrays by default.
Comparison with strings. $3 > "80000" compares as strings, so "9000" counts as greater than "80000". $3 > 80000 compares numerically when the field looks like a number, because awk coerces it. $3 + 0 > 80000 forces a numeric comparison regardless.
gsub for regex replace. awk '{gsub(/pattern/, "replacement"); print}' is awk's sed s///g equivalent.
Semicolons inside actions. awk '{x++; print x}' — semicolons separate statements. Newlines also work.
No semicolons between patterns. awk '/a/{print 1} /b/{print 2}' — no separator needed between rules. Spaces are fine.
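Two of the stumbles above are quick to check at the prompt. A minimal sketch, assuming nothing beyond a POSIX awk: the \047 escape for a single quote, and numeric versus string comparison on the same value:

```shell
# \047 is the octal escape for a single quote inside an awk string
quoted=$(awk 'BEGIN {print "It\047s"}')

# Numeric comparison: 9000 is not greater than 80000
numeric=$(printf '9000\n' | awk '{print ($1 > 80000 ? "bigger" : "smaller")}')

# String comparison: "9000" sorts after "80000" because "9" comes after "8"
stringy=$(printf '9000\n' | awk '{print ($1 > "80000" ? "bigger" : "smaller")}')

printf '%s\n' "$quoted" "$numeric" "$stringy"
# It's
# smaller
# bigger
```

The parentheses around the ternary keep awk from reading > as output redirection inside print.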
What's next
Lesson 20: error handling. Exit codes, set -euo pipefail, trap.
Recap
awk '{action}' file — runs action per line. $0 whole line, $1...$NF fields, NF count, NR line number. -F, sets field separator. BEGIN { ... } and END { ... } for header/footer/aggregation. Auto-initializing counters (count[$1]++) and associative arrays. printf for formatted output. awk shines for column extraction and aggregation; sed for substitution; grep for searching.