
Text Processing with awk in Zsh — Complete Guide for Beginners

Sandy Lane


Zsh Lesson 19: awk — Text Processing for Beginners

awk '{print $1}' prints the first field. -F sets the field separator (-F, for commas). $0 is the whole line, $1...$NF are the fields, NR is the line number, NF is the field count. Programs are built from BEGIN { ... } { ... } END { ... } blocks. The right tool for "do something with column N."

awk is a text processor. Each line is split into fields (default: whitespace-separated); you write small programs that act on each line.

Print whole line

cat employees.txt
# Alice Engineer 95000
# Bob Manager 110000
# Charlie Designer 75000

awk '{print $0}' employees.txt
# Alice Engineer 95000
# Bob Manager 110000
# Charlie Designer 75000

$0 is the whole line. awk '{print $0}' file is essentially cat.

Print specific fields

awk '{print $1, $3}' employees.txt
# Alice 95000
# Bob 110000
# Charlie 75000

$1, $2, ... $N are the fields. print joins its comma-separated arguments with the output field separator (OFS, default: a single space). Without the comma (print $1 $3), the two values are concatenated with nothing between them.
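To see the comma and the no-comma form side by side, here is a small sketch. The sample line is inlined instead of read from employees.txt, and the variable names with_comma and no_comma exist only for this illustration.

```shell
# One sample record, same shape as a line of employees.txt.
line='Alice Engineer 95000'

# Comma: the fields are joined with the output field separator (a space by default).
with_comma=$(printf '%s\n' "$line" | awk '{print $1, $3}')

# No comma: the two fields are concatenated with nothing between them.
no_comma=$(printf '%s\n' "$line" | awk '{print $1 $3}')

printf '%s\n' "$with_comma"   # Alice 95000
printf '%s\n' "$no_comma"     # Alice95000
```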

Reorder columns

awk '{print $2, $1}' employees.txt
# Engineer Alice
# Manager Bob
# Designer Charlie

Fields can be in any order in print.

Custom separator: -F

echo "alice:1001:/home/alice" | awk -F: '{print $1, $3}'
# alice /home/alice

-F: sets the input field separator. For CSV: -F,. For tab: -F'\t'.

For /etc/passwd-style:

awk -F: '{print $1, $7}' /etc/passwd
# users and their default shells
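The tab case is worth trying once, because it is where whitespace splitting goes wrong. This sketch uses two made-up tab-separated records (the tsv and roles variables are illustrative, not part of the lesson files):

```shell
# Two hypothetical records: name<TAB>role<TAB>salary.
tsv=$(printf 'Alice\tSenior Engineer\t95000\nBob\tManager\t110000\n')

# -F'\t' splits on tabs only, so "Senior Engineer" stays one field.
roles=$(printf '%s\n' "$tsv" | awk -F'\t' '{print $2}')

printf '%s\n' "$roles"
# Senior Engineer
# Manager
```

With the default separator, $2 would be just "Senior" on the first line, since the space inside the role would also split the field.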

NF: number of fields

awk '{print NF, $0}' employees.txt
# 3 Alice Engineer 95000
# 3 Bob Manager 110000
# 3 Charlie Designer 75000

NF is the number of fields on the current line. $NF is the last field.
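$NF is most useful when lines have different lengths. A quick sketch on made-up input (the lines and last variables are only for this example):

```shell
# Lines with different field counts (hypothetical data).
lines='a b c
d e
f'

# NF is the field count; $NF is whatever field happens to be last.
last=$(printf '%s\n' "$lines" | awk '{print NF, $NF}')

printf '%s\n' "$last"
# 3 c
# 2 e
# 1 f
```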

NR: line number

awk '{print NR": "$0}' employees.txt
# 1: Alice Engineer 95000
# 2: Bob Manager 110000
# 3: Charlie Designer 75000

awk 'NR > 1' file.csv     # skip header line

NR (record number) is incremented on each line.
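A short sketch of both uses of NR, on a made-up three-line CSV (the csv, body, and count variables are assumptions for the example, not the lesson's file.csv):

```shell
# Hypothetical CSV with a header row.
csv='item,qty
Mouse,40
Keyboard,25'

# NR > 1 drops the first record; the default action prints the rest.
body=$(printf '%s\n' "$csv" | awk 'NR > 1')

# In END, NR still holds the number of the last record read.
count=$(printf '%s\n' "$csv" | awk 'END {print NR}')

printf '%s\n' "$body"     # Mouse,40 then Keyboard,25
printf '%s\n' "$count"    # 3
```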

Patterns: act on matching lines

awk '/Engineer|Designer/ {print $1}' employees.txt
# Alice
# Charlie

Without an action, the default action is {print}:

awk '/Engineer/' employees.txt
# Alice Engineer 95000

Same as grep "Engineer" — but with awk's field-aware power available when needed.

Conditions

awk '$3 > 80000 {print $1, $3}' employees.txt
# Alice 95000
# Bob 110000

$3 > 80000 is the condition; {print ...} is the action. Combine with && and ||:

awk -F, 'NR > 1 && $2 > 20 {print $1, $2}' sales.csv
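Here is how && and || behave on the three employees from earlier. The data is inlined so the snippet runs on its own; the variable names either and both are just for illustration.

```shell
employees='Alice Engineer 95000
Bob Manager 110000
Charlie Designer 75000'

# ||: salary above 100000 OR the role is Designer.
either=$(printf '%s\n' "$employees" | awk '$3 > 100000 || $2 == "Designer" {print $1}')

# &&: salary above 80000 AND the role is not Manager.
both=$(printf '%s\n' "$employees" | awk '$3 > 80000 && $2 != "Manager" {print $1}')

printf '%s\n' "$either"   # Bob then Charlie
printf '%s\n' "$both"     # Alice
```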

BEGIN and END blocks

awk 'BEGIN {print "=== Report ==="} {print $1} END {print "=== Done ==="}' employees.txt
# === Report ===
# Alice
# Bob
# Charlie
# === Done ===

BEGIN { ... } runs once before any input. END { ... } runs once after all input. Use for headers, footers, summaries.

Aggregating: sum a column

awk -F, 'NR > 1 {sum += $2} END {print "Total qty:", sum}' sales.csv
# Total qty: 120

This is one of the most useful awk patterns: accumulate during processing, print in END. Variables auto-initialize to 0 (numeric) or "" (string), so sum needs no setup.
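The sketch below runs the same sum on made-up data (not the lesson's sales.csv, so the total differs), and shows one consequence of auto-initialization: if no row is ever added, sum prints as an empty string unless you force it to a number with sum + 0.

```shell
# Hypothetical sales data with a header row.
sales='item,qty
Mouse,40
Keyboard,25
Monitor,10'

total=$(printf '%s\n' "$sales" | awk -F, 'NR > 1 {sum += $2} END {print "Total qty:", sum}')
printf '%s\n' "$total"    # Total qty: 75

# Header only: sum is never assigned, so sum + 0 makes it print as 0.
empty=$(printf 'item,qty\n' | awk -F, 'NR > 1 {sum += $2} END {print "Total qty:", sum + 0}')
printf '%s\n' "$empty"    # Total qty: 0
```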

Average:

awk -F, 'NR > 1 {sum += $2; n++} END {if (n) print sum / n}' sales.csv    # if (n) avoids dividing by zero on an empty file

Counting occurrences

awk '{count[$1]++} END {for (key in count) print key, count[key]}' access.log

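awk has built-in associative arrays. count[$1]++ increments the counter keyed by the first field. After processing, iterate over the keys in END; for (key in count) visits them in no guaranteed order, so sort the output when order matters.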

# Top 10 IPs by request count
awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' access.log \
  | sort -rn | head
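A self-contained run of the same pipeline, on made-up log lines so the counts are easy to check by hand. The addresses and the log and top variables are hypothetical.

```shell
# Six hypothetical requests: 10.0.0.1 three times, 10.0.0.2 twice, 10.0.0.3 once.
log='10.0.0.1 GET /
10.0.0.2 GET /about
10.0.0.1 GET /contact
10.0.0.3 GET /
10.0.0.1 GET /
10.0.0.2 GET /'

# Count per IP in awk, then let sort order the result and head keep the top two.
top=$(printf '%s\n' "$log" \
  | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' \
  | sort -rn | head -n 2)

printf '%s\n' "$top"
# 3 10.0.0.1
# 2 10.0.0.2
```

Printing the count first is what lets sort -rn rank the lines numerically.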

printf for formatting

awk '{printf "%-10s $%d\n", $1, $3}' employees.txt
# Alice      $95000
# Bob        $110000
# Charlie    $75000

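printf works like C's printf: %s is a string, %d an integer, %f a float, and %-10s a string left-aligned in a 10-character column (padded on the right). Unlike print, printf adds no newline, so include \n yourself.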
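To make width and precision visible, this sketch prints a monthly figure between | markers. The division by 12 and the formatted variable are only there for the illustration.

```shell
row='Alice Engineer 95000'

# %-10s left-aligns the name in 10 columns; %8.2f right-aligns with 2 decimals.
formatted=$(printf '%s\n' "$row" | awk '{printf "%-10s|%8.2f|\n", $1, $3 / 12}')

printf '%s\n' "$formatted"   # Alice     | 7916.67|
```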

If/else and ternary

awk -F, 'NR > 1 {print $1, ($2 >= 30 ? "HIGH" : "LOW")}' sales.csv
# Mouse HIGH
# Keyboard LOW

Ternary cond ? a : b works inside actions. Or use if/else:

awk '{
  if ($3 >= 90000) print $1, "senior"
  else if ($3 >= 70000) print $1, "mid"
  else print $1, "junior"
}' employees.txt

Multi-line awk programs

For long awk scripts, use a file:

# script.awk
BEGIN {
  print "Processing..."
}

NR == 1 { next }    # skip header

$3 > 80000 {
  high++
  total += $3
}

END {
  print "High earners:", high
  print "Total HE salary:", total
}

Run:

awk -F, -f script.awk employees.csv    # -F, because the input is comma-separated

A real-world example: log analysis

# access.log format: IP user date method url status size

awk '$6 >= 400 {print $1, $5, $6}' access.log    # 4xx/5xx requests: IP, url, status

awk '{
  hits[$1]++
  bytes[$1] += $7
} END {
  for (ip in hits) printf "%-15s %5d hits %10d bytes\n", ip, hits[ip], bytes[ip]
}' access.log | sort -k2 -rn | head

Aggregate by IP, sort by hit count.
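Counting fields against the format comment is the part that is easy to get wrong: with seven fields in that order, url is $5, status is $6, and size is $7. A sketch on three made-up lines in that layout (all values are hypothetical):

```shell
# Format: IP user date method url status size
log='10.0.0.1 alice 2024-01-01 GET /index.html 200 512
10.0.0.2 bob 2024-01-01 GET /missing 404 128
10.0.0.1 alice 2024-01-01 POST /login 500 64'

# Error responses: status is field 6.
errors=$(printf '%s\n' "$log" | awk '$6 >= 400 {print $1, $5, $6}')

# Bytes per IP: size is field 7. sort makes the order predictable.
bytes=$(printf '%s\n' "$log" \
  | awk '{b[$1] += $7} END {for (ip in b) print ip, b[ip]}' | sort)

printf '%s\n' "$errors"
# 10.0.0.2 /missing 404
# 10.0.0.1 /login 500
printf '%s\n' "$bytes"
# 10.0.0.1 576
# 10.0.0.2 128
```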

In pipes

df -h | awk 'NR > 1 {print $5, $NF}'    # $NF: the mount point is the last column on Linux and macOS
# 27% /
# 12% /System/Volumes/Data
# ...

ps aux | awk '$3 > 5'           # processes using > 5% CPU

ls -la | awk 'NR > 1 {print $9, $5}'    # filename and size (skips the "total" line)

awk chains naturally with other commands.

awk vs cut vs sed

# Print 3rd column (whitespace-separated)
awk '{print $3}' file
cut -d' ' -f3 file        # cut requires a SINGLE delimiter character

# CSV
awk -F, '{print $3}' file
cut -d, -f3 file           # works fine for clean CSV

# Sub
sed 's/old/new/g'
awk '{gsub(/old/, "new"); print}'    # awk equivalent

cut is faster for simple "extract column N" with a single-char delimiter. awk handles whitespace-runs (any number of spaces/tabs) and complex logic.
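The whitespace-run difference is easy to demonstrate. In this sketch the columns are padded with three spaces, as aligned output often is; the variable names are illustrative.

```shell
line='Alice   Engineer   95000'   # three spaces between columns

# awk treats any run of spaces or tabs as one separator.
with_awk=$(printf '%s\n' "$line" | awk '{print $3}')

# cut treats every single space as a separator, so field 3 is an empty
# string that sits between the second and third space.
with_cut=$(printf '%s\n' "$line" | cut -d' ' -f3)

printf '[%s]\n' "$with_awk"   # [95000]
printf '[%s]\n' "$with_cut"   # []
```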

Common stumbles

Single-quote inside the awk program. awk '{print "It's"}' breaks shell quoting. Use \047 for ' inside awk strings, or use double quotes outside: awk "{print \"It's\"}" (uglier).
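Two ways to get the apostrophe through, as a minimal sketch. The second form, which closes the shell quote, adds an escaped quote, and reopens, is a common shell idiom rather than anything awk-specific.

```shell
# Octal escape \047 stands for a single quote inside an awk string.
octal=$(awk 'BEGIN {print "It\047s awk"}')

# Shell-level alternative: close the quote, add an escaped ', reopen.
spliced=$(awk 'BEGIN {print "It'\''s awk"}')

printf '%s\n' "$octal"     # It's awk
printf '%s\n' "$spliced"   # It's awk
```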

Field separator confusion. -F sets INPUT separator. Output separator is OFS (default space). Set with -v OFS=,:

awk -F, -v OFS='|' '{print $1, $3}' file
# prints fields 1 and 3 joined by | instead of a space
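OFS only applies where awk builds output itself: between the arguments of print, or when it rebuilds $0 after a field is assigned. The sketch below shows both on one inline record; $1 = $1 is a common idiom for forcing that rebuild and is an addition here, not something the lesson covers.

```shell
# Selected fields, joined with OFS.
picked=$(printf 'a,b,c\n' | awk -F, -v OFS='|' '{print $1, $3}')

# Assigning to any field makes awk rebuild $0 using OFS,
# which converts every separator on the line.
rebuilt=$(printf 'a,b,c\n' | awk -F, -v OFS='|' '{$1 = $1; print}')

printf '%s\n' "$picked"    # a|c
printf '%s\n' "$rebuilt"   # a|b|c
```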

$1 is per-line, not per-call. Each record (line) is split into fields. $1 is the first field of the current line.

Zero-indexed? No. $1 is the first field and $0 is the whole line. Bash arrays are 0-indexed; awk fields are 1-indexed, like zsh arrays by default.

Comparison with strings. $3 > "80000" compares as strings, because the right-hand side is a string constant. $3 > 80000 compares numerically when the field looks like a number, and $3 + 0 > 80000 forces a numeric comparison either way.
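A value that makes the difference visible is 9000: as a string it sorts after "80000" because "9" comes after "8", while as a number it is smaller. A sketch on one made-up line:

```shell
# String comparison: "9000" > "80000" is true, so the line matches.
as_string=$(printf 'x y 9000\n' | awk '$3 > "80000" {print "match"}')

# Numeric comparison: 9000 > 80000 is false, so nothing is printed.
as_number=$(printf 'x y 9000\n' | awk '$3 + 0 > 80000 {print "match"}')

printf '[%s]\n' "$as_string"   # [match]
printf '[%s]\n' "$as_number"   # []
```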

gsub for regex replace. awk '{gsub(/pattern/, "replacement"); print}' is awk's sed s///g equivalent.

Semicolons inside actions. awk '{x++; print x}' — semicolons separate statements. Newlines also work.

No semicolons between patterns. awk '/a/{print 1} /b/{print 2}' — no separator needed between rules. Spaces are fine.
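Every rule is tested against every line, so a line that matches two patterns triggers both actions. A small sketch with made-up input (the messages and the out variable are illustrative):

```shell
# Three input lines: "a", "b", and "ab".
out=$(printf 'a\nb\nab\n' | awk '/a/{print "has a"} /b/{print "has b"}')

printf '%s\n' "$out"
# has a
# has b
# has a
# has b
```

The last two lines both come from "ab", which matches /a/ and /b/.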

What's next

Lesson 20: error handling. Exit codes, set -euo pipefail, trap.

Recap

awk '{action}' file runs the action on every line. $0 is the whole line, $1...$NF are the fields, NF is the field count, NR is the line number. -F sets the input field separator (-F, for CSV). BEGIN { ... } and END { ... } handle headers, footers, and aggregation. Variables and associative-array counters (count[$1]++) initialize themselves. printf gives formatted output. awk shines for column extraction and aggregation, sed for substitution, grep for searching.

Next lesson: error handling.
