Part of Learn Lua with NeoVim

Lua with Neovim: String Patterns - %a %d %s %w, Quantifiers, match & gmatch | Episode 25

Sandy Lane

Video: Lua with Neovim: String Patterns - %a %d %s %w, Quantifiers, match & gmatch | Episode 25, taught by Celeste AI - AI Coding Coach


Lua String Patterns: %a, %d, %s, %w, and Quantifiers

Lua's regex-lite: %d digits, %a letters, %s whitespace, %w word chars. Quantifiers +, *, ?. string.match for one match; string.gmatch for iteration. Captures with parentheses.

Lua patterns are like simplified regular expressions — enough power for most parsing without the Cthulhu-summoning complexity of full regex. They're a separate language from Lua itself, used by string.find, string.match, string.gmatch, string.gsub.

Character classes

Patterns use %X (percent + letter) for character classes:

Pattern   Matches
%a        letters (a-z, A-Z)
%d        digits (0-9)
%s        whitespace (space, tab, newline)
%w        word chars (letters + digits; unlike regex \w, no underscore)
%p        punctuation
%l        lowercase letters
%u        uppercase letters
%c        control characters
%x        hex digits

The uppercase variant negates: %D is "not a digit," %S is "not whitespace," etc.

% itself is the escape character. To match a literal %, use %%.
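As a small illustration of negated classes and the %% escape, here is a sketch; the sample strings and variable names are made up for the example.

```lua
-- %S+ is "one or more non-whitespace characters": a rough tokenizer.
local tokens = {}
for token in string.gmatch("load  50% of\tdata", "%S+") do
  tokens[#tokens + 1] = token
end
print(table.concat(tokens, "|"))  -- load|50%|of|data

-- %% in a pattern matches a literal percent sign.
local percent = string.match("load 50% of data", "(%d+)%%")
print(percent)  -- 50
```

Runs of spaces and the tab are all skipped by %S+, so the tokens come out clean without any trimming.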

Extracting numbers

local text = "I have 3 cats and 12 dogs"
for num in string.gmatch(text, "%d+") do
  print("Found: " .. num)
end

Output:

Found: 3
Found: 12

string.gmatch(s, pattern) returns an iterator over each match. The pattern %d+ is "one or more digits."

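The + is a quantifier: "one or more of the preceding item." In Lua a quantifier applies to a single character or character class (such as %d), never to a parenthesized group. Other quantifiers: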

  • * — zero or more.
  • + — one or more.
  • - — zero or more, non-greedy.
  • ? — zero or one (optional).
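The four quantifiers listed above can be compared side by side. This is a minimal sketch with invented sample strings; note that the minus sign is escaped as %- so it is read as a literal character rather than as the non-greedy quantifier.

```lua
-- ? makes the preceding character optional.
print(string.match("colour", "colou?r"))  -- colour
print(string.match("color", "colou?r"))   -- color

-- * accepts zero occurrences, + requires at least one.
print(string.match("abc", "%d*"))  -- prints an empty line (zero digits matched at position 1)
print(string.match("abc", "%d+"))  -- nil

-- An optional literal minus sign followed by digits.
local numbers = {}
for n in string.gmatch("-5 10 -200", "%-?%d+") do
  numbers[#numbers + 1] = n
end
print(table.concat(numbers, ","))  -- -5,10,-200
```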

Matching words

local sentence = "Hello World from Lua"
for word in string.gmatch(sentence, "%a+") do
  print("Word: " .. word)
end

Output:

Word: Hello
Word: World
Word: from
Word: Lua

%a+ is "one or more letters." Whitespace and punctuation aren't matched, so they break the words apart.
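To see where %a+ breaks words apart, the sketch below runs the same text through %a+ and %w+. The collect helper is a hypothetical name introduced only for this comparison.

```lua
-- Hypothetical helper: gather every match of a pattern into one string.
local function collect(text, pattern)
  local found = {}
  for piece in string.gmatch(text, pattern) do
    found[#found + 1] = piece
  end
  return table.concat(found, " | ")
end

-- The apostrophe, comma, and digit all end a run of letters.
print(collect("don't stop, Lua5 rocks", "%a+"))  -- don | t | stop | Lua | rocks
-- %w also accepts digits, so "Lua5" stays in one piece.
print(collect("don't stop, Lua5 rocks", "%w+"))  -- don | t | stop | Lua5 | rocks
```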

Captures with parentheses

local date = "2025-03-15"
local year, month, day = string.match(date, "(%d+)-(%d+)-(%d+)")
print("Year: " .. year)    -- "2025"
print("Month: " .. month)  -- "03"
print("Day: " .. day)      -- "15"

Parentheses ( ... ) capture the matched portion. string.match returns one value per capture group.

If there are no captures, string.match returns the entire match.
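Another way to see captures at work is parsing a key/value line. This is a sketch on an invented "timeout = 30" string; the variable names are illustrative.

```lua
-- Two captures give two return values.
local key, value = string.match("timeout = 30", "(%w+)%s*=%s*(%d+)")
print(key, value)  -- timeout	30

-- Without captures, the whole match is returned.
local whole = string.match("timeout = 30", "%d+")
print(whole)  -- 30

-- Captures are always strings; convert before doing arithmetic.
print(tonumber(value) + 1)  -- 31
```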

Validating an email (loosely)

local emails = { "alice@example.com", "bad-email", "bob@test.org", "@missing.com" }

for i = 1, #emails do
  local email = emails[i]
  local match = string.match(email, "%w+@%w+%.%w+")
  if match then
    print("Valid: " .. email)
  else
    print("Invalid: " .. email)
  end
end

Pattern: %w+@%w+%.%w+ — word chars, @, word chars, . (escaped as %.), word chars.

Note %. not . — in patterns, . matches any character. Escape with % to match a literal dot.

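This is a simplified email pattern; real email validation is much harder. Because the pattern is not anchored, it accepts any string that contains a matching substring, and %w+ does not cover dots or hyphens inside a name. Treat it as a quick filter for obviously malformed input, not as a validator.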
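One possible tightening is to wrap the same pattern in the ^ and $ anchors (covered in the Anchors section), so the whole string has to match. The sketch below compares the two; the sample strings are invented, and the anchored version is still only a loose check that rejects valid addresses such as those with dots in the name.

```lua
local loose = "%w+@%w+%.%w+"       -- the pattern from the lesson
local anchored = "^%w+@%w+%.%w+$"  -- same pattern, but must span the whole string

local samples = { "alice@example.com", "see alice@example.com now", "@missing.com" }
for _, s in ipairs(samples) do
  print(s, string.match(s, loose) ~= nil, string.match(s, anchored) ~= nil)
end
-- alice@example.com           true   true
-- see alice@example.com now   true   false
-- @missing.com                false  false
```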

Custom character sets

for item in string.gmatch("apple,banana,cherry", "([^,]+)") do
  print("Item: " .. item)
end

Output:

Item: apple
Item: banana
Item: cherry

[^,]+ is "one or more characters that are not a comma." Square brackets define a set; ^ at the start negates. The capture parentheses extract just the matched substring.

This is the standard "split on delimiter" pattern in Lua. Because + requires at least one character, empty fields are skipped: "a,,b" yields only a and b.
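When empty fields matter, one common variation is to append a trailing separator and capture lazily up to each separator. The split function below is a hypothetical helper, not part of Lua's standard library, and it assumes a non-empty separator.

```lua
-- Hypothetical helper: split s on sep, keeping empty fields.
local function split(s, sep)
  -- Escape punctuation so the separator is matched literally.
  local escaped = (sep:gsub("%p", "%%%0"))
  local fields = {}
  -- With one separator appended, every field is followed by a separator.
  for field in string.gmatch(s .. sep, "(.-)" .. escaped) do
    fields[#fields + 1] = field
  end
  return fields
end

local parts = split("apple,,cherry", ",")
print(#parts)                     -- 3
print(table.concat(parts, "|"))   -- apple||cherry
```

The extra parentheses around sep:gsub(...) drop the substitution count that gsub returns as a second value.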

Anchors

  • ^ at the start of a pattern: match the start of the string.
  • $ at the end: match the end of the string.

print(string.match("hello world", "^hello"))   -- "hello"
print(string.match("hello world", "world$"))   -- "world"
print(string.match("hello world", "^world"))   -- nil

For "starts with" or "ends with" checks, use anchors.
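A sketch of such checks follows. The helper names (escape_pattern, starts_with, ends_with) are assumptions made for this example; the escaping step matters because a prefix or suffix like ".lua" contains characters that are special in patterns.

```lua
-- Hypothetical helper: put % in front of punctuation so text matches literally.
local function escape_pattern(text)
  return (text:gsub("%p", "%%%0"))
end

local function starts_with(s, prefix)
  return string.find(s, "^" .. escape_pattern(prefix)) ~= nil
end

local function ends_with(s, suffix)
  return string.find(s, escape_pattern(suffix) .. "$") ~= nil
end

print(starts_with("init.lua", "init"))  -- true
print(ends_with("init.lua", ".lua"))    -- true
print(ends_with("initxlua", ".lua"))    -- false (the dot is matched literally)
```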

string.gsub: substitute

local s = "hello world"
print(string.gsub(s, "o", "0"))   -- "hell0 w0rld"   2 (count of subs)

string.gsub(s, pattern, repl) replaces every match. Returns the new string and the number of substitutions.

The replacement can also be a function:

local s = "i have 3 cats and 12 dogs"
print(string.gsub(s, "%d+", function(n)
  return tonumber(n) * 2
end))
-- "i have 6 cats and 24 dogs"   2 (count of subs)

The function receives each match (or the captures, if the pattern has any) and returns the replacement. If it returns nil or false, the original text is kept.
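Captures also feed into gsub. The sketch below shows %1 and %2 in a replacement string and a replacement function that receives two captures; the sample strings are invented for illustration.

```lua
-- %1 and %2 in the replacement refer to the first and second capture.
local swapped = string.gsub("Lovelace, Ada", "(%a+), (%a+)", "%2 %1")
print(swapped)  -- Ada Lovelace

-- With captures in the pattern, the function receives them as arguments.
local doubled, count = string.gsub("3 cats, 12 dogs", "(%d+) (%a+)", function(n, animal)
  return (tonumber(n) * 2) .. " " .. animal
end)
print(doubled)  -- 6 cats, 24 dogs
print(count)    -- 2
```

Assigning the result to a single local, as with swapped, discards the substitution count.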

Greedy vs non-greedy

local s = "<b>bold</b> text"
print(string.match(s, "<.+>"))   -- "<b>bold</b>"  (greedy: matches as much as possible)
print(string.match(s, "<.->"))   -- "<b>"           (non-greedy: matches as little as possible)

+ and * are greedy. - is the non-greedy zero-or-more (Lua doesn't have a non-greedy +).

For HTML/XML scraping, non-greedy is usually what you want. (Though for serious HTML, use a real parser.)
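As a rough sketch of the non-greedy approach on a small invented snippet (not a substitute for a real parser), the lazy quantifier can be combined with gmatch and a capture:

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- Each <...> is matched separately because .- stops at the first ">".
local tags = {}
for tag in string.gmatch(html, "<(.-)>") do
  tags[#tags + 1] = tag
end
print(table.concat(tags, " "))  -- b /b i /i

-- Capture the text between one opening tag and its closing tag.
print(string.match(html, "<b>(.-)</b>"))  -- bold
```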

Pattern caveats

Lua patterns are not full regex:

  • No alternation (no | for "this OR that"). To match "cat" or "dog," you'd run two string.find calls or use string.gsub with a function.
  • Only basic backreferences: %1 through %9 inside a pattern match a repeat of an earlier capture, and %1 also works in replacement strings for gsub.
  • No lookahead/lookbehind.

For 90% of parsing, that's fine. For complex regex, use the lrexlib library or call out to a different language.
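The missing alternation can be worked around by trying several patterns in turn. The match_any function below is a hypothetical helper written for this sketch; it returns the result for the first pattern in the list that matches, which is not necessarily the leftmost match in the string.

```lua
-- Hypothetical helper: a stand-in for regex alternation (cat|dog).
local function match_any(s, patterns)
  for _, pattern in ipairs(patterns) do
    local found = string.match(s, pattern)
    if found then
      return found
    end
  end
  return nil
end

print(match_any("I own a dog", { "cat", "dog" }))   -- dog
print(match_any("I own a fish", { "cat", "dog" }))  -- nil
```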

Common stumbles

Forgetting %. for literal dot. . matches any character. To match a literal ., escape it.

Treating patterns as regex. Many regex features (|, \b, \d) don't exist in that form; character classes are written %d, not \d.

Greedy matches gobbling too much. <.+> over <b>bold</b> matches everything. Use <.-> for the smallest match.

string.match returns nil on no match. Always check before using the result: local x = string.match(...) ; if x then ... end.

Using string.find for extraction. string.find returns positions; use string.match for the matched text.
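The last two stumbles can be seen together in a short sketch. The sample strings are invented; the fourth argument to string.find (plain) turns pattern matching off, which is another way to search for a literal dot.

```lua
local line = "error at line 42"

-- string.find returns the start and end positions of the match.
local start_pos, end_pos = string.find(line, "%d+")
print(start_pos, end_pos)  -- 15	16

-- string.match returns the matched text itself.
local number = string.match(line, "%d+")
print(number)  -- 42

-- Always check for nil before using the result.
local missing = string.match("no digits here", "%d+")
if missing then
  print("Found: " .. missing)
else
  print("no number found")
end

-- plain = true searches for the text literally, so "." is just a dot.
local dot_pos = string.find("a.b", ".", 1, true)
print(dot_pos)  -- 2
```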

What's next

Episode 26: file I/O. Read and write files with io.open, read, write, lines. The line-based read pattern that's the bread and butter of text processing.

Recap

Lua patterns: %d digits, %a letters, %s whitespace, %w word chars, uppercase to negate. Quantifiers + * ? -. Captures via parentheses. %. for literal dot. [^X]+ for "not X." ^ start, $ end. string.match for one match (with captures), string.gmatch iterates all, string.gsub substitutes. Patterns aren't full regex — no alternation, no lookahead.

Next episode: file I/O.
