Lua with Neowin: String Patterns — %a %d %s %w, Quantifiers, match & gmatch | Episode 25
Video: Lua with Neowin: String Patterns — %a %d %s %w, Quantifiers, match & gmatch | Episode 25 by Taught by Celeste AI - AI Coding Coach
Lua String Patterns: %a, %d, %s, %w, and Quantifiers
Lua's regex-lite: %d digits, %a letters, %s whitespace, %w word chars. Quantifiers +, *, ?. string.match for one match; string.gmatch for iteration. Captures with parentheses.
Lua patterns are like simplified regular expressions — enough power for most parsing without the Cthulhu-summoning complexity of full regex. They're a separate language from Lua itself, used by string.find, string.match, string.gmatch, string.gsub.
Character classes
Patterns use %X (percent + letter) for character classes:
| Pattern | Matches |
|---|---|
| %a | letters (a-z, A-Z) |
| %d | digits (0-9) |
| %s | whitespace (space, tab, newline) |
| %w | word chars (letters + digits) |
| %p | punctuation |
| %l | lowercase letters |
| %u | uppercase letters |
| %c | control characters |
| %x | hex digits |
The uppercase variant negates: %D is "not a digit," %S is "not whitespace," etc.
% itself is the escape character. To match a literal %, use %%.
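A quick sketch of the negated classes and the %% escape in action. The sample string is made up for illustration:

```lua
-- Negated classes and the literal percent sign.
local s = "Order 66: 50% off!"

-- %D+ matches runs of non-digits; replacing them with "" leaves only digits.
-- gsub returns two values; assigning to one local keeps just the string.
local digits_only = s:gsub("%D+", "")
print(digits_only)            --> 6650

-- %S+ matches runs of non-whitespace, i.e. whitespace-separated tokens.
local tokens = {}
for token in s:gmatch("%S+") do
  tokens[#tokens + 1] = token
end
print(#tokens)                --> 4

-- %% matches a literal percent sign; here it must follow one or more digits.
print(s:match("%d+%%"))       --> 50%
```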
Extracting numbers
local text = "I have 3 cats and 12 dogs"
for num in string.gmatch(text, "%d+") do
  print("Found: " .. num)
end
Output:
Found: 3
Found: 12
string.gmatch(s, pattern) returns an iterator over each match. The pattern %d+ is "one or more digits."
The + is a quantifier: "one or more of the previous pattern." Other quantifiers:
- * : zero or more.
- + : one or more.
- - : zero or more, non-greedy.
- ? : zero or one (optional).
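To see the four quantifiers side by side, here is a small sketch that applies each one to the same made-up string:

```lua
-- One subject string, four quantifiers applied to %d after the letter "x".
local s = "x123"

print(s:match("x%d+"))   --> x123  (+ : one or more, as many as possible)
print(s:match("x%d*"))   --> x123  (* : zero or more, as many as possible)
print(s:match("x%d-"))   --> x     (- : zero or more, as few as possible)
print(s:match("x%d?"))   --> x1    (? : zero or one)

-- ? makes an item optional. Here: an optional minus sign before the digits.
-- The minus is written %- because a bare - is itself a quantifier.
local numbers = {}
for n in ("7 -12 40"):gmatch("%-?%d+") do
  numbers[#numbers + 1] = n
end
print(table.concat(numbers, " "))   --> 7 -12 40
```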
Matching words
local sentence = "Hello World from Lua"
for word in string.gmatch(sentence, "%a+") do
  print("Word: " .. word)
end
Output:
Word: Hello
Word: World
Word: from
Word: Lua
%a+ is "one or more letters." Whitespace and punctuation aren't matched, so they break the words apart.
Captures with parentheses
local date = "2025-03-15"
local year, month, day = string.match(date, "(%d+)-(%d+)-(%d+)")
print("Year: " .. year) -- "2025"
print("Month: " .. month) -- "03"
print("Day: " .. day) -- "15"
Parentheses ( ... ) capture the matched portion. string.match returns one value per capture group.
If there are no captures, string.match returns the entire match.
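A short sketch of both cases, plus two details worth knowing: captures always come back as strings, and a failed match returns nil. The "id=42;" input is invented for the example:

```lua
-- With no captures, string.match returns the whole match.
print(string.match("id=42;", "%d+"))          --> 42

-- With captures, it returns one value per capture, always as strings.
local key, value = string.match("id=42;", "(%a+)=(%d+)")
print(type(value))                            --> string

-- Convert explicitly when a number is needed.
local n = tonumber(value)
print(n + 1)                                  --> 43

-- On failure, string.match returns nil, so both variables end up nil.
local k2, v2 = string.match("no pairs here", "(%a+)=(%d+)")
print(k2 == nil and v2 == nil)                --> true
```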
Validating an email (loosely)
local emails = { "alice@example.com", "bad-email", "bob@test.org", "@missing.com" }
for i = 1, #emails do
  local email = emails[i]
  local match = string.match(email, "%w+@%w+%.%w+")
  if match then
    print("Valid: " .. email)
  else
    print("Invalid: " .. email)
  end
end
Pattern: %w+@%w+%.%w+ — word chars, @, word chars, . (escaped as %.), word chars.
Note %. not . — in patterns, . matches any character. Escape with % to match a literal dot.
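This is a simplified email pattern; real email validation is much harder. Because the pattern is unanchored, it only checks that an address-like substring appears somewhere in the input, so it rejects the obviously malformed cases above but still accepts strings with junk around the address.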
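One way to tighten the check is to anchor the pattern at both ends and widen the character sets a little. This is still a loose sketch, not real email validation; looks_like_email is a hypothetical helper name and the allowed characters are an assumption:

```lua
-- Hypothetical helper: a loose, anchored email check.
local function looks_like_email(s)
  -- ^ ... $ force the whole string to match, not just a substring.
  -- [%w._%-]+ : local part (letters, digits, dot, underscore, hyphen)
  -- [%w.%-]+  : domain labels
  -- %.%a+     : a literal dot followed by a letters-only final label
  return s:match("^[%w._%-]+@[%w.%-]+%.%a+$") ~= nil
end

print(looks_like_email("alice@example.com"))            --> true
print(looks_like_email("first.last@mail.example.org"))  --> true
print(looks_like_email("bad-email"))                    --> false
print(looks_like_email("@missing.com"))                 --> false
print(looks_like_email("see alice@example.com now"))    --> false
```

Inside square brackets a dot is literal, so it needs no escape there; the hyphen is escaped as %- so it isn't read as a range.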
Custom character sets
for item in string.gmatch("apple,banana,cherry", "([^,]+)") do
  print("Item: " .. item)
end
Output:
Item: apple
Item: banana
Item: cherry
[^,]+ is "one or more characters that are not a comma." Square brackets define a set; ^ at the start negates. The capture parentheses extract just the matched substring.
This is the standard "split on delimiter" pattern in Lua.
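One thing to watch: [^,]+ needs at least one character, so empty fields are silently skipped. If they matter, a variant like the sketch below keeps them. The split helper is hypothetical and assumes sep is a single character with no special meaning in patterns (such as a comma):

```lua
-- "[^,]+" skips the empty field between the two commas.
local count = 0
for _ in ("a,,c"):gmatch("[^,]+") do
  count = count + 1
end
print(count)                          --> 2

-- Hypothetical helper that keeps empty fields.
-- Assumes sep is one non-magic character, e.g. ",".
local function split(s, sep)
  local fields = {}
  -- Append a trailing separator so every field, including the last, ends with one.
  -- "(.-)" lazily captures everything up to the next separator, possibly nothing.
  for field in (s .. sep):gmatch("(.-)" .. sep) do
    fields[#fields + 1] = field
  end
  return fields
end

local fields = split("a,,c", ",")
print(#fields)                        --> 3
print(fields[2] == "")                --> true
```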
Anchors
- ^ at the start of a pattern: match the start of the string.
- $ at the end: match the end of the string.
print(string.match("hello world", "^hello")) -- "hello"
print(string.match("hello world", "world$")) -- "world"
print(string.match("hello world", "^world")) -- nil
For "starts with" or "ends with" checks, use anchors.
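A sketch of "starts with" and "ends with" helpers built on anchors. The function names are hypothetical. Because the prefix or suffix is meant as literal text, the sketch first escapes punctuation with %, which is safe since % followed by any non-alphanumeric character stands for that character:

```lua
-- Hypothetical helper: escape punctuation so text is matched literally.
local function escape_pattern(text)
  -- %0 in the replacement is the whole match; %% is a literal percent sign.
  return (text:gsub("%p", "%%%0"))
end

local function starts_with(s, prefix)
  return s:find("^" .. escape_pattern(prefix)) ~= nil
end

local function ends_with(s, suffix)
  return s:find(escape_pattern(suffix) .. "$") ~= nil
end

print(starts_with("3.14 is pi", "3.1"))     --> true
print(starts_with("3x14 is not", "3.1"))    --> false (the dot is literal here)
print(ends_with("total: 100%", "100%"))     --> true
```

Without the escaping step, "3.1" would also match "3x1", because an unescaped dot matches any character.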
string.gsub: substitute
local s = "hello world"
print(string.gsub(s, "o", "0")) -- "hell0 w0rld" 2 (count of subs)
string.gsub(s, pattern, repl) replaces every match. Returns the new string and the number of substitutions.
The replacement can also be a function:
local s = "i have 3 cats and 12 dogs"
print(string.gsub(s, "%d+", function(n)
  return tonumber(n) * 2
end))
-- "i have 6 cats and 24 dogs" 2 (count of subs)
The function receives each match and returns the replacement.
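Besides a plain string or a function, the replacement can refer to captures with %1, %2, and so on, or it can be a table that is looked up by the first capture. A small sketch, with made-up template variables:

```lua
-- Replacement string: %1 and %2 refer to the captures.
local swapped = string.gsub("hello world", "(%a+) (%a+)", "%2 %1")
print(swapped)    --> world hello

-- Replacement table: the first capture is used as the key.
-- If the key is missing, the original match is left in place.
local vars = { name = "Lua", version = "5.4" }
local filled = string.gsub("$name $version $missing", "%$(%a+)", vars)
print(filled)     --> Lua 5.4 $missing
```

The $ is escaped as %$ because an unescaped $ at the end of a pattern is an anchor; escaping it everywhere keeps the intent obvious.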
Greedy vs non-greedy
local s = "<b>bold</b> text"
print(string.match(s, "<.+>")) -- "<b>bold</b>" (greedy: matches as much as possible)
print(string.match(s, "<.->")) -- "<b>" (non-greedy: matches as little as possible)
+ and * are greedy. - is the non-greedy zero-or-more (Lua doesn't have a non-greedy +).
For HTML/XML scraping, non-greedy is usually what you want. (Though for serious HTML, use a real parser.)
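For instance, a sketch that lists every tag in a made-up snippet with gmatch and the non-greedy quantifier, then pulls out the text between one pair of tags:

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- "<(.-)>" captures the shortest run of characters between < and >.
local tags = {}
for tag in html:gmatch("<(.-)>") do
  tags[#tags + 1] = tag
end
print(table.concat(tags, " "))        --> b /b i /i

-- The same idea extracts the text between a specific pair of tags.
print(html:match("<b>(.-)</b>"))      --> bold

-- A negated set is a common alternative that does not rely on laziness.
print(html:match("<([^>]+)>"))        --> b
```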
Pattern caveats
Lua patterns are not full regex:
- No alternation (no | for "this OR that"). To match "cat" or "dog," you'd run two string.find calls or use string.gsub with a function.
- Only basic backreferences: %1 through %9 match an earlier capture inside a pattern (and work in replacement strings for gsub), but there are no named groups.
- No lookahead/lookbehind.
For most everyday parsing, that's fine. For complex regex, use the lrexlib library or call out to a different language.
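The missing alternation can be worked around by trying each pattern in turn. A minimal sketch, where find_any is a hypothetical helper that returns the earliest match among several patterns:

```lua
-- Hypothetical helper: emulate "cat|dog" by trying each pattern
-- and keeping whichever match starts earliest in the string.
local function find_any(s, patterns)
  local best_start, best_text
  for _, p in ipairs(patterns) do
    local i, j = s:find(p)
    if i and (not best_start or i < best_start) then
      best_start, best_text = i, s:sub(i, j)
    end
  end
  return best_text, best_start
end

local text, pos = find_any("my dog chased a cat", { "cat", "dog" })
print(text, pos)                                  --> dog  4
print(find_any("just a fish", { "cat", "dog" }))  --> nil  nil
```

This scans the string once per pattern, which is fine for a handful of alternatives; for many alternatives a dedicated regex or parsing library is probably the better fit.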
Common stumbles
Forgetting %. for literal dot. . matches any character. To match a literal ., escape it.
Treating patterns as regex. Many regex features (|, \b) don't exist, and character classes use % rather than a backslash: %d, not \d.
Greedy matches gobbling too much. <.+> over <b>bold</b> matches everything. Use <.-> for the smallest match.
string.match returns nil on no match. Always check before using the result: local x = string.match(...) ; if x then ... end.
Using string.find for extraction. string.find returns positions; use string.match for the matched text.
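The last two stumbles in a short sketch: string.find gives positions, string.match gives text, and a nil check guards the no-match case. The "port=8080" input is made up:

```lua
local s = "port=8080"

-- string.find returns the start and end positions of the match.
local i, j = string.find(s, "%d+")
print(i, j)                       --> 6  9
print(s:sub(i, j))                --> 8080

-- string.match returns the matched text (or the captures) directly.
local port = string.match(s, "port=(%d+)")
if port then
  print(tonumber(port) + 1)       --> 8081
end

-- No match: the result is nil, so check before using it.
local missing = string.match(s, "host=(%w+)")
print(missing == nil)             --> true

-- string.find with a fourth argument of true searches for plain text,
-- so "." here is a literal dot rather than "any character".
print(string.find("a.b", ".", 1, true))   --> 2  2
```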
What's next
Episode 26: file I/O. Read and write files with io.open, read, write, lines. The line-based read pattern that's the bread and butter of text processing.
Recap
Lua patterns: %d digits, %a letters, %s whitespace, %w word chars, uppercase to negate. Quantifiers + * ? -. Captures via parentheses. %. for literal dot. [^X]+ for "not X." ^ start, $ end. string.match for one match (with captures), string.gmatch iterates all, string.gsub substitutes. Patterns aren't full regex — no alternation, no lookahead.
Next episode: file I/O.