
C in 100 Seconds: String Tokenizing with strtok

Celest Kim

Video: C in 100 Seconds: String Tokenizing (strtok) | Episode 33, taught by Celeste AI - AI Coding Coach


C strtok: String Tokenizing

strtok(str, delim) returns the first token; subsequent calls with NULL return the next. Modifies the input string. Not thread-safe — use strtok_r for that.

strtok splits a string into tokens by replacing delimiters with \0. It's the standard way to parse simple delimited input — but it has surprising side effects.

The basic shape

#include <stdio.h>
#include <string.h>

int main() {
  char sentence[] = "The quick brown fox";
  char *word = strtok(sentence, " ");
  while (word != NULL) {
    printf("%s\n", word);
    word = strtok(NULL, " ");
  }

  return 0;
}

Output:

The
quick
brown
fox

How it works

strtok is stateful — it remembers the last position internally between calls.

char *strtok(char *str, const char *delim);
  • First call: str is the string to tokenize. Returns the first token.
  • Subsequent calls: pass NULL as str. Returns the next token.
  • End: returns NULL when no more tokens.

Each call:

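  1. Skip leading delimiters. If nothing is left, return NULL.
  2. Find the next delimiter character (or the end of the string).
  3. If a delimiter was found, replace it with \0.
  4. Save the position for the next call.
  5. Return a pointer to the start of the token.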

strtok modifies the input

char str[] = "a,b,c";
strtok(str, ",");
// str is now "a\0b,c" — the first comma replaced with \0

The function inserts \0 characters into the original string. After tokenizing, the original is destroyed — what's left is a series of separate strings sharing storage.

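For string literals: strtok("a,b,c", ",") is undefined behavior, because strtok writes into its argument and string literals must not be modified. They are often stored in read-only memory, so this typically crashes.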

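If you need the original, copy first. strdup is declared in <string.h> (POSIX, and standard C since C23); free comes from <stdlib.h>:

char *copy = strdup(original);
if (copy != NULL) {
  strtok(copy, ",");
  // ... use tokens ...
  free(copy);
}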

Multiple delimiters

char data[] = "red;green,blue:yellow";
char *color = strtok(data, ";,:");
while (color != NULL) {
  printf("%s\n", color);
  color = strtok(NULL, ";,:");
}

Output:

red
green
blue
yellow

The delimiter argument is a character set. Any character in the set works as a separator. ";,:" means "; or , or :".

Useful for parsing input with multiple separators (like whitespace + punctuation).

CSV parsing

char csv[] = "Alice,30,Engineer";
char *field = strtok(csv, ",");
while (field != NULL) {
  printf("Field: %s\n", field);
  field = strtok(NULL, ",");
}

This works for simple CSV with no quoting and no escaped commas. Real CSV (RFC 4180) allows quoted fields that contain commas, such as "Alice, Smith",30,"Engineer", and strtok would split inside the quotes. For real CSV, use a library or write a hand-coded parser.

strtok is not thread-safe

// Thread 1
strtok(str1, ",");
strtok(NULL, ",");

// Thread 2 (concurrent)
strtok(str2, ",");
// Data race: both threads share strtok's internal state

strtok keeps its state in hidden static storage, typically one variable shared by the whole program. Two threads tokenizing different strings can overwrite each other's position.

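The thread-safe version is strtok_r (POSIX). C11's strtok_s is an optional Annex K function with a different, four-argument signature, and many C libraries do not provide it. With strtok_r: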

char *saveptr;
char *word = strtok_r(str, " ", &saveptr);
while (word != NULL) {
  printf("%s\n", word);
  word = strtok_r(NULL, " ", &saveptr);
}

strtok_r takes an extra char ** to store state — no global. Each thread gets its own saveptr.

For new code, prefer strtok_r.

Empty fields

char data[] = "a,,b";
char *p = strtok(data, ",");   // "a"
p = strtok(NULL, ",");          // "b" (skipped the empty middle!)

strtok skips empty tokens — multiple consecutive delimiters count as one. Useful for whitespace handling, surprising for CSV (where "a,,b" should give three fields including the empty middle).

strtok cannot preserve empty fields. Write a manual loop or use strsep (a BSD function also provided by glibc; it is not part of ISO C):

char data[] = "a,,b";
char *p = data;
char *token;
while ((token = strsep(&p, ",")) != NULL) {
  printf("[%s]\n", token);
}
// Output, one per line: [a] [] [b]

strsep preserves empty fields but is less portable than strtok.

Common mistakes

Interleaving two tokenizations. strtok has a single internal saved position. If you start tokenizing string A and then start on string B before A is finished, A's position is lost.

strtok(strA, ",");
char *aTok = strtok(NULL, ",");   // OK — continues strA

strtok(strB, ",");                 // resets to strB
strtok(NULL, ",");                 // continues strB; strA's progress is lost!

For two interleaved tokenizations, use strtok_r.

Modifying a string literal. strtok("a,b", ",") writes into a literal, which is undefined behavior.

Forgetting strtok modifies the input. After tokenizing, your original string has \0s in it. Copy first if you need the original intact.

Skipped empty fields surprise. "a,,b" → "a", "b" (no empty middle). Use strsep if you need empties.

Not checking for NULL. Even the first call returns NULL if the input is empty or contains only delimiters. Later calls return NULL once the tokens run out.

What's next

Episode 34: sprintf and snprintf. Format into a string buffer instead of stdout. Critical for building log messages, paths, structured output.

Recap

strtok(str, delim) returns the first token; subsequent strtok(NULL, delim) calls continue. Stateful: it uses an internal saved position. Modifies the input: it replaces delimiters with \0. Skips empty tokens (multiple delimiters = one). Not thread-safe: use strtok_r (POSIX); C11's strtok_s is optional (Annex K) and often unavailable. For empty-preserving splits, use strsep or a manual loop. For real CSV with quotes, use a library.

