Draft
Deterministic Finite Automata
Recognize simple string patterns with a finite set of states and one deterministic transition per input symbol.
Hook problem
Imagine a signup form that accepts a deliberately tiny email-like pattern: one or more local characters, one @, one or more domain characters, one ., then one or more suffix characters.
For this teaching model, letters and digits are char, @ is at, . is dot, and everything else is other. Dots are banned in the local part. This is not real RFC email validation; it is a small language for learning finite memory.
Accepted toy pattern
Rejected: dot too early
First naive idea
The first scan might use flags: seenAt, seenDot, hasLocal, hasDomain, and hasSuffix. That can be made correct, and it is a fair starting point.
The pain is that correctness becomes hidden inside combinations. hasLocal && seenAt && !hasDomain means “the next useful thing must be a domain character.” hasLocal && seenAt && hasDomain && seenDot && hasSuffix means “accepting if the input ends now.” The flags are not wrong; their meanings are just unnamed.
| Boolean combination | Named state | Meaning made explicit |
|---|---|---|
hasLocal && !seenAt | in-local | local part exists; @ is now allowed |
hasLocal && seenAt && !hasDomain | need-domain | the next useful symbol must be a domain char |
hasLocal && seenAt && hasDomain && !seenDot | in-domain | domain chars are flowing; one dot is now allowed |
hasLocal && seenAt && hasDomain && seenDot && !hasSuffix | need-suffix | the dot appeared; suffix must start |
hasLocal && seenAt && hasDomain && seenDot && hasSuffix | in-suffix | accepting if the input ends now |
Where it breaks
Small mistakes expose the hidden cases. ana@.ai has a local part and an @, but it reads dot before any domain character. ana@@cs.ai reads a second @. ana@cs. reaches the dot but never gets a suffix character.
A deterministic finite automaton makes each fragile case explicit: after ana@, the machine is in need-domain; on dot, the only next state is dead.
| Prefix | Read | Symbol | State | Meaning |
|---|---|---|---|---|
| ε | - | - | need-local | Nothing useful has been read yet; the next symbol must be a letter or digit. |
| a | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| an | n | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana@ | @ | at | need-domain | The @ has appeared; the next symbol must be the first domain character. |
| ana@. | . | dot | dead | A fatal pattern mistake has happened; every later symbol keeps the machine rejected. |
| ana@.a | a | char | dead | A fatal pattern mistake has happened; every later symbol keeps the machine rejected. |
| ana@.ai | i | char | dead | A fatal pattern mistake has happened; every later symbol keeps the machine rejected. |
The core invention
A deterministic finite automaton, or DFA, replaces scattered flags with one current state. A state is a named summary of the prefix read so far: what facts matter, and what must still happen next.
For the toy validator, need-local means “no local character yet.” in-domain means “we have a local part, exactly one @, and at least one domain character, so the dot is now allowed.” dead means a fatal mistake has happened and this input cannot be repaired.
Nothing useful has been read yet; the next symbol must be a letter or digit.
At least one local character has appeared; more local characters or one @ can follow.
The @ has appeared; the next symbol must be the first domain character.
At least one domain character has appeared; more domain characters or one dot can follow.
The domain dot has appeared; the next symbol must be the first suffix character.
At least one suffix character has appeared; this is accepting only if the input ends here.
A fatal pattern mistake has happened; every later symbol keeps the machine rejected.
Visual model
The whole machine is a table: every state has exactly one next state for every input symbol category. That “exactly one” is the deterministic part.
For example, , while . The transition table is total: no case is left to an implicit default.
| State | char | at | dot | other |
|---|---|---|---|---|
| need-local | in-local | dead | dead | dead |
| in-local | in-local | need-domain | dead | dead |
| need-domain | in-domain | dead | dead | dead |
| in-domain | in-domain | dead | need-suffix | dead |
| need-suffix | in-suffix | dead | dead | dead |
| in-suffix | in-suffix | dead | dead | dead |
| dead | dead | dead | dead | dead |
Interactive visual demo
Try stepping through ana@cs.ai, ana@.ai, and ana@cs.ai.. Watch the text status as well as the colors: being in in-suffix is only enough when the input is exhausted.
Pending: acceptance is checked after all input is consumed.
- Current state
- need-local (need local char)
- Consumed prefix
- ε
- Remaining input
- ana@cs.ai
- Next symbol
- a -> char
Start before reading input. The machine still needs the first local character.
| Current | Prefix | Read | Symbol | State | Status |
|---|---|---|---|---|---|
| now | ε | - | - | need-local | Nothing useful has been read yet; the next symbol must be a letter or digit. |
| a | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. | |
| an | n | char | in-local | At least one local character has appeared; more local characters or one @ can follow. | |
| ana | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. | |
| ana@ | @ | at | need-domain | The @ has appeared; the next symbol must be the first domain character. | |
| ana@c | c | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. | |
| ana@cs | s | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. | |
| ana@cs. | . | dot | need-suffix | The domain dot has appeared; the next symbol must be the first suffix character. | |
| ana@cs.a | a | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. | |
| ana@cs.ai | i | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. |
| State | char | at | dot | other |
|---|---|---|---|---|
| need-local | in-local | dead | dead | dead |
| in-local | in-local | need-domain | dead | dead |
| need-domain | in-domain | dead | dead | dead |
| in-domain | in-domain | dead | need-suffix | dead |
| need-suffix | in-suffix | dead | dead | dead |
| in-suffix | in-suffix | dead | dead | dead |
| dead | dead | dead | dead | dead |
Formal version
After the trace intuition is in place, the mathematical object is compact:
Here Q is the finite set of states, Σ is the finite input alphabet, δ is the transition function, q0 is the start state, and F is the set of accepting states.
In this node, Q is the seven named states, Σ = { char, at, dot, other }, q0 = need-local, and F = { in-suffix }. The function δ is exactly the transition table above.
need-local, in-local, need-domain, in-domain, need-suffix, in-suffix, dead
char, at, dot, other
the table above
need-local
{ in-suffix }
Implementation sketch
The implementation mirrors the visible trace: classify one character, look up one table cell, replace the current state, and check acceptance only after the loop.
type DfaSymbol = "char" | "at" | "dot" | "other";
type DfaState =
| "need-local"
| "in-local"
| "need-domain"
| "in-domain"
| "need-suffix"
| "in-suffix"
| "dead";
function classify(input: string): DfaSymbol {
if (/^[A-Za-z0-9]$/.test(input)) return "char";
if (input === "@") return "at";
if (input === ".") return "dot";
return "other";
}
function accepts(input: string) {
let state: DfaState = "need-local";
for (const char of input) {
const symbol = classify(char);
state = transitionTable[state][symbol];
}
return state === "in-suffix";
}
classify one input character
look up transition[state][symbol]
replace the current state
accept only if final state is in F
Prefix invariant
After each prefix, the current state summarizes everything the rest of the computation needs to know. It does not remember the literal string ana@cs; it remembers the relevant phase: local part is done, @ is done, domain has started, dot has not happened yet.
That is the core invariant: for any two prefixes that land in the same state, every possible remaining suffix will lead to the same accept/reject answer.
| Prefix | Read | Symbol | State | Meaning |
|---|---|---|---|---|
| ε | - | - | need-local | Nothing useful has been read yet; the next symbol must be a letter or digit. |
| a | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| an | n | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana@ | @ | at | need-domain | The @ has appeared; the next symbol must be the first domain character. |
| ana@c | c | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. |
| ana@cs | s | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. |
| ana@cs. | . | dot | need-suffix | The domain dot has appeared; the next symbol must be the first suffix character. |
| ana@cs.a | a | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. |
| ana@cs.ai | i | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. |
Complexity
For input length n, the DFA performs one transition per character, so the running time is . The transition table has entries; here that is 7 * 4. The working memory is beyond the input because the machine stores only the current state.
An input of length n is read left to right, one symbol at a time.
Each symbol triggers exactly one transition lookup, so time is O(n).
The finite table has |Q||Σ| cells: seven states times four symbol categories.
The runner stores only the current state beyond the input, so working memory is O(1).
Common confusions
Acceptance is final-state based. During ana@cs.ai., the prefix ana@cs.ai reaches in-suffix, which is accepting at that moment. But the input still has one more .. The machine must read it, and from in-suffix a dot moves to dead, so the whole string is rejected.
After the prefix ana@cs.ai the state is in-suffix, but the machine still must read the final dot. The last transition decides the whole string.
| Prefix | Read | Symbol | State | Meaning |
|---|---|---|---|---|
| ε | - | - | need-local | Nothing useful has been read yet; the next symbol must be a letter or digit. |
| a | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| an | n | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana | a | char | in-local | At least one local character has appeared; more local characters or one @ can follow. |
| ana@ | @ | at | need-domain | The @ has appeared; the next symbol must be the first domain character. |
| ana@c | c | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. |
| ana@cs | s | char | in-domain | At least one domain character has appeared; more domain characters or one dot can follow. |
| ana@cs. | . | dot | need-suffix | The domain dot has appeared; the next symbol must be the first suffix character. |
| ana@cs.a | a | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. |
| ana@cs.ai | i | char | in-suffix | At least one suffix character has appeared; this is accepting only if the input ends here. |
| ana@cs.ai. | . | dot | dead | A fatal pattern mistake has happened; every later symbol keeps the machine rejected. |
The dead state is not a punishment; it is a memory shortcut. Once a fatal mistake has happened, no remaining input can make this toy pattern valid, so every symbol loops back to dead.
| State | char | at | dot | other |
|---|---|---|---|---|
| need-local | in-local | dead | dead | dead |
| in-local | in-local | need-domain | dead | dead |
| need-domain | in-domain | dead | dead | dead |
| in-domain | in-domain | dead | need-suffix | dead |
| need-suffix | in-suffix | dead | dead | dead |
| in-suffix | in-suffix | dead | dead | dead |
| dead | dead | dead | dead | dead |
Connections in the graph
Graph basics and DFA diagrams both use circles and arrows, but they model different things. A graph edge can be any relationship among objects. A DFA arrow is a labeled transition between memory states while reading input.
Future nodes can build from here toward designing DFAs, regular languages, NFAs, and pumping-lemma limits. This page stays bounded to deterministic finite memory and trace-by-input acceptance.
Exercises
- Predict the final state for
a@b.cbefore checking the trace. - Why does
ana.@cs.aienterdeadbefore it ever sees@? - Which table entry rejects
ana@cs!? - What would need to change if local-part dots were allowed?
Will the final state accept?
Reveal answer
accept
Which state appears right after the local-part dot?
Reveal answer
dead
Which table row rejects the exclamation mark?
Reveal answer
δ(in-domain, other) = dead
Graph connections : Deterministic Finite Automata