Regular language
A regular language is a formal language (i.e., a possibly infinite set of finite sequences of symbols from a finite alphabet) that satisfies the following equivalent properties:
- it can be accepted by a deterministic finite state machine
- it can be accepted by a nondeterministic finite state machine
- it can be accepted by an alternating finite automaton
- it can be described by a regular expression
- it can be generated by a regular grammar
- it can be generated by a prefix grammar
- it can be accepted by a read-only Turing machine
- it can be defined in monadic second-order logic
| Contents |
Regular languages over an alphabet
The collection of regular languages over an alphabet Σ is defined recursively as follows:
- the empty language Ø is a regular language.
- the empty string language { ε } is a regular language.
- For each a ∈ Σ, the singleton language { a } is a regular language.
- If A and B are regular languages, then A U B, A • B, and A* are regular languages.
- No other languages over Σ are regular.
All finite languages are regular. Other typical examples include the language consisting of all strings over the alphabet {a, b} which contain an even number of a's, or the language consisting of all strings of the form: several a's followed by several b's.
Closure properties
The results of the union, intersection and set-difference operations when applied to regular languages is itself a regular language; the complement of every regular language over its alphabet is a regular language as well. Reversing every string in a regular language yields another regular language. Concatenating two regular languages (in the sense of concatenating every string from the first language with every string from the second one) also yields a regular language. The shuffle operation, when applied to two regular languages, yields another regular language. The right quotient and the left quotient of a regular language by an arbitrary language is also regular.
Deciding whether a language is regular
To locate the regular languages in the Chomsky hierarchy, one notices that every regular language is context-free. The converse is not true: for example the language consisting of all strings having the same number of a's as b's is context-free but not regular. To prove that a language such as this is not regular, one uses the Myhill-Nerode theorem or the pumping lemma.
There are two purely algebraic approaches to defining regular languages. If Σ is a finite alphabet and Σ* denotes the free monoid over Σ consisting of all strings over Σ, f : Σ* → M is a monoid homomorphism where M is a finite monoid, and S is a subset of M, then the set f −1(S) is regular. Every regular language arises in this fashion.
If L is any subset of Σ*, one defines an equivalence relation ~ on Σ* as follows: u ~ v is defined to mean
- uw ∈ L if and only if vw ∈ L for all w ∈ Σ*
The language L is regular if and only if the number of equivalence classes of ~ is finite; if this is the case, this number is equal to the number of states of the minimal deterministic finite automaton accepting L.
External resource
- Department of Computer Science at the University of Western Ontario: Grail+, http://www.csd.uwo.ca/research/grail/. A software package to manipulate regular expressions, finite-state machines and finite languages. Free for non-commercial use.
| Automata theory: formal languages and formal grammars | |||
|---|---|---|---|
| Chomsky hierarchy | Grammars | Languages | Minimal automaton |
| Type-0 | Unrestricted | Recursively enumerable | Turing machine |
| Unrestricted | Recursive | Turing machine | |
| Type-1 | Context-sensitive | Context-sensitive | Linear-bounded |
| Type-2 | Context-free | Context-free | Pushdown |
| Type-3 | Regular | Regular | Finite |
| Each category of languages or grammars is a proper superset of the category directly beneath it. | |||
