Context Free Grammar

A context-free grammar is a 4-tuple

  • a finite set of variables
  • a finite set disjoint from of terminals
  • a set of rules, consisting of a variable and a string of variables and terminals
  • is the start variable

Given strings and a rule , yields , or derives or if or there are where

The language of the grammar is . A context-free language is a language that can be generated by a CFG.

Any regular language can be turned into a context free language. For arrow from state with character , add rule

A derivation is a left-most derivation if at every step, the leftmost terminal is expanded first. A grammar is ambiguous if multiple left-most derivations can generate the same string

Chomsky Normal Form

A CFG is in Chomsky normal form if every rule is of the following form, where is a terminal and are variables, and are not the start variable. Furthermore, is allowed

Every CFG can be written in Chomsky normal form

  1. Create a new start variable
  2. For all , remove this rule. Then for all , add (note that can contain ). Iterate
  3. For all , remove this rule. Whenever appears, add unless this was a previously removed unit rule.
  4. For all with , add rules , and . Replace terminals with and add

Pushdown Automata

A pushdown automata is a 6-tuple

  • , a finite set of states
  • , a finite input alphabet
  • , a finite stack alphabet
  • a transition function
  • the start state
  • the set of accept states

A pushdown automata accepts input where and there is a sequence of states and strings such that , , for , , where and for and , and .


CFLs are PDAs

Given a CFG , define a PDA . Given states and and symbols , . In order to transition from to by reading , popping , and reading the string , generate additional states and transitions such that

By this construction, writing means if is the state and is the next input symbol, then the PDA can read , pop then push onto the stack and go on to state .

Take , where is the set of additional states from the above construction. Let \textdollar is only supported in math mode\delta(q_{start},\varepsilon,\varepsilon)=\{(q_{loop},S,\text\textdollar)\}.

  1. If the stack has a variable,
  2. If the stack has a terminal,
  3. If the stack is the empty \textdollar is only supported in math mode\text\textdollar, \textdollar is only supported in math mode\delta(q_{loop},\varepsilon,\text\textdollar)=\{(q_{accept},\varepsilon)\}

PDAs are CFLs

Let PDA . Construct with and start variable

  1. For , , and , if contains and contains , add the rule
  2. For , add
  3. For , add the rule

Pumping Lemma for CFL

If is a CFL, then there is (pumping length) where if is a string in where , then can be divided into 5 pieces, where

  1. for each ,
  2. , and

Given a derivation , we see that , and we can repeat and as many or as few times as needed. Note that this only happens if the length of the string is such that the height of the parse tree must exceed the number of variables.