Usage:
cfg-parser <path_to_cfg_file>
CFG(Context Free Grammar) parser cli app. Parses a CFG into an IR. From the IR you can:
- Add Production
- Transform to Chomsky Normal Form Start.
- Transform to Chomsky Normal Form Term.
- Transform to Chomsky Normal Form BIN.
- Apply Chomsky Normal Form Unit Transform.
Many available CFG parser use a slightly altered syntax.
This parser stays true to the standard EBNF grammar of the CFG notation.
See tests/basic_expr.txt
and cand.cfg
for example CFG files.
I made this to validate the CFG of the C& programming language for the C& Compiler's parser.
There are many great online CFG parsers, but due to the fact that they are online - tend to bug out with large inputs. Specifically a full(complex) language's cfg will generate titanic LR tables that crash the website.
When designing a language at the start its easy to prototype a simple parser. Once you get to handling the operators including precedence,associativity and operation a proper CFG and EBNF grammar is reccomended for reference. Its not possible to wholly respresent a non context free language with, CFG but it allows you to break down the language into peices and reference them when implementing the parser without second guessing the outcome.
This was speed-coded in a 2 days as I was deep into the C& parser and studying automata theory at the time. Not sure if there are any edge cases that arent handled but it works.
EBNF notation for a context free grammar's grammar.
<syntax> ::= <rule_decl>*
<rule_decl> ::= <nonterminal> <opt_blank> <RIGHT_ARROW> <opt_blank> <or> <opt_blank> <FULL_STOP> <opt_blank> <opt_newline>
<alternative> ::= <op_set> <opt_blank> (<VERTICAL_LINE> <opt_blank> <alternative>)*
<alternative> ::= <LEFT_PARENTHESIS> <opt_blank> <alternative> <opt_blank> <RIGHT_PARENTHESIS>
<alternative> ::= <op_set>
<op_set> ::= ( <terminal> <opt_blank> | <nonterminal> <opt_blank> )*
<nonterminal> ::= <KW_E> | <IDENTIFIER>
<terminal> ::= <STRING_LITERAL>
<opt_blank> ::= <WHITESPACE>*
<opt_newline> ::= <NEWLINE>*
<KW_E> ::= "E"
<RIGHT_ARROW> ::= <HYPHEN_MINUS> <GREATER_THAN_SIGN>
Chomsky Normal Form : Start
Assert the special syntax rule 'S' exists.
Add new start symbol: S0 → S
Chomsky Normal Form : Term
Expand rules with non-solitary terminals:
RULE -> NON-TERM(0) NON-TERM(TERM-1) TERM NONTERM(TERM+1) NONTERM(N)
For each first occurence of TERM create new rule:
NS[N] -> TERM
Replace all TERMs with a matching NS[N] inside the current rule.
Chomsky Normal Form : BIN
Expand right hand sides with more than 2 non-terminals
@note after the first step all left-hand-sides should be only
non-terminals, or a single terminal, E or S
R -> [non-term](i) [non-term](i+1)... [non-term](i+N)
Create new rules for i = 0 to (n-2) :
@R[i] -> [[non-term][i]] @R[i+1]
@R[i+1] -> [[non-term][i]] @R[i+2]
...
@R[i+[n-2]] -> [[non-term][n-1]] @R[n]
Apply Chomsky Normal Form Unit Transform
Replace all unit rules with their right hand side definition:
R -> [non-term]
For each alternative rule of [non-term] add a new rule inlining the
alternative definition:
[[R]_[non-term]_[i]] -> [[op-set][i]] for [i + N] in [R...]
Example Output for basic_expr
# parse
[kiSyntax]
[kiRuleDecl]
[ktIdentifier, S]
[kiAlternative]
[kiOpSet]
[ktIdentifier, term]
[kiOpSet]
[ktIdentifier, S]
[ktIdentifier, op_add]
[ktIdentifier, term]
[kiOpSet]
[ktIdentifier, op_add]
[ktIdentifier, term]
[kiRuleDecl]
[ktIdentifier, term]
[kiAlternative]
[kiOpSet]
[ktIdentifier, factor]
[kiOpSet]
[ktIdentifier, term]
[ktIdentifier, op_mul]
[ktIdentifier, factor]
[kiRuleDecl]
[ktIdentifier, factor]
[kiAlternative]
[kiOpSet]
[ktIdentifier, operand]
[kiOpSet]
[ktIdentifier, factor]
[ktStringLiteral, "^"]
[ktIdentifier, operand]
[kiRuleDecl]
[ktIdentifier, operand]
[kiAlternative]
[kiOpSet]
[ktIdentifier, number]
[kiOpSet]
[ktIdentifier, variable]
[kiOpSet]
[ktStringLiteral, "("]
[ktIdentifier, S]
[ktStringLiteral, ")"]
[kiRuleDecl]
[ktIdentifier, op_add]
[kiAlternative]
[kiOpSet]
[ktStringLiteral, "+"]
[kiOpSet]
[ktStringLiteral, "-"]
[kiRuleDecl]
[ktIdentifier, op_mul]
[kiAlternative]
[kiOpSet]
[ktStringLiteral, "*"]
[kiOpSet]
[ktStringLiteral, "/"]
S -> term
S -> S op_add term
S -> op_add term
factor -> operand
factor -> factor "^" operand
op_add -> "+"
op_add -> "-"
op_mul -> "*"
op_mul -> "/"
operand -> number
operand -> variable
operand -> "(" S ")"
term -> factor
term -> term op_mul factor
# start
R -> S
S -> term
S -> S op_add term
S -> op_add term
factor -> operand
factor -> factor "^" operand
op_add -> "+"
op_add -> "-"
op_mul -> "*"
op_mul -> "/"
operand -> number
operand -> variable
operand -> "(" S ")"
term -> factor
term -> term op_mul factor
# term
NS0 -> "^"
NS1 -> "("
NS2 -> ")"
R -> S
S -> term
S -> S op_add term
S -> op_add term
factor -> operand
factor -> factor NS0 operand
op_add -> "+"
op_add -> "-"
op_mul -> "*"
op_mul -> "/"
operand -> number
operand -> variable
operand -> NS1 S NS2
term -> factor
term -> term op_mul factor
# bin
@S -> S op_add
@S_1_0 -> op_add term
@factor -> factor NS0
@factor_1_0 -> NS0 operand
@operand -> NS1 S
@operand_1_0 -> S NS2
@term -> term op_mul
@term_1_0 -> op_mul factor
NS0 -> "^"
NS1 -> "("
NS2 -> ")"
R -> S
S -> term
S -> op_add term
factor -> operand
op_add -> "+"
op_add -> "-"
op_mul -> "*"
op_mul -> "/"
operand -> number
operand -> variable
term -> factor
# unit
@R -> term
@R1 -> op_add term
@S -> S op_add
@S -> factor
@S_1_0 -> op_add term
@factor -> factor NS0
@factor -> number
@factor1 -> variable
@factor_1_0 -> NS0 operand
@operand -> NS1 S
@operand_1_0 -> S NS2
@term -> term op_mul
@term_1_0 -> op_mul factor
NS0 -> "^"
NS1 -> "("
NS2 -> ")"
R -> S
S -> op_add term
op_add -> "+"
op_add -> "-"
op_mul -> "*"
op_mul -> "/"