Grammar rules
The grammar rules specify the context-free grammar for the generated parser. In a context-free grammar, the left-hand side of each production consists of exactly one non terminal.
The syntax of a grammar rule is similar to yacc:
[non terminal] : [rule] ( | [rule] )* ;
where [rule] is a sequence of the following items:
- An identifier is a non terminal.
- A string is a token (terminal), referenced by its alias.
- An identifier enclosed in <> is a token, referenced by its name.
- An action. See Actions for a detailed explanation.
An action can appear both in the middle of a rule and at the end of it.
Each symbol in a rule has a semantic value. A semantic action can access these values through variables: use = to bind a symbol's semantic value to a variable, then use that variable in the action:
expr: a = expr '+' b = expr { $$ = a + b; }
Every non terminal has a value. By default, it is the value of the first symbol of the rule, or null if the rule is empty.
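As a rough illustration of the two points above, the TypeScript sketch below models an action with bound variables as a plain function, and the default value of a rule that has no end-of-rule action. The names `defaultRuleValue` and `addAction` are hypothetical and are not part of tscc; this is only a model of the described behaviour, not the generated parser's code.

```typescript
type SemanticValue = unknown;

// Hypothetical helper (not part of tscc): the value of a rule that has no
// end-of-rule action, given the semantic values of its symbols in order.
function defaultRuleValue(symbolValues: SemanticValue[]): SemanticValue {
    const [first] = symbolValues;
    // An empty rule yields null; otherwise the first symbol's value is used.
    return symbolValues.length === 0 ? null : first;
}

// The action { $$ = a + b; } of  expr: a = expr '+' b = expr  written as a
// function of the two bound variables.
function addAction(a: number, b: number): number {
    return a + b;
}

const sum = addAction(1, 2);
const firstValue = defaultRuleValue([42, "+", 7]);
const emptyValue = defaultRuleValue([]);
console.log(sum, firstValue, emptyValue);
```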
An action can also appear in the middle of a rule. In this case, only variables that appear before the action can be used in it.
Technically, there are only end-of-rule actions, which are executed when the parser reduces with the rule. To handle mid-rule actions, augmented rules are generated. For example,
A: 'a' { console.log("saw an 'a'"); } 'b';
will be converted to
A: 'a' @0 'b';
@0: { console.log("saw an 'a'"); };
After a token has been shifted, the parser performs reductions until there is more than one possible choice. So in this example, the block is executed right after the parser sees an 'a'. Consequently, if you need to switch lexical states after a certain token is encountered, make sure there is only one possible reduction after that token has been read.
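The conversion of mid-rule actions described above can be sketched in TypeScript as follows. The types `Item` and `GrammarRule` and the function `liftMidRuleActions` are assumptions made for this illustration; they do not reflect tscc-compiler's internal data structures.

```typescript
// Illustrative data model; these types are assumptions, not tscc internals.
type Item =
    | { kind: "symbol"; name: string }
    | { kind: "action"; code: string };

interface GrammarRule {
    lhs: string;
    items: Item[];
}

// Replace every action that is not the last item of its rule with a generated
// non terminal "@n" whose only rule consists of that action.
function liftMidRuleActions(rules: GrammarRule[]): GrammarRule[] {
    const converted: GrammarRule[] = [];
    const generated: GrammarRule[] = [];
    let counter = 0;
    for (const rule of rules) {
        const items = rule.items.map((item, index): Item => {
            const isLast = index === rule.items.length - 1;
            if (item.kind === "action" && !isLast) {
                const name = `@${counter++}`;
                generated.push({ lhs: name, items: [item] });
                return { kind: "symbol", name };
            }
            return item;
        });
        converted.push({ lhs: rule.lhs, items });
    }
    return converted.concat(generated);
}

// Render a rule back into grammar syntax.
function showRule(rule: GrammarRule): string {
    const body = rule.items
        .map((item) => (item.kind === "symbol" ? item.name : `{ ${item.code} }`))
        .join(" ");
    return `${rule.lhs}: ${body};`;
}

const renderedRules = liftMidRuleActions([
    {
        lhs: "A",
        items: [
            { kind: "symbol", name: "'a'" },
            { kind: "action", code: `console.log("saw an 'a'");` },
            { kind: "symbol", name: "'b'" },
        ],
    },
]).map(showRule);
console.log(renderedRules.join("\n"));
```

Because the generated rule `@0` is empty apart from its action, reducing with it consumes no input, which is why the action can run as soon as the preceding token has been shifted.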
When a shift/reduce or reduce/reduce conflict is detected, tscc-compiler first tries to resolve it using operator precedence and rule priority (see below). If that fails, it chooses to shift for a shift/reduce conflict, or to reduce with the rule that appears later for a reduce/reduce conflict, and then reports a warning.
Operator precedence is used to resolve shift/reduce conflicts, and provides a simpler way to define grammars. Use the following directives to define the associativity of tokens:
%left [token]+
%right [token]+
%nonassoc [token]+
where [token] can reference a real token (using a string or an identifier enclosed in <>), or be a plain identifier, which defines a pseudo token and sets its priority. A pseudo token can be used to define rule priority, see below.
These three directives also define priority. Tokens that appear in the same directive have the same priority, while tokens in directives that appear later have higher priority.
To understand associativity and priority, consider the operator +, together with a production a: a + a and an expression a + a + a + ... . Without associativity, there are many ways to reduce this sequence, so the parser cannot know which one to pick. If + is left or right associative, the parser repeatedly reduces the leftmost or rightmost a + a to a, respectively. If + is non associative, only one + is allowed in the expression at this level.
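To make the difference concrete, the following TypeScript sketch groups a flat list of operands the way a left or right associative operator would. It is only a model of the resulting grouping, not of the parser itself, and the function name `group` is made up for this example. Distinct operand names are used so that the grouping order is visible.

```typescript
// Group a flat operand list as a left- or right-associative operator would.
function group(operands: string[], op: string, assoc: "left" | "right"): string {
    // For right associativity, start from the rightmost operand instead.
    const ordered = assoc === "left" ? [...operands] : [...operands].reverse();
    const [first, ...rest] = ordered;
    if (first === undefined) {
        throw new Error("at least one operand is required");
    }
    return rest.reduce(
        (acc, x) => (assoc === "left" ? `(${acc} ${op} ${x})` : `(${x} ${op} ${acc})`),
        first,
    );
}

const leftGrouped = group(["a", "b", "c"], "+", "left");
const rightGrouped = group(["a", "b", "c"], "+", "right");
console.log(leftGrouped);  // ((a + b) + c)
console.log(rightGrouped); // (a + (b + c))
```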
If a token's priority is defined, rules that contain this token are also assigned a priority, taken from the priority of the last token that appears in the rule. When a shift/reduce conflict is detected, the parser shifts if the shift token's priority is greater than the rule's priority, and reduces if it is less. If they are equal, the choice depends on the associativity of the shift token: shift if it is right associative, reduce if it is left associative, and report a syntax error if it is non associative.
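The decision procedure in the paragraph above can be written as a small function. This is a minimal sketch that assumes both the shift token and the rule have a defined priority, with larger numbers meaning higher priority; the names `TokenPrecedence` and `resolveShiftReduce` are hypothetical. The fallback for conflicts without defined priorities is not modelled here.

```typescript
type Assoc = "left" | "right" | "nonassoc";
type Decision = "shift" | "reduce" | "error";

// Assumed representation: a larger number means a higher priority.
interface TokenPrecedence {
    priority: number;
    assoc: Assoc;
}

function resolveShiftReduce(shiftToken: TokenPrecedence, rulePriority: number): Decision {
    if (shiftToken.priority > rulePriority) return "shift";
    if (shiftToken.priority < rulePriority) return "reduce";
    // Equal priorities: the associativity of the shift token decides.
    if (shiftToken.assoc === "right") return "shift";
    if (shiftToken.assoc === "left") return "reduce";
    return "error"; // non associative
}

// %left '+' (priority 1) followed by %left '*' (priority 2)
const plus: TokenPrecedence = { priority: 1, assoc: "left" };
const times: TokenPrecedence = { priority: 2, assoc: "left" };

// After E '+' E with '*' as the next token: shift, so '*' binds tighter.
const afterPlusSeeTimes = resolveShiftReduce(times, plus.priority);
// After E '*' E with '+' as the next token: reduce the product first.
const afterTimesSeePlus = resolveShiftReduce(plus, times.priority);
// After E '+' E with another '+': equal priority and left associative, so reduce.
const afterPlusSeePlus = resolveShiftReduce(plus, plus.priority);
console.log(afterPlusSeeTimes, afterTimesSeePlus, afterPlusSeePlus);
```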
A rule's priority can also be defined manually. Use %prec [token] at the end of a rule to set the rule's priority equal to that of [token]. The most famous example of this is the grammar of the if-else statement:
%right 'else'
%%
else_statement:
%prec 'else'
| 'else' statement
;
Here 'else' is defined to be right associative, so the parser chooses to shift when it encounters an 'else' token.
You may also use pseudo tokens to define a rule's priority. For example, the unary expressions:
%left '+' '-'
%left '*' '/'
%left UNARY
%%
E:
E '+' E
| E '-' E
| E '*' E
| E '/' E
| '+' E %prec UNARY
| '-' E %prec UNARY
;
Unary + and - should have a higher priority than all the infix operators. By default, however, the two unary rules would take their priority from their last token, '+' or '-', which is lower than the priority of * and /. This is undesired, so we define the priority of the two unary rules manually, using a pseudo token UNARY whose priority is higher than that of * and /.
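The effect of %prec in this grammar can be checked with a short TypeScript sketch that computes rule priorities from the three directives. The representation is an assumption made for illustration: each rule lists only its tokens (non terminals are omitted), priorities are numbered from 1 in directive order, and `PrioritizedRule` and `rulePriority` are hypothetical names.

```typescript
// Priority levels from the directives above: later directives rank higher.
const levels: string[][] = [["'+'", "'-'"], ["'*'", "'/'"], ["UNARY"]];

const tokenPriority = new Map<string, number>();
levels.forEach((tokens, level) => {
    tokens.forEach((token) => tokenPriority.set(token, level + 1));
});

// Assumed representation: only the tokens of a rule, in order of appearance.
interface PrioritizedRule {
    tokens: string[];
    prec?: string; // token named by %prec, if any
}

function rulePriority(rule: PrioritizedRule): number | undefined {
    // A %prec token overrides the default.
    if (rule.prec !== undefined) return tokenPriority.get(rule.prec);
    // Default: the priority of the last token that appears in the rule.
    const last = rule.tokens[rule.tokens.length - 1];
    return last === undefined ? undefined : tokenPriority.get(last);
}

const product = rulePriority({ tokens: ["'*'"] });                          // E '*' E
const unaryDefault = rulePriority({ tokens: ["'-'"] });                     // '-' E
const unaryWithPrec = rulePriority({ tokens: ["'-'"], prec: "UNARY" });     // '-' E %prec UNARY
console.log(product, unaryDefault, unaryWithPrec);
```

Without %prec, the unary rule ranks below the product rule, so an input such as - a * b would be grouped as -(a * b); with %prec UNARY it ranks above and is reduced first.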