Proposed parsing refactor

**Describe the concerns**
There are a number of outstanding parsing/evaluation issues, such as #2631, that it would be convenient to work on in conjunction with #3374. In particular, #3374 involves implementing the `quotient` operator so that it can also be written as infix `//` à la Python. That operator needs a specialized TeX representation, and right now it is tricky to make the TeX handling uniform between `quotient(a,b)` and `a // b`. These items generally point toward unifying FunctionNode and OperatorNode, so I am planning to go ahead with that, assuming you are still on board, and I was thinking of calling the unified node a CallNode. (Using a third name makes it easiest to be sure that every use of each of the old node types has been handled in the unification.)
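To make the TeX point concrete, here is a minimal, hypothetical sketch (not mathjs code) of why a single node type helps. If both `quotient(a, b)` and `a // b` parse to the same `CallNode`, then one TeX template keyed by the function name serves both spellings. The `CallNode` and `SymbolNode` classes, the `style` field, the `TEX_TEMPLATES` map, and the floor-of-a-fraction rendering of `quotient` are all illustrative assumptions.

```javascript
// Hypothetical sketch, not the mathjs API: one node type for both the
// function-call spelling `quotient(a, b)` and the infix spelling `a // b`.
class SymbolNode {
  constructor(name) {
    this.name = name
  }

  toTex() {
    return this.name
  }
}

class CallNode {
  // fn: name of the function being called; args: child nodes;
  // style: 'call' or 'infix', recording only how the source was written
  constructor(fn, args, style = 'call') {
    this.fn = fn
    this.args = args
    this.style = style
  }

  toTex() {
    const argTex = this.args.map(arg => arg.toTex())
    const template = TEX_TEMPLATES.get(this.fn)
    if (template) return template(...argTex)
    // Fallback: generic function-call rendering
    return `\\mathrm{${this.fn}}\\left(${argTex.join(',')}\\right)`
  }
}

// Assumed TeX table keyed by function name: one entry serves both spellings.
const TEX_TEMPLATES = new Map([
  ['quotient', (a, b) => `\\left\\lfloor\\frac{${a}}{${b}}\\right\\rfloor`]
])

// Usage: both spellings yield the same TeX.
const a = new SymbolNode('a')
const b = new SymbolNode('b')
const fromCall = new CallNode('quotient', [a, b], 'call')   // quotient(a, b)
const fromInfix = new CallNode('quotient', [a, b], 'infix') // a // b
console.log(fromCall.toTex()) // \left\lfloor\frac{a}{b}\right\rfloor
console.log(fromInfix.toTex() === fromCall.toTex()) // true
```

The `style` field is kept only so that a printer could round-trip the original spelling; it plays no part in choosing the TeX.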

Anyhow, working on the unification of FunctionNode and OperatorNode led me to realize that there is currently redundancy between the big table in src/expression/operators.js and a number of small tables in src/expression/parse.js, such as the one in the (somewhat awkwardly named) function parseMultiplyDivideModulusPercentage. In addition, although there is a mechanism for adding custom nodes to the parser, it is not particularly well documented, and it is limited in scope: custom nodes can only be created by syntax that looks like a function call. For example, in our color computations we would like expressions like `#FF80C0` to be color constants. This doesn't really interfere with using `#` for comments, because a comment just needs to avoid having exactly 6 or 8 hex digits immediately after the `#`, and the notation reads very naturally. But right now there is no way to extend the parser to handle such things.
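One way such a color-constant extension could work, sketched under assumptions: if the tokenizer were driven by an ordered, user-editable list of rules, a client could insert a color rule ahead of the comment rule. The rule format, the `tokenize` function, and the token type names below are hypothetical and do not reflect the current mathjs tokenizer.

```javascript
// Hypothetical sketch of a rule-table tokenizer; not the current mathjs tokenizer.
// At each position the rules are tried in order and the first match wins.
const defaultRules = [
  { type: 'whitespace', re: /\s+/y, skip: true },
  { type: 'comment', re: /#[^\n]*/y, skip: true },
  { type: 'number', re: /\d+(?:\.\d+)?/y },
  { type: 'symbol', re: /[A-Za-z_]\w*/y },
  { type: 'delimiter', re: /[+\-*\/^(),]/y }
]

function tokenize(text, rules) {
  const tokens = []
  let pos = 0
  while (pos < text.length) {
    let match = null
    let rule = null
    for (const candidate of rules) {
      candidate.re.lastIndex = pos // sticky (/y) regexes match only at lastIndex
      match = candidate.re.exec(text)
      if (match) {
        rule = candidate
        break
      }
    }
    if (!match) throw new SyntaxError(`Unexpected character '${text[pos]}' at ${pos}`)
    if (!rule.skip) tokens.push({ type: rule.type, value: match[0] })
    pos += match[0].length
  }
  return tokens
}

// Extension: '#' followed by exactly 6 or 8 hex digits is a color constant.
// Placed before the comment rule, it wins; any other '#' still starts a comment.
const colorRule = {
  type: 'color',
  re: /#(?:[0-9A-Fa-f]{8}|[0-9A-Fa-f]{6})(?![0-9A-Za-z_])/y
}
const commentIndex = defaultRules.findIndex(r => r.type === 'comment')
const colorRules = [
  ...defaultRules.slice(0, commentIndex),
  colorRule,
  ...defaultRules.slice(commentIndex)
]

console.log(tokenize('mix(#FF80C0, c) # a comment', colorRules).map(t => t.value))
// [ 'mix', '(', '#FF80C0', ',', 'c', ')' ]
console.log(tokenize('mix(#FF80C0, c)', defaultRules).map(t => t.value))
// [ 'mix', '(' ]  (without the extension, the '#' starts a comment)
```

The negative lookahead is what keeps `#FF80C0A` (7 hex digits) or `#ABCDEFG` a comment rather than a color followed by stray characters.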

**Proposal**
* Obtain all of the parser-driving symbol/precedence information from a single unified source of truth, like the one in operators.js.
* Expose that big table in the mathjs configuration. Design question: should it be a big, complicated property, perhaps named `operators`, directly in the `config` object? Or should we have a separate exposed `parseConfig` object, parallel to `config`, that is concerned only with parsing? If so, should the `number` and `numberFallback` properties move to `parseConfig`, since they really only concern parsing expressions? Then `config` would deal with computation and `parseConfig` with parsing. Note that having the table of operators in the configuration would resolve #2722, for example: someone wanting to parse `&&` as `and` would just need to insert the appropriate entry in the table for logical operators before calling `parse`.
* Recommended addition to this proposal: move the tokenizer, or at least the tables that drive it, into `config` or `parseConfig` as well, so that syntax extensions like `#FF80C0` for a color constant can be supported.
* Recommended addition to this proposal: move each precedence level's "local parsing" function, e.g., `parseAddSubtract`, into the operators.js-style table, with an automatic facility for calling the function at the next higher precedence level to obtain the subexpressions at the current level. This additional refactor would facilitate run-time tweaking of the parser, easing development of parsing improvements and enabling other unforeseen syntax extensions by clients of mathjs, beyond what the current custom nodes facility allows.
* Mildly recommended addition to this proposal: eliminate the CustomNodes facility, because once the precedence table is exposed, you can implement custom nodes by inserting an entry in the table at the appropriate precedence that recognizes your custom syntax and returns a node of your choice (custom or not). For example, if you prefer `EXPR if COND else ALTERNATE` to `COND ? EXPR : ALTERNATE`, you could add an entry to the table just before the one for the ternary that recognizes this alternate syntax but simply returns a ConditionalNode.
* Fully document all customization opportunities that result from the refactor, including the custom nodes facility if it is kept.
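A rough sketch of how such a table might drive the whole parser, assuming a hypothetical design in which each precedence level is an entry `(state, next, top)`. Here `next` parses the next higher precedence level (the automatic facility described in the list) and `top` restarts at the lowest level. The `levels` array stands in for the operators table that would be exposed through `config` or `parseConfig`, and the nodes are plain objects, not mathjs Node classes. Inserting `ifElseLevel` just before the ternary entry shows how `EXPR if COND else ALTERNATE` could yield the same conditional node as `COND ? EXPR : ALTERNATE`.

```javascript
// Hypothetical sketch, not mathjs code: a parser driven by an ordered table of
// precedence levels, lowest precedence first. Each level is (state, next, top):
// `next` parses the next higher level, `top` restarts at the lowest level.

function tokenize(text) {
  // numbers, identifiers, '//', then any other single non-space character
  return text.match(/\d+(?:\.\d+)?|[A-Za-z_]\w*|\/\/|\S/g) || []
}

function expect(state, token) {
  if (state.tokens[state.pos] !== token) throw new SyntaxError(`Expected '${token}'`)
  state.pos++
}

// Builds a left-associative binary level from a map: operator -> function name
function binaryLevel(ops) {
  return (state, next) => {
    let node = next(state)
    while (Object.hasOwn(ops, state.tokens[state.pos])) {
      const op = state.tokens[state.pos++]
      node = { type: 'OperatorNode', op, fn: ops[op], args: [node, next(state)] }
    }
    return node
  }
}

// COND ? EXPR : ALTERNATE (right-associative)
function ternaryLevel(state, next, top) {
  const condition = next(state)
  if (state.tokens[state.pos] !== '?') return condition
  state.pos++
  const trueExpr = top(state)
  expect(state, ':')
  const falseExpr = ternaryLevel(state, next, top)
  return { type: 'ConditionalNode', condition, trueExpr, falseExpr }
}

// Highest precedence: constants, symbols, parenthesized expressions
function primaryLevel(state, next, top) {
  const tok = state.tokens[state.pos++]
  if (tok === undefined) throw new SyntaxError('Unexpected end of expression')
  if (tok === '(') {
    const inner = top(state)
    expect(state, ')')
    return inner
  }
  if (/^\d/.test(tok)) return { type: 'ConstantNode', value: Number(tok) }
  if (/^[A-Za-z_]/.test(tok)) return { type: 'SymbolNode', name: tok }
  throw new SyntaxError(`Unexpected token '${tok}'`)
}

// Client-supplied level: EXPR if COND else ALTERNATE, same node as the ternary
function ifElseLevel(state, next, top) {
  const trueExpr = next(state)
  if (state.tokens[state.pos] !== 'if') return trueExpr
  state.pos++
  const condition = next(state)
  expect(state, 'else')
  const falseExpr = ifElseLevel(state, next, top)
  return { type: 'ConditionalNode', condition, trueExpr, falseExpr }
}

function makeParser(table) {
  const parseAt = (i, state) =>
    table[i](state, s => parseAt(i + 1, s), s => parseAt(0, s))
  return text => {
    const state = { tokens: tokenize(text), pos: 0 }
    const node = parseAt(0, state)
    if (state.pos < state.tokens.length) {
      throw new SyntaxError(`Unexpected token '${state.tokens[state.pos]}'`)
    }
    return node
  }
}

// Assumed default table; the function names mirror mathjs names for illustration
const levels = [
  ternaryLevel,
  binaryLevel({ '<': 'smaller', '>': 'larger' }),
  binaryLevel({ '+': 'add', '-': 'subtract' }),
  binaryLevel({ '*': 'multiply', '/': 'divide', '//': 'quotient' }),
  primaryLevel
]

const parse = makeParser(levels)
// Client extension: insert the if/else entry just before the ternary entry
const parseWithIfElse = makeParser([ifElseLevel, ...levels])

const viaTernary = parse('a < b ? x // 2 : y')
const viaIfElse = parseWithIfElse('x // 2 if a < b else y')
console.log(JSON.stringify(viaTernary) === JSON.stringify(viaIfElse)) // true
```

Because a level only ever calls `next` or `top`, entries can be inserted or reordered without editing the others. The sketch omits unary operators, implicit multiplication, and right-associative binary operators such as `^`, which a real table would presumably handle with per-entry settings.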

I am going to embark on these refactors and will let you know how it goes; I welcome feedback in the meantime. Design question: do you consider the unification of OperatorNode and FunctionNode into CallNode a breaking change in and of itself, since mathjs documents and exposes its list of valid node types? If so, would you like this refactor (presuming it is OK with you) to go in two steps: first, one that makes the parser more DRY and configurable without unifying the nodes, followed by one that does unify the nodes and would therefore have to go into mathjs 15?

Thanks for your thoughts on this.

Proposed parsing refactor #3420