Proposed parsing refactor #3420

@gwhitney

Describe the concerns
There are a number of outstanding parsing/evaluation issues, such as #2631, which it would be convenient to work on in conjunction with #3374. In particular, #3374 involves letting the quotient operator also be written as infix //, à la Python; that needs some specialized TeX representation, and right now it is tricky to make that TeX handling uniform between quotient(a,b) and a // b. These items generally point to unifying FunctionNode and OperatorNode, so I am planning to go ahead with that, presuming you are still open to it, and I was thinking of calling the unified node a CallNode. (It's easiest to use a third name to make sure all instances of each have been handled in the unification.)
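
To make the TeX concern concrete, here is a minimal sketch of how a unified node could render both spellings identically. Everything in it is hypothetical: the `CallNode` class shape, the `texTemplates` table, and the floor-of-a-fraction rendering are assumptions for illustration, not existing mathjs code or a settled design.

```javascript
// Hypothetical sketch: none of these names exist in mathjs today.
// One node type covers both `quotient(a, b)` and `a // b`, so the TeX
// template is looked up by function name, not by surface syntax.
const texTemplates = {
  // Assumed rendering for the quotient; the real choice is still open.
  quotient: (a, b) => `\\left\\lfloor\\frac{${a}}{${b}}\\right\\rfloor`
}

class CallNode {
  constructor (fn, args, infixSymbol = null) {
    this.fn = fn                   // name of the function being called
    this.args = args               // pre-rendered TeX strings, for brevity
    this.infixSymbol = infixSymbol // e.g. '//' if written infix, else null
  }

  toTex () {
    const template = texTemplates[this.fn]
    if (template) return template(...this.args)
    // Fallback: generic function-call rendering
    return `\\mathrm{${this.fn}}\\left(${this.args.join(',')}\\right)`
  }
}

const asFunction = new CallNode('quotient', ['a', 'b'])
const asInfix = new CallNode('quotient', ['a', 'b'], '//')
```

Because the template is keyed on the function name alone, `asFunction.toTex()` and `asInfix.toTex()` produce the same string; the infix symbol would only matter for `toString`-style output.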

Anyhow, working on the unification of FunctionNode and OperatorNode led me to realize that there is currently redundancy between the big table in src/expression/operators.js and a number of small tables in src/expression/parse.js, such as the one in the (somewhat awkwardly named) function parseMultiplyDivideModulusPercentage. In addition, although there is a mechanism for adding custom nodes to the parser, it is not particularly well documented, and it is limited in scope: custom nodes can only be created by syntax that looks like a function call. For example, in our color computations, we would like expressions like #FF80C0 to be color constants. This reads very naturally, and it doesn't really interfere with using # for comments, because you just need to avoid having 6 or 8 hex digits immediately after the # in a comment. But right now there is no way to extend the parser to handle such things.
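
As an illustration of the color-versus-comment rule described above, a tokenizer extension might classify a `#` along these lines. This is only a sketch: the `COLOR` pattern and the `classifyHash` helper are hypothetical names, and the extra requirement that the hex digits not be followed by another identifier character is my assumption.

```javascript
// Hypothetical rule, not current mathjs behavior: `#` followed by exactly
// 6 or 8 hex digits (and then no further identifier character) is a color
// constant; any other `#` starts a comment running to the end of the line.
const COLOR = /^#([0-9a-fA-F]{8}|[0-9a-fA-F]{6})(?![0-9a-zA-Z_])/

function classifyHash (text) {
  const match = COLOR.exec(text)
  if (match) return { type: 'color', value: match[0] }
  return { type: 'comment', value: text.split('\n')[0] }
}
```

With this rule, `#FF80C0 + 1` starts with a color token, while `# a comment` and `#FF80C` (only five hex digits) are comments. The cost is the one already noted: a comment such as `#facade ...` would be read as a color.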

Proposal

  • Obtain all of the parser-driving symbol/precedence information from a single unified source of truth, like the one in operators.js.
  • Expose that big table in the mathjs configuration. Design question: should it be a big complicated property, perhaps named operators, directly in the config object? Or should we have a separate exposed parseConfig object, parallel to config, that is concerned only with parsing? If so, should the number and numberFallback properties move to parseConfig, since they really only concern parsing expressions, so that config deals with computation and parseConfig with parsing? Note that having the table of operators in the configuration would resolve #2722 (defining operator synonyms such as && for 'and' and || for 'or'), for example: someone wanting to parse && as and would just need to insert the appropriate entry in the table for logical operators before calling parse.
  • Recommended addition to this proposal: move the tokenizer, or at least the tables for the tokenizer, into config or parseConfig as well, so that syntax extensions like #FF80C0 for a color constant can be supported.
  • Recommended addition to this proposal: move each precedence level's "local parsing" function, e.g., parseAddSubtract, into the operators.js-style table, with an automatic facility for calling the function at the next-higher precedence level to get the subexpressions of the current level. This additional refactor would facilitate run-time tweaking of the parser, easing development of parsing improvements and enabling other unforeseen syntax extensions by clients of mathjs, beyond just the current custom nodes facility.
  • Mildly recommended addition to this proposal: eliminate the CustomNodes facility because once the precedence table is exposed, you can implement custom nodes by just inserting an entry in the table at the appropriate precedence that recognizes your custom syntax and returns a node of your choice (custom or not). For example, if you prefer EXPR if COND else ALTERNATE to COND ? EXPR : ALTERNATE, you could add an entry to the table just before the one for the ternary that recognizes this alternate syntax, but just returns a ConditionalNode.
  • Fully document all customization opportunities that result from the refactor, including the custom nodes facility if it is kept.
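
To show roughly what the proposal above would enable, here is a small self-contained sketch of a parser driven entirely by one precedence table, with && added as a synonym for and by editing the table only. The table shape (`levels`, `ops`), the `tokenize` and `parse` functions, and the plain-object output nodes are all hypothetical; this is not the layout of operators.js or the mathjs parse API, and every level is assumed to be left-associative and binary.

```javascript
// Hypothetical sketch of a table-driven parser; the table shape and all
// names here are assumptions, not the layout of operators.js.
const levels = [ // lowest precedence first; all left-associative here
  { ops: { or: 'or' } },
  { ops: { and: 'and' } },
  { ops: { '+': 'add', '-': 'subtract' } },
  { ops: { '*': 'multiply', '/': 'divide' } }
]

function tokenize (text, table) {
  // Symbolic operators come from the table, longest first so `&&` beats `&`.
  // Word operators such as `and` are picked up by the identifier pattern.
  const symbols = table.flatMap(level => Object.keys(level.ops))
    .filter(s => !/^\w+$/.test(s))
    .sort((a, b) => b.length - a.length)
  const tokens = []
  let rest = text.trim()
  while (rest.length > 0) {
    const word = /^(\d+(\.\d+)?|[A-Za-z_]\w*|[()])/.exec(rest)
    const token = word ? word[0] : symbols.find(s => rest.startsWith(s))
    if (token === undefined) throw new SyntaxError(`Unexpected input: ${rest}`)
    tokens.push(token)
    rest = rest.slice(token.length).trim()
  }
  return tokens
}

function parse (text, table = levels) {
  const tokens = tokenize(text, table)
  let pos = 0
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

  // Each level parses its operands by calling the next-higher level.
  function parseLevel (depth) {
    if (depth === table.length) return parseAtom()
    const { ops } = table[depth]
    let node = parseLevel(depth + 1)
    while (pos < tokens.length && has(ops, tokens[pos])) {
      const fn = ops[tokens[pos++]]
      node = { fn, args: [node, parseLevel(depth + 1)] }
    }
    return node
  }

  function parseAtom () {
    const token = tokens[pos++]
    if (token === undefined) throw new SyntaxError('Unexpected end of input')
    if (token === '(') {
      const inner = parseLevel(0)
      if (tokens[pos++] !== ')') throw new SyntaxError('Expected )')
      return inner
    }
    if (/^\d/.test(token)) return Number(token)
    if (/^[A-Za-z_]/.test(token)) return token // identifier
    throw new SyntaxError(`Unexpected ${token}`)
  }

  const result = parseLevel(0)
  if (pos < tokens.length) throw new SyntaxError(`Unexpected ${tokens[pos]}`)
  return result
}

// A client adds `&&` as a synonym for `and` by editing a copy of the table.
const custom = levels.map(level => ({ ops: { ...level.ops } }))
custom[1].ops['&&'] = 'and'
```

Here `parse('a && b', custom)` builds the same tree as `parse('a and b')`, while `parse('a && b')` against the unmodified table throws a SyntaxError. Moving each level's own parsing function into the table, as suggested in the fourth bullet, would amount to replacing the generic loop in `parseLevel` with a per-entry callback.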

I am going to embark on these refactors and let you know how it goes; I welcome feedback in the meantime. Design question: do you consider the unification of OperatorNode and FunctionNode into CallNode a breaking change in and of itself, because mathjs does document and expose its list of valid node types? If so, and presuming the refactor is OK with you, would you like it to go in two steps: one that doesn't unify the nodes but does make the parser more DRY and configurable, followed by one that does unify the nodes and would have to go into mathjs 15?

Thanks for your thoughts on this.
