Describe the concerns
There are a number of outstanding parsing/evaluation issues, such as #2631, which it would be convenient to work on in conjunction with #3374. In particular, #3374 involves implementing the quotient operator so that it can also be written as infix `//`, à la Python. That needs a specialized TeX representation, and right now it is tricky to make the TeX handling uniform between `quotient(a,b)` and `a // b`. These items generally point to unifying FunctionNode and OperatorNode, so I am planning to go ahead with that, presuming you are still willing, and I was thinking of calling the unified node a CallNode. (It's easiest to use a third name to make sure all instances of each have been handled in the unification.)
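To illustrate why a single node type helps with the TeX problem, here is a minimal sketch assuming a hypothetical `CallNode` that stores a function name, its arguments, and optionally the infix symbol it was written with. `CallNode`, `SymbolLeaf`, and `texTemplates` are illustrative names, not existing mathjs API, and the floor-bracket rendering of `quotient` is only an assumed choice. Because `toTex` looks only at the function name, both syntaxes render identically, while `toString` still reproduces the syntax the user wrote.

```javascript
// Hypothetical sketch: one node type for both function-call and operator
// syntax. CallNode, SymbolLeaf and texTemplates are illustrative names only.

// TeX templates keyed by function name, shared by every syntax that
// produces that function (the floor-bracket rendering is an assumption)
const texTemplates = {
  quotient: (a, b) => `\\left\\lfloor\\frac{${a}}{${b}}\\right\\rfloor`
}

// Minimal leaf node standing in for a symbol such as `a`
class SymbolLeaf {
  constructor (name) { this.name = name }
  toString () { return this.name }
  toTex () { return this.name }
}

class CallNode {
  // fn: function name; args: child nodes; infix: the operator symbol the
  // user wrote (e.g. '//'), or undefined for ordinary call syntax
  constructor (fn, args, infix) {
    this.fn = fn
    this.args = args
    this.infix = infix
  }

  // Plain-text output preserves whichever syntax was used
  toString () {
    const parts = this.args.map(arg => arg.toString())
    return this.infix
      ? parts.join(` ${this.infix} `)
      : `${this.fn}(${parts.join(', ')})`
  }

  // TeX output depends only on fn, so both syntaxes render identically
  toTex () {
    const parts = this.args.map(arg => arg.toTex())
    const template = Object.hasOwn(texTemplates, this.fn)
      ? texTemplates[this.fn]
      : undefined
    return template
      ? template(...parts)
      : `\\mathrm{${this.fn}}\\left(${parts.join(',')}\\right)`
  }
}

const a = new SymbolLeaf('a')
const b = new SymbolLeaf('b')
const asCall = new CallNode('quotient', [a, b])
const asOperator = new CallNode('quotient', [a, b], '//')

console.log(asCall.toString()) // quotient(a, b)
console.log(asOperator.toString()) // a // b
console.log(asCall.toTex() === asOperator.toTex()) // true
```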
Anyhow, working on the unification of FunctionNode and OperatorNode led me to realize that there is currently redundancy between the big table in src/expression/operators.js and a number of small tables in src/expression/parse.js, such as the one in the (somewhat awkwardly named) function parseMultiplyDivideModulusPercentage. In addition, although there is a mechanism for adding custom nodes to the parser, it is not particularly well documented, and it is limited in scope: custom nodes can only be created by syntax that looks like a function call. For example, in our color computations we would like expressions like `#FF80C0` to be color constants. This doesn't really interfere with using `#` for comments, because a comment just needs to avoid having 6 or 8 hex digits immediately after the `#`, and it reads very naturally. But right now there is no way to extend the parser to handle such things.
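A sketch of the lexing rule described above, under one added assumption not stated in the proposal: a color constant must also not be directly followed by another letter, digit, or underscore, so that something like `#deadbeefcake` still starts a comment. `tokenizeHash` is a hypothetical helper, not part of the current mathjs tokenizer; it takes the index of a `#` and returns either a color token or a comment token.

```javascript
// Hypothetical tokenizer rule (not the current mathjs tokenizer): '#' plus
// exactly 6 or 8 hex digits, not followed by another letter, digit, or
// underscore, is a color constant; any other '#' starts a line comment.
const COLOR_CONSTANT = /^#(?:[0-9A-Fa-f]{8}|[0-9A-Fa-f]{6})(?![0-9A-Za-z_])/

// pos must be the index of a '#' in text
function tokenizeHash (text, pos) {
  const match = COLOR_CONSTANT.exec(text.slice(pos))
  if (match) {
    return { type: 'color', value: match[0], end: pos + match[0].length }
  }
  // Not a color: the comment runs to the end of the line
  const newline = text.indexOf('\n', pos)
  const end = newline === -1 ? text.length : newline
  return { type: 'comment', value: text.slice(pos, end), end }
}

console.log(tokenizeHash('x + #FF80C0', 4).type) // color
console.log(tokenizeHash('#FF80C0AB * 2', 0).value) // #FF80C0AB
console.log(tokenizeHash('2 + 3 # add them\n', 6).value) // # add them
console.log(tokenizeHash('#deadbeefcake', 0).type) // comment
```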
Proposal
- Obtain all of the parser-driving symbol/precedence information from a single unified source of truth, like the one in operators.js.
- Expose that big table in the mathjs configuration. Design question: should it be a big complicated property, perhaps named `operators`, directly in the `config` object? Or should we have a separate exposed `parseConfig` object, parallel to `config`, that is just concerned with parsing? If so, should the `number` and `numberFallback` properties move to `parseConfig`, since they really only concern parsing expressions, so that `config` deals with computation and `parseConfig` with parsing? Note that having the table of operators in the configuration would resolve #2722 (Facilitate defining operator synonyms such as && for 'and', || for 'or'), for example: someone wanting to parse `&&` as `and` would just need to insert the appropriate entry in the table for logical operators before calling `parse`.
- Recommended addition to this proposal: move the tokenizer, or at least the tables for the tokenizer, into `config` or `parseConfig` as well, so that syntax extensions like `#FF80C0` for a color constant can be supported.
- Recommended addition to this proposal: move each precedence level's "local parsing" function, e.g. `parseAddSubtract`, into the operators.js-style table, with an automatic facility to call the function at the next higher level of precedence to get the subexpressions of the current level. This additional refactor would facilitate run-time tweaking of the parser, easing development of parsing improvements and enabling other unforeseen syntax extensions by clients of mathjs, beyond just the current custom nodes facility.
- Mildly recommended addition to this proposal: eliminate the CustomNodes facility, because once the precedence table is exposed, you can implement custom nodes by inserting an entry in the table at the appropriate precedence that recognizes your custom syntax and returns a node of your choice (custom or not). For example, if you prefer `EXPR if COND else ALTERNATE` to `COND ? EXPR : ALTERNATE`, you could add an entry to the table just before the one for the ternary that recognizes this alternate syntax but simply returns a ConditionalNode.
- Fully document all customization opportunities that result from the refactor, including the custom nodes facility if it is kept.
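To make the table-driven items above more concrete, here is a minimal sketch of a parser for binary operators, assuming a hypothetical table format: levels listed from lowest to highest precedence, each mapping operator symbols to function names. The tokenizer's symbol list is derived from the same table, one generic `parseLevel` routine replaces the hand-written per-level functions, and defining `&&` as a synonym for `and` becomes a table edit. None of these names (`operators`, `tokenize`, `parse`, `show`) are mathjs API, and unary operators, function calls, and the ternary are omitted for brevity.

```javascript
// Hypothetical sketch of a table-driven parser for binary operators.
// The table layout and all names here are illustrative, not mathjs API.

// Precedence levels, lowest first; each maps operator symbols to the name
// of the function the resulting node calls. Everything is left-associative.
const operators = [
  { name: 'or', ops: { or: 'or' } },
  { name: 'and', ops: { and: 'and' } },
  { name: 'relational', ops: { '==': 'equal', '<': 'smaller' } },
  { name: 'additive', ops: { '+': 'add', '-': 'subtract' } },
  { name: 'multiplicative', ops: { '*': 'multiply', '/': 'divide', '//': 'quotient' } }
]

// Escape regex metacharacters in an operator symbol
function escapeRegExp (s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Tokenizer whose symbol list is derived from the operator table
function tokenize (text, table) {
  const symbols = table
    .flatMap(level => Object.keys(level.ops))
    .filter(sym => !/^\w+$/.test(sym)) // word operators lex as identifiers
    .sort((x, y) => y.length - x.length) // longest first, so '//' beats '/'
    .map(escapeRegExp)
  const token = new RegExp(
    `\\s*([A-Za-z_]\\w*|\\d+(?:\\.\\d+)?|[()]|${symbols.join('|')})`, 'y')
  const tokens = []
  let pos = 0
  while (!/^\s*$/.test(text.slice(pos))) { // stop at trailing whitespace
    token.lastIndex = pos
    const m = token.exec(text)
    if (!m || m[1] === '') throw new SyntaxError(`Unexpected character at ${pos}`)
    tokens.push(m[1])
    pos = token.lastIndex
  }
  return tokens
}

function parse (text, table = operators) {
  const tokens = tokenize(text, table)
  let i = 0

  // One generic routine for every precedence level: operands come from the
  // next higher level, so no level needs a hand-written parseXxx function
  function parseLevel (level) {
    if (level === table.length) return parsePrimary()
    const ops = table[level].ops
    let node = parseLevel(level + 1)
    while (i < tokens.length && Object.hasOwn(ops, tokens[i])) {
      const fn = ops[tokens[i++]]
      node = { fn, args: [node, parseLevel(level + 1)] }
    }
    return node
  }

  function parsePrimary () {
    const tok = tokens[i++]
    if (tok === undefined) throw new SyntaxError('Unexpected end of input')
    if (tok === '(') {
      const inner = parseLevel(0)
      if (tokens[i++] !== ')') throw new SyntaxError('Expected )')
      return inner
    }
    return { symbol: tok }
  }

  const result = parseLevel(0)
  if (i < tokens.length) throw new SyntaxError(`Unexpected token ${tokens[i]}`)
  return result
}

// Render a parsed tree as nested function calls
function show (node) {
  return node.fn ? `${node.fn}(${node.args.map(show).join(', ')})` : node.symbol
}

console.log(show(parse('a < b and c // 2')))
// and(smaller(a, b), quotient(c, 2))

// Defining '&&' as a synonym for 'and' is just a table edit
const withSynonym = operators.map(level => level.name === 'and'
  ? { ...level, ops: { ...level.ops, '&&': 'and' } }
  : level)
console.log(show(parse('a && b', withSynonym))) // and(a, b)
```

Supporting the fourth item in this sketch would mean letting a table entry supply its own parsing function in place of the generic binary-operator loop, while still delegating to the next level for its operands.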
I am going to embark on these refactors and let you know how it goes; I welcome feedback in the meantime. Design question: do you consider the unification of OperatorNode and FunctionNode into CallNode a breaking change in and of itself, since mathjs documents and exposes its list of valid node types? If so, would you like this refactor, presuming it is OK with you, to go in two steps: first, one that doesn't unify the nodes but does make the parser more DRY and configurable, followed by one that does unify the nodes and would have to go into mathjs 15?
Thanks for your thoughts on this.