Unexpected error when parsing Markdown: ANTLR preview shows a different result from antlr-runtime #4846
-
Hey, guys.
lexer grammar MarkdownLexer;
Letter: [a-zA-Z];
Digit: [0-9];
Space: ' ';
Tab: '\t';
Newline: '\r'? '\n';
WhiteSpace: [\r\n]+;
grammar MarkdownRule;
import MarkdownLexer;
// TODO add url link
inline: plainText | inlineCode | emphasis;
// indent: Space{2,} | Tab;
indent: Space+ | Tab;
plainText: (Letter | Digit)+;
inlineCode: '`' plainText '`';
emphasis: bold | boldItalic | italic | strikeThrough | underline;
// TODO add urllink
boldTag: '**';
boldElement: (plainText | inlineCode | strikeThrough | italic);
bold: boldTag boldElement boldTag # boldSingle
| boldTag boldElement (indent boldElement)+ boldTag # boldMulti
;
// TODO add urllink
italicTag: '*';
italicElement: (bold | inlineCode | strikeThrough | plainText);
italic: italicTag italicElement italicTag # italicSingle
| italicTag italicElement (indent italicElement)+ italicTag # italicMutli
;
// TODO add urllink
strikeTag: '~~';
strikeThroughElement: (italic | bold | inlineCode | plainText);
strikeThrough: strikeTag strikeThroughElement strikeTag # strikeThroughSingle
| strikeTag strikeThroughElement (indent strikeThroughElement)+ strikeTag # strikeThroughMulti
;
// TODO adjust here
boldItalicTag: '***';
boldItalicElement: plainText;
boldItalic: boldItalicTag boldItalicElement boldItalicTag # boldItalicSingle
| boldItalicTag boldItalicElement (indent boldItalicElement)+ boldItalicTag # boldItalicMulti
;
// TODO add urllink
underlineTag: '__';
underlineElement: (plainText | bold | italic | inlineCode);
underline: underlineTag underlineElement underlineTag # underlineSingle
| underlineTag underlineElement (indent underlineElement)+ underlineTag # underlineMutli
;
When I test with the ANTLR preview, for example, nothing is wrong, right? You can check my visitor here. Now I am going to test with code:
private fun visitMarkdownElement(charStream: CharStream): MarkdownElement {
    val lexer = MarkdownLexer(charStream)
    val tokens = CommonTokenStream(lexer)
    val parser = MarkdownRuleParser(tokens)
    val tree = parser.inline()
    val visitor = MarkdownVisitor()
    return visitor.visit(tree)
}
@Test
fun testBold() {
    val input = CharStreams.fromString("**hello**")
    val markdown = visitMarkdownElement(input)
    println(markdown.toHTML())
}
The result is
Test again with
@Test
fun testBold() {
    val input = CharStreams.fromString("__hello__")
    val markdown = visitMarkdownElement(input)
    println(markdown.toHTML())
}
The result is
You can check the full log.
Very strange, all the rules point to
Replies: 3 comments 3 replies
-
plugins {
    kotlin("jvm") version "2.1.20"
    antlr
}
group = "com.aurora.markdown"
version = "1.0-SNAPSHOT"
repositories {
    mavenCentral()
}
dependencies {
    // https://mvnrepository.com/artifact/org.antlr/antlr4
    antlr("org.antlr:antlr4:4.13.2")
    // https://mvnrepository.com/artifact/org.antlr/antlr4-runtime
    implementation("org.antlr:antlr4-runtime:4.13.2")
    testImplementation(kotlin("test"))
}
sourceSets {
    main {
        antlr {
            setSrcDirs(listOf("grammar"))
        }
    }
}
tasks.test {
    useJUnitPlatform()
}
tasks.generateGrammarSource {
    outputDirectory = file("src/main/java/com/aurora/markdown/grammar")
    arguments = arguments + listOf("-visitor", "-package", "com.aurora.markdown.grammar")
}
tasks.generateTestGrammarSource {
    outputDirectory = file("src/main/java/com/aurora/markdown/grammar")
    arguments = arguments + listOf("-visitor", "-package", "com.aurora.markdown.grammar")
}
tasks.compileKotlin {
    dependsOn(tasks.generateGrammarSource)
}
tasks.compileTestKotlin {
    dependsOn(tasks.generateGrammarSource)
    dependsOn(tasks.generateTestGrammarSource)
}
tasks.compileJava {
    dependsOn(tasks.generateGrammarSource)
}
tasks.compileTestJava {
    dependsOn(tasks.generateGrammarSource)
    dependsOn(tasks.generateTestGrammarSource)
}
kotlin {
    jvmToolchain(17)
}
modified this
-
Markdown is not a good candidate for an ANTLR grammar because the language itself is very ambiguous. It wasn't meant to be parsed like this. It definitely isn't a candidate for your first grammar, as you are seeing with the ambiguity. But don't think about the operators and context; think about the lexemes: '*' is always a '*'; in Markdown, its meaning depends on where you see it. Unless you have a specific reason to write another Markdown parser, I would pick some other language to start with.
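One way to read the advice about lexemes is sketched below as a small combined grammar: the lexer always emits the same Star token for '*', and only the parser rules decide whether a run of stars means italic, bold, or bold italic. The grammar name LexemeSketch and all rule and token names are hypothetical, and nested emphasis is deliberately left out.

```antlr
// LexemeSketch.g4 (hypothetical name): a sketch, not a full Markdown grammar.
grammar LexemeSketch;

// Parser rules: context decides what a run of Star tokens means.
inline: span+ EOF;
span: boldItalic | bold | italic | Text | Space;
boldItalic: Star Star Star Text Star Star Star;
bold: Star Star Text Star Star;
italic: Star Text Star;

// Lexer rules: '*' is always the same token, wherever it appears.
Star: '*';
Text: [a-zA-Z0-9]+;
Space: ' '+;
```

Under these assumptions, an input such as `**hello**` is tokenized as Star Star Text Star Star, and it is the `bold` parser rule, not the lexer, that gives those stars their meaning.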
-
You're writing a grammar that does not follow the standard conventions that avoid all sorts of beginner issues. Here's what you should do.
Change the grammar declaration to `parser grammar MarkdownRuleParser;`.
Replace `import MarkdownLexer;` with `options { tokenVocab = MarkdownLexer; }`.
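A minimal sketch of what that split could look like, assuming two files named MarkdownLexer.g4 and MarkdownRuleParser.g4. The token names BoldTag, Backtick, and so on are hypothetical; the point is that every literal the parser uses is declared as a named token in the lexer grammar, so the lexer and the parser share one vocabulary. Only a few rules from the question are shown.

```antlr
// ---- File 1: MarkdownLexer.g4 ----
// Every literal the parser needs is a named token (names are hypothetical).
lexer grammar MarkdownLexer;

BoldItalicTag: '***';
BoldTag: '**';
ItalicTag: '*';
StrikeTag: '~~';
UnderlineTag: '__';
Backtick: '`';
Letter: [a-zA-Z];
Digit: [0-9];
Space: ' ';
Tab: '\t';
Newline: '\r'? '\n';

// ---- File 2: MarkdownRuleParser.g4 ----
// The parser takes its token types from the lexer grammar above.
parser grammar MarkdownRuleParser;
options { tokenVocab = MarkdownLexer; }

inline: plainText | inlineCode | bold;
plainText: (Letter | Digit)+;
inlineCode: Backtick plainText Backtick;
bold: BoldTag plainText BoldTag;
```

For a grammar declared with `parser grammar`, ANTLR names the generated class after the grammar itself, so `MarkdownLexer(charStream)` and `MarkdownRuleParser(tokens)` in the test code would keep working against the same token types.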