Sublime Forum

Do I need a lexer/parser for my markup language?

#1

Hi All –

I have a custom markup language, and I’ve written a .sublime-syntax syntax for it… My future plan is to build some functionality within Sublime Text to allow users to convert a document with this markup to other formats (like HTML)…

My question: What’s the best way for me to go forward here?

I’ve been assuming for a while now that I’ll have to write a lexer (parser?) to accomplish the conversion (and also power other features, like linting, etc…) – but now that I’ve written the ST syntax, I’m wondering if I can somehow leverage it (or maybe some other python or built-in ST tooling) instead of reinventing the wheel…?

I’m hoping y’all can help me out here – I’ve been a web developer for 15 years, but this is my first time coding A) in python and B) a ST package… I don’t know what the best practices would be here, or how best to move forward and leverage what may already exist…

Thanks in advance for any insight you can provide!

0 Likes

#2

The syntax definition can help you on your way, but you’re going to need to write a parser.

Generally there are two parts to parsing: lexical analysis and the actual parser. The syntax definition is mostly lexical analysis, roughly speaking.

Writing a parser will also help you understand your syntax in detail (and language parsing in general) and will help you avoid problems in your design.
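As a rough illustration of the two stages, here is a minimal sketch for a made-up toy markup (a line starting with `# ` is a heading, `*word*` is emphasis). The token names, the `lex` and `parse` functions and the toy grammar are all assumptions for illustration, not anything from the original poster's language or from Sublime Text.

```python
import re

# Hypothetical toy markup: "# " at line start opens a heading, "*...*" is emphasis.
TOKEN_RE = re.compile(
    r"(?P<HEADING>^# )|(?P<STAR>\*)|(?P<TEXT>[^*\n]+)|(?P<NEWLINE>\n)",
    re.MULTILINE,
)

def lex(source):
    """Lexical analysis: characters -> flat list of (kind, text) tokens."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]

def parse(tokens):
    """Parsing: give the flat token stream structure and reject invalid input."""
    blocks = []
    kind, inlines, in_em = "para", [], False
    for tok_kind, text in tokens:
        if tok_kind == "HEADING":
            kind = "heading"
        elif tok_kind == "STAR":
            in_em = not in_em  # toggle emphasis on each '*'
        elif tok_kind == "TEXT":
            inlines.append(("em" if in_em else "text", text))
        elif tok_kind == "NEWLINE":
            if in_em:
                raise SyntaxError("unclosed '*' before end of line")
            if inlines:
                blocks.append((kind, inlines))
            kind, inlines = "para", []
    if in_em:
        raise SyntaxError("unclosed '*' at end of input")
    if inlines:
        blocks.append((kind, inlines))
    return blocks

tree = parse(lex("# Hi\na *b* c\n"))
```

The lexer only classifies pieces of text; the parser is the part that knows a `*` has to be closed and that a heading marker applies to the rest of its line.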

1 Like

#3

view.extract_tokens_with_scopes(sublime.Region(0, len(view))) yields something like

[
# (Region, scope)
((0, 1), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown punctuation.definition.heading.begin.markdown '), 
((1, 2), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown '), 
((2, 12), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown entity.name.section.markdown '), 
((12, 20), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown entity.name.section.markdown '), 
((20, 26), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown entity.name.section.markdown '), 
((26, 30), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown entity.name.section.markdown '), 
((30, 31), 'text.html.markdown meta.block-level.markdown markup.heading.1.markdown meta.whitespace.newline.markdown '), 
((31, 32), 'text.html.markdown '), 
((32, 33), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((33, 35), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((35, 42), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((42, 51), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((51, 58), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((58, 59), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((59, 66), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((66, 67), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((67, 68), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((68, 90), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((90, 91), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((91, 97), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((97, 98), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((98, 106), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((106, 107), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((107, 113), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((113, 114), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((114, 132), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((132, 133), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((133, 149), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((149, 150), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((150, 154), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((154, 155), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((155, 179), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((179, 180), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((180, 181), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((181, 182), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((182, 200), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((200, 201), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((201, 219), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((219, 220), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((220, 236), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((236, 237), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((237, 244), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((244, 245), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown '), 
((245, 246), 'text.html.markdown meta.paragraph.markdown '), 
((246, 247), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((247, 249), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((249, 258), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((258, 259), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((259, 260), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((260, 282), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((282, 283), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((283, 292), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((292, 293), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((293, 295), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((295, 296), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((296, 314), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((314, 315), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((315, 342), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((342, 343), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((343, 344), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((344, 345), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((345, 366), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((366, 367), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((367, 375), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((375, 376), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((376, 394), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((394, 395), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((395, 404), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((404, 405), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown '), 
((405, 406), 'text.html.markdown meta.paragraph.markdown '), 
((406, 407), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((407, 409), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((409, 419), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((419, 426), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((426, 427), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((427, 428), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((428, 450), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((450, 451), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((451, 460), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((460, 461), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((461, 462), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((462, 463), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((463, 481), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((481, 482), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((482, 509), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((509, 510), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((510, 511), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((511, 512), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((512, 533), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((533, 534), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((534, 542), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((542, 543), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((543, 561), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((561, 562), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((562, 571), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((571, 572), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown '), 
((572, 573), 'text.html.markdown meta.paragraph.markdown '), 
((573, 574), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((574, 576), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((576, 584), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((584, 591), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((591, 592), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((592, 593), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((593, 615), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((615, 616), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((616, 622), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((622, 623), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((623, 630), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((630, 631), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((631, 649), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((649, 650), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((650, 684), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((684, 685), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((685, 686), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((686, 687), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((687, 705), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((705, 706), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((706, 724), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((724, 725), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((725, 741), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((741, 742), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((742, 746), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((746, 747), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((747, 753), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((753, 754), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((754, 761), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((761, 762), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown '), 
((762, 763), 'text.html.markdown meta.paragraph.markdown '), 
((763, 764), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((764, 766), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((766, 773), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((773, 778), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((778, 779), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((779, 780), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((780, 802), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((802, 803), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((803, 809), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((809, 810), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((810, 815), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((815, 816), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((816, 834), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((834, 835), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((835, 881), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((881, 882), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((882, 883), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((883, 884), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((884, 902), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((902, 903), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((903, 921), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((921, 922), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((922, 938), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((938, 939), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((939, 949), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((949, 950), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown '), 
((950, 951), 'text.html.markdown meta.paragraph.markdown '), 
((951, 952), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.begin.markdown '), 
((952, 954), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.begin.markdown '), 
((954, 961), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((961, 964), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((964, 969), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((969, 977), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((977, 983), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((983, 989), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.description.markdown '), 
((989, 990), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.image.end.markdown '), 
((990, 991), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((991, 1013), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((1013, 1014), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((1014, 1019), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((1019, 1020), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((1020, 1072), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown markup.underline.link.image.markdown '), 
((1072, 1073), 'text.html.markdown meta.paragraph.markdown meta.link.inline.description.markdown meta.image.inline.markdown punctuation.definition.metadata.end.markdown '), 
((1073, 1074), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.link.end.markdown '), 
((1074, 1075), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.begin.markdown '), 
((1075, 1096), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((1096, 1097), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((1097, 1105), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((1105, 1106), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((1106, 1110), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown markup.underline.link.markdown '), 
((1110, 1111), 'text.html.markdown meta.paragraph.markdown meta.link.inline.markdown punctuation.definition.metadata.end.markdown ')
]

But unless your syntax definition is really detailed and deterministic, it may not be helpful.
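To give an idea of how such a token list could be consumed, here is a minimal sketch that merges adjacent tokens carrying a given scope name and returns their text. It runs outside Sublime Text, so it assumes plain `(begin, end)` tuples as printed above; inside a plugin the first element would be a `sublime.Region` and the text would come from `view.substr(region)`. The function name `merge_by_scope` and the sample data are hypothetical.

```python
def merge_by_scope(tokens, text, wanted):
    """Merge adjacent tokens whose scope contains `wanted`; return their text."""
    spans = []
    for (begin, end), scope in tokens:
        if wanted not in scope.split():
            continue
        if spans and spans[-1][1] == begin:
            spans[-1][1] = end          # token touches the previous span: extend it
        else:
            spans.append([begin, end])  # otherwise start a new span
    return [text[b:e] for b, e in spans]

# Hand-written sample in the same shape as the output above (shortened scopes).
sample_text = "# My Title\n"
sample_tokens = [
    ((0, 1), "text.html.markdown markup.heading.1.markdown punctuation.definition.heading.begin.markdown "),
    ((1, 2), "text.html.markdown markup.heading.1.markdown "),
    ((2, 4), "text.html.markdown markup.heading.1.markdown entity.name.section.markdown "),
    ((4, 5), "text.html.markdown markup.heading.1.markdown entity.name.section.markdown "),
    ((5, 10), "text.html.markdown markup.heading.1.markdown entity.name.section.markdown "),
    ((10, 11), "text.html.markdown markup.heading.1.markdown meta.whitespace.newline.markdown "),
]

headings = merge_by_scope(sample_tokens, sample_text, "entity.name.section.markdown")
```

This only works as well as the scopes allow: if the syntax definition does not give a construct its own scope, there is nothing to key on.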

1 Like

#4

@nutjob2 thanks! I think that helps me understand the difference between lexing and parsing (and which one the syntax def qualifies as)

@jfcherng thanks, that looks pretty rad! if I may, can I ask you to expand on what you mean re: a syntax definition being deterministic? how might I determine my syntax def’s level of determinism?


so far, it seems like the general sentiment is that if regexes are enough to parse my language at or above a certain threshold, and if my syntax definition is well-built enough, then I might be able to use the lexed results of the syntax (by writing a simple parser against any available ‘extract scopes’ APIs available within ST)… and if not, then I might need to write a lexer and/or parser for myself… correct?

if I do end-up needing a more complex solution, does anyone know of any good python tools (either built-in or third-party) that I can leverage, so hopefully I won’t have to start entirely from scratch…?

0 Likes

#5

I can’t come up with a good example, only a silly one.

Consider the following JavaScript code, which is obviously invalid.

if if

Usually, an ST syntax only highlights keywords with a regex like \bif\b, so both ifs will be highlighted as valid keywords. The grammar is not strictly checked. This usually gives better “visual results” (so the code after it will not flicker while typing).

Determinism means that one thing must be followed by another, otherwise the input is invalid. For example, if must be followed by \s*\(. Once you do this, you are basically already writing a parser…
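The difference can be sketched with Python's `re` module. This is only an illustration of the idea using the two patterns mentioned above; the names `LOOSE_IF`, `STRICT_IF` and `keyword_positions` are made up, and real syntax definitions express this with contexts rather than a single lookahead.

```python
import re

# Loose: any standalone "if" counts as a keyword, as in a typical syntax definition.
LOOSE_IF = re.compile(r"\bif\b")

# Strict: "if" only counts when followed by optional whitespace and "(".
STRICT_IF = re.compile(r"\bif\b(?=\s*\()")

def keyword_positions(pattern, source):
    """Return the start offset of every match of `pattern` in `source`."""
    return [m.start() for m in pattern.finditer(source)]

invalid = "if if"
valid = "if (x) { }"

loose_hits = keyword_positions(LOOSE_IF, invalid)    # both ifs are accepted
strict_hits = keyword_positions(STRICT_IF, invalid)  # neither if is accepted
```

With the loose pattern the invalid input still yields two keyword tokens; with the strict one it yields none, which is the kind of information a converter or linter would need.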

1 Like

#6

@jfcherng I see – that’s perfect, thank you for explaining :+1:

I think I’ve got enough info to work on, but if anyone else has anything to add, I’d be grateful for all the help/info/tips I can get :stuck_out_tongue:

0 Likes

#7

I’ve utilized extract_tokens_with_scopes for a custom DSL in the Piano plugin; seeing how I did that may be helpful:

1 Like

#8

@kingkeith very cool! this’ll definitely help, thx :+1:

0 Likes

#9

Writing a lexer and parser is a non-trivial task, and lightweight markup syntaxes tend to be quite difficult to parse using the standard approaches — from what I gather, all markdown parsers (for example) need custom code to handle edge cases, as opposed to the more classical BNF-Grammar approach using parser generators.

On the other hand, I understand that your goal is converting between your syntax and other similar formats, which somewhat simplifies your task: e.g. performance is not an issue, since a translation is a one-off operation.

I advise you to consider relying on the pandoc CLI tool:

https://pandoc.org/

pandoc is a powerful cross-format converter that supports converting to and from many syntaxes, and it also outputs and accepts documents in its own AST format. You could write a custom Lua filter (or one in Haskell, Python or other languages) to handle the conversions, i.e. write a tool that converts a document from your syntax to pandoc’s AST format, and a filter that converts a document in pandoc AST to a well-formatted document in your custom syntax.

I believe the pandoc approach is easier and more practical: by leveraging the AST you don’t have to write a parser and lexer, and you get the benefit of being able to convert to/from all formats supported by pandoc (present and future).
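As a rough sketch of what "converting to pandoc's AST format" could look like, the snippet below builds a small document in pandoc's JSON representation. The input shape (a list of `(kind, text)` pairs), the function names and the `api_version` default are assumptions for illustration; the version in particular should be replaced with whatever `pandoc -t json` reports on your installation.

```python
import json

def to_inlines(text):
    """Split plain text into pandoc Str/Space inline nodes."""
    out = []
    for i, word in enumerate(text.split()):
        if i:
            out.append({"t": "Space"})
        out.append({"t": "Str", "c": word})
    return out

def to_pandoc_ast(blocks, api_version=(1, 23)):
    """Build a pandoc JSON document from hypothetical (kind, text) pairs.

    api_version is an assumed value; it has to be compatible with your pandoc.
    """
    body = []
    for kind, text in blocks:
        if kind == "heading":
            # Header content: [level, [identifier, classes, key-value pairs], inlines]
            body.append({"t": "Header", "c": [1, ["", [], []], to_inlines(text)]})
        else:
            body.append({"t": "Para", "c": to_inlines(text)})
    return {"pandoc-api-version": list(api_version), "meta": {}, "blocks": body}

doc = to_pandoc_ast([("heading", "My Title"), ("para", "Hello world")])
serialized = json.dumps(doc)
```

If the serialized JSON is written to stdout, it can be piped into something like `pandoc -f json -t html` to get any of pandoc's output formats.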

0 Likes

#10

@tajmone Thank you for this – I certainly knew about Pandoc, but have never used it and didn’t know it had its own intermediate format (for lack of a better term)… That’s really good to know!

0 Likes