Sublime Forum

Complicated Syntax YAML Structure

#21

That is what I’ve done, actually. There was ambiguity in what I said in the bit you quoted. When I said there was no way to do that in the language, what I meant was that if I used two pushes, there’s no way to find a construct that will end the declaration stage. There’s only the one lexical construct that ends the whole block. Thus if I push after begin, then I’ve got no way to come back. The end statement ends everything.

Sorry for the confusion. Using set does a pop and push so I never go above 1 context on the stack.
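
A minimal sketch of that push-then-set arrangement in .sublime-syntax form. The context names, scope names, and regexes are made up for illustration and are not taken from the actual VHDL syntax; the lone end match is deliberately simplistic.

```yaml
%YAML 1.2
---
# Hypothetical sketch; names and scopes are illustrative only.
name: Push/Set Sketch
scope: source.example
contexts:
  main:
    # One push at the start of the structure.
    - match: '(?i)\barchitecture\b'
      scope: keyword.other.example
      push: architecture-declaration

  architecture-declaration:
    - meta_scope: meta.block.architecture.declaration.example
    # 'set' pops this context and pushes the body in one step,
    # so the stack never grows beyond one pushed context.
    - match: '(?i)\bbegin\b'
      scope: keyword.other.example
      set: architecture-body

  architecture-body:
    - meta_scope: meta.block.architecture.body.example
    # The single 'end' construct closes the whole structure.
    - match: '(?i)\bend\b'
      scope: keyword.other.example
      pop: true
```

A real definition would also have to handle nested end process and end if inside the body before this pop pattern sees them.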

#22

Note that due to how the meta declarations work, your begin token will receive the meta scopes of both contexts. Unfortunately there is no easy way around that, and you’ll probably need an additional empty context with an empty match, like @rwols said, which will then “eat” the meta scope of the second context it pushes.

#23

Well, it’s working okay with the one push at the outset and then a set at the transitional point mid-structure. The only thing I wish I could do is properly check the closing identifier for validity, but without a way to create a persistent capture, I’ve given up on that goal.

It’s been a good effort, because the exact same structure happens with processes (a small block of sequential behavior in a larger block of concurrent behavior), and a more complicated variation happens with if/then/elsif/else/end if. There I’ve managed to identify the conditional and statement sections of the blocks along with nested behavior, so I think it’s a decent model.

#24

That structure looks very similar to Ada which I am just starting to attempt. See my new post:
https://forum.sublimetext.com/t/is-is-possible-to-create-syntax-definitions-where-contexts-can-refer-up-the-context-stack/32065

I initially did the multiple keyword match in one regexp, but don’t you lose the ability to put those keywords on separate lines, because a match only sees a single line at a time? I assume that parser tokens can be separated by any white-space, including new-lines, so a single token on its own line is still valid syntax.

Have you come up with an elegant solution for this kind of structure? I’d also like to match the identifier after the ‘end’ keyword with the one in the definition (arch-identifier in your case).

#25

Yeah, VHDL was based upon Ada, so they’re going to share very similar code structures.

Ultimately, no, I did not want to try to do single token at a time matches. It would be extremely tedious and the number of branching paths due to optional structures would turn it into a nightmare.
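
For reference, here is roughly what the token-at-a-time approach looks like for just the architecture header. This is a sketch with made-up context and scope names, not something from VHDL-Mode; it only shows why the number of contexts grows quickly once optional parts are involved.

```yaml
%YAML 1.2
---
# Hypothetical sketch; names and scopes are illustrative only.
name: Token Chain Sketch
scope: source.example
contexts:
  main:
    - match: '(?i)\barchitecture\b'
      scope: keyword.other.example
      push: architecture-name

  # Each context expects exactly one token. Contexts stay on the stack
  # across line breaks, so the tokens may sit on separate lines.
  architecture-name:
    - match: '\w+'
      scope: entity.name.type.example
      set: architecture-of
    - include: bail-out

  architecture-of:
    - match: '(?i)\bof\b'
      scope: keyword.other.example
      set: architecture-entity
    - include: bail-out

  architecture-entity:
    - match: '\w+'
      scope: entity.other.inherited-class.example
      set: architecture-is
    - include: bail-out

  architecture-is:
    - match: '(?i)\bis\b'
      scope: keyword.other.example
      pop: true
    - include: bail-out

  # Anything unexpected abandons the chain without consuming text.
  bail-out:
    - match: '(?=\S)'
      pop: true
```

That is four extra contexts for one fixed sequence of five tokens, before any optional clause is considered.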

I opted for trying to stay with ‘expected usage’ which does require someone to stick to a coding style to some degree, but it’s one they should probably ALREADY be following, so I didn’t feel like it was too onerous a task. If someone wants to create hideous code, they can do so without my help :wink:

If you want to see what I’ve done, you can take a peek at https://github.com/Remillard/VHDL-Mode

Sorry, it’s late and I keep picking up various things in your post. In several instances where I wished to have a declaration and body scope, I lose the capability to detect a mismatched end clause identifier. Without a way to temporarily save a capture, after you push the declaration context and then set the body context, you lose the capture. I haven’t tried it because it’s theoretically “bad” practice to stack scopes for a structure, but I have to wonder: if there were an overall meta.block context for a structure, with subcontexts for declaration and body, it might still be possible to capture the end clause. Plus this makes expand selection to scope work a little better. But according to scope naming practice, you don’t keep layering scopes for this reason. (Of course, you can do whatever you like.)
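
An untested sketch of that outer-context idea. It assumes that a backreference in a pop pattern refers to the captures of the match that pushed that context, and it only handles the full "end architecture name" form. All context and scope names are hypothetical, and it does layer the body scope on top of the block scope.

```yaml
%YAML 1.2
---
# Hypothetical, untested sketch; names and scopes are illustrative only.
name: End Identifier Sketch
scope: source.example
contexts:
  main:
    # Capture 2 is the architecture identifier.
    - match: '(?i)\b(architecture)\s+(\w+)\b'
      captures:
        1: keyword.other.example
        2: entity.name.type.example
      push: architecture-block

  architecture-block:
    - meta_scope: meta.block.architecture.example
    # Assumption: \2 refers to capture 2 of the match that pushed
    # this context, so this only matches the same identifier.
    - match: '(?i)\bend\s+architecture\s+\2\b'
      scope: keyword.other.example
      pop: true
    # Any other identifier after 'end architecture' is a mismatch.
    - match: '(?i)\bend\s+architecture\s+\w+'
      scope: invalid.illegal.mismatched-identifier.example
      pop: true
    - match: '(?i)\bbegin\b'
      scope: keyword.other.example
      push: architecture-body

  architecture-body:
    - meta_content_scope: meta.block.architecture.body.example
    # Hand control back to the outer context without consuming 'end'.
    - match: '(?i)(?=\bend\s+architecture\b)'
      pop: true
```

The outer context is never replaced by a set, which is what would keep the capture reachable when the end clause is finally matched.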

Anyhow, I am reasonably satisfied with the lexing results I was able to achieve. It’s not as robust as a compiler per se where I’d be doing full tokenizing, but this is for an editor and like I said, we care about creating maintainable expressive code.

#26

If someone wants to create hideous code, they can do so without my help.

haha

Is it bad practice to have many scope levels? Does it slow the editor down? I was really hoping to follow the Ada BNF production rules and I would expect that would result in A LOT of scope levels.

After learning a bit more about this YAML stuff, it still sounds like it’s not going to do what I hoped it would. It doesn’t sound like it’s going to let me identify syntax problems without A LOT of effort and planning and then the result, I feel, would not be maintainable for when new features are added to the language. I created an Ada syntax highlighter for Notepad++ in about 15 minutes. I’m not sure spending 200 hours on this is going to get me anything better without new features like the ones I proposed. It sounds like you discovered the same thing.

I’ll take a look at your VHDL-Mode syntax. Thanks for sharing it.

#27

Well, the YAML is just a slightly different format (and less cumbersome) than the TextMate XML file. The basic lexical searching is the same; same context and scope stacks and that sort of thing. I don’t know about Notepad++ but we’re not just coloring keywords here, we’re actually putting in lexical structure in the background that (theoretically) can be leveraged to do other neat things.

Just sort of depends on what neat things you want to do I guess.

You could do just a keyword coloration in probably about 15 minutes or less because it’s quite easy to put in a list of those and just have it scope them as keyword.other.ada and bam, you’re done. And maybe that’s enough!
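
Something along these lines would be the whole file. This is a sketch: the keyword list is a small subset picked for illustration, and the file extensions are an assumption.

```yaml
%YAML 1.2
---
# Hypothetical sketch; the keyword list is a small illustrative subset.
name: Ada (keywords only)
file_extensions: [adb, ads]
scope: source.ada
contexts:
  main:
    # Ada is case-insensitive, hence the (?i) flag.
    - match: '(?i)\b(begin|end|is|procedure|function|package|if|then|else|elsif|loop|return)\b'
      scope: keyword.other.ada
```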

As for scoping, it’s not really a matter of speed or number of scopes, just I was going by the Scope Naming best practices in Sublime’s documentation. When you get way down in a structure, there are a lot of layered scopes.

So, just as a for instance, given this VHDL code (which should look a lot like Ada)

entity foobar is
			
end entity foobar;

architecture rtl of foobar is

begin
	
	MY_PROCESS : process (clk, reset)
	begin
		if (reset = '1') then
			a <= b;
			c <= d;
		elsif rising_edge(clk) then
			if (ce = '1') then
				e <= f;
			end if;
		end if;
	end process MY_PROCESS;
end architecture rtl;

If I put the cursor down at the e <= f; line I have the following scope:

vhdl-mode: source.vhdl meta.block.architecture.body.vhdl meta.block.process.body.vhdl meta.block.if.body.vhdl meta.block.if.body.vhdl meta.statement.assignment.signal.vhdl keyword.operator.assignment.vhdl 

That’s pretty fine grained, and that’s what I wanted to have. I think the nesting scope naming rules are to keep a single lexical structure from generating too many scopes. Where you see meta.block.architecture.body.vhdl, honestly I could have probably gone with meta.block.architecture.vhdl meta.group.architecture.body.vhdl and subscoped the architecture structure. Same with process there, because it too has a declaration portion and a body portion (even though this example didn’t use it). The relevant passage is in the scope naming document (https://www.sublimetext.com/docs/3/scope_naming.html):

The entire scope of a function should be covered by one of the following scopes. Each variant should be applied to a specific part, and not stacked. For example, meta.function.php meta.function.parameters.php should never occur, but instead the scopes should alternate between meta.function.php then meta.function.parameters.php and back to meta.function.php.

I took this to read that I shouldn’t layer like I said. However, there is an issue with expand selection to scope: it starts popping scopes off the stack for selection, and doing that it’ll never get the entire meta.block, though I think Thom has a fix for that.
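
One way to get the alternation the scope naming document describes is clear_scopes, which removes scope names from the stack while a context is active. A sketch with made-up context and scope names, and not how VHDL-Mode does it:

```yaml
%YAML 1.2
---
# Hypothetical sketch; names and scopes are illustrative only.
name: Alternating Meta Scopes Sketch
scope: source.example
contexts:
  main:
    - match: '\bfunction\b'
      scope: keyword.other.example
      push: function

  function:
    - meta_scope: meta.function.example
    - match: '\('
      scope: punctuation.section.parameters.begin.example
      push: function-parameters
    - match: ';'
      scope: punctuation.terminator.example
      pop: true

  function-parameters:
    # Remove meta.function.example while this context is active so
    # the two meta scopes alternate instead of stacking.
    - clear_scopes: 1
    - meta_scope: meta.function.parameters.example
    - match: '\)'
      scope: punctuation.section.parameters.end.example
      pop: true
```

Once the closing parenthesis pops function-parameters, the text is back under meta.function.example alone.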

Hope that helps! I do think you can do quite a lot with the syntax and it’s possible to make it simple, or really elaborate, as you will.

For what it’s worth, I do have an Ada book here. I don’t use it, but I have it mainly because it tickled me to own the starting baseline for VHDL. Point being, I can probably help out a bit if you still want to take on the syntax file task.

#28

That’s not the whole thing though. I mean yes, the old TextMate syntaxes also operate on a stack, but all contexts must be nested cleanly. You cannot replace the current context on the stack with another one. You cannot push multiple contexts at once. You cannot have more than a single pop pattern. You cannot have that pop pattern at any position other than either the first or the last pattern of a context.
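
A sketch of those three things in .sublime-syntax form: pushing several contexts at once, replacing the current context with set, and having more than one pop pattern anywhere in a context. Context and scope names are made up for illustration.

```yaml
%YAML 1.2
---
# Hypothetical sketch; names and scopes are illustrative only.
name: Stack Features Sketch
scope: source.example
contexts:
  main:
    # Push two contexts at once. The last one listed ends up on top,
    # so the sensitivity list is matched before the process body.
    - match: '(?i)\bprocess\b'
      scope: keyword.other.example
      push: [process-body, process-sensitivity-list]

  process-sensitivity-list:
    # Replace the current context instead of nesting another one.
    - match: '\('
      scope: punctuation.section.group.begin.example
      set: process-sensitivity-list-body
    # No list present: pop without consuming anything.
    - match: '(?=\S)'
      pop: true

  process-sensitivity-list-body:
    - meta_scope: meta.group.example
    - match: '\)'
      scope: punctuation.section.group.end.example
      pop: true

  process-body:
    - meta_content_scope: meta.block.process.body.example
    # More than one pop pattern, at any position in the context.
    - match: '(?i)\bend\s+process\b'
      scope: keyword.other.example
      pop: true
    - match: '(?i)\bbegin\b'
      scope: keyword.other.example
    - match: '(?i)(?=\bend\s+architecture\b)'
      pop: true
```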

Nothing more to say about this. You don’t need to use more than a single context (main) if you just want to highlight keywords, operators, numeric literals, strings et al. Highlighting will be very inaccurate, but the syntax will be really easy to look at, understand and modify.