I’ve written a plugin to help analyze a log file. The log file can be big, so the plugin needs to do some time-consuming operations; as a result, the plugin does a lot of caching. I’m wondering whether what I’m doing is a best practice, or if there is an example out there that I can pattern-match.
My current strategy is that all of my caches are contained in an object, and my central data structure is a dictionary. The keys of the dictionary are buffer_ids and the values are my big cache objects.
I have an EventListener subclass that implements on_load and on_modified (modifications to the log file are rare, so this is just to be safe). When either of those methods fire, it just pops the dictionary entry for that buffer_id. The caches will be regenerated if they are used.
My first issue is that on_modified is of course being called a lot. My plugin only works with buffers with a certain syntax, so it would be nice if the on_modified event never got generated for buffers with a different syntax. I don’t see a way to do that, though.
Right now, the caches are built lazily. If you try to use a particular feature, the plugin crunches on data until it can do what has been asked for (and the user waits), but then it caches the result in case the user asks for something similar again. It would be nice if the caches could be built in the background so the UI doesn’t pause. I’m very familiar with multi-threaded programming in general, but I’m new to python. In particular, I’m not sure how exactly the GIL is implemented (i.e., at what granularity thread switches can happen), or if that affects the necessary synchronization.
I assume that I can spawn a thread to start building data structures. I assume those structures would have to be locked so they aren’t used while they are being built, so the main thread might have to wait to acquire the lock, but once things are built, it should be very snappy.
Is there a primer or an example for how to build a data structure in a background thread for use in the main thread?
If you use a ViewEventListener instead of EventListener, you can tie the listener to the settings of a particular view. For example:
import sublime_plugin

class ExampleListener(sublime_plugin.ViewEventListener):
    @classmethod
    def is_applicable(cls, settings):
        return "Python.sublime-syntax" in (settings.get("syntax") or "")

    def on_modified(self):
        print("Modification in a python file: %s" % self.view.file_name())
In this listener, on_modified only gets triggered for Python files and not for other files. As view settings change, is_applicable() is re-invoked to see if the listener still applies, so it will create and drop listeners as needed.
If you weren’t using a specific syntax for your logs, you could apply any custom setting you want and check for that in is_applicable() instead. For example, you could have a regular EventListener whose on_load() checks the path or filename of the file and sets a setting on the view, which would make the ViewEventListener pick that view up as well.
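A sketch of that two-listener arrangement might look like this (the setting name and the `.log` filename check are just placeholders for whatever test you actually want to apply):

```python
import sublime_plugin

class LogDetectListener(sublime_plugin.EventListener):
    def on_load(self, view):
        # Hypothetical check: tag log files by filename so that the
        # ViewEventListener below picks them up via is_applicable().
        if (view.file_name() or "").endswith(".log"):
            view.settings().set("my_log_plugin", True)

class LogViewListener(sublime_plugin.ViewEventListener):
    @classmethod
    def is_applicable(cls, settings):
        # Only attach to views that were tagged above.
        return settings.get("my_log_plugin", False)

    def on_modified(self):
        print("Log file modified: %s" % self.view.file_name())
```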
I’m not sure of anything specific regarding this off the top of my head, but anything that’s out there for Python 3.3.6 should apply to the plugin ecosystem.
Thanks for your help. If I make it work that way, will a new event listener be created/destroyed if the syntax for a view is changed? Or do I need something else to trigger when that happens?
As for threads, it looks like sublime wants to create/manage threads itself using the various *_async() methods, rather than having a plugin explicitly create threads. Is the basic idea to call sublime.set_timeout_async() to start a thread (or use one of the other async calls in an EventListener) and then have the threads call sublime.set_timeout() when they want to run something on the main thread? Do the threads need any explicit management like waiting for them to finish, or is that handled by sublime?
Every time the settings in a view change, the is_applicable method will be called and given a settings object for you to inspect, and the return value indicates if that listener applies or not. As long as the function returns True, the event listener is either created or left alone. If you return False, the event listener isn’t created at all, or destroyed if it previously was.
So in the example above, as soon as the syntax changes the event listener gets dropped, and it will get created afresh when the syntax changes back (or is initially set). You can also override __init__ to know when you’re being created and __del__ to know when you’re being garbage collected, if you explicitly want to know that. The signature for __init__ is __init__(self, view) if you want to go that route.
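For instance, a minimal listener that reports its own lifecycle might look like this (the prints are just for illustration):

```python
import sublime_plugin

class LifecycleListener(sublime_plugin.ViewEventListener):
    @classmethod
    def is_applicable(cls, settings):
        return "Python.sublime-syntax" in (settings.get("syntax") or "")

    def __init__(self, view):
        # Fires when the listener is created for a matching view.
        super().__init__(view)
        print("listener created for view %d" % view.id())

    def __del__(self):
        # Fires when the listener is garbage collected after being dropped.
        print("listener dropped")
```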
Internally Sublime has two threads that it uses (that are exposed to us, anyway), the main thread and the async thread. The main thread is where the UI runs, and is the thread in which commands and non-async events run while the async thread is the thread where async events trigger.
Both sublime.set_timeout() and sublime.set_timeout_async() allow you to add an item to a callback list that should be executed “sometime later”. The main thread handles execution of items from set_timeout() and the async thread handles items from set_timeout_async().
The important thing is that existing threads are responsible for executing items in each list (no new threads are created), so:
sublime.set_timeout(lambda: time.sleep(5), 1000)
If you do this, then after a 1 second delay the Sublime UI will hang for 5 seconds because the main thread is pausing for that long; as such if you have some non-trivial code that you want to execute, this isn’t the way to go.
On the other hand:
sublime.set_timeout_async(lambda: time.sleep(5), 1000)
Now after a 1 second delay, the code still executes but the UI doesn’t hang, because it’s the async thread that executes the sleep instead; in this case the interactive responsiveness of Sublime isn’t really affected, but any _async event handler is going to be delayed 5 seconds instead.
If what you’re going to do is semi-trivial and short lived, then this is a good way to offload that to have it happen without the UI being interrupted. If you know what you’re going to do is take some serious amount of time, it’s sub-optimal because you’re hanging other things instead.
That said, if the plugin is only for your own use, that may not be an issue; in the larger ecosystem doing something like the above affects not only you but also any other plugin that might be expecting to do a tiny amount of work in the background in a timely manner.
If you really want to do some sort of long-lived work in the background, you need to spawn your own thread to do it for you just as you would in a “normal” Python program. For example:
import sublime
from threading import Thread

def do_work():
    print("Doing some work")
    # hand the completion notification back to the main thread
    sublime.set_timeout(lambda: print("Work is complete"))

def plugin_loaded():
    Thread(target=do_work).start()
When the plugin loads, it creates and starts a thread in the background that does a bit of work. That thread is handled by the Python runtime in the plugin host, so there’s no management needed on your part; once it’s done executing the Thread object gets garbage collected and goes away.
Depending on what your thread is doing, you may need to communicate back to something else that it’s completed now. Standard Python threading rules apply for how you’d do that, but this example uses set_timeout() as a cheap way to ensure that the final print gets handed off the main thread for execution.
I think I understand what’s going on, but debugging has exposed one more difficulty.
I have a ViewEventListener with an on_modified() method that kicks off regeneration of my data structures. The regeneration works like this:

1. Create a new thread object for the computation.
2. Inside a lock: if there is an existing thread computing results, send it an event telling it to quit. Also invalidate any existing data structures.
3. Call set_timeout_async() (with zero delay) to schedule the thread-management function.

The thread-management function:

1. Does a join to clean up after any old computation task that was asked to quit.
2. Calls start() on the new thread.
The point of this is that I want to mark my data structures as invalid in the main thread (so they won’t accidentally be used by subsequent commands), but do potentially time-consuming operations like a thread join in async.
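In stripped-down form the scheme looks roughly like this (helper names are invented for illustration; it relies on async callbacks running one at a time, in order, on the async thread, so an older thread is always started before a later callback tries to join it):

```python
import sublime
from threading import Thread, Event, Lock

_lock = Lock()
_worker = None   # thread for the current computation
_stop = None     # Event used to ask that thread to quit

def restart_computation(rebuild):
    """Runs on the main thread from on_modified()."""
    global _worker, _stop
    with _lock:
        if _stop is not None:
            _stop.set()          # ask the old worker to quit
        old = _worker
        _stop = Event()
        _worker = Thread(target=rebuild, args=(_stop,))
        new = _worker
        # ... invalidate the data structures here, still on the main thread ...
    # Defer the potentially slow join to the async thread.
    sublime.set_timeout_async(lambda: _manage(old, new), 0)

def _manage(old, new):
    if old is not None:
        old.join()               # clean up the previous computation
    new.start()
```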
What I notice is that as long as I only type one character at a time, everything behaves exactly as expected. If I type two characters quickly, however, sublime hangs.
Of course, any time we have threads and locks, it’s very easy to imagine a deadlock, but that doesn’t seem to be what’s happening here. I loaded everything up with prints, and what I see when I type two characters quickly is that the first on_modified() runs to completion and returns. And then, nothing else happens. Nothing of mine is running in the main thread, so I don’t think I could have hung the UI, yet the second on_modified() is never called, and the thread-management function that should have been scheduled on async never runs.
It looks to me like Sublime hangs if the buffer gets modified while the previous on_modified() call is still running. Testing this theory, if I change on_modified() to on_modified_async() and leave everything else alone, it all works fine.
Is there some gotcha with on_modified()? I don’t think on_modified_async() is a great replacement here because the modification invalidates my data structures, and I’m concerned that if it takes a while for on_modified_async to get called, the user might try to use the data structures in the meantime.
I don’t think there’s any gotcha with on_modified directly (that I remember anyway) other than modifying the buffer from within it will cause it to be called again right away, which can get you into some lockup type behaviour.
If you were using on_modified_async, then a potential problem might be that the event gets handled on the same thread where set_timeout_async() callbacks get executed, which might cause problems.
Apart from that, if you have a smallish example of code that doesn’t work that you can share, that might help in figuring out what’s going wrong.
Thanks again for the help. I’ve modified the code extensively since then, and I can’t recreate the behavior. Ultimately, I wound up writing a thread-management thread, so that all on_modified()/on_modified_async() needs to do now is push an entry into a queue. It made a lot of races easier to deal with, and the code now works fine whichever thread I use for on_modified.
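The queue-based manager boils down to something like this (simplified, with invented names):

```python
import queue
from threading import Thread

_jobs = queue.Queue()

def _manager():
    """Single long-lived worker: pulls rebuild jobs off the queue and runs
    them one at a time, so races between rebuilds can't happen."""
    while True:
        job = _jobs.get()
        if job is None:      # sentinel: shut the manager down
            break
        job()

Thread(target=_manager, daemon=True).start()

def request_rebuild(job):
    """All on_modified()/on_modified_async() needs to do is enqueue."""
    _jobs.put(job)
```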
As an experiment, I put a sleep in on_modified() to test the theory that typing something while the previous on_modified() was still running was the cause of the hang, but that doesn’t seem to be the case. I guess it’s just a mystery.