Sublime Forum

Crawl files gobbling up memory on Linux

#1

My system starts to run out of memory when it’s been up for a while. Usually I just reboot, but I did some investigation today. It turns out that my /dev/shm filesystem, which is a tmpfs, is using 19G! (The default config lets it use up to 50% of memory, and my system has 64G.) /dev/shm contains over 100K files named *crawl* taking up 99.97% of the space.

Their names contain what looks like a PID, and the PID in the most recent ones matches the currently running Sublime. However, there are thousands with an older PID that are left over from previous runs of Sublime, and when I quit the current Sublime the files it was using aren’t cleaned up. When I start Sublime again a new crop of crawl files appears in /dev/shm. Some of them start with sem. and are 32 bytes long, and the rest are 524288 bytes long.

This seems like a pretty serious problem. Is this a bug in Sublime or in my Linux distro? I’m on an x86_64 Linux machine running NixOS Unstable.
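
For anyone who wants to check their own machine, something like this quick sketch totals up crawl files whose leading PID is no longer running (the filename pattern is just what I see here; adjust as needed):

# Tally /dev/shm crawl files whose leading PID is no longer alive.
# Assumes the "<pid>crawl..." / "sem.<pid>crawl..." naming seen on my machine.
import os
import re

PATTERN = re.compile(r"^(?:sem\.)?(\d+)crawl")

def pid_alive(pid):
    try:
        os.kill(pid, 0)          # signal 0 = existence check only
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but owned by someone else

stale_count = 0
stale_bytes = 0
for name in os.listdir("/dev/shm"):
    m = PATTERN.match(name)
    if m and not pid_alive(int(m.group(1))):
        stale_count += 1
        stale_bytes += os.path.getsize(os.path.join("/dev/shm", name))

print(f"{stale_count} stale crawl files, {stale_bytes / 2**20:.1f} MiB")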

0 Likes

#2

What version of ST are you using? This bug was fixed in build 4170.

0 Likes

#3

I’m using 4192.

0 Likes

#4

Is indexing working for you? What’s in the indexing status under Help > Indexing Status…?

0 Likes

#5

Yes, indexing is working, and there’s lots of activity in the status window.

0 Likes

#6

The crawler (which does indexing) unlinks the shared memory it uses immediately upon being launched. Unless you’ve somehow blocked the unlink through some security mechanism, I don’t see how indexing could both work and leak.
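
For illustration, the pattern described is roughly this (a generic sketch of unlink-after-map, not the actual crawler code): once the name is unlinked the mapping keeps working, and the kernel reclaims the tmpfs space when the last mapping goes away, even if the process dies later.

# Generic unlink-after-map sketch; the name below is made up.
import mmap
import os

path = "/dev/shm/unlink-demo"
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o700)
os.ftruncate(fd, 524288)
buf = mmap.mmap(fd, 524288)
os.close(fd)

os.unlink(path)        # the name is gone; nothing can be leaked from here on

buf[:5] = b"hello"     # the mapping itself is still fully usable
print(bytes(buf[:5]))
buf.close()            # last mapping gone -> kernel frees the tmpfs pages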

0 Likes

#7

I’m on NixOS, which does some unusual things. It could be a problem with my glibc. I’ll look into it (maybe with strace) and get back to you.

I temporarily limited the size of my /dev/shm filesystem to 1GB and then both Sublime and Slack started crashing.

0 Likes

#8

I did an strace and it shows that there’s no attempt to unlink the shared memory file. I can provide a fuller trace extract, but the only mentions of one particular left-over file are:

916110 unlink("/dev/shm/915244crawl4r0") = -1 ENOENT (No such file or directory)
916110 openat(AT_FDCWD, "/dev/shm/915244crawl4r0", O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0700) = 201
916110 ftruncate(201, 524288) = 0
916110 mmap(NULL, 524288, PROT_READ|PROT_WRITE, MAP_SHARED, 201, 0) = 0x7fabbef80000
916110 close(201) = 0
916110 clone3({flags=CLONE_VM|CLONE_VFORK|CLONE_CLEAR_SIGHAND, exit_signal=SIGCHLD, stack=0x7fabbf264000, stack_size=0x9000}, 88) = 916113
916113 execve("/nix/store/20sd8x85vxm2s895d7iywqj82b013bm2-sublimetext4-bin-4192/.sublime_text-wrapped", ["/nix/store/20sd8x85vxm2s895d7iyw"..., "--crawl", "915244crawl4s0", "915244crawl4r0", "915244", "/home/user/.config/sublime-te"..., "/home/user/.cache/sublime-tex"..., "/nix/store/20sd8x85vxm2s895d7iyw"...], 0x7fffda6757c0 /* 78 vars */) = 0
916110 munmap(0x7fabbef80000, 524288) = 0

So the parent process creates the file, sets its length, mmaps it, closes the descriptor, passes the name to the crawler subprocess on its command line, and munmaps it. There’s no sign that either process attempts to unlink it, and I used strace -f so the tracing included all subprocesses.
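
As a rough Python re-creation of that sequence (illustrative only; the name is copied from the trace and the child command is just a stand-in), the parent has to leave the name in place for the child to open, so if the child never gets as far as unlinking it, the file simply stays in /dev/shm:

# Rough re-creation of the parent-side sequence from the strace above.
import mmap
import os
import subprocess

path = "/dev/shm/915244crawl4r0"             # name taken from the trace
fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL | os.O_CLOEXEC, 0o700)
os.ftruncate(fd, 524288)
buf = mmap.mmap(fd, 524288)                  # parent's view of the shared buffer
os.close(fd)

# Stand-in for the real crawler command line ("sublime_text --crawl ...").
child = subprocess.Popen(["sleep", "0.1"])

buf.close()                                  # parent munmaps its view
child.wait()

print(os.path.exists(path))                  # True: only an explicit unlink removes the name
os.unlink(path)                              # clean up after the demo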

However, the child process doesn’t seem to do much:

916113 rt_sigprocmask(SIG_BLOCK, NULL, ~[KILL STOP], 8) = 0
916113 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
916113 execve("/nix/store/20sd8x85vxm2s895d7iywqj82b013bm2-sublimetext4-bin-4192/.sublime_text-wrapped", ["/nix/store/20sd8x85vxm2s895d7iyw"..., "--crawl", "915244crawl4s0", "915244crawl4r0", "915244", "/home/user/.config/sublime-te"..., "/home/user/.cache/sublime-tex"..., "/nix/store/20sd8x85vxm2s895d7iyw"...], 0x7fffda6757c0 /* 78 vars */) = 0
916113 brk(NULL ) = 0x5634f1701000
916113 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1f6ab07000
916113 access("/etc/ld-nix.so.preload", R_OK) = -1 ENOENT (No such file or directory)
916113 openat(AT_FDCWD, "/etc/sane-libs/glibc-hwcaps/x86-64-v3/librt.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
916113 newfstatat(AT_FDCWD, "/etc/sane-libs/glibc-hwcaps/x86-64-v3/", 0x7ffc5f2cf260, 0) = -1 ENOENT (No such file or directory)
916113 openat(AT_FDCWD, "/etc/sane-libs/glibc-hwcaps/x86-64-v2/librt.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
916113 newfstatat(AT_FDCWD, "/etc/sane-libs/glibc-hwcaps/x86-64-v2/", 0x7ffc5f2cf260, 0) = -1 ENOENT (No such file or directory)
916113 +++ killed by SIGKILL +++

So it’s certainly not unlinking the file right after starting, but perhaps the problem is that it’s never getting a chance to do that.

It’s odd that it’s getting a SIGKILL, which prevents it from running any kind of cleanup before it terminates. I’d have expected a SIGTERM or SIGINT. I exited Sublime from the GUI with File > Quit, so there shouldn’t have been any emergency termination, and I watched the indexing progress in the GUI and waited until it had been idle for a while before quitting. The kill is definitely coming from the parent process, though:

916110 kill(916113, SIGKILL) = 0
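
Just to illustrate the general point (a generic sketch, nothing Sublime-specific): a child killed with SIGKILL never reaches its cleanup code, so anything it was going to unlink stays behind.

# Demonstrates that SIGKILL bypasses any cleanup a child intended to do.
# The file name is made up for the demo.
import os
import signal
import subprocess
import sys
import time

path = "/dev/shm/sigkill-demo"

child_src = f"""
import os, time
try:
    open({path!r}, "w").close()   # stand-in for the shared memory file
    time.sleep(60)                # never finishes its work...
finally:
    os.unlink({path!r})           # ...so this never runs on SIGKILL
"""

proc = subprocess.Popen([sys.executable, "-c", child_src])
time.sleep(0.5)                    # let the child create the file
proc.send_signal(signal.SIGKILL)   # cannot be caught or handled
proc.wait()

print(os.path.exists(path))        # True: the file is left behind
os.unlink(path)                    # tidy up after the demo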

I should run the trace again with timestamps, so I can see how long it executes before being killed.

0 Likes

#9

Running with timing enabled shows that the subprocess is being killed when it’s been running for only a few ms. One was 6.25 ms and another was 0.8 ms. In both cases the subprocess hadn’t even finished loading its shared libraries.

Strangely, I am seeing indexing progress messages in the status window.

0 Likes

#10

The SIGKILL indicates that the crawler is getting stuck (spending more than 10 seconds on a file) and getting killed by the main process. I could see a leak happening if the subprocess takes more than 10 seconds to get to opening the shared memory, but if that’s the case then something else has seriously gone wrong.
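
For illustration, a per-process version of that watchdog pattern looks roughly like this (a generic sketch, not the actual implementation):

# Generic watchdog sketch: give the worker a fixed budget, then SIGKILL it.
# The real check is described as per-file; this simplified version is per-process.
import subprocess

CRAWL_TIMEOUT = 10.0   # seconds

def run_with_watchdog(cmd):
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=CRAWL_TIMEOUT)
    except subprocess.TimeoutExpired:
        proc.kill()    # SIGKILL on POSIX, so the worker gets no chance to clean up
        proc.wait()
        return -9

print(run_with_watchdog(["sleep", "0.1"]))   # finishes normally -> prints 0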

Note that anytime this happens it gets logged in the indexing status. Are you seeing that?

0 Likes

#11

No, I’m not seeing anything unusual in the indexing status.

Also, the kill is happening within 6 ms of the subprocess being forked, so the crawler isn’t getting stuck. It doesn’t even have time to finish loading its shared libraries before it’s killed.

0 Likes

#12

Could you share the crawler logs?

0 Likes

#13

I’m not sure exactly what you mean by the crawler logs, but here’s a copy of the text shown in the Indexing Status window. On a freshly-booted system, I opened the window, turned indexing on, waited for the activity to go back to idle, copied the text and turned indexing off.

index "hexdump-lazy" collated in 0.0002s from 6 files
index "hexdump-lazy" is using 45860 bytes for 192 symbols across 649 locations
index "CheckDuplicates" collated in 7.5e-05s from 1 files
index "CheckDuplicates" is using 3520 bytes for 47 symbols across 78 locations
indexing [job 3]: spawning 1 workers to process 1 files
indexing [job 8]: spawning 2 workers to process 7 files
indexing [job 4]: spawning 1 workers to process 1 files
indexing [job 4]: indexed 0 files in 0.000949578s
index "Service..." collated in 0.0066s from 421 files
index "Service..." is using 1090985 bytes for 2726 symbols across 160937 locations
indexing [job 6]: spawning 8 workers to process 84 files
indexing [job 8]: indexed 7 files in 0.092739s
index "nixpkgs..." collated in 0.0027s from 457 files
index "nixpkgs..." is using 629776 bytes for 4328 symbols across 19359 locations
indexing [job 3]: indexed 1 files in 0.100368s
index "Mathematics..." collated in 0.0012s from 114 files
index "Mathematics..." is using 299974 bytes for 2376 symbols across 14675 locations
indexing [job 6]: indexed 84 files in 0.153424s
index "project-1" collated in 0.032s from 1317 files
index "project-1" is using 6298550 bytes for 42399 symbols across 563196 locations
indexing [job 1]: spawning 2 workers to process 5 files
indexing [job 1]: indexed 0 files in 0.00110455s
index "project-2" collated in 0.056s from 7563 files
index "project-2" is using 16874250 bytes for 66085 symbols across 229089 locations
indexing [job 7]: spawning 11 workers to process 124712 files
indexing [job 7]: indexed 124677 files in 32.4544s
index "Documents" collated in 0.72s from 99013 files
index "Documents" is using 69555452 bytes for 382594 symbols across 4536202 locations

The /dev/shm filesystem contains these files, before and after I disabled indexing:

$ ls -lA /dev/shm/
total 8780
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl1r0
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl1r1
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl1s0
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl1s1
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl4r0
-rwx------ 1 user     user      524288 Apr  1 11:12 22742crawl4s0
-rw-rw-rw- 1 user     user     4706304 Mar 25 13:27 aja-shm-debug
-rw-rw-rw- 1 gdm      gdm         4096 Mar 25 11:57 lttng-ust-wait-8
-rw-r----- 1 user     user        4096 Mar 25 11:57 lttng-ust-wait-8-1000
-rw-r----- 1 gdm      gdm         4096 Mar 25 11:57 lttng-ust-wait-8-132
-rw------- 1 postgres postgres 1048576 Mar 25 11:57 PostgreSQL.3237847608
-rw------- 1 postgres postgres   26976 Mar 25 11:57 PostgreSQL.4290351888
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1r0:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1r0:w
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1r1:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1r1:w
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1s0:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1s0:w
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1s1:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl1s1:w
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl4r0:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl4r0:w
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl4s0:r
-rwxr-xr-x 1 user     user          32 Apr  1 11:12 sem.22742crawl4s0:w

It looks like the files are being left behind by jobs 1 & 4, and those are the ones that indexed 0 files in a very short amount of time.

I have 11 windows and 9 projects open, so I’m surprised there isn’t more indexing activity. The log mentions only 6 of the 9 project names.

0 Likes

#14

Those jobs seem to have in common that they report N files to process but actually index 0. I believe I’ve found a possible cause of that: if calling stat64 fails for all files being crawled, we end up spawning the subprocess, skipping all files, and then immediately killing it. This would certainly leak the shared memory and semaphores.

We’ve got some ways to mitigate this, but it’s still curious that those stat calls fail.
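
One guard along those lines would be to skip the worker entirely when nothing can be stat'ed, so no shared memory is allocated for a job that will do nothing (just a sketch of the idea, not the actual change):

# Sketch of the guard implied above; not Sublime's actual code.
import os

def statable_files(paths):
    ok = []
    for p in paths:
        try:
            os.stat(p)
            ok.append(p)
        except OSError:
            pass       # unreadable or vanished file: the crawler would skip it anyway
    return ok

paths = ["/tmp/does-not-exist-1", "/tmp/does-not-exist-2"]   # example inputs
if not statable_files(paths):
    print("nothing to crawl; skip spawning the worker and its shared memory")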

3 Likes

#15

I looked back at the strace logs, and I don’t see any failed stat calls.

However, I did notice something interesting today. I quit Sublime and restarted it with strace. I cleared out all the *crawl* semaphores and shared memory files from /dev/shm before restarting it. I enabled indexing and it ran for quite a while. Afterwards, there were no files left behind in /dev/shm. Also, none of the indexing log entries showed indexed 0 files. This is the first time I’ve seen this since I first noticed the problem.

I wondered if the problem had somehow cured itself, so I quit Sublime and ran the same test again. This time, indexing finished very quickly, and there were three sets of files left, corresponding to the job numbers of three indexing entries that said indexed 0 files.

I’m wondering if there’s some logic that figures out there’s nothing new to index and kills the crawler subprocess before it has even started, so it has no chance to clean up its files.

1 Like

#16

I learned from the strace output that Sublime is using LevelDB files in ~/.cache/sublime-text-3/Index. I poked around in there and found 59,462 .ldb files dating back to December 2023. Together they’re taking up 125GiB. There are 846 files from today with a total size of 1.8GiB. They are in two groups with timestamps corresponding to the two indexing runs. There are 839 from the first (longer) run taking up 1.8GiB, and 7 from the second run taking up 7.4MiB. There are also LOG files corresponding to the two runs. The earlier one has 8,531 lines and the second 27. I’ve placed an archive of those here.
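
For reference, the totals above can be reproduced with a quick script like this (just a convenience sketch; the path is the one strace showed on my machine):

# Tally the LevelDB files in the Sublime index cache.
from pathlib import Path

index_dir = Path.home() / ".cache/sublime-text-3/Index"
ldb_files = list(index_dir.rglob("*.ldb"))
total = sum(f.stat().st_size for f in ldb_files)
print(f"{len(ldb_files)} .ldb files, {total / 2**30:.1f} GiB")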

Since this is a cache directory I assume I could just delete it. However, I won’t disturb anything until you ask me to.

0 Likes

#17

Thanks. Yeah, feel free to delete anything in the cache directory.

I’m wondering if there’s some logic that figures out there’s nothing new to index and kills the crawler subprocess before it has even started, so it has no chance to clean up its files.

In addition to fixing the issue with the failed stats, I’ve also added a failsafe cleanup after the crawler. The fix should ship in the next build.
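
For the curious, a parent-side failsafe can look roughly like this (a generic sketch using the naming seen in /dev/shm above, not the actual change):

# Sketch of a parent-side failsafe: after the crawler exits, remove any names
# it left behind. Field meanings in the name are my guess from the listing above.
import os

def cleanup_crawl_names(pid, job, channel, index):
    base = f"{pid}crawl{job}{channel}{index}"          # e.g. "22742crawl4r0"
    for name in (base, f"sem.{base}:r", f"sem.{base}:w"):
        try:
            os.unlink(os.path.join("/dev/shm", name))
        except FileNotFoundError:
            pass       # already unlinked by the crawler: nothing leaked

cleanup_crawl_names(22742, 4, "r", 0)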

3 Likes

#18

Very good to hear! Thank you.

1 Like