Sublime Forum

Is there a way to find any special character that could be available in the character map?

#1

Hi all,

I have special characters taken from the Windows 10 Character Map, such as ²¼½¾●■, to give a simple example.

Is there any method to search for these via a regular expression?

I need to find them across 200 documents, so checking this manually would be a pain in the groin. I need to replace them with their JavaScript Unicode escape equivalents, you see.

If I can find them in one go, that would be great. Cheers.

0 Likes

#2

There are two ways you can do something like this, and both work in all of the find panels (Find, Find/Replace, Incremental Search, Find in Files, etc.).

The first would be to paste the character that you want to find directly into the search field in the Find panel.

You can also search for a Unicode code point by using the sequence \x{####}, where the #### is the Unicode code point for the character that you’re interested in looking for. This information is available from the Character Map tool when you hover the mouse over a character.

For example, the Unicode code point for Superscript Two is U+00B2, so you can search for it as \x{00B2}.

This also works in Find in Files, so you could use that mechanism to search a large group of files and do the replacement that way (though you would need to do it one character at a time).
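If the end goal is JavaScript escape sequences, the one-character-at-a-time replacement could also be scripted. The following is only a rough sketch: `toUnicodeEscapes` is a made-up helper name, not something from Sublime Text, and it uses JavaScript's own regex engine rather than the one in the find panels.

```javascript
// Hypothetical helper (not part of Sublime Text): replace every non-ASCII
// character in a string with its JavaScript escape sequence.
function toUnicodeEscapes(text) {
  // The u flag makes the regex work on whole code points, so a character
  // above U+FFFF is handled as one unit instead of two surrogates.
  return text.replace(/[^\x00-\x7F]/gu, (ch) => {
    const codePoint = ch.codePointAt(0);
    const hex = codePoint.toString(16).toUpperCase();
    // Characters up to U+FFFF use the 4-digit \uXXXX form; others use \u{...}.
    return codePoint <= 0xFFFF
      ? '\\u' + hex.padStart(4, '0')
      : '\\u{' + hex + '}';
  });
}

// Prints the text with ² and ½ written as \u00B2 and \u00BD.
console.log(toUnicodeEscapes('x² = ½'));
```

The four-digit hex value in each escape is the same code point that Character Map shows, e.g. U+00B2 becomes \u00B2.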

1 Like

#3

Thanks for the suggestion.

I could do them in one go via [²¼½¾●■], but that only works if I know all the others, which I don’t. Finding them is my first hurdle. Once I get a list, I’ll then do the replacements.
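One thing to watch with a character class: inside [...] the | is an ordinary character rather than "or", so the class only needs to list the characters themselves. A small sketch in JavaScript (the variable names are just for illustration, and JavaScript's regex engine is assumed to behave like Sublime's on this point):

```javascript
// Inside a character class, | is a literal pipe, not alternation.
const withPipes = /[²|¼|½]/;
const withoutPipes = /[²¼½]/;

console.log(withPipes.test('a|b'));    // true: the literal | matches
console.log(withoutPipes.test('a|b')); // false: no listed character present
console.log(withoutPipes.test('1½'));  // true: ½ is in the class
```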

I might try [^A-Za-z0-9] and keep adding any other characters I want to ignore, such as ()[]{}, so that it becomes [^A-Za-z0-9()\[\]{}], and work my way from there. That will match everything except the characters listed in the set.

I might also try [^A-Za-z0-9\p{Punct}] when I’m at the PC tomorrow.

0 Likes

#4

Ah I see; sorry, I misinterpreted your question as wanting to search for things you’re already aware of. 🙂

You could try something like [^\x00-\x7F] to find anything that’s not a standard ASCII character.
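To turn matches from that pattern into a list of distinct characters to replace, something along these lines might help. It is a sketch only: `listNonAscii` is a hypothetical helper, and it assumes the file contents have already been read into a string.

```javascript
// Hypothetical helper: collect each distinct non-ASCII character in a
// string, with its code point and how often it occurs.
function listNonAscii(text) {
  const counts = new Map();
  // The u flag keeps characters above U+FFFF together as one match.
  for (const ch of text.match(/[^\x00-\x7F]/gu) || []) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  // A Map keeps insertion order, so results follow first appearance.
  return [...counts].map(([ch, count]) => ({
    char: ch,
    codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
    count,
  }));
}

console.log(listNonAscii('a² + b² = ½■'));
```

For the sample string this reports ² (U+00B2) twice, and ½ (U+00BD) and ■ (U+25A0) once each.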

4 Likes

#5

Legend. That’s all I can say. The find results window has ground to a halt now 🤣

With your regex I got the following, which also picked up whitespace in some cases and the _ character:

72181 matches across 59 files

I had to change that slightly to [^\x00-\x7F_\s]

Now I have 3841 matches across 55 files
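One possible explanation for the whitespace matches is non-ASCII whitespace such as the no-break space (U+00A0), which sits outside \x00-\x7F but is covered by \s. The sketch below shows this in JavaScript's regex engine; that Sublime's engine classifies U+00A0 the same way is an assumption. A plain _ is U+005F and already falls inside \x00-\x7F, so a match that looks like an underscore is presumably a different character.

```javascript
const nonAscii = /[^\x00-\x7F]/;
const nonAsciiNonSpace = /[^\x00-\x7F\s]/;
const nbsp = '\u00A0'; // no-break space: whitespace, but not ASCII

console.log(nonAscii.test(nbsp));         // true: outside the ASCII range
console.log(nonAsciiNonSpace.test(nbsp)); // false: excluded by \s
console.log(nonAsciiNonSpace.test('²'));  // true: still finds real symbols
console.log(nonAscii.test('_'));          // false: a plain underscore is ASCII
```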

1 Like

#6

I’ve finally modified that to this:
\*/[\s\S]*[^\x00-\x7F_\s]

All of the files have non-ASCII characters in the comment block that I don’t want to pick up, so I changed the regex to only search after the closing */.

/*

Title of my program
  
This non-standard ASCII character won't get picked up →
Or this one →

*/
 
 → This is a non-standard ASCII character that is found after the multi-line comment.
 → This is the last non-standard ASCII character.

Math.countLinesNonBlank = function(str) {
  // Count the lines that contain at least one character; no eval needed.
  var matches = str.match(/^.+$/gm);
  return matches ? matches.length : 0;
};

console.log(Math.countLinesNonBlank('this is a line\r\nthis is another\r\nthis is the last.'));
console.log(Math.countLinesNonBlank('\r\n\r\nthis \r\n\r\n\r\n\r\n\r\nis a line\r\n\r\n'));
console.log(Math.countLinesNonBlank(''));
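The same "skip the comment, then search" idea can be sketched in a script. Because [\s\S]* is greedy, the pattern above should give a single match per file that runs from the first */ to the last matching character. The hypothetical `findNonAsciiAfterComment` below instead returns each character separately, and it assumes the first */ in the file closes the header comment.

```javascript
// Hypothetical helper: return every non-ASCII, non-whitespace character
// that appears after the first closing */ in a source string.
function findNonAsciiAfterComment(source) {
  const end = source.indexOf('*/');
  // If there is no comment terminator, search the whole string.
  const body = end === -1 ? source : source.slice(end + 2);
  return body.match(/[^\x00-\x7F_\s]/gu) || [];
}

const sample = '/*\nTitle →\n*/\n→ first\nvar x = 1; // ■\n';
console.log(findNonAsciiAfterComment(sample)); // [ '→', '■' ]
```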
0 Likes