Sublime Forum

How to insert character at specified column

#1

Good morning. New user here. Thanks for any help.

I have a column of numbers and letters, like this:
FH210390RT
HF910226RT
DD120139RO

I want to insert hyphens at the third and ninth columns, like this:
FH-210390-RT
HF-910226-RT
DD-120139-RO

Some of the columns are over 200 lines, although that probably doesn’t matter.
The first two and last two characters are always letters. The middle six characters are always numbers.

Any help is appreciated. Thank you.

#2

Maybe use a regex replacement if there are a lot of lines.

  • Ctrl+H to open the search & replace panel.
  • Make sure we are in REGEX mode (Alt+R)

regex search: ([a-zA-Z]{2})([0-9]{6})([a-zA-Z]{2})
replacement: \1-\2-\3

If you want to test your regex online, https://regex101.com is a good place to go.
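If you would rather script this than use the replace panel, the same search and replacement can be sketched in Python. This is only an illustration: the function name `add_hyphens` is made up here, and Python's `re` module stands in for Sublime Text's own regex engine (for a pattern this simple the two should behave the same).

```python
import re

# Same pattern as in the post: two letters, six digits, two letters.
DOCKET = re.compile(r"([a-zA-Z]{2})([0-9]{6})([a-zA-Z]{2})")


def add_hyphens(text):
    """Insert a hyphen between the three captured groups of every match."""
    return DOCKET.sub(r"\1-\2-\3", text)


# Sample lines taken from the first post.
sample = "FH210390RT\nHF910226RT\nDD120139RO"
print(add_hyphens(sample))
```

The replacement string `\1-\2-\3` is the same one you would type into the Replace field.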

#3

Thank you very much.
I’m not an expert in regex, but I have some experience. I gather the approach you took wasn’t really insertion at a specified column position … it looks like it depends on whether the characters are letters or numbers.

This worked very well.

FYI, these are docket numbers for court cases stretching over 25 years. Looks like we have about 8,000-10,000 lines to do, but certainly not all at once. We break it down into monthly collections of about 75 dockets each.

Again, thanks.

#4

That is correct. Also note that there are no word boundaries or line anchors in @jfcherng’s regex. It would replace every instance of 2-alpha-6-digit-2-alpha in the entire document regardless of the column position.

If you only want matches at the beginning of a line, use the regex ^([a-zA-Z]{2})([0-9]{6})([a-zA-Z]{2}), which ties each match to the beginning-of-line anchor.
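To see the difference the anchor makes, here is a small Python sketch. The sample text is invented for the demonstration. One caveat: in Python, `^` only matches at the start of every line when `re.MULTILINE` is set, so the flag is added here.

```python
import re

# The unanchored pattern matches a docket anywhere in the text.
UNANCHORED = re.compile(r"([a-zA-Z]{2})([0-9]{6})([a-zA-Z]{2})")
# re.MULTILINE makes ^ match at the start of each line, not only the start of the string.
ANCHORED = re.compile(r"^([a-zA-Z]{2})([0-9]{6})([a-zA-Z]{2})", re.MULTILINE)

# Made-up text with dockets both at the start of a line and mid-line.
text = "FH210390RT see also HF910226RT\nnote: DD120139RO"

unanchored_result = UNANCHORED.sub(r"\1-\2-\3", text)
anchored_result = ANCHORED.sub(r"\1-\2-\3", text)

print(unanchored_result)  # every docket gets hyphens
print(anchored_result)    # only the docket that starts a line gets hyphens
```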


A non-regex approach using multi-cursors (this will be slow with a large number of lines), assuming every line has the prefix:

  1. select all lines (Cmd/Ctrl+A)
  2. split selection into lines (Cmd/Ctrl+Shift+L)
  3. go to beginning of line (Cmd+left arrow / Home)
    • at this point, you have a cursor at the beginning of each line (including blank lines)
  4. right arrow, right arrow, type -
  5. right arrow 6 more times, type -
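The multi-cursor steps above do true column-based insertion by hand. For comparison, the same idea could be scripted roughly as follows. This is a sketch, not a Sublime Text feature: the helper name `insert_at_columns` is hypothetical, columns are counted from 1 in the original line, and lines too short to contain a column are left unchanged (a choice made here, not something the steps above guarantee).

```python
def insert_at_columns(line, columns, char="-"):
    """Insert `char` before each 1-based column of the original line."""
    # Work from right to left so earlier insertions do not shift later positions.
    for col in sorted(columns, reverse=True):
        if col <= len(line):
            line = line[:col - 1] + char + line[col - 1:]
    return line


def insert_in_text(text, columns, char="-"):
    """Apply insert_at_columns to every line of a multi-line string."""
    return "\n".join(insert_at_columns(l, columns, char) for l in text.split("\n"))


print(insert_in_text("FH210390RT\nHF910226RT\n\nDD120139RO", [3, 9]))
```

Unlike the regex, this does not check whether the characters are letters or digits; it only looks at position.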

Both approaches will work if the initial assumptions are met. Regular expressions can be a double-edged sword if you don’t know how to wield them well. It is difficult to catch every edge case in a given document, so be careful with regexes while you are getting familiar with them.

#5

Yeah. To write an accurate regex, we need an accurate description of the input format (or the full content) so we don’t over-replace something.


I suggest that @Lappert learn some basic regex syntax. It is helpful in many ways, especially when you are doing string searching or replacement. :smile: Regex is sometimes even used for parsing files, as in ST’s programming language grammar parsing engine.

#6

Again, thanks.

This has turned into a multi-year project. We start with a list of cases, with their docket numbers, from a pay web site. We use wget to grab the case pages. Each page merely has a description of the case, not the case itself. Each month results in about 70-80 files. Of those, we use grep to filter out the pages we don’t want, about 40-50%.

Then, in Sublime Text, I use regex to reduce each line down to just the docket number. This takes about 5-6 separate find/replace actions. That leaves the docket in AA123456ZZ format. Normal usage for these dockets has the hyphens, hence the insertion requested above.
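For what it's worth, the reduce-then-hyphenate step might be doable in one pass with a small script. The sketch below is a guess at the shape of the data, not the actual workflow: it assumes each line contains at most one docket in AA123456ZZ form, set off by word boundaries, and the function name `extract_dockets` and the sample lines are invented.

```python
import re

# Two letters, six digits, two letters, not embedded in a longer word.
DOCKET = re.compile(r"\b([A-Za-z]{2})([0-9]{6})([A-Za-z]{2})\b")


def extract_dockets(lines):
    """Return the hyphenated docket from each line that contains one."""
    dockets = []
    for line in lines:
        match = DOCKET.search(line)
        if match:
            # match.groups() is (prefix, number, suffix); join them with hyphens.
            dockets.append("-".join(match.groups()))
    return dockets


# Made-up lines standing in for the filtered pages.
lines = ["Docket FH210390RT filed", "no docket on this line", "ref DD120139RO pending"]
print(extract_dockets(lines))
```

Lines without a match are skipped, so it would be worth comparing the output count against the input before trusting it.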

Once we have the docket numbers, we insert them into a spreadsheet for tracking purposes, and then we request the actual cases from the government agency under the Freedom of Information Act (FOIA). About a month later, they send us PDFs of each case.

We have about 25 years of these to process. In the end, we’ll put the cases online for free access. People shouldn’t have to pay $50/month for basic information that is already in the public domain.
