Sublime Forum

Reformatting text - Out of my depth

#1

I apologize if this is simple or something I should figure out on my own, but I’m a bit out of my depth here. I have a frequency dictionary as a PDF and I’m trying to convert the entries into usable flashcards in Anki so I can study the words. There are several thousand, so changing them one by one would be tedious. When I paste the text I do not get the right format for Anki; instead I get this:

(number)(word in portuguese, pt) (type of word eg noun, tx) (word in english, en)
• (portuguese phrase, ptp) - (english phrase, enp).
number | number

simplified:
(n)(pt)(tx)(en)
•(ptp) - (enp).
n | n

I need to change it into this format:
(pt)
(ptp); (en)
(enp)

Using Excel and Word I can remove the “n | n” and roughly remove the starting number. I can also change the “(tx)” and “•” into some other symbol, say *, with the idea of having the “(en)” bracketed on either side, and insert the line breaks and the “;” at the same time. However, I have no idea how to specify “select the text between * and *” and then “move that text to the ‘-’”.

I hope this makes sense. I think I can do what I need using Sublime Text, but with no coding knowledge the learning curve seems to be a bit of a brick wall at the moment. Any suggestions or advice would be helpful! Thanks!

#2

It sounds like a job for regular expressions, which are like find-and-replace on steroids.

Usually I’m a sucker for this kind of thing, but I’m slightly harried at the moment, so I’ll offer pointers rather than the solution.

You can use Sublime Text for this job by opening the find-and-replace panel and enabling the regular expression button. (There are other tools as well; this doesn’t really have anything to do with coding.)

To get a grip on how to write these, just google “regular expressions help” or the like.

Hope this gets you started,
Alex

#3

Thanks! Regular expressions seem to be exactly what I need! I’m still completely out of my depth, but at least now I know which ocean I’m in.

I’m having a bit of trouble understanding how they go together, though. I have a number between 100 and 5000 at the beginning of each entry, followed by a word (e.g. 543pessoa), and I want to delete those numbers. I tried (\d{3,4}), but that also selects numbers in other parts of the text. Adding ^ to make (^\d{3,4}) excluded most of those; however, some lines still start with numbers by chance because of the flow of the text. I figured out how to select those numbers, as they are followed by a space, with (^\d{3,4} ), but I can’t work out how to exclude them. In other words, I can’t seem to find, through Google or otherwise, a ‘not’ operator for regex.

On a side note, I can exclude them by including the letter after the number, with (^\d{3,4})[a-z,A-Z], but I only want that letter as an identifier of the number, not to actually be selected and then removed. Is that possible?

Once I figure this out, the next step is to figure out how to select text and move it. I’m trying to learn Portuguese and it seems like I’m going to end up learning regex instead :P

#4

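Use a capture group. For your search use ^\d{3,4}([a-zA-Z]) and for your replace use $1, which puts the captured letter back. (The comma in [a-z,A-Z] is not needed; inside a character class it matches a literal comma.)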
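If it helps to try the idea outside the editor, here is a minimal Python sketch of the same capture-group replacement, alongside a lookahead variant that answers the "identify but don't select" question directly. The sample lines are made up for illustration, and the sketch assumes every entry word starts with an unaccented letter; words beginning with accented letters would need a wider character class.

```python
import re

# Hypothetical sample lines; the real dictionary text may look different.
text = "543pessoa nf person\n1998 was a good year\n1200casa nf house"

# Capture-group version: match the number plus the first letter, then
# write the letter back with \1 (Sublime's replace field spells it $1).
with_group = re.sub(r"^\d{3,4}([a-zA-Z])", r"\1", text, flags=re.MULTILINE)

# Lookahead version: (?=...) only checks that a letter follows, without
# making it part of the match, so the replacement can be left empty.
with_lookahead = re.sub(r"^\d{3,4}(?=[a-zA-Z])", "", text, flags=re.MULTILINE)

print(with_group)
```

Both versions leave the line starting with "1998 " untouched, because the number there is followed by a space rather than a letter. re.MULTILINE makes ^ match at the start of every line rather than only at the start of the whole text.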

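For the remaining "select text and move it" step, a rough sketch of the whole conversion follows. It rests on several assumptions that are not confirmed by the thread: each entry spans exactly three lines, the Portuguese word and the word-type tag are single tokens without spaces, and the two phrases are separated by " - " with spaces on both sides. The sample text and the names ENTRY and to_card are hypothetical.

```python
import re

# Assumed entry shape (hypothetical sample, three lines per entry):
#   543pessoa nf person
#   • a pessoa chegou cedo - the person arrived early.
#   45 | 1234
ENTRY = re.compile(
    r"^\d{3,4}(?P<pt>\S+) (?P<tx>\S+) (?P<en>.+)\n"  # number, word, type, English
    r"• ?(?P<ptp>.+?) - (?P<enp>.+?)\.?\n"           # bullet, phrase - phrase.
    r"\d+ \| \d+$",                                  # trailing "n | n" line
    re.MULTILINE,
)

def to_card(m):
    # Rearrange the captured pieces into the requested layout:
    # (pt) / (ptp); (en) / (enp). The type tag (tx) is dropped.
    return f"{m['pt']}\n{m['ptp']}; {m['en']}\n{m['enp']}"

sample = (
    "543pessoa nf person\n"
    "• a pessoa chegou cedo - the person arrived early.\n"
    "45 | 1234\n"
    "1200casa nf house\n"
    "• a casa é grande - the house is big.\n"
    "12 | 980\n"
)

cards = ENTRY.sub(to_card, sample)
print(cards)
```

Each named group corresponds to one of the placeholders from the first post, so "moving" text amounts to writing the groups back in a different order. The same pattern should carry over to Sublime Text's replace field by referring to the groups by number ($1, $3, and so on), though entries that break the assumed shape will simply be left unchanged and need a second look.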