Sublime Forum

Regex: Match some words between meta tags

#1

Hello. I have this tag:

<meta name="description" content="Reflect on how you &#WORD_1 so as to be convinced of the close connection that exists between your &#WORD_2 of values and the role you assume in world."/>

I want to match the words &#WORD_1 and &#WORD_2 in this particular meta tag, so I can replace them with a different word. How can I do this?

I tried this regex, but it doesn’t work:

(?s)(<meta name="description" content=)&#WORD_1|&#WORD_2(?=/>)

0 Likes

#2

Can you give an example of what the replaced text should be?

0 Likes

#3

Here is something to get you started:

(<meta\s+name=\"description\"\s+content=\"[^\&]+)(\&#WORD_1)(.*\"/>)
  • <meta match the literal string <meta
  • \s+ match one or more white spaces
  • name=\"description\" match the literal string name="description"
  • \s+ match one or more white spaces
  • content=\" match the literal string content="
  • [^\&]+ match one or more characters that are not &
  • (\&#WORD_1) capture the literal string &#WORD_1
  • .* match any character zero or more times
  • \"/> match the literal string "/>

The capture groups ${1} and ${3} are probably what you want to keep; capture group ${2} is the literal string &#WORD_1. So, to replace it, you can use

${1}foo${3}
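
For anyone who wants to try this outside the editor, here is a minimal Python sketch of the same search and replace. It assumes Python's re module handles this pattern the same way Sublime Text's regex engine does (it is a different engine), and it writes the group references as \g<1> and \g<3> because Python does not use the ${1} form.

```python
import re

# The sample tag from the first post.
tag = ('<meta name="description" content="Reflect on how you &#WORD_1 so as to be '
       'convinced of the close connection that exists between your &#WORD_2 of '
       'values and the role you assume in world."/>')

# The three-group pattern from this post, copied as-is.
pattern = re.compile(
    r'(<meta\s+name=\"description\"\s+content=\"[^\&]+)(\&#WORD_1)(.*\"/>)'
)

# Keep groups 1 and 3, and put "foo" where group 2 (&#WORD_1) was.
result = pattern.sub(r'\g<1>foo\g<3>', tag)
print(result)
```

Only &#WORD_1 is replaced here; &#WORD_2 is part of group 3 and is left untouched.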
0 Likes

#4

After the replace, both words, &#WORD_1 and &#WORD_2, should be replaced with WORD_3. If need be, I can match and replace them one by one: a search and replace for the first word, then another search and replace for the second word.

Thanks rwols, it works.

0 Likes

#5

And if I want to match a simple word like MOTHER instead of &#WORD_1, how should I change the regex?

0 Likes

#6

Well, I explained how the regex works, so you should be able to figure it out. In particular:

0 Likes

#7

Yes, I changed it to (<meta\s+name=\"description\"\s+content=\"[^\&]+)(\MOTHER)(.*\"/>) but it doesn’t work.

0 Likes

#8

The & character is escaped with a backslash “\”. You don’t have to escape the M in MOTHER, so just (MOTHER) should work.

By the way, you can use (\&#WORD_\d+) to capture WORD_1, WORD_2, and so forth.
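
A rough Python sketch of that \d+ idea, assuming Python's re module is close enough to Sublime Text's engine for these patterns. The loop is my own addition for illustration: with the three-group pattern, [^\&]+ cannot cross an "&", so each pass replaces only the first remaining placeholder.

```python
import re

# The sample tag from the first post.
tag = ('<meta name="description" content="Reflect on how you &#WORD_1 so as to be '
       'convinced of the close connection that exists between your &#WORD_2 of '
       'values and the role you assume in world."/>')

# \d+ matches one or more digits, so this covers &#WORD_1, &#WORD_2, and so on.
placeholder = re.compile(r'\&#WORD_\d+')
found = placeholder.findall(tag)
print(found)

# The same group dropped into the three-group pattern.
pattern = re.compile(
    r'(<meta\s+name=\"description\"\s+content=\"[^\&]+)(\&#WORD_\d+)(.*\"/>)'
)

# Repeat the replace until a pass changes nothing.
result = tag
while True:
    updated = pattern.sub(r'\g<1>WORD_3\g<3>', result)
    if updated == result:
        break
    result = updated
print(result)
```

In the editor, the equivalent would be pressing Replace All once per placeholder in the tag.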

0 Likes

#9

I also tried without the \, with only MOTHER, and it still doesn’t work. Please try it yourself.

0 Likes

#10

I found the answer:

Search:
(<meta name="description" content=".*)&#WORD_1(.*"\/>)

Replace with:
\1ANOTHER_WORD\2
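
The same search and replace as a small Python sketch, for checking it outside the editor. The <title> line is a made-up addition to show that a &#WORD_1 outside the meta tag is left alone; it is not from the original page. This assumes the tag sits on a single line, since "." does not match a newline by default.

```python
import re

# Hypothetical two-line document: a title line plus the meta tag from post #1.
html = (
    '<title>&#WORD_1 page</title>\n'
    '<meta name="description" content="Reflect on how you &#WORD_1 so as to be '
    'convinced of the close connection that exists between your &#WORD_2 of '
    'values and the role you assume in world."/>\n'
)

# Search pattern and replacement copied from this post.
search = r'(<meta name="description" content=".*)&#WORD_1(.*"\/>)'
replace = r'\1ANOTHER_WORD\2'

result = re.sub(search, replace, html)
print(result)
```

Because the first .* is greedy, a tag containing &#WORD_1 several times would have only its last occurrence replaced per pass.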

1 Like

#11

Just curious as to why the quotes and ampersand are escaped? I wouldn’t normally expect to have to escape these characters in a regex (inside or outside of a character class). Is this peculiar to the flavour of regex that Sublime Text uses?

0 Likes

#12

No, they don’t need to be escaped in ST. Someone is over-escaping things, I think :wink:
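
A quick way to see this for yourself, sketched in Python. Note that this checks Python's re module, not Sublime Text's engine, so treat it as an illustration only. The last part shows something specific to Python: a backslash before a letter, as in \M, is rejected there, which says nothing about how ST treats it.

```python
import re

text = 'content="you &#WORD_1 so"'

# Same pattern written with and without the extra backslashes.
escaped = re.compile(r'content=\"[^\&]+\&#WORD_1')
plain = re.compile(r'content="[^&]+&#WORD_1')

# Both forms match exactly the same text.
same = escaped.search(text).group(0) == plain.search(text).group(0)
print(same)

# Escaping a letter is another matter: Python refuses to compile \M.
try:
    re.compile(r'(\MOTHER)')
    letter_escape_ok = True
except re.error:
    letter_escape_ok = False
print(letter_escape_ok)
```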

1 Like

#13

Better safe than sorry! :stuck_out_tongue:

0 Likes

#14

For future readers of this thread: there is really no need to write super complicated regex patterns in ST, because you can “find in selection”. Just do a find, then narrow it down as many times as you need.

For instance, this example could be simplified to:

  1. <meta name="description" content="\K[^"]*
  2. &#WORD_1|&#WORD_2|MOTHER
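
The two finds above can be imitated in Python roughly like this. Two assumptions to flag: Python's re module has no \K, so a fixed-width lookbehind stands in for it, and the sample document is made up for the demo (the <title> line is there to show that words outside the meta tag are not touched).

```python
import re

# Hypothetical sample document, not taken from the original page.
html = (
    '<title>MOTHER &#WORD_1</title>\n'
    '<meta name="description" content="your MOTHER and &#WORD_1 and &#WORD_2"/>\n'
)

# Step 1: select only the value of content="...".
# The lookbehind plays the role of \K from the first find.
selection = re.compile(r'(?<=<meta name="description" content=")[^"]*')

# Step 2: the words to replace, searched only inside that selection.
words = re.compile(r'&#WORD_1|&#WORD_2|MOTHER')

# Run the step 2 replace on each step 1 match.
result = selection.sub(lambda m: words.sub('WORD_3', m.group(0)), html)
print(result)
```

Narrowing the search first keeps the second pattern short, which is the point of the find-in-selection approach.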
3 Likes
