Sublime Forum

Regex - Select and delete any links from documents

#1

Hello, I want to select and delete every link in a document, from http: up to the space (before alt), so that practically all the links are removed and only the text remains. How can I do this with a regex? Can anybody help?

This is an example of a link:

http://i1.wp.com/www.blablabla.com/wp-content/uploads/2010/12/frenchbla-bla-bla.jpg?resize=150%2C150" alt=

0 Likes

#2

Try this regular expression:

\bhttp://\w+.+[^ alt]\b
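
One thing to keep in mind with the expression above: `[^ alt]` is a character class, so it excludes the single characters space, `a`, `l` and `t` rather than the word "alt". A minimal sketch of a stricter alternative in Python follows, assuming a link always ends at the next whitespace character. The names `LINK`, `strip_links` and the surrounding sample text are made up for illustration. The same pattern should also work in Sublime Text's Find and Replace with regex mode enabled.

```python
import re

# Assumption: a link starts at "http://" and ends at the next whitespace.
LINK = re.compile(r"\bhttp://\S+")

def strip_links(text):
    """Return text with every http:// link removed."""
    return LINK.sub("", text)

# The URL is the one from post #1; "Some text" and "more text" are filler.
sample = ('Some text http://i1.wp.com/www.blablabla.com/wp-content/uploads/'
          '2010/12/frenchbla-bla-bla.jpg?resize=150%2C150" alt= more text')
print(strip_links(sample))
```

Because `\S+` stops at the first space, the trailing quote is removed together with the link and the `alt=` part is left in place.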

0 Likes

#3

It works, thanks Subiswonder. But how can I delete http, https, and href at the same time?

And, if possible, I would also like to delete the links that have href,

just like this example:

href="blablabla/3-000-oameni-cursurile-prim-ajutor.html#respond">

I wrote a simple regex for this one, https://regex101.com/r/eB7zO2/1, but I don't know how to combine all three (http, https, href) in a single regex.

0 Likes

#4

Try this for links with the http or https protocol and for a tags with an href attribute:

(<a href[\s\S]*?>[\s\S]*?)|(\b(http|https):\/\/.*[^ alt]\b)
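
For anyone who wants to test the combined idea outside the editor, here is a rough Python sketch. It is not the expression above but a simplified variant, under two assumptions: an href attribute is removed from `href=` through the next `>`, and a bare URL runs up to the next whitespace. `PATTERN` and `clean` are hypothetical names.

```python
import re

# Assumptions: "href=...>" ends at the first ">", and a bare URL ends at
# the next whitespace. "https?" covers both http and https.
PATTERN = re.compile(r'href=[^>]*>|\bhttps?://\S+')

def clean(text):
    """Remove href attributes (through the closing ">") and bare URLs."""
    return PATTERN.sub("", text)

print(clean('href="blablabla/3-000-oameni-cursurile-prim-ajutor.html#respond">Reply'))
print(clean('a http://x.example/p b https://y.example/q c'))
```

The `href=` branch is listed first so that a URL inside an href attribute is removed together with the attribute rather than on its own.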

1 Like

#5

It works just fine, thanks.

0 Likes

#6

You are welcome.

0 Likes

#7

I made a quick regex to eliminate almost all HTML code and keep only the text from the page:

<script[\s\S]*?/script>[\s\S]*?|href=[\s\S]*?>[\s\S]*?|<br />|<ul[\s\S]*?/ul>[\s\S]*?|<li[\s\S]*?/li>[\s\S]*?|<style type[\s\S]*?/style>[\s\S]*?|<object[\s\S]*?/object>[\s\S]*?|(<a href[\s\S]*?>[\s\S]*?)|(\b(http|https):\/\/.*[^ alt]\b)|</ul>|</li>|<br/>|<!--[\s\S]*?-->[\s\S]*?|<div style[\s\S]*?>[\s\S]*?|<img[\s\S]*?>[\s\S]*?|<div id[\s\S]*?>[\s\S]*?|<div class[\s\S]*?>[\s\S]*?|</object>|<embed[\s\S]*?/>[\s\S]*?|<param[\s\S]*?/>[\s\S]*?|<noscript>[\s\S]*?</noscript>[\s\S]*?|<link rel[\s\S]*?>[\s\S]*?|<p style="text-align: center;">|<iframe[\s\S]*?</iframe>[\s\S]*?
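
If the goal is only to keep the text, a different technique from the regex above may be easier to maintain: letting an HTML parser walk the tags. The sketch below uses Python's standard `html.parser` module. The class name `TextOnly`, the function `html_to_text` and the choice of tags whose content is skipped are assumptions made for illustration, not something taken from this thread.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    # Assumption: text inside these elements should be dropped entirely.
    SKIP = {"script", "style", "noscript", "object", "iframe"}

    def __init__(self):
        super().__init__()
        self.parts = []      # collected text fragments
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Tags, attributes and comments never reach this method.
        if self.skip_depth == 0:
            self.parts.append(data)

def html_to_text(html):
    """Return the text content of an HTML string, without markup."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

print(html_to_text('<p>Hello <a href="x.html">world</a>'
                   '<script>var a = 1;</script><!-- c --></p>'))
```

This approach runs outside Sublime Text, so it fits best when the documents can be processed as files rather than edited by hand.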

0 Likes