Sublime Forum

**nguyenph88** · December 14, 2016, 6:24pm

Hi all,

I know there is a way to permute lines so that only the unique ones remain, but my question goes a little further than that. Suppose I have a list of URLs as below:

http://www.abc.com/123
http://www.def.com/223
http://www.abc.com/456
http://www.def.com/556
http://www.qwe.com/667

The first and third URLs are on the same domain (as are the second and fourth), but their paths differ. Is there any way to remove all the duplicates from the same domain and keep only the first one? I expect the result to be:

http://www.abc.com/123
http://www.def.com/223
http://www.qwe.com/667

Thank you.

**FichteFoll** · December 14, 2016, 8:57pm

I suggest writing a small script in a language of your choice (e.g. Python).
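A minimal sketch of what such a script could look like in Python, assuming the URLs are one per line and that "same domain" means the same host part of the URL. The function name `keep_first_per_domain` is made up for illustration. Unlike a sort-based approach, it keeps the original line order.

```python
from urllib.parse import urlsplit


def keep_first_per_domain(lines):
    """Keep only the first URL seen for each host, preserving input order."""
    seen = set()
    kept = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Assumption: two URLs are "duplicates" when their hosts match
        # (compared case-insensitively); the path is ignored.
        host = urlsplit(url).netloc.lower()
        if host not in seen:
            seen.add(host)
            kept.append(url)
    return kept


urls = [
    "http://www.abc.com/123",
    "http://www.def.com/223",
    "http://www.abc.com/456",
    "http://www.def.com/556",
    "http://www.qwe.com/667",
]
print("\n".join(keep_first_per_domain(urls)))
```

To run it on a real file, the list could be replaced with the lines read from that file.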

**nguyenph88** · December 14, 2016, 9:27pm

Thanks for your reply. I was just looking for a native answer that doesn't involve any other programming language.

**FichteFoll** · December 14, 2016, 9:59pm

If you are okay with going through each line manually, there is a possibility, but depending on the file size this would quickly become infeasible.

For that, you would select the host URL of a line (e.g. http://www.abc.com/), press Alt+F3, deselect the line you initially selected (using Alt+left-click drag) and then delete the remaining selected lines with Ctrl+Shift+K. Repeat this for every line.

**kingkeith** · December 15, 2016, 11:44am

Why not just sort it (Edit -> Sort Lines) and then perform a regex find and replace:

Find What: ^(http://[^/]+/)(.*$\n)((\1)(?2))+
Replace With: $1$2

**FichteFoll** · December 15, 2016, 1:23pm

That’s a good idea. I assumed the line order must not be changed and thus didn’t consider this option, but if reordering is acceptable, this would be a good way.

The only “problem” is that you need to run the replacement multiple times because of 1) potential overlaps when a host appears an odd number of times and 2) hosts that appear more than twice, which would need to be reduced by half again.

Edit: I didn’t notice the + at the end, so you can disregard this post.

**kingkeith** · December 15, 2016, 12:00pm

nope try it

http://www.abc.com/123
http://www.abc.com/456
http://www.def.com/223
http://www.def.com/556
http://www.def.com/602
http://www.ghi.com/700
http://www.ghi.com/731
http://www.qwe.com/667
http://www.qwe.com/667
http://www.qwe.com/667

turns into

http://www.abc.com/123
http://www.def.com/223
http://www.ghi.com/700
http://www.qwe.com/667

**FichteFoll** · December 15, 2016, 11:57am

http://www.abc.com/123
http://www.def.com/223
http://www.ghi.com/700
http://www.qwe.com/667
http://www.qwe.com/667

I used “Replace All” of course, since pressing “Replace” repeatedly for 1000 lines is not something to take lightly.

**kingkeith** · December 15, 2016, 12:00pm

oh I see, I should make the final newline optional:

^(http://[^/]+/)(.*$\n?)((\1)(?2))+
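For anyone who wants to check the behaviour outside the editor, here is a rough Python sketch of the same sort-then-replace idea. Python's `re` module has no `(?2)` subpattern call, so the second group is written out again as `.*$\n?`; this is an adaptation, not the exact pattern above. The function name `collapse_sorted_hosts` is made up, and `sorted()` is case-sensitive, which may differ from the editor's Sort Lines.

```python
import re

# Group 1: scheme + host, group 2: rest of the first line (newline optional).
# The non-capturing group then swallows every following line with the same host.
PATTERN = re.compile(r"^(http://[^/]+/)(.*$\n?)(?:\1.*$\n?)+", re.MULTILINE)


def collapse_sorted_hosts(text):
    """Sort the lines, then keep only the first line of each run of equal hosts."""
    lines = sorted(text.splitlines())
    return PATTERN.sub(r"\1\2", "\n".join(lines))


sample = """http://www.abc.com/123
http://www.abc.com/456
http://www.def.com/223
http://www.def.com/556
http://www.def.com/602
http://www.ghi.com/700
http://www.ghi.com/731
http://www.qwe.com/667
http://www.qwe.com/667
http://www.qwe.com/667"""

print(collapse_sorted_hosts(sample), end="")
```

Because the newline is optional, the last run is collapsed even when the text does not end with a line break.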

**nguyenph88** · December 15, 2016, 8:31pm

Thanks @kingkeith, your solution is what I needed.

Remove duplicated domains in text