Sublime Forum

Is there a plugin to find all "bad words" or "swear words" in a document?

#1

Hi all,

I have an EPIC word list (20,000-plus words) that I need to strip all the bad words out of: words like F#$# and even things like BDSM, rape, etc. Words that are just bad form. It’s for a hangman game I’m making, and I just can’t have kids getting “RAPE” as a word they have to guess.

I can’t possibly do it manually. I started, went cross-eyed, and thought “This is what computers are for!” Hence I’m here asking. I was considering writing a Python script to handle it, but before going down that road I thought I’d ask whether one already exists.
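For reference, a script like that could be quite short. This is a minimal sketch, assuming you already have a blocklist of bad words as plain strings; the function name `strip_bad_words` and the demo words are illustrative, not from any existing plugin.

```python
def strip_bad_words(words, blocklist):
    """Return `words` minus any entry that appears in `blocklist`.

    Matching is case-insensitive and whole-word only, so the blocklist
    entry "rape" removes "RAPE" but keeps "grape". Order is preserved.
    """
    blocked = {b.strip().lower() for b in blocklist if b.strip()}
    return [w for w in words if w.strip().lower() not in blocked]


# Tiny in-memory demo; a real run would read both lists from text files.
words = ["apple", "RAPE", "grape", "Hangman"]
blocklist = ["rape", "bdsm"]
print(strip_bad_words(words, blocklist))  # ['apple', 'grape', 'Hangman']
```

Whole-word matching is a deliberate choice here: substring matching would also throw out harmless words like “grape”. The hard part, as the rest of the thread shows, is getting a good blocklist in the first place.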

Thanks a heap.

Rob

#2

Do you have the list of the bad words?

#3

I think you’d be better off using a script. FYI: Perl’s CPAN has a package, Bad::Words.

#4

It turns out this was the big issue: I couldn’t find a list that had many of the bad words. For example, “rape” isn’t necessarily a bad word in a lot of contexts, but it is in a kids’ game. So I had to abandon the idea.

I ended up:

  • heading to the Project Gutenberg list of children’s books
  • grabbing a collection of kids’ books in .txt format
  • replacing every space character with a carriage return, to put each word on its own line
  • removing all duplicate words via Permute Lines > Unique
  • searching and replacing all extra characters like !-()[]; etc. with nothing (deleting them)
    … and then I had a word list.
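The manual steps above can also be sketched in Python. This is an illustration, not what was actually run: it assumes the books are plain-text files, keeps letters only, and lowercases everything so “The” and “the” count once. The names `extract_words`, `build_word_list` and `build_from_files` are hypothetical.

```python
from pathlib import Path


def extract_words(text):
    """Split text on whitespace and keep only the letters of each token.

    Assumption: words are lowercased, and anything that is not a letter
    (punctuation, digits, apostrophes, hyphens) is deleted.
    """
    words = []
    for token in text.split():                        # the "space -> new line" step
        word = "".join(ch for ch in token if ch.isalpha())  # delete !-()[]; etc.
        if word:
            words.append(word.lower())
    return words


def build_word_list(texts):
    """Return the unique words across all texts, in first-seen order."""
    seen = {}
    for text in texts:
        for word in extract_words(text):
            seen.setdefault(word, None)               # dicts keep insertion order
    return list(seen)


def build_from_files(paths):
    """Build the list from plain-text book files (hypothetical paths)."""
    return build_word_list(Path(p).read_text(encoding="utf-8") for p in paths)


print(build_word_list(["The cat sat.", "(The) dog! sat, too"]))
# ['the', 'cat', 'sat', 'dog', 'too']
```

One difference from the manual order: punctuation is stripped before de-duplicating, so “sat,” and “sat” collapse into a single entry instead of both surviving. Hyphenated words get joined (“world-wide” becomes “worldwide”), and since Project Gutenberg files carry a license header and footer, those words land in the list unless you trim them first.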

Some of the words are a bit olde worlde but I’m happy with that.

A kids’ book collection has to have safe words in it, right?!

#5

Did you sanity-check the “good” words from Gutenberg? You could make sure that none of the words in Bad::Words are in your file.
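That check is easy to automate. Bad::Words is a Perl module and its interface isn’t covered here, so this sketch simply assumes you already have a blocklist as a list of strings (for example, read from a one-word-per-line text file). `sanity_check` is a hypothetical helper that separates exact matches, which should go, from longer words that merely contain a blocked word and need a human decision.

```python
def sanity_check(word_list, blocklist):
    """Compare a word list against a blocklist, case-insensitively.

    Returns (exact, partial): exact hits should be removed; partial hits
    contain a blocked word inside a longer word (e.g. "grape" contains
    "rape") and need manual review, since many are harmless.
    """
    blocked = {b.strip().lower() for b in blocklist if b.strip()}
    exact, partial = [], []
    for word in word_list:
        w = word.strip().lower()
        if w in blocked:
            exact.append(word)
        elif any(b in w for b in blocked):
            partial.append(word)
    return exact, partial


exact, partial = sanity_check(["Castle", "BDSM", "grape", "dragon"], ["bdsm", "rape"])
print("remove:", exact)    # remove: ['BDSM']
print("review:", partial)  # review: ['grape']
```

An empty `exact` list means the word list passed; the `partial` list is where false positives like “grape” show up.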
