Is it possible to customize the behavior of regex metacharacters for character classes like that?
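Use re.U (short for re.UNICODE) so that \w and the other character-class shorthands match according to Unicode character properties, like this: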
re.compile(r"\w{1,}", re.U)
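To see what the flag changes, here is a minimal sketch that assumes Python 3, where str patterns are already Unicode-aware by default (so re.U is effectively a no-op there and re.ASCII is the opt-out). The sample text and variable names are made up for illustration.

```python
import re

# Sample text with non-ASCII letters, written with escapes to avoid
# encoding surprises: "naïve café über"
text = "na\u00efve caf\u00e9 \u00fcber"

# With re.U, \w matches any Unicode word character.
unicode_words = re.compile(r"\w+", re.U).findall(text)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_], so accented
# letters split the words.
ascii_words = re.compile(r"\w+", re.ASCII).findall(text)

print(unicode_words)
print(ascii_words)
```

If you need a class that neither flag gives you, the usual fallback is to spell it out explicitly, for example [A-Za-z0-9_] or a bracket expression listing the extra characters you want.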