Remove any non-word


Before I write my own, I'm wondering if anyone has something already
written that removes everything not a word from a string.  By "not a word",
I mean HTML, special characters, and punctuation.  Digits allowed.


Karl Groves

Re: Remove any non-word

On 2006-11-12, Karl Groves wrote:

   Presuming a *nix OS, with bash or ksh93, to remove all characters
   that are not letters or numbers:
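A minimal sketch of such a command, assuming bash or ksh93 and their `${var//pattern/replacement}` substitution. The function name `strip_nonalnum` is hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: delete every character that is not an ASCII letter or digit.
# strip_nonalnum is a hypothetical helper name, not an existing command.
strip_nonalnum() {
    # [!a-zA-Z0-9] matches any one character outside the listed ranges;
    # the doubled slash replaces every match, and the empty replacement deletes it.
    # Using $1 directly (no `local`) keeps the function valid in ksh93 too.
    printf '%s\n' "${1//[!a-zA-Z0-9]/}"
}

strip_nonalnum 'Hello, world! 2006-11-12'
```

With the sample string this prints `Helloworld20061112`. In a UTF-8 locale, writing the pattern as `[![:alnum:]]` would also keep non-ASCII letters.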


   If you want to remove all tags as well, use sed:

printf "%s\n" "$str" | sed -e 's/<[^>]*>//g' -e 's/[^a-zA-Z0-9]//g'
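The command above also deletes spaces, so the words run together (`Hello world` becomes `Helloworld`). If the aim is to keep the words themselves separate, a sketch that turns each tag, and each run of other non-alphanumeric characters, into a single space might look like this. The function name `strip_to_words` is hypothetical, and `LC_ALL=C` is an added assumption so that the bracket ranges match ASCII letters and digits only:

```shell
#!/bin/sh
# Sketch only: strip tags, then collapse every run of characters that are
# not letters or digits into one space, so words stay separated.
# strip_to_words is a hypothetical helper name, not an existing command.
strip_to_words() {
    printf '%s\n' "$1" |
        LC_ALL=C sed -e 's/<[^>]*>/ /g' \
            -e 's/[^a-zA-Z0-9][^a-zA-Z0-9]*/ /g' \
            -e 's/^ //' -e 's/ $//'
    # 1st expression: each tag becomes a space (so "one<br>two" -> "one two").
    # 2nd expression: each run of non-alphanumerics becomes a single space.
    # 3rd/4th expressions: trim the leading and trailing space, if any.
}

strip_to_words '<p>Hello, <b>world</b>!</p> 42'
```

For the sample input this prints `Hello world 42`. Like the one-liner above, it works line by line, so a tag split across two lines is not removed.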

   Chris F.A. Johnson
   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
