Removing Bad Words

Looking for suggestions on how to handle bad words that might
get passed in through $_GET['item'] variables.

My first thoughts included using str_replace() to strip out such
content, but then one ends up looking for characters that wrap
around the stripped characters and it ends up as a recursive
ordeal that fails to identify a poorly constructed $_GET['item']
variable (when someone hand-types the item into the line and
makes a simple typing error).

So the next thoughts involved employing a list of good words
and if any word in the $_GET['item'] list doesn't fall into the
list of good words, then an empty string gets returned.

Any suggestions on how to handle this?


Jim Carlock

Re: Removing Bad Words

Kimmo Laine wrote:
Unless next_page.php generates PHP, the script with this include will
only get HTML.

    if (isset($_GET['foo'])) {
      echo '<?php echo $_GET[\'foo\']; ?>';
    } else {
      echo '<?php echo \'Not available\'; ?>';
    }

Re: Removing Bad Words

Jim Carlock wrote:
You will have to implement "fuzzy logic" that can filter not
only "badword" but also "b a d w o r d", "b@d word", "b*dword", etc.

Although you should be able to catch some of those, the best filter is still  
the human moderator...
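
A minimal sketch of the normalization idea above, not a complete filter. The function names and the character map are assumptions made for illustration. It lowercases the input, maps a few look-alike characters to letters, and strips everything else before searching for a listed word.

```php
<?php
// Hypothetical helper: collapse common obfuscations before comparing.
function NormalizeForFilter($sText) {
    $sText = strtolower($sText);
    // map a few look-alike characters to letters (illustrative, not exhaustive)
    $aMap = array('@' => 'a', '0' => 'o', '1' => 'i', '$' => 's', '3' => 'e');
    $sText = strtr($sText, $aMap);
    // drop everything that is not a letter, so "b a d w o r d" becomes "badword"
    return preg_replace('/[^a-z]/', '', $sText);
}

// Hypothetical helper: true if any listed word appears in the normalized text.
function ContainsBadWord($sText, $aBadWords) {
    $sNorm = NormalizeForFilter($sText);
    foreach ($aBadWords as $sBad) {
        if (strpos($sNorm, $sBad) !== false) { return true; }
    }
    return false;
}
```

This catches "b a d w o r d" and "b@d word", but "b*dword" still slips through because the wildcard hides the letter. Substring matching can also flag innocent words that happen to contain a listed word, which is one more reason a human moderator remains useful.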


Re: Removing Bad Words

Jim Carlock wrote:

Jim, not knowing your requirements or what the website will be used for makes it  
a little difficult to give you a solution.  Would a drop-down list of acceptable  
words be better than expecting the user to type them correctly?

That being said, if you type as badly as I do, you have probably made all of teh  
tpying errors most commonly seen.  Including a str_replace() for all of those  
examples would not be that difficult - better yet include it into a javascript  
and let the client-side handle the word-corrections (onclick or onsubmit).

I have worked with several products (OS and database) that will auto-correct  
some commands like: eixt = EXIT  or comit=COMMIT etc... Digital TOPS10/20 OS  
that ran on the KL10/20 systems (36bit - circa mid 70's early 80's) would prompt  
you for a yes/no to:
did you mean [whatever the correct spelling of the command is]  Pretty cool for  
its day...

Re: Removing Bad Words

Jim Carlock wrote:
"Michael Austin" replied:
Well a drop down list will go into the making for some things, but
anyone can edit the line of text in the address-bar. And so instead
of filtering for bad words, I'm looking for suggestions on how to
parse through a list of good words (stored inside an array) and if
any of the words in the address bar fail to match any of the words
in the array, the individual gets routed to a
bad-word page (the website homepage). I see a database as a
very useful option but I'm working with PHP arrays at the
moment. The database will be the future, but for the moment, I
think an array of 200 possible words might work very well.

Just need an effective way to compare a word to a list of words
inside an array and return true if it matches, false if it fails.

My thoughts include:

function IsValidWord($sCheckThis) {
 global $aWords;
 foreach($aWords as $sWord) {
  if ($sWord === $sCheckThis) { return true; }
 }
 return false;
}

So I'm looking for any other suggestions.

The list of words is to remain on the server, so JavaScript in this
case, seems to be an invalid option. Any mistyped words are to
route the client to the homepage, or perhaps present the page in
question with no selections selected. Either/or seems appropriate
in this case.
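
A rough sketch of the routing described above, assuming the item arrives as space-separated words. The sample list, the function name AllWordsValid() and the redirect target are placeholders, not part of the original code.

```php
<?php
// Assumed sample whitelist; the real list would hold about 200 words.
$aWords = array('red', 'green', 'blue');

// Returns true only if every space-separated word is in the list.
function AllWordsValid($sItem, $aGoodWords) {
    $aParts = preg_split('/\s+/', strtolower(trim($sItem)), -1, PREG_SPLIT_NO_EMPTY);
    // an empty request is treated as invalid
    if (count($aParts) == 0) { return false; }
    foreach ($aParts as $sPart) {
        if (!in_array($sPart, $aGoodWords, true)) { return false; }
    }
    return true;
}

// Usage in a page script (left commented out so the sketch has no side effects):
// $sItem = isset($_GET['item']) ? $_GET['item'] : '';
// if (!AllWordsValid($sItem, $aWords)) { header('Location: /'); exit; }
```

Because the check runs on the server, the list never reaches the client, and anything hand-typed into the address bar that is not on the list leads to the same place.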


Jim Carlock

Re: Removing Bad Words

The function you need is in_array() although an associative array would
be more efficient. E.g.

$good_hash = array(
  'good' => true,
  'better' => true,
  'best' => true,
);

if(!array_key_exists(strtolower($word), $good_hash)) {
  // word is not in the list: reject it here
}

Re: Array Storage: Lowercase Versus Mixed-case [Topic was: Removing Bad Words]

On 23 Feb 2006 00:29:48 GMT,
$good_hash = array(
  'good' => true,
  'better' => true,
  'best' => true,
);

if(!array_key_exists(strtolower($word), $good_hash)) {
  // word is not in the list: reject it here
}

Thanks, Chung. It seems like it's best to store everything inside the
array as lowercase and then fill in some appropriate variables for.

I initially started out with mixed-case arrays. For example:

// array of states
function Create_USA_States_Array() {
 $aStates = array(
 array("Alabama", "AL"),
 array("Alaska", "AK"),
 array("Arizona", "AZ"),
 array("Arkansas", "AR"),
 array("California", "CA"),
 array("Colorado", "CO"),
 array("Connecticut", "CT"),
 array("Delaware", "DE"),
 array("Florida", "FL"),
 array("Georgia", "GA"),
 array("Hawaii", "HI"),
 array("Idaho", "ID"),
 array("Illinois", "IL"),
 array("Indiana", "IN"),
 array("Iowa", "IA"),
 array("Kansas", "KS"),
 array("Kentucky", "KY"),
 array("Louisiana", "LA"),
 array("Maine", "ME"),
 array("Maryland", "MD"),
 array("Massachusetts", "MA"),
 array("Michigan", "MI"),
 array("Minnesota", "MN"),
 array("Mississippi", "MS"),
 array("Missouri", "MO"),
 array("Montana", "MT"),
 array("Nebraska", "NE"),
 array("Nevada", "NV"),
 array("New Hampshire", "NH"),
 array("New Jersey", "NJ"),
 array("New Mexico", "NM"),
 array("New York", "NY"),
 array("North Carolina", "NC"),
 array("North Dakota", "ND"),
 array("Ohio", "OH"),
 array("Oklahoma", "OK"),
 array("Oregon", "OR"),
 array("Pennsylvania", "PA"),
 array("Rhode Island", "RI"),
 array("South Carolina", "SC"),
 array("South Dakota", "SD"),
 array("Tennessee", "TN"),
 array("Texas", "TX"),
 array("Utah", "UT"),
 array("Vermont", "VT"),
 array("Virginia", "VA"),
 array("Washington", "WA"),
 array("Washington, D.C.", "DC"),
 array("West Virginia", "WV"),
 array("Wisconsin", "WI"),
 array("Wyoming", "WY"));
 return $aStates;
}

The function established to return a state name works as follows:

// this function is incomplete
// PURPOSE: RETURN statename from parameter passed in
// INPUT: City-State String, OPTIONAL default string
// RETURNS: empty string if invalid parameter requested
// $sDS represents default state name to return
// $sCS = $_GET['citystate'];
// "Charlotte NC" or "Charlotte North Carolina" or "Charlotte" or
// "usertyped garbage"
function GetStateNameFromCityState($sCS, $sDS = "") {
 $sStateAbbr = trim($sCS);
 $iLen = strlen($sStateAbbr);
 // first check to see if empty string
 if ($iLen < 2) { return($sDS); }
 // GetStateFromAbbr() is not shown in this post; it is assumed to
 // return the state name for a valid abbreviation, FALSE otherwise
 if (GetStateFromAbbr($sStateAbbr)) {
  // a valid abbreviation was passed in
  return(GetStateFromAbbr($sStateAbbr));
 }
 $aStates = Create_USA_States_Array();
 // possible state name in parameter so check for a state name,
 // before checking against abbreviations
 foreach ($aStates as $aState) {
  // state name: $aState[0]
  if (stristr($sStateAbbr, $aState[0]) !== FALSE) {
   // return state name
   return($aState[0]);
  }
 }
 // no valid statename found, so start abbreviation checks
 // first determine if there's an abbreviation present
 // explode(separator, string to separate)
 $aWords = explode(" ", $sStateAbbr);
 $yAbbrFound = FALSE;
 // check for abbreviations
 foreach ($aWords as $sWord) {
  if (strlen($sWord) == 2) {
   // assume a 2-letter word represents a state abbreviation
   $sStateAbbr = $sWord;
   $yAbbrFound = TRUE;
  }
 }
 if (!$yAbbrFound) {
  // no abbreviation to check, so return empty string
  return($sDS);
 }
 // now validate abbreviation found
 foreach ($aStates as $aState) {
  // now check against abbreviations
  if (stristr($sStateAbbr, $aState[1]) !== FALSE) {
   // return state name in proper formatting
   return($aState[0]);
  }
 }
 // return empty string when it all fails (default state)
 return($sDS);
}

Haven't fully tested the user-typed garbage being passed in, but
my question specifically involves configuring the state array, and
alternative suggestions for this.

Note, that the above function actually returns what's found inside
the predefined array, rather than what's found in the address-bar.
This in effect, should get me words proper for HTML presentation,
where I don't have to mess with capitalizing ALL state abbrev's,
or capitalizing the first word of anything.

I still need to test the code above some more, so if anyone happens
to catch a flaw please point it out.

And again back to the question in the topic... "Lowercase Versus
Mixed-case" words inside the array that holds the states and state
abbreviations. Anyone here that knows of a better way to do this?
Another array might get created, as the list of targeted cities is over
100 right at the moment. To possibly identify each city to a proper

I plan on getting something going whereby a new array appears as

"city name", iStateNumber

"state number" represents an integer 0 to 50 (51 states).
Duplicate "city name"'s could exist, so the database, combines
the "city name" and the "state number" into an index. The "state
number" ends up being a pointer to the StateID in the State
database. So continuing along the lines of the indexed arrays,
as presented by Chung Leong, how would I go about indexing
such an array as above and would indexing be appropriate for

Thanks, Chung Leong. I did put the indexed array into play in
another function where the number of items is greater. I didn't
know how to work it into this particular array (or an array with
multiple fields with duplicate records).
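
One possible way to index such a list, offered as a sketch rather than a recommendation: key the array by lowercase city name and keep a list of state abbreviations under each key, so duplicate city names can coexist. The sample rows and the name BuildCityIndex() are made up for illustration.

```php
<?php
// Hypothetical sample data: "city name", state abbreviation
$aCities = array(
    array('Charlotte', 'NC'),
    array('Springfield', 'IL'),
    array('Springfield', 'MO'),
);

// Build an index of lowercase city name => list of state abbreviations.
function BuildCityIndex($aCities) {
    $aIndex = array();
    foreach ($aCities as $aCity) {
        // appending to a missing key creates the inner list automatically
        $aIndex[strtolower($aCity[0])][] = $aCity[1];
    }
    return $aIndex;
}

$aCityIndex = BuildCityIndex($aCities);
// isset($aCityIndex['springfield']) is true; its value lists both states
```

A lookup then becomes a single isset() on the lowercased input, and a city that maps to more than one state can be resolved with whatever state the request also supplied.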

Jim Carlock

Re: Array Storage: Lowercase Versus Mixed-case [Topic was: Removing Bad Words]

Jim Carlock wrote:

Just have the static array be in mixed case, then generate the other
one(s) programmatically:

$states = array(
  "AL" => "Alabama",
  "WY" => "Wyoming"
);

$state_hash = array_flip(array_map('strtolower', $states));
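
To show how the two arrays might be used together, here is a small sketch. LookupStateName() is a made-up name, and only two states are listed to keep it short. It accepts either a state name or an abbreviation in any case and returns the mixed-case name from the static array.

```php
<?php
$states = array(
  "AL" => "Alabama",
  "WY" => "Wyoming"
);

// lowercase name => abbreviation, generated from the static array
$state_hash = array_flip(array_map('strtolower', $states));

// Hypothetical helper: returns the properly cased state name, or "" if unknown.
function LookupStateName($sInput, $states, $state_hash) {
  $sKey = strtolower(trim($sInput));
  // try the input as a state name first
  if (isset($state_hash[$sKey])) { return $states[$state_hash[$sKey]]; }
  // then try it as an abbreviation
  $sAbbr = strtoupper($sKey);
  if (isset($states[$sAbbr])) { return $states[$sAbbr]; }
  return "";
}
```

Whatever the user typed, the value that reaches the page comes from the static array, so the capitalization is already correct for display.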

