Search Engine Filtering



Search engines have always had a relationship with filtering; in effect, search engines filter indexed results to match your search queries. Search engines have de-listed specific content in the past. Google removed links to anti-Scientology websites and continues to remove links to unauthorized copies of Kazaa. Both of these were due to DMCA complaints. Search engines in China filter politically sensitive content; the results are excluded from search query results even though the search engine has indexed them. Most search engines have “safe” modes that users can turn on that will attempt to exclude pornographic content, or content the filter thinks is pornographic. And most commercial filtering applications have optional keyword filters that will prevent searches for specific words and phrases. Some are calling for search engines to filter “high risk key words or combinations of words” that could be used to search for child pornography. At least one search engine, Ask Jeeves, has implemented a keyword filtering system based upon lists generated by the Internet Watch Foundation, a partnership between the government, police and the internet service provider industry to combat the distribution of child abuse images online.

From a researcher’s point of view, one of the biggest problems with such filtering, as stated by many others, is independent review. Because accessing such content, even for research, could be illegal, it is difficult to independently review the filtering mechanisms in place to see how they work, if they work, and if there is collateral blocking — blocking content the filter never intended to block. It would be interesting to see how specific (or not specific) these words and phrases are, and if there is a strong relationship between the keywords searched for and the results returned. I wonder if the search engines would just filter out the results returned, or if they would instead disallow searches for such keywords? Or would they execute searches for these keywords on their own databases and just purge the URLs returned from their indexes?
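To make those three options concrete, here is a toy sketch of each mechanism against a miniature inverted index. Everything here is invented for illustration — the blocked term, the URLs, and the index structure are hypothetical, not how any real search engine is implemented — but it shows how the choice of mechanism affects what a user sees, including the collateral blocking discussed above.

```python
# Toy inverted index: term -> set of URLs. All keywords and URLs are invented.
INDEX = {
    "blockedterm": {"http://example.com/a", "http://example.com/b"},
    "news":        {"http://example.com/b", "http://example.com/c"},
}
BLOCKED = {"blockedterm"}  # hypothetical stand-in for an IWF-style keyword list

def search(query):
    """Naive AND search over the toy index."""
    sets = [INDEX.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

def disallow_query(query):
    """Mechanism 1: refuse the search outright if it contains a blocked keyword."""
    if any(t in BLOCKED for t in query.lower().split()):
        return None  # query rejected; user gets no results at all
    return search(query)

def filter_results(query):
    """Mechanism 2: run the search, then suppress any URL that a blocked
    keyword would also have returned."""
    tainted = set().union(*(INDEX.get(t, set()) for t in BLOCKED))
    return search(query) - tainted

def purge_index():
    """Mechanism 3: search for the blocked keywords internally and purge the
    returned URLs from the index entirely."""
    tainted = set().union(*(INDEX.get(t, set()) for t in BLOCKED))
    for term, urls in INDEX.items():
        INDEX[term] = urls - tainted
```

Note the collateral effect: `http://example.com/b` matches both the blocked term and the innocuous query “news”, so under mechanisms 2 and 3 it vanishes from the “news” results too — exactly the kind of over-blocking that is hard to detect without independent review.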

Pending an independent review, I would suspect that, given the problems of overblocking associated with filtering URLs/domains and keyword filtering (and here), there would be considerable collateral filtering in the case of search engine filtering.
