Posts tagged “Search Engines”

cDc’s Oxblood has a new tfile



My old friend Oxblood has a new tfile out. Read it here. I’ll add a couple of things. The first is that Yahoo! and MSN (live.com) censor their search engines for China as well. The second is that the results you get back for certain searches are heavily populated with content that is either hosted in China or ends in .cn (100% for “西藏独立”, Tibet independence, for example).

Microsoft has censored me



Well, Microsoft’s Chinese version of live.com is now censoring me (www.nartv.org).

Yahoo CN



I just noticed that Yahoo now provides a link at the bottom of their China (search.cn.yahoo.com) search engine in their generic “censor message” that links to a page that lists the URL’s t

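雅虎搜索结果均来自相关来源网站,根据有关法律法规和政策,部分搜索结果可能未予显示。
根据《信息网络传播权保护条例》未予显示的结果,请点击这里查看。

(Translation: “Yahoo search results all come from the relevant source websites. In accordance with relevant laws, regulations and policies, some search results may not be displayed. To view results not displayed under the Regulations on the Protection of the Right of Communication through Information Networks, please click here.”)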

The page appears to list sites that have been removed due to copyright violation. It does not list, for example, studentsforafreetibet.org, which simply returns zero results.

MSN’s live.com



MSN’s search engine at live.com detects the HTTP header “Accept-Language” and then sets your “market” via a cookie. Currently, there are three Chinese options: zh-cn (China), zh-hk (Hong Kong) and zh-tw (Taiwan). Your “market” will be set depending on which one of these your browser sends to the server. If your browser sends the more generic “zh” without specifying a region, live.com will default to zh-cn.
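The market selection described above can be sketched roughly as follows. This only illustrates the observed behaviour and is not live.com’s actual code: the function name `pick_market` and the constants are made up for the example, and for simplicity it takes the first Chinese tag in the header rather than honouring `q` weights.

```python
from typing import Optional

# The three Chinese markets named in the post; everything else is illustrative.
CHINESE_MARKETS = {"zh-cn", "zh-hk", "zh-tw"}
DEFAULT_CHINESE_MARKET = "zh-cn"  # what a bare "zh" appears to fall back to


def pick_market(accept_language: str) -> Optional[str]:
    """Return the Chinese market implied by an Accept-Language header, or None."""
    for item in accept_language.split(","):
        # Drop any ";q=..." weight and normalise case.
        tag = item.split(";")[0].strip().lower()
        if tag in CHINESE_MARKETS:
            return tag
        if tag == "zh":
            # Generic Chinese with no region: falls through to the China market.
            return DEFAULT_CHINESE_MARKET
    return None
```

Note that nothing in this logic looks at where the request comes from, which is the point of the comparison with Google that follows.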

Google, by contrast, uses geolocation by IP address to redirect you to your localized Google (e.g. if your IP address is allocated to Canada you’ll be directed to www.google.ca), with the exception of China, in which case you are redirected to the Chinese-language version of google.com. With live.com, if your default language setting is Chinese Simplified you will be redirected to the zh-cn version even if you are not in China. This is significant because the zh-cn version of live.com is the censored version for China. This means that people outside of China whose browsers are set to Chinese Simplified will receive, by default, the censored version of the live.com search engine.

You can, of course, go to the settings page and manually specify your market.

Still, this appears to be a problem because the English version of live.com seems to do a very poor job of indexing Chinese sites. I am not a Chinese speaker, so I would appreciate feedback on this. Also, are the HK and TW versions compatible (given simplified vs. traditional and so forth)? Is it sufficient to expect Chinese Simplified users to use the HK or TW versions of live.com?

Human Rights Watch Report



Human Rights Watch has released a report on Internet censorship. It particularly focuses on the role of U.S. corporations in censoring their products in order to enter the Chinese market. The report’s title, “Race to the Bottom”, sums up the situation quite well.

(A mirror copy for those who cannot access the hrw.org website is available here.[9.8mb .zip])

The report’s principal researcher and editor, Rebecca MacKinnon, did a great job of weaving together the filtering practices of Google, Yahoo!, MSN and Skype with reactions from the Chinese community, especially bloggers. Chapter 4 in particular is well worth the read. It really highlights the point that Internet censorship and surveillance is not merely an academic exercise; the choices these companies are making are having an effect on people’s lives. I know I tend to get bogged down in technical details sometimes (as I am about to do in the next paragraph) and it was a good reminder for me.

You’ve Been Censored!



When Google launched a censored Chinese search engine it created a storm of controversy. Google has arguably borne a disproportionate amount of criticism; after all, Yahoo! had long been censoring its Chinese search service. After Google launched google.cn there were several Congressional inquiries, and the issue of U.S. involvement in censorship in China was widely covered by the media.

The willingness of these powerful companies, Google, MSN, and Yahoo!, to censor themselves signifies that censorship is now the norm. Now, this is not something entirely new. Chilling Effects has been documenting various cases, usually in the area of copyright or hate speech.

The case of China brought up several issues. In addition to the censorship being overtly political (the removal of content widely acknowledged as credible), issues of transparency and accountability emerged as key. In the past, information in legal papers and court cases was available, which at least documented the process through which the filtering or removal of content was taking place. With China there was simply silence.

Google implemented one small measure of transparency which has now become the norm: they began to inform users when search results were censored. Now MSN and Yahoo! have followed. While there is still a long way to go, I hope that Google will continue to be a leader in implementing transparency. Implementing these measures would be a good start.

Here are some samples of the various “you’ve been censored” messages:

Babelfish translation: According to the local law laws and regulations and the policy, the part searches the result not to demonstrate.

Babelfish translation: In the search result removed certain contents

Babelfish translation: Already helped you to filter the unnecessary homepage!

*Note: Yahoo! only displays the error message if they’ve indexed a site but are not displaying results. Often, “sensitive” sites are not indexed so there is no censored message.

Keywords & Google.cn



Google.cn’s de-listing of websites (these sites do not appear in search results) has been fairly well documented. The best way to determine if a site is de-listed is to use the “site:” modifier, which restricts results to particular websites. For example, a search for site:news.bbc.co.uk in google.cn shows that there are no results but also indicates that the results have been censored (据当地法律法规和政策,部分搜索结果未予显示: “In accordance with local laws, regulations and policies, some search results are not displayed”).

Previously, Ethan and I found that searches for certain terms were restricted to Chinese webpages.

Google.cn has gone further now and appears to be restricting searches for certain terms to sites that have been whitelisted.

I started with a search for 六四 (64) that is restricted to *not* include any .cn, .com, .org, or .net sites. There are no results, and the censored message is displayed. (The censored message will always be displayed if one of these special terms is searched for, whether or not any results are actually censored.)

I then began to remove some of the restrictions. First, I allowed .net to be included, and only one site was returned. When only .org is allowed there are only 4 indexed domains. And when .com is allowed only 8 domains are indexed. There seemed to be a fair number of .cn sites when .cn is allowed; I didn’t bother trying to fish out the unique domains.
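The queries used in this kind of test can be put together along these lines. The post does not give the exact query strings, so the `-site:` exclusion syntax and the `/search?q=` URL pattern are assumptions, and `build_query` and `search_url` are hypothetical helper names.

```python
from urllib.parse import urlencode


def build_query(term, excluded_tlds):
    """Build a search string for `term` that excludes the given TLDs.

    Assumes exclusions are written as "-site:.tld"; the post does not
    show the exact syntax that was used.
    """
    parts = [term] + [f"-site:.{tld}" for tld in excluded_tlds]
    return " ".join(parts)


def search_url(query, host="www.google.cn"):
    """Return a search URL for `query` on `host` (URL pattern assumed)."""
    return f"http://{host}/search?" + urlencode({"q": query})


# Start with everything excluded, then allow one TLD back in at a time.
ALL_TLDS = ["cn", "com", "org", "net"]
fully_restricted = build_query("六四", ALL_TLDS)
allow_net = build_query("六四", [t for t in ALL_TLDS if t != "net"])
```

Running the same query string against google.com and google.cn (by changing `host`) gives the comparison described above.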

Now, this is not *the* whitelist, just the sites that are returned for the search 六四 (64). As we find more censored terms, a more definitive whitelist can be built out.

The reason I suspect these sites are whitelisted is that you cannot search other sites (sites that are indexed and not de-listed) for these special terms. My blog, for example (site:www.nartv.org), is indexed. However, if you search my blog for 六四 (64), the results are censored. (There is content on my blog that google.com has indexed with 六四 (64), and google.cn has indexed it too.)

Not even Microsoft has been spared; it too is censored :). And, actually, it seems that results from ccTLDs other than .cn are not displayed when the special terms are searched for. (I didn’t check every single one.)

Now, there are some weirdnesses. For example, a search for “falun” with .com, .net, .org, and .cn excluded will still return results. Some of this appears to be because IP addresses obviously don’t have domain suffixes, but also because Google does not properly parse out domains that have a port number attached (this also happens on google.com). But, strange, nonetheless.
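Here is a small sketch of why a suffix-based exclusion could let such URLs through. It is only a guess at the kind of parsing slip described above, not Google’s actual code; the function names and example URLs are invented for illustration.

```python
from urllib.parse import urlsplit

EXCLUDED_SUFFIXES = (".com", ".net", ".org", ".cn")


def naive_is_excluded(url):
    """Suffix check on the raw network location, which still includes any port."""
    netloc = urlsplit(url).netloc.lower()  # e.g. "example.com:8080"
    return netloc.endswith(EXCLUDED_SUFFIXES)


def careful_is_excluded(url):
    """Suffix check on the hostname only, with the port stripped off."""
    host = (urlsplit(url).hostname or "").lower()  # e.g. "example.com"
    return host.endswith(EXCLUDED_SUFFIXES)
```

With the naive check, `http://example.com:8080/falun` is not recognised as a .com site because the string ends in “:8080”, so it slips past the exclusion. A bare IP address such as `http://192.0.2.10/falun` has no suffix at all, so neither check excludes it.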

(Some more strangeness, opinion.people.com.cn is de-listed and although news.xinhuanet.com is indexed the censor message appears!)

De-listed domains, restricted keywords and whitelisted domains! What’s next, Google?

Google Groups Censorship



Seth has been probing Google Groups censorship. He’s found that some posts are being removed. Seth notes that these posts are censored for all, not just in Germany.

Microsoft responds with omissions



Amnesty International’s campaign urging Microsoft not to engage in human rights abuses has triggered a response from Microsoft. Microsoft claims it “has increased the ability of Chinese citizens to engage in free expression” and that Amnesty’s claims of Microsoft’s censorship are misleading. A first glance indicates that Microsoft may have a few points:

1. Microsoft says it has not signed the “Public Pledge of Self Regulation” for the Chinese Internet industry (Microsoft is not listed as a member of ISOC). Still, Microsoft does self-censor its blog and search services in China.

2. Microsoft says that its beta search engine in China “does not block searches for particular key words, including ‘democracy,’ ‘freedom,’ ‘human rights’”. Indeed, searches for these terms (in English and Chinese) do produce results. (The results do seem to be weighted in favour of content hosted in China, but I’ll leave it to others to investigate that further.)

However, beta.search.msn.com.cn does, in fact, censor its search results. Rather than restrict what keywords a user can search for, MSN simply removes specific web sites from the results displayed to the user. Following Google.cn’s example, MSN indicates that results have been removed. MSN provides a link to a page that explains its filtering policy, which states that sometimes, in accordance with local law, certain results will not be shown.



For example, using the “site:” modifier which restricts results to a particular website, a search for “site:news.bbc.co.uk” returns a page that indicates that although there are millions of items there were no results found and that some results have been removed. MSN China has removed the BBC News website from its results set.



The reality is that Microsoft does censor its MSN China search engine.

3. Microsoft says that users of its MSN Spaces blogging service in China “are not prohibited from using the words ‘democracy,’ ‘freedom,’ or ‘human rights’ in blog titles or blog content.” But it admits that there are restricted terms when it comes to the “account name, space name, or space sub-title – or in photo captions.” Microsoft claims that the key words ‘democracy,’ ‘freedom,’ and ‘human rights’ are not on their restricted term list. Microsoft states that “MSN Spaces does not filter blog content in any way.”

Microsoft is choosing the terms used very carefully, ostensibly to obfuscate the fact that MSN Spaces does censor users. Note the distinction between blog titles and blog content. Blog content seems to refer to the “body” of a blog post, which does not appear to be filtered, but blog titles are in fact filtered. Although the specific words noted are not (or no longer) filtered, terms such as 天安门事件 (“Tiananmen massacre” in Chinese) are in fact filtered. If a blog post title contains such terms, the user receives a warning indicating that the post contains prohibited language and the blog entry is not posted.



MSN Spaces content is in fact censored, just not in the “body” of a blog post.

Microsoft appears to be trying to divert attention from its censorship practices by focusing on the specifics of its filtering system. Researchers are at a distinct disadvantage, as Microsoft keeps the exact list of censored terms secret and can modify the list at any time. In fact, Microsoft can claim that Amnesty International is inaccurate in stating that the words ‘democracy,’ ‘freedom,’ and ‘human rights’ (presumably they mean in Chinese) are censored by MSN only because MSN modified its original restricted term list.

Research conducted by Rebecca MacKinnon in June 2005 clearly shows that MSN Spaces prevented a blog titled 我爱言论自由人权和民主, which translates to “I love freedom of speech, human rights and democracy” from being created.

While the words identified by Amnesty International are not filtered for specific blog entries or in the MSN China search engine, they were used as part of MSN Spaces’ filtering, and Amnesty is rightly drawing attention to this. Microsoft, on the other hand, is using precision to deflect criticism and make it appear that it doesn’t censor its services at all.

This underscores the need for the anti-censorship community to be thorough in our research. Since these companies (and countries) can change how and what they filter at any time, they may use this to attempt to discredit critics. It is very important for free speech advocates to accurately identify companies that are complicit in censorship worldwide.

Microsoft claims that it “has increased the ability of Chinese citizens to engage in free expression” when in fact all it has done is introduce censored services of the kind that domestic Chinese firms already provide. Instead, Microsoft is, as Amnesty International states, “aiding repression, censorship, and violation of fundamental freedoms”. By introducing yet another censored service in China, one that competes for market share with other censored services, Microsoft is normalizing censorship. Rather than being the exception, censorship is becoming the rule, and when the largest and most powerful technology companies on earth support it, it becomes increasingly difficult to fight against.

I can’t search for shit?



Seth Finkelstein notes that Yahoo Italy is strangely redirecting searches for words like “shit”. Instead of searching the “web” for shit, it redirects to a Yahoo directory search. “merda” is subject to the same redirection. But it seems that most other nasty words are fine. Another thing that’s interesting is that when you search for a word like “fuck” a warning appears at the top of the results.

Is there a way to circumvent Google’s censorship in China?



Google.cn is a Chinese language search service targeted towards users in the People’s Republic of China. It was launched on January 25, 2006 and it filters search requests for content deemed to be “sensitive” by the government of China. (You can compare search results between the uncensored Chinese language Google.com and the censored Google.cn using the OpenNet Initiative’s Search Comparison tool.)

The filtering takes place in at least three ways:

  • de-listed domains: specific websites are removed entirely from search results; it is as if the website never existed.
  • de-listed urls: specific urls are removed from search results if they contain a de-listed domain.
  • restricted keywords: specific keywords are restricted to searches of web pages hosted in China only.

The New York Times reports that the Chinese government did not give Google a list of sites to block. Rather, Google set up a computer in China and tested what content was accessible; content found to be inaccessible was deemed to be sensitive and added to Google’s blocklist.

For example, the website for Human Rights Watch (hrw.org), which is blocked in China, has also been de-listed from Google.cn. A normal web request to hrw.org from within China triggers an error and the content of the site never loads in a user’s browser. A search on Google.cn using the “site:” modifier for content on hrw.org (site:hrw.org) yields no results. In China, it is as if hrw.org does not exist.

Is there a way to circumvent Google’s censorship in China?

Google has an advertisement program, Google Adsense/Adwords, that allows one to purchase certain keywords that will display an ad on Google when users search for those words. I created an account with Google Ads and selected that my ad be shown in Chinese to users in China. I noticed that a warning appeared indicating that there may be restrictions on advertising in China.

Google describes some specific categories of content that require licensing (local pdf). This list does not include content that may be sensitive for political reasons.

Due to advertising regulations and laws of the People’s Republic of China, Google AdWords requires advertisers to submit business licenses and approval certificates for the following product categories:

  • Agricultural Chemicals
  • Books/Periodicals
  • Cosmetics
  • Food/Foodstuffs
  • Health Supplements
  • Medical Appliances
  • Medical Services
  • Patents
  • Real Estate
  • Veterinary Medicine

I created my ad (which does not appear to fall under these categories) for hrw.org, which is censored by google.cn, and it was held in a queue waiting to be viewed and labeled “Family Safe”. Only “Family Safe” ads are allowed to be shown by Google in China. Eventually my ad was approved as “Family Safe” and was labeled as currently being shown.

However, my ad was initially not shown on Google.cn.

Google indicates that there is an ad for the search terms I selected, but it is not shown. I emailed Google for an explanation of why my ad was not being shown and was informed that there may be a technical error.

My ad was being shown on the uncensored Chinese language Google, but not the censored Google.cn. Google checks what ads to deliver by location (determined by IP address) and the language setting of your browser. Despite both of these showing that my language was Chinese and my location was in China the ad did not properly appear.

Eventually, my ad began to be shown on Google.cn. While my ad does not appear every time the keywords are searched, it does periodically appear. (See possible explanation below).

Although there are no search results available for hrw.org, my ad for a censored website did appear on some occasions. (See below for a possible explanation.)

This is a neat way to circumvent Google’s censorship. It may be possible to extend this even further. Mirror sites and alternative URLs for censored web sites can be displayed through the use of Google Ads.


No “Luv” 4 Google



Students for a Free Tibet has organized a Valentine’s Day boycott of Google in response to Google’s censorship of the google.cn search engine.

Google Compare



To help understand how the results of Google.com and Google.cn differ, the OpenNet Initiative has assembled a tool that lets you simultaneously compare search results. The tool is accessible here.

Google Media



I joined the BBC’s “World Have Your Say” program the other day to talk about Google.cn’s censorship practices. You can listen to the show here.

Also, I was on CBC’s “The Hour” talking about the COPA case and the US government’s attempt to acquire some of Google’s search records.

Google.cn Filtering: How It Works



Google has opened a new Chinese-language search engine at www.google.cn that filters out results from sites that are considered “sensitive” by the Chinese government. In addition to filtering news.bbc.co.uk, search results are also filtered for the human rights groups hrw.org and hrichina.org and for all of the geocities.com free hosting community. This filtering is quite similar to the filtering conducted by domestic Chinese search engines.

The filtering takes place in two ways:

1. de-listed domains: specific websites are removed entirely from search results; it is as if the website never existed.
2. de-listed urls: specific urls are removed from search results if they contain a de-listed domain.

For example, the domain news.bbc.co.uk has been removed from www.google.cn. Using Google’s “site:” modifier, a search for “site:news.bbc.co.uk” in google.cn returns no results and appears as if there is no such website at all. In addition to Google’s usual text that appears when searching for a non-existent website, additional text appears informing the user that results have been removed to comply with local law.

However, using Google’s “inurl:” modifier, a search for “inurl:news.bbc.co.uk” does appear to return results although they are not listed and instead are replaced with text informing the user that results have been removed to comply with local law. Furthermore, a search for “site:bbc.co.uk inurl:news” shows that although bbc.co.uk is indexed and searchable the specific domain news.bbc.co.uk is not listed in the search results.

Another illustrative example is a search in google.cn for “site:www9.beijing999.com inurl:dmirror” versus the same search in google.com. In google.com 3 results are returned and all three are listed whereas google.cn returns 3 results but only lists 2 of them. The missing URL is “https://www9.beijing999.com/dmirror/http/mirror.epochtimes.com/gb/nf3154.htm” which contains the text “epochtimes.com/” in the URL path.

The website epochtimes.com is treated as a de-listed domain (site:epochtimes.com); however, a search with the modifier “inurl:” (inurl:epochtimes.com) does return results, although none of the results are actually from the requested website. A search for “inurl:epochtimes.com/” (with a trailing slash) also returns results but does not list them for the user.

This fine-grained control allows google.cn to keep websites such as “epochtimes.com.ua” in its index while eliminating epochtimes.com. There is similar fine-grained control targeting Chinese language content. While there are results for “site:faluninfo.net” there are no results for “site:chinese.faluninfo.net”.
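The behaviour observed in these examples is consistent with a filter along the following lines. This is a toy model built only from the test searches above, not Google’s implementation: the blocklist contents, the function names, and the assumption that subdomains of a de-listed domain are also dropped are all illustrative.

```python
from urllib.parse import urlsplit

# Domains that the searches above suggest are de-listed (illustrative subset).
DELISTED_DOMAINS = {"epochtimes.com", "news.bbc.co.uk", "chinese.faluninfo.net"}


def host_is_delisted(host):
    """True if host is a de-listed domain or (assumed) one of its subdomains."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in DELISTED_DOMAINS)


def url_is_delisted(url):
    """True if the URL's host is de-listed, or its path embeds a de-listed
    domain followed by a slash (as with mirrored copies)."""
    parts = urlsplit(url)
    if host_is_delisted(parts.hostname or ""):
        return True
    return any(d + "/" in parts.path for d in DELISTED_DOMAINS)
```

Under this model `http://epochtimes.com.ua/` stays in the index (it neither equals nor ends in a de-listed domain), `http://www.bbc.co.uk/news` stays while `http://news.bbc.co.uk/` is dropped, and the mirror URL on www9.beijing999.com is dropped because its path contains “epochtimes.com/”.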

To be clear, this filtering only affects www.google.cn; users who choose to access Google’s Chinese language search engine at http://www.google.com/ig?hl=zh-CN are not subjected to this filtering.

While this filtering can be easily circumvented, most users will simply use google.cn, since users from China are redirected there by default.