CBC’s The National Reports on the Citizen Lab

May 14, 2008

Microsoft: Censorship Notification Returns

May 13, 2008

Microsoft now has a censorship notification in the censored version of the search engine live.com that it provides for the Chinese market. The notification appears when searches are made for particular keywords; however, it is not displayed when searches are restricted to censored domains. (See Degrading Transparency: Comparing Google, Yahoo and Microsoft for past reports.)

May 13, 2008
Engine | Presence | Placement | Specificity | Connection | Screenshot
Google | Yes | High: notification is placed under results | Low: mentions “local law” | Yes: notification only appears when results are censored | screenshot
Yahoo | Yes | Medium: notification is placed at the bottom of every page | Low: mentions “local law” | No | screenshot
Microsoft | Yes* | Medium: notification when searching for particular “key words”* | Low: mentions “local law” | Yes* | screenshot (2)

* Microsoft provides notification when searching for particular “key words”; however, no message appears when restricting the search to a censored web site.

U.S. Funded Health Search Engine Blocks ‘Abortion’

April 6, 2008

Wired reports that a health services search engine funded by the U.S. government blocks searches for the word “abortion” because of the possibility that funding could be denied for projects that “actively promote abortion”:

Called Popline, the search site is run by the Johns Hopkins Bloomberg School of Public Health in Maryland. It’s funded by the U.S. Agency for International Development, or USAID…

“We recently made all abortion terms stop words,” Dickson [the manager of the database at Johns Hopkins] wrote in a note to Gloria Won, the UCSF medical center librarian making the inquiry. “As a federally funded project, we decided this was best for now.”

It turns out that the block was prompted by complaints from the Bush administration:

“The items in question had to do with abortion advocacy — the two items dealing with abortion were removed following this inquiry, and the administrators made a decision to restrict abortion as a search term,” said Tim Parsons, a spokesman for the Johns Hopkins Bloomberg School of Public Health in Maryland.

Searches for “abortion” have been restored. However, it does not appear that the two removed articles were restored.

YouTube & Tibet

March 19, 2008
Filed Under: Geolocation

Since all of YouTube is currently blocked in China, I wondered if YouTube might start tagging videos of the protests in Tibet so that only those specific videos would be blocked for users in China and the rest of YouTube could be unblocked. But after running a few (definitely not comprehensive) Tibet-related search terms, all I have found so far is that BBC videos appear to be blocked for users in Great Britain:

GB,http://www.youtube.com/watch?v=FEkyrDdepBc
GB,http://www.youtube.com/watch?v=f4637Ez3-as
GB,http://www.youtube.com/watch?v=vg8AYs56RAY
GB,http://www.youtube.com/watch?v=3GzyfTOACDs
GB,http://www.youtube.com/watch?v=1Ew8oLFVVcc
GB,http://www.youtube.com/watch?v=h7R3J0NvfgE
GB,http://www.youtube.com/watch?v=FLg4aMDadYo
GB,http://www.youtube.com/watch?v=S4E1Rsaq3yc
GB,http://www.youtube.com/watch?v=NS_jvYTEhkQ
GB,http://www.youtube.com/watch?v=pLy6DkrjyHg
GB,http://www.youtube.com/watch?v=AUcIxq4hBuc
GB,http://www.youtube.com/watch?v=l1CTaq9sQM0
GB,http://www.youtube.com/watch?v=dviGSn5Wq0s
GB,http://www.youtube.com/watch?v=-T3LwA2mA4I
GB,http://www.youtube.com/watch?v=q8-eBuGsh-4
GB,http://www.youtube.com/watch?v=mS1FOSQCA3k
GB,http://www.youtube.com/watch?v=MR_NNGpWku4
GB,http://www.youtube.com/watch?v=TEQmJBINYj4
GB,http://www.youtube.com/watch?v=geJ9lRhSRdQ
GB,http://www.youtube.com/watch?v=HwRIzNyArRY
GB,http://www.youtube.com/watch?v=CPIKkf5w9TY
GB,http://www.youtube.com/watch?v=2NB91D-Da50
GB,http://www.youtube.com/watch?v=tUaX3Mw8qvg
GB,http://www.youtube.com/watch?v=AaJQip6bt7w
GB,http://www.youtube.com/watch?v=d7lDTuBhb2Y

YouTube, Geolocation & China

March 16, 2008

After reading this great post on the ONI blog, I did a bit of testing myself. As Youtomb discovered, there is a tag available through the YouTube API that indicates the country (or countries, in some cases) in which YouTube will restrict access to the video. These videos are not (necessarily) blocked by the country itself, but by YouTube.

<media:restriction type="country" relationship="deny">TH</media:restriction>
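As a rough illustration of how such a tag could be read programmatically, here is a minimal Python sketch. It assumes the element lives in the Media RSS namespace (http://search.yahoo.com/mrss/) and that the country codes are space-separated, as in the lists further down. The sample entry and the helper name denied_countries are made up for illustration and are not taken from an actual API response.

```python
import xml.etree.ElementTree as ET

# Media RSS namespace used for <media:...> elements.
MEDIA_NS = "http://search.yahoo.com/mrss/"

# Hypothetical, heavily trimmed entry; a real feed entry has many more fields.
SAMPLE = """<entry xmlns:media="http://search.yahoo.com/mrss/">
  <media:group>
    <media:restriction type="country" relationship="deny">PL TH DE FR</media:restriction>
  </media:group>
</entry>"""


def denied_countries(xml_text):
    """Return the country codes listed in 'deny' country restrictions."""
    root = ET.fromstring(xml_text)
    codes = []
    # iter() walks the whole tree, so the element's exact nesting does not matter.
    for element in root.iter("{%s}restriction" % MEDIA_NS):
        if element.get("type") == "country" and element.get("relationship") == "deny":
            codes.extend((element.text or "").split())
    return codes


print(denied_countries(SAMPLE))
```

An entry with no restriction element simply yields an empty list, which is the case for videos that YouTube does not restrict by country.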

I’ve updated blockpage.com and started a new album for geolocation blockpages. In this case there is a pink line near the top which states “This video is not available in your country.”

As ONI and Youtomb note, there are a variety of videos that have this tag. I’ve been able to confirm that the same behavior reported from Thailand occurs when a flagged video is accessed from Germany and France. One of the videos about Thailand is marked:

"PL TH DE FR","http://www.youtube.com/watch?v=oU9iT3vEdWo"

I checked it from Thailand, Germany and France, and all experienced the same blocking behaviour. Here’s what I’ve found blocked so far based on the info in the ONI blog:

"TH","http://www.youtube.com/watch?v=A1USDXkaJFM"
"TH","http://www.youtube.com/watch?v=L4RX2cIDa4E"
"PL TH DE FR","http://www.youtube.com/watch?v=oU9iT3vEdWo"
"TH","http://www.youtube.com/watch?v=jVbUx4TPkVs"
"TH","http://www.youtube.com/watch?v=70m1ncXQjXA"
"TH","http://www.youtube.com/watch?v=4dFjO4ZJNDE"
"PF TF YT GP DE RE FR GF MQ PM PL","http://www.youtube.com/watch?v=lt2Zsr9bwlE"
"CN","http://www.youtube.com/watch?v=3Roy0BFaUtc"
"CN","http://www.youtube.com/watch?v=Ffw4-OMmchY"
"CN","http://www.youtube.com/watch?v=tzz9rZwFENA"
"CN","http://www.youtube.com/watch?v=C1oBcPtH5aU"
"CN","http://www.youtube.com/watch?v=liwgfyc1Im4"
"CN","http://www.youtube.com/watch?v=FeXZY4eVLlo"
"CN","http://www.youtube.com/watch?v=mnIuu73X8es"
"CN","http://www.youtube.com/watch?v=kmlDqPtHV-E"
"CN","http://www.youtube.com/watch?v=aPg1yvj7thA"
"CN","http://www.youtube.com/watch?v=-0D_oGgAGmI"
"CN","http://www.youtube.com/watch?v=53QwPeImmAA"
"CN","http://www.youtube.com/watch?v=XThGzqBYrh0"
"CN","http://www.youtube.com/watch?v=_FnwTj0OuFE"
"CN","http://www.youtube.com/watch?v=kdEULgZYxK8"

I’ve been unable to check from China because China is currently blocking all of YouTube. In short, the 3 YouTube IPs are blocked and “www.youtube.com” has been added as a “keyword”.

Although the detailed reference guide for the API does not contain information about the blocking tag, another section of the API has some information about the restrictions:

The restriction parameter identifies the IP address that should be used to filter videos that can only be played in specific countries. By default, the API filters out videos that cannot be played in the country from which you send API requests. This restriction is based on your client application’s IP address.

To request videos playable from a specific computer, include the restriction parameter in your request and set the parameter value to the IP address of the computer where the videos will be played – e.g. restriction=255.255.255.255.

To request videos that are playable in a specific country, include the restriction parameter in your request and set the parameter value to the ISO 3166 two-letter country code of the country where the videos will be played – e.g. restriction=DE.

ISP Filtering

March 5, 2008

After reading this great enumeration of various efforts to block accidental access to images of child sexual abuse, I updated blockpage.com to include the blockpages from Sweden, Switzerland and Denmark.

This document notes many of the unintended consequences of filtering, especially overblocking, and it challenges the wisdom of making the blocking look like an error, as opposed to presenting the user with a blockpage:

Providing such a notice seems far more likely to achieve the intended objective of discouraging access to material that is illegal to possess, and raising public awareness of the fact that such a law exists, than merely providing a ‘page not found’ notice.

In the context of Sweden, it also discusses threats to block the BitTorrent tracker Pirate Bay by adding it to the child pornography blocklist. Mission creep is always present.

I’ve updated blockpage.com with the blockpage that users in Denmark see when they try to access Pirate Bay.

Filtering on the grounds of copyright violation is reportedly gaining ground in Europe:

To recap, the Commission saw great merit in an anti-piracy system where Internet Service Providers (”ISPs”) would voluntarily agree to monitor their users and report the infringers to the industry reps or to the authorities, as well as possibly cut off their internet connection. From what we have heard from our sources at the Commission, a lot of the feedback they have currently received has been very supportive of the idea of filtering and monitoring. This has now emboldened some officials to push forward with plans to implement such voluntary EU-wide proposals, although nothing has yet been firmly decided. EU law clearly states that ISPs have no obligation to monitor and filter content, but the carrot they get from participating is that they are less likely to be sued by IFPI and others.

This is something that the Copyright Lobby has been slowly moving toward here in Canada.

Framing Censorship

February 28, 2008
Filed Under: Internet Censorship

Recently, Microsoft’s Bill Gates stated that in the end Internet censorship will not work. He suggested that resistance to Internet censorship will be “driven by business requirements” because “[r]estrictions on free speech will curtail business activity, and so commercial forces will work against censorship.” This is interesting because, on one hand, companies such as Microsoft, along with Google and Yahoo!, are already censoring their products, particularly search engines geared for the Chinese market. Microsoft has in fact decreased the level of transparency regarding the censorship of its Chinese search engine; it is moving further away from challenging censorship. On the other hand, Yahoo! has been asking the U.S. government to help free the Chinese dissidents it helped imprison, and Gates’ comments seem to echo Google’s argument that censorship should be treated as a barrier to trade. Google has been lobbying the U.S. government on this issue, and a resolution has recently passed in the European Parliament that is being interpreted as a way to treat Internet censorship as a trade barrier. The EU resolution:

Calls on the Commission to specifically deal with all restrictions on the provision of Internet and information society services imported by European companies in third countries as part of its external trade policy and to regard all unnecessary limitations on the provision of those services as constituting trade barriers;

These developments are interesting because Internet censorship is almost exclusively framed within the realm of human rights, particularly Article 19 of both the Universal Declaration of Human Rights (UDHR) and the International Covenant on Civil and Political Rights (ICCPR) which state:

Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.

1. Everyone shall have the right to hold opinions without interference.

2. Everyone shall have the right to freedom of expression; this right shall include freedom to seek, receive and impart information and ideas of all kinds, regardless of frontiers, either orally, in writing or in print, in the form of art, or through any other media of his choice.

3. The exercise of the rights provided for in paragraph 2 of this article carries with it special duties and responsibilities. It may therefore be subject to certain restrictions, but these shall only be such as are provided by law and are necessary:

(a) For respect of the rights or reputations of others;

(b) For the protection of national security or of public order (ordre public), or of public health or morals.

Article 19 of the ICCPR includes a provision for the restriction of the right to freedom of expression, as does Article 29 of the UDHR, which states:

1. Everyone has duties to the community in which alone the free and full development of his personality is possible.
2. In the exercise of his rights and freedoms, everyone shall be subject only to such limitations as are determined by law solely for the purpose of securing due recognition and respect for the rights and freedoms of others and of meeting the just requirements of morality, public order and the general welfare in a democratic society.
3. These rights and freedoms may in no case be exercised contrary to the purposes and principles of the United Nations.

The interaction between freedom and restriction has made its way into the area of internet governance, if we can call it that. For example, in ICANN’s discussions on expanding gTLDs this interaction is quite prominent:

The string evaluation process must not infringe the applicant’s freedom of expression rights that are protected under internationally recognized principles of law.

Strings must not be contrary to generally accepted legal norms relating to morality and public order that are recognized under international principles of law.

It is often under the rubric of morality and public order and/or national security that Internet censorship is framed by those who seek its implementation or seek to justify its ongoing practice. The practice of “filtering” (the technical means of blocking online content) is growing. Increasingly, it is not the practice of filtering that is being challenged; the debate is about what content is being filtered. In other words, how the practice of filtering is framed is where ideas about censorship are being contested. China, for example, justifies its extensive Internet filtering and surveillance systems by “stressing repeatedly that Chinese Internet minders abide strictly by laws and regulations that in some cases have been modeled on American and European statutes.” Chinese official Liu Zhengrong told the New York Times:

“If you study the main international practices in this regard you will find that China is basically in compliance with the international norm,” he said. “The main purposes and methods of implementing our laws are basically the same.”

With specific reference to surveillance, Liu noted:

“It is clear that any country’s legal authorities closely monitor the spread of illegal information,” he said. “We have noted that the U.S. is doing a good job on this front.”

The efforts by Google to frame Internet censorship as a trade barrier can be seen as an entrance into this contest of ideas. Such a framing has interesting potential consequences. First, it removes or reduces the moral component of human rights that anti-censorship activists have so heavily relied on. Making less money, rather than protecting human rights, becomes the driving argument. But while international human rights agreements have little-to-no enforcement mechanisms, trade agreements usually have quite explicit means through which disputes can be settled and decisions enforced. Since censorship often takes place in an environment with minimal, if any, transparency and accountability, the resistance to censorship focuses on challenging these practices.

These range from research projects designed to document and expose current censorship practices, to legal challenges to the development and use of technologies. Combined, these efforts seek to challenge the norms surrounding the practice of filtering, change the policies of governments and ISPs and empower users to protect their privacy and exercise the right of free expression online.

Does framing censorship in terms of trade undercut the normative moral foundation of human rights based arguments, or does it represent a means to an end, another tactic in the toolbox for anti-censorship activists? What are the consequences of linking Internet censorship and regimes that deal with trade barriers, particularly when this effort is led by corporations, corporations that are already complicit in Internet censorship?

There have been past efforts to tie human rights to trade. The most notable case, especially relevant in terms of the efforts by Google to lobby the U.S. government to treat censorship as a trade barrier, concerned human rights and the most favoured nation (MFN) status afforded to China. In 1994 Bill Clinton extended China’s MFN status stating:

I am moving, therefore, to delink human rights from the annual extension of Most Favored Nation trading status for China. That linkage has been constructive during the past year, but I believe, based on our aggressive contacts with the Chinese in the past several months, that we have reached the end of the usefulness of that policy

The de-linking of trade and human rights has been characterized as a victory for China (Lynch 2002) and signals that re-linking the two in the context of censorship may be more difficult than it appears. However, in China, the United Nations, and Human Rights, Ann Kent suggests that a major factor in the de-linking was that “the business community in particular opposed the linkage” (Kent 1999:72). The combination of China’s resistance and corporate lobbying, which Robert Dreyfuss suggests was “led by Boeing, Motorola, Caterpillar, AT&T, and the American International Group (AIG)”, eventually succeeded in pressuring the U.S. to de-link trade and human rights.

Underpinning this strategy is what John Garver calls China’s “negative instruments of leverage” (Deng and Wang 2005:225). In China Rising, edited by Deng and Wang, Garver suggests that China preferred to do without U.S. economic cooperation rather than capitulate to threats on human rights issues. This same strategy appears to be at play in terms of Internet censorship. China blocked Google’s search engine and news site entirely. Both of these Google services now censor results for users in China and full access has been restored. To be fair, Microsoft, Yahoo! and others also censor many of their services targeted at the Chinese market, and Google has in many ways been the most transparent and has demonstrated leadership in this area. Google has publicly engaged with its controversial decision to censor and has made the choice not to introduce services such as email. Yahoo!, which has long been censoring its search engine, does provide email services and has been complicit in the imprisonment of Chinese dissidents as a result. Microsoft initially followed Google’s lead but has since reduced its level of transparency. All three are involved in the effort to develop an industry code of conduct to guide the behaviour of corporations when faced with laws that interfere with human rights. While these companies have taken steps, albeit small ones, towards confronting censorship, the extent of their resolve is unclear, especially considering that full blocking is always an option that China has. Moreover, China may even have an incentive to block these companies, as doing so privileges their domestic competitors. In the past, China has redirected users to domestic search engines when blocking foreign hosted ones.

Unfortunately, I don’t have much in the way of answers. In fact, I am left with questions: what are the consequences of creating a norm of filtering in which objections only concern the content targeted and not the practice itself? China, with the help of U.S. business, has managed to de-link trade and human rights in the past; does the fact that business is now favouring the link make a difference, given its money and lobbying experience? How will such a framing affect the prospect of enforcement that has escaped international human rights commitments in the past but been arguably successful in the arena of trade? Does the shift from framing censorship as a human rights violation to a trade barrier undermine the normative moral efforts of human rights organizations? Or does it enhance them? What are the prospects for success when China can just block these services wholesale as it has done in the past? Finally, is this just a distraction from the real issue: the complicity of these corporations in Internet censorship in China?

Pakistan & YouTube

February 23, 2008

UPDATE — In attempting to block access to YouTube, Pakistan ended up making YouTube inaccessible to everyone — not just everyone in Pakistan, but everyone! Martin A. Brown provides some of the technical details and a time line here (Thanks Steven!):

Just before 18:48 UTC, Pakistan Telecom, in response to government order (thanks nsp-sec-d) to block access to YouTube (see news item) started advertising a route for 208.65.153.0/24 to its provider, PCCW (AS 3491). For those unfamiliar with BGP, this is a more specific route than the ones used by YouTube (208.65.152.0/22), and therefore most routers would choose to send traffic to Pakistan Telecom for this slice of YouTube’s network.
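The “more specific route” point can be illustrated with a small longest-prefix-match sketch in Python. It uses the two prefixes quoted above, but the routing table, the origin labels and the test addresses are simplified assumptions for illustration; real BGP route selection involves many more attributes than prefix length.

```python
import ipaddress

# Toy routing table built from the two prefixes mentioned in the quote.
routes = [
    (ipaddress.ip_network("208.65.152.0/22"), "YouTube"),
    (ipaddress.ip_network("208.65.153.0/24"), "Pakistan Telecom"),
]


def longest_prefix_match(destination, table):
    """Return the (network, origin) pair with the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, origin) for net, origin in table if address in net]
    if not candidates:
        return None
    # A /24 is more specific than a /22, so it wins when both cover the address.
    return max(candidates, key=lambda item: item[0].prefixlen)


# The /24 sits entirely inside YouTube's /22.
print(routes[1][0].subnet_of(routes[0][0]))

# An address inside the /24 follows the more specific announcement.
print(longest_prefix_match("208.65.153.10", routes))

# An address elsewhere in the /22 still follows YouTube's own route.
print(longest_prefix_match("208.65.152.10", routes))
```

Only the slice of the /22 covered by the /24 is diverted, which matches the quote’s description of traffic being sent to Pakistan Telecom “for this slice of YouTube’s network”.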

I’ve updated blockpage.com with a screen capture from an ISP in Pakistan from the Don’t Block the Blog Campaign. As noted, since most ISPs route through the Pakistan Internet Exchange which only blocks IP addresses, many users in Pakistan won’t have access to YouTube at all. Users of the ISP TWA appear to have partial access.

The Global Voices Advocacy blog has good coverage of the story and has also posted a copy of the blocking order. (Older blocking orders from Pakistan available here, here and here.) But what I found interesting is that the blocking notice contains a full url to a video, http://www.youtube.com/watch?v=o3s8jtvvg00, while the url in the blockpage is http://www.youtube.com/watch? which suggests that while the front matter at youtube.com may be accessible, none of the videos are, since they are all accessed via /watch?. Perhaps the blockpage is simply printing a partial url, but still, it’s something worth checking. The proxy is only blocking the targeted video.

Democracy “Magnified”

February 21, 2008

The “magnify” component of the Search Monitor project attempts to match the top ten results from Google/Yahoo with the top ten results from the China-specific versions of Google/Yahoo in order to note the similarities and differences in terms of censored, returned (the website is in the top ten of both the .com and .cn versions of the search engine) and indexed (the website is in the top ten of the .com version, but not in the top ten of the .cn version, but is not censored). It also compares the results based on whether or not each website is hosted in China or ends in a .cn. This is taken as a measurement of “authorized” content that is unlikely to present information that China would block.
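A minimal Python sketch of this kind of comparison follows. It is not the Search Monitor code: the function names are made up, matching is done on hostnames (an assumption), and the set of censored hosts is supplied by hand because detecting censorship requires separate testing. The two lists are the Google results for 民主 given in this post. The suffix check only counts hosts ending in .cn; it does not look up where a site is hosted, so it can undercount “hosted in China”.

```python
from urllib.parse import urlparse

GOOGLE_COM = [
    "http://zh.wikipedia.org/wiki/%E6%B0%91%E4%B8%BB",
    "http://asiademo.org/",
    "http://www.mzyfz-news.com.cn/",
    "http://www.usembassy-china.org.cn/infousa/whatdm/GB/homepage.htm",
    "http://www.mj.org.cn/",
    "http://www.dphk.org/",
    "http://www.dem-league.org.cn/",
    "http://www.cndca.org.cn/",
    "http://tag.blog.sohu.com/%C3%F1%D6%F7/",
    "http://www.jfdaily.com.cn/epublish/gb/paper26/",
]
GOOGLE_CN = [
    "http://zh.wikipedia.org/wiki/%E6%B0%91%E4%B8%BB",
    "http://www.usembassy-china.org.cn/infousa/whatdm/GB/homepage.htm",
    "http://www.mzyfz-news.com.cn/",
    "http://www.mj.org.cn/",
    "http://www.dphk.org/",
    "http://www.dem-league.org.cn/",
    "http://theory.people.com.cn/GB/49150/49152/5224247.html",
    "http://www.cndca.org.cn/",
    "http://tag.blog.sohu.com/%C3%F1%D6%F7/",
    "http://www.jfdaily.com.cn/epublish/gb/paper26/",
]


def host(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    name = urlparse(url).hostname or ""
    return name[4:] if name.startswith("www.") else name


def classify(com_results, cn_results, censored_hosts):
    """Label each .com result as 'censored', 'returned' or 'indexed'."""
    cn_hosts = {host(u) for u in cn_results}
    labels = {}
    for url in com_results:
        h = host(url)
        if h in censored_hosts:
            labels[url] = "censored"
        elif h in cn_hosts:
            labels[url] = "returned"
        else:
            labels[url] = "indexed"
    return labels


def cn_suffix_count(results):
    """Count results whose hostname ends in .cn (suffix only, not hosting)."""
    return sum(1 for u in results if host(u).endswith(".cn"))


labels = classify(GOOGLE_COM, GOOGLE_CN, censored_hosts={"asiademo.org"})
for url, label in labels.items():
    print(label, url)
print(cn_suffix_count(GOOGLE_COM), cn_suffix_count(GOOGLE_CN))
```

Matching on hostnames rather than full URLs is a design choice for the sketch: it tolerates small differences in paths between the two engines, at the cost of treating different pages on the same site as the same result.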

But, is this a worthwhile measurement?

Nine out of the top ten results for a search for 民主 in google.com and google.cn are the same. The only difference is that http://asiademo.org/ which appears as number two in google.com is censored in google.cn and, as a result, http://theory.people.com.cn/GB/49150/49152/5224247.html rounds out the top ten in google.cn.

But despite only one site being censored, 7 of the top ten results for 民主 in google.com are either hosted in China or end in a .cn domain suffix. This number increases to 8 of ten (80%) in google.cn.

There is no overlap between yahoo.com and yahoo.cn; drastically different results are returned. Of the top ten results in yahoo.com, 4 are censored in yahoo.cn.

All 10 of the results from yahoo.com are hosted outside of China and all 10 of the results from yahoo.cn are hosted inside China.

However, how does the content of the actual results match up? On this I require some help. Qualitatively, what content are users in China missing out on? How relevant are the censored sites?

Results for 民主 from google.com

http://zh.wikipedia.org/wiki/%E6%B0%91%E4%B8%BB
http://asiademo.org/
http://www.mzyfz-news.com.cn/
http://www.usembassy-china.org.cn/infousa/whatdm/GB/homepage.htm
http://www.mj.org.cn/
http://www.dphk.org/
http://www.dem-league.org.cn/
http://www.cndca.org.cn/
http://tag.blog.sohu.com/%C3%F1%D6%F7/
http://www.jfdaily.com.cn/epublish/gb/paper26/

Results for 民主 from google.cn

http://zh.wikipedia.org/wiki/%E6%B0%91%E4%B8%BB
http://www.usembassy-china.org.cn/infousa/whatdm/GB/homepage.htm
http://www.mzyfz-news.com.cn/
http://www.mj.org.cn/
http://www.dphk.org/
http://www.dem-league.org.cn/
http://theory.people.com.cn/GB/49150/49152/5224247.html
http://www.cndca.org.cn/
http://tag.blog.sohu.com/%C3%F1%D6%F7/
http://www.jfdaily.com.cn/epublish/gb/paper26/

Censored in google.cn:

asiademo.org

Results for 民主 from yahoo.com

http://zh.wikipedia.org/wiki/%E6%B0%91%E4%B8%BB
http://en.wikipedia.org/wiki/Democracy
http://www.asiademo.org/
http://www.dnc.org/
http://usinfo.state.gov/mgck/home/topics/democracy_human_rights/democracy.html
http://www.paulgraham.com/web20.html
http://www.dpj.or.jp/
http://home.computer.net/~pyd/clcb11.html
http://www.democracy.gov/dd/mgck_democracy_dialogues.html
http://www.cchere.net/tags/%C3%F1%D6%F7/

Results for 民主 from yahoo.cn

http://www.mzfz.gov.cn/
http://www.studa.net/minzhu/
http://npc.people.com.cn/GB/28320/41246/index.html
http://www.mzyfz.com/
http://www.taimeng.org.cn/
http://www.dem-league.org.cn/index.shtml
http://jb.mzfz.gov.cn/
http://cpc.people.com.cn/GB/104019/104098/6378610.html
http://www.civillaw.com.cn/Article/default.asp?id=35562
http://www.gongfa.com/minzhuzhuanti.htm

Censored in yahoo.cn:

www.dnc.org
usinfo.state.gov
www.asiademo.org
www.cchere.net

Does the hosted in China or ending in .cn measure make sense when the content itself is analyzed?

Wikileaks

February 20, 2008

Wikileaks, the transparency web site that allows anyone to upload leaked materials, was shut down after a California Judge ordered its domain registrar to:

immediately clear and remove all DNS hosting records for the wikileaks.org domain name and prevent the domain name from resolving to the wikileaks.org website or any other website or server other than a blank park page, until further order of this Court.

The site is still available here: http://88.80.13.160/

The Citizen Media Law Project has the case documents and analysis and the story has now been picked up by the mass media. But what’s caught my attention is who is not talking about it. Glad to see the usual suspects raising the issue.

Finland Filtering

February 19, 2008

Finland’s filtering system, put in place to block access to images of child abuse (child pornography), is blocking sites that do not match this criterion. In addition to blocking an anti-censorship activism site, the filtering seems to be significantly overblocking. EFFi reports:

The censorship supposedly applies only to foreign web sites that are used to distribute child pornographic images and the block list indeed reportedly contains such sites. However, many of the censored sites are apparently legal pornographic sites. Most of the censored sites are located in the United States or in the EU countries which have strict legislation against child pornography. Many of the censored US sites contain the 18 U.S.C. 2257 notice. Many of the blocked sites are link farms, without actual independent image content. The block list reportedly contains disproportionately many gay sites. The censorship however extends not only to the adult sites.

An interesting issue brought up in this case concerns links. The website of the anti-censorship activist Matti Nikki was censored after he published the blocklist as hyperlinks:

Previously the list of censored sites on Nikki’s site had just the names of the sites, not links. To enter a censored site one had to copy the site name to the address bar of the browser. The site was censored after Nikki had made the names of the sites clickable links (after which there was no need to manually copy the site names to the address bar of the browser). According to the police FAQ (in Finnish) the block list includes sites with “a working link to a site containing child pornography”. There is however no apparent legal basis for the distinction between not censoring a site with a written site name of an alleged child pornographic site, and censoring a site with the corresponding clickable link.

This is an interesting case as it shows how the lack of transparency and accountability can lead to practices that impinge on freedom of expression despite the intended goal of protecting children.

(More screen shots of block pages at blockpage.com)

Psiphon

February 15, 2008

Psiphon has been awarded top honours by Netxplorateur. Congratulations to all those who worked on Psiphon over the years.

Psiphon, an Internet censorship evading software project developed by the University of Toronto’s Citizen Lab has been deemed “the world’s most original, significant and exemplary Net and Digital Initiative” by a panel of French and international government, media and business experts. Psiphon was chosen first among 100 technology projects from around the world that were nominated for the Netxplorateur of the Year Grand Prix award.

News Cluster: China

February 13, 2008

There has been a flurry of articles on Internet censorship in China recently. One very interesting AFP article suggests that China may relax its restrictions and allow access to some sites currently blocked by the GFW:

Plans to tear down the so-called Great Firewall of China were being debated and a decision was expected soon, said Wang Hui, head of media relations for the organising committee…

“I believe you will be able to (access banned sites such as the BBC) but I can’t give you a promise yet. The relevant government departments are still working on it,” she said.

That’s something to keep an eye on for sure.

An article in The Guardian discusses the rapid growth of Internet usage in China and the related effects. The article discusses how the Internet, and blogs in particular, have created “competing public opinions.” This is an interesting way to frame the topic, as censorship in China is often characterized as monolithic when in fact there is a significant amount of competition in the realm of ideas. Even within a confined informational space there is considerable movement (what I’ve called wiggle room in the past) if one looks for it.

However, the article repeats the charge that China is exporting its Internet censorship technology:

Campaigners suspect China is passing its censorship know-how to Cuba, Vietnam and several African countries.

Now, I don’t doubt that others are looking at the forms of control China is applying to the Internet and evaluating how they too can keep the benefits, particularly economic, that come with the Internet while minimizing its use for free expression, but I’m not so sure that this means that China is actively exporting censorship technology. As it currently stands, ONI found no filtering in Zimbabwe despite reports to the contrary. While Vietnam does censor the Internet, it does so in a very different way than China does. Cuba may conduct a limited amount of filtering, but it is also much different from that in China. RSF reported:

There is hardly any censorship of the Internet in Internet cafes. Tests carried out by Reporters Without Borders showed that most Cuban opposition websites and the sites of international human rights organisations can be accessed using the “international” network. In China, filtering for key-words makes it impossible to access webpages containing “subversive” words. But, by testing a series of banned terms in Internet cafes, Reporters Without Borders was able to established that no such filtering system has been installed in Cuba.

While not ruling out the possibility, I am skeptical of this claim based on my experience with testing filtering systems in these countries. (What’s more interesting is that Comcast’s filtering in the USA is more like the GFW than any of these countries.)

The New York Times published an article that looks at the resistance to Internet censorship in China. It picks up on the theme of backlash that I’ve suggested comes about when overblocking occurs. When common web sites and services are blocked, it helps turn normally apolitical people into activists. The NYT reports:

For a vast majority of Internet users, censorship still does not appear to be much of a factor. The most popular Web applications here are games and messaging services, and the most visited Internet sites focus on everyday subjects like entertainment news and sports. Many, in fact, seem only vaguely aware that China’s Internet universe is carefully pruned, and even among those who know, a majority hardly seems to care.

But growing numbers of others are becoming increasingly resentful of restrictions on a wide range of Web sites, including Flickr, YouTube, Wikipedia, MySpace (sometimes), Blogspot and many other sites that the public sees as sources of harmless diversion or information. The mounting resentment has inspired a wave of increasingly determined social resistance of a kind that is uncommon in China.

The Financial Times reports that Guo Quan, a Chinese scholar, is planning to sue Google because a search for his name in google.cn is censored. If someone gives me the proper Chinese translation for his name I can check this out further. (A search for his name in English returns results; a search for 郭泉 also returns results, along with Google’s standard censorship notification. The name itself is a censored term, as a search for it restricted to a non-existent domain will produce the censorship notification as well. Yahoo.cn and Baidu produce no results. They will produce results if something is appended to the search: yahoo.cn, baidu.)

The Atlantic published an article on censorship in China (it seems to be gone now, here are links to Google’s cache: 1, 2, 3, 4) that takes on the challenge of explaining the technical measures used to censor the Internet. The article also discusses circumvention and the self-censorship component that is so integral. The article concludes with some salient points regarding the important role of domestic censorship as well as the widening space for dialog:

It would be wrong to portray China as a tightly buttoned mind-control state. It is too wide-open in too many ways for that. “Most people in China feel freer than any Chinese people have been in the country’s history, ever,” a Chinese software engineer who earned a doctorate in the United States told me. “There has never been a space for any kind of discussion before, and the government is clever about continuing to expand space for anything that doesn’t threaten its survival.” But it would also be wrong to ignore the cumulative effect of topics people are not allowed to discuss.

However, there are several issues with the technical analysis as well as underlying tones of “exceptionalism” that obscure some of the bigger picture issues. There seems to be confusion over surveillance and filtering. It’s best to think of filtering as a set of rules: if packets contain something that violates the rules, certain actions are taken. If a destination IP address is on a block list, the connection is not made; if packets contain certain keywords, reset packets are sent to the source and destination to terminate the connection. Surveillance implies that someone is watching the traffic, or more logically that it is stored, parsed and then looked at by someone. When surveillance and filtering are (con)fused together you get something strange like this:

Thus Chinese authorities can easily do something that would be harder in most developed countries: physically monitor all traffic into or out of the country. They do so by installing at each of these few “international gateways” a device called a “tapper” or “network sniffer,” which can mirror every packet of data going in or out. This involves mirroring in both a figurative and a literal sense. “Mirroring” is the term for normal copying or backup operations, and in this case real though extremely small mirrors are employed. Information travels along fiber-optic cables as little pulses of light, and as these travel through the Chinese gateway routers, numerous tiny mirrors bounce reflections of them to a separate set of “Golden Shield” computers. Here the term’s creepiness is appropriate. As the other routers and servers (short for file servers, which are essentially very large-capacity computers) that make up the Internet do their best to get the packet where it’s supposed to go, China’s own surveillance computers are looking over the same information to see whether it should be stopped.

If one conducts passive surveillance with a tap, one cannot then go back and interfere with the packets. For filtering, such a setup is not needed. You just route the traffic through something that filters; basically all routers can filter. The filter looks at the packets and matches them to the rules. There are no “tiny mirrors” or whatever. If you want to conduct passive surveillance you can use a tap and record the traffic for analysis. The two things are not really related. Moreover, internet surveillance is not something that only China does or that is easier for China to do, as a quick look at the most sophisticated internet surveillance system in the world can demonstrate.
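
To make the “set of rules” idea concrete, here is a minimal sketch of filtering as rule matching. The block list entry, the keyword and the function name are all hypothetical, chosen only for illustration; this is not a description of how the GFW is actually implemented.

```python
# Hypothetical rules for illustration only; not real GFW data.
BLOCKED_IPS = {"203.0.113.10"}        # an address from a documentation range
BLOCKED_KEYWORDS = {"www.hrw.org"}    # domain names are often used as keywords


def filter_packet(dst_ip: str, payload: str) -> str:
    """Return the action a rule-based filter would take for one packet."""
    # Rule 1: destination is on the IP block list, so the packet is dropped
    # and no connection is ever made.
    if dst_ip in BLOCKED_IPS:
        return "drop"
    # Rule 2: payload contains a keyword, so reset packets go to both ends.
    if any(keyword in payload for keyword in BLOCKED_KEYWORDS):
        return "reset"
    # No rule matched: the packet is forwarded unchanged.
    return "forward"
```

Nothing here watches or stores traffic for later review; the function only matches each packet against the rules and picks an action, which is the distinction between filtering and surveillance drawn above.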

On to the mechanisms:

DNS tampering: This is explained well (although there may be some new variant). An important point is that most ISPs have their own DNS servers, so managing a centralized system could be awkward (though not impossible), and users can use other uncensored DNS servers.

IP Blocking: This technique is incorrectly explained in the article.

While your signal is going out, and as the other system is sending a reply, the surveillance computers within China are looking over your request, which has been mirrored to them. They quickly check a list of forbidden IP sites. If you’re trying to reach one on that blacklist, the Chinese international-gateway servers will interrupt the transmission by sending an Internet “Reset” command both to your computer and to the one you’re trying to reach.

If packets are sent (trying to establish a TCP connection) for a particular IP and they pass through a router configured to block packets for that IP, the router will block those packets. That’s it. There is no connection ever made. If you sniff such a connection you will only see outgoing SYN packets and nothing else. No reset packets are sent. There’s no “mirror” processing anything while you wait.

URL keyword block - This technique is actually the reset one described under IP blocking. If any part of the GET request contains certain keywords (and domain names are often used as keywords), reset packets will be sent to both the source and destination to terminate the connection. When is it triggered? This is confusing because the GFW’s keyword filtering is bi-directional, but in my experience it is triggered on the way out of China. I say this because you can trigger it by requesting non-existent content. Depending on how long it takes to send the reset packet you may receive some of the content you requested, which is what makes it appear that the filtering happens on the way in. After receiving reset packets the source and destination will not be able to connect to each other for a period of time.

Body Filtering - This is a bit of a tough one. Basically, if you create a web page with a keyword that normally triggers the reset packets if it appears in the URL path, you can access it fine from China. I originally thought that this meant that body content was not filtered, but if you create a large page of such words the reset packets can be triggered. This may mean that a sampling of packets is checked, not all packets. In any case the behavior is the same as discussed above: the source and destination cannot connect to one another for a period of time. If you keep requesting the content you trigger more reset packets, so it takes longer to be able to connect, but if you wait and then trigger the reset packets again, it won’t be longer the second or third time. There’s no escalating punishment.

Bi-directional keyword filtering

As Chinese-speaking people outside the country, perhaps academics or exiled dissidents, look for data on Chinese sites—say, public-health figures or news about a local protest—the GFW computers can monitor what they’re asking for and censor what they find.

Again, the keyword filtering is bi-directional, if you trigger it on connections to China the same behavior applies. Again, the issue of “monitoring” in this context implies that there’s something intelligent and deliberate about the filtering. If the packet matches the rules, it triggers the filtering mechanism, in this case reset packets.

Circumvention

Easy is a relative concept here. If a user chooses to break the law and acquires the necessary knowledge to bypass censorship then, yeah, it can be easy. You can buy VPN access (at least until lots of people start using it and it gets blocked) or use an encrypted proxy (at least until it gets blocked). They don’t need to block all VPNs; they can just block the IP addresses of those they want, namely those that become popular amongst citizens seeking to circumvent the GFW.

But despite the issues with the technical mechanisms the article is dead on with its conclusions:

What the government cares about is making the quest for information just enough of a nuisance that people generally won’t bother. Most Chinese people, like most Americans, are interested mainly in their own country. All around them is more information about China and things Chinese than they could possibly take in… When this much is available inside the Great Firewall, why go to the expense and bother, or incur the possible risk, of trying to look outside?

All the technology employed by the Golden Shield, all the marvelous mirrors that help build the Great Firewall—these and other modern achievements matter mainly for an old-fashioned and pre-technological reason. By making the search for external information a nuisance, they drive Chinese people back to an environment in which familiar tools of social control come into play.

Ding! We have a winner.

A Search for Human Rights

February 12, 2008

The Search Monitor Project: China focuses on assessing the level of transparency with regard to the self-censorship practices of search engine companies as well as the mechanisms and effects of this political censorship. (For background information, see this and this.) The following is a step by step process of a search for “human rights” (人权).

The first step is to retrieve a result set from the (uncensored) Chinese version of Google. Each result is parsed to its domain name (http://www.hrw.org/chinese/ becomes “www.hrw.org”).

The second step is to use the “site:” modifier to restrict results to the domain. The censored versions of Google and Microsoft can be queried directly, but Yahoo and Baidu must be queried from inside China because they are hosted inside China. This is because the bi-directional filtering of China’s “Great Firewall” (GFW) will block the inbound connections due to the presence of “www.hrw.org” in the search query. Conversely, searches from inside China to Google or to Microsoft will be blocked by the GFW, and since we are interested in search engine censorship it is necessary for us to remove the effects of the GFW.

The censored results for the top ten results from Google for the query 人权 are:

Keyword Translation Google MSN Yahoo Baidu
人权 human rights 1 / 10 3 / 10 1 / 10 1 / 10

The common site, censored by all four search engines, is www.hrw.org. This is the website of Human Rights Watch. Google, acting the most transparently, provides a notification that results have been removed, and since the search has been restricted using “site:” we can conclude that www.hrw.org is censored specifically and deliberately. Yahoo provides a notification, but since it appears at the bottom of every page regardless of whether the results are censored or not, we are left to assume that the site was never indexed because Yahoo China operates its web crawlers from behind the GFW. Baidu also operates its crawlers from behind the GFW so, like Yahoo, sites blocked by the GFW are not indexed. Microsoft uses the same de-listing mechanism as Google but has removed the censorship notification they formerly displayed. We therefore assume that the site has been censored because there are no results when using the “site:” modifier (results do appear in the English version), but the lack of transparency reduces the accuracy of the claim.

The two other sites from (uncensored) Google’s top ten results for 人权 (human rights) are: zh.wikipedia.org and www.epicbook.com.

URL | Google | MSN | Yahoo | Baidu

www.hrw.org | Censored | Censored | Censored | Censored
Meta: org | 199.173.149.120 | 701 | US | UUNET - MCI Communications Services, Inc. d/b/a Verizon Business

zh.wikipedia.org | Indexed | Censored | Indexed | Indexed
Meta: org | 208.80.152.2 | 14907 | US | WIKIMEDIA Wikimedia US network

www.epicbook.com | Indexed | Censored | Indexed | Indexed
Meta: com | 61.152.160.205 | 4812 | CN | CHINANET-SH-AP China Telecom (Group)

It is interesting that Microsoft censors Wikipedia while Yahoo and Baidu index it, because Wikipedia is generally blocked by the GFW. A possible explanation is that, because the GFW is not 100% consistent or accurate with its keyword filtering, the crawlers were able to index normally blocked sites.

I am not familiar with www.epicbook.com but it is hosted inside China and is thus an unlikely candidate to host information that the government would want to censor. The fact that it is indexed by all three of the other search engines supports this. While this could be a case of “collateral damage” due to Microsoft’s lack of transparency (that is, the site may not be censored but simply not indexed), it is indexed in the English version of Microsoft’s search engine.

The “magnify” component of this project attempts to match the top ten results from Google/Yahoo with the top ten results from the China-specific versions of Google/Yahoo in order to note the similarities and differences in terms of censored, returned (the website is in the top ten of both the .com and .cn versions of the search engine) and indexed (the website is in the top ten of the .com version and is not censored, but is not in the top ten of the .cn version). It also compares the results based on whether or not each website is hosted in China or ends in a .cn. This is taken as a measurement of “authorized” content that is unlikely to present information that China would block.

As noted above, there is only one censored site; the other nine results are also returned in the top ten of the censored .cn version of Google.

However, even in google.com 4 of the top 10 sites are hosted in China or end in a .cn, leaving only 6 sites to represent alternative information. When the censored site is removed, the google.cn version moves to a 50/50 split between authorized and potentially unauthorized information. While this case doesn’t show a dramatic difference, other search queries, particularly those specific to contextually relevant information, often do.

This yahoo.com vs. yahoo.cn comparison uses the top 10 results from yahoo.com. With this result set, 8 sites returned in yahoo.com are censored; the remaining 2 are indexed but not returned in the top 10 in yahoo.cn.

While all 10 results in yahoo.com are hosted outside of China, all 7 results in yahoo.cn are hosted in China or end in a .cn. This helps show how significant the censored sites are in comparison.

Although the total number of censored sites may be low, especially when compared to the amount of indexed sites, the significance of these sites in providing alternative information should not be underestimated.

Search Monitor Project: China

February 8, 2008

Search engines are increasingly censoring their results, often by geographic location, with a significant, negative impact on the right to freedom of expression. The most advanced cases of censoring political content are in search engines that market a version of their product in China. This project aims to expose and monitor the censoring practises of search engines with a specific focus on China.

Building upon efforts to assess the level of transparency (reading this first is probably a good idea) with regard to search engine censorship, this project aims to compare the level of censorship across the China-specific search engines of Google, Yahoo, and Microsoft as well as the domestic Chinese search engine, Baidu. The goal of comparison poses some significant methodological problems as the presence or absence of censorship notification, mechanism of censorship (and irregularities therein) and physical location of the servers themselves all add additional layers of complexity.

In attempting to develop an automated system that can reasonably compare the search engines, some additional methods have been delegated to separate search engine-specific projects: those that would be well suited to one search engine but for which comparable data could not be generated from the others. As a result of the focus on comparability, the methods outlined below not only build upon existing research in this area but can hopefully explain some of the anomalies previously identified. After reviewing previous reports by Reporters Without Borders and Human Rights Watch, I sketch out methods that attempt to provide an accurate, automated comparison between the search engines.

Previous Research

In June 2006, Reporters Without Borders (RSF) conducted a comparison of the four search engines Google, Yahoo, Microsoft, and Baidu (later updated to also include Sohu and Sina) by entering key words into the search engines and analyzing 1) the presence or absence of any results and 2) the content of the results by classifying each returned web site (URL) as either “authorized” or “unauthorized”, which presumably refers to whether the source is controlled by or supports the government of China or whether it contains critical, alternative information. While this report is an innovative attempt at comparison, it suffers from methodological issues that affect the accuracy of the results.

First, the report is actually about the ranking of results rather than censorship. The top ten results were analyzed based on their content, not on whether a web site had been censored (de-listed/removed) from the results set. While the removal of censored sites will likely affect the combination of “authorized” vs. “unauthorized” sources, it does not tell us what sites are censored or if the “unauthorized” sites are not censored but just do not appear in the top ten results. Localized search engines often algorithmically privilege sites that are in the local language, end in the country’s domain suffix (e.g. .cn) and possibly even are hosted within the country, and this affects where foreign hosted “unauthorized” content appears in the result set. Thus an “unauthorized” site may not appear in the top ten results of the localized search engine even though it does in the uncensored version. Instead, the site may appear further down in the rankings.

Second, the testing of the search engines did not account for China’s national filtering system, often labeled the Great Firewall of China (GFW). Consequently, the results concerning “no results” and “no results + user banned” should actually be seen in reverse. Since Yahoo and Baidu are physically located in China the search queries made by RSF were filtered by the GFW on their way to the Yahoo and Baidu servers. If the same search were conducted from China, the search queries would not pass through the GFW and would not be filtered. The search queries RSF made to Google and Microsoft did not pass through the GFW because those servers are not located in China and therefore results were always returned. However, had those same search queries been made from China to Google and Microsoft they would have been filtered by the GFW and would have been designated “no results” and “no results + user banned”. The failure to account for the GFW prevented RSF from accurately interrogating the filtering of the search engines because a distinction was not made between filtering by the search engines and filtering by the GFW. If the tests had been conducted from inside, rather than outside, of China the report would have captured the behaviour experienced by users in China who are censored by both the GFW and the search engines and perhaps are agnostic about which one is doing the censoring since the result is the same: censorship.

In August 2006, Human Rights Watch (HRW) released an impressive and detailed comparison of Google, Yahoo, Microsoft, and Baidu. Two approaches were used in this report: the first focused on identifying censored sites, the second on whether or not the result set returned from a search for a specific key word query was censored. The first approach involved using a list of 25 websites and searching for each website in each search engine (using the site: modifier, discussed below, when possible). If a “censorship notification” appeared and there were no results the web site was censored, but the report also noted instances in which the message appeared but some partial results appeared as well. In other cases, there were no results and since there was also no censorship notification (or a censorship notification that always appears and has no relationship with the results) it was suspected that the web site was censored. In this way, HRW was able to determine how many of the 25 sites were censored in each search engine.

HRW tested from both inside and outside of China and was thus able to isolate search engine filtering from that conducted by the GFW of China. HRW notes that queries to Yahoo from outside China generated errors (as we saw in the RSF study) and we now know that this is due to the bi-directional filtering of the GFW (see below). The partially censored results (what I call “Page Censored” below) can arise for at least two reasons. The first is that some search queries automatically trigger the censorship notification regardless of whether the results have been censored or not; the second is that the filtering algorithms of the search engines are imperfect. Google, for example, does not handle port numbers properly and fails to remove such pages, and Microsoft does not handle domains by their root (domain.com) and therefore sub-domains (www.domain.com or dom.domain.com) may not all be removed. Microsoft also does not properly handle URLs that begin with “https”. In such cases partial results may be available despite the search engines’ attempts to censor.

Another issue (which is still an issue in the methodology discussed below) concerns search engines not censoring pages directly. Both Yahoo and Baidu operate the crawlers that index websites from inside China and thus do not index sites that are blocked by the GFW. This removes the need for the search engines to censor their results, as the index itself is already censored by the GFW. This means that there is not a credible technical way to distinguish between sites that are not indexed and sites that are censored. Another issue is that the GFW is not perfect, and normally censored sites sometimes end up in the Yahoo and Baidu indexes. There have also been some cases in which Yahoo has removed indexed sites (those not blocked by the GFW) and used a censorship notification as Google does and Microsoft did previously. Therefore, for the most part, Yahoo and Baidu do not need to censor their results, because their index is already censored: their crawlers operate from within China and cannot visit blocked sites to begin with.

The second approach used by HRW focused on the issue of keyword filtering by search engines. The question is simple enough: if I search for keyword “a”, will I get censored results “b”? However, the lack of transparency on the part of the search engines makes this simple question difficult to answer. HRW used a list of 25 keywords to query the search engines and inferred possible censorship by comparing the results from censored China-specific versions of Google, Yahoo and Microsoft and their US counterparts as well as noting the appearance of a censorship message. (Baidu had no such counterpart at the time, but perhaps Baidu Japan can now be used for this purpose.)

Comparing result sets can be problematic because of the algorithmically determined rank of the results. What appears on page one in the top ten results in google.com may appear on page twenty-five in google.cn. In the case of Yahoo and Baidu, GFW-censored sites are not indexed at all and so will never appear no matter what one searches for. Another method is to use the difference in the estimated page count as an indicator of censored results. But the estimated page counts can vary considerably between servers and language/region-specific versions. Microsoft, for example, returns very few Chinese language results in their default English language search engine, making comparison virtually impossible. As noted by HRW, even the presence of the censorship notification may not be reliable. In some cases the censorship notification will appear based on the keywords in the query, not on the results returned. (You can restrict the results to a non-existent site and still get the censorship message.) In other cases, it has nothing to do with what was used as a query (for example, a non-politically sensitive term) but the censorship message appears because a URL has been removed/de-listed. In other cases, some keyword queries return results and a censor message not because results have been removed but because results are only returned from a set of “white listed” sites. Compounding the problem, the censorship message appears to be page specific (at least in the case of Google). That is, if one searches for keyword “x” and gets back ten results there may be no censorship message, but when one clicks on “Page 2” and gets results 11-20, which do contain a censored site, the censorship notification will appear. (Therefore, if you set the preferences to retrieve 100 results, might you be more likely to encounter the censorship notification than if restricted to 10 results?)

HRW accounted for such variance through manually checking results in addition to the estimated page count comparisons and the presence of a censorship notification. Not only does this involve extensive manual labour but also an expertise in analyzing the content for political significance. For example, HRW manually assessed and compared the first three pages of search results for Yahoo and Yahoo China. HRW’s efforts in this regard stand out as an example of the quality needed for this line of research.

Methodology

The Search Monitor Project currently contains two related but separate components. The first, Generalized Comparison: Keywords and URLs, focuses on a generalized comparison between the China-specific versions of Google, Yahoo, Microsoft and Baidu. The second, Magnify: A Google-Google, Yahoo-Yahoo comparison, focuses on comparisons between the Chinese-language “global” versions of Google and Yahoo and their special censored China-specific versions.

While the core testing methods are the same, the Magnify: A Google-Google, Yahoo-Yahoo comparison contains some additional elements that allow for a more fine grained analysis. These will be noted below when appropriate.

Generating a URL Set

A set of sixty keywords has been selected covering the broad topical categories of censorship circumvention, Falun Gong/Dafa, political sensitivities and social taboos. Search queries in “uncensored” engines (the Chinese language versions of Google or Yahoo) are used to generate lists of sites that are checked in censored search engines.

A query term, such as “人权” (human rights), is used to retrieve results from an “uncensored” search engine, such as google.com.

The websites from the “uncensored” results are parsed to retrieve the domain (including sub-domains).

A list of URL results, usually ten, is retrieved. A URL such as http://www.hrw.org/chinese/ is shortened to www.hrw.org.

Each domain name is checked in each censored search engine.
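
The parsing step above can be sketched with Python’s standard library. The function name is hypothetical, and dropping repeated domains is an assumption about how a result list would be reduced; the project’s actual code is not shown here.

```python
from urllib.parse import urlparse


def result_domains(urls):
    """Reduce result URLs to host names (sub-domains kept), preserving order."""
    domains = []
    for url in urls:
        # hostname drops the scheme, port and path: "www.hrw.org"
        host = urlparse(url).hostname
        # Assumption: each domain only needs to be checked once.
        if host and host not in domains:
            domains.append(host)
    return domains
```

For example, result_domains(["http://www.hrw.org/chinese/"]) gives ["www.hrw.org"], which is the form that is then checked in each censored search engine.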

Determining a Censored site

These domains are checked in the censored search engines using the “site:” modifier. The “site:” modifier restricts the results set to pages of a specific host name.

“site:www.hrw.org” (without the quotes) is used as a search term in the censored search engines to restrict the results to only those from the web site www.hrw.org.

In cases where the censored search engine being tested displays a special message indicating that results have been censored (a “censor message”) that relates to the specific search query, domains that produce no results when queried with the “site:” modifier and contain a censor message are labeled as “Censored”, while domains that return some results but contain a “censor message” are labeled as “Page Censored”.

In the cases where there is no censor message, or the censorship message appears on every page and bears no connection to the results, domains that produce no results when queried with the “site:” modifier are labelled as “Censored.”

Depending on the current behaviour of search engines there may be ad hoc additions.

http://www.google.cn/ - censored = censor message + 0 results, pagecensored = censor message + some results
http://www.live.com/?mkt=zh-cn - censored = 0 results, pagecensored = results that only contain urls beginning with “https” (no longer a censor message, failure to exclude “https” urls was noted when the censor message was in place and is thus used as “Page Censored”)
http://www.yahoo.cn/ - censored = 0 results, censor message is ignored because it appears on every page, it bears no relation to search results
http://www.baidu.com/ - censored = 0 results
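
The per-engine rules listed above can be written as one small function. This is only a sketch: the short engine names and the function signature are hypothetical, “Indexed” is used as the label for a domain that returns results normally, and the “Unclassified” label covers a case the rules do not address (Google returning nothing without a censor message).

```python
def classify(engine, result_urls, censor_message=False):
    """Label the outcome of one "site:" query using the rules above.

    engine: "google", "live", "yahoo" or "baidu" (hypothetical short names).
    result_urls: URLs returned for the "site:" query.
    censor_message: True if a results-related censor message was shown;
                    only Google's message is used, Yahoo's is ignored.
    """
    if engine == "google":
        if censor_message:
            return "Censored" if not result_urls else "Page Censored"
        # Assumption: the rules do not cover zero results without a message.
        return "Indexed" if result_urls else "Unclassified"
    if not result_urls:
        return "Censored"
    # Live: only "https" URLs slipping through is treated as Page Censored.
    if engine == "live" and all(u.startswith("https") for u in result_urls):
        return "Page Censored"
    return "Indexed"
```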

It is important to note that sites that are simply not indexed by the search engine will appear as “Censored”, thus possibly inflating the total amount of censorship attributed to search engines that do not have a censor message that is related to the results. This can be slightly compensated for by looking at the overlap of censored sites among search engines. In addition, since this is a normative project advocating transparency, this should serve as an incentive for search engines to implement a censor message that is related to the results.
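
A minimal sketch of the overlap check, using the censored domains from the 人权 (human rights) example. The variable names are hypothetical and the overlap is only a rough aid to interpretation, not proof of censorship.

```python
# Censored domains per engine, from the 人权 (human rights) example.
censored = {
    "Google": {"www.hrw.org"},
    "MSN": {"www.hrw.org", "zh.wikipedia.org", "www.epicbook.com"},
    "Yahoo": {"www.hrw.org"},
    "Baidu": {"www.hrw.org"},
}

# Domains labeled "Censored" in every engine.
common = set.intersection(*censored.values())

# Domains labeled "Censored" in exactly one engine; these may simply be
# missing from that engine's index and are worth a manual check.
only_one = {
    domain
    for domain in set.union(*censored.values())
    if sum(domain in sites for sites in censored.values()) == 1
}
```

Here common contains only www.hrw.org, while zh.wikipedia.org and www.epicbook.com fall into only_one.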

The Magnify: A Google-Google, Yahoo-Yahoo comparison project contains the classifications “Returned” and “Indexed” in addition to “Censored” and “PageCensored”. “Returned” refers to URLs from the result set from the “uncensored” search engine that are returned in the result set from the censored search engine. “Indexed” refers to URLs from the result set from the “uncensored” search engine that are not returned in the result set from the censored search engine, but are not censored. Using this method, the top ten results for a query in Google (Chinese) can be compared with the top ten results of Google China and can be categorized by “Returned”, sites that are common to both results sets, “Indexed”, sites in the top ten uncensored but not in the top ten of the censored results, and “(Page)Censored”, results that are actually censored. In addition, each URL in both result sets is checked to see if it is hosted in China or ends in a .cn domain suffix.
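
The Magnify labels can be sketched as follows. The function names are hypothetical, and the set of censored domains is assumed to come from the “site:” check described earlier. Only the .cn suffix test is shown; deciding whether a site is hosted in China needs an IP-to-country lookup that is left out here.

```python
def magnify(uncensored_top, censored_top, censored_domains):
    """Label each domain from the uncensored top ten.

    censored_domains: domains already labeled "Censored" by the "site:" check.
    """
    labels = {}
    for domain in uncensored_top:
        if domain in censored_domains:
            labels[domain] = "Censored"
        elif domain in censored_top:
            labels[domain] = "Returned"  # in the top ten of both versions
        else:
            labels[domain] = "Indexed"   # not censored, but outside the censored top ten
    return labels


def ends_in_cn(domain):
    """True if the host name ends in the .cn domain suffix."""
    return domain.lower().rstrip(".").endswith(".cn")
```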

The Great Firewall (GFW)

Borrowing a phrase from my colleagues Richard Clayton, Steven Murdoch and Robert Watson, it is necessary to ignore the filtering conducted by China to accurately test levels of censorship by the search engines themselves. As Clayton and Murdoch reveal, Internet traffic to and from China passes through a filtering system that is bi-directional - it affects both inbound and outbound traffic - (China also blocks outbound connections to IP addresses, but this does not interfere with the ability to test the search engines) which disrupts connections if the presence of particular keywords are detected. Often, China will designate a domain name as “key word” this disrupting access for any request that contains that domain name. This is important as queries directed to search engines hosted in China use the “site:” modifier followed by a domain name.

In order to avoid interference from China’s filtering system, the China-specific versions of Google and MSN, which are hosted outside of China, are queried from outside of China, and the China-specific versions of Yahoo and Baidu, which are hosted inside China, are queried from inside China.

The censored search engines http://www.google.cn/ and http://www.live.com/?mkt=zh-cn are checked from outside of China.

The censored search engines http://www.baidu.com/ and http://www.yahoo.cn/ are checked from inside of China.
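The testing arrangement above amounts to a small lookup table. The sketch below is only one way to write it down: the names VANTAGE_POINTS, engines_for and site_query are hypothetical, and the query helper does nothing more than prepend the “site:” modifier mentioned earlier.

```python
# Where each China-specific engine is queried from, as listed above.
VANTAGE_POINTS = {
    "http://www.google.cn/": "outside",
    "http://www.live.com/?mkt=zh-cn": "outside",
    "http://www.baidu.com/": "inside",
    "http://www.yahoo.cn/": "inside",
}


def engines_for(location):
    """Return the engine URLs that should be checked from `location`."""
    return sorted(url for url, where in VANTAGE_POINTS.items() if where == location)


def site_query(domain):
    """Build a query string that restricts results to one domain."""
    return "site:" + domain
```

Keeping the vantage point next to each engine makes it harder to query an engine from the wrong side of the GFW by accident.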

In addition to affecting how each search engine is tested, the location of the search engine relative to the GFW also affects how the search engines censor. Google and Microsoft, located outside of China, must remove, or de-list, specific sites from the results. Yahoo! and Baidu both operate their search spiders from inside China. This results in a situation where, because of China’s gateway filtering, the crawlers that index content for these search engines cannot access sites that China blocks.

61.135.166.102 - - [08/Feb/2008:08:05:40 -0500] "GET / HTTP/1.1" 200 12258 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

220.181.38.169 - - [08/Feb/2008:09:04:42 -0500] "GET / HTTP/1.1" 200 12258 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

60.28.17.38 - - [08/Feb/2008:11:46:31 -0500] "GET / HTTP/1.1" 200 12258 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"

202.160.180.184 - - [07/Feb/2008:16:58:33 -0500] "GET /robots.txt/ HTTP/1.0" 200 24 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"

202.160.180.96 - - [07/Feb/2008:16:58:35 -0500] "GET / HTTP/1.0" 200 19068 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)"

Thus Yahoo! rarely has to de-list specific websites; most are simply not indexed in the first place. However, this also leads to situations in which sites blocked by China and de-listed by Google and Microsoft are indexed by Yahoo!. The GFW is not 100% effective, and occasionally crawlers operating from inside China are able to index a normally blocked site, which then appears in their search results.

It is also important to note that sites indexed by the search engines that are blocked by the GFW will still be inaccessible to users in China.
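The crawler entries shown in this section follow the common “combined” web server log layout, so the visiting spider can be identified from the user-agent field. The sketch below is an illustration only: the pattern and the crawler_name helper are hypothetical, and it assumes the log uses plain ASCII double quotes, as a web server would write them.

```python
import re

# Combined log layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def crawler_name(line):
    """Return a short crawler label for a log line, or None if unrecognized."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    agent = match.group("agent")
    if "Baiduspider" in agent:
        return "Baidu"
    if "Yahoo! Slurp China" in agent:
        return "Yahoo China"
    return None


sample = (
    '61.135.166.102 - - [08/Feb/2008:08:05:40 -0500] "GET / HTTP/1.1" '
    '200 12258 "-" "Baiduspider+(+http://www.baidu.com/search/spider.htm)"'
)
label = crawler_name(sample)  # "Baidu" for this entry
```

The user-agent string is self-reported, so a match of this kind shows only what the visitor claims to be.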

Types of Results

Generalized Comparison: Keywords and URLs

This component shows each of the keywords used as search queries in an “uncensored” search engine and the number of URLs from that result set that are censored in the China-specific versions of Google, Yahoo, Microsoft and Baidu. It also shows the domains that are censored by any one of the four search engines and each domain’s status with regard to the other three engines. In this way we can compare the number of censored URLs per search query across all four search engines, as well as build a list of censored domains and compare the level of censorship across all four search engines.
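The two views described above (censored URLs per keyword, and per-domain status across engines) can be built from a flat list of observations. The following is a minimal sketch under stated assumptions: the summarize function and the record layout are hypothetical, each record is assumed to describe one distinct censored URL, and domains are taken to be the URL’s host name.

```python
from collections import defaultdict
from urllib.parse import urlparse

ENGINES = ("Google", "Yahoo", "Microsoft", "Baidu")


def summarize(observations):
    """Tally censored URLs.

    observations: iterable of (keyword, engine, url) tuples, one per URL
    found to be censored by that engine for that keyword.
    Returns (censored count per keyword per engine,
             censored flag per domain per engine).
    """
    per_keyword = defaultdict(lambda: dict.fromkeys(ENGINES, 0))
    per_domain = defaultdict(lambda: dict.fromkeys(ENGINES, False))
    for keyword, engine, url in observations:
        per_keyword[keyword][engine] += 1
        domain = urlparse(url).hostname
        per_domain[domain][engine] = True
    return dict(per_keyword), dict(per_domain)
```

Starting every row with all four engines means an engine that never censors a keyword or domain still appears in the table, with a zero or False entry.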

Magnify: A Google-Google, Yahoo-Yahoo comparison

This component focuses on the two search engines that have comparable censored and “uncensored” versions: Google and Yahoo. Microsoft’s “global” search engine in English contains very few Chinese sites and cannot really be compared to its Chinese version. Even other versions, such as those for Hong Kong and Taiwan, return such drastically different results from the Chinese version that they are a difficult fit for this model of testing. (At this time Baidu Japan has not been sufficiently investigated, but it may offer an opportunity for comparison with Baidu China.)

The data collection for this section focuses on a direct comparison between google.com (in Chinese) and google.cn, and between yahoo.com (in Chinese) and yahoo.cn. It expands upon the collection of censored sites and pages by looking at the overlap of returned pages (pages that appear in the result set for the same query, for the same number of results, in both search engines) as well as indexed pages. It also tracks which sites are hosted in China or end in a .cn domain suffix.

Organized in this way, the results raise questions regarding the nature of the censorship process as well as the censored content. In terms of process, critical questions have frequently been posed concerning the specificity of the censorship requirements communicated to these search engines by the Chinese government. Are search engines given a list of keywords or a list of web sites that they are to censor? Or is there just a general reference to a type of content, leaving search engines to infer what exact content to block? How significant are the censored web sites, since they only represent a small fraction of indexed sites? How frequently are users’ search results censored in relation to the topics they search for? Which specific web sites are actually censored? What type of web sites are censored?

In an effort to provide some insight regarding the question of process, the project will measure the overlap between all the search engines as well as between subsets that are functionally similar (Google/Microsoft, Yahoo/Baidu). Overlap refers to the sites that are censored by multiple search engines. Overlap is analyzed in two ways: the first focuses on sites that are censored by all search engines tested; the second focuses on search engines that censor using similar mechanisms. While this allows for a comparison among search engines, it also acts as an indicator of whether the search engines are responding to specific blocking requests, usually associated with an official order, or to a general determination on the part of the company, perhaps based on topic areas provided by officials.
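Both kinds of overlap reduce to set intersection. The sketch below assumes the censored sites for each engine have already been collected into sets; the overlap function, the engine labels and the example domains are hypothetical.

```python
def overlap(censored_by_engine, engines=None):
    """Sites censored by every engine in `engines` (default: all engines).

    censored_by_engine: dict mapping an engine name to its censored sites.
    """
    names = list(engines) if engines is not None else list(censored_by_engine)
    if not names:
        return set()
    return set.intersection(*(set(censored_by_engine[name]) for name in names))


# Made-up data, for illustration only.
censored = {
    "Google": {"a.example", "b.example"},
    "Microsoft": {"a.example", "b.example", "c.example"},
    "Yahoo": {"a.example"},
    "Baidu": {"a.example", "d.example"},
}
all_four = overlap(censored)                              # censored everywhere
delisting_pair = overlap(censored, ["Google", "Microsoft"])  # engines that de-list
crawling_pair = overlap(censored, ["Yahoo", "Baidu"])        # engines crawling from inside China
```

A large overlap across all four engines would be consistent with specific blocking requests, while overlap confined to one functionally similar pair would point more toward the shared mechanism.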

Since the total number of censored sites is likely to be relatively small compared to the total number of indexed sites, this project proposes a measure of significance in order to show just how important the censored sites are in relation to those displayed to the user. Significance refers to the number of top ten sites returned from an uncensored search engine that are censored in the China-specific version, in relation to those that are either hosted in China or end in a .cn domain suffix. China could, presumably, take action against sites under its jurisdiction without having to resort to blocking. In this context, these sites are considered to be “authorized” and are unlikely to contain information that presents an alternative perspective to that approved by the government. In this component, results that are returned in the top ten, along with those that are indexed but not displayed in the top ten, are distinguished from those that are censored. The significance is demonstrated by the absence of top ten results outside of China’s control among a majority of sites that are within the top ten.
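The text does not give a formula for significance, so the following is only one possible reading, offered as a sketch: for the top ten results of an uncensored engine, count how many are censored, how many remain and fall under China’s control, and how many remain and fall outside it. The TopTenResult type and the significance_counts function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TopTenResult:
    url: str
    censored: bool          # removed from the China-specific version
    china_controlled: bool  # hosted in China or ending in .cn


def significance_counts(top_ten):
    """Split top results into censored, China-controlled and other."""
    censored = sum(1 for r in top_ten if r.censored)
    china = sum(1 for r in top_ten if r.china_controlled and not r.censored)
    other = len(top_ten) - censored - china
    return {"censored": censored, "china_controlled": china, "other": other}
```

Under this reading, a query whose uncensored top ten splits into many “censored” and “china_controlled” results and few “other” results would be one where censorship removes most of the perspectives outside China’s control.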

Analyzing the web sites found to be censored is an important indicator of the type of content that the government of China wants to block (or of the interpretation of this interest by the search engine companies). Content refers to the type of web sites that are targeted for censorship, not the content of individual articles contained within them. This is accomplished through the creation of categories to which censored sites are assigned. This component is the most problematic for automatic determinations. Web sites could be classified using various services that provide categorized URLs, but the results may be less than desirable. Operationalizing this component properly likely requires, as suggested by Rebecca MacKinnon, analysis by “a team of near-native Chinese speakers who are highly tuned-in to what the sensitive media topics.” Thus this component remains the least developed aspect of the project.