Human Rights Watch Report

Human Rights Watch has released a report on Internet censorship. It focuses in particular on the role of U.S. corporations in censoring their products in order to enter the Chinese market. The report’s title, “Race to the Bottom”, sums up the situation quite well.

(A mirror copy for those who cannot access the website is available here.[9.8mb .zip])

The report’s principal researcher and editor, Rebecca MacKinnon, did a great job of weaving together the filtering practices of Google, Yahoo!, MSN and Skype with reactions from the Chinese community, especially bloggers. Chapter 4 in particular is well worth the read. It really highlights the point that Internet censorship and surveillance are not merely an academic exercise: the choices these companies are making are having an effect on people’s lives. I know I tend to get bogged down in technical details sometimes (as I am about to do in the next paragraph), and it was a good reminder for me.

From a purely technical perspective, I’ve been doing research on how and what Baidu, Google, Yahoo!, MSN and others are censoring, and I have a few points I’ve managed to learn in the process.

  • Remember the GFW: When testing search engines, always take note of where the server you’re connecting to is located. Inbound requests to servers hosted in China that contain specific keywords can trigger blocking, and so can outbound requests from China to servers hosted elsewhere. This blocking is done by China’s filtering system; it is completely independent of any filtering the search engine itself may be doing. If the connection is disrupted by the GFW you will not get anything back from the search engine (except spoofed RST packets :)). This is different from getting back a proper search engine page with 0 results. HRW corrected for this occurrence; a recent RSF report did not.
  • De-Listed vs. De-Indexed: I’ve confused these two a lot in the past, so to distinguish between them I’ve started to use the following terminology. There are two situations that occur when a site is censored from search engines. One, de-indexed (example: de-indexed; the “site:” modifier is very helpful here): the site is treated as if it doesn’t exist, as if it has never been indexed by the search engine, and there are no results. Two, de-listed (example: de-listed [and de-listed, but a bit weird because you can get some results on page 2 etc…]): the site has been indexed and the search engine identifies that there are some results for this site, but doesn’t display them. The distinction is subtle, but important.
  • Keywords: Keyword censorship occurs when a search query is censored based on the specific search words used. These are targeted, specific keywords that usually represent a sensitive topic. This could result in a case where one is prevented by the search engine from searching for certain words, or where no results are returned for a search which arguably should return results. I have not encountered this situation. What generally happens is that search queries for certain keywords are restricted so that results only appear from certain sites. For example, in the past searches for certain terms were restricted to “sites hosted in China” (see Ethan Zuckerman’s post about how the radio button (Search Chinese Web pages) was forced), and search queries for certain keywords are now restricted to a whitelist of specific domains. The keyword filtering affects the results, but keywords are the trigger for the filtering.
  • Results: A comparison of results between an uncensored version of a search engine and its censored version often yields “missing results”. These missing results are sites that are prominently placed in the uncensored version but absent in the censored version. If a site is truly missing (absent from the results), it has generally been de-listed or de-indexed, and this can be checked with a “site:” modifier. This has nothing to do with the keywords used in the query: any words you search for (even innocuous words) that would otherwise return a de-listed or de-indexed site will come back without it.

    If your expected site is lower in the censored version, it is likely due to the algorithm used by the search engine. For example, .cn sites, sites hosted in China, and sites in Chinese appear higher in the Chinese version of Google. If your expected site is present but lower in the results, it is not necessarily a result of nefarious censorship, even if the situation is one where, for example, locally hosted pro-government sites are ranked higher in the results than foreign hosted anti-government ones.

    The total number of results for a search query will also vary between “global” and “local” search engines. This may be further affected by any de-listed or de-indexed sites, or by any keyword filtering that restricts the result set to certain sites.
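
To make the distinctions above a bit more concrete, here is a minimal Python sketch of how I might record and classify test observations. It is only an illustration under my own assumptions: the names (SiteProbe, classify_site_probe, compare_results) are hypothetical, nothing here talks to a real search engine, and the observations would have to be filled in by hand or by your own test scripts. Note too that a “site:” query with no results could also just mean the site was never crawled, so “de-indexed” here really means “consistent with de-indexing”.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class Outcome(Enum):
    CONNECTION_DISRUPTED = "no page returned (possibly the GFW, not the engine)"
    DE_INDEXED = "page returned, engine reports no results for the site"
    DE_LISTED = "engine reports results for the site but displays none"
    LISTED = "results for the site are displayed"


@dataclass
class SiteProbe:
    """Observation from one `site:` query (hypothetical record format)."""
    got_page: bool           # False if the connection was reset or timed out
    reported_total: int = 0  # number of results the engine says exist
    displayed: int = 0       # number of results actually shown


def classify_site_probe(probe: SiteProbe) -> Outcome:
    # A disrupted connection says nothing about the engine's own filtering,
    # so it is kept separate from a proper page with 0 results.
    if not probe.got_page:
        return Outcome.CONNECTION_DISRUPTED
    if probe.displayed > 0:
        return Outcome.LISTED
    if probe.reported_total > 0:
        return Outcome.DE_LISTED
    return Outcome.DE_INDEXED


def compare_results(
    uncensored: Sequence[str], censored: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Split sites from the uncensored results into (missing, demoted).

    `missing` sites are absent from the censored results and are worth
    re-checking with a `site:` query. `demoted` sites are present but
    ranked lower, which may simply reflect the local ranking algorithm.
    """
    censored_rank: Dict[str, int] = {}
    for rank, site in enumerate(censored):
        censored_rank.setdefault(site, rank)  # keep the first (best) rank

    missing = [s for s in uncensored if s not in censored_rank]
    demoted = [
        s for rank, s in enumerate(uncensored)
        if s in censored_rank and censored_rank[s] > rank
    ]
    return missing, demoted


if __name__ == "__main__":
    # Made-up domains, purely for illustration.
    missing, demoted = compare_results(
        ["a.example", "b.example", "c.example"],
        ["c.example", "a.example"],
    )
    print("missing:", missing)
    print("demoted:", demoted)
    print(classify_site_probe(SiteProbe(got_page=True, reported_total=120)))
```

The point of keeping CONNECTION_DISRUPTED as its own outcome is the first bullet above: a reset connection and a proper page with 0 results should never be counted as the same thing.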

The HRW report also has a list of recommendations. The recommendations try to find a balance between upholding the right to freedom of expression and working within a situation where there are limitations on this freedom. Rather than call for an embargo, the recommendations focus on transparency and accountability while working to create change through engagement.

While I understand that the specific focus of the report is on China, a significant issue, raised by twofish, is that Internet censorship is global in nature while most of the attention is solely on China. I don’t think the recommendations by HRW were necessarily meant to be exclusive to China; I think they have global applicability. Yahoo!, for example, appears to be turning over data to the NSA without a warrant, just as it does in China. The USA has the most sophisticated system of Internet surveillance in the world, which many contend is illegal and is being aided by major technology corporations in the USA. Companies should abide by these recommendations in all countries, not just China.
