Google Search & Cache Filtering Behind China’s Great Firewall



The subject of Google cache filtering by China is not really new, but I had never seen explicit details posted about it, so here is the latest OpenNet Initiative bulletin detailing what we’ve found. In short, China uses its keyword filtering technology to filter any HTTP GET request that contains the string ‘search?q=cache’. This blocks the Google cache regardless of which IP address is used to access it (I found 37 such IPs). The IP addresses themselves are not filtered; in fact, a request for ‘search?q=cache’ sent to any server will be disrupted. However, this can easily be circumvented.
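To make the "any server, any IP" point concrete, here is a minimal sketch of request-string keyword matching. It assumes the filter does nothing more than look for a banned substring in the bytes of the HTTP request; the keyword list and the `is_blocked` function are hypothetical, and the addresses are documentation IPs rather than the 37 Google IPs mentioned above.

```python
# Minimal sketch of keyword filtering on the HTTP request itself, assuming a
# plain substring match. BANNED_KEYWORDS and is_blocked are hypothetical
# names for illustration; the real keyword list is not known.

BANNED_KEYWORDS = ["search?q=cache"]


def is_blocked(request: str, keywords=BANNED_KEYWORDS) -> bool:
    """Return True if any banned keyword appears in the request text.

    The destination host or IP plays no part in the decision: only the
    contents of the GET request are inspected.
    """
    return any(keyword in request for keyword in keywords)


if __name__ == "__main__":
    # The same path is flagged no matter which server it is sent to.
    for host in ["192.0.2.1", "198.51.100.7", "example.org"]:
        request = f"GET /search?q=cache:example.com HTTP/1.1\r\nHost: {host}\r\n"
        print(host, is_blocked(request))
```

Because the match is on the request rather than the destination, blocking individual IP addresses would be redundant under this model.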

First, by “disrupts access” I’m referring to the technique I described in OpenNet Initiative Bulletin 005, whereby HTTP GET requests containing banned keywords receive an RST packet instead of the requested content. No errors or headers are returned; the connection is simply terminated. Attempts to reconnect to the same host (even with a different GET request) are often unsuccessful. This is because, after the RST is sent, the host advertises a ZeroWindow, and no connection can be made until it advertises a non-zero window size again. The time it takes for a non-zero window size to be advertised varies, which is why many users have reported being “banned” for periods of time after trying to access blocked content.
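The "banned for a while" experience can be illustrated with a toy model. This sketch reproduces only the observed effect (a reset, followed by a period in which the same client cannot reach the same server). It does not model the ZeroWindow mechanism itself, and the `ToyFirewall` class and the fixed `ban_seconds` value are hypothetical, since the real delay varies.

```python
# Toy model of the observed effect: after a banned request, the client is
# reset and cannot reach the same server for a while. ToyFirewall and the
# fixed ban_seconds value are hypothetical; the real delay varies.

from dataclasses import dataclass, field


@dataclass
class ToyFirewall:
    keywords: list
    ban_seconds: float = 90.0  # hypothetical fixed duration
    _banned_until: dict = field(default_factory=dict)

    def handle(self, client: str, server: str, request: str, now: float) -> str:
        """Return 'RST' if the connection is disrupted, else 'OK'."""
        pair = (client, server)
        # Still inside the post-reset window: any request to this server
        # fails, even one that contains no banned keyword.
        if now < self._banned_until.get(pair, 0.0):
            return "RST"
        if any(k in request for k in self.keywords):
            # Banned keyword seen: reset the connection and start the window.
            self._banned_until[pair] = now + self.ban_seconds
            return "RST"
        return "OK"


fw = ToyFirewall(keywords=["search?q=cache"])
print(fw.handle("10.0.0.5", "192.0.2.1", "GET /search?q=cache:x", now=0))  # RST
print(fw.handle("10.0.0.5", "192.0.2.1", "GET /index.html", now=10))  # RST
print(fw.handle("10.0.0.5", "192.0.2.1", "GET /index.html", now=100))  # OK
```

The second request is harmless, yet it still fails because it falls inside the window opened by the first one. That matches the user reports of temporary "bans".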

Moving on, the filtering of the Google cache can be circumvented by manually adding an ampersand to the GET request, e.g. ‘search?&q=cache’. However, if the URL of the cached page you are trying to access contains a different banned keyword, that keyword will trigger the blocking and this trick will not work.
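Here is a sketch of why the ampersand trick works and where it stops working. It assumes a plain substring filter. The extra `&` breaks the exact string ‘search?q=cache’, while a standard query-string parser (here Python's `urllib.parse.parse_qs`, which skips the empty parameter) still sees the same `q` value. The keyword ‘falun’ is a hypothetical stand-in for "a different banned keyword", and `cache_path` and `query_q` are illustrative helpers.

```python
# Sketch of the ampersand circumvention, assuming a plain substring filter.
# BANNED, cache_path and query_q are hypothetical names for illustration;
# 'falun' stands in for some other banned keyword.

from urllib.parse import parse_qs, urlsplit

BANNED = ["search?q=cache", "falun"]


def is_blocked(path: str) -> bool:
    return any(k in path for k in BANNED)


def cache_path(target_url: str, evade: bool = False) -> str:
    """Build a Google-style cache request path for target_url."""
    prefix = "/search?&q=cache:" if evade else "/search?q=cache:"
    return prefix + target_url


def query_q(path: str):
    """What a standard parser sees as the 'q' parameter of the request."""
    return parse_qs(urlsplit(path).query).get("q")


for target in ["www.example.com/news.html", "www.falundafa.org/"]:
    for evade in (False, True):
        path = cache_path(target, evade)
        print(f"{path:45} blocked={is_blocked(path)} q={query_q(path)}")
```

The evasive path for `www.example.com` passes the filter with an unchanged `q`. The one for the Falun Dafa site is still caught, because the target URL itself carries a banned keyword.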

The Yahoo and Gigablast caches, however, are not filtered. With Yahoo the same circumvention limitation applies, because Yahoo includes the domain in the cache request. Gigablast does not append the domain, so if you can get blocked sites to come up in your search results (e.g. searching for ‘zhuan’ in Gigablast returns results for Falun Dafa), you can access the cached copies of these blocked sites.
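The difference between the two caches comes down to what travels in the request. This sketch contrasts two URL shapes. It assumes one engine embeds the cached page's domain in the request and the other identifies the page only by an opaque document ID. Both formats, the ID value and the keyword are hypothetical; they are not the engines' real URL formats.

```python
# Sketch contrasting two cache-URL shapes. Both formats, the doc ID and the
# banned keyword are hypothetical, not the engines' real URLs.

BANNED = ["falundafa"]


def is_blocked(request: str) -> bool:
    return any(k in request for k in BANNED)


def cache_url_with_domain(target: str) -> str:
    # Shape where the cached page's URL travels in the request (Yahoo-like).
    return f"/cache?url={target}"


def cache_url_with_docid(doc_id: int) -> str:
    # Shape where only an opaque ID travels in the request (Gigablast-like).
    return f"/get?d={doc_id}"


target = "www.falundafa.org/"
print(is_blocked(cache_url_with_domain(target)))  # True
print(is_blocked(cache_url_with_docid(123456789)))  # False
```

Under the ID-based shape, the only hard part is obtaining the ID, which comes from the search results page. This is why getting the blocked site into the results (for example via a search for ‘zhuan’) is the key step.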

3 comments.

  1. You might also want to cite:

    http://sethf.com/anticensorware/bess/google.php

    BESS vs The Google Search Engine (Cache, Groups, Images)

    Abstract: This report examines how N2H2’s censorware deals with
    archives of large amount of information. Three features are examined
    from the Google search engine (Cache, Groups, Images). N2H2/BESS is
    found to ban the cached pages everywhere, pass porn in groups, and
    consider all image searching to be pornography. The general problems
    of censorware versus large archives are discussed (i.e., why
    censorware is impelled to situations such as banning the Google
    cache).

  2. I don't really understand the filtering of IP addresses… can someone help me with that?

  3. This post explains it (http://www.nartv.org/?p=78).
