Testing through proxies in China



I have been getting a lot of help requests lately as various people attempt to test China’s Internet filtering system through open proxies located in China. While proxies can be useful for testing, they should not be relied upon exclusively. Proxies are a useful way to demonstrate that content is accessible; however, they add an additional possible point of failure when checking to see what is inaccessible.

Scenario 1: A request is made through a proxy in China for a webpage that contains a banned keyword, “www.falundafa.org”, in its body, to see whether pages with banned keywords in the body are accessible or inaccessible. Packet sniffers are running on both the computer that connects to the proxy (ME) and the webserver (HOST) that hosts the page. Because I control two of the three computers, I can monitor both ends of the exchange: the connection from me to the proxy and the connection from the proxy to the webserver.

I (ME) issue the request through the proxy server (PROXY) and get the requested page that contains the banned keyword. It is accessible (200 OK).

0.296887 ME -> PROXY HTTP GET http://www.nartv.org/tests/test.html HTTP/1.1
19.686393 PROXY -> ME HTTP HTTP/1.1 200 OK (text/html)

The webserver (HOST) receives the request and answers the proxy’s request (200 OK).

0.338255 PROXY -> HOST HTTP GET /tests/test.html HTTP/1.1
0.341245 HOST -> PROXY HTTP HTTP/1.1 200 OK (text/html)

No problem is found in accessing web pages that include “www.falundafa.org”. China does not appear to dynamically block pages based on keywords in their content.

Scenario 2a: A request is made for a page containing innocuous content from the same previously accessible webserver; however, the file name contains the keyword “www.falundafa.org”. This is a test for keyword-in-URL-path blocking. (This case shows outbound blocking between the proxy and the host, while the connection between me and the proxy remains intact.)

I (ME) issue the request through the proxy server (PROXY) and receive an RST packet which terminates the connection (the page never loads). The requested content is inaccessible.

0.302997 ME -> PROXY HTTP GET http://www.nartv.org/tests/www.falundafa.org.html HTTP/1.1
0.602310 PROXY -> ME TCP www > 2207 [RST] Seq=1 Ack=2367417381 Win=1 Len=0

The webserver gets both the request from the proxy and reset packets which terminate the connection.

0.303520 PROXY -> HOST HTTP GET /tests/www.falundafa.org.html HTTP/1.1
0.304453 PROXY -> HOST TCP 4281 > www [RST] Seq=694 Ack=2293838462 Win=1 Len=0
0.304916 PROXY -> HOST TCP 4281 > www [RST] Seq=694 Ack=1 Win=0 Len=0
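
Summary lines like the ones above can be scanned mechanically for resets. The following is a minimal sketch, assuming the one-line format shown in this post (time, source, "->", destination, protocol, info); the names Packet, parse_line and is_reset are made up for illustration and are not part of any sniffer's API.

```python
import re
from typing import NamedTuple, Optional


class Packet(NamedTuple):
    time: float
    src: str
    dst: str
    proto: str
    info: str


# Assumed line shape: "<time> <SRC> -> <DST> <PROTO> <info...>"
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\S+)\s+->\s+(\S+)\s+(\S+)\s+(.*)$")


def parse_line(line: str) -> Optional[Packet]:
    """Parse one summary line; return None if it does not match the assumed shape."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    t, src, dst, proto, info = m.groups()
    return Packet(float(t), src, dst, proto, info)


def is_reset(pkt: Packet) -> bool:
    """True for TCP summary lines whose flags include RST."""
    return pkt.proto == "TCP" and "[RST]" in pkt.info


# The HOST-side capture from Scenario 2a, copied from the lines above.
trace = [
    "0.303520 PROXY -> HOST HTTP GET /tests/www.falundafa.org.html HTTP/1.1",
    "0.304453 PROXY -> HOST TCP 4281 > www [RST] Seq=694 Ack=2293838462 Win=1 Len=0",
    "0.304916 PROXY -> HOST TCP 4281 > www [RST] Seq=694 Ack=1 Win=0 Len=0",
]

packets = [p for p in map(parse_line, trace) if p is not None]
resets = [p for p in packets if is_reset(p)]
print(f"{len(resets)} RST packet(s) seen after the GET")
```

Running the same scan over the ME-side capture makes it easy to compare what each end saw for a given request.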

A request is issued through the proxy to the page that was previously successfully retrieved:

54.784373 ME -> PROXY HTTP GET http://www.nartv.org/tests/test.html HTTP/1.1
58.554942 PROXY -> ME HTTP HTTP/1.1 502 Proxy Error (text/html)

An HTTP error is received from the proxy. This is the point at which proxies diverge in behaviour. I cannot access the server that triggered the blocking, but I can still reach the proxy; the proxy returns a 502 error because continued RST packets prevent it from connecting to the host.

The server gets more RST packets:

54.468168 PROXY -> HOST TCP 4391 > www [RST] Seq=1 Ack=2236767498 Win=0 Len=0
54.760082 PROXY -> HOST TCP 4391 > www [RST] Seq=1 Ack=1 Win=0 Len=0

Other content on other servers can be accessed through the proxy:

94.795413 ME -> PROXY HTTP GET http://www.gnu.org/ HTTP/1.1

But no content on the server which triggered the blocking can be accessed:

117.253580 ME -> PROXY HTTP GET http://www.nartv.org/ HTTP/1.1
118.435081 PROXY -> ME HTTP HTTP/1.1 502 Proxy Error (text/html)

The server gets more RST packets from the proxy:

117.218448 PROXY -> HOST TCP 4653 > www [RST] Seq=1 Ack=2164037076 Win=0 Len=0
117.500543 PROXY -> HOST TCP 4653 > www [RST] Seq=1 Ack=1 Win=0 Len=0

Sometimes proxies continue to work when accessing sites other than the one that triggered the blocking, as in the situation above, while others fail to connect to any site because the connection between me and the proxy is disrupted with RST packets. The situation above is one where the filtering is occurring on the outbound request.

Sometimes, when requesting content that is blocked, there is a disruption between me and the proxy because the request I send to the proxy, in plain text, is filtered on the inbound request.

Scenario 2b: A request is made for a page containing innocuous content from the same previously accessible webserver; however, the file name contains the keyword “www.falundafa.org”. This is a test for keyword-in-URL-path blocking. (This case shows inbound blocking between me and the proxy; further connections between me and the proxy fail.)

I (ME) issue the request through the proxy server (PROXY) and receive RST packets which terminate the connection (the page never loads).

52.758824 ME -> PROXY HTTP GET http://www.nartv.org/tests/www.falundafa.org.html HTTP/1.1
53.071331 PROXY -> ME TCP www > 2311 [RST] Seq=1 Ack=3653495283 Win=1 Len=0
53.071337 PROXY -> ME TCP www > 2311 [RST] Seq=1 Ack=3653495283 Win=1 Len=0

The server does not receive a connection from the proxy; the proxy never issues my request to the server. The blocking occurred between me and the proxy.

Requests for any site through the proxy will not be successful; instead, RST packets are received. This occurs for requests to the same previously accessible server as well as to any other web site. The connection between me and the proxy is disrupted.

73.807658 ME -> PROXY HTTP GET http://www.gnu.org/ HTTP/1.1
74.230743 PROXY -> ME TCP www > 2312 [RST] Seq=1070894714 Ack=2074992522 Win=0 Len=0
74.231351 PROXY -> ME TCP www > 2312 [RST] Seq=1070894714 Ack=2074992522 Win=0 Len=0
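
The reasoning used in Scenarios 1, 2a and 2b can be summarised as a small decision function. This is only a sketch of the logic described in this post, assuming you control both the client and the destination server and can sniff both; the names Verdict and locate_block are hypothetical, and real captures can be messier than two yes/no observations.

```python
from enum import Enum


class Verdict(Enum):
    ACCESSIBLE = "accessible"
    OUTBOUND = "blocked between PROXY and HOST (as in 2a)"
    INBOUND = "blocked between ME and PROXY (as in 2b)"
    UNCLEAR = "observations do not match any scenario"


def locate_block(me_got_page: bool, host_saw_proxy_traffic: bool) -> Verdict:
    """Combine what ME and HOST each observed for a single request.

    me_got_page: ME received a 200 OK rather than an RST or a 502.
    host_saw_proxy_traffic: HOST's sniffer saw any packets from the proxy
        for this request (the GET itself, or just RST packets).
    """
    if me_got_page:
        # If HOST saw nothing, the page came from somewhere else
        # (possibly a proxy cache), so the result is not trustworthy.
        return Verdict.ACCESSIBLE if host_saw_proxy_traffic else Verdict.UNCLEAR
    if host_saw_proxy_traffic:
        # The proxy reached out to HOST, so the request survived the
        # ME -> PROXY leg and was disrupted on the way out.
        return Verdict.OUTBOUND
    # HOST saw nothing at all: the request never left the proxy.
    return Verdict.INBOUND


print(locate_block(True, True).value)    # Scenario 1
print(locate_block(False, True).value)   # Scenario 2a
print(locate_block(False, False).value)  # Scenario 2b
```

Without the HOST-side capture the second argument is unknown, which is exactly why a proxy alone cannot tell 2a from 2b.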

Scenario 3: Often, people want to test whether specific terms can be used as queries in search engines, for example by sending Google a search query that contains blocked keywords. To be clear, what is being tested here is whether the request gets through to the search engine, not any filtering of results that the search engine itself may engage in.

Because Google appends search terms to its URL path, this is in reality a test for keyword in URL path blocking. It has nothing to do with Google specifically.
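
To make this concrete, the sketch below builds the request line a client would send to a plain HTTP proxy for a search and for the test file from Scenario 2. It assumes Google's familiar "/search?q=" URL form and uses a hypothetical helper, proxy_request_line; it only builds strings and sends nothing.

```python
from urllib.parse import urlencode


def proxy_request_line(url: str) -> str:
    """Request line sent to a plain HTTP proxy: the full URL appears in clear text."""
    return f"GET {url} HTTP/1.1"


keyword = "www.falundafa.org"

# Assumed search URL shape; the keyword ends up in the query string.
search_url = "http://www.google.com/search?" + urlencode({"q": keyword})
# The test file from Scenario 2, with the keyword in the file name.
file_url = f"http://www.nartv.org/tests/{keyword}.html"

for url in (search_url, file_url):
    line = proxy_request_line(url)
    print(line, "| keyword visible:", keyword in line)
```

In both cases the keyword travels unencrypted in the first line of the request, so a filter matching on the URL treats a search and a file request the same way.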

Given the behaviours described above one can see how testing search terms in Google can be difficult.

  • One cannot determine if the blocking is occurring between the requestor and the proxy or between the proxy and Google.
  • If the connection between the requestor and the proxy is disrupted (2b), further requests to Google will appear to be blocked. Although you may receive your browser’s cached copy of the initial Google search page (by pressing the back button in your browser), the connection to Google is actually blocked, and further search requests will appear to be blocked whether they contain blocked keywords or random unblocked keywords.
  • In a scenario (2a) where the connection between the requestor and the proxy is fine but the connection between the proxy and Google was blocked, searches for unblocked terms may appear to be blocked. Further complicating the issue, Google has many different IP addresses, so if the request from the proxy to Google was filtered, further requests to that same Google IP will fail, but if the domain resolves to a different IP, the request will go through.

This creates a situation where it is difficult to determine where the filtering is occurring (inbound or outbound), what is actually blocked, and what only appears to be blocked because the connection to Google is blocked. Because Google has multiple IP addresses, some search requests may get through (if the connection between the requestor and the proxy is OK, as in Scenario 2a), making the proxy appear to be functioning correctly. However, when connecting to the Google IP that originally triggered the blocking, all search requests to that IP will fail, so a test may inaccurately identify some keywords as blocked when in fact they are accessible.
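
One way to reduce such false positives is to pair every keyword test with a control request for a term known to be unblocked, and to discard any result whose control failed. The function below is a minimal sketch of that bookkeeping under this assumption; interpret is a hypothetical name, and the approach does not solve the multiple-IP problem, since the control and the test may reach different Google IPs.

```python
def interpret(control_ok: bool, test_ok: bool) -> str:
    """Label one keyword test given the control request made just before it.

    control_ok: a request for a term known to be unblocked succeeded
        through the same proxy immediately before the test.
    test_ok: the request containing the keyword under test succeeded.
    """
    if not control_ok:
        # The path was already disrupted (for example by an earlier test),
        # so a failure says nothing about this keyword.
        return "inconclusive"
    return "accessible" if test_ok else "likely blocked"


# All four combinations of observations.
for control_ok, test_ok in [(True, True), (True, False), (False, False), (False, True)]:
    print(control_ok, test_ok, "->", interpret(control_ok, test_ok))
```

A keyword is only recorded as blocked when the path was demonstrably working a moment earlier; everything else is retested later or through another proxy.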
