A while back I put together various lists of keywords that have been found to be censored in some way in China. I noticed that they’ve been floating around the Net, so here’s a post explaining where each of the lists came from.
- badwords.txt – This is a list that was found on 163.com, a popular Chinese portal. The exact purpose of the list is unclear to me.
Date: November 6, 2008
Source: http://sports.163.com/special/00051DT9/badwords.txt
- banword.txt – This is a list that I found on TOM Online’s Skype servers. This is not the list used in Tom-Skype (the actual Skype client), but appears to be part of another product, possibly “web chat” software of some kind.
Date: September 17, 2008
Source: http://tcc.skype.tom.com/
- keyword.txt – This is a keyword list from a blog provider in China.
Date: March 18, 2005
Source: Blog Provider in China
- condopper.txt – This is the list of keywords found to be censored at the “gateway” level by the Concept Doppler project.
Date: June 18, 2008
Source: http://www.cs.unm.edu/~crandall/cd/GETRequestBlocked18June.html
- qqdll.txt – This is the list of keywords found in QQ, a popular Chinese instant messaging program (Program Files\Tencent\QQGame\COMToolKit.dll).
Date: July 31, 2004
Source: http://bbs.omnitalk.org/arts/messages/3824.html
Thanks Nart, this is helpful!
You know what would be even more helpful? Could you note the date on which you received/discovered each of those lists? My understanding from people in the industry is that these lists change and evolve from week to week.
Thanks :)
Posted by Rebecca MacKinnon on November 25th, 2008.
Thanks RMac, I’ll add the dates.
Posted by nart on November 26th, 2008.
Don’t know if you saw it, but here’s another for the list: http://www.circleid.com/posts/thailands_blacklist_of_newly_banned_websites_leaked/
Posted by Kevin Donovan on December 22nd, 2008.