Keyword Lists



A while back I put together various lists of keywords that have been found to be censored in some way in China. I noticed that they’ve been floating around the Net, so here’s a post explaining where each list came from.

  • badwords.txt – This is a list that was found on 163.com, a popular Chinese portal. It is unclear to me what the exact purpose of the list is.
    Date: November 6, 2008
    Source: http://sports.163.com/special/00051DT9/badwords.txt
  • banword.txt – This is a list that I found on TOM Online’s Skype servers. This is not the list used in Tom-Skype (the actual Skype client), but appears to be part of another product, possibly “web chat” software of some kind.
    Date: September 17, 2008
    Source: http://tcc.skype.tom.com/
  • keyword.txt – This is a keyword list from a blog provider in China.
    Date: March 18, 2005
    Source: Blog Provider in China
  • condopper.txt – This is the list of keywords found to be censored at the “gateway” level by the Concept Doppler project.
    Date: June 18, 2008
    Source: http://www.cs.unm.edu/~crandall/cd/GETRequestBlocked18June.html
  • qqdll.txt – This is the list of keywords found in QQ, a popular Chinese instant messaging program (in Program Files\Tencent\QQGame\COMToolKit.dll).
    Date: July 31, 2004
    Source: http://bbs.omnitalk.org/arts/messages/3824.html

3 comments.

  1. Thanks Nart, this is helpful!
    You know what would be even more helpful? Could you note the date on which you received/discovered each of those lists? My understanding from people in the industry is that these lists change and evolve from week to week.
    Thanks :)

  2. Thanks RMac, I’ll add the dates.

  3. Don’t know if you saw it, but here’s another for the list: http://www.circleid.com/posts/thailands_blacklist_of_newly_banned_websites_leaked/
