Title from disc label.
"Authors: Eric Forsyth, Jane Lin, Craig Martell"--LDC catalogue.
"LDC2010T05".
Data type: Text.
Data source: Text chat conversations.
Applications: Named entity recognition, topic detection and tracking.
Summary:
"... consists of 10,567 English posts (45,068 tokens) gathered from age-specific chat rooms of various online chat services in October and November 2006. Each file is a text recording from one of these chat rooms for a short period on a particular day. Users should be aware that some of the conversations in this corpus feature subjects and language that some people may find offensive or objectionable, including discussions of a sexual nature. This corpus was developed by researchers at the Department of Computer Science, Naval Postgraduate School, Monterey, California."--LDC catalogue.
This resource is supported by the Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, as administered by the State Library of Iowa.