Restricted for use by site license.
Title from disc label.
Data type: Text.
Data source(s): newswire, broadcast conversation, web collection.
Application(s): automatic content extraction, cross-lingual information retrieval, information detection, machine translation.
Author(s): Xuansong Li ... [et al.].
"LDC2015T06."
Summary:
"GALE Chinese-English Parallel Aligned Treebank -- Training was developed by the Linguistic Data Consortium (LDC) and contains 229,249 tokens of word aligned Chinese and English parallel text with treebank annotations. This material was used as training data in the DARPA GALE (Global Autonomous Language Exploitation) program."
This resource is supported by the Institute of Museum and Library Services under the provisions of the Library Services and Technology Act, as administered by the State Library of Iowa.