Online Systems

 

 

 

 


 
 
  The online systems below present some of our laboratory's research results and are open for anyone to use.
   
Chinese Parser
 

When a user inputs a Chinese sentence in this demo version, the system automatically tags and parses the sentence and assigns roles to it. The system then displays the text, its POS tags, and its tree structure.

 
Sinica TreeBank
 

Sinica TreeBank 3.0 contains 6 files, 61,087 syntactic tree structures, and 361,834 words. The tree structures were extracted from the Sinica Corpus, and every structure is segmented and parsed. Each segmented word in a tree structure is tagged with its part of speech and argument role.

Sinica TreeBank 3.0 is provided free on the website for syntactic and semantic research use. 1,000 syntactic tree structures are available.

 
A Chinese Word Segmentation System with Unknown Word Extraction and POS Tagging
 

This demo system provides an interface for entering text, e.g., a news article. The input text is processed through unknown word detection and extraction, and the final segmentation result is returned. The result includes not only the word segmentation and the unknown word list but also the detailed steps of unknown word detection and extraction.

 
Academia Sinica Balanced Corpus of Modern Chinese
 

The "Academia Sinica Balanced Corpus of Modern Chinese", abbreviated as Sinica Corpus, is the first balanced corpus of modern Chinese with part-of-speech tagging. The preliminary version of Sinica Corpus was developed on a small scale and opened to the academic community in 1994, mainly to obtain feedback. The present corpus (Sinica 3.0), which was completed in 1997, contains 5 million words.

 
Affix Database
 

This sub-corpus consists of the following high-frequency initial and final morphemes retrieved from the Sinica Corpus.

  • Initial Morpheme in Noun Compound: 1,135 (words with ambivalent meanings: 1,197)

  • Final Morpheme in Noun Compound: 1,427 (words with ambivalent meanings: 1,610)

  • Initial Morpheme in Verb Compound: 735 (words with ambivalent meanings: 918)

  • Final Morpheme in Verb Compound: 282 (words with ambivalent meanings: 300)

There are 4,025 morphemes in total, counting words with ambivalent meanings.
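The stated total can be checked against the four categories listed above. The short Python sketch below tallies both columns of figures. The dictionary layout and the names `AFFIX_COUNTS` and `totals` are illustrative assumptions, not part of the database itself. It suggests that the total of 4,025 corresponds to the figures given in parentheses rather than to the base counts.

```python
# Figures copied from the list above, stored as (base count, count in parentheses).
# The structure and names here are hypothetical, chosen only for this illustration.
AFFIX_COUNTS = {
    "initial morpheme, noun compound": (1135, 1197),
    "final morpheme, noun compound": (1427, 1610),
    "initial morpheme, verb compound": (735, 918),
    "final morpheme, verb compound": (282, 300),
}


def totals(counts):
    """Return (sum of base counts, sum of parenthesized counts)."""
    base = sum(b for b, _ in counts.values())
    with_ambivalent = sum(a for _, a in counts.values())
    return base, with_ambivalent


base_total, ambivalent_total = totals(AFFIX_COUNTS)
print(base_total, ambivalent_total)  # 3579 4025
```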

For each morpheme, the English meaning, POS, Cilin category, and examples are provided.

For verb compounds, the English meaning, morphological rules, and examples are provided for each morpheme. The number of morphological rules varies from one verb compound morpheme to another.

 
E-HowNet Ontology
 

Extended-HowNet (E-HowNet) is a lexical knowledge base that evolved from HowNet and was created by the CKIP (Chinese Knowledge and Information Processing) group. It consists of definitions for lexical senses and an ontology. The ontology is built by modifying the HowNet taxonomy of sememes to denote taxonomic relations between concepts and attributes of concepts, with the aim of constructing a lexical knowledge database. It is important groundwork for the E-HowNet project.