Information Extraction

We mine various kinds of information from multilingual texts, such as relationships between named entities, protein-protein interactions (PPIs), and word translation pairs between Chinese and English or between Chinese and Japanese. From this information we further construct networks, such as social networks, PPI networks, and other knowledge structures. These help us better understand the essence of human life, human cognition, and social activities, and they facilitate the application of natural language processing technology to knowledge acquisition.
Particular areas include, but are not limited to, the following:
- relation extraction, which extracts semantic relationships between named entities from news articles or biomedical literature;
- open information extraction, which mines entities, concepts, and relationships from vast numbers of web pages;
- social network construction, which builds social networks from free text via shallow semantic analysis;
- bilingual lexicon construction, which builds Chinese-English or Chinese-Japanese bilingual lexicons from comparable corpora or from Wikipedia.

Cross-document Information Extraction and Fusion

Understanding the meaning of text is a long-term goal of natural language processing. To understand a text, it is critical to mine the semantics of the entities it contains, the relations between those entities, and the events it describes; these are the tasks of information extraction (IE). Our research focuses on cross-document information extraction and information fusion. More specifically, our goals are to determine how to identify important facts (entities, relations, and events) in web-scale collections of documents, how to track the various events involving important entities along temporal and spatial dimensions, how to identify and then extract the relations between events across documents, and how to merge coreferent events from multiple documents into a single complete event.

Statistical Machine Translation

Statistical machine translation (SMT) differs from both rule-based and example-based machine translation. It generates translations using statistical models whose parameters are estimated from large amounts of bilingual text. Widely used free translation platforms, such as Google, Baidu, and Youdao, have been built on SMT. SMT is an advanced application of natural language processing and spans many active research topics, including automatic alignment of bilingual documents and sentences, bilingual corpus construction, word alignment, preprocessing, translation modeling, language modeling, decoding, post-editing, and machine translation evaluation. At present, we are particularly interested in document-level (discourse-level) translation and its evaluation methods.
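The statistical modelling idea behind SMT is commonly summarized by the noisy-channel formulation. It is given here as the standard textbook form, not as a description of our own systems. Given a source sentence f, the decoder searches for the target sentence ê that maximizes the product of a translation model P(f | e) and a language model P(e):

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\,P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\,P(e)
```

The denominator P(f) can be dropped because it does not depend on e. Phrase-based systems typically generalize this to a log-linear model that combines weighted feature functions of e and f. The translation model and the language model are two of those features.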

Sentiment Analysis

Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials. Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall tonality of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).
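One of the simplest ways to estimate the attitude expressed in a text is a lexicon-based scorer. The sketch below is illustrative only and is not our method. The word list `LEXICON`, the negator set `NEGATORS`, and the functions `polarity` and `label` are hypothetical. The one-word negation rule is also a deliberate simplification. The scorer sums prior word polarities and flips the sign of any word immediately preceded by a negator.

```python
import re

# Hypothetical toy polarity lexicon; a real system would use a curated
# resource or learn word weights from annotated data.
LEXICON = {
    "good": 1.0, "great": 2.0, "excellent": 2.0, "love": 2.0,
    "bad": -1.0, "poor": -1.0, "terrible": -2.0, "hate": -2.0,
}
# Hypothetical negators; only the immediately preceding token is checked.
NEGATORS = {"not", "never", "no"}


def polarity(text: str) -> float:
    """Sum lexicon scores, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(text: str) -> str:
    """Map the summed score to a coarse sentiment label."""
    s = polarity(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

For example, `label("The food was not good")` returns `"negative"` because the negator flips the score of "good" to -1.0. Such scorers cannot model sarcasm, long-range negation, or the author's emotional state. Those limitations are the kinds of questions the broader task description above raises.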

Information Retrieval

Information retrieval (IR) is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as with searching relational databases and the World Wide Web. The terms data retrieval, document retrieval, information retrieval, and text retrieval overlap in usage, but each also has its own body of literature, theory, praxis, and technologies. IR is interdisciplinary, drawing on computer science, mathematics, library science, information science, information architecture, cognitive psychology, linguistics, and statistics.
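A classic way to rank documents against a query is TF-IDF weighting. Under this scheme, a term counts for more the more often it appears in a document and the fewer documents contain it. The following is a minimal sketch that assumes whitespace tokenization, length-normalized term frequency, and an unsmoothed IDF of log(N / df). The names `tokenize` and `rank` are hypothetical, and the sketch does not describe any particular retrieval system.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Rank documents by the summed TF-IDF weight of the query terms."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in doc_tokens for term in set(toks))
    scores = []
    for idx, toks in enumerate(doc_tokens):
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term in tf:
                idf = math.log(n / df[term])
                # Term frequency normalized by document length.
                score += (tf[term] / len(toks)) * idf
        scores.append((idx, score))
    # Highest score first; Python's sort is stable, so ties keep input order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "information retrieval ranks documents",
    "statistical machine translation",
    "retrieval of documents from the web",
]
```

Here `rank("retrieval documents", docs)` places document 0 first, then document 2, then document 1. The shorter first document gets a larger length-normalized weight for the same two matching terms. Production systems usually replace this scheme with smoothed variants or with BM25.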

Shallow Semantic Parsing

Beyond the syntactic analysis of natural language sentences lies the extraction of their semantic information. Semantic role labeling is one such shallow semantic parsing task. It identifies the predicates (typically verbs) in a sentence and the structure of their arguments, and it is an important step toward natural language understanding.
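The output of semantic role labeling can be pictured as a predicate plus a set of labeled argument spans. The sketch below assumes PropBank-style labels: ARG0 for the agent, ARG1 for the thing affected, and ARGM-TMP for a temporal modifier. It also assumes a hand-written annotation. The classes `Argument` and `Frame` and the function `describe` are hypothetical, and the labeler that would produce such frames automatically is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Argument:
    label: str  # PropBank-style role label, e.g. "ARG0" or "ARGM-TMP"
    start: int  # index of the first token of the span
    end: int    # index one past the last token (exclusive)


@dataclass
class Frame:
    predicate_index: int  # token index of the predicate verb
    arguments: list[Argument] = field(default_factory=list)


def describe(tokens: list[str], frame: Frame) -> dict[str, str]:
    """Map each role label (and "V" for the predicate) to its surface text."""
    roles = {"V": tokens[frame.predicate_index]}
    for arg in frame.arguments:
        roles[arg.label] = " ".join(tokens[arg.start:arg.end])
    return roles


# Hand-annotated example: who did what to whom, and when.
tokens = "John bought a book yesterday".split()
frame = Frame(
    predicate_index=1,
    arguments=[
        Argument("ARG0", 0, 1),
        Argument("ARG1", 2, 4),
        Argument("ARGM-TMP", 4, 5),
    ],
)
```

Calling `describe(tokens, frame)` yields `{"V": "bought", "ARG0": "John", "ARG1": "a book", "ARGM-TMP": "yesterday"}`. This is the shallow "who did what to whom, when" structure that the task aims to recover.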

Chinese Syntactic and Semantic Parsing

Nearly every substantial NLP application requires automatic parsing. This usually means syntactic parsing but sometimes also includes semantic parsing. In particular, syntactic parsing of Chinese involves Chinese word segmentation, word-structure analysis (parsing), part-of-speech tagging, and constituent or dependency parsing. Semantic parsing means automatically determining the semantic role of each constituent, the thematic role of each argument of a verb, and so on. Our research team works on both syntactic and semantic parsing of Chinese. We aim both to produce practical methods, algorithms, and models for NLP applications and to deepen our insight into the mechanisms of human language understanding.
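Chinese word segmentation, the first step listed above, is often introduced through the dictionary-based forward maximum matching baseline. The sketch below shows that textbook baseline, not our team's method. The function name `forward_max_match`, the toy vocabularies, and the 4-character window are illustrative assumptions.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation.

    At each position, take the longest dictionary word of at most max_len
    characters; if no dictionary word matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabularies.
vocab_a = {"自然", "语言", "自然语言", "处理", "研究"}
vocab_b = {"研究", "研究生", "生命", "起源"}
```

With `vocab_a`, "自然语言处理研究" is segmented as ["自然语言", "处理", "研究"]. With `vocab_b`, the greedy match turns "研究生命起源" into ["研究生", "命", "起源"] instead of the correct ["研究", "生命", "起源"]. This segmentation ambiguity is why purely dictionary-based matching is usually treated only as a baseline.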