Unsupervised Statistical Segmentation of Japanese Kanji Strings

Author
Ando, Rie; Lee, Lillian
Abstract
Word segmentation is an important issue in Japanese language processing because Japanese is written without space delimiters between words. We propose a simple dictionary-less method to segment Japanese kanji sequences into words based solely on character $n$-gram counts from an unannotated corpus. The method often performed better than rule-based morphological analyzers on a variety of both standard and novel error metrics.
Date Issued
1999-07
Publisher
Cornell University
Subject
computer science; technical report
Previously Published As
http://techreports.library.cornell.edu:8081/Dienst/UI/1.0/Display/cul.cs/TR99-1756
Type
technical report