The invention provides a text segmentation apparatus comprising means for analyzing an electronic text to determine, for each sentence end in the text, a likelihood of segmentation point based on a coherent unit, and means for segmenting the text into text segments based on that likelihood. When the size of any of the resulting text segments exceeds a threshold value, which is determined from the specified text segmentation size, the apparatus is programmed to split that segment at the position within it having the best likelihood of segmentation point. In particular, the apparatus sets up a pair of windows on the left and right sides of each sentence end position in the text and determines the similarity between the text parts contained in the two windows, thereby obtaining similarity curves. The apparatus then determines the likelihood of segmentation point for each sentence end from the obtained similarity curves. The apparatus segments the text at the point having the best likelihood of segmentation point, then at the point having the second best likelihood, and so on, until the size of every text segment is approximately equal to the specified segment size.
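The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how similarity, likelihood, segment size, or the threshold are computed, so the sketch assumes cosine similarity over word counts, a likelihood equal to one minus that similarity, segment size measured in words, and a threshold equal to a hypothetical `slack` factor times the target size. The names `boundary_likelihoods`, `segment`, `window`, `target_words`, and `slack` are all illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(sentences):
    """Count lowercase word tokens across a list of sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(re.findall(r"[a-z0-9']+", s.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_likelihoods(sentences, window=2):
    """Score the boundary after each sentence except the last.

    Assumption: likelihood = 1 - similarity between the window of
    sentences to the left of the boundary and the window to its right.
    """
    scores = []
    for i in range(len(sentences) - 1):
        left = bag_of_words(sentences[max(0, i + 1 - window): i + 1])
        right = bag_of_words(sentences[i + 1: i + 1 + window])
        scores.append(1.0 - cosine(left, right))
    return scores


def segment(sentences, target_words, window=2, slack=1.5):
    """Split oversized segments at their best boundary until all fit."""
    scores = boundary_likelihoods(sentences, window)
    threshold = slack * target_words  # assumed rule for deriving the threshold

    def size(start, end):
        return sum(len(s.split()) for s in sentences[start:end])

    def split(start, end):
        # A segment is sentences[start:end]; a single sentence cannot be split.
        if end - start < 2 or size(start, end) <= threshold:
            return [(start, end)]
        # Pick the boundary inside this segment with the highest likelihood.
        best = max(range(start, end - 1), key=lambda i: scores[i])
        return split(start, best + 1) + split(best + 1, end)

    return [sentences[s:e] for s, e in split(0, len(sentences))]
```

For example, calling `segment(sentences, target_words=8)` on a list of sentence strings returns a list of segments, each a list of consecutive sentences. Splitting each oversized segment at its own best boundary tends to choose the same points as cutting at the best, second best, and subsequent likelihoods in turn, while stopping as soon as every segment fits under the threshold.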

 

