A method (100) and apparatus (700) are disclosed for detecting and tracking human faces across a sequence of video frames. Spatiotemporal segmentation is used to segment (115) the sequence of video frames into 3D segments. 2D segments are then formed from the 3D segments, with each 2D segment being associated with one 3D segment. Features are extracted (140) from the 2D segments and grouped into groups of features. For each group of features, a probability that the group includes human facial features is calculated (145) based on the similarity of the geometry of the group to the geometry of a human face model. Each group of features is also matched with a group of features in a previous 2D segment, and an accumulated probability that the group includes human facial features is calculated (150). Each 2D segment is classified (155) as a face segment or a non-face segment based on the accumulated probability. Human faces are then tracked by finding, in subsequent frames, the 2D segments whose associated 3D segments are also associated with face segments.
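
The accumulation (150), classification (155) and tracking steps can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the abstract does not state how the accumulated probability is computed or what decision rule is used, so the exponential smoothing weight ALPHA, the threshold FACE_THRESHOLD, and the names Segment2D and FaceTracker are all assumptions made for illustration. The sketch also simplifies each 2D segment to a single group of features and treats membership of the same 3D segment as the match with the previous 2D segment.

```python
from dataclasses import dataclass, field

ALPHA = 0.5           # assumed smoothing weight; not given in the abstract
FACE_THRESHOLD = 0.6  # assumed decision threshold; not given in the abstract


@dataclass
class Segment2D:
    frame: int               # index of the video frame
    segment3d_id: int        # the one 3D segment this 2D segment belongs to
    face_probability: float  # per-frame geometric-match probability (145)


@dataclass
class FaceTracker:
    # 3D segment id -> accumulated face probability
    accumulated: dict = field(default_factory=dict)
    # ids of 3D segments that contain at least one face segment
    face_ids: set = field(default_factory=set)

    def update(self, seg):
        """Accumulate (150) and classify (155) one 2D segment."""
        prev = self.accumulated.get(seg.segment3d_id)
        if prev is None:
            # No earlier 2D segment to match: start from the current value.
            acc = seg.face_probability
        else:
            # Assumed rule: blend the current and previous probabilities.
            acc = ALPHA * seg.face_probability + (1 - ALPHA) * prev
        self.accumulated[seg.segment3d_id] = acc
        if acc >= FACE_THRESHOLD:
            # Face segment: remember its 3D segment for tracking.
            self.face_ids.add(seg.segment3d_id)
        return acc

    def is_tracked_face(self, seg):
        """A later 2D segment is tracked as a face when its 3D segment
        is already associated with a face segment."""
        return seg.segment3d_id in self.face_ids


tracker = FaceTracker()
first = tracker.update(Segment2D(frame=0, segment3d_id=7, face_probability=0.4))
second = tracker.update(Segment2D(frame=1, segment3d_id=7, face_probability=0.9))
later = Segment2D(frame=2, segment3d_id=7, face_probability=0.0)
print(tracker.is_tracked_face(later))
```

In this sketch a 3D segment stays marked as a face once any of its 2D segments crosses the threshold, so later frames are tracked through the 3D association without re-running detection. That choice is one reading of the final sentence of the abstract, not a statement of the claimed method.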

 

