Visual Speech Recognition (VSR) is the process of understanding speech by interpreting visual information from a speaker's lip movements. Efficient and accurate mouth detection is an essential step in speech recognition from visual-only signals. This paper proposes a novel Coordinate-Based Super-pixel Segmentation (CBSS) algorithm to improve the accuracy of mouth segmentation. The proposed CBSS algorithm robustly segments the mouth region belonging to a given mouth shape. The Discrete Cosine Transform (DCT) is then applied to the extracted mouth region to isolate the salient features, and a Support Vector Machine (SVM) is trained on the resulting visual lip features to recognize speech. Experiments are conducted on an in-house database of normal-hearing and hearing-impaired persons and on the publicly available CUAVE database. The results indicate that the proposed CBSS algorithm drastically improves mouth detection accuracy compared to existing techniques, which in turn significantly improves the recognition rate for isolated words.
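The DCT feature step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which coefficients are retained or how large the mouth region is, so the block size, the `keep` parameter, and the function names below are assumptions made for illustration only, not the authors' implementation. The sketch computes an orthonormal 2-D DCT-II of a grayscale mouth region and keeps the low-frequency corner as a feature vector, which is one common way to obtain compact features that a classifier such as an SVM could then be trained on.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            v * math.cos(math.pi * (i + 0.5) * k / n)
            for i, v in enumerate(values)
        )
        # Orthonormal scaling: the DC term uses sqrt(1/n), others sqrt(2/n).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(list(col)) for col in zip(*row_pass)]
    # col_pass is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*col_pass)]


def low_frequency_features(block, keep=4):
    """Flatten the top-left keep x keep DCT coefficients into a vector.

    `keep` is an assumed parameter; the source does not specify how many
    coefficients are retained.
    """
    coeffs = dct_2d(block)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Hypothetical 8x8 grayscale mouth region with uniform intensity.
mouth_region = [[10.0] * 8 for _ in range(8)]
features = low_frequency_features(mouth_region, keep=2)
```

For a uniform region, all of the energy lands in the first (DC) coefficient and the remaining coefficients are numerically zero, which is a quick sanity check that the transform is behaving as expected.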
Sujatha, P. and Radhakrishnan, M.
"Mouth Segmentation Using Coordinate-Based Method for the Improvement of Visual Speech Recognition,"
Applied Mathematics & Information Sciences: Vol. 12, Article 24.
Available at: https://dc.naturalspublishing.com/amis/vol12/iss4/24