A Computational Model Of The Intelligibility Of American Sign Language Video And Video Coding Applications
Real-time, two-way transmission of American Sign Language (ASL) video over cellular networks provides natural communication among members of the Deaf community. Bandwidth restrictions on cellular networks and limited computational power on cellular devices necessitate the use of advanced video coding techniques designed explicitly for ASL video. As a communication tool, compressed ASL video must be evaluated according to the intelligibility of the conversation, not according to conventional definitions of video quality. The intelligibility evaluation can either be performed using human subjects participating in perceptual experiments or using computational models suitable for ASL video. This dissertation addresses each of these issues in turn, presenting a computational model of the intelligibility of ASL video, which is demonstrated to be accurate with respect to true intelligibility ratings as provided by human subjects. The computational model affords the development of video compression techniques that are optimized for ASL video. Guided by linguistic principles and human perception of ASL, this dissertation presents a full-reference computational model of intelligibility for ASL (CIM-ASL) that is suitable for evaluating compressed ASL video. The CIM-ASL measures distortions only in regions relevant for ASL communication, using spatial and temporal pooling mechanisms that vary the contribution of distortions according to their relative impact on the intelligibility of the compressed video. The model is trained and evaluated using ground truth experimental data, collected in three separate perceptual studies. The CIM-ASL provides accurate estimates of subjective intelligibility and demonstrates statistically significant improvements over computational models traditionally used to estimate video quality. The CIM-ASL is incorporated into an H.264/AVC compliant video coding framework, creating a closed-loop encoding system optimized explicitly for ASL intelligibility. This intelligibility optimized coder achieves bitrate reductions between 10% and 42% without reducing intelligibility, when compared to a general purpose H.264/AVC encoder. The intelligibility optimized encoder is refined by introducing reduced complexity encoding modes, which yield a 16% improvement in encoding speed. The purpose of the intelligibility optimized encoder is to generate video that is suitable for real-time ASL communication. Ultimately, the preferences of ASL users determine the success of the intelligibility optimized coder. User preferences are explicitly evaluated in a perceptual experiment in which ASL users select between the intelligibility optimized coder and a general purpose video coder. The results of this experiment demonstrate that the preferences vary depending on the demographics of the participants and that a significant proportion of users prefer the intelligibility optimized coder.
Hemami, Sheila S
Johnson Jr, Charles R.; Gay, Geraldine K
Ph. D., Electrical Engineering
Doctor of Philosophy
dissertation or thesis