Sign languages are as complex as spoken languages: each comprises thousands of signs, distinguished from one another by small changes in hand shape, motion, location, or non-manual features such as facial expression. This, together with inter-signer differences and the complexities of sign grammars, makes sign language recognition an intricate challenge from both the machine learning and the computer vision perspectives.
Many published solutions use data gloves or one or more RGB cameras to track the hands and extract relevant features. This thesis addresses mainly the vision part of the problem: we investigate the benefits and shortcomings of using a time-of-flight camera, which produces a real-time stream of depth data. Ideally, our system should also work in (near) real time. At present, we concentrate on hand shape recognition.
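To make the depth-based setting concrete, the sketch below illustrates one common way a time-of-flight stream can simplify hand shape work: since the signing hand is typically the object closest to the camera, it can be segmented with a plain depth threshold, and its contour can then feed a shape classifier. This is a minimal illustration under stated assumptions, not the method developed in this thesis; frame acquisition is camera-driver-specific, so a synthetic depth frame stands in for the sensor, and all names (segment_hand, DEPTH_BAND_MM) are hypothetical.

```python
import numpy as np
import cv2

DEPTH_BAND_MM = 120  # assumed thickness of the depth slice that contains the hand


def segment_hand(depth_mm: np.ndarray) -> np.ndarray:
    """Segment the hand as the closest object in a depth frame (values in mm).

    Returns a binary mask; zero-valued pixels (no depth return) are ignored.
    """
    valid = depth_mm > 0                    # ToF sensors report 0 where no return
    nearest = depth_mm[valid].min()         # assumption: hand is the nearest surface
    mask = valid & (depth_mm <= nearest + DEPTH_BAND_MM)
    return mask.astype(np.uint8) * 255


def hand_contour(mask: np.ndarray):
    """Largest contour in the mask, usable as a crude hand shape descriptor."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None


if __name__ == "__main__":
    # Synthetic stand-in for one time-of-flight frame: background at ~1.5 m,
    # a hand-sized blob at ~0.6 m.
    frame = np.full((240, 320), 1500, dtype=np.uint16)
    cv2.circle(frame, (160, 120), 40, 600, thickness=-1)

    mask = segment_hand(frame)
    contour = hand_contour(mask)
    if contour is not None:
        print("hand pixels:", int(np.count_nonzero(mask)),
              "contour area:", cv2.contourArea(contour))
```

Note that such a threshold only works while the hand really is the nearest object; situations where the other hand or the face lies at a similar depth are exactly the kind of shortcoming the thesis sets out to examine.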
As the thesis is at a very early stage, only an overview of the problem is given here.