A self-calibrating system for finger tracking using sound waves
Abstract: This thesis develops a system for tracking a user's fingers using sound waves. The proposed solution attaches a small speaker to each finger and places a number of microphones ad hoc around a computer monitor to listen to the speakers. The system should then track the finger positions so that their coordinates can be mapped to the monitor and used for human-computer interfacing. The thesis focuses on a proof of concept. The system pipeline consists of three parts: signal processing, system self-calibration and real-time sound source tracking. In the signal processing step, four different signal methods are constructed and evaluated, and it is shown that multiple signals can be used in parallel. The best-performing signal method stacks a number of damped sine waves on top of each other, each with a different frequency within a specified frequency band. The goal was to use ultrasound frequency bands, but experiments showed that these gave rise to substantial aliasing, which made the higher frequency bands unusable. The second step, the system self-calibration, aims to reconstruct the scene, that is, to find the positions of the microphones and the path of the sound source using only the received signal transmissions. First, the time-difference-of-arrival (TDOA) values are estimated using robust techniques centred on GCC-PHAT. The time offsets are then estimated in order to convert the TDOA problem into a time-of-arrival (TOA) problem, so that the positions of the receivers and sound events can be calculated. Finally, a "virtual screen" is fitted to the sound source path to be used for coordinate projection. The scene reconstruction was successful in 80 % of the test cases, in the sense that it managed to estimate the spatial positions at all.
For the successful test cases, the estimated microphone positions had errors of 11.8 ± 5 cm on average, which is worse than the results presented in previous research. However, the best test case outperformed the results of another paper. The newly developed and implemented technique for finding the virtual screen was far from robust and found a reasonable virtual screen in only 12.5 % of the test cases. In the third step, the sound events were estimated one at a time using the SRP-PHAT method with the CFRC improvement. Unfortunate choices of search volumes made the calculations very computationally heavy. When using the same data and the estimated microphone positions, the results were comparable to those of the system self-calibration.
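
To make the TDOA estimation step more concrete, the following is a minimal sketch of a stacked damped-sine pulse and a basic GCC-PHAT delay estimate between two microphone channels. It is not the implementation used in the thesis: the function names (`damped_sine_stack`, `gcc_phat`), the sampling rate, the frequencies, the decay rate and the noise level are all assumptions chosen for illustration, and the robust techniques the abstract mentions are not reproduced here.

```python
import numpy as np


def damped_sine_stack(fs, duration, freqs, decay):
    """Sum of sines sharing one exponential decay envelope.

    All parameter values used below are illustrative assumptions,
    not the ones evaluated in the thesis.
    """
    t = np.arange(int(round(fs * duration))) / fs
    envelope = np.exp(-decay * t)
    tones = sum(np.sin(2 * np.pi * f * t) for f in freqs)
    return envelope * tones


def gcc_phat(x, y, fs):
    """Estimate how much y lags x, in seconds, using GCC-PHAT."""
    n = len(x) + len(y)                # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = Y * np.conj(X)                 # cross-power spectrum
    R /= np.abs(R) + 1e-12             # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n=n)          # generalized cross-correlation
    max_shift = n // 2
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Hypothetical two-microphone recording: the same pulse arrives at
# microphone B 25 samples later than at microphone A.
fs = 48_000
pulse = damped_sine_stack(fs, duration=0.01, freqs=[3000, 4000, 5000], decay=400.0)
true_delay = 25
n_rec = 2400
rng = np.random.default_rng(0)

mic_a = np.zeros(n_rec)
mic_b = np.zeros(n_rec)
mic_a[100:100 + len(pulse)] = pulse
mic_b[100 + true_delay:100 + true_delay + len(pulse)] = 0.8 * pulse
mic_a += 1e-3 * rng.standard_normal(n_rec)   # small additive noise (assumed level)
mic_b += 1e-3 * rng.standard_normal(n_rec)

tdoa = gcc_phat(mic_a, mic_b, fs)
print(f"estimated TDOA: {tdoa * 1e6:.1f} microseconds "
      f"({round(tdoa * fs)} samples)")
```

With the assumed low noise level, the estimate should recover the simulated 25-sample delay. The PHAT weighting discards magnitude and keeps only phase, which sharpens the correlation peak but also gives noise-dominated frequency bins equal weight, so a practical system would likely restrict the estimate to the band the pulse occupies.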