On current computer systems, the graphics card is no longer confined to graphics tasks but is also employed for processing general data from different areas such as medicine, computer vision, and finance. Modern Graphics Processing Units (GPUs) use stream processing (also known as SIMD) and can therefore achieve a higher computational throughput than Central Processing Units (CPUs) for problems that can be computed in parallel. In recent years they have therefore become increasingly interesting for general-purpose applications. In computer vision, for example, the automatic detection and tracking of feature points through an image sequence is a basic algorithm that serves as a building block for many applications. Algorithms for the detection and tracking of feature points, such as the Kanade-Lucas-Tomasi (KLT) algorithm, inherently require a large amount of parallelizable computation. This thesis presents an implementation of the KLT feature tracking algorithm for the GPU that exploits the parallel structure of the problem to significantly reduce the computation time. The GPU version of the KLT algorithm is implemented with CUDA (Compute Unified Device Architecture). It is shown that the total computation time of the CUDA code is on average reduced by a factor of 3-5 compared to a similar implementation on the CPU.