The ubiquitous computing (ubicomp) vision pursues the idea of equipping everyday objects with computational abilities. One open problem in such environments is multi-user distinction: associating interactions with devices or objects with a particular user is difficult, yet crucial for personalized services that are based on an interaction history. Our solution takes a visual approach and relies on comparatively complex sensors, namely cameras. We split the problem space into a macro-space and a micro-space and fuse the results of both at the end. The macro-space comprises the whole room, i.e., all persons and their positions, whereas the micro-space covers special areas of interest and provides detailed information about the interactions of the persons within them. We use a computer vision tracking technique that follows the movements of persons in image sequences obtained from the cameras. Combined with homography estimation and skin detection, we will be able to determine how many persons are in the room (tracking), approximately where they are located (homography), and who interacts with devices by reaching towards or grasping them (skin detection). The approach thus provides visual multi-user distinction and assigns interactions with devices to the users in the environment.
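
To illustrate the homography step and the final fusion, the following minimal sketch maps the image position of each tracked person onto the floor plane and then attributes an interaction to the closest person. It is not the implementation used in this work: the matrix `H`, the pixel coordinates, the device location, and the nearest-person rule are all hypothetical values and assumptions chosen only for illustration.

```python
from math import hypot

def apply_homography(H, u, v):
    """Map image pixel (u, v) to floor-plane coordinates with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to obtain Euclidean floor coordinates.
    return (x / w, y / w)

def nearest_person(floor_positions, target):
    """Return the id of the tracked person closest to the target floor position."""
    return min(
        floor_positions,
        key=lambda pid: hypot(floor_positions[pid][0] - target[0],
                              floor_positions[pid][1] - target[1]),
    )

# Hypothetical homography: a pure scaling of 0.01 m per pixel, no perspective.
# A real H would be estimated from at least four image/floor correspondences.
H = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.0, 1.0]]

# Hypothetical foot positions (in pixels) of two tracked persons in the macro-space.
feet_px = {"A": (100, 200), "B": (400, 300)}
floor = {pid: apply_homography(H, u, v) for pid, (u, v) in feet_px.items()}

# Hypothetical floor position (in metres) of a device at which the micro-space
# reported a skin-coloured region, i.e. a reaching or grasping hand.
device = (3.8, 3.1)
user = nearest_person(floor, device)
```

In this toy setup the interaction is attributed to person "B", who stands closest to the device. A nearest-person rule is only one conceivable fusion strategy and is used here as a placeholder.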