Grants and contributions:
Grant or award applying to more than one fiscal year. (2017-2018 to 2022-2023)
Whether we are walking or driving, we constantly build mental maps to determine where pathways, roads, landmarks, objects, and other human agents are situated, and how they relate to one another. These mental maps and models are what allow us to navigate to a particular location, drive to work, mow our lawns, or clean our homes. We also build mental maps when we reason about or plan activities such as repairing a fence or painting a house. Transferring the ability to build such a mental map to a robot is what drives our research. Humans do this chiefly with their eyes; what if we could do the same for a robot with a camera? The robot would then be able to understand and reason about the world in order to perform a task.

A mental map is a snapshot in time. It must also differentiate moving entities from static landmarks and placeholders. The level of detail required varies with the task at hand: a complete 3D reconstruction of the static components is relevant for analysis, re-engineering, or possibly 3D printing the objects and environments.

Two visual techniques used to build 3D maps are Structure from Motion (SFM) and Simultaneous Localization And Mapping (SLAM). The two methods are very similar; the differentiating factor is that SFM typically runs offline while SLAM runs online. Both methods include a front end, which detects features of interest and uses photogrammetry to associate these data points between views, and a back end, an optimization stage that minimizes reprojection errors (both stages are sketched below). Either stage may be easy or difficult depending on the sensors used, the complexity of the environment, and the required performance. Autonomous automobiles benefit from a LIDAR sensor, which provides precise environmental measurements, and a GPS, which provides location information.

For many real-world applications, however, relying on visual SLAM/SFM alone is not robust. SLAM/SFM solutions can be very brittle when the camera turns sharp corners, as in an office building, where pose tracking fails and views cannot be registered. There is also a heavy reliance on feature-point data association, so a lack of features, or highly repetitive features, can be problematic.

SLAM/SFM are essentially problems of geometry, whereas deep learning neural networks have recently had immense success in visual perception, especially in scene categorization and labelling. Deep learning's scene labels can guide feature-point associations, as well as geometric, photometric, and other consistency checks, to help make visual SLAM/SFM robust.

Problems of interest include (1) automating inexpensive navigating machines, such as industrial cleaners, with only a camera system; and (2) building historical maps for municipal infrastructure monitoring, such as for roads and bridges. A robust SLAM/SFM solution can help solve these applications and others.
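To make the front end concrete, the following is a minimal two-view sketch using OpenCV: detect features, associate them between views, and recover the relative camera pose. The choice of ORB features, brute-force matching, and RANSAC thresholds are illustrative assumptions, not the project's specific pipeline.

```python
import cv2
import numpy as np

def two_view_front_end(img1, img2, K):
    """Detect features in two views, associate them, and recover the
    relative camera pose from the essential matrix (front-end sketch)."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force Hamming matching with cross-checking for symmetric matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # RANSAC-based essential-matrix estimation rejects outlier associations;
    # K is the camera intrinsic matrix.
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t, pts1, pts2, inliers
```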
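The back end then refines camera poses and 3D structure by minimizing reprojection error. Below is a sketch of that optimization for a single camera, assuming a Rodrigues-vector pose parameterization and SciPy's least-squares solver; real systems use sparse bundle adjustment over many cameras.

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_points, K, observations):
    """Residuals between observed 2D features and reprojected 3D points.

    params packs one camera pose (3 Rodrigues rotation values + 3
    translation values) followed by n_points 3D points; observations
    is an (n_points, 2) array of measured pixel coordinates.
    """
    rvec, tvec = params[:3], params[3:6]
    points3d = params[6:].reshape(n_points, 3)
    proj, _ = cv2.projectPoints(points3d, rvec, tvec, K, None)
    return (proj.reshape(-1, 2) - observations).ravel()

def refine(rvec0, tvec0, points3d0, K, observations):
    """Jointly refine pose and structure by minimizing reprojection error."""
    x0 = np.hstack([rvec0.ravel(), tvec0.ravel(), points3d0.ravel()])
    result = least_squares(reprojection_residuals, x0,
                           args=(len(points3d0), K, observations))
    return result.x
```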
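Finally, as a hypothetical illustration of how scene labels could guide feature-point association: given per-pixel semantic label maps from a segmentation network, one could keep only matches whose endpoints agree on a label belonging to a static class. The label maps, class set, and filtering rule here are assumptions for illustration, not the project's actual method.

```python
import numpy as np

def filter_matches_by_label(pts1, pts2, labels1, labels2, static_labels):
    """Keep matches whose endpoints carry the same semantic label and
    whose label denotes a static class (e.g. building, road).

    labels1/labels2 are per-pixel class-id maps, e.g. from a semantic
    segmentation network; static_labels is the set of class ids treated
    as reliable landmarks (cars and pedestrians would be excluded).
    """
    keep = []
    for i, ((x1, y1), (x2, y2)) in enumerate(zip(pts1, pts2)):
        l1 = labels1[int(y1), int(x1)]
        l2 = labels2[int(y2), int(x2)]
        # Consistency check: same label in both views, and a static class.
        if l1 == l2 and l1 in static_labels:
            keep.append(i)
    return np.asarray(keep, dtype=int)
```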