Collaborative Scene Perception with Multiple Sensing Modalities

With the increasing reliance on autonomous systems, there is a critical need for robots to perceive the world at least as well as a human does. This requires taking advantage of all the sensing modalities available to the robot and fusing them to obtain the best estimate of the state of the observable surroundings. However, despite extensive research in robot perception, robots still have a long way to go before they can serve as reliable teammates for humans in the wild. This dissertation explores gaps in four key areas related to collaborative perception: choosing an apt feature representation, active perception, shared autonomy, and perception-enabled planning. First, a human-subject study is presented that reveals the challenges associated with current fusion models when there is a human in the loop. The study shows that certain feature representations are unreliable because of human errors, and that this unreliability needs to be accounted for in subsequent decision-making steps. To facilitate active perception, a multi-stage question-answering scheme is proposed that helps the robot seek specific human input with the goal of maximizing situational awareness. The algorithm is implemented on a ground robot and tested in a crowded environment, demonstrating its robustness. To develop a shared understanding of the surroundings in a search and rescue (SaR) mission, a deep learning-based approach is presented that fuses information from the visual and language domains. The fused knowledge is used to intelligently plan paths for a team of heterogeneous agents, resulting in safer paths while maintaining performance in terms of time to locate the victim. The approach is tested on the Gazebo simulation platform. Finally, to bridge the gap between simulation and reality, specifically in the context of SaR missions, a dataset is developed with photo-realistic online images.
A Bayesian fusion framework is developed for assessing danger from photo-realistic images and human language input. An extensive simulation campaign reveals that a danger-aware planner achieves a higher mission success rate than a naive shortest-path planner.

137 pages


Human-robot interaction; Multi-modal perception; Search and rescue robots


Committee Chair

Campbell, Mark

Committee Member

Hariharan, Bharath
Ferrari, Silvia

Degree Discipline

Mechanical Engineering

Degree Name

Ph. D., Mechanical Engineering

Degree Level

Doctor of Philosophy

Attribution 4.0 International


dissertation or thesis
