Holistic Scene Understanding for Real-Time Human-Autonomy Systems Interaction and Teaming
Effective human-autonomy collaboration requires sophisticated testing environments and intelligent perception systems that respond in real time to dynamic situations. As autonomous robots become increasingly prevalent in society, their ability to understand complex scenes and interact naturally with human teammates becomes critical for applications ranging from search and rescue to healthcare and manufacturing. Current approaches face significant limitations: field testing lacks detailed ground truth for algorithm validation, while conventional perception systems operate without close coupling to the vehicle's motor control, introducing latency that hinders fluid interaction within human-autonomy teams. This dissertation addresses these challenges through two complementary research thrusts.

The first contribution introduces the Real-Time Human Autonomous Systems Collaborations (RealTHASC) facility, a cyber-physical extended reality (XR) testbed that seamlessly interfaces real and virtual agents with photorealistic simulated environments. By integrating motion capture, wearable sensors, and virtual reality technologies, RealTHASC creates a unified framework where physical robots and humans interact with virtual agents through real-time feedback. Human subjects and robots in the laboratory space are transferred into synthetic environments as avatars that experience the virtual world through simulated sensors while maintaining kinematic coupling with their real counterparts.

The second contribution presents a perception-in-action framework that enables robot-embodied visual understanding for real-time human-robot interaction. This event-driven approach integrates motor control and perception in a closed-loop system, allowing mobile robots to interact with nearby people using semantic labels and hand gestures while navigating complex environments. By coupling vision and feedback control designs, the system enables robots to respond to visual scenes through dynamic and autonomous functionalities, overcoming limitations of traditional computer vision techniques that rely on passive, open-loop processing. The framework exploits controlled camera motion to actively simplify visual perception tasks, creating a synergistic relationship between perception and action that enhances efficiency and responsiveness.

Experimental results demonstrate effectiveness across diverse applications, with RealTHASC validated through case studies of mixed real/virtual interactions and the perception-in-action framework proving robust in challenging environments. Together, these advances provide a foundation for holistic scene understanding that facilitates natural, effective collaboration between humans and autonomous systems in complex scenarios.
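The abstract does not prescribe an implementation, but the kinematic-coupling idea behind RealTHASC can be illustrated with a minimal sketch: poses of a tracked physical agent are streamed from a motion-capture source and mirrored onto its avatar in the simulated environment at a fixed rate, so the avatar's simulated sensors observe the virtual world from the real agent's configuration. All names below (PoseSample, MocapStream, VirtualAvatar, couple) are hypothetical placeholders, not APIs from the RealTHASC facility.

```python
"""Illustrative sketch (not from the dissertation): kinematic coupling between
a tracked physical agent and its avatar in a simulated environment.
All class and function names here are hypothetical stand-ins."""

import time
from dataclasses import dataclass


@dataclass
class PoseSample:
    """A single rigid-body pose reported by a motion-capture system."""
    timestamp: float
    position: tuple[float, float, float]             # meters, lab frame
    orientation: tuple[float, float, float, float]   # unit quaternion (w, x, y, z)


class MocapStream:
    """Stand-in for a motion-capture client; here it just synthesizes poses."""

    def read(self) -> PoseSample:
        t = time.time()
        return PoseSample(timestamp=t,
                          position=((0.1 * t) % 2.0, 0.0, 0.0),
                          orientation=(1.0, 0.0, 0.0, 0.0))


class VirtualAvatar:
    """Stand-in for an avatar in the photorealistic simulator."""

    def set_pose(self, pose: PoseSample) -> None:
        # A real implementation would forward this pose to the rendering
        # engine, where simulated sensors observe the avatar.
        print(f"avatar pose <- {pose.position} @ {pose.timestamp:.3f}")


def couple(stream: MocapStream, avatar: VirtualAvatar, rate_hz: float = 100.0) -> None:
    """Continuously mirror the physical agent's pose onto its avatar."""
    period = 1.0 / rate_hz
    for _ in range(5):   # bounded loop for the sketch; a real system runs indefinitely
        avatar.set_pose(stream.read())
        time.sleep(period)


if __name__ == "__main__":
    couple(MocapStream(), VirtualAvatar())
```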
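Likewise, the perception-in-action idea can be sketched, under stated assumptions, as a single perceive-decide-act cycle in which each camera frame is mapped directly to a motion command, keeping vision and motor control in one closed loop rather than a passive, open-loop pipeline. The camera, gesture recognizer, and velocity interface below are hypothetical stand-ins, not the dissertation's actual framework.

```python
"""Illustrative sketch (not from the dissertation): a closed-loop
perception-in-action step mapping camera frames to velocity commands.
All function names here are hypothetical stand-ins."""

import random


def grab_frame() -> list[list[int]]:
    """Stand-in for reading an image from the robot's onboard camera."""
    return [[random.randint(0, 255) for _ in range(4)] for _ in range(4)]


def classify_gesture(frame: list[list[int]]) -> str:
    """Stand-in for a hand-gesture / semantic-label recognizer."""
    return random.choice(["stop", "follow", "none"])


def send_velocity(linear: float, angular: float) -> None:
    """Stand-in for the robot's low-level velocity interface."""
    print(f"cmd_vel: linear={linear:.2f} m/s, angular={angular:.2f} rad/s")


def perception_in_action_step() -> None:
    """One closed-loop iteration: perceive the scene, decide, act."""
    frame = grab_frame()
    gesture = classify_gesture(frame)
    if gesture == "stop":
        send_velocity(0.0, 0.0)      # halt for the nearby person
    elif gesture == "follow":
        send_velocity(0.5, 0.0)      # move with the person
    else:
        send_velocity(0.3, 0.2)      # continue nominal navigation


if __name__ == "__main__":
    for _ in range(3):               # a few iterations for illustration
        perception_in_action_step()
```

In this event-driven formulation, the latency between observing a gesture and issuing the corresponding command is bounded by a single loop iteration, which is the coupling of perception and control that the abstract emphasizes.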