Robots That Understand Natural Language Instructions and Resolve Ambiguities

Abstract: Verbal communication is a key challenge in human-robot interaction. For effective verbal interaction, understanding natural language instructions and clarifying ambiguous user requests are crucial for robots. In real-world environments, the instructions can be ambiguous for many reasons. For instance, when a user asks the robot to find and bring 'the porcelain mug', the mug might be located in the kitchen cabinet or on the dining room table, depending on whether it is clean or full (semantic ambiguities). Additionally, there can be multiple mugs in the same location, and the robot can disambiguate them by asking follow-up questions based on their distinguishing features, such as their color or spatial relations to other objects (visual ambiguities).While resolving ambiguities, previous works have addressed this problem by only disambiguating the objects in the robot's current view and have not considered ones outside the robot's point of view. To fill in this gap and resolve semantic ambiguities caused by objects possibly being located at multiple places, we present a novel approach by reasoning about their semantic properties. On the other hand, while dealing with ambiguous instructions caused by multiple similar objects in the same location, most of the existing systems ask users to repeat their requests with the assumption that the robot is familiar with all of the objects in the environment. To address this limitation and resolve visual ambiguities, we present an interactive system that asks for follow-up clarifications to disambiguate the described objects using the pieces of information that the robot could understand from the request and the objects in the environment that are known to the robot.In summary, in this thesis, we aim to resolve semantic and visual ambiguities to guide a robot's search for described objects specified in user instructions. With semantic disambiguation, we aim to find described objects' locations across an entire household by leveraging object semantics to form clarifying questions when there are ambiguities. After identifying object locations, with visual disambiguation, we aim to identify the specified object among multiple similar objects located in the same space. To achieve this, we suggest a multi-stage approach where the robot first identifies the objects that are fitting to the user's description, and if there are multiple objects, the robot generates clarification questions by describing each potential target object with its spatial relations to other objects. Our results emphasize the significance of semantic and visual disambiguation for successful task completion and human-robot collaboration.

  CLICK HERE TO DOWNLOAD THE WHOLE DISSERTATION. (in PDF format)