Temporal and Physical Reasoning for Perception-Based Robotic Manipulation
Accurate knowledge of object poses is crucial for successful robotic manipulation, yet most current approaches only work in laboratory settings. Noisy sensors and cluttered scenes interfere with accurate pose recognition, which is especially problematic when performing complex tasks involving object interactions. This is because most pose estimation algorithms estimate object poses from a single frame and therefore lack continuity between frames. Further, they often do not consider the physical plausibility of the predicted scene, such as interpenetrating objects or objects in unstable positions. In this work, we enhance the accuracy and stability of estimated poses for a whole scene by enforcing these physical constraints over time through the integration of a physics simulation. This allows us to accurately determine the relationships between objects for a construction task. Scene parsing performance was evaluated on both simulated and real-world data. We apply our method to a real-world block stacking task, where the robot must build a tall tower of colored blocks.
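To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of projecting noisy per-frame block pose estimates onto a physically consistent scene by letting a physics simulation settle under gravity. It assumes the PyBullet simulator and cube-shaped blocks; the function name, block dimensions, and step count are hypothetical choices for illustration.

```python
# Sketch: refine noisy per-frame pose estimates with a physics simulation.
# Assumes PyBullet and fixed-size cube blocks; names and sizes are hypothetical.
import pybullet as p

BLOCK_HALF_EXTENTS = [0.025, 0.025, 0.025]  # assumed 5 cm cubes

def refine_poses(estimated_poses, settle_steps=120):
    """Project noisy pose estimates onto a physically plausible scene.

    estimated_poses: list of (position_xyz, orientation_xyzw) tuples, one per
    detected block, as produced by a single-frame pose estimator.
    Returns the poses after the simulated scene settles under gravity, which
    removes interpenetration and unsupported (floating) placements.
    """
    p.connect(p.DIRECT)                               # headless physics engine
    p.setGravity(0, 0, -9.81)
    ground = p.createCollisionShape(p.GEOM_PLANE, planeNormal=[0, 0, 1])
    p.createMultiBody(0, ground)                      # static table/ground plane

    block_shape = p.createCollisionShape(p.GEOM_BOX, halfExtents=BLOCK_HALF_EXTENTS)
    bodies = []
    for pos, orn in estimated_poses:
        # Spawn each block at its (possibly noisy) estimated pose.
        bodies.append(p.createMultiBody(baseMass=0.1,
                                        baseCollisionShapeIndex=block_shape,
                                        basePosition=pos,
                                        baseOrientation=orn))

    # Step the simulation so collisions and gravity relax intersecting or
    # unstable blocks into a physically consistent configuration.
    for _ in range(settle_steps):
        p.stepSimulation()

    refined = [p.getBasePositionAndOrientation(b) for b in bodies]
    p.disconnect()
    return refined
```

In this sketch the simulation acts as a projection onto the set of physically feasible scenes; a full system would additionally fuse the refined poses with estimates from previous frames to enforce temporal continuity.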