Figure 1: Illustration of SENIOR. PGE assigns high task rewards to rarely visited, human-preferred states, encouraging efficient exploration through a hybrid experience-update policy and thereby supplying query selection with more valuable, task-relevant segments. MDS selects easily comparable, meaningful segment pairs with apparent motion distinction, yielding high-quality labels that facilitate reward learning and provide the agent with accurate reward guidance for PGE's exploration. During training, MDS and PGE interact and complement each other, improving both the feedback- and exploration-efficiency of PbRL.
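A minimal sketch of the two mechanisms the caption describes, under assumed formulations: the hash-based visit counter in `pge_bonus`, the displacement-based motion-distinction score, the bonus weight `beta`, and all function names are illustrative choices, not the exact definitions used by SENIOR.

```python
# Illustrative sketch only; the visit counter, motion score, and weighting
# below are assumptions, not SENIOR's exact formulation.
import numpy as np

def pge_bonus(state, visit_counts, reward_model, beta=0.1, decimals=2):
    """Hybrid reward: higher for rarely visited states and for states the
    learned (human-preference) reward model scores highly."""
    key = tuple(np.round(state, decimals))         # coarse state hashing (assumed)
    visit_counts[key] = visit_counts.get(key, 0) + 1
    novelty = 1.0 / np.sqrt(visit_counts[key])     # count-based novelty term
    preference = reward_model(state)               # learned preference reward
    return preference + beta * novelty

def motion_distinction(segment):
    """Total displacement along a segment; segments with larger motion are
    assumed to be easier for a human to compare."""
    diffs = np.diff(np.asarray(segment), axis=0)
    return np.linalg.norm(diffs, axis=1).sum()

def mds_select_pairs(segments, num_queries):
    """Pick query pairs whose motion patterns differ the most, so the
    teacher can give clearer, higher-quality preference labels."""
    scores = [motion_distinction(s) for s in segments]
    pairs, gaps = [], []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            pairs.append((i, j))
            gaps.append(abs(scores[i] - scores[j]))
    order = np.argsort(gaps)[::-1]                 # largest distinction first
    return [pairs[k] for k in order[:num_queries]]

# Toy usage: random 3-D state "segments" and a dummy reward model.
rng = np.random.default_rng(0)
segments = [rng.normal(size=(20, 3)).cumsum(axis=0) for _ in range(6)]
counts = {}
dummy_reward = lambda s: float(s.sum())
print(pge_bonus(segments[0][0], counts, dummy_reward))
print(mds_select_pairs(segments, num_queries=2))
```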
We compare our method (SENIOR) with five baseline methods on six complex robot manipulation tasks from Meta-World: Door Lock, Window Close, Handle Press, Window Open, Door Open, and Door Unlock. The PbRL methods are PEBBLE, MRN, RUNE, M-RUNE (RUNE with MRN), QPA, P-SENIOR (PEBBLE with SENIOR), and M-SENIOR (MRN with SENIOR). The experiment was repeated 100 times for each task.
Figure 2: Learning curves on six robotic manipulation tasks as measured by success rate: Door Lock (feedback=250), Window Close (feedback=250), Handle Press (feedback=250), Window Open (feedback=250), Door Open (feedback=1000), and Door Unlock (feedback=1000). The solid lines and shaded regions represent the mean and standard deviation, respectively, across five runs.
Figure 3: Comparison of success rates for six tasks at 500K and 1000K steps.
We also compare our method (SENIOR) with five baseline methods on four complex robot manipulation tasks in the real world: Door Open, Door Close, Box Open, and Box Close. The PbRL methods are PEBBLE, MRN, RUNE, M-RUNE (RUNE with MRN), QPA, and M-SENIOR (MRN with SENIOR). The experiment was repeated 20 times for each task.
Figure 4: Success rates of simulation and real-world experiments on four tasks: Door Open (feedback=1000), Door Close (feedback=50), Box Open (feedback=250), and Box Close (feedback=250).