Bridging Adaptivity and Safety:
Learning Agile Collision-Free Locomotion
Across Varied Physics

Under review

Abstract

Real-world legged locomotion systems often need to reconcile agility and safety across different scenarios. Moreover, the underlying dynamics are often unknown and time-variant (e.g., payload, friction). In this paper, we introduce BAS (Bridging Adaptivity and Safety), which builds upon the pipeline of the prior work Agile But Safe (ABS) and is designed to provide adaptive safety even in dynamic environments with uncertainties. BAS comprises an agile policy for rapid obstacle avoidance, a recovery policy to prevent collisions, a physical parameter estimator trained concurrently with the agile policy, and a learned control-theoretic reach-avoid (RA) value network that governs the policy switch. Both the agile policy and the RA value network are conditioned on the estimated physical parameters, making them adaptive. To mitigate the distribution shift issue, we further introduce an on-policy fine-tuning phase for the estimator to enhance its robustness and accuracy. Simulation results show that BAS achieves 50% better safety in dynamic environments while maintaining a higher average speed. In real-world experiments, BAS demonstrates its capability in complex environments with unknown physics (e.g., slippery floors with unknown friction, carrying unknown payloads of up to 8 kg), while baselines lack adaptivity, leading to collisions or degraded agility. As a result, BAS achieves a 19.8% increase in speed and a 2.36× lower collision rate than ABS in the real world.

Adapt to Online Environment Changes

Baseline Comparisons

[Video gallery: BAS vs. ABS vs. RMA+Lagrangian, each with a safety test and an agility test, across four settings: adapting to variant friction, adapting to terrain properties, adapting to payloads (5 kg), and a vanilla agility-safety test.]

Method

BAS framework

  1. Training architecture: The BAS framework comprises four trained modules:
    1. Agile Policy: trained to achieve maximum agility amidst obstacles;
    2. Reach-Avoid (RA) Value Network: trained to predict RA values, conditioned on the agile policy, as safety indicators;
    3. Recovery Policy: trained to track desired twist commands (2D linear velocity and yaw angular velocity) that lower the RA values;
    4. Estimator: trained concurrently with the policies to predict physical parameters such as payload mass and ground friction.
  2. Deployment architecture: The dual-policy setup switches between the agile policy and the recovery policy based on the estimated value V̂ from the RA value network:
    1. If V̂ < Vthreshold, the agile policy is activated to navigate amidst obstacles;
    2. If V̂ ≥ Vthreshold, the recovery policy is activated to track twist commands, obtained via constrained optimization, that lower the RA values.
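The deployment-time switching logic above can be sketched as follows. This is a minimal illustration, not the authors' implementation: all module names (`agile_policy`, `recovery_policy`, `ra_value_net`, `estimator`) and the threshold value are hypothetical placeholders, and each module is treated as an opaque callable.

```python
# Minimal sketch of the BAS dual-policy switch at deployment time.
# Module names and the threshold are illustrative assumptions,
# not the actual BAS implementation.

def select_action(obs, agile_policy, recovery_policy,
                  ra_value_net, estimator, v_threshold=0.0):
    """Pick an action by switching policies on the estimated RA value."""
    # Estimate physical parameters (e.g., payload mass, friction)
    # from observations; both the agile policy and the RA value
    # network are conditioned on them.
    phys_params = estimator(obs)
    v_hat = ra_value_net(obs, phys_params)
    if v_hat < v_threshold:
        # Below threshold: deemed safe, so run the agile policy
        # to navigate amidst obstacles at high speed.
        return agile_policy(obs, phys_params)
    # At or above threshold: run the recovery policy, which tracks
    # twist commands chosen to lower the RA value.
    return recovery_policy(obs, phys_params)
```

The switch itself is a simple threshold test; in BAS the substance lies in the learned RA value network and in how the recovery policy's twist commands are selected via constrained optimization.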