ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving

Key Takeaways

  • Safety-critical scenarios are central to evaluating autonomous driving systems, yet their rarity in naturalistic logs makes simulation-based stress testing indispensable.
  • We propose ScenePilot, a feasibility-guided, boundary-driven framework that targets the boundary band: scenarios that are physically solvable in principle yet still cause the deployed autonomy stack to fail.
Paper Abstract

Safety-critical scenarios are central to evaluating autonomous driving systems, yet their rarity in naturalistic logs makes simulation-based stress testing indispensable. Most scenario generation methods treat surrounding agents as adversaries, but they either (i) induce failures without explicitly modeling vehicle-road physical limits, yielding visually extreme yet physically unsolvable crashes, or (ii) enforce physical feasibility or policy feasibility in isolation, which can over-focus on aggressive maneuvers or remain tied to a controller-dependent capability boundary. We propose ScenePilot, a feasibility-guided, boundary-driven framework that targets the boundary band: scenarios that are physically solvable in principle yet still cause the deployed autonomy stack to fail. We formulate generation as constrained multi-objective reinforcement learning, combining an RSS-derived physical-feasibility score $\sigma$ with an online-learned AV-risk predictor $\Phi$, and introduce step-level feasibility-aware shielding to keep exploration near the feasibility boundary while avoiding infeasible artifacts. Experiments on SafeBench with multiple planners show that ScenePilot yields substantially higher collision rates (+6.2 percentage points) while preserving physical validity, and that adversarial fine-tuning on these boundary-band scenarios consistently reduces downstream crash rates. The code is available at this https URL .

ScenePilot: Controllable Boundary-Driven Critical Scenario Generation for Autonomous Driving
Autonomous vehicles (AVs) are often tested in simulation, but generating scenarios that are both challenging and realistic remains difficult. Many existing methods create adversarial scenarios by having surrounding vehicles force a crash. Without a model of vehicle-road physical limits, the result is often a collision that no controller could have avoided, which says little about the AV's real weaknesses. ScenePilot addresses this by targeting the "boundary band": scenarios that are physically solvable in principle yet still cause the deployed autonomy stack to fail. Failures found in this band point to genuine weaknesses in the stack rather than to impossible situations.

The Problem with Current Testing

Most scenario generation methods fall into one of two traps. First, some induce failures without modeling vehicle-road physical limits, producing "crashes" that no driver, human or machine, could have avoided. These scenarios are visually dramatic but offer little diagnostic value. Second, others enforce physical feasibility or policy feasibility in isolation. Physical feasibility alone tends to over-focus on aggressive maneuvers, while policy feasibility alone ties the difficulty to what one particular controller can handle rather than to the situation itself. ScenePilot combines the two views, scoring physical solvability separately from the AV's predicted risk, so that the generated scenarios are both fair and informative.

How ScenePilot Works

ScenePilot uses a two-part scoring system to guide the generation of test scenarios:

  • Physical Feasibility ($\sigma$): This score is derived from the Responsibility-Sensitive Safety (RSS) model and indicates whether a collision could still be avoided within vehicle-road physical limits. Physical solvability is a precondition for a scenario to count as part of the "boundary band."
  • AV Risk ($\Phi$): This is an online-learned predictor that estimates how likely the AV is to fail in a given situation.

Generation is formulated as constrained multi-objective reinforcement learning: the two scores together push the adversarial agents toward scenarios that are high-risk for the AV but still physically solvable. In addition, step-level "feasibility-aware shielding" keeps exploration near the feasibility boundary and prevents it from drifting into physically infeasible artifacts.
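
To make the two scores and the shield more concrete, here is a minimal Python sketch. Only the RSS minimum longitudinal gap follows the published RSS formula. Everything else is an assumption made for illustration and is not taken from the paper: the parameter values, the definition of $\sigma$ as a clipped ratio of actual gap to RSS gap, the thresholds `sigma_min` and `phi_min`, and the fallback rule inside `shield`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RSSParams:
    """Illustrative RSS parameters (values are assumptions, not from the paper)."""
    response_time: float = 1.0  # rho: reaction time of the rear (ego) vehicle, s
    max_accel: float = 3.0      # worst-case ego acceleration during rho, m/s^2
    min_brake: float = 4.0      # braking the ego is guaranteed to apply, m/s^2
    max_brake: float = 8.0      # hardest braking of the front vehicle, m/s^2


def rss_min_gap(v_rear: float, v_front: float, p: RSSParams = RSSParams()) -> float:
    """RSS minimum safe longitudinal gap (m) for two vehicles in the same lane."""
    v_after = v_rear + p.response_time * p.max_accel  # ego speed after reacting
    gap = (
        v_rear * p.response_time
        + 0.5 * p.max_accel * p.response_time ** 2
        + v_after ** 2 / (2.0 * p.min_brake)
        - v_front ** 2 / (2.0 * p.max_brake)
    )
    return max(0.0, gap)


def feasibility_score(gap: float, v_rear: float, v_front: float,
                      p: RSSParams = RSSParams()) -> float:
    """Hypothetical sigma in [0, 1]: actual gap divided by the RSS gap, clipped."""
    d_min = rss_min_gap(v_rear, v_front, p)
    if d_min == 0.0:
        return 1.0  # no gap is required, so the situation is trivially solvable
    return max(0.0, min(1.0, gap / d_min))


def in_boundary_band(sigma: float, phi: float,
                     sigma_min: float = 0.8, phi_min: float = 0.5) -> bool:
    """Physically solvable (high sigma) yet risky for the AV (high phi).

    Both thresholds are hypothetical.
    """
    return sigma >= sigma_min and phi >= phi_min


def shield(proposed, candidates, sigma_of, sigma_min: float = 0.8):
    """Step-level shield sketch: veto an action whose sigma is below sigma_min.

    The fallback is the feasible candidate closest to the boundary (lowest
    sigma that still passes), or the most feasible candidate if none pass.
    """
    if sigma_of(proposed) >= sigma_min:
        return proposed
    feasible = [a for a in candidates if sigma_of(a) >= sigma_min]
    if feasible:
        return min(feasible, key=sigma_of)
    return max(candidates, key=sigma_of)
```

In a training loop, `shield` would wrap the adversary policy's action at every simulation step, and `in_boundary_band` would label finished episodes. How the actual framework combines $\sigma$ and $\Phi$ inside the reinforcement learning objective is not specified in the abstract, so that part is left out here.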

Key Results

On the SafeBench platform, across multiple planners, ScenePilot produced substantially higher collision rates than the methods it was compared against, a gain of 6.2 percentage points, while preserving the physical validity of those collisions. Furthermore, when the researchers used these boundary-band scenarios for adversarial fine-tuning, downstream crash rates dropped consistently. This suggests that training on challenging yet solvable scenarios is an effective way to make autonomous driving systems more robust.
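
A gain stated in percentage points is an absolute difference between two rates, so its relative size depends on the baseline. The short sketch below illustrates this. The 6.2 figure comes from the abstract, while the baseline collision rates are hypothetical placeholders, since the summary does not report them.

```python
def relative_increase(baseline_rate_pct: float, delta_pp: float) -> float:
    """Relative increase (as a fraction) implied by a gain of delta_pp points."""
    if baseline_rate_pct <= 0:
        raise ValueError("baseline rate must be positive")
    return delta_pp / baseline_rate_pct


DELTA_PP = 6.2  # absolute gain in collision rate reported in the abstract

# Hypothetical baseline collision rates in percent (NOT from the paper).
for baseline in (20.0, 40.0, 60.0):
    rel = relative_increase(baseline, DELTA_PP)
    print(f"baseline {baseline:.0f}% -> {baseline + DELTA_PP:.1f}% "
          f"({rel * 100:.1f}% relative increase)")
```

The same absolute gain is a larger relative change when the baseline rate is low, which is why the baseline matters when comparing scenario generators.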

What to Keep in Mind

ScenePilot's value depends on separating failures that were physically unavoidable from failures attributable to the autonomy stack, which in turn depends on how well the RSS-derived score $\sigma$ and the learned predictor $\Phi$ capture those two aspects. The approach is designed to complement existing safety testing rather than replace it. It is tuned to find the limits of a specific autonomy stack, which makes it useful for developers who want to move beyond passive driving logs toward targeted stress testing.
