This paper was accepted at the “Reinforcement Learning for Real Life” workshop at NeurIPS 2022.
Advances in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advances have either outpaced their application to network security or have not considered the challenges of deploying them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms, and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL approach performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments, we evaluate each method’s ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches perform worse, with the degradation varying based on the type of environmental change. Against an unknown attacker strategy, we found that our models had reduced overall performance even though the new strategy was less efficient than those our models were trained on. Together, these results highlight promising research directions for autonomous network defense in the real world.
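The abstract names ensemble RL as the strongest approach but does not describe its mechanics. One common way to combine several trained policies is to let them vote on each discrete action. The sketch below is a minimal, hypothetical illustration of that idea; none of the names (`ensemble_act`, `Policy`) come from the paper, and the real CAGE Challenge agents operate over the CybORG environment's action space rather than toy lambdas.

```python
# Hypothetical sketch: ensemble action selection by majority vote.
# The paper does not specify its ensemble mechanism; this is one
# common choice, shown here on toy stand-in policies.
from collections import Counter
from typing import Callable, List

# A policy maps an observation to a discrete action index.
Policy = Callable[[list], int]

def ensemble_act(policies: List[Policy], obs: list) -> int:
    """Query every trained policy and return the most-voted action."""
    votes = Counter(policy(obs) for policy in policies)
    return votes.most_common(1)[0][0]

# Toy usage: three fixed policies voting; two of three choose action 1.
policies: List[Policy] = [lambda o: 0, lambda o: 1, lambda o: 1]
print(ensemble_act(policies, obs=[0.0]))
```

In practice each member policy would be a separately trained PPO network, and voting can reduce the variance of any single policy's mistakes at the cost of extra inference passes per step.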