deterministic action prediction

transform

Observation -> Action

Query a trained policy with an environment observation and take the mode of its action distribution (the single most probable action) instead of sampling from it, which skips stochastic exploration.
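The difference between the two query modes can be shown with a small sketch. In stable-baselines3, the source cited on this card, the behavior is selected with the `deterministic=True` argument of `predict`. The code below does not use that library. `ToyCategoricalPolicy`, its hand-picked weights, and the sample observation are hypothetical stand-ins for a trained discrete-action policy, and its `predict` returns a bare action index rather than the library's return value.

```python
import math
import random
from typing import List, Sequence


class ToyCategoricalPolicy:
    """Hypothetical stand-in for a trained discrete-action policy."""

    def __init__(self, weights: Sequence[Sequence[float]], seed: int = 0) -> None:
        # weights[a] scores action a; assumed to come from earlier training.
        self.weights = [list(row) for row in weights]
        self.rng = random.Random(seed)

    def action_probs(self, observation: Sequence[float]) -> List[float]:
        # Linear logits followed by a numerically stable softmax.
        logits = [sum(w * x for w, x in zip(row, observation)) for row in self.weights]
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, observation: Sequence[float], deterministic: bool = False) -> int:
        probs = self.action_probs(observation)
        if deterministic:
            # Mode of the distribution: no sampling, so no exploration noise.
            return max(range(len(probs)), key=probs.__getitem__)
        # Stochastic path: sample an action in proportion to its probability.
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


policy = ToyCategoricalPolicy(weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
obs = [0.2, 0.9]

# Repeated deterministic queries always return the same action.
greedy_actions = {policy.predict(obs, deterministic=True) for _ in range(100)}
# Repeated stochastic queries spread across several actions.
sampled_actions = {policy.predict(obs) for _ in range(100)}
```

For the same observation, `greedy_actions` holds a single action while `sampled_actions` holds several, which is the run-to-run variation that the deterministic path removes.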

Problem it solves

Actions sampled for exploration are useful during training, but they degrade performance during deployment or validation, where the policy's most probable action is what is wanted.

Consumes

Observation

Emits

Action

Distilled from 1 source

The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.

DLR-RM/stable-baselines3 (GitHub)