Collected sources and patterns will appear here. Add from search, explore, or the patterns library.
Observation -> Action
Query a trained policy with an environment observation to obtain the action corresponding to the peak of the probability distribution, skipping stochastic exploration.
Problem it solves
Stochastic exploration actions degrade performance during deployment or validation phases.
Consumes
Emits
The real projects this mechanism was found in. Attribution is the point — this is how the best teams actually do it.