<no title>

class mindspore_rl.policy.EpsilonGreedyPolicy(input_network, size, epsi_high, epsi_low, decay, action_space_dim)[源代码]

基于给定的epsilon-greedy策略生成采样动作。

参数：

input_network (Cell) - 返回策略动作的输入网络。
size (int) - epsilon的shape。
epsi_high (float) - 探索的上限epsilon值，介于[0, 1]。
epsi_low (float) - 探索的下限epsilon值，介于[0, epsi_high]。
decay (float) - epsilon的衰减系数。
action_space_dim (int) - 动作空间的维度。

样例：

>>> state_dim, hidden_dim, action_dim = (4, 10, 2)
>>> input_net = FullyConnectedNet(state_dim, hidden_dim, action_dim)
>>> policy = EpsilonGreedyPolicy(input_net, 1, 0.1, 0.1, 100, action_dim)
>>> state = Tensor(np.ones([1, state_dim]).astype(np.float32))
>>> step =  Tensor(np.array([10,]).astype(np.float32))
>>> output = policy(state, step)
>>> print(output.shape)
(1,)

construct(state, step)[源代码]

构造函数接口。

参数：

state (Tensor) - 网络的输入Tensor。
step (Tensor) - 当前step, 影响epsilon的衰减。

返回：

输出动作。