Keras model: Input shape dimension error for RL agent

Problem Description

My goal is to develop a DQN agent that will choose its action based on a certain strategy/policy. I previously worked with OpenAI gym environments, but now I want to create my own RL environment.

At this stage, the agent shall either choose a random action or choose its action based on the predictions given by a deep neural network (defined in the class DQN).

So far, I have set up both the neural net model and my environment. The NN shall receive states as its input. These states represent 11 possible scalar values ranging from 9.5 to 10.5 (9.5, 9.6, ..., 10.4, 10.5). Since we're dealing with RL, the agent generates its data during the training process. The output should be either 0 or 1, corresponding to the recommended action.
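A minimal sketch of that state space and action space (just an illustration of the numbers above, not part of my actual code):

import numpy as np

possible_states = np.round(np.linspace(9.5, 10.5, 11), 1)  # [9.5, 9.6, ..., 10.5]
possible_actions = [0, 1]                                   # the two recommended actions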

Now, when I feed my agent a scalar value, e.g. a sample state of x = 10, and let it decide on the action to take (Agent.select_action() is called), I encounter an issue related to the input shape/input dimension.

Here's the code:

1. DQN Class:

# Imports needed by this snippet (assumed; not shown in the original post)
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam


class DQN():

    def __init__(self, state_size, action_size, lr):
        self.state_size = state_size
        self.action_size = action_size
        self.lr = lr

        self.model = Sequential()
        self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))
        self.model.add(Dense(128, activation='relu'))
        self.model.add(Dense(self.action_size, activation='linear'))

        self.model.compile(optimizer=Adam(lr=self.lr), loss='mse')

        self.model.summary()


    def model_info(self):
        model_description = '\n\n---Model_INFO Summary: The model was passed {} state sizes,\
            \n {} action sizes and a learning rate of {} -----'\
                            .format(self.state_size, self.action_size, self.lr)
        return model_description

    def predict(self, state):
        return self.model.predict(state)

    def train(self, state, q_values):
        self.state = state
        self.q_values = q_values
        return self.model.fit(state, q_values, verbose=0)

    def load_weights(self, path):
        self.model.load_weights(path)

    def save_weights(self, path):
        self.model.save_weights(path)

2. Agent Class:

# Imports needed by this snippet (assumed; not shown in the original post)
import random
from collections import deque

import numpy as np

NUM_EPISODES = 100
MAX_STEPS_PER_EPISODE = 100
EPSILON = 0.5 
EPSILON_DECAY_RATE = 0.001
EPSILON_MIN = 0.01
EPSILON_MAX = 1
DISCOUNT_FACTOR = 0.99
REPLAY_MEMORY_SIZE = 50000
BATCH_SIZE = 50
TRAIN_START = 100
ACTION_SPACE = [0, 1]
STATE_SIZE = 11 
LEARNING_RATE = 0.01

class Agent():
    def __init__(self, num_episodes, max_steps_per_episode, epsilon, epsilon_decay_rate, \
        epsilon_min, epsilon_max, discount_factor, replay_memory_size, batch_size, train_start):
        self.num_episodes = NUM_EPISODES
        self.max_steps_per_episode = MAX_STEPS_PER_EPISODE
        self.epsilon = EPSILON
        self.epsilon_decay_rate = EPSILON_DECAY_RATE
        self.epsilon_min = EPSILON_MIN
        self.epsilon_max = EPSILON_MAX
        self.discount_factor = DISCOUNT_FACTOR
        self.replay_memory_size = REPLAY_MEMORY_SIZE
        self.replay_memory = deque(maxlen=self.replay_memory_size)
        self.batch_size = BATCH_SIZE
        self.train_start = TRAIN_START
        self.action_space = ACTION_SPACE
        self.action_size = len(self.action_space)
        self.state_size = STATE_SIZE
        self.learning_rate = LEARNING_RATE
        self.model = DQN(self.state_size, self.action_size, self.learning_rate)

    def select_action(self, state):
        random_value = np.random.rand()
        if random_value < self.epsilon:
            print('random_value = ', random_value)       
            chosen_action = random.choice(self.action_space) # = EXPLORATION Strategy
            print('Agent randomly chooses the following EXPLORATION action:', chosen_action)       
        else: 
            print('random_value = {} is greater than epsilon'.format(random_value))       
            state = np.float32(state) # Transforming passed state into numpy array
            prediction_by_model = self.model.predict(state) 
            chosen_action = np.argmax(prediction_by_model[0]) # = EXPLOITATION strategy
            print('NN chooses the following EXPLOITATION action:', chosen_action)       
        return chosen_action

if __name__ == "__main__":
    agent_test = Agent(NUM_EPISODES, MAX_STEPS_PER_EPISODE, EPSILON, EPSILON_DECAY_RATE, \
        EPSILON_MIN, EPSILON_MAX, DISCOUNT_FACTOR, REPLAY_MEMORY_SIZE, BATCH_SIZE, \
            TRAIN_START)
    # Test of select_action function:
    state = 10 
    state = np.array(state)
    print(state.shape)
    print(agent_test.select_action(state))

Here's the traceback error I get when running this code:

**ValueError**: Error when checking input: expected dense_209_input to have 2 dimensions, but got array with shape ()

I am unsure why the error regarding 2 dimensions occurs since I have configured the NN in the DQN class to receive only 1 dimension.

I have already read through similar questions on Stack Overflow (Keras Sequential model input shape, Keras model input shape wrong, Keras input explanation: input_shape, units, batch_size, dim, etc.). However, I have not yet been able to adapt the suggestions to my use case.

Do you have any suggestions or hints? Thank you for your help!

Solution

There are several problems here. First, what you call state_size is actually a state space, i.e. a collection of all possible states your agent can be in. The state size is actually 1, since there is only one parameter you want to pass as a state.
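In other words (a quick illustration; the names here are only for explanation, not taken from your code):

STATE_SPACE = [9.5, 9.6, 9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5]  # 11 possible states
STATE_SIZE = 1  # each observation you pass to the network is a single scalar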

When you define your input layer here:

self.model.add(Dense(128, input_dim=self.state_size, activation='relu'))

You say that your input dimension will be equal to 11, but then when you call predict, you pass it a single number (10).
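A minimal sketch of the first option (keep the scalar state and shrink the input layer to one feature; note that predict expects a (batch_size, input_dim) array, so the state is wrapped into a (1, 1) array here):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(128, input_dim=1, activation='relu'))  # one scalar per state
model.add(Dense(128, activation='relu'))
model.add(Dense(2, activation='linear'))               # two actions: 0 and 1
model.compile(optimizer='adam', loss='mse')

state = np.array([[10.0]])                  # shape (1, 1)
q_values = model.predict(state)             # shape (1, 2)
chosen_action = int(np.argmax(q_values[0]))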

So you either need to modify input_dim to receive only one number, or you can define your state vector like state = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), each number corresponding to a possible state (from 9.5 to 10.5). So when the state is 9.5 your state vector is [1, 0, 0, ...0] and so on.
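And a minimal sketch of the second option (one-hot encoding; the helper name and the 0.1 step are my own assumptions):

import numpy as np

STATE_VALUES = np.round(np.linspace(9.5, 10.5, 11), 1)  # [9.5, 9.6, ..., 10.5]

def encode_state(value):
    # Return an 11-dimensional one-hot vector for a scalar state
    vec = np.zeros(len(STATE_VALUES), dtype=np.float32)
    vec[np.argmin(np.abs(STATE_VALUES - value))] = 1.0
    return vec

state_vec = encode_state(9.5)        # [1, 0, 0, ..., 0]
batch = state_vec.reshape(1, -1)     # shape (1, 11), matching input_dim=11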

The second problem is that when you define your state, you should put it in square brackets:

state = np.array([10])

otherwise the array's shape is (), as I am sure you've found out.
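A quick way to see the difference the brackets make (just a sanity check in plain NumPy):

import numpy as np

print(np.array(10).shape)      # ()     -- 0-d array, which triggers the ValueError above
print(np.array([10]).shape)    # (1,)   -- 1-d array with one entry
print(np.array([[10]]).shape)  # (1, 1) -- the (batch_size, input_dim) shape Keras expects for predict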

Hope it helps! Let me know if you need any clarification.
