将数字和分类数据混合到具有密集层的keras顺序模型中 [英] Mixing numerical and categorical data into keras sequential model with Dense layers

查看：60 发布时间：2020/4/25 10:33:30 python keras neural-network keras-layer one-hot-encoding

本文介绍了将数字和分类数据混合到具有密集层的keras顺序模型中的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Pandas数据帧中有一个训练集，然后将此数据帧与df.values一起传递到model.fit()中.以下是有关df的一些信息:

I have a training set in a Pandas dataframe, and I pass this data frame into model.fit() with df.values. Here is some information about the df:

df.values.shape
# (981, 5)

df.values[0]
# array([163, 0.6, 83, 0.52,
#       array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
#       0, 0, 0, 0, 0, 0, 0])], dtype=object)

如您所见，df中的行包含5列，其中4列包含数值(int或float)，其中一列包含表示某些分类数据的热编码数组.我正在创建我的keras模型，如下所示:

As you can see, rows in the df contain 5 columns, 4 of which contain numerical values (either int or float), and one which contains a hot encoded array representing some categorical data. I am creating my keras model as seen below:

model = keras.Sequential([
    keras.layers.Dense(1024, activation=tf.nn.relu, kernel_initializer=init_orth, bias_initializer=init_0),
    keras.layers.Dense(512, activation=tf.nn.relu, kernel_initializer=init_orth, bias_initializer=init_0),
    keras.layers.Dense(256, activation=tf.nn.relu, kernel_initializer=init_orth, bias_initializer=init_0),
    keras.layers.Dense(128, activation=tf.nn.relu, kernel_initializer=init_orth, bias_initializer=init_0),
    keras.layers.Dense(64, activation=tf.nn.relu, kernel_initializer=init_orth, bias_initializer=init_0),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])

opt = keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=True)

model.compile(optimizer=opt, 
      loss='binary_crossentropy',
      metrics=['accuracy'])

model.fit(df.values, df_labels.values, epochs=10, batch_size=32, verbose=0)

df_labels.values只是一个0和1的一维数组.因此，我相信我确实需要在最后增加一个Dense(1)乙状结肠层以及'binary_crossentropy'损失.

df_labels.values is just a 1D array of 0s and 1s. So I believe I do need a Dense(1) sigmoid layer at the end, as well as 'binary_crossentropy' loss.

如果我仅传递数值数据，此模型将非常出色.但是，一旦我引入了热编码(分类数据)，就会出现此错误:

This model works excellent if I only pass numerical data. But as soon as I introduce hot encodings (categorical data), I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-91-b5e6232b375f> in <module>
     42     #trn_values = df_training_set.values[:,:,len(df_training_set.columns)]
     43     #trn_cat = df_trn_wtid.values.reshape(-1, 1)
---> 44     model.fit(df_training_set.values, df_training_labels.values, epochs=10, batch_size=32, verbose=0)
     45 
     46     #test_loss, test_acc = model.evaluate(df_test_set.values, df_test_labels.values)

~\Anaconda3\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1037                                         initial_epoch=initial_epoch,
   1038                                         steps_per_epoch=steps_per_epoch,
-> 1039                                         validation_steps=validation_steps)
   1040 
   1041     def evaluate(self, x=None, y=None,

~\Anaconda3\lib\site-packages\keras\engine\training_arrays.py in fit_loop(model, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
    197                     ins_batch[i] = ins_batch[i].toarray()
    198 
--> 199                 outs = f(ins_batch)
    200                 outs = to_list(outs)
    201                 for l, o in zip(out_labels, outs):

~\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~\Anaconda3\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2653                 array_vals.append(
   2654                     np.asarray(value,
-> 2655                                dtype=tf.as_dtype(tensor.dtype).as_numpy_dtype))
   2656         if self.feed_dict:
   2657             for key in sorted(self.feed_dict.keys()):

~\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    536 
    537     """
--> 538     return array(a, dtype, copy=False, order=order)
    539 
    540 

ValueError: setting an array element with a sequence.

请不要建议将one_hot数组中的每个值扩展到自己的列中.本示例是我的数据集的精简版本，其中包含6-8个分类列，其中一些one_hots是5000+大小的数组.因此，这对我来说不是可行的解决方案.我正在寻找可能改进我的顺序模型(或彻底检查keras模型)，以便与数字数据一起处理分类数据.

Please do not suggest expanding out each value in the one_hot arrays into their own columns. This example is a trimmed down version of my dataset, which contains 6-8 categorical columns, some of the one_hots are arrays of 5000+ size. So this is not a feasible solution for me. I'm looking to perhaps refine my Sequential model (or overhaul the keras model completely) in order to process categorical data along with numerical data.

请记住，训练标签是0/1值的一维数组.我既需要数字/分类训练集来预测一组结果，也不能从数字数据中获得一组预测，也不能从分类数据中获得一组预测.

Remember, the training labels are 1D array of 0/1 values. I need both numerical/categorical training sets predicting one set of outcomes, I can't have one set of predictions from the numerical data and one set of predictions from the categorical data.

将数字和分类数据混合到具有密集层的keras顺序模型中 [英] Mixing numerical and categorical data into keras sequential model with Dense layers

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

将数字和分类数据混合到具有密集层的keras顺序模型中 [英] Mixing numerical and categorical data into keras sequential model with Dense layers

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭