flow_from_dataframe with multiple column names fed into y_col raises a TypeError

Question

I am using flow_from_dataframe for a multi-label classification problem with 14 possible labels. All the label column names are placed in a list as strings, for example:

columns = ["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung Lesion", "Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices"]

The list (columns) is then passed as y_col, for example:

train_generator = datagen.flow_from_dataframe(
    dataframe=df[:178731],
    directory='/home/admin1/Downloads/',
    x_col='Path',
    y_col=columns,
    batch_size=batch_size,
    seed=42,
    shuffle=True,
    target_size=(224, 224))

I get this error:

TypeError: If class_mode="categorical", y_col="['No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices']" column values must be type string, list or tuple.

I have already tried the previously proposed solution of casting each label column to str:

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)

This only works when I pass a single column name to y_col. I'm using keras 2.2.4, and I have already uninstalled keras.preprocessing and installed the GitHub version. It seems that flow_from_dataframe does not support a list of column names in y_col when class_mode is left at its default, "categorical", even though this is a multi-label classification problem. I suspect the type issue stems from pandas storing the converted values with dtype object rather than as strings, while the keras preprocessing dataframe iterator only accepts string, list or tuple values. Below is my code:

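# Imports the code below relies on. The ImageDataGenerator import path is
# an assumption; the question does not show it.
import numpy as np
import pandas as pd
from keras_preprocessing.image import ImageDataGenerator

df = pd.read_csv('/home/admin1/Downloads/CheXpert-v1.0/train.csv')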

df = df.replace(np.nan, 0)
df['No Finding'].head()

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)
df['Age'] = df['Age'].astype(str)

df.dtypes

columns=["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity",
"Lung Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis",
"Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture",
"Support Devices"]

datagen=ImageDataGenerator(rescale=1./255.)
test_datagen=ImageDataGenerator(rescale=1./255.)

train_generator = datagen.flow_from_dataframe(
    dataframe=df[:178731],
    directory='/home/admin1/Downloads/',
    x_col='Path',
    y_col=columns,
    batch_size=batch_size,
    seed=42,
    shuffle=True,
    target_size=(224, 224))

Recommended answer

I had this same issue and was able to solve it by changing the class_mode parameter to 'other'. I came across a tutorial that does this after following a few links from the TensorFlow documentation for flow_from_dataframe().

So, based on what you have above, you only need to set class_mode explicitly to 'other' (now 'raw', see the update at the end) and it should work:

train_generator = datagen.flow_from_dataframe(
    dataframe=df[:178731],
    directory='/home/admin1/Downloads/',
    x_col='Path',
    y_col=columns,
    batch_size=batch_size,
    class_mode='raw',  # comma was missing here in the original snippet
    seed=42,
    shuffle=True,
    target_size=(224, 224))
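
A note on the labels when class_mode='raw' is used: the Keras documentation describes this mode as yielding a NumPy array of the values in the y_col column(s), so the label columns probably need to stay numeric rather than be cast to str as in the question. The following is a minimal sketch with made-up toy rows and a shortened, hypothetical label list; it only shows the pandas side and does not call Keras.

```python
import numpy as np
import pandas as pd

# Shortened label list for illustration (the question uses 14 columns).
columns = ["Edema", "Fracture"]

# Toy rows; NaN stands for a missing annotation, as in the question.
df = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg", "c.jpg"],
    "Edema": [1.0, np.nan, 0.0],
    "Fracture": [np.nan, 1.0, 0.0],
})

# Fill missing labels with 0 and keep them numeric instead of using astype(str).
labels = df[columns].fillna(0).astype("float32")
df[columns] = labels

# The array a 'raw' label batch is built from: one row per image,
# one column per label.
y = labels.to_numpy()
print(y.shape)  # (3, 2)
print(y.dtype)  # float32
```

With labels in this form, each batch of targets should have shape (batch_size, number_of_labels), which is the usual layout for a multi-label output layer.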

I should say, though, that I have seen no mention of the class_mode 'other' in either the TensorFlow or the Keras documentation. However, it does seem to work, so I am running with it for now.

Update: I have since realized that 'other' is deprecated in current versions of Keras. I have updated the code above to use the new, correct class_mode, which should be 'raw'.
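
As for why casting the columns to str did not help in the question: judging from the error message, the "categorical" check tests whether each value of df[y_col] is a string, list or tuple. That the check is written with DataFrame.apply is an assumption about the library internals, not something confirmed here. Under that assumption, the sketch below (toy data, hypothetical shortened column list) shows why a list of column names would fail whatever the cell types are: apply on a DataFrame visits whole columns, not cells.

```python
import pandas as pd

# Toy frame standing in for the CheXpert labels (values are made up).
df = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg"],
    "Edema": ["1.0", "0.0"],
    "Fracture": ["0.0", "1.0"],
})
columns = ["Edema", "Fracture"]
allowed = (str, list, tuple)

# Single column: Series.apply visits each cell, and each cell is a str.
single_ok = all(df["Edema"].apply(lambda v: isinstance(v, allowed)))

# List of columns: DataFrame.apply visits each column as a Series,
# and a Series is none of str, list or tuple.
multi_ok = all(df[columns].apply(lambda v: isinstance(v, allowed)))

print(single_ok)  # True
print(multi_ok)   # False
```

This is consistent with the observation that a single column name works while a list of names does not, and with 'raw' avoiding the problem because that mode does not need the values to be strings.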
