Dataset API 'flat_map' method producing error for same code which works with 'map' method


Question

I am trying to create a pipeline that reads multiple CSV files using the TensorFlow Dataset API and Pandas. However, using the flat_map method produces an error, whereas with the map method I am able to build the code and run it in a session. This is the code I am using. I already opened issue #17415 in the TensorFlow GitHub repository, but apparently it is not a bug and they asked me to post here.

import os

import pandas as pd
import tensorflow as tf

folder_name = './data/power_data/'
file_names = os.listdir(folder_name)

def _get_data_for_dataset(file_name, rows=100):
    print(file_name.decode())

    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    X_data = df_input.as_matrix()
    # The return value of astype() is discarded here, so X_data keeps its original dtype.
    X_data.astype('float32', copy=False)

    return X_data

dataset = tf.data.Dataset.from_tensor_slices(file_names)
# This flat_map call raises: "map_func must return a Dataset object."
dataset = dataset.flat_map(lambda file_name: tf.py_func(
    _get_data_for_dataset, [file_name], tf.float64))
dataset = dataset.batch(2)
iter = dataset.make_one_shot_iterator()
get_batch = iter.get_next()

I get the following error: map_func must return a Dataset object. The pipeline runs without error when I use map, but it doesn't give the output I want. For example, if Pandas reads N rows from each of my CSV files, I want the pipeline to concatenate the data from B files and give me an array with shape (N*B, 2). Instead, it gives me (B, N, 2), where B is the batch size: map adds another axis instead of concatenating along the existing one. From what I understood from the documentation, flat_map is supposed to give a flattened output. In the documentation, both map and flat_map return type Dataset. So why does my code work with map and not with flat_map?

It would also be great if you could point me towards code where the Dataset API has been used with the Pandas module.

Recommended answer

As mikkola points out in the comments, Dataset.map() and Dataset.flat_map() expect functions with different signatures: Dataset.map() takes a function that maps a single element of the input dataset to a single new element, whereas Dataset.flat_map() takes a function that maps a single element of the input dataset to a Dataset of elements.
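The difference in signatures can be illustrated with a plain-Python analogy. This is only a sketch of the semantics, not TensorFlow code: model_map, model_flat_map, and fake_read_rows are hypothetical names introduced here, and lists stand in for Dataset objects.

```python
from itertools import chain

def model_map(elements, fn):
    """Analogy for Dataset.map(): fn returns ONE new element per input element."""
    return [fn(e) for e in elements]

def model_flat_map(elements, fn):
    """Analogy for Dataset.flat_map(): fn returns a SEQUENCE of elements per
    input element, and those sequences are concatenated in order."""
    return list(chain.from_iterable(fn(e) for e in elements))

def fake_read_rows(file_name, rows=3):
    # Hypothetical stand-in for reading a CSV file: one tuple per row.
    return [(file_name, i) for i in range(rows)]

files = ["a.csv", "b.csv"]

mapped = model_map(files, fake_read_rows)          # one element per file
flattened = model_flat_map(files, fake_read_rows)  # one element per row

print(len(mapped), len(mapped[0]))  # 2 3
print(len(flattened))               # 6
```

With the map-style call each file stays a single nested element, which is why batching later adds an extra axis; with the flat_map-style call the rows of all files form one flat sequence.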

If you want each row of the array returned by _get_data_for_dataset() to become a separate element, you should use Dataset.flat_map() and convert the output of tf.py_func() to a Dataset, using Dataset.from_tensor_slices():

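import os

import pandas as pd
import tensorflow as tf  # TensorFlow 1.x API (tf.py_func, make_one_shot_iterator)

folder_name = './data/power_data/'
file_names = os.listdir(folder_name)

def _get_data_for_dataset(file_name, rows=100):
    # file_name arrives as bytes from tf.py_func, so decode it first.
    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    # .values replaces DataFrame.as_matrix(), which was removed in pandas 1.0.
    X_data = df_input.values
    # Return the converted array so the dtype matches tf.float32 below.
    return X_data.astype('float32', copy=False)

dataset = tf.data.Dataset.from_tensor_slices(file_names)

# Use `Dataset.from_tensor_slices()` to make a `Dataset` from the output of
# the `tf.py_func()` op, so that each row becomes a separate element.
dataset = dataset.flat_map(lambda file_name: tf.data.Dataset.from_tensor_slices(
    tf.py_func(_get_data_for_dataset, [file_name], tf.float32)))

dataset = dataset.batch(2)

iterator = dataset.make_one_shot_iterator()
get_batch = iterator.get_next()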
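
One consequence worth noting: once the rows are flattened, each dataset element is a single row, so batch(2) groups two rows rather than two files. To get the (N*B, 2) batches described in the question, the batch size would presumably need to be N*B. The plain-Python sketch below models only this arithmetic; the batch helper, N, and the file names are hypothetical and it does not call TensorFlow.

```python
from itertools import chain, islice

def batch(elements, batch_size):
    """Group consecutive elements into lists of batch_size.
    The final batch may be shorter, as with Dataset.batch() by default."""
    it = iter(elements)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

N = 3  # hypothetical number of rows read from each file
files = ["a.csv", "b.csv", "c.csv", "d.csv"]

# Flattened stream of rows, in file order (what flat_map produces conceptually).
rows = list(chain.from_iterable([(f, i) for i in range(N)] for f in files))

small = list(batch(rows, 2))              # batches of 2 rows each
per_two_files = list(batch(rows, 2 * N))  # batches covering B = 2 files each

print(len(small), len(small[0]))                  # 6 2
print(len(per_two_files), len(per_two_files[0]))  # 2 6
```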
