Dataset API 'flat_map' method producing error for same code which works with 'map' method

Views: 120 | Published: 2020/5/24 0:21:29 | Tags: pandas, tensorflow, tensorflow-datasets


Question


I am trying to create a pipeline to read multiple CSV files using the TensorFlow Dataset API and Pandas. However, using the flat_map method produces errors, whereas with the map method I am able to build the code and run it in a session. This is the code I am using. I have already opened issue #17415 in the TensorFlow GitHub repository, but apparently it is not a bug, and they asked me to post here.

import os

import pandas as pd
import tensorflow as tf  # TensorFlow 1.x API

folder_name = './data/power_data/'
file_names = os.listdir(folder_name)

def _get_data_for_dataset(file_name, rows=100):
    print(file_name.decode())

    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    X_data = df_input.as_matrix()
    # Note: astype() returns a new array; the result is discarded here.
    X_data.astype('float32', copy=False)

    return X_data

dataset = tf.data.Dataset.from_tensor_slices(file_names)
# This call raises "map_func must return a Dataset object".
dataset = dataset.flat_map(lambda file_name: tf.py_func(_get_data_for_dataset,
                                                        [file_name], tf.float64))
dataset = dataset.batch(2)
iter = dataset.make_one_shot_iterator()
get_batch = iter.get_next()


I get the following error: map_func must return a Dataset object. The pipeline works without error when I use map, but it doesn't give the output I want. For example, if Pandas reads N rows from each of my CSV files, I want the pipeline to concatenate data from B files and give me an array with shape (N*B, 2). Instead, it gives me (B, N, 2), where B is the batch size: map adds another axis instead of concatenating along the existing axis. From what I understood of the documentation, flat_map is supposed to give a flattened output. In the documentation, both map and flat_map return type Dataset. So why does my code work with map and not with flat_map?


It would also be great if you could point me towards code where the Dataset API has been used with the Pandas module.

Recommended answer


As mikkola points out in the comments, Dataset.map() and Dataset.flat_map() expect functions with different signatures: Dataset.map() takes a function that maps a single element of the input dataset to a single new element, whereas Dataset.flat_map() takes a function that maps a single element of the input dataset to a Dataset of elements.
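
To make the difference in contracts concrete, here is a plain-Python analogy (not TensorFlow code). The helper read_rows() and the file names are hypothetical stand-ins for _get_data_for_dataset() and the CSV files; lists stand in for Dataset objects.

```python
from itertools import chain

def read_rows(file_name, rows=3):
    # Hypothetical stand-in for _get_data_for_dataset(): `rows` rows of 2 values.
    return [[f"{file_name}:wind:{i}", f"{file_name}:load:{i}"] for i in range(rows)]

file_names = ["a.csv", "b.csv"]  # illustrative names only

# map-style: one input element -> one output element (a whole rows x 2 table).
mapped = [read_rows(name) for name in file_names]

# flat_map-style: one input element -> a sequence of elements, which is then
# flattened into the output, so every row becomes its own element.
flat_mapped = list(chain.from_iterable(read_rows(name) for name in file_names))

print(len(mapped), len(mapped[0]), len(mapped[0][0]))  # 2 3 2
print(len(flat_mapped), len(flat_mapped[0]))           # 6 2
```

In this analogy, the function given to flat_map must return something iterable over elements. In TensorFlow that "something" has to be a Dataset, which is why a bare tensor from tf.py_func() is rejected.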


If you want each row of the array returned by _get_data_for_dataset() to become a separate element, you should use Dataset.flat_map() and convert the output of tf.py_func() to a Dataset, using Dataset.from_tensor_slices():

import os

import pandas as pd
import tensorflow as tf  # TensorFlow 1.x API

folder_name = './data/power_data/'
file_names = os.listdir(folder_name)

def _get_data_for_dataset(file_name, rows=100):
    df_input = pd.read_csv(os.path.join(folder_name, file_name.decode()),
                           usecols=['Wind_MWh', 'Actual_Load_MWh'], nrows=rows)
    # `.values` replaces `as_matrix()`, which was removed in pandas 1.0.
    X_data = df_input.values
    return X_data.astype('float32', copy=False)

dataset = tf.data.Dataset.from_tensor_slices(file_names)

# Use `Dataset.from_tensor_slices()` to make a `Dataset` from the output of
# the `tf.py_func()` op.
dataset = dataset.flat_map(lambda file_name: tf.data.Dataset.from_tensor_slices(
    tf.py_func(_get_data_for_dataset, [file_name], tf.float32)))

dataset = dataset.batch(2)

iter = dataset.make_one_shot_iterator()
get_batch = iter.get_next()
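
The shapes mentioned in the question can be checked with a small NumPy sketch. This is only an illustration of the shape arithmetic, not TensorFlow code: the zero-filled arrays are stand-ins for the data read from each CSV file, and the values of N and B are chosen for illustration.

```python
import numpy as np

N, B = 100, 2  # rows read per file, number of files (illustrative values)

# Stand-ins for the (N, 2) float32 arrays returned by _get_data_for_dataset().
per_file = [np.zeros((N, 2), dtype=np.float32) for _ in range(B)]

# map() followed by batch(B): whole arrays are stacked along a new leading axis.
map_batch = np.stack(per_file)

# flat_map() followed by batch(N * B): individual rows are grouped together.
flat_batch = np.concatenate(per_file, axis=0)

print(map_batch.shape)   # (2, 100, 2)
print(flat_batch.shape)  # (200, 2)
```

Note that after flat_map() each element is a single row of shape (2,), so batch(2) as written above yields batches of shape (2, 2). To get N*B rows per batch, the batch size would need to be N*B.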
