U-SQL: Split a CSV file into multiple files based on distinct values in the file

Views: 126 | Published: 2020/5/22 20:16:49 | Tags: azure, output, azure-data-lake, u-sql, data-partitioning


Problem description


I have data in Azure Data Lake Store, and I am processing it with an Azure Data Lake Analytics job written in U-SQL. I have several CSV files that contain spatial data, similar to this:

File_20170301.csv

    longitude | latitude  | date         | hour | value1
    ----------+-----------+--------------+------+-------
    45.121    | 21.123    | 2017-03-01   | 01   | 20
    45.121    | 21.123    | 2017-03-01   | 02   | 10
    45.121    | 21.123    | 2017-03-01   | 03   | 50
    48.121    | 35.123    | 2017-03-01   | 01   | 60
    48.121    | 35.123    | 2017-03-01   | 02   | 15
    48.121    | 35.123    | 2017-03-01   | 03   | 80

File_20170302.csv

    longitude | latitude  | date         | hour | value1
    ----------+-----------+--------------+------+-------
    45.121    | 21.123    | 2017-03-02   | 01   | 20
    45.121    | 21.123    | 2017-03-02   | 02   | 10
    45.121    | 21.123    | 2017-03-02   | 03   | 50
    48.121    | 35.123    | 2017-03-02   | 01   | 60
    48.121    | 35.123    | 2017-03-02   | 02   | 15
    48.121    | 35.123    | 2017-03-02   | 03   | 80


Each file contains the data for one day and for all longitude-latitude combinations.


I would like to merge all the files I have and split the data so that I end up with one file for each longitude-latitude combination.


So, after looping through all the files in my folder and appending the data for all days, I would end up with the following:

File_45_21.csv

    longitude | latitude  | date         | hour | value1
    ----------+-----------+--------------+------+-------
    45.121    | 21.123    | 2017-03-01   | 01   | 20
    45.121    | 21.123    | 2017-03-01   | 02   | 10
    45.121    | 21.123    | 2017-03-01   | 03   | 50
    45.121    | 21.123    | 2017-03-02   | 01   | 20
    45.121    | 21.123    | 2017-03-02   | 02   | 10
    45.121    | 21.123    | 2017-03-02   | 03   | 50

File_48_35.csv

    longitude | latitude  | date         | hour | value1
    ----------+-----------+--------------+------+-------
    48.121    | 35.123    | 2017-03-01   | 01   | 60
    48.121    | 35.123    | 2017-03-01   | 02   | 15
    48.121    | 35.123    | 2017-03-01   | 03   | 80
    48.121    | 35.123    | 2017-03-02   | 01   | 60
    48.121    | 35.123    | 2017-03-02   | 02   | 15
    48.121    | 35.123    | 2017-03-02   | 03   | 80
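The looping-and-appending step can be sketched locally in Python. This is a minimal illustration under stated assumptions, not U-SQL, and it does not run inside Azure Data Lake Analytics: it assumes the daily files sit in one folder, match the pattern `File_*.csv`, and are comma-delimited with a header row `longitude,latitude,date,hour,value1` (the pipes in the tables above are for display only). The function name `read_daily_files` is hypothetical.

```python
import csv
import glob
import os


def read_daily_files(folder):
    """Read every File_*.csv in `folder` and return all rows as one list.

    Assumption (hypothetical layout): comma-delimited files with a header row
    longitude,latitude,date,hour,value1. Each row comes back as a dict keyed
    by column name, with all values kept as strings.
    """
    rows = []
    # The date is embedded in the name as yyyyMMdd, so a lexical sort of the
    # paths is also a chronological sort of the days.
    for path in sorted(glob.glob(os.path.join(folder, "File_*.csv"))):
        with open(path, newline="") as f:
            # Append this day's rows to the running list
            rows.extend(csv.DictReader(f))
    return rows
```

Because the paths are sorted by the date in their names, the appended rows stay in date order, which is the order the expected per-coordinate files above show.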


In theory the following should happen:

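1. Find the distinct values of the longitude-latitude combinations in the data.
2. Using that set of distinct values, create one file per combination and extract the matching data from the source files based on the two parameters (longitude and latitude).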
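The two steps translate fairly directly into code. The sketch below is plain Python on in-memory rows, not U-SQL; the function names `distinct_combinations` and `extract` are hypothetical, and rows are assumed to be dicts keyed by column name (the shape `csv.DictReader` produces). It keeps the steps separate so each one is visible:

```python
def distinct_combinations(rows):
    """Step 1: the distinct (longitude, latitude) pairs, in first-seen order."""
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys((r["longitude"], r["latitude"]) for r in rows))


def extract(rows, longitude, latitude):
    """Step 2: all rows belonging to one combination (a full scan of `rows`)."""
    return [
        r for r in rows
        if r["longitude"] == longitude and r["latitude"] == latitude
    ]
```

Written this way, step 2 rescans every row once per combination, so the cost grows with rows × combinations; grouping the rows by key in a single pass avoids the repeated scans.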


What I am struggling with is how to get started on the looping, how to extract data based on two parameters in the source, and how to "partition" the data source by the distinct values of the parameter combination.
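Partitioning by the distinct values does not strictly require an explicit loop over the combinations: a single pass can bucket each row under its (longitude, latitude) key and then write one file per bucket. The sketch below does that in plain Python as a local illustration of the idea, not as U-SQL. The name `partition_to_files`, the dict-shaped rows, and the file-name rule (integer part of each coordinate, inferred from the example names `File_45_21.csv` and `File_48_35.csv`) are all assumptions.

```python
import csv
import os
from collections import defaultdict


def partition_to_files(rows, out_dir, fieldnames):
    """Group rows by (longitude, latitude) in one pass, then write one CSV per group.

    Returns the list of paths written. The naming rule is a hypothetical one
    inferred from the question's example file names.
    """
    # One pass over the data: each row lands in the bucket for its key
    groups = defaultdict(list)
    for r in rows:
        groups[(r["longitude"], r["latitude"])].append(r)

    paths = []
    for (lon, lat), group in groups.items():
        # int() truncates toward zero: "45.121" -> 45, "21.123" -> 21
        name = f"File_{int(float(lon))}_{int(float(lat))}.csv"
        path = os.path.join(out_dir, name)
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(group)  # rows keep their original (date) order
        paths.append(path)
    return paths
```

One caveat of the inferred naming rule: two different coordinate pairs with the same integer parts (for example 45.121/21.123 and 45.9/21.5) map to the same file name, and the group written second overwrites the first. Keeping more digits in the name avoids that.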
