Reading the first n rows of a csv without parsing the whole file in Power Query

Problem description

I have these CSV data files with my relevant data in the first five rows and a bunch of malformed data below it. When I use Filter Rows, Power Query still reads in all of the data below, which causes problems: I am reading in a whole folder's worth of files, and some of them have a different number of columns below the rows I'm interested in. I don't need the columns or data below those first five rows, but Power Query throws errors when it looks for the same number of columns as in the last file. I would like it to read in only the first n rows, which are uniform across all files in the folder.

Is there a way to do this or bypass the error? Let me know if there is anything else I can provide to help my question be better understood. 

I have already tried filter rows, but that still reads the whole document and throws some errors.

This is similar to what I'm looking for, but it's not clear how I can edit this to achieve what I want.

Skip 6 rows before "reading" into powerquery

Ultimately I'm going to be reading in the first 5 rows of all documents in the folder. This is much easier in Pandas, but I need an Excel solution for a coworker. The error I get is "unexpected number of columns." I have confirmed the cause by testing a subset of files whose extra lower data (which I don't need) has the same number of columns. I would like a solution robust enough to handle all of the files.

Solution

When you load a CSV into the Query Editor, it will likely generate M code like this:

let
    Source = Csv.Document(File.Contents("C:\FilePath\FileName.csv"), [Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column1", type text}, {"Column2", type text}, {"Column3", type text}})
in
    #"Changed Type"

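Delete the last step, #"Changed Type", so that the query returns Source (the line after "in" must then read Source), and change Columns=3 in the first step to the number of columns you actually want instead of the number it automatically detected. With a fixed column count, any additional columns found further down the file are ignored rather than raising an error.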
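To go from there to "the first five rows of every file in a folder", something like the following might work. It is only a sketch under assumptions: the folder path "C:\FilePath", the step names, and the choice of 3 columns and 5 rows are placeholders, not values from the question. It parses each file with a fixed column count and keeps the top rows with Table.FirstN before combining. Power Query evaluates lazily, so it may not need to read far past those rows, but that behaviour is not confirmed here.

let
    // Assumed folder path; replace with the real one
    Files = Folder.Files("C:\FilePath"),
    // Keep only CSV files (Extension includes the leading dot)
    CsvFiles = Table.SelectRows(Files, each Text.Lower([Extension]) = ".csv"),
    // Parse each file with a fixed column count, then keep its first 5 rows
    WithData = Table.AddColumn(
        CsvFiles,
        "Data",
        each Table.FirstN(
            Csv.Document([Content], [Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
            5
        )
    ),
    // Stack the per-file tables into one table
    Combined = Table.Combine(WithData[Data])
in
    Combined

Because every file is parsed with the same Columns value, each per-file table has the same Column1..Column3 layout, which is what lets Table.Combine stack them cleanly.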
