Excel Power Query:从具有多个未固定工作表的多个未固定文件中获取数据 [英] Excel Power Query: get data from multiple unfixed files with multiple unfixed sheets

查看:58
本文介绍了Excel Power Query:从具有多个未固定工作表的多个未固定文件中获取数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

  1. 文件夹中有不固定数量的 excel 文件,如图 1 所示.
    (路径可能会改变,从任何单元格中寻找解决方案作为动态路径)
  2. 每个文件的页数不固定(最多 10 页).
  3. 每张表都有大约 10 到 40 行作为交易数据.
  4. 交易数据后有两个汇总-数量和金额(不固定的起始行)3a,3b,3c
    我正在寻找最终输出作为图像 4a、4b.使用电源查询.

Feb-19.xlsx 包含两个标签:

Jan-19.xlsx 包含三个标签:

我打开一个新的 Excel 文件,然后单击数据 > 新建查询 > 从文件 > 从文件夹,然后输入或使用浏览按钮导航到包含文件的文件夹的位置.(当我导航到我的 OneDrive 文件夹时,我的路径中有 SkyDrive.old,但它是您在上面第一张图片中看到的我的 OneDrive 文件夹.)然后我单击确定":

然后我点击转换数据:

出现这个:

我单击主页">管理参数"(带有下拉箭头的单词)>新参数",然后像这样设置并单击确定".

点击确定后出现:

您可以看到我输入了包含文件的文件夹的路径.如果我想使用不同的文件夹路径,我可以稍后更改此参数值.

为此,我会点击 在左窗格中.单击它会将我带到同一个地方,在那里我可以编辑值.

现在,我点击我已经开始的查询.它是当前左侧窗格中唯一的其他项目.点击它会在屏幕上重新显示:

我编辑公式栏中的文本,将 "C:\Users\MARC_000\SkyDrive.old\Test" 替换为 FolderPath.结果是完全相同的表格,但公式栏有 Folder.Files(FolderPath).现在,查询使用参数值,而不是使用硬编码的引用.

然后,只是因为我想要,我将查询的名称更改为主查询".您可以通过单击左侧窗格中的查询,然后更改右侧窗格顶部的属性中的名称来实现此目的.

接下来,我选择 ContentName 列,然后选择 Home > Remove Columns(单词,带有下拉箭头)> Remove Other Columns to得到这个:

然后我点击 按钮以组合 Content 列中的文件,这会显示此弹出窗口.然后我只点击文件夹,然后确定.

现在左窗格中有更多查询条目:

我点击新查询,从测试转换示例文件,然后看到:

我选择 DataItem 列,然后选择 Home > Remove Columns(单词,带有下拉箭头)> Remove Other Columns 得到这个:

---查看答案底部的编辑,它替换了以下内容---

然后我点击 按钮以展开 Data 列中的表格,这会显示此弹出窗口.然后我清除使用原始列名作为前缀"旁边的复选框,然后单击确定".

---从答案底部的编辑返回继续---

产生这个:

然后我从 Column1 列中过滤掉空值.(单击列顶部的向下箭头并取消选择 null.)

然后我点击Add Column > Conditional Column,像这样设置,然后点击OK:

产生这个:

然后我选择新的 Custom 列并点击 Transform > Fill > Down,得到这个:

然后我从 Column1 列中过滤掉Summary (Amount)"和Summary (Units)"条目.(点击列顶部的向下箭头并取消选择汇总(金额)"和汇总(单位)".)结果如下:

现在我回到主查询.换句话说,单击左窗格中的 Main Query.会有问题".我需要做的就是删除右窗格中的最后一个应用步骤:更改类型.一旦我删除它,一切都很好,我看到了这个:

但我也想要文件名,所以我点击当前选择的应用步骤之前的应用步骤,选择了来自测试的扩展转换文件",所以我点击删除其他列1",然后在公式栏,我将代码从 Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test"}) 更改为 Table.SelectColumns(#"Filtered Hidden Files1",{从测试转换文件",名称"}).这添加了 Name 列,我看到了:

然后我回到最后一个应用步骤,即来自测试的扩展转换文件",现在我看到了:

然后我点击转换 > 使用第一行作为标题并得到这个:

然后我将 DE 列重命名为 Sheet,将 Feb-19.xlsx 列重命名为 File Name.

然后我从 Custname 列中过滤掉Custname"条目.(点击列顶部的向下箭头并取消选择客户名称".)结果如下:

然后我对列重新排序以得到这个:

然后我选择 Summary Type 列并单击 Transform > Group By,然后像这样填写弹出框并单击 OK:

产生这个(这是你的两个表):

然后我右键单击左窗格中的 Main Query,然后选择 Reference.这给了我一个名为 Main Query (2) 的新查询,其中的表看起来就像上面的最后一张图片.现在我点击汇总(单位)行中的表格并得到这个:

然后我对摘要(金额)重复该过程:我右键单击左窗格中的主查询,选择引用,然后单击新查询的摘要(金额)行中的表以获取:

最后,我将两个最新的查询重命名为Summary (Units)"和Summary (Amount)"

当您关闭并加载时,这将为您提供三个新工作表.每个查询一个.如果您不需要主查询的工作表(如果您只需要汇总(单位)和汇总(金额)),则在您关闭并加载并返回 Excel 后,单击数据> 显示查询.然后右键单击右窗格中的主查询并单击加载到,然后选择仅创建连接"并单击加载.当您收到数据丢失警告时,点击继续.

最后一件事:不要不要将包含此查询的 Excel 工作簿和从中获取信息的文件放在其源文件夹中.把它分开.

---编辑以容纳具有交易信息的顶部行---

我正在添加以下内容来处理可能在汇总表上方包含信息行的工作表.这是我想出的:

在上面的答案中,在我说的步骤之后立即开始:我选择 DataItem 列,然后选择 Home > Remove Columns(单词,with下拉箭头)> 删除其他列以获得此:

我现在添加另一列(添加列 > 自定义列),我将其设置如下:

这会复制 Data 列,但会在每个嵌套表中添加一个索引,如下所示:

然后我添加另一列来确定与每个嵌套表中每个摘要的开头相关联的索引号:

(您可能想搜索Summary ("或Summary (Units)"而不是Summary")

请注意,它的构造与上一列类似,因为它基本上是 Indexed 列的副本,只是在每个嵌套中添加了 Summary Index 列表.

然后我像这样添加另一列,以确定每个嵌套表的第一个汇总表第一行的索引位置:

得到这个:

然后我再添加一个这样的列,以删除每个嵌套表中我不想要的顶部行:

这给了我这个:

(此图中选择的表格是顶部有额外信息的表格.该信息现已消失.)

然后我选择 TopRowsRemovedItem 列,然后 Home > Remove Columns(单词,带有下拉箭头)> Remove Other Columns 得到这个:

然后,我点击 按钮以展开 TopRowsRemoved 列(而不是 Data 列,这是我们之前做过的),它会弹出这个看起来与我们使用 Data 列时完全相同的弹出窗口.然后我清除使用原始列名作为前缀"旁边的复选框,然后单击确定".

然后我在右侧窗格中的 APPLIED STEPS 下删除旧的 Expanded Data 步骤.如果我不删除 Expanded Data 步骤,则会收到错误消息,因为它正在查找不存在的 Data 列.这次我没有使用 Data 列.相反,我使用了 TopRowsRemoved 列.

在这一点上,我之前的其余答案仍然适用,因此请返回我在上面写的 ---RETURN FROM EDIT AT BOTTOM OF ANSWER TO CONTINUE--- 以上.

这是我的主查询"查询的 M 代码:

let源 = Folder.Files(FolderPath),#"删除其他列" = Table.SelectColumns(Source,{"Content", "Name"}),#"Invoke Custom Function1" = Table.AddColumn(#"Removed Other Columns", "Transform File from Test", each #"Transform File from Test"([Content])),#"Filtered Hidden Files1" = Table.SelectRows(#"Invoke Custom Function1", each [Attributes]?[Hidden]? <> true),#"Removed Other Columns1" = Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test", "Name"}),#"从测试扩展转换文件" = Table.ExpandTableColumn(#"删除其他列1", "从测试转换文件", {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}),#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Transform File from Test", [PromoteAllScalars=true]),#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"CustName", type text}, {"Opening", type any}, {"In", type any}, {"Out", typeany}, {"Closing", type any}, {"DE", type text}, {"Summary (Units)", type text}, {"Feb-19.xlsx", type text}}),#"重命名的列" = Table.RenameColumns(#"更改类型",{{"DE", "Sheet"}, {"Feb-19.xlsx", "文件名"}, {"Summary (Units)","摘要类型"}}),#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([CustName] <> "CustName")),#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows",{"File Name", "Sheet", "CustName", "Opening", "In", "Out", "Closing", "Summary Type"}),#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Summary Type"}, {{"AllData", each _, type table}})在#"分组行"

这是我的从测试转换示例文件"查询的 M 代码,更改以适应具有事务信息的顶部行:

letSource = Excel.Workbook(#"Sample File Parameter1", null, true),#"删除其他列" = Table.SelectColumns(Source,{"Data","Item"}),#"Added Index" = Table.AddColumn(#"Removed Other Columns", "Indexed", each Table.AddIndexColumn([Data],"Index", 0, 1)),#"Added Custom1" = Table.AddColumn(#"Added Index", "SummaryIndexed", each Table.AddColumn([Indexed],"Summary Index", each try if Text.StartsWith([Column1],"Summary") then[索引] 否则为空 否则为空)),#"Added Custom2" = Table.AddColumn(#"Added Custom1", "IndexMins", each List.Min([SummaryIndexed][Summary Index])),#"Added Custom3" = Table.AddColumn(#"Added Custom2", "TopRowsRemoved", each Table.RemoveFirstN([SummaryIndexed],[IndexMins])),#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"TopRowsRemoved", "Item"}),#"Expanded TopRowsRemoved" = Table.ExpandTableColumn(#"Removed Other Columns1", "TopRowsRemoved", {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}),#"Filtered Rows" = Table.SelectRows(#"Expanded TopRowsRemoved", each ([Column1] <> null)),#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each if Text.StartsWith([Column1],"Summary") then [Column1] else null),#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),#"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([Column1] <> "Summary (Amount)" and [Column1] <> "Summary (Units)"))在#"过滤的行1"

  1. There are unfixed number of excel files in folder as per image1.
    ( Path may change, looking for solution as dynamic path from any cell)
  2. There are unfixed number of sheets ( Max 10 sheets) in each file.
  3. Each sheet have approx 10 to 40 top rows as transaction data.
  4. After transaction data there are two summaries- Qty and Amount ( unfixed starting row) 3a,3b,3c
    I am looking for final output as image 4a, 4b. using power query.

Folder path of excel file; it may change.

Sample data sheets on file

Final output needed ( 2 separate sheets with two tables)

All pix in one

解决方案

Sorry for the length of this response, but there are a lot of steps involved and I included quite a bit of screen clips as well. I believe this solution does what you are looking for.

I start with files in a folder:

Feb-19.xlsx contains two tabs:

Jan-19.xlsx contains three tabs:

I open a new Excel file, then click Data > New Query > From File > From Folder and either type in, or use the Browse button to navigate to, the location of the folder that has the files. (When I navigate to my OneDrive folder, my path has SkyDrive.old in it, but it is my OneDrive folder that you saw in the first image above.) Then I click OK:

Then I click Transform Data:

This appears:

I click Home > Manage Parameters (the words, with the drop-down arrow) > New Parameter, and I set it up like this and click OK.

After clicking OK, this appears:

You can see that I entered the path for the folder that has the files. I can change this parameter value later if I want to use a different folder path.

To do that, I would click on in the left pane. Clicking it would bring me to this same place, where I can edit the value.

Now, I click on the query that I had already started. It is currently the only other item in the left pane. Clicking it brings this back up on the screen:

I edit the text in the formula bar, replacing "C:\Users\MARC_000\SkyDrive.old\Test" with FolderPath. The result is the exact same table, but the formula bar has Folder.Files(FolderPath). Now, instead of using a hard-coded reference, the query is using the parameter value.

Then, just because I want to, I change the query's name to "Main Query." You can do that by clicking on the query in the left pane, then changing the name in the PROPERTIES at the top of the right pane.

Next, I select both the Content and Name columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

Then I click the button to combine the files in the Content column, which brings up this pop-up. Then I click on just the folder only, and then OK.

Now there are more query entries in the left pane:

I click on the new query, Transform Sample File from Test, and see this:

I select the Data and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

---SEE EDIT AT BOTTOM OF ANSWER, WHICH REPLACES THE FOLLOWING---

Then I click the button to expand the tables in the Data column, which brings up this pop-up. Then I clear the checkbox beside "Use original column name as prefix" and click OK.

---RETURN FROM EDIT AT BOTTOM OF ANSWER TO CONTINUE---

Which yields this:

Then I filter out null values from the Column1 column. (Click the down arrow at the top of the column and deselect null.)

Then I click Add Column > Conditional Column, and set it up like this, and click OK:

Which yields this:

Then I select the new Custom column and click Transform > Fill > Down, to get this:

Then I filter out "Summary (Amount)" and "Summary (Units)" entries from the Column1 column. (Click the down arrow at the top of the column and deselect 'Summary (Amount)' and 'Summary (Units)'.) Which yields this:

Now I go back to the Main Query. In other words, click on Main Query in the left pane. There will be a "problem." All I need to do is delete the last APPLIED STEP in the right pane: Changed Type. Once I delete that, all is good and I see this:

But I also want file names, so I click on the APPLIED STEP that is before the one that is currently selected, "Expanded Transform File from Test" is selected, so I click "Removed Other Columns 1", and in the formula bar, I change the code from Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test"}) to Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test", "Name"}). This adds the Name column and I see this:

Then I go back to the last APPLIED STEP, which is "Expanded Transform File from Test" and now I see this:

Then I click Transform > Use First Row as Headers and get this:

Then I rename the DE column to Sheet and the Feb-19.xlsx column to File Name.

Then I filter out "Custname" entries from the Custname column. (Click the down arrow at the top of the column and deselect 'Custname'.) Which yields this:

Then I reordered the columns to get this:

Then I select the Summary Type column and click Transform > Group By, and fill out the pop-up box like this and click OK:

Which yields this (these are your two tables):

So then I right-click on the Main Query in the left pane, and select Reference. That gives me a new query named Main Query (2), with a table that looks just like the last image above. Now I click on the table in Summary (Units) row and get this:

Then I repeat the process for the Summary (Amount): I right-click the Main Query in the left pane, select Reference, and then click on the table in the new query's Summary (Amount) Row to get this:

Lastly, I rename the two newest queries "Summary (Units)" and "Summary (Amount)"

When you close and load, this will give you three new worksheets. One for each query. If you don't want a worksheet for the Main Query (If you only want the Summary (Units) and Summary (Amount)) then, after you close and load and are back in Excel, click Data > Show Queries. Then right-click the Main Query in the right pane and click Load To, then select "Only Create Connection" and click Load. Click Continue when you get the data loss warning.

One more last thing: Do not put the Excel Workbook that has this query in it in its source folder, with the files it is getting the information from. Keep it separate.

---Edits to accommodate top rows having transactional information---

I'm adding the following to deal with sheets that might have rows of information above the Summary tables. Here's what I came up with:

In the answer above, beginning immediately after the step where I said: I select the Data and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

I now add another column (Add Column > Custom Column), and I set it up like this:

This makes a duplicate of the Data column, but adds an index within each of the nested tables, like this:

Then I add another column to determine the index number associated with the start of each summary in each nested table:

(You may want to search for "Summary (" or "Summary (Units)" instead of "Summary")

Note that it is constructed similarly to the previous column, in that it is basically a duplicate of the Indexed column, only with the Summary Index column added within each nested table.

Then I add another column like this, to determine the index position of the first Summary table's first line for each nested table:

and get this:

Then I add one more column like this, to remove the top rows that I don't want within each nested table:

Which gives me this:

(The table selected in this image is the one that had the extra information at the top. That information is gone now.)

Then I select the TopRowsRemoved and Item columns, and then Home > Remove Columns (the words, with the drop-down arrow) > Remove Other Columns to get this:

Then, I click the button to expand the tables in the TopRowsRemoved column (instead of the Data column, which we had done before), which brings up this pop-up that looks exactly the same as when we'd used the Data column. Then I clear the checkbox beside "Use original column name as prefix" and click OK.

Then I delete the old Expanded Data step, under APPLIED STEPS in the right hand pane. If I don't delete the Expanded Data step, I'll get an error because it's looking for the Data column, which doesn't exist. I didn't use the Data column this time. Instead, I used the TopRowsRemoved column.

At this point, the rest of my previous answer still applies, so refer back to where I wrote ---RETURN FROM EDIT AT BOTTOM OF ANSWER TO CONTINUE--- above.

Here's my M code for the "Main Query" query:

let
Source = Folder.Files(FolderPath),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
#"Invoke Custom Function1" = Table.AddColumn(#"Removed Other Columns", "Transform File from Test", each #"Transform File from Test"([Content])),
#"Filtered Hidden Files1" = Table.SelectRows(#"Invoke Custom Function1", each [Attributes]?[Hidden]? <> true),
#"Removed Other Columns1" = Table.SelectColumns(#"Filtered Hidden Files1", {"Transform File from Test", "Name"}),
#"Expanded Transform File from Test" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from Test", {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Item", "Custom"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Transform File from Test", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"CustName", type text}, {"Opening", type any}, {"In", type any}, {"Out", type any}, {"Closing", type any}, {"DE", type text}, {"Summary (Units)", type text}, {"Feb-19.xlsx", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"DE", "Sheet"}, {"Feb-19.xlsx", "File Name"}, {"Summary (Units)", "Summary Type"}}),
#"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each ([CustName] <> "CustName")),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows",{"File Name", "Sheet", "CustName", "Opening", "In", "Out", "Closing", "Summary Type"}),
#"Grouped Rows" = Table.Group(#"Reordered Columns", {"Summary Type"}, {{"AllData", each _, type table}})
in
#"Grouped Rows"

Here's my M code for the "Transform Sample File from Test" query, with the changes to accommodate top rows having transactional information:

let
Source = Excel.Workbook(#"Sample File Parameter1", null, true),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Data","Item"}),
#"Added Index" = Table.AddColumn(#"Removed Other Columns", "Indexed", each Table.AddIndexColumn([Data],"Index", 0, 1)),
#"Added Custom1" = Table.AddColumn(#"Added Index", "SummaryIndexed", each Table.AddColumn([Indexed],"Summary Index", each try if Text.StartsWith([Column1],"Summary") then [Index] else null otherwise null)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "IndexMins", each List.Min([SummaryIndexed][Summary Index])),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "TopRowsRemoved", each Table.RemoveFirstN([SummaryIndexed],[IndexMins])),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"TopRowsRemoved", "Item"}),
#"Expanded TopRowsRemoved" = Table.ExpandTableColumn(#"Removed Other Columns1", "TopRowsRemoved", {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Index", "Summary Index"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded TopRowsRemoved", each ([Column1] <> null)),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each if Text.StartsWith([Column1],"Summary") then [Column1] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
#"Filtered Rows1" = Table.SelectRows(#"Filled Down", each ([Column1] <> "Summary (Amount)" and [Column1] <> "Summary (Units)"))
in
#"Filtered Rows1"

这篇关于Excel Power Query:从具有多个未固定工作表的多个未固定文件中获取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆