使用 Power Query 加载保存为 .xls 的 *.htm 文件(从第 5 行开始) [英] Load *.htm file saved as .xls (starting from row number 5) Using Power Query

查看:16
本文介绍了使用 Power Query 加载保存为 .xls 的 *.htm 文件(从第 5 行开始)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我必须导入一个保存为 .*htm、.*html 的 .xls 文件.我附上了一个链接,该链接提供了该格式的示例文件.

表格的实际第一行从第 5 行开始,但上面还有数据.

文件如下所示,

结果有点难看,但我怀疑如果您的实际数据文件不包含太多格式或不一致的表结构,那么您可以在这些方面得到一些很好的工作,特别是如果您只有一个列处理.

I have to import an .xls file which is saved as .*htm, .*html. I attached a link which provides a sample file of that format.

The Actual first row of the table starts from row number 5. But there are data above it.

The file looks as below,

The sample file.

But please make sure to include some rows on top of it with some test values and make it look like the screenshot above.

if there are no rows above it, then the following M Code provided by Alexis Olson works

let
    Source = Folder.Files("C:UsersaolsonDownloadsexample-html.xls"),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xls")),
    #"C:UsersaolsonDownloadsexample-html xls\_example-html xls" = #"Filtered Rows"{[#"Folder Path"="C:UsersaolsonDownloadsexample-html.xls",Name="example-html.xls"]}[Content],
    #"Imported Excel" = Web.Page(#"C:UsersaolsonDownloadsexample-html xls\_example-html xls"){0}[Data]
in
    #"Imported Excel"

When I add rows on top of the sample and click on save in excel - it gives me a warning whether, I want to continue with the same format then I click on "YES".

I tried to play with the children table on the Query Editor. But it is not taking me anywhere.

Source = Table cannot be found inside it at all.

解决方案

For whatever reason, the HTML in the sample file has unmatched tags that the XML parser doesn't like. You can get at the data though with some work if you load it as text and remove or fix any parts that the parser has trouble with.

Consider this M code:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:UsersaolsonDownloadsexample-html.xlsexample-html.xls"))}),
    #"Kept Range of Rows" = Table.Range(Source,60,22),
    Column1 = Text.Combine(#"Kept Range of Rows"[Column1]),
    #"Parsed XML" = Xml.Tables(Column1),
    Table = #"Parsed XML"{0}[Table],
    #"Expanded td" = Table.ExpandTableColumn(Table, "td", {"i", "b", "span", "Element:Text"}, {"td.i", "td.b", "td.span", "td.Element:Text"}),
    #"Expanded td.span" = Table.ExpandTableColumn(#"Expanded td", "td.span", {"Element:Text", "Attribute:style"}, {"td.span.Element:Text", "td.span.Attribute:style"})
in
    #"Expanded td.span"

The steps here are roughly:

  1. Load the file as text
  2. Select just the <tbody> section.
  3. Concatenate those rows into a single text value.
  4. Parse that text as XML.
  5. Expand any tables that are found.

When I initially did this I noticed the <b> tag wasn't closed so I added a </b> in my source file.

The results are a bit ugly, but I suspect if your actual data files don't include much formatting or inconsistent table structure, then you can get something along these lines working passably well, especially if you only have a single column to deal with.

这篇关于使用 Power Query 加载保存为 .xls 的 *.htm 文件(从第 5 行开始)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆