Parsing HTML in an Excel cell into multiple cells


Problem description

I have the following really messy-looking data in my Excel cells: around 100 rows of HTML tags.

Here are two examples:

<ul class=""list-unstyled"">
    <li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://google.org/"" target=""_blank"">Website</a></li>
    <li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://www.google.com/"" target=""_blank"">Website 2</a></li>
    <li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""http://amazon.com"" target=""_blank"">Product</a></li>
    <li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""https://amazon.de/"" target=""_blank"">Product 2</a></li>          
    <li><span title=""Tags"" class=""glyphicon glyphicon glyphicon-tag text-gray""></span>
        <small><span class=""label label-warning"">Available</span></small>
        <small><span class=""label label-warning"">Country</span></small>
    </li>
</ul>

<ul class=""list-unstyled"">
    <li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://google.org/"" target=""_blank"">Website</a></li>
    <li><span title=""Website"" class=""glyphicon glyphicon-link text-gray""></span> <a href=""https://www.google.com/"" target=""_blank"">Website 2</a></li>
    <li><span title=""Product"" class=""glyphicon glyphicon-search text-gray""></span> <a href=""http://amazon.com"" target=""_blank"">Product</a></li>    
    <li><span title=""Tags"" class=""glyphicon glyphicon glyphicon-tag text-gray""></span>
        <small><span class=""label label-warning"">Not Available</span></small>
        <small><span class=""label label-warning"">State</span></small>
    </li>
</ul>

My goal is to create a table that looks like the following:

| Website 1           | Website 2               | Website 3 | Product 1         | Product 2          | Product 3 | Available     | Country |
|---------------------|-------------------------|-----------|-------------------|--------------------|-----------|---------------|---------|
| https://google.org/ | https://www.google.com/ |           | http://amazon.com | https://amazon.de/ |           | Available     | Country |
| https://google.org/ | https://www.google.com/ |           | http://amazon.com |                    |           | Not Available | State   |

I honestly have no clue how to approach this challenge.

Do you have any suggestions?

Recommended answer

The approach is to write a function that takes the HTML code as a string and returns a dictionary whose keys match your table headers. The function is:

Function ParseHTML(ByVal str As String) As Scripting.Dictionary
    'str is passed ByVal because the function consumes it while parsing
    Set ParseHTML = New Scripting.Dictionary

    Dim txt As String
    Dim website As Long: website = 0
    Dim product As Long: product = 0
    Dim i As Long: i = 0

    'NOTE: the offsets below assume the doubled quotes ("") seen in the sample
    'data and that every cell ends with an <li> item whose title is "Tags"
    Do While True

        'get all text between <li> and </li> tags
        'then extract all data from it: title attribute and link
        txt = Mid(str, InStr(1, str, "<li>") + 4, InStr(1, str, "</li>") - InStr(1, str, "<li>") - 4)
        'select which case it is: website, product or tags
        Select Case Mid(txt, InStr(1, txt, "title") + 8, InStr(1, txt, "class") - InStr(1, txt, "title") - 11)
            Case "Website"
                website = website + 1
                'here you extract the link
                ParseHTML.Add "Website " & website, Mid(txt, InStr(1, txt, "<a href") + 10, InStr(1, txt, "target") - InStr(1, txt, "<a href") - 13)
            Case "Product"
                product = product + 1
                'here you extract the link
                ParseHTML.Add "Product " & product, Mid(txt, InStr(1, txt, "<a href") + 10, InStr(1, txt, "target") - InStr(1, txt, "<a href") - 13)
            Case "Tags"
                'if we reached Tags, all websites and products are done; tags need different processing
                Exit Do
        End Select
        'delete processed text
        str = Mid(str, InStr(1, str, "</li>") + 5)

    Loop

    'the table has 3 columns each for websites and products, so add empty entries for the missing ones
    For i = website + 1 To 3
        ParseHTML.Add "Website " & i, ""
    Next i
    For i = product + 1 To 3
        ParseHTML.Add "Product " & i, ""
    Next i

    'now txt is the interior of the last <li></li> tag, and we focus on what is
    'between the <small> and </small> tags
    'we don't need the str variable anymore, so we can re-use it
    str = Mid(txt, InStr(1, txt, "<small>") + 7, InStr(1, txt, "</small>") - InStr(1, txt, "<small>") - 7)
    'take the text after the opening <span ...> and drop the trailing </span> (7 characters)
    ParseHTML.Add "Available", Mid(str, InStr(1, str, ">") + 1, Len(str) - InStr(1, str, ">") - 7)
    'remove processed part of html
    txt = Mid(txt, InStr(1, txt, "</small>") + 8)
    'take care of the last <small> tag
    str = Mid(txt, InStr(1, txt, "<small>") + 7, InStr(1, txt, "</small>") - InStr(1, txt, "<small>") - 7)
    ParseHTML.Add "Country", Mid(str, InStr(1, str, ">") + 1, Len(str) - InStr(1, str, ">") - 7)

End Function
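
The Mid/InStr arithmetic in ParseHTML repeats one pattern: find an opening delimiter, skip past it, and read up to the closing delimiter. As a rough sketch of that pattern on its own, the helper below wraps it in a single function. The names TextBetween and TextBetweenDemo are hypothetical and not part of the original answer, and the demo assumes the doubled quotes seen in the sample data.

```vba
' Hypothetical helper (not part of the original answer): returns the text
' between the first occurrence of openTag and the next closeTag after it.
' Returns "" if either delimiter is missing instead of raising an error.
Function TextBetween(ByVal source As String, ByVal openTag As String, _
                     ByVal closeTag As String) As String
    Dim startPos As Long
    Dim endPos As Long

    startPos = InStr(1, source, openTag)
    If startPos = 0 Then Exit Function        ' opening delimiter not found
    startPos = startPos + Len(openTag)        ' first character after the delimiter

    endPos = InStr(startPos, source, closeTag)
    If endPos = 0 Then Exit Function          ' closing delimiter not found

    TextBetween = Mid(source, startPos, endPos - startPos)
End Function

' Small demo on a shortened list item; dq stands for the doubled quote
' that appears in the sample data.
Sub TextBetweenDemo()
    Dim dq As String: dq = Chr(34) & Chr(34)
    Dim item As String

    item = "<span title=" & dq & "Website" & dq & " class=" & dq & "x" & dq & "></span> " & _
           "<a href=" & dq & "https://google.org/" & dq & " target=" & dq & "_blank" & dq & ">Website</a>"

    Debug.Print TextBetween(item, "title=" & dq, dq)             ' the title attribute
    Debug.Print TextBetween(item, "<a href=" & dq, dq & " target") ' the link
End Sub
```

Because the helper measures the delimiter with Len instead of hard-coded offsets such as + 8 or - 13, it keeps working if the quoting in the data changes, as long as the delimiters passed in match the data.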

So, to sum up, the function returns a dictionary with the keys "Website 1", "Website 2", "Website 3", "Product 1", "Product 2", "Product 3", "Available", and "Country".
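
To check what the function returns for a single cell before filling a whole table, a small debugging routine along these lines may help. This is only a sketch: the name DebugPrintParsedCell and the choice of cell A2 are assumptions for illustration, and it relies on ParseHTML being in the same project with the Microsoft Scripting Runtime reference enabled.

```vba
' Hypothetical debugging aid (not part of the original answer).
' Prints every key/value pair that ParseHTML returns for the HTML in cell A2
' of the active sheet to the Immediate window (Ctrl+G in the VBA editor).
Sub DebugPrintParsedCell()
    Dim dict As Scripting.Dictionary
    Dim key As Variant

    Set dict = ParseHTML(CStr(ActiveSheet.Range("A2").Value))

    For Each key In dict.Keys
        Debug.Print key & " = " & dict(key)
    Next key
End Sub
```

Keys for missing websites or products are printed with an empty value, which makes it easy to confirm that all eight keys are present.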

Now, with that function, it's easy to fill the table you want. Here's one way of doing it:

Sub ProcessHTML()
    'determine last row in A column
    Dim lastRow As Long: lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    Dim dict As Scripting.Dictionary
    Dim i As Long
    Dim j As Long

    For i = 2 To lastRow
        'parse HTML code with our function
        Set dict = ParseHTML(Cells(i, 1).Value)
        For j = 2 To 9
            'write corresponding values from dictionary to cells in table
            Cells(i, j).Value = dict(Cells(1, j).Value)
        Next j
        'get rid of object
        Set dict = Nothing
    Next i
End Sub

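It works with a table arranged like this: the HTML sits in column A starting at row 2, and row 1 holds the eight headers in columns B through I.

It's very important to have these headers in row 1, because the macro uses each header as the dictionary key for its column.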
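
Since the header text has to match the dictionary keys exactly, one option is to write the headers from code rather than typing them. The routine below is a minimal sketch of that idea; the name WriteHeaders is hypothetical, and it assumes the layout ProcessHTML expects (headers in row 1, columns B through I of the active sheet).

```vba
' Hypothetical setup routine (not part of the original answer).
' Writes the eight headers that ProcessHTML looks up into row 1,
' columns B through I, of the active sheet.
Sub WriteHeaders()
    Dim headers As Variant
    Dim j As Long

    headers = Array("Website 1", "Website 2", "Website 3", _
                    "Product 1", "Product 2", "Product 3", _
                    "Available", "Country")

    For j = LBound(headers) To UBound(headers)
        ' subtracting LBound keeps the column offset correct
        ' whatever the array's lower bound is
        ActiveSheet.Cells(1, 2 + j - LBound(headers)).Value = headers(j)
    Next j
End Sub
```

Run it once before ProcessHTML; a header that is misspelled by hand would otherwise leave its column empty without any error.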

Important

Before running anything, go to Tools -> References in your VBA editor and, in the window that pops up, select Microsoft Scripting Runtime.
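
If setting the reference is not an option, the dictionary can also be created with late binding on Windows, which needs no reference at all. The sketch below only demonstrates the idea; LateBoundDictionaryDemo is a hypothetical name, and this is an alternative to the answer's approach rather than part of it.

```vba
' Hypothetical demo (not part of the original answer): creates the same
' dictionary type through late binding, so no Tools -> References entry
' is required. The variable must be declared As Object in that case.
Sub LateBoundDictionaryDemo()
    Dim dict As Object
    Set dict = CreateObject("Scripting.Dictionary")

    dict.Add "Website 1", "https://google.org/"
    dict.Add "Website 2", ""

    Debug.Print dict.Count                 ' number of stored keys
    Debug.Print dict.Exists("Website 1")   ' True when the key is present
End Sub
```

To use this in the code above, ParseHTML would have to return Object and create its result with CreateObject("Scripting.Dictionary"), and the dict variable in ProcessHTML would have to be declared As Object. The trade-off is that late binding gives up IntelliSense and compile-time type checking.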
