Convert an HTML Table to JSON


Problem description

I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>, 
<tr><td>Card name</td><td>Name</td></tr>, 
<tr><td>Account holder</td><td>NAME</td></tr>, 
<tr><td>Card number</td><td>1234</td></tr>, 
<tr><td>Status</td><td>Active</td></tr>]

(Line breaks mine for readability)

This was my attempt:

result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)

which gave me the following result:

[
 [u'Card balance', u'$18.30'],
 [u'Card name', u'NAMEn'],
 [u'Account holder', u'NAME'],
 [u'Card number', u'1234'],
 [u'Status', u'Active']
]

Solution

Your data probably looks something like this:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

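From this, we can build your result as a list of rows with the following code:

from bs4 import BeautifulSoup
# row("td") is shorthand for row.find_all("td"); the parser is named explicitly.
soup = BeautifulSoup(html_data, "html.parser")
table_data = [[cell.text for cell in row("td")]
              for row in soup("tr")]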
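Where installing bs4 is not an option, the same row-and-cell collection can be sketched with the standard library's `html.parser` instead. This is a swapped-in technique, not the answer's method: the class name `TableCellParser` and the sample markup are assumptions made for illustration, and the sketch ignores nested tables, `<th>` cells, and malformed markup.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while the parser is inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <strong> still arrives here.
        if self._in_cell:
            self.rows[-1][-1] += data


sample_html = """
<table>
  <tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>
  <tr><td>Status</td><td>Active</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(sample_html)
# Strip stray whitespace so the cell text can be used as clean keys and values.
table_data = [[cell.strip() for cell in row] for row in parser.rows]
```

The resulting `table_data` has the same list-of-rows shape as the BeautifulSoup version, so the JSON conversion steps apply unchanged.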

To convert the result to JSON, if you don't care about the order:

import json
# print(...) with a single argument runs on both Python 2 and Python 3.
print(json.dumps(dict(table_data)))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}
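Note that `dict(table_data)` only works when every row holds exactly two cells; a row with one or three cells raises a `ValueError`. Below is a minimal sketch of a guard, assuming a hypothetical helper `rows_to_pairs` (not part of the original answer) that strips surrounding whitespace and skips rows of any other length. The sample rows are made up for illustration.

```python
import json


def rows_to_pairs(rows):
    """Hypothetical helper: keep only two-cell rows, as (key, value) tuples."""
    pairs = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if len(cells) == 2:
            pairs.append((cells[0], cells[1]))
    return pairs


# Illustrative data: one row has padding, one has a single cell.
table_data = [
    ["Card balance", "$18.30"],
    [" Card number ", " 1234 "],
    ["A footer row with a single cell"],
    ["Status", "Active"],
]

json_text = json.dumps(dict(rows_to_pairs(table_data)))
```

Whether to skip malformed rows silently or raise an error is a design choice; skipping suits scraped pages where header or footer rows are expected.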

If you need the same order, use this:

from collections import OrderedDict
import json
# OrderedDict keeps the rows in the order they appear in the table.
print(json.dumps(OrderedDict(table_data)))

Which gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
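On Python 3.7 and later, a plain `dict` also preserves insertion order, so `OrderedDict` is mainly needed on older interpreters. The sketch below assumes Python 3.7+ and hard-codes `table_data` with the values shown in the question so that it runs on its own; `indent=4` is an optional argument to `json.dumps` that pretty-prints the output.

```python
import json

# Rows copied from the example above, so the snippet is self-contained.
table_data = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# dict() keeps the row order on Python 3.7+; indent=4 pretty-prints the JSON.
json_text = json.dumps(dict(table_data), indent=4)
print(json_text)
```

With these inputs the printed JSON should keep the keys in table order, starting with "Card balance" and ending with "Status".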
