Convert an HTML Table to JSON
Question
I'm trying to convert a table I have extracted via BeautifulSoup into JSON.
So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.
[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>,
<tr><td>Card name</td><td>Name</td></tr>,
<tr><td>Account holder</td><td>NAME</td></tr>,
<tr><td>Card number</td><td>1234</td></tr>,
<tr><td>Status</td><td>Active</td></tr>]
(Line breaks mine for readability)
This was my attempt:
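# Python 2 code: in Python 3, use str() instead of unicode()
result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])          # start a new list for this row
    allcols = row.findAll('td')
    for col in allcols:
        # Collect every text node inside the cell, including nested tags
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)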
which gave me the following result:
[
[u'Card balance', u'$18.30'],
[u'Card name', u'NAMEn'],
[u'Account holder', u'NAME'],
[u'Card number', u'1234'],
[u'Status', u'Active']
]
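For reference, here is the same row-by-row loop as a Python 3 sketch. It assumes the rows sit inside an explicit `<table><tbody>` wrapper (the question only shows the `<tr>` elements) and uses BeautifulSoup's `get_text(strip=True)`, which gathers the text of nested tags such as the `<strong>` in the first row and trims surrounding whitespace, instead of joining `findAll(text=True)` results by hand. Because it uses the markup exactly as posted, the first label comes out as `Balance`.

```python
from bs4 import BeautifulSoup

# Rows copied from the question; the <table>/<tbody> wrapper is an assumption.
html = """
<table><tbody>
<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>
<tr><td>Card name</td><td>Name</td></tr>
<tr><td>Account holder</td><td>NAME</td></tr>
<tr><td>Card number</td><td>1234</td></tr>
<tr><td>Status</td><td>Active</td></tr>
</tbody></table>
"""

table = BeautifulSoup(html, "html.parser").table

result = []
for row in table.tbody.find_all("tr"):
    # get_text() joins all text nodes under the cell, including those inside
    # nested tags such as <strong>; strip=True trims whitespace from each piece.
    result.append([col.get_text(strip=True) for col in row.find_all("td")])

print(result)
```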
Answer
Your data probably looks something like this:
html_data = """
<table>
<tr>
<td>Card balance</td>
<td>$18.30</td>
</tr>
<tr>
<td>Card name</td>
<td>NAMEn</td>
</tr>
<tr>
<td>Account holder</td>
<td>NAME</td>
</tr>
<tr>
<td>Card number</td>
<td>1234</td>
</tr>
<tr>
<td>Status</td>
<td>Active</td>
</tr>
</table>
"""
From that, you can build the same list of rows with this code:
from bs4 import BeautifulSoup
# Each <tr> becomes a [label, value] list of its <td> texts
table_data = [[cell.text for cell in row("td")]
              for row in BeautifulSoup(html_data, "html.parser")("tr")]
To convert the result to JSON, if you don't care about the order:
import json
print(json.dumps(dict(table_data)))
Result:
{
 "Status": "Active",
 "Card name": "NAMEn",
 "Account holder": "NAME",
 "Card number": "1234",
 "Card balance": "$18.30"
}
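`dict(table_data)` only works when every row yields exactly two cells. A header row made of `<th>` cells, for example, would come out of the list comprehension as an empty list and make `dict()` raise a `ValueError`. A minimal guard, assuming you simply want to skip such rows, might look like this (the `table_data` below is hypothetical sample data standing in for the list built earlier). Note also that `json.dumps` writes everything on one line by default; the `indent` argument gives a multi-line layout like the result shown above.

```python
import json

# Hypothetical rows: the first one mimics a header row with no <td> cells.
table_data = [
    [],
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Status", "Active"],
]

# Keep only rows that form a (label, value) pair; dict() rejects anything else.
pairs = [row for row in table_data if len(row) == 2]

# indent=1 spreads the JSON object over several lines, one key per line.
print(json.dumps(dict(pairs), indent=1))
```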
If you need the keys in the same order as the table rows, use an OrderedDict:
from collections import OrderedDict
import json
print(json.dumps(OrderedDict(table_data)))
Which gives you:
{
"Card balance": "$18.30",
"Card name": "NAMEn",
"Account holder": "NAME",
"Card number": "1234",
"Status": "Active"
}
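Since Python 3.7, a plain `dict` is guaranteed to preserve insertion order, so on current Python versions `dict(table_data)` already yields the keys in row order and `OrderedDict` matters mainly on older interpreters such as Python 2. The sketch below, using the row values from the question's output as sample data, checks that both approaches produce identical JSON.

```python
import json
from collections import OrderedDict

# Sample rows taken from the question's output.
table_data = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

plain = json.dumps(dict(table_data))
ordered = json.dumps(OrderedDict(table_data))

# On Python 3.7+ both strings are identical and follow the table's row order.
print(plain == ordered)  # True on Python 3.7+
print(list(json.loads(plain)) == [label for label, _ in table_data])  # True on Python 3.7+
```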