Convert an HTML Table to JSON


Question

I'm trying to convert a table I have extracted via BeautifulSoup into JSON.

So far I've managed to isolate all the rows, though I'm not sure how to work with the data from here. Any advice would be very much appreciated.

[<tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>, 
<tr><td>Card name</td><td>Name</td></tr>, 
<tr><td>Account holder</td><td>NAME</td></tr>, 
<tr><td>Card number</td><td>1234</td></tr>, 
<tr><td>Status</td><td>Active</td></tr>]

(Line breaks mine for readability)

This was my attempt:

result = []
allrows = table.tbody.findAll('tr')
for row in allrows:
    result.append([])
    allcols = row.findAll('td')
    for col in allcols:
        thestrings = [unicode(s) for s in col.findAll(text=True)]
        thetext = ''.join(thestrings)
        result[-1].append(thetext)

which gave me the following result:

[
 [u'Card balance', u'$18.30'],
 [u'Card name', u'NAMEn'],
 [u'Account holder', u'NAME'],
 [u'Card number', u'1234'],
 [u'Status', u'Active']
]
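For comparison, here is a minimal Python 3 sketch of the same loop, run against the rows shown above. It assumes the beautifulsoup4 package is installed. get_text(strip=True) collects the text inside nested tags such as <strong> and trims surrounding whitespace, so it replaces the unicode()/findAll(text=True)/join steps. The table is searched directly rather than through table.tbody, because html.parser does not insert an implicit <tbody> when the markup lacks one.

```python
from bs4 import BeautifulSoup

# The rows from the question, wrapped in a <table> so they can be parsed
html_data = """
<table>
  <tr><td><strong>Balance</strong></td><td><strong>$18.30</strong></td></tr>
  <tr><td>Card name</td><td>Name</td></tr>
  <tr><td>Account holder</td><td>NAME</td></tr>
  <tr><td>Card number</td><td>1234</td></tr>
  <tr><td>Status</td><td>Active</td></tr>
</table>
"""

table = BeautifulSoup(html_data, "html.parser").find("table")

result = []
# Search the table itself: html.parser adds no implicit <tbody>
for row in table.find_all("tr"):
    # get_text(strip=True) joins the text of nested tags like <strong>
    # and trims surrounding whitespace from each piece
    result.append([col.get_text(strip=True) for col in row.find_all("td")])

print(result)
# [['Balance', '$18.30'], ['Card name', 'Name'], ['Account holder', 'NAME'],
#  ['Card number', '1234'], ['Status', 'Active']]
```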

Solution

Your data is probably something like this:

html_data = """
<table>
  <tr>
    <td>Card balance</td>
    <td>$18.30</td>
  </tr>
  <tr>
    <td>Card name</td>
    <td>NAMEn</td>
  </tr>
  <tr>
    <td>Account holder</td>
    <td>NAME</td>
  </tr>
  <tr>
    <td>Card number</td>
    <td>1234</td>
  </tr>
  <tr>
    <td>Status</td>
    <td>Active</td>
  </tr>
</table>
"""

From that HTML, you can get your result as a list of rows with this code:

from bs4 import BeautifulSoup
# Calling a tag, e.g. row("td"), is shorthand for row.find_all("td")
table_data = [[cell.text for cell in row("td")]
              for row in BeautifulSoup(html_data, "html.parser")("tr")]
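The comprehension relies on calling a BeautifulSoup object or Tag being shorthand for its find_all() method. This small sketch spells out the same logic with explicit find_all() calls. It assumes beautifulsoup4 is installed and uses a trimmed two-row sample of the table.

```python
from bs4 import BeautifulSoup

# A trimmed sample of the table above
html_data = """
<table>
  <tr><td>Card balance</td><td>$18.30</td></tr>
  <tr><td>Card number</td><td>1234</td></tr>
</table>
"""

soup = BeautifulSoup(html_data, "html.parser")

# soup("tr") and soup.find_all("tr") return the same rows
print(soup("tr") == soup.find_all("tr"))  # True

# Long-hand version of the nested list comprehension
table_data = []
for row in soup.find_all("tr"):
    table_data.append([cell.get_text() for cell in row.find_all("td")])

print(table_data)  # [['Card balance', '$18.30'], ['Card number', '1234']]
```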

To convert the result to JSON when you don't care about key order:

import json
print(json.dumps(dict(table_data)))

Result:

{
    "Status": "Active",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Card balance": "$18.30"
}
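Two details of this step: dict() needs every row to be a two-item [key, value] list, and a row with any other number of cells makes it raise ValueError. If a label appears twice, only its last value is kept. Also, json.dumps() writes everything on one line by default. The result above is shown across several lines, which is the layout the indent argument produces. A stdlib-only sketch using the rows from the question:

```python
import json

table_data = [
    ["Card balance", "$18.30"],
    ["Card name", "NAMEn"],
    ["Account holder", "NAME"],
    ["Card number", "1234"],
    ["Status", "Active"],
]

# Each [key, value] row becomes one entry; a row without exactly
# two cells would make dict() raise ValueError
as_dict = dict(table_data)

compact = json.dumps(as_dict)           # single line (the default)
pretty = json.dumps(as_dict, indent=4)  # one key per line

print(compact)
print(pretty)
```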

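If you need the keys in the same order as the table rows, use an OrderedDict (on Python 3.7 and later a plain dict already preserves insertion order):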

from collections import OrderedDict
import json
print(json.dumps(OrderedDict(table_data)))

This gives you:

{
    "Card balance": "$18.30",
    "Card name": "NAMEn",
    "Account holder": "NAME",
    "Card number": "1234",
    "Status": "Active"
}
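On Python 3.7 and later a plain dict preserves insertion order, so dict and OrderedDict produce the same JSON here. OrderedDict still makes the intent explicit and keeps the order on older versions too. When reading the JSON back, json.loads() accepts object_pairs_hook=OrderedDict to keep the key order. A short stdlib-only sketch:

```python
import json
from collections import OrderedDict

table_data = [["Card balance", "$18.30"], ["Card name", "NAMEn"], ["Status", "Active"]]

ordered = json.dumps(OrderedDict(table_data))
print(ordered)  # {"Card balance": "$18.30", "Card name": "NAMEn", "Status": "Active"}

# Python 3.7+: a plain dict keeps the rows in the same order
print(ordered == json.dumps(dict(table_data)))  # True

# object_pairs_hook rebuilds the JSON object as an OrderedDict
restored = json.loads(ordered, object_pairs_hook=OrderedDict)
print(list(restored))  # ['Card balance', 'Card name', 'Status']
```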
