Storing information from td tags with a specific width, in Python
Question
I am trying to store all the information from the td tags that have width="82", or maybe there is a more efficient method.
<a name="AAKER"> </a>
<table border="" width="100%" cellpadding="5"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b>
<small>(<a href="http://google.com">Soundex
A260</a>)
— <i>See also</i>
<a href="http://google.com">ACKER</a>,
<a href="http://google.com">KEAR</a>,
<a href="http://google.com">TAAKE</a>.
</small>
</td></tr></tbody></table><br clear="all">
<table align="left" cellpadding="5">
<tbody><tr><td width="82" align="right" valign="top"> </td><td valign="top">
<img src="rd.gif" width="13" height="13">
<b><a name="954.35.65">Aaker, Casper Drengman</a> (b.1883)</b>
— also known as
<b>Casper D. Aaker</b> — of Minot,
<a href="http://google.com">WardCounty</a> , N.Dak. Born in Ridgeway,
<a href="http://google.com">Winneshiek County</a> , Iowa, August,
<a href="http://google.com">1883</a>. Republican.
<a href="http://google.com">Lawyer</a>; organizer, Trinity
<a href="http://google.com">Hospital</a>,
1922; delegate to Republican National Convention from North Dakota.
<table width="100%" align="left">
<tbody>
<tr><td width="20"> </td>
<td width="26" valign="top"><img src="hand.gif" width="26" height="17"></td>
<td valign="top">
<span style="font-size:8pt;"><i>Relatives:</i>
Son of Drengman Aaker and Christine (Ellefson) Aaker; married,
<a href="http://google.com">December 15, 1914</a>,
to Leda Mansfield.</span>
</td>
</tr>
</tbody>
</table>
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="949.93.45">Aaker, H. H.</a></b> — of
<a href="http://google.com">Norman County</a>
, Minn. Prohibition candidate for
<a href="http://google.com">secretary of state of Minnesota</a>
, 1892.
<a href="http://google.com">Burial location unknown</a>.
</td></tr>
</tbody>
</table><br clear="all"><br>
<a name="AALL"> </a>
<table border="" width="100%" cellpadding="5">
<tbody><tr><td bgcolor="#FFFFFF"><b>AALL</b> <small>(
<a href="http://google.com">SoundexA400</a>
)— <i>See also</i>
<a href="http://google.com">AHL</a>,
<a href="http://google.com">AL</a>,
<a href="http://google.com">ALL</a>,
</small>
</td></tr>
</tbody></table><br clear="all">
<tbody><tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="961.32.34">Aamodt, Gary</a></b> — of Madison,
<a href="http://google.com">Dane County</a>, Wis.
Democrat. Delegate to Democratic National Convention from Wisconsin,
<a href="http://google.com">1976</a>. Still living as of 1976.
</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><img src="rd.gif" width="13" height="13">
<b><a name="030.75.75">Aamodt, Marjorie M.</a></b> —
Democrat. Candidate for
<a href="http://google.com">Pennsylvania
state house of representatives</a> 13th District, 1980.
<a href="http://google.com">Female</a>.
Still living as of 1980.
</td>
</tr>
</tbody></table><br clear="all"><br>
So far I have tried defining an object:
ta = driver.find_element_by_tag_name('tbody').get_attribute('innerHTML')
pd.read_html(ta)
But I want every pd.read_html(ta)[i] stored in a dataframe, ignoring the tables with width="100%".
Recommended answer
You can .extract() the tables with width="100%" from the soup and then get all the remaining rows.
For example (txt contains your HTML snippet from the question):
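import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(txt, 'html.parser')
# Remove every table with width="100%" (the surname headers and the nested "Relatives" tables).
for t in soup.select('table[width="100%"]'):
    t.extract()
all_data = []
for row in soup.select('tr'):
    # Split each remaining row at the first dash into a name and a description.
    name, desc = row.get_text(strip=True, separator=' ').split('—', maxsplit=1)
    all_data.append([name, desc.strip()])
df = pd.DataFrame(all_data, columns=['name', 'description'])
print(df)
df.to_csv('data.csv')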
Prints:
name description
0 Aaker, Casper Drengman (b.1883) also known as Casper D. Aaker — of Minot, Ward...
1 Aaker, H. H. of Norman County , Minn. Prohibition candidate...
2 Aamodt, Gary of Madison, Dane County , Wis.\n Democr...
3 Aamodt, Marjorie M. Democrat. Candidate for Pennsylvania\n ...
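One caveat about the loop above: unpacking the result of split('—', maxsplit=1) into two names raises a ValueError for any row that has no dash in it. The snippet in the question happens to have one in every remaining row, but a larger page might not. A minimal sketch of a more tolerant split, using str.partition, could look like this (split_entry is a hypothetical helper name, not part of the original answer):

```python
def split_entry(text, sep="—"):
    """Split 'name — description' at the first separator, tolerating its absence."""
    name, found, desc = text.partition(sep)
    if not found:
        # No separator in this row: keep the whole text as the name.
        return text.strip(), ""
    return name.strip(), desc.strip()


print(split_entry("Aaker, H. H. — of Norman County , Minn."))
print(split_entry("A row without the separator"))
```

Calling split_entry(row.get_text(strip=True, separator=' ')) inside the loop would then keep such rows instead of stopping the script.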
It also saves the result to data.csv.
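Since the question asks specifically about the td tags with width="82", another option is to select those cells directly. In the posted HTML those cells are empty and the actual text sits in the td that follows each of them, so the sketch below reads that sibling cell. It is only an illustration under that assumption: the sample markup is a trimmed-down stand-in for the question's HTML, and entries_after_marker_cells is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample in the same shape as the HTML from the question.
html = """
<table border="" width="100%"><tbody><tr><td bgcolor="#FFFFFF"><b>AAKER</b></td></tr></tbody></table>
<table align="left" cellpadding="5"><tbody>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><b><a name="949.93.45">Aaker, H. H.</a></b> — of Norman County, Minn.</td></tr>
<tr><td width="82" align="right" valign="top"> </td>
<td valign="top"><b><a name="961.32.34">Aamodt, Gary</a></b> — of Madison, Wis. Democrat.</td></tr>
</tbody></table>
"""


def entries_after_marker_cells(markup, width="82"):
    """Return the text of the cell that follows each td with the given width."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for marker in soup.select(f'td[width="{width}"]'):
        content = marker.find_next_sibling("td")
        if content is None:
            continue  # marker cell with nothing after it
        # Drop nested tables (such as a "Relatives" block) before reading the text.
        for nested in content.find_all("table"):
            nested.extract()
        results.append(content.get_text(strip=True, separator=" "))
    return results


entries = entries_after_marker_cells(html)
for entry in entries:
    print(entry)
```

With this approach the width="100%" header tables never need to be removed, because they contain no td with width="82". The resulting strings can be split into name and description and passed to pd.DataFrame in the same way as above.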