How to get the option text using BeautifulSoup

Problem description

I want to use BeautifulSoup to get the option text in the following HTML. For example, I'd like to get 2002/12, 2003/12, etc.

<select id="start_dateid">
<option value="0">2002/12</option>
<option value="1">2003/12</option>
<option value="2">2004/12</option>
<option value="3">2005/12</option>
<option value="4">2006/12</option>
<option value="5" selected="">2007/12</option>
<option value="6">2008/12</option>
<option value="7">2009/12</option>
<option value="8">2010/12</option>
<option value="9">2011/12</option>
</select>

What's the best way to get the contents? I'm currently using the following code, but I don't know how to do this properly with BeautifulSoup. If there is more than one <select> element in the HTML file, the result will be incorrect. Here is what I have so far:

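    from bs4 import BeautifulSoup

    # urllib2.urlopen() cannot open a bare relative path like "./test.html",
    # so read the local file directly instead
    with open("test.html") as f:
        soup = BeautifulSoup(f.read(), "lxml")

    # Searches the whole document, so options from every <select> are printed
    for item in soup.find_all('option'):
        print(item.get_text())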

Recommended answer

You don't have to use lxml here. I have trouble installing it on my machine, so my answer does not make use of it.

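    from bs4 import BeautifulSoup as BS

    # Read the local file directly; urllib2.urlopen() needs a full URL
    with open("test.html") as f:
        # "html.parser" is built into Python, so lxml is not needed
        soup = BS(f.read(), "html.parser")

    # Narrow the search to the <select> with id="start_dateid",
    # then collect the text of every <option> inside it
    contents = [x.text for x in soup.find(id="start_dateid").find_all('option')]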
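As an alternative sketch (not part of the original answer), Beautiful Soup's `select()` method accepts a CSS selector, which expresses "options inside the element with this id" in one step. It assumes bs4 4.7 or later, where `select()` is backed by the soupsieve package, and embeds a trimmed copy of the question's HTML so it runs without test.html.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML, embedded so the sketch runs on its own
HTML = """
<select id="start_dateid">
<option value="0">2002/12</option>
<option value="1">2003/12</option>
<option value="5" selected="">2007/12</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "#start_dateid option" matches <option> elements that are descendants
# of the element whose id is start_dateid
dates = [opt.get_text(strip=True) for opt in soup.select("#start_dateid option")]
print(dates)  # ['2002/12', '2003/12', '2007/12']
```

Both forms return the same list here. One practical difference: `find(id=...)` returns `None` when the id is missing, so chaining `.find_all()` onto it raises `AttributeError`, while `select()` simply returns an empty list.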

This avoids the problem of multiple <select> elements in the HTML file: we first narrow the search with id='start_dateid', which gets you the right <select>, because an id value must be unique within an HTML document. Then we search for <option> tags only inside that <select> and take the text of each one.
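The difference between searching the whole document and searching inside one <select> is easiest to see on a small page. This is a sketch that assumes a hypothetical second <select> with id `other_select`; it also shows that an option's `value` attribute is separate from its visible text, and that the pre-selected option can be picked out by its `selected` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two <select> elements; only the first one matters
HTML = """
<select id="start_dateid">
<option value="0">2002/12</option>
<option value="1" selected="">2003/12</option>
</select>
<select id="other_select">
<option value="a">unrelated</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Searching the whole document also picks up options from the second <select>
everything = [opt.get_text() for opt in soup.find_all("option")]

# Scoping to the <select> with the right id returns only its options
select = soup.find(id="start_dateid")
options = select.find_all("option")
texts = [opt.get_text() for opt in options]

# The value attribute is separate from the visible text
values = [opt.get("value") for opt in options]

# The pre-selected option carries a `selected` attribute (an empty string here)
chosen = next(opt for opt in options if opt.has_attr("selected"))

print(everything)         # ['2002/12', '2003/12', 'unrelated']
print(texts)              # ['2002/12', '2003/12']
print(values)             # ['0', '1']
print(chosen.get_text())  # 2003/12
```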