How to get the option text using BeautifulSoup

Published: 2020/9/20 7:10:36 · Tags: python, html-parsing, beautifulsoup

Problem Description

I want to use BeautifulSoup to get the option text in the following HTML. For example, I'd like to get 2002/12, 2003/12, etc.

    <select id="start_dateid">
    <option value="0">2002/12</option>
    <option value="1">2003/12</option>
    <option value="2">2004/12</option>
    <option value="3">2005/12</option>
    <option value="4">2006/12</option>
    <option value="5" selected="">2007/12</option>
    <option value="6">2008/12</option>
    <option value="7">2009/12</option>
    <option value="8">2010/12</option>
    <option value="9">2011/12</option>
    </select>

What's the best way to get these contents? I'm currently using the following code, but I don't know how to do this properly with Beautiful Soup: my code collects every <option> in the file, so if the HTML contains more than one <select>, the result is incorrect. Here is what I have so far:

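    from bs4 import BeautifulSoup

    # Open the local file directly; urllib2.urlopen() cannot open a relative path like "./test.html"
    with open("test.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "lxml")

    # Prints the text of every <option> in the whole document, from every <select>
    for item in soup.find_all('option'):
        print(item.get_text())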
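To illustrate the problem, here is a minimal sketch assuming a hypothetical page with a second <select> (the id end_dateid and its options are made up for this example). It parses an inline string with Python's built-in html.parser instead of a file, so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two <select> elements; "end_dateid" is an illustrative id
HTML = """
<select id="start_dateid">
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
</select>
<select id="end_dateid">
  <option value="0">2010/12</option>
  <option value="1">2011/12</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() on the whole soup walks every <select>, so both lists get mixed together
all_options = [opt.get_text() for opt in soup.find_all("option")]
print(all_options)  # ['2002/12', '2003/12', '2010/12', '2011/12']
```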

Recommended Answer

You don't have to use lxml here. I have trouble installing it on my machine, so my answer does not make use of it.

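    from bs4 import BeautifulSoup as BS

    # Read the local file; "html.parser" is Python's built-in parser, so lxml is not required
    with open("test.html", encoding="utf-8") as f:
        soup = BS(f, "html.parser")

    # Narrow the search to the <select> with id="start_dateid", then collect its <option> texts
    contents = [x.get_text() for x in soup.find(id="start_dateid").find_all('option')]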

This avoids the problem of multiple <select> elements in the HTML file. We first narrow the search to id='start_dateid', which identifies the right <select> because an id attribute must be unique within an HTML document. We then search for <option> tags only inside that <select> and take the text of each one.
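Here is a self-contained sketch of the same approach, parsing an inline copy of the question's markup with the built-in html.parser. The second <select> (id end_dateid) is hypothetical and only there to show that the scoping works. The sketch also shows soup.select() with a CSS selector as an equivalent alternative, plus how to read the value attributes and the option marked selected, in case those are needed too.

```python
from bs4 import BeautifulSoup

# The question's markup, plus a hypothetical second <select> ("end_dateid")
HTML = """
<select id="start_dateid">
<option value="0">2002/12</option>
<option value="1">2003/12</option>
<option value="2">2004/12</option>
<option value="3">2005/12</option>
<option value="4">2006/12</option>
<option value="5" selected="">2007/12</option>
<option value="6">2008/12</option>
<option value="7">2009/12</option>
<option value="8">2010/12</option>
<option value="9">2011/12</option>
</select>
<select id="end_dateid">
<option value="0">2012/12</option>
</select>
"""

soup = BeautifulSoup(HTML, "html.parser")  # built-in parser, no lxml needed
dropdown = soup.find(id="start_dateid")    # first element with this id

# Visible option texts, limited to this one <select>
texts = [opt.get_text(strip=True) for opt in dropdown.find_all("option")]

# Same result via a CSS selector: <option> children of #start_dateid
texts_css = [opt.get_text(strip=True) for opt in soup.select("#start_dateid > option")]

# The value attributes are separate from the visible text
values = [opt["value"] for opt in dropdown.find_all("option")]

# The option that carries a selected attribute, if any
chosen = dropdown.find("option", selected=True)
selected_text = chosen.get_text(strip=True) if chosen else None

print(texts[0], texts[-1])  # 2002/12 2011/12
print(values[:3])           # ['0', '1', '2']
print(selected_text)        # 2007/12
```

Note that find(id=...) returns None when nothing matches, so on pages where the dropdown may be missing, check for None before calling find_all().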
