Using Python to Scrape Nested Divs and Spans in Twitter?
Question
I'm trying to scrape the likes and retweets from the results of a Twitter search.
After running the Python below, I get an empty list, []. I'm not using the Twitter API because its search does not return tweets by hashtag this far back.
The code I'm using is:
from bs4 import BeautifulSoup
import requests
url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
I can successfully save the HTML to a file using the code below. When I search the saved text, large amounts of information are missing, such as the class names I am looking for.
So (part of) the problem is apparently in accurately accessing the source code.
filename = 'newfile2.txt'
# Write the raw response text to disk so it can be inspected by hand
with open(filename, 'w', encoding='utf-8') as handle:
    handle.write(data)
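A quick way to confirm whether the fetched HTML contains the target class at all is to count occurrences of the class name in the raw text. The following is a minimal sketch: count_marker and the two sample strings are hypothetical stand-ins for r.text, not actual output from Twitter.

```python
def count_marker(html_text, marker='ProfileTweet-actionCountForPresentation'):
    """Return how many times the class name appears in the raw HTML text."""
    return html_text.count(marker)


# Hypothetical stand-ins for r.text from two different responses
page_without_tweets = '<div id="timeline"></div>'
page_with_tweets = '<span class="ProfileTweet-actionCountForPresentation">12</span>'

print(count_marker(page_without_tweets))
print(count_marker(page_with_tweets))
```

If the count is zero for the real response, the spans were never delivered to the client, so no BeautifulSoup selector will find them.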
This screenshot shows the span that I'm trying to scrape.
I've looked at this question, and others like it, but I'm not quite getting there.
How can I use BeautifulSoup to get deeply nested div values? (http://stackoverflow.com/questions/27355051/how-can-i-use-beautifulsoup-to-get-deeply-nested-div-values)
Recommended answer
It seems that your GET request returns valid HTML but with no tweet elements in the #timeline element. However, adding a user agent to the request headers seems to remedy this.
from bs4 import BeautifulSoup
import requests
url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
r = requests.get(url, headers=headers)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
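Once the spans are present, find_all returns Tag objects rather than numbers, so the counts still have to be read from each span's text. The sketch below runs against a small hypothetical HTML fragment (sample_html is invented for illustration and only reuses the class name from the question); extract_counts is likewise a hypothetical helper. It assumes an empty span means zero and leaves anything that is not plain digits as None rather than guessing.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; the real page structure may differ
sample_html = """
<div class="tweet">
  <span class="ProfileTweet-actionCountForPresentation">12</span>
  <span class="ProfileTweet-actionCountForPresentation"></span>
  <span class="ProfileTweet-actionCountForPresentation">3</span>
</div>
"""


def extract_counts(html_text):
    """Return one entry per matching span: an int, 0 if empty, None if unparsed."""
    soup = BeautifulSoup(html_text, "html.parser")
    spans = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
    counts = []
    for span in spans:
        text = span.get_text(strip=True)
        if not text:
            counts.append(0)          # assumption: an empty span means zero
        elif text.isdigit():
            counts.append(int(text))
        else:
            counts.append(None)       # e.g. abbreviated counts are not handled here
    return counts


counts = extract_counts(sample_html)
print(counts)
```

The built-in "html.parser" is used here only so the sketch does not depend on lxml; the "lxml" parser from the answer works the same way for this fragment.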