For loop doesn't work for web scraping Google search in Python
Question
I'm scraping Google search results for a list of keywords. The nested for loop that scrapes a single results page works well. However, the outer for loop over the keywords doesn't work as intended: it should scrape the data from each keyword's search results, but I don't get the results for the first two keywords, only the result for the last one.
Here's the code:
from selenium import webdriver
import pandas as pd

browser = webdriver.Chrome(r"C:\...\chromedriver.exe")
df = pd.DataFrame(columns = ['ceo', 'value'])
baseUrl = 'https://www.google.com/search?q='
ceo_list = ["Bill Gates", "Elon Musk", "Warren Buffet"]
values =[]
# Search each keyword, then scrape the knowledge panel on that results page
for ceo in ceo_list:
    browser.get(baseUrl + ceo)
    table = browser.find_elements_by_css_selector('div.ifM9O')
    for row in table:
        ceo = str(([c.text for c in row.find_elements_by_css_selector('div.kno-ecr-pt.PZPZlf.gsmt.i8lZMc')])).strip('[]').strip("''")
        value = str(([c.text for c in row.find_elements_by_css_selector('div.Z1hOCe')])).strip('[]').strip("''")
        ceo = pd.Series(ceo)
        value = pd.Series(value)
        df = df.assign(**{'ceo': ceo, 'value': value})
print(df)
browser.close()
Here's the output:
ceo value
0 Warren Buffett Born: August 30, 1930 (age 89 years), Omaha, N...
What I expected:
ceo value
0 Bill Gates Born:..........
1 Elon Musk Born:...........
2 Warren Buffett Born: August 30, 1930 (age 89 years), Omaha, N...
I'm not sure what I'm missing.
Answer
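You need to collect the results in lists and append to them inside the for loop, so you don't keep overwriting them. Right now df.assign(**{'ceo': ceo, 'value': value}) replaces both columns with a one-row Series on every iteration, so only the row from the last keyword survives. Instead, append each ceo and value to a list (the unused values = [] is a natural place to start) and build the DataFrame once, after both loops have finished.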
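A minimal sketch of that fix, assuming Selenium 4, where the find_elements_by_css_selector helpers used in the question were removed in later releases and replaced by find_elements(by, value). The strategy string "css selector" is the value behind Selenium's By.CSS_SELECTOR. The function names scrape_ceos and texts and the constant names are hypothetical; the CSS selectors are copied from the question and may no longer match Google's current markup. The str(...).strip('[]') trick is swapped for a plain ", ".join, and the query is URL-encoded with quote_plus.

```python
from urllib.parse import quote_plus

import pandas as pd

BASE_URL = "https://www.google.com/search?q="
# Selectors copied from the question; Google changes its markup often,
# so these may no longer match anything on a live results page.
ROW_SELECTOR = "div.ifM9O"
NAME_SELECTOR = "div.kno-ecr-pt.PZPZlf.gsmt.i8lZMc"
VALUE_SELECTOR = "div.Z1hOCe"
# String value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def texts(element, selector):
    """Join the text of every child of `element` matching `selector`."""
    return ", ".join(c.text for c in element.find_elements(CSS, selector))


def scrape_ceos(browser, ceo_list):
    """Search each keyword and collect one (ceo, value) pair per matching row."""
    ceos, values = [], []  # created once, so results accumulate across ALL keywords
    for query in ceo_list:
        browser.get(BASE_URL + quote_plus(query))
        for row in browser.find_elements(CSS, ROW_SELECTOR):
            ceos.append(texts(row, NAME_SELECTOR))
            values.append(texts(row, VALUE_SELECTOR))
    # Build the DataFrame once, after every keyword has been searched
    return pd.DataFrame({"ceo": ceos, "value": values})
```

With a real driver created as in the question, you would call df = scrape_ceos(browser, ceo_list), print(df), and then browser.quit(). Building the DataFrame in one step from lists also avoids reallocating it on every row, which is what repeatedly reassigning df does.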