Getting table value from nowgoal has got an index error
Problem description
I am quite new to scraping. I am getting links from nowgoal. Below is how I started navigating to the page above. I do not wish to get links for all matches; instead I have an input text file (attached here) that selects the league and date.
The following code initializes the inputs:
#Initialisation
from configparser import RawConfigParser  #needed for RawConfigParser below

league_index = []
final_list = []
j = 0
#config load
config = RawConfigParser()
configFilePath = r'.\config.txt'
config.read(configFilePath)
date = config.get('database_config','date') #input provided by user - in YYYY-MM-DD format
leagues = config.get('database_config','leagues') #comma-separated list of leagues provided by user
headless_param = config.get('database_config','headless') #set True to run Chrome without a visible browser window
leagues_list = leagues.split(',')
print(leagues_list)
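For reference, here is a hypothetical config.txt matching the keys read above. The section and option names come from the config.get calls; the values are made up for illustration:

```ini
[database_config]
date = 2020-07-21
leagues = English Premier League,Italian Serie A
headless = True
```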
After I initialized with the preferred date and league, I set up the chrome driver as follows:
options = webdriver.ChromeOptions() #initialise webdriver options
#options.binary_location = brave_path #if you are running the script on brave - then enable it
if headless_param == 'True' :
print('headless')
options.headless = True # if the headless parameter is set to true, the chrome browser will not appear in the foreground
options.add_argument('start-maximized') # Start the chrome maximised
options.add_argument('disable-infobars') # Disable infobars
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("prefs", {"profile.default_content_setting_values.cookies": 2})
options.add_experimental_option("prefs", {"profile.block_third_party_cookies": True})
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--incognito") #Incognito mode
#initiate the driver
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'),options=options)
#Format the url
url = 'http://www.nowgoal3.com/football/fixture/?f=ft0&date='+date
#get the url
driver.get(url)
#wait for some time
time.sleep(3)
driver.find_element_by_xpath('//*[@id="li_league"]').click()
time.sleep(5)
#click on the -team ranking
driver.find_element_by_xpath('//*[@id="TeamOrderCheck"]').click()
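The date appended to the fixture URL must already be in YYYY-MM-DD form. A minimal sketch of building that URL from a date object (`fixture_url` is a hypothetical helper, and the date is made up; only the URL pattern comes from the code above):

```python
from datetime import datetime

def fixture_url(day):
    """Build the fixture URL for a datetime object (hypothetical helper)."""
    date_str = day.strftime('%Y-%m-%d')  # YYYY-MM-DD, the format the site expects
    return 'http://www.nowgoal3.com/football/fixture/?f=ft0&date=' + date_str

print(fixture_url(datetime(2020, 7, 21)))
# → http://www.nowgoal3.com/football/fixture/?f=ft0&date=2020-07-21
```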
After this, you will be brought to the following page. I have also added a snapshot below.
I try to get the data from the table by looping; the code is as follows:
#Get the league names from the page
htmlSource = driver.page_source
#Pass the html source into soup
soup = bs4.BeautifulSoup(htmlSource, 'html.parser')
#Table
table = soup.select('table[id="table_live"]')
#Rows of the table
all_rows = table[0].select('tr')
#loop through each row
for i, row in enumerate(all_rows[2:]):
    try:
        key_word = row['class'][0]
        print(key_word)
        if 'Leaguestitle' in key_word: #if the league changed
            league = row.a.text
            print(row.a.text)
            if row.a.text in leagues_list:
                j = 1
            else:
                j = 0
        elif j == 1:
            home_team = row.findAll('a')[0].text #home team
            print(home_team)
            away_team = row.findAll('a')[1].text #away team
            match_number = ''.join(filter(str.isdigit, row.findAll('a')[2]['href'].strip())) #match_number
            link = 'http://data.nowgoal.group/3in1odds/' + match_number + '.html' #link for 3-in-1 odds from the match code
            home_ranking = row.findAll('span')[0].text.strip('[]') #home team ranking
            away_ranking = row.findAll('span')[1].text.strip('[]') #away team ranking
            final_list.append([home_team, home_ranking, away_team, away_ranking, league, match_number, link])
    except KeyError:
        try:
            if row['style'] == 'display:none':
                continue
            elif j == 1:
                home_team = row.findAll('a')[0].text #home team
                away_team = row.findAll('a')[1].text #away team
                home_ranking = row.findAll('span')[0].text.strip('[]') #home team ranking
                away_ranking = row.findAll('span')[1].text.strip('[]') #away team ranking
                match_number = ''.join(filter(str.isdigit, row.findAll('a')[2]['href'].strip())) #match_code associated with each match
                link = 'http://data.nowgoal.group/3in1odds/' + match_number + '.html' #link for 3-in-1 odds from the match code
                final_list.append([home_team, home_ranking, away_team, away_ranking, league, match_number, link])
        except KeyError:
            print('KeyError')
    except IndexError:
        if j == 1:
            home_team = row.findAll('a')[0].text #home team
            away_team = row.findAll('a')[1].text #away team
            home_ranking = row.findAll('span')[0].text.strip('[]') #home team ranking
            away_ranking = row.findAll('span')[1].text.strip('[]') #away team ranking
            match_number = ''.join(filter(str.isdigit, row.findAll('a')[2]['href'].strip())) #match_code associated with each match
            link = 'http://data.nowgoal.group/3in1odds/' + match_number + '.html' #link for 3-in-1 odds from the match code
            final_list.append([home_team, home_ranking, away_team, away_ranking, league, match_number, link])
        print('IndexError-captured')

print(final_list) #show the final result
driver.quit() #close the browser
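The match number above is extracted by keeping only the digit characters of the third link's href. A minimal stand-alone sketch of that step, with a hypothetical href value (the real href format on the site may differ):

```python
def extract_match_number(href):
    """Keep only the digits of an href, same filter(str.isdigit, ...) trick as above."""
    return ''.join(filter(str.isdigit, href.strip()))

# hypothetical href of the third <a> in a match row
href = '/match/live-1817685'
match_number = extract_match_number(href)
link = 'http://data.nowgoal.group/3in1odds/' + match_number + '.html'
print(link)  # → http://data.nowgoal.group/3in1odds/1817685.html
```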
Then I printed out the home team and got the following result:
Chelsea adtext-bg QC: MAY88.COM - NHÀ CÁI HỢP PHÁP NA UY - THƯỞNG NẠP 100% - HOÀN TRẢ 100TR - HỖ TRỢ 24/7
Then it threw an IndexError as follows:
Traceback (most recent call last):
File "D:/Football matters/Sttratagem data access/Games By Numbers/Nowgoal scraping project/codes/NOWGOAL-20200721T024808Z-001/NOWGOAL/PYFILES/Link_extractor_v1.3.py", line 124, in <module>
away_team = row.findAll('a')[1].text #away team
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/Football matters/Sttratagem data access/Games By Numbers/Nowgoal scraping project/codes/NOWGOAL-20200721T024808Z-001/NOWGOAL/PYFILES/Link_extractor_v1.3.py", line 149, in <module>
away_team = row.findAll('a')[1].text #away team
IndexError: list index out of range
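The printed class 'adtext-bg' suggests the failing row is an advertisement row, which has fewer &lt;a&gt; tags than a match row, so indexing [1] fails. A defensive check could skip such rows before indexing; this is a sketch under the assumption that ad rows are marked by a class containing 'adtext', as in the output above (`is_ad_row` is a hypothetical helper):

```python
def is_ad_row(class_list):
    """Return True when any class on the row marks it as an advertisement (assumed 'adtext' marker)."""
    return any('adtext' in c for c in (class_list or []))

print(is_ad_row(['adtext-bg']))    # ad row → True
print(is_ad_row(['Leaguestitle'])) # league header row → False
print(is_ad_row(None))             # row without a class attribute → False
```

Inside the loop, `row.get('class')` could be passed to this helper and the row skipped with `continue` when it returns True.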
league_list = ["English Premier League", 'Italian Serie A',
               'England Championship', 'Spanish La Liga', 'Swedish Allsvenskan',
               'USA Major League Soccer', 'Saudi', 'Dutch Cup']
# wait for some time
wait.until(EC.element_to_be_clickable((By.ID, "li_league"))).click()
# click on the team-ranking checkbox
wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//label[@for='TeamOrderCheck']/span"))).click()
for league in league_list:
    try:
        nextRow = wait.until(EC.presence_of_element_located(
            (By.XPATH, '//tr[.//a[contains(text(),"{}")]]'.format(league))))
        id = nextRow.get_attribute("id").split("_")[1]
        try:
            row = wait.until(EC.presence_of_all_elements_located(
                (By.XPATH, '//tr[preceding-sibling::tr[.//a[contains(text(),"{}")]] and following-sibling::tr[@id="tr_{}"] and not(@style="display:none")]'.format(league, int(id) + 1))))
            print("######## The result for {} ########".format(league))
            for i in row:
                print(i.get_attribute("textContent"))
            print("######## Completed {} ########".format(league))
        except:
            row = wait.until(EC.presence_of_all_elements_located(
                (By.XPATH, '//tr[preceding-sibling::tr[.//a[contains(text(),"{}")]] and not(@style="display:none")]'.format(league))))
            print("######## The result for {} ########".format(league))
            for i in row:
                print(i.get_attribute("textContent"))
            print("######## Completed {} ########".format(league))
            continue
    except:
        continue
You can use the following-sibling and preceding-sibling XPath axes; since there is no unique way to identify the next following element, we have to take the row id and increment it by 1.
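That id-increment step can be shown in isolation, assuming row ids of the form tr_&lt;n&gt; as used in the XPath above (`next_row_id` is a hypothetical helper):

```python
def next_row_id(row_id):
    """Given a row id like 'tr_12', build the id of the next sibling row."""
    prefix, num = row_id.split('_')
    return '{}_{}'.format(prefix, int(num) + 1)

print(next_row_id('tr_12'))  # → tr_13
```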