Getting table values from nowgoal raises an IndexError


Problem Description


I am quite new to scraping. I am getting links from nowgoal. Below is how I start navigating to the fixtures page. I do not wish to get links for all matches; instead, I have an input text file (attached) that specifies the selected leagues and date.

The following code initialises the script and reads the input configuration:

import time
import bs4
from configparser import RawConfigParser
from selenium import webdriver

#Initialisation
league_index = []
final_list = []
j = 0

#Config load
config = RawConfigParser()
configFilePath = r'.\config.txt'
config.read(configFilePath)
date = config.get('database_config','date')                      #provided by user - in YYYY-MM-DD format
leagues = config.get('database_config','leagues')                #provided by user - comma-separated league names
headless_param = config.get('database_config','headless')        #set to 'True' to run Chrome without a visible window
leagues_list = leagues.split(',')
print(leagues_list)
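
For reference, the config.txt is expected to look roughly like the sketch below, given the keys read above; the section name comes from the code, but the values are only illustrative placeholders, not the original attachment.

[database_config]
;illustrative placeholder values - replace with your own date, leagues and headless flag
date = 2020-07-25
leagues = English Premier League,Italian Serie A,Spanish La Liga
headless = True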

After initialising the preferred date and leagues, I set up the Chrome driver as follows:

options = webdriver.ChromeOptions()         #initialise webdriver options
#options.binary_location = brave_path       #enable this line if you are running the script on Brave
if headless_param == 'True':
    print('headless')
    options.headless = True                 #if the headless parameter is set to True, the Chrome browser will not appear in the foreground
options.add_argument('start-maximized')     #start Chrome maximised
options.add_argument('disable-infobars')    #disable infobars
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("prefs", {"profile.default_content_setting_values.cookies": 2,
                                          "profile.block_third_party_cookies": True})   #a second "prefs" call would overwrite the first, so both preferences go in one dict
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--incognito")         #incognito mode


#initiate the driver
driver = webdriver.Chrome(resource_path('./drivers/chromedriver.exe'), options=options)   #resource_path is a helper not shown here

#format the url with the requested date
url = 'http://www.nowgoal3.com/football/fixture/?f=ft0&date='+date


#get the url
driver.get(url)
#wait for some time
time.sleep(3)

driver.find_element_by_xpath('//*[@id="li_league"]').click()
time.sleep(5)
#click on the team ranking option
driver.find_element_by_xpath('//*[@id="TeamOrderCheck"]').click()

After this, you will be brought to the fixtures page for the chosen date. I have also added a snapshot of it below.

I try to get the data from the table by looping over its rows; the code is as follows:

#Get the leagues name from page
htmlSource = driver.page_source
#Pass the htmlsource into soup
soup = bs4.BeautifulSoup(htmlSource,'html.parser')
#Table
table = soup.select('table[id="table_live"]')
#Rows of table
all_rows = table[0].select('tr')
#loop through each row
for i , row in enumerate(all_rows[2:]) :
    try:
        key_word = row['class'][0]
        print(key_word)
        if 'Leaguestitle' in key_word:                                                       #if leagues got changed
            league = row.a.text
            print(row.a.text)
            if row.a.text in leagues_list:
                j = 1
            else:
                j = 0
        elif j == 1:
            home_team = row.findAll('a')[0].text                                             #home team
            print(home_team)
            away_team = row.findAll('a')[1].text                                             #away team
            match_number = ''.join(filter(str.isdigit,row.findAll('a')[2]['href'].strip()))  #match_number
            link = 'http://data.nowgoal.group/3in1odds/'+match_number+'.html'                #link for 3 in 1 odds from the match code
            home_ranking = row.findAll('span')[0].text.strip('[]')                           #home team ranking
            away_ranking = row.findAll('span')[1].text.strip('[]')                           #Away team ranking
            final_list.append([home_team,home_ranking,away_team,away_ranking,league,match_number,link])
    except KeyError:
        try:
            if row['style'] == 'display:none':
                continue
            elif j == 1:
                home_team = row.findAll('a')[0].text                                         #home team
                away_team = row.findAll('a')[1].text                                         #away team
                home_ranking = row.findAll('span')[0].text.strip('[]')                       #home team ranking
                away_ranking = row.findAll('span')[1].text.strip('[]')                       #Away team ranking
                match_number = ''.join(filter(str.isdigit,row.findAll('a')[2]['href'].strip()))  #match_code associated with each match
                link = 'http://data.nowgoal.group/3in1odds/'+match_number+'.html'            #link for 3 in 1 odds from the match code
                final_list.append([home_team,home_ranking,away_team,away_ranking,league,match_number,link])
        except KeyError:
            print('KeyError')
    except IndexError:
        if j == 1:
            home_team = row.findAll('a')[0].text                                             #home team
            away_team = row.findAll('a')[1].text                                             #away team
            home_ranking = row.findAll('span')[0].text.strip('[]')                           #home team ranking
            away_ranking = row.findAll('span')[1].text.strip('[]')                           #Away team ranking
            match_number = ''.join(filter(str.isdigit,row.findAll('a')[2]['href'].strip()))  #match_code associated with each match
            link = 'http://data.nowgoal.group/3in1odds/'+match_number+'.html'                #link for 3 in 1 odds from the match code
            final_list.append([home_team,home_ranking,away_team,away_ranking,league,match_number,link])
            print('IndexError-captured')

print(final_list)   #show the final result
driver.quit()       #close the browser

Then I print out the home team and get the following output:

Chelsea adtext-bg QC: MAY88.COM - NHÀ CÁI HỢP PHÁP NA UY - THƯỞNG NẠP 100% - HOÀN TRẢ 100TR - HỖ TRỢ 24/7

Then it threw me an IndexError, as follows:

Traceback (most recent call last):
  File "D:/Football matters/Sttratagem data access/Games By Numbers/Nowgoal scraping project/codes/NOWGOAL-20200721T024808Z-001/NOWGOAL/PYFILES/Link_extractor_v1.3.py", line 124, in <module>
    away_team = row.findAll('a')[1].text                                                #away team
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Football matters/Sttratagem data access/Games By Numbers/Nowgoal scraping project/codes/NOWGOAL-20200721T024808Z-001/NOWGOAL/PYFILES/Link_extractor_v1.3.py", line 149, in <module>
    away_team = row.findAll('a')[1].text                                            #away team
IndexError: list index out of range
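
From the printed output, the row whose class came out as adtext-bg looks like an injected advertisement row: it only carries a single anchor (the ad text), so row.findAll('a')[1] does not exist, and the IndexError handler then repeats the same lookup and raises again. Below is a minimal sketch of a guard that skips such rows; it is not part of the original script, and the 'adtext' substring check is an assumption based on the class name shown above.

def parse_match_row(row):
    """Return [home_team, away_team, match_number] for a normal fixture row,
    or None for advertisement/short rows that would otherwise raise IndexError."""
    anchors = row.findAll('a')
    classes = row.get('class', [])
    #'adtext' is assumed from the class name printed above (adtext-bg)
    if any('adtext' in c for c in classes) or len(anchors) < 3:
        return None
    home_team = anchors[0].text
    away_team = anchors[1].text
    match_number = ''.join(filter(str.isdigit, anchors[2]['href'].strip()))
    return [home_team, away_team, match_number]

Calling parse_match_row(row) inside the loop and skipping the None results would avoid both tracebacks.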

Solution

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)   #explicit wait; the 10-second timeout is an example value

league_list = ["English Premier League", 'Italian Serie A',
               'England Championship', 'Spanish La Liga', 'Swedish Allsvenskan', 'USA Major League Soccer', 'Saudi', 'Dutch Cup']

# open the league filter
wait.until(EC.element_to_be_clickable((By.ID, "li_league"))).click()
# click on the team ranking
wait.until(EC.element_to_be_clickable(
    (By.XPATH, "//label[@for='TeamOrderCheck']/span"))).click()

for league in league_list:
    try:
        # the header row that carries the league name
        nextRow = wait.until(EC.presence_of_element_located(
            (By.XPATH, '//tr[.//a[contains(text(),"{}")]]'.format(league))))
        id = nextRow.get_attribute("id").split("_")[1]
        try:
            # visible rows between this league's header row and the row whose id is the header id incremented by 1
            row = wait.until(EC.presence_of_all_elements_located(
                (By.XPATH, '//tr[preceding-sibling::tr[.//a[contains(text(),"{}")]] and following-sibling::tr[@id="tr_{}"] and not(@style="display:none")]'.format(league, int(id)+1))))
            print("########The result for {} ########".format(league))
            for i in row:
                print(i.get_attribute("textContent"))
            print("###########Completed for {}##############".format(league))
        except:
            # fall back to every visible row after the league header when the bounded lookup fails
            row = wait.until(EC.presence_of_all_elements_located(
                (By.XPATH, '//tr[preceding-sibling::tr[.//a[contains(text(),"{}")]] and not(@style="display:none")]'.format(league))))
            print("########The result for {} ########".format(league))
            for i in row:
                print(i.get_attribute("textContent"))
            print("###########Completed for {}##############".format(league))
            continue
    except:
        continue

You can use the following-sibling and preceding-sibling axes. Since there is no unique attribute that identifies the next row after a league header, we take the header row's id and increment it by 1 to bound the selection.
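
For illustration, if the header row for "English Premier League" had a hypothetical id of tr_5, the bounded XPath built in the first try block would expand as follows (the id value is made up for the example):

league = "English Premier League"
header_id = 5                        #hypothetical value parsed from the header row's id attribute, e.g. "tr_5"
xpath = ('//tr[preceding-sibling::tr[.//a[contains(text(),"{}")]]'
         ' and following-sibling::tr[@id="tr_{}"]'
         ' and not(@style="display:none")]').format(league, header_id + 1)
print(xpath)
#selects every visible row that sits after the "English Premier League" header
#and before the row with id "tr_6"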
