使用Python BeautifulSoup对NSE期权价格进行网络爬取，以解决编码校正问题 [英] Webscraping NSE Options prices using Python BeautifulSoup, regarding encoding correction

查看：40 发布时间：2021/4/15 19:08:49 python beautifulsoup character-encoding

本文介绍了使用Python BeautifulSoup对NSE期权价格进行网络爬取，以解决编码校正问题的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有:

为整个FnO世界实现了完全自动化的分钟级数据收集.
自动适应不断变化的FnO世界，退出和新条目.
在非市场营业时间关闭.
关闭假期，包括新宣布的假期.
在年度Muhurat交易数据期间自动启动.

我对网页抓取有点陌生，不习惯使用'tr'&"td"的东西，因此这个疑问.我正在尝试从该线程'https://www.quantinsti.com/blog/option-chain-extraction-for-nse-stocks-using-python'复制Python 3中的Python 2.7代码.

I am a bit new to web scraping and not used to 'tr' & 'td' stuff and thus this doubt. I am trying to replicate this Python 2.7 code in my Python 3 from this thread 'https://www.quantinsti.com/blog/option-chain-extraction-for-nse-stocks-using-python'.

此旧代码使用.ix进行索引，我可以轻松使用.iloc进行更正.但是，行< tr = tr.replace('，'，'')>显示错误'即使我在< tr = utf_string.encode('utf8')>之前写它，也需要一个类似字节的对象，而不是'str'.

This old code uses .ix for indexing which I can correct using .iloc easily. However, the line <tr = tr.replace(',' , '')> show up error 'a bytes-like object is required, not 'str'' even if I write it before <tr = utf_string.encode('utf8')>.

我已经检查了其他

I have checked this other link from stackoverflow and couldn't solve my problem

我想我已经发现了为什么会这样.这是因为先前用于定义变量tr的for循环.如果我省略了这一行，那么我将获得一个带有数字的DataFrame以及一些附加的文本.我可以使用循环遍历整个DataFrame对此进行过滤，但是更好的方法必须是正确使用replace()函数.我不明白这一点.

I think I have spotted why this is happening. It's because of the previous for loop used previously to define variable tr. If I omit this line, then I get a DataFrame with the numbers with some attached text. I can filter this with a loop over the entire DataFrame, but a better way must be by properly using the replace() function. I can't figure this bit out.

这是我的完整代码.我已经在一行中专门使用##########################标记了我引用的代码的关键部分，以便可以找到该行.快速(甚至通过Ctrl + F键):

Here is my full code. I have marked the critical sections of the code I have referred using ######################### exclusively in a line so that the line can be found out quickly (even by Ctrl + F):

import requests
import pandas as pd
from bs4 import BeautifulSoup

Base_url = ("https://nseindia.com/live_market/dynaContent/"+
        "live_watch/option_chain/optionKeys.jsp?symbolCode=2772&symbol=UBL&"+
        "symbol=UBL&instrument=OPTSTK&date=-&segmentLink=17&segmentLink=17")

page = requests.get(Base_url)
#page.status_code
#page.content

soup = BeautifulSoup(page.content, 'html.parser')
#print(soup.prettify())

table_it = soup.find_all(class_="opttbldata")
table_cls_1 = soup.find_all(id = "octable")

col_list = []

# Pulling heading out of the Option Chain Table

#########################
for mytable in table_cls_1:
    table_head = mytable.find('thead')

    try:
        rows = table_head.find_all('tr')
        for tr in rows:
            cols = tr.find_all('th')
            for th in cols:
                er = th.text
                #########################
                ee = er.encode('utf8')
                col_list.append(ee)
    except:
        print('no thread')

col_list_fnl = [e for e in col_list if e not in ('CALLS', 'PUTS', 'Chart', '\xc2\xa0')]
#print(col_list_fnl)

table_cls_2 = soup.find(id = "octable")
all_trs = table_cls_2.find_all('tr')
req_row = table_cls_2.find_all('tr')

new_table = pd.DataFrame(index=range(0,len(req_row)-3),columns = col_list_fnl)

row_marker = 0

for row_number, tr_nos in enumerate(req_row):
    if row_number <= 1 or row_number == len(req_row)-1:
        continue # To insure we only choose non empty rows
    
    td_columns = tr_nos.find_all('td')

    # Removing the graph column
    select_cols = td_columns[1:22]
    cols_horizontal = range(0,len(select_cols))

    for nu, column in enumerate(select_cols):
    
        utf_string = column.get_text()
        utf_string = utf_string.strip('\n\r\t": ')
        #########################
        tr = tr.replace(',' , '') # Commenting this out makes code partially work, getting numbers + text attached to the numbers in the table

        # That is obtained by commenting out the above line with tr variable & running the entire code.
        tr = utf_string.encode('utf8')
    
        new_table.iloc[row_marker,[nu]] = tr
            
    row_marker += 1

print(new_table)

使用Python BeautifulSoup对NSE期权价格进行网络爬取，以解决编码校正问题 [英] Webscraping NSE Options prices using Python BeautifulSoup, regarding encoding correction

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

使用Python BeautifulSoup对NSE期权价格进行网络爬取，以解决编码校正问题 [英] Webscraping NSE Options prices using Python BeautifulSoup, regarding encoding correction

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭