Web scraping of webpage on chartink.com


Problem description

Please help me scrape this link: https://chartink.com/screener/time-pass-48. I am trying to web scrape it, but it is not showing the table that I want.

I have tried this code, but it is not giving me the desired result.

import requests
from bs4 import BeautifulSoup

URL = 'https://chartink.com/screener/time-pass-48'
page = requests.get(URL)   # fetches only the static HTML of the screener page
print(page)

soup = BeautifulSoup(page.content, 'html.parser')
print(soup)                # the screener table does not appear in this output

Recommended answer

The data indeed comes from a POST request, so you don't need to allow JavaScript to run. You simply need to pick up one cookie (ci_session, which a Session object retains from the initial landing-page request and passes on with the subsequent POST) and one token (X-CSRF-TOKEN, which can be pulled from a meta tag in the initial response):

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

data = {
  'scan_clause': '( {cash} ( monthly rsi( 14 ) > 60 and weekly rsi( 14 ) > 60 and latest rsi( 14 ) > 60 and 1 day ago  rsi( 14 ) <= 60 and latest volume > 100000 ) ) '
}

with requests.Session() as s:
    # initial GET sets the ci_session cookie on the session and serves the CSRF meta tag
    r = s.get('https://chartink.com/screener/time-pass-48')
    soup = bs(r.content, 'lxml')
    s.headers['X-CSRF-TOKEN'] = soup.select_one('[name=csrf-token]')['content']
    # POST the scan clause; the JSON response holds the screener rows under 'data'
    r = s.post('https://chartink.com/screener/process', data=data).json()
    df = pd.DataFrame(r['data'])
    print(df)
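
If you want to run other scan clauses with the same handshake, the flow above can be wrapped in a small helper. The sketch below is only a convenience refactor of the accepted answer; the function name fetch_scan, the raise_for_status() checks, and the CSV export are my own additions, not part of the original code.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

def fetch_scan(scan_clause):
    """Run a chartink scan clause and return the rows as a DataFrame.

    Hypothetical helper: it simply repackages the GET-for-cookie/token plus
    POST-to-/screener/process flow shown above.
    """
    with requests.Session() as s:
        # landing page sets the ci_session cookie and exposes the CSRF token
        r = s.get('https://chartink.com/screener/time-pass-48')
        r.raise_for_status()
        soup = bs(r.content, 'lxml')
        s.headers['X-CSRF-TOKEN'] = soup.select_one('[name=csrf-token]')['content']
        r = s.post('https://chartink.com/screener/process',
                   data={'scan_clause': scan_clause})
        r.raise_for_status()
        # the JSON response keeps the screener rows under the 'data' key
        return pd.DataFrame(r.json()['data'])

if __name__ == '__main__':
    clause = ('( {cash} ( monthly rsi( 14 ) > 60 and weekly rsi( 14 ) > 60 '
              'and latest rsi( 14 ) > 60 and 1 day ago  rsi( 14 ) <= 60 '
              'and latest volume > 100000 ) ) ')
    df = fetch_scan(clause)
    print(df)
    df.to_csv('time_pass_48.csv', index=False)  # optional: save the results locally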
