beautifulsoup and selenium: clicking on an svg path to get to the next page and get data from that page


Question


I'm working on a project where a website has a table filled with data, and the table is 7 pages long. It is the table on this website: https://nonfungible.com/market/history . You get to the next page through an svg path, and I have to get data from all 7 pages. I don't know how to click on this svg path; the svg doesn't have an aria-label or a class to locate it by. Please let me know if you know how to click on it.

(A screenshot of the page's source code was attached here; image omitted.)

I have tried many different things including:

    driver.find_element_by_xpath('//div[@id="icon-chevron-right"]/*[name()="svg"]/*[name()="path"]').click()

This is the error that I am getting:

    raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id="icon-chevron-right"]/*[name()="svg"]/*[name()="path"]"}
    (Session info: chrome=92.0.4515.107)

Thank you for your help.

Solution

This is a slightly different approach than driving the GUI - but have a look at the below. Doing it this way exposes much more data than the front end shows. (If you do want to click through the pages with Selenium, there is a sketch of that at the end of this answer.)

It looks like the /market/history page comes down with the JSON data already embedded (it's not a separate call I can identify in dev tools). However - if you:

  1. Get the page with the python requests library
  2. Parse the html and find the json data object in the script tag with @id="__NEXT_DATA__"
  3. Get the right part of the json which has the table data
  4. Filter the object to get rid of a few bits and pieces (keep only the entries where name is not None)

from lxml import html
import requests
import json

url = "https://nonfungible.com/market/history"

#get the page and parse
response = requests.get(url)
page = html.fromstring(response.content)

#get the data and convert to json
datastring = page.xpath('//script[@id="__NEXT_DATA__"]/text()')
data = json.loads(datastring[0])
#print(json.dumps(data, indent=4)) #this prints everything

#Get the relevant part of the json (it has lots of other cr*p in there - it was an effort to find this)
tabledata = data['props']['pageProps']['currentTotals']
# this filters out some of the unneeded data
AllItems = list(filter(lambda x: x['name'] is not None, tabledata))

#print out each item - which relates to a row in the table
for item in AllItems:
    print(item['name'])
    print(item['totals']['alltime']['usd'])
    print(json.dumps(item, indent=4))
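
Since the question mentions BeautifulSoup: the same __NEXT_DATA__ lookup works there too. A minimal sketch of an equivalent to the lxml parsing above, assuming bs4 is installed:

from bs4 import BeautifulSoup
import requests
import json

url = 'https://nonfungible.com/market/history'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Next.js pages embed their state in a <script id="__NEXT_DATA__"> tag
script = soup.find('script', id='__NEXT_DATA__')
data = json.loads(script.string)
tabledata = data['props']['pageProps']['currentTotals']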

What you need to do from here is extract what you want from the json (there is a short CSV sketch after the sample output below).

I've started you off... The first 2 prints in the loop output this:

meebits

157251919.08

Which match the items on the website (comparison screenshot omitted).

The last print outputs everything that item has. This will let you see the structure and help you get your data out. It looks like this:

{
    "name": "meebits",
    "totals": {
        "alltime": {
            "count": 15622,
            "traders": 5023,        
            "usd": 157251919.08,    
            "average": 10066.06,    
            "transfer_count": 29826,
            "transfer_unique_assets": 19981,
            "asset_unique_owners": 4812,
            "asset_usd": 95541331.68,
            "asset_average": 10566.39
        },
        "oneday": {
            "count": 0,
            "traders": 0,
            "usd": 0,
            "average": 0,
            "transfer_count": 0,
            "transfer_unique_assets": 0,
            "asset_unique_owners": 0,
            "asset_usd": 0,
            "asset_average": 0
        },
        "twodayago": {
            "count": 0,
            "traders": 0,
            "usd": 0,
            "average": 0
        },
        "sevenday": {
            "count": 144,
            "traders": 165,
            "usd": 703913.21,
            "average": 4888.29,
            "transfer_count": 265,
            "transfer_unique_assets": 204,
            "asset_unique_owners": 125,
            "asset_usd": 611620.92,
            "asset_average": 5412.57
        },
        "thirtyday": {
            "count": 1663,
            "traders": 1167,
            "usd": 12662841.8,
            "average": 7614.46,
            "transfer_count": 2551,
            "transfer_unique_assets": 1704,
            "asset_unique_owners": 781,
            "asset_usd": 9908945.2,
            "asset_average": 9107.49
        }
    }
}
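
From there, pulling rows out into a CSV is just a matter of walking those keys. A minimal sketch reusing AllItems from the code above - the chosen columns are only an example and the filename is made up:

import csv

# flatten a few of the keys shown above into one row per collection
# .get() guards against windows that are missing for some collections
with open('market_history.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'alltime_usd', 'alltime_traders', 'sevenday_usd'])
    for item in AllItems:
        totals = item['totals']
        writer.writerow([
            item['name'],
            totals.get('alltime', {}).get('usd', 0),
            totals.get('alltime', {}).get('traders', 0),
            totals.get('sevenday', {}).get('usd', 0),
        ])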

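For completeness - if you did want to page through the table in the GUI: the <path> itself usually isn't clickable, and on a React page the element may not exist yet when Selenium first looks for it, which would explain the NoSuchElementException. The usual fix is to wait for the chevron's container and click that instead of the path. A minimal sketch, assuming the id sits on the div as in the question's locator:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://nonfungible.com/market/history')

# wait for the chevron's container to be clickable, then click it
# rather than the <path> inside the svg
next_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//div[@id="icon-chevron-right"]'))
)
next_button.click()

# if a plain .click() gets intercepted by an overlay, a JS click often works:
# driver.execute_script('arguments[0].click();', next_button)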
