beautifulsoup and selenium: clicking on an svg path to get to the next page and get data from that page
Question
I'm working on a project where there is a table on a website that is filled with data, and the table is 7 pages long. It is the table on this website: https://nonfungible.com/market/history . You get to the next page through an svg path, and I have to get data from all 7 pages. I don't know how to click on this svg path, since the svg doesn't have an aria-label or a class. Please let me know if you know how to click on it.
This is a screenshot of the source code.
I have tried many different things, including:
driver.find_element_by_xpath('//div[@id="icon-chevron-right"]/*[name()="svg"]/*[name()="path"]').click()
This is the error that I am getting:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@id="icon-chevron-right"]/*[name()="svg"]/*[name()="path"]"}
(Session info: chrome=92.0.4515.107)
Thank you for your help.
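As a side note on the XPath itself: the *[name()="svg"] trick is the standard way to match namespaced SVG nodes, and it can be checked offline with lxml against a small hypothetical snippet mirroring the pagination control. If the XPath is valid but Selenium still raises NoSuchElementException, the likely cause is that the element hasn't been rendered yet (the page is client-rendered), not the selector.

```python
from lxml import etree

# hypothetical markup mirroring the chevron control described in the question
snippet = b'''<div id="icon-chevron-right">
  <svg xmlns="http://www.w3.org/2000/svg"><path d="M0 0"/></svg>
</div>'''
root = etree.fromstring(snippet)

# a bare tag name does not match the namespaced svg element...
assert root.xpath('//div[@id="icon-chevron-right"]/svg') == []

# ...but the name() trick from the question does
paths = root.xpath('//div[@id="icon-chevron-right"]/*[name()="svg"]/*[name()="path"]')
print(len(paths))  # 1
```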
This is a slightly different approach than driving the GUI, but have a look at the below. Doing it this way exposes much more data than the front end shows.
It looks like the /market/history page comes down with the JSON data already embedded (it's not a separate call I can identify in dev tools).
However, if you:
- Get the page with the python requests library
- Parse the html and find the json data object which has @id="__NEXT_DATA__"
- Get the right part of the json which has the table data
- Filter the object to get rid of a few bits and pieces (where name != None)
from lxml import html
import requests
import json

url = "https://nonfungible.com/market/history"

# get the page and parse
response = requests.get(url)
page = html.fromstring(response.content)

# get the embedded data and convert to json
datastring = page.xpath('//script[@id="__NEXT_DATA__"]/text()')
data = json.loads(datastring[0])
#print(json.dumps(data, indent=4)) # this prints everything

# get the relevant part of the json (it has lots of other cr*p in there - it was effort to find this)
tabledata = data['props']['pageProps']['currentTotals']

# this filters out some of the unneeded data
AllItems = list(filter(lambda x: x['name'] is not None, tabledata))

# print out each item - which relates to a row in the table
for item in AllItems:
    print(item['name'])
    print(item['totals']['alltime']['usd'])
    print(json.dumps(item, indent=4))
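To see the parsing logic in isolation (and test it without hitting the site), the same steps can be run against a small inline HTML sample. The JSON here is a made-up stand-in for the real __NEXT_DATA__ payload, trimmed to the fields used above:

```python
from lxml import html
import json

# hypothetical page with an embedded __NEXT_DATA__ blob, mimicking the real site
sample = '''<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"currentTotals": [
  {"name": "meebits", "totals": {"alltime": {"usd": 157251919.08}}},
  {"name": null, "totals": {}}
]}}}
</script>
</body></html>'''

page = html.fromstring(sample)
datastring = page.xpath('//script[@id="__NEXT_DATA__"]/text()')
data = json.loads(datastring[0])
tabledata = data['props']['pageProps']['currentTotals']

# drop the placeholder rows where name is null, as in the filter above
allitems = [x for x in tabledata if x['name'] is not None]
print(allitems[0]['name'])                      # meebits
print(allitems[0]['totals']['alltime']['usd'])  # 157251919.08
```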
What you need to do from here is extract what you want from the json.
I've started you off... The first 2 prints in the loop output this:
meebits
157251919.08
Which match the items on the website:
The last print outputs everything that item has. This will let you see the structure and help you get your data out. It looks like this:
{
"name": "meebits",
"totals": {
"alltime": {
"count": 15622,
"traders": 5023,
"usd": 157251919.08,
"average": 10066.06,
"transfer_count": 29826,
"transfer_unique_assets": 19981,
"asset_unique_owners": 4812,
"asset_usd": 95541331.68,
"asset_average": 10566.39
},
"oneday": {
"count": 0,
"traders": 0,
"usd": 0,
"average": 0,
"transfer_count": 0,
"transfer_unique_assets": 0,
"asset_unique_owners": 0,
"asset_usd": 0,
"asset_average": 0
},
"twodayago": {
"count": 0,
"traders": 0,
"usd": 0,
"average": 0
},
"sevenday": {
"count": 144,
"traders": 165,
"usd": 703913.21,
"average": 4888.29,
"transfer_count": 265,
"transfer_unique_assets": 204,
"asset_unique_owners": 125,
"asset_usd": 611620.92,
"asset_average": 5412.57
},
"thirtyday": {
"count": 1663,
"traders": 1167,
"usd": 12662841.8,
"average": 7614.46,
"transfer_count": 2551,
"transfer_unique_assets": 1704,
"asset_unique_owners": 781,
"asset_usd": 9908945.2,
"asset_average": 9107.49
}
}
}
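Once you can see that structure, pulling it into a flat table is just dictionary access. For example, writing one CSV row per collection (the field names match the JSON above; which columns you keep, and the helper names here, are up to you):

```python
import csv
import io

# a single item in the shape shown above, trimmed to a few fields
item = {
    "name": "meebits",
    "totals": {"alltime": {"count": 15622, "traders": 5023, "usd": 157251919.08}},
}

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "alltime_count", "alltime_traders", "alltime_usd"])

alltime = item["totals"]["alltime"]
writer.writerow([item["name"], alltime["count"], alltime["traders"], alltime["usd"]])
print(buf.getvalue())
```

In the real script you would loop over AllItems and write one row per item, swapping io.StringIO for an open file.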