Web Scraping with Python - Issue getting attribute value


Question

I'm very new to web scraping and I'm struggling to get the values of two attributes from a specific element.

I want to find data-diffusion-decimal and data-diffusion-history.

soup.findAll('div',attrs={"class":"RC-runnerPriceWrapper"})

What I get back is:

<div class="RC-runnerPriceWrapper PC-bestOddsContainer js-diffusionHorsesList js-horsesList js-bestOddsPriceContainer" data-diffusion-horsename="Dinons">  <a class="ui-btn RC-runnerPrice ui-priceBtn_noPrice js-diffusionPriceValue js-betHandler js-runnerPrice js-runnerPriceBestOdds" data-test-selector="RC-cardPage-runnerPrice" href="#"></a>

This is as far as I get, but the result does not contain what I need. Any advice is greatly appreciated.

Recommended answer

These attributes may be set dynamically by JavaScript. To check, do not rely on the browser console, which shows the page after scripts have run. Instead, right-click the page and choose 'View page source' to see the HTML as the server sent it.
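The same check can be done from Python on the HTML that was downloaded. The sketch below is only an illustration: it uses the standard-library HTML parser so it runs without extra installs, SAMPLE_HTML is a shortened copy of the markup shown in the question, and find_attributes is a hypothetical helper name, not part of any library.

```python
from html.parser import HTMLParser

# Shortened copy of the markup from the question (the class lists are trimmed).
SAMPLE_HTML = (
    '<div class="RC-runnerPriceWrapper" data-diffusion-horsename="Dinons">'
    '<a class="ui-btn RC-runnerPrice" href="#"></a>'
    '</div>'
)

WANTED = ["data-diffusion-horsename", "data-diffusion-decimal", "data-diffusion-history"]


class AttributeFinder(HTMLParser):
    """Records every value of the wanted attributes found on any start tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}  # attribute name -> list of values

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.wanted:
                self.found.setdefault(name, []).append(value)


def find_attributes(html, wanted):
    """Hypothetical helper: return the wanted attributes present in static HTML."""
    finder = AttributeFinder(wanted)
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    found = find_attributes(SAMPLE_HTML, WANTED)
    missing = [name for name in WANTED if name not in found]
    print("present in static HTML:", found)
    print("missing (possibly set by JavaScript):", missing)
```

If an attribute is missing from the downloaded HTML but visible in the browser's element inspector, it was most likely added by a script after the page loaded.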

If the attributes do not appear in the page source, they are set by JavaScript, and you need a tool such as Selenium to execute the dynamic part of the page.
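A minimal sketch of the Selenium route follows. It assumes the selenium package (version 4) and a Chrome browser are installed, and that the page eventually adds data-diffusion-decimal to some element; the question does not say which element carries it, so the selector matches any element that has the attribute. The function names, the url argument, and the 10-second timeout are illustrative choices, not taken from the original answer.

```python
ATTRS = ("data-diffusion-decimal", "data-diffusion-history")

# Matches any element that carries the attribute, whatever its tag or class.
HAS_DECIMAL = "[data-diffusion-decimal]"


def read_attributes(elements, names=ATTRS):
    """Collect the named attributes from objects exposing get_attribute()."""
    return [{name: el.get_attribute(name) for name in names} for el in elements]


def scrape(url, timeout=10):
    """Hypothetical helper: load the page in Chrome and read the attributes."""
    # Imported here so read_attributes() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until scripts have added the attribute to at least one element;
        # an empty list is falsy, so the wait keeps polling until a match appears
        # and raises TimeoutException if none does within `timeout` seconds.
        elements = WebDriverWait(driver, timeout).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, HAS_DECIMAL)
        )
        return read_attributes(elements)
    finally:
        driver.quit()
```

Calling scrape with the race card URL would return one dictionary per matching element, assuming the attributes really are filled in client-side.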

Workaround: in the 'Network' tab of your browser's developer tools, check whether an Ajax request is made to fetch the data that ends up in the attributes. Instead of parsing the page, you can issue the same request yourself and perhaps get the information in JSON format.
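As a rough sketch of that workaround, the snippet below replays a request with the standard library and decodes the response as JSON. The URL is a placeholder: the real endpoint, any required headers or cookies, and the shape of the returned data have to be read off the 'Network' tab, and the site may not expose such an endpoint at all. fetch_json is a hypothetical helper name.

```python
import json
import urllib.request

# Placeholder only: replace with the request URL seen in the Network tab.
AJAX_URL = "https://example.invalid/path/to/odds.json"


def fetch_json(url, opener=urllib.request.urlopen, timeout=10):
    """Hypothetical helper: request `url` and decode the body as JSON."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    data = fetch_json(AJAX_URL)
    print(data)
```

The opener parameter exists only so the function can be exercised without network access; in normal use the default is enough.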
