Web scraping Oracle (ATG) Commerce

Views: 125 | Published: 2020/7/24 21:32:41 | Tags: web-scraping, oracle-commerce

Question

I am new to web scraping, and I use the following tools and method to scrape:

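I use R (with the Curl, XML, and other packages) to read a web page from its URL, and the htmlTreeParse function to parse the HTML.
Then, to find the data I want, I first inspect the page's code with Chrome's developer tools.
Once I know which node holds the data, I use xpathApply to extract it.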
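
The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual script: it assumes the "Curl" package mentioned is RCurl, and the XPath expression and the `product-name` class are hypothetical placeholders rather than names taken from the Sephora page.

```r
library(RCurl)  # getURL: download a page as text
library(XML)    # htmlTreeParse, xpathSApply, xmlValue

# Parse HTML text and return the trimmed text of every node matching an XPath.
extract_text <- function(html_text, xpath) {
  # useInternalNodes = TRUE is required for XPath queries on the parsed tree
  doc <- htmlTreeParse(html_text, asText = TRUE, useInternalNodes = TRUE)
  values <- xpathSApply(doc, xpath, xmlValue)
  trimws(as.character(unlist(values)))
}

# Download a page, then extract the matching nodes.
scrape_page <- function(url, xpath) {
  html_text <- getURL(url, followlocation = TRUE)
  extract_text(html_text, xpath)
}

# Offline demonstration on an inline document (hypothetical markup)
sample_html <- "<html><body><div class='product-name'> A </div><div class='product-name'>B</div></body></html>"
print(extract_text(sample_html, "//div[@class='product-name']"))
```

Against a live page the call would look like `scrape_page(url, "//div[@class='product-name']")`, with the XPath replaced by whatever the developer tools show for the node of interest.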

Usually, it works well. But I had an issue with this site: http://www.sephora.fr/Parfum/Parfum-Femme/C309/2

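When you click the link, the page that loads is actually page 1 (of the products).
You have to load the URL again (by entering it a second time) to get page 2.
When I read the data with my usual procedure, the htmlTreeParse function always gives me page 1.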
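
One possible explanation, offered only as a guess, is that the site stores session state (for example in cookies) on the first request and serves the requested page only on a later one. Under that assumption, the "load it twice" behaviour could be imitated by sending both requests through the same RCurl handle with cookies enabled. The function name `fetch_with_session` is made up for this sketch, and nothing in the question confirms that this is enough for this site.

```r
library(RCurl)  # getCurlHandle, getURL

# Request the same URL twice through one curl handle, so that any cookies set
# by the first response are sent back with the second request.
fetch_with_session <- function(url) {
  # cookiefile = "" turns on libcurl's in-memory cookie engine
  handle <- getCurlHandle(cookiefile = "", followlocation = TRUE)
  getURL(url, curl = handle)  # first request: may only establish the session
  getURL(url, curl = handle)  # second request: this response is returned
}

# Usage (needs network access, so it is not run here):
# html_text <- fetch_with_session("http://www.sephora.fr/Parfum/Parfum-Femme/C309/2")
```

If the second response is still page 1, the page is probably assembled by JavaScript in the browser, which a plain HTTP client cannot reproduce.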

I tried to understand this website better:

It seems to be built on Oracle Commerce (ATG Commerce).
The "real" URL is hidden: when you click a filter (for example, to select a brand), you get a URL with a requestid.
That does not tell me which selection I made.

Could you please help?
- How can I get more products?
Thanks

Answer

I found the solution: Selenium! I think it is the ultimate tool for web scraping. I posted several questions about web scraping, and now, with RSelenium, almost everything is possible.
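
The answer gives no code, so the following is only a sketch of how RSelenium might be used here. It assumes a local Firefox installation that `rsDriver` can start; the helper name `fetch_rendered_html` and its `load_twice` argument are invented for illustration. The idea is to let a real browser load the page, reloading it once as the question describes, and then hand the rendered HTML to the usual XML functions.

```r
library(RSelenium)  # rsDriver and the remote driver client
library(XML)        # htmlParse, xpathSApply, xmlValue

# Load a URL in a real browser and return the rendered HTML as text.
fetch_rendered_html <- function(url, load_twice = FALSE) {
  driver <- rsDriver(browser = "firefox", verbose = FALSE)
  browser <- driver$client
  # Always close the browser and stop the Selenium server, even on error
  on.exit({
    browser$close()
    driver$server$stop()
  })
  browser$navigate(url)
  if (load_twice) browser$navigate(url)  # mimic entering the URL a second time
  browser$getPageSource()[[1]]
}

# Usage (starts a browser and needs network access, so it is not run here):
# html_text <- fetch_rendered_html("http://www.sephora.fr/Parfum/Parfum-Femme/C309/2",
#                                  load_twice = TRUE)
# doc <- htmlParse(html_text, asText = TRUE)
# xpathSApply(doc, "//div[@class='product-name']", xmlValue)  # hypothetical XPath
```

Because the browser keeps its own cookies and runs the page's JavaScript, this approach does not depend on knowing how the site hides its "real" URLs.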
