Screen scraping issues


Question

I want to do screen scrapping. Currently I am using the web client in C# to get the page source of a web page, but the problem is that I need to press some buttons in order to get the proper data. I can't use Selenium, because Selenium drives a web browser such as Firefox, which shows a visible interface to the end user.
The major problem is that I want to hide the activity performed by Selenium, or by any other third-party component, from the end user during screen scrapping.
Can you suggest an approach?

Recommended answer

Since you mentioned Selenium and a web client, I'll assume you are not actually talking about screen scraping (note that there is only one p in that word). Selenium is a tool that will do that, but obviously one with a user interface. Since you have not stated your ultimate goal, I can't tell whether you really need Selenium. Web/page scraping can be done quite easily in code with the Html Agility Pack. It is a free and very good implementation that I have used myself, and quite a few of our members use it too.
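
To illustrate the Html Agility Pack suggestion, here is a minimal sketch of pulling text out of downloaded markup with an XPath query. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `PageScraper`, the method `ExtractText`, the sample markup, and the XPath expression are all made up for illustration; in practice the HTML string would come from something like `WebClient.DownloadString`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: NuGet package "HtmlAgilityPack" is installed

static class PageScraper
{
    // Hypothetical helper: returns the trimmed text of every node matched by the XPath.
    public static List<string> ExtractText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return results; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode entities such as &amp; before trimming whitespace.
            results.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return results;
    }

    static void Main()
    {
        // Stand-in markup; a real page source would be downloaded first.
        const string html =
            "<html><body><table id='prices'>" +
            "<tr><td>Widget</td><td>4.50</td></tr>" +
            "</table></body></html>";

        foreach (string cell in ExtractText(html, "//table[@id='prices']//td"))
        {
            Console.WriteLine(cell);
        }
    }
}
```

This only parses HTML that is already in hand. It does not run JavaScript or click anything, which is why the next paragraph matters for pages that build their content in script.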

If the pages you are scraping rely on JavaScript to produce the data, you'll probably need to use a hidden WebBrowser control to fully load the page in the background and then operate on the content once it has been properly loaded.
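
A rough sketch of the hidden-control idea follows, assuming a Windows Forms reference is available (Windows only). The control is never added to a form, so nothing is shown to the user. `HiddenBrowser`, `LoadRenderedHtml`, `buttonId`, and `settleMs` are hypothetical names, and the fixed settle delay is a simplification: a real page may need a more reliable signal that its scripts have finished.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class HiddenBrowser
{
    // Hypothetical helper: loads url in an off-screen WebBrowser, optionally clicks
    // the element whose id is buttonId, waits settleMs (must be > 0) for scripts to
    // finish, and returns the rendered HTML.
    public static string LoadRenderedHtml(string url, string buttonId = null, int settleMs = 2000)
    {
        string html = null;

        var worker = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            using (var settle = new System.Windows.Forms.Timer { Interval = settleMs })
            {
                settle.Tick += (s, e) =>
                {
                    settle.Stop();
                    // Prefer the live DOM, which reflects script changes;
                    // fall back to the downloaded source if no <html> element is found.
                    HtmlElementCollection root = browser.Document.GetElementsByTagName("html");
                    html = root.Count > 0 ? root[0].OuterHtml : browser.DocumentText;
                    Application.ExitThread(); // ends the message loop started below
                };

                browser.DocumentCompleted += (s, e) =>
                {
                    // The event fires once per frame; act only when the whole page
                    // is complete and the settle timer is not already running.
                    if (browser.ReadyState != WebBrowserReadyState.Complete || settle.Enabled)
                    {
                        return;
                    }
                    if (buttonId != null)
                    {
                        HtmlElement button = browser.Document.GetElementById(buttonId);
                        if (button != null)
                        {
                            button.InvokeMember("click"); // simulate the button press
                        }
                    }
                    settle.Start();
                };

                browser.Navigate(url);
                Application.Run(); // message loop required by the control
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();
        return html;
    }
}
```

The returned string can then be handed to an HTML parser such as the Html Agility Pack. Running the control on its own STA thread keeps it away from the application's visible UI.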

Regards,

— Manfred

