Why is InputStreamReader returning different content than the browser?

Question

If you enter this URL in a browser:

it returns a lot of data. But if I try to capture that data with an InputStreamReader, the only data returned is

{"retHTML":"", "rlist":""}

Here is the program:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

List<Property> scrapePropertyInfo(List<Date> auctionDates) {
    List<Property> properties = new ArrayList<>();
    // auctionDates is not used yet; the auction date is hard-coded in the URL below
    String urlStr = "https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&AUCTIONDATE=07/16/2019";
    // try-with-resources closes the reader even if reading throws
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new URL(urlStr).openStream(), StandardCharsets.UTF_8))) {
        StringBuilder stringBuilder = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            stringBuilder.append(line);
        }
        System.out.println("Url: " + urlStr);
        System.out.println(stringBuilder);
    } catch (IOException ex) { // also covers MalformedURLException
        Logger.getLogger(CharlotteCtyFL.class.getName()).log(Level.SEVERE, null, ex);
    }
    return properties;
}

Does anyone know why?

A little smarter now: apparently more has to be sent to the server than just the URL. Since this is dynamic AJAX data that is only populated if you ask for it the way the original web page does, I need to simulate that request in Java.

I discovered how to get that information in the Chrome F12 developer tools. Under Network -> XHR -> Preview, click each item until you see the expected data. Then right-click it and select Copy -> Copy Request Headers.
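Text copied with "Copy Request Headers" is a request line followed by `Name: value` lines. A minimal sketch, assuming that layout, that turns the copied text into a map of headers a Java client can reuse. The class name `CopiedHeaders` and the choice of headers to skip are illustrative assumptions, not part of the original question.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class CopiedHeaders {

    // Headers not worth copying verbatim (an assumption, not a site rule):
    // Host, Connection and Content-Length are restricted headers that
    // HttpURLConnection manages itself, and "br" (Brotli) in Accept-Encoding
    // cannot be decoded by the JDK.
    private static final Set<String> SKIP =
            Set.of("host", "connection", "content-length", "accept-encoding");

    /** Turns text from DevTools' "Copy Request Headers" into an ordered name -> value map. */
    public static Map<String, String> parse(String copied) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String rawLine : copied.split("\\R")) {
            String line = rawLine.trim();
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // blank line, or a line with no "Name:" prefix
            }
            String name = line.substring(0, colon).trim();
            if (name.contains(" ")) {
                continue; // the request line ("GET /path HTTP/1.1") is not a header
            }
            if (!SKIP.contains(name.toLowerCase(Locale.ROOT))) {
                headers.put(name, line.substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```

Keeping the headers in a `LinkedHashMap` preserves the browser's order, which makes it easy to compare the Java request with the one DevTools shows.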

Here is what was copied:

GET /index.cfm?zaction=AUCTION&Zmethod=UPDATE&FNC=LOAD&AREA=W&PageDir=0&doR=1&tx=1563231065712&bypassPage=1&test=1&_=1563231065712 HTTP/1.1
Host: charlotte.realforeclose.com
Connection: keep-alive
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Origin: http://evil.com/
Referer: https://charlotte.realforeclose.com/index.cfm?zaction=AUCTION&Zmethod=PREVIEW&AUCTIONDATE=07/16/2019
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: cfid=6f228aa1-bb7e-4734-92ff-39eabf23ed9b; cftoken=0; CF_CLIENT_CHARLOTTE_REALFORECLOSE_TC=1563229207612; AWSELB=E7779D5F1C1F6ABE3513A5C5B6B0C754520B66675A407900314ABAC5333A52E93FD1A8D7401D89BC8D5E8B98059C8AAC5507D12A2C6ED07F7E7CB77311BD7FB09B738DB945; _ga=GA1.2.1823487290.1563231012; _gid=GA1.2.1418453663.1563231012; _gat=1; _gcl_au=1.1.273755450.1563231013; __utma=65865852.1823487290.1563231012.1563231014.1563231014.1; __utmc=65865852; __utmz=65865852.1563231014.1.1.utmcsr=realauction.com|utmccn=(referral)|utmcmd=referral|utmcct=/client-sites; __utmt_UA-51657054-1=1; __utmb=65865852.2.10.1563231014; testcookiesenabled=enabled; CF_CLIENT_CHARLOTTE_REALFORECLOSE_LV=1563231067363; CF_CLIENT_CHARLOTTE_REALFORECLOSE_HC=73
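The `Cookie` line carries per-session identifiers (`cfid`/`cftoken` are ColdFusion session cookies, `AWSELB` is an AWS load-balancer stickiness cookie), so values pasted from one browser session will likely stop working. A sketch of an alternative, assuming (unverified) that the site issues these cookies when the auction preview page named in `Referer` is loaded: install a `java.net.CookieManager` so `HttpURLConnection` stores and resends cookies like a browser, load the preview page first, then request the AJAX URL. `SessionFetch` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SessionFetch {

    /** Installs a JVM-wide cookie store; HttpURLConnection then keeps cookies between requests. */
    public static CookieManager enableCookies() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** GET that sends whatever cookies earlier responses set, plus AJAX-style headers. */
    public static String get(String url, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        if (referer != null) {
            conn.setRequestProperty("Referer", referer);
        }
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

Under that assumption the flow would be: call `enableCookies()` once, `get(previewUrl, null)` to pick up a fresh session, then `get(ajaxUrl, previewUrl)`. Requires Java 9+ for `readAllBytes`.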

Now, how do I get that into the request from Java? I know how to do it in JavaScript, but not in Java.
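One way to do this with the standard library is to open an `HttpURLConnection` instead of calling `url.openStream()`, and copy each browser header onto it with `setRequestProperty`. A minimal sketch; `AjaxFetch` is a hypothetical name, and which headers the server actually checks is not known from the question. Because `HttpURLConnection` does not decompress responses, the sketch unwraps gzip when the server reports `Content-Encoding: gzip`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class AjaxFetch {

    /** Sends a GET carrying the given headers and returns the response body as a String. */
    public static String get(String url, Map<String, String> headers) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // Copy each browser header (X-Requested-With, Referer, User-Agent, ...) onto the request
        for (Map.Entry<String, String> h : headers.entrySet()) {
            conn.setRequestProperty(h.getKey(), h.getValue());
        }
        // Decompress only if the server says it used gzip
        try (InputStream body = "gzip".equalsIgnoreCase(conn.getContentEncoding())
                ? new GZIPInputStream(conn.getInputStream())
                : conn.getInputStream()) {
            return new String(body.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

For the question's URL, a call might pass `Map.of("X-Requested-With", "XMLHttpRequest", "Referer", refererUrl, "User-Agent", browserUserAgent)`, where `refererUrl` and `browserUserAgent` hold the values copied from DevTools. Requires Java 9+.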

Answer

Actually, I opened your URL in the browser and got

{"retHTML":"", "rlist":""}

Then I wrote my own code, similar to yours, and got the same string in response. So for me, the browser and the Java code fetched the same data. It is easy to see, though, why that need not always be the case: the server can inspect each request to detect whether the client is a browser, which kind of browser it is, and where the request was sent from, and it can send back a customized response based on those details.
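To make that point concrete, here is a self-contained local demo in which a hypothetical server returns the empty JSON from the question unless the request looks like a browser's AJAX call. The rule in `respond` is invented for illustration; the real site's logic is unknown. It relies on the fact that a plain `HttpURLConnection` sends a `User-Agent` of the form `Java/<version>` and no `X-Requested-With` header.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HeaderSniffingDemo {

    // Hypothetical rule standing in for whatever the real server does
    public static String respond(String userAgent, String requestedWith) {
        boolean looksLikeBrowser = userAgent != null && userAgent.startsWith("Mozilla/");
        boolean isAjax = "XMLHttpRequest".equals(requestedWith);
        return (looksLikeBrowser && isAjax)
                ? "{\"retHTML\":\"<div>sample row</div>\", \"rlist\":\"sample\"}"
                : "{\"retHTML\":\"\", \"rlist\":\"\"}";
    }

    /** Starts a local server on a free port that answers every path via respond(). */
    public static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String body = respond(exchange.getRequestHeaders().getFirst("User-Agent"),
                    exchange.getRequestHeaders().getFirst("X-Requested-With"));
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(bytes);
            }
        });
        server.start();
        return server;
    }

    /** GET with optional headers; null means "keep HttpURLConnection's default". */
    public static String fetch(String url, String userAgent, String requestedWith) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        if (userAgent != null) conn.setRequestProperty("User-Agent", userAgent);
        if (requestedWith != null) conn.setRequestProperty("X-Requested-With", requestedWith);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/";
            // Plain Java request: default "Java/<version>" User-Agent, no X-Requested-With
            System.out.println(fetch(url, null, null));
            // Same URL with browser-like AJAX headers
            System.out.println(fetch(url, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "XMLHttpRequest"));
        } finally {
            server.stop(0);
        }
    }
}
```

Running `main` prints the empty payload first and the sample payload second: same URL, different answers, decided purely by request headers.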