在Xamarin中等待带有HtmlAgilityPack的AJAX [英] Await AJAX with HtmlAgilityPack in Xamarin

查看:55
本文介绍了在Xamarin中等待带有HtmlAgilityPack的AJAX的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个似乎以前曾被问过的问题,但有一点不同.我正在尝试从此网站抓取数据,但问题是好像它已加载了AJAX.因此,我的应用程序无法在我正在寻找的HTML中找到ID和类.

I have a question that seems to have been asked before, but is a bit different. I'm trying to scrape data from this website but the problem is that is seems like it's loaded with AJAX. Because of that my application is unable to find the id's and classes in the HTML that I'm looking for.

您可以通过检查元素或查看源来重现此内容.在查看源代码的同时,与查看元素相比,发现的内容要少得多.

You can reproduce this by inspecting an element or viewing the source. Whilst viewing the source I'm seeing a lot less than whilst inspecting an element.

我认为我可以通过按F12进入网络选项卡并选择XHR来查找包含AJAX的文件以加载此html,但是我找不到它.

I thought that I could track down the file that contains the AJAX to load this html by pressing F12, going to the network tab and selecting XHR, but I'm unable to find it.

我的问题是:如何检索此数据或找出什么文件 用来收集数据吗?

My question is: how do I retrieve this data or find out what file is used to collect the data?

我的代码示例(我找不到Timetable_toolbar_elementSelect_popup0):

An example of my code (I'm unable to find the Timetable_toolbar_elementSelect_popup0):

private async Task GetHtmlDocument(string url)
        {
            HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(url);
            //request.Credentials = new LoginCredentials().Credentials;

            try
            {
                WebResponse myResponse = await request.GetResponseAsync();
                HtmlDocument htmlDoc = new HtmlDocument();
                htmlDoc.OptionFixNestedTags = true;
                htmlDoc.Load(myResponse.GetResponseStream());
                var test = htmlDoc.GetElementbyId("Timetable_toolbar_elementSelect_popup0");
            }
            catch (Exception e)
            {
            }
        }

推荐答案

使用Webrequest调用ajax方法的解决方案.

所以我很无聊,并且想出了大部分.下面缺少的是如何通过ID识别Klase.下面的示例将提取klase'1GLD'.我们之所以需要Cookie是为了让请求知道我们正在从哪个学校获取Klase.同样,下面的代码仅返回JSON,而不返回HTML,因为它是我们称为ajax的方法.

Solution where you call the ajax method using a webrequest.

So I got bored and figured most of it out. What is missing below is how to identify the Klase by id. The below example will fetch the klase '1GLD'. The reason why we need cookies is in order for the request to know which school we are fetching the Klase from. Also the below code only returns JSON - and not HTML since it is an ajax method we call.

CookieContainer cookies = new CookieContainer();
try
{
    string webAddr = "https://roosters.windesheim.nl/";
    var httpWebRequest = (HttpWebRequest)WebRequest.Create(webAddr);
    httpWebRequest.ContentType = "application/json; charset=utf-8";
    httpWebRequest.Method = "POST";
    httpWebRequest.CookieContainer = cookies;        
    httpWebRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    httpWebRequest.Headers.Add("X-Requested-With", "XMLHttpRequest");

    var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
    using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
    {
        cookies.Add(httpWebRequest.CookieContainer.GetCookies(httpWebRequest.RequestUri));
    }
}
catch (WebException ex)
{
    Console.WriteLine(ex.Message);
}

//According to my web debugger the cookie will last until the 10th of December. So need to fix a new cookie until then.
//I noticed the url used unixtimestamps at the end of the url. So we just add the unixtimestamp at the end for each request.
long unixTimeStamp = new DateTimeOffset(DateTime.Now).ToUnixTimeMilliseconds() - 100;

//we are now ready to call the ajax method and get the JSON.
try
{
    string webAddr = "https://roosters.windesheim.nl/WebUntis/Timetable.do?request.preventCache="+unixTimeStamp.ToString();
    var httpWebRequest = (HttpWebRequest)WebRequest.Create(webAddr);
    httpWebRequest.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
    httpWebRequest.Method = "POST";
    httpWebRequest.CookieContainer = cookies;
    httpWebRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    httpWebRequest.Headers.Add("X-Requested-With", "XMLHttpRequest");

    using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
    {
        string json = "ajaxCommand=getWeeklyTimetable&elementType=1&elementId=13090&date=20171126&formatId=7&departmentId=0&filterId=-2";

        //The command below will return a JSON datastructure containing all the klases and their relevant ID.
        //string otherJson = "ajaxCommand=getPageConfig&type=1&filter=-2"


        streamWriter.Write(json);
        streamWriter.Flush();
    }


    var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
    using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
    {
        var responseText = streamReader.ReadToEnd();
        //THE RESULTS GETS PRINTED HERE.
        Console.Write(responseText);
    }
}
catch (WebException ex)
{
    Console.WriteLine(ex.Message);
}

带有Firefox驱动程序的Selenium的其他解决方案.

这是更容易做到的方式.但是还需要一些时间并非所有线程睡眠都是必需的.就像您所要求的那样,这将使HTML可以与istead一起使用.但是我发现在最后一个foreach循环中有必要.

Other solution with Selenium with Firefox driver.

This is way easier to do. but it also takes some time. Not all the thread sleeps are necessary. This will give an HTML to work with isntead just like you requested. But I found it necessary in the last foreach loop.

public static void Main(string[] args)
{
    HtmlDocument doc = new HtmlDocument();
    //According to my web debugger the cookie will last until the 10th of December. So need to fix a new cookie until then.
    //I noticed the url used unixtimestamps at the end of the url. So we just add the unixtimestamp at the end for each request.
    long unixTimeStamp = new DateTimeOffset(DateTime.Now).ToUnixTimeMilliseconds() - 100;
    string webAddr = "https://roosters.windesheim.nl/WebUntis/Timetable.do?request.preventCache="+unixTimeStamp.ToString();
    var ffOptions = new FirefoxOptions();
    ffOptions.BrowserExecutableLocation = @"C:\Program Files (x86)\Mozilla Firefox\firefox.exe";
    ffOptions.LogLevel = FirefoxDriverLogLevel.Default;
    ffOptions.Profile = new FirefoxProfile { AcceptUntrustedCertificates = true };
    var service = FirefoxDriverService.CreateDefaultService();

    var driver = new FirefoxDriver(service, ffOptions, TimeSpan.FromSeconds(120));


    driver.Navigate().GoToUrl(webAddr);


    driver.FindElement(By.XPath("//input[@id='school']")).SendKeys("Windesheim"+Keys.Enter);
    Thread.Sleep(2000);
    driver.FindElement(By.XPath("//span[@id='dijit_PopupMenuBarItem_0_text' and text() ='Lesrooster']")).Click();

    driver.FindElement(By.XPath("//td[@id='dijit_MenuItem_0_text' and text() ='Klassen']")).Click();
    Thread.Sleep(2000);

    driver.FindElement(By.XPath("//div[@id='widget_Timetable_toolbar_elementSelect']//input[@class='dijitReset dijitInputField dijitArrowButtonInner']")).Click();

    //we get all the options for Klase
    doc.LoadHtml(driver.PageSource);
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@id='Timetable_toolbar_elementSelect_popup']/div[@item]");
    List<String> options = new List<String>();
    foreach (HtmlNode n in nodes)
    {
        options.Add(n.InnerText);
    }

    foreach(string s in options)
    {
        driver.FindElement(By.XPath("//input[@id='Timetable_toolbar_elementSelect']")).Clear();
        driver.FindElement(By.XPath("//input[@id='Timetable_toolbar_elementSelect']")).SendKeys(s);
        Thread.Sleep(2000);
        driver.FindElement(By.XPath("//body")).SendKeys(Keys.Enter);
        Thread.Sleep(2000);
        doc.LoadHtml(driver.PageSource);
        //Console.WriteLine(driver.Url); //Now we can see the id of the current Klase
    }

    Console.WriteLine(doc.DocumentNode.InnerHtml);

    Console.ReadKey();
}

最新更新

使用Selenium解决方案,我能够获得所有课程的ID.我已经在文件此处添加了文件,因此您可以将其与ajax和Web请求一起使用.

Last update

Using the Selenium solution I was able to get the ID's for all courses. I have included the file here so you can use it with your ajax and web requests.

这篇关于在Xamarin中等待带有HtmlAgilityPack的AJAX的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆