How to Loop calls to Pagination URL in C# HttpClient to download all Pages from JSON results

Problem Description

My 1st question, so please be kind... :)

I'm using the C# HttpClient to invoke a Jobs API endpoint.

Here is the endpoint: Jobs API Endpoint (no key is needed, you can just click it).

This gives me JSON like so.

{
  "count": 1117,
  "firstDocument": 1,
  "lastDocument": 50,
  "nextUrl": "\/api\/rest\/jobsearch\/v1\/simple.json?areacode=&country=&state=&skill=ruby&city=&text=&ip=&diceid=&page=2",
  "resultItemList": [
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/90887031\/918715?src=19",
      "jobTitle": "Sr Security Engineer",
      "company": "Accelon Inc",
      "location": "San Francisco, CA",
      "date": "2017-03-30"
    },
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/cybercod\/BB7-13647094?src=19",
      "jobTitle": "Platform Engineer - Ruby on Rails, AWS",
      "company": "CyberCoders",
      "location": "New York, NY",
      "date": "2017-04-16"
    }
  ]
}

I've pasted a complete JSON snippet so you can use it in your answer; the full results are too long to include here.

Here are the C# classes.

using Newtonsoft.Json;
using System.Collections.Generic;

namespace MyNameSpace
{
    public class DiceApiJobWrapper
    {
        public int count { get; set; }
        public int firstDocument { get; set; }
        public int lastDocument { get; set; }
        public string nextUrl { get; set; }

        [JsonProperty("resultItemList")]
        public List<DiceApiJob> DiceApiJobs { get; set; }
    }

    public class DiceApiJob
    {
        public string detailUrl { get; set; }
        public string jobTitle { get; set; }
        public string company { get; set; }
        public string location { get; set; }
        public string date { get; set; }
    }
}
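
As a quick sanity check (this snippet is not part of the original post; the JsonMappingCheck name is only illustrative), the pasted JSON can be deserialized straight into the classes above with JSON.NET:

using System;
using MyNameSpace;
using Newtonsoft.Json;

public static class JsonMappingCheck
{
    // jsonText is assumed to contain the sample payload pasted above.
    public static void Print(string jsonText)
    {
        var wrapper = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonText);

        // "resultItemList" lands in DiceApiJobs because of the [JsonProperty] attribute.
        Console.WriteLine($"count={wrapper.count}, nextUrl={wrapper.nextUrl}");
        foreach (var job in wrapper.DiceApiJobs)
            Console.WriteLine($"{job.date}  {job.jobTitle} @ {job.company} ({job.location})");
    }
}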

When I invoke the URL using HttpClient and deserialize using JSON.NET, I do get the data back properly.

Here's the code I am calling from my Console App's Main method (hence the static list, I think this could be better refactored??)

    private static List<DiceApiJob> GetDiceJobs()
    {
        HttpClient httpClient = new HttpClient();
        var jobs = new List<DiceApiJob>();

        var task = httpClient.GetAsync("http://service.dice.com/api/rest/jobsearch/v1/simple.json?skill=ruby")
          .ContinueWith((taskwithresponse) =>
          {
              var response = taskwithresponse.Result;
              var jsonString = response.Content.ReadAsStringAsync();
              jsonString.Wait();

              var result =  JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString.Result);
              if (result != null)
              {
                  if (result.DiceApiJobs.Any())
                      jobs = result.DiceApiJobs.ToList();

                  if (result.nextUrl != null)
                  {
                      //
                      // do this GetDiceJobs again in a loop? How?? Any other efficient elegant way??
                  }
              }
          });
        task.Wait();

        return jobs;
    }

But now, how do I check if there are more jobs using the nextUrl field? I know I can check to see if it's not null; if it isn't, that means there are more jobs to pull down.
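
One way that check could look (a rough sketch, not the accepted answer below; it assumes the same host as the code above, since nextUrl comes back as a relative path):

// Illustrative only: resolve the relative nextUrl against the API host.
var baseUri = new Uri("http://service.dice.com");

if (!string.IsNullOrEmpty(result.nextUrl))
{
    // e.g. "/api/rest/jobsearch/v1/simple.json?...&page=2" becomes an absolute URL
    var nextPageUri = new Uri(baseUri, result.nextUrl);
    // request nextPageUri, deserialize again, and keep going while nextUrl is set
}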

[Screenshot: results from debugging and stepping through]

How do I do this recursively, without hanging, and with some delays so I don't exceed the API limits? I think I have to use the TPL (Task Parallel Library), but am quite baffled.

Thank you! ~Sean

Recommended Answer

If you are concerned about the response time of your app and would like to return some results before you have actually fetched all pages of data from the API, you could run your process in a loop and give it a callback method to execute as it gets each page of data from the API.

Here is a sample:

using MyNameSpace;
using Newtonsoft.Json;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public class Program
{
    public static void Main(string[] args)
    {
        var jobs = GetDiceJobsAsync(Program.ResultCallBack).Result;
        Console.WriteLine($"\nAll {jobs.Count} jobs displayed");
        Console.ReadLine();
    }

    private static async Task<List<DiceApiJob>> GetDiceJobsAsync(Action<DiceApiJobWrapper> callBack = null)
    {
        var jobs = new List<DiceApiJob>();
        HttpClient httpClient = new HttpClient();
        httpClient.BaseAddress = new Uri("http://service.dice.com");
        var nextUrl = "/api/rest/jobsearch/v1/simple.json?skill=ruby";

        do
        {
            var response = await httpClient.GetAsync(nextUrl);
            if (response.IsSuccessStatusCode)
            {
                string jsonString = await response.Content.ReadAsStringAsync();
                var result = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString);
                if (result != null)
                {
                    // Build the full list to return later after the loop.
                    if (result.DiceApiJobs.Any())
                        jobs.AddRange(result.DiceApiJobs);

                    // Run the callback method, passing the current page of data from the API.
                    if (callBack != null)
                        callBack(result);

                    // Get the URL for the next page; an empty value ends the loop.
                    nextUrl = result.nextUrl ?? string.Empty;
                }
                else
                {
                    // End loop if the payload could not be deserialized.
                    nextUrl = string.Empty;
                }
            }
            else
            {
                // End loop if we get an error response.
                nextUrl = string.Empty;
            }

        } while (!string.IsNullOrEmpty(nextUrl));
        return jobs;
    }


    private static void ResultCallBack(DiceApiJobWrapper jobSearchResult)
    {
        if (jobSearchResult != null && jobSearchResult.count > 0)
        {
            Console.WriteLine($"\nDisplaying jobs {jobSearchResult.firstDocument} to {jobSearchResult.lastDocument}");
            foreach (var job in jobSearchResult.DiceApiJobs)
            {
                Console.WriteLine(job.jobTitle);
                Console.WriteLine(job.company);
            }
        }
    }
}

Note that the above sample allows the callback method to access each page of data as it is received by the GetDiceJobsAsync method. In this case, the console displays each page as it becomes available. If you do not want the callback option, you can simply pass nothing to GetDiceJobsAsync.

But GetDiceJobsAsync also returns all the jobs when it completes, so you can choose to act on the whole list at the end of GetDiceJobsAsync.
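
For example, a usage sketch (not part of the original answer) that skips the callback and works with the complete list:

// Usage sketch: no callback, act on the complete list once every page has been fetched.
var allJobs = GetDiceJobsAsync().Result;
Console.WriteLine($"Total jobs fetched: {allJobs.Count}");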

As for reaching API limits, you can insert a small delay within the loop, right before you repeat it. But when I tried it, I did not encounter the API limiting my requests, so I did not include a delay in the sample.
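
If you do want one, here is a minimal sketch of where such a delay could sit inside the do/while above (the 500 ms value is an assumption, not something taken from the API documentation):

do
{
    // ... fetch and process the current page exactly as shown above ...

    // Illustrative throttle: wait a little before requesting the next page.
    if (!string.IsNullOrEmpty(nextUrl))
        await Task.Delay(TimeSpan.FromMilliseconds(500));

} while (!string.IsNullOrEmpty(nextUrl));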
