Wait till the last file is downloaded

Problem description

I have code that downloads PDF files. The problem is that my next task starts executing before the download of the last file has finished. After my current code runs, the last file is about 650 MB when it should be 1300 MB. It also cannot be opened, because it is not fully downloaded and is therefore corrupt.

The process cannot access the file because it is being used by another process.

How can I make sure the file is fully downloaded?

            HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com/");

            // Thread.Sleep(5000); // wait some time

            HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
            foreach (HtmlNode src in ProductListPage)
            {
                htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);

                // Thread.Sleep(5000); // wait some time

                HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
                if (LinkTester != null)
                {
                    foreach (var dllink in LinkTester)
                    {
                        string LinkURL = dllink.Attributes["href"].Value;
                        Console.WriteLine(LinkURL);

                        string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
                        var DLClient = new WebClient();

                        // Thread.Sleep(5000); // wait some time

                        DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
                    }
                }
            }

My next step is to rename the downloaded files:

    var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
    // string prefix = "SomePrefix";
    foreach (var file in files)
    {
        string newFileName = Path.Combine(Path.GetDirectoryName(file), file.Replace("-", " "));
        File.Move(file, newFileName);
    }

Renaming goes smoothly until it reaches the last file, which is not completely downloaded, and that is where I get the error.

I added Thread.Sleep(5000); // wait some time between these two steps, but that is probably not the best solution: the current waiting time is not long enough, and the time needed changes with the internet connection.

Here is the complete code:

using System;
using System.Net;
using HtmlAgilityPack;
using System.IO;
using System.Threading;


namespace Crawler
{

    class Program
    {
        static void Main(string[] args)
        {

            {
                HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com");

                // Thread.Sleep(5000); // wait some time

                HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
                foreach (HtmlNode src in ProductListPage)
                {
                    htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);

                    // Thread.Sleep(5000); // wait some time

                    HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
                    if (LinkTester != null)
                    {
                        foreach (var dllink in LinkTester)
                        {
                            string LinkURL = dllink.Attributes["href"].Value;
                            Console.WriteLine(LinkURL);

                            string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf("/"));
                            var DLClient = new WebClient();

                            // Thread.Sleep(5000); // wait some time

                            DLClient.DownloadFileAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);
                        }
                    }
                }
            }

            Thread.Sleep(5000); // wait some time

            var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
            // string prefix = "SomePrefix";
            foreach (var file in files)
            {
                string newFileName = Path.Combine(Path.GetDirectoryName(file), file.Replace("-", " "));
                File.Move(file, newFileName);
            }


        }


    }

}

Recommended answer

You most certainly do not want to use WebClient.DownloadFileAsync, but rather its newer successor, WebClient.DownloadFileTaskAsync. It would be used like this:

    await DLClient.DownloadFileTaskAsync(new Uri(LinkURL), @"C:\temp\" + ExtractFilename);

This is an asynchronous operation, so your calling method needs to be async as well. By awaiting it, you make sure that your program continues only after the download has completed (or failed).
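To make the "calling method must be async" point concrete, here is a minimal sketch of how the download and rename steps could fit together. It is only an illustration under stated assumptions: the class name DownloadThenRename, the helpers GetTargetPath, DownloadAllAsync and RenameAll, and the placeholder link list are all hypothetical and not part of the original code, and an async Main requires C# 7.1 or later. Unlike the question's code, it takes the file name without the leading slash and replaces hyphens only in the file name, not in the whole path.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading.Tasks;

static class DownloadThenRename
{
    // Hypothetical helper: maps a download URL to a local path under targetDir.
    public static string GetTargetPath(string linkUrl, string targetDir)
    {
        // Path.GetFileName keeps only the part after the last '/',
        // so no stray separator ends up in the local path.
        string fileName = Path.GetFileName(new Uri(linkUrl).LocalPath);
        return Path.Combine(targetDir, fileName);
    }

    // Downloads the links one after another; the returned Task completes
    // only after the last file has been written (or a download has thrown).
    public static async Task DownloadAllAsync(IEnumerable<string> linkUrls, string targetDir)
    {
        foreach (string linkUrl in linkUrls)
        {
            using (var client = new WebClient())
            {
                // Execution resumes here only when this file is complete.
                await client.DownloadFileTaskAsync(new Uri(linkUrl), GetTargetPath(linkUrl, targetDir));
            }
        }
    }

    // Replaces '-' with ' ' in the name of every PDF in targetDir.
    public static void RenameAll(string targetDir)
    {
        foreach (string file in Directory.GetFiles(targetDir, "*.pdf"))
        {
            string newPath = Path.Combine(targetDir, Path.GetFileName(file).Replace("-", " "));
            if (newPath != file)
            {
                File.Move(file, newPath);
            }
        }
    }

    static async Task Main()
    {
        // Placeholder list; in the question these URLs come from the scraped pages.
        var links = new List<string> { "http://example.com/files/some-file.pdf" };
        string targetDir = @"C:\temp";

        await DownloadAllAsync(links, targetDir); // no Thread.Sleep needed
        RenameAll(targetDir);                     // runs only after all downloads finished
    }
}
```

Because every download is awaited before the loop moves on, the rename step can no longer run into a file that is still being written. If the downloads should run concurrently instead, one option is to collect the tasks and await them together with Task.WhenAll before renaming.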
