Wait till the last file is downloaded

Posted: 2021/5/15 18:36:37 | Tags: c#, web-scraping, web-crawler, html-agility-pack

Problem description

I have code that downloads PDF files. The problem is that my next task starts while the last file is still downloading. After my current code has run, the last file is only about 650 MB when it should be about 1300 MB. It also cannot be opened, because it was not fully downloaded and is therefore corrupt.

The process cannot access the file because it is being used by another process.

How can I make sure the file has finished downloading?

            HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com/");

            // Thread.Sleep(5000); // wait some time

            HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
            foreach (HtmlNode src in ProductListPage)
            {
                htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);

                // Thread.Sleep(5000); // wait some time

                HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
                if (LinkTester != null)
                {
                    foreach (var dllink in LinkTester)
                    {
                        string LinkURL = dllink.Attributes["href"].Value;
                        Console.WriteLine(LinkURL);

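                        // File name = everything after the last '/' (no leading slash)
                        string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf('/') + 1);
                        var DLClient = new WebClient();

                        // Thread.Sleep(5000); // wait some time

                        // DownloadFileAsync only starts the download and returns immediately;
                        // it does not wait for the file to be completely written
                        DLClient.DownloadFileAsync(new Uri(LinkURL), Path.Combine(@"C:\temp\", ExtractFilename));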
                    }
                }
            }
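The loop above starts every transfer with `DownloadFileAsync`, which returns as soon as the download has begun, so nothing waits for the files. A minimal sketch of one common alternative, assuming the same `WebClient` API is kept: hold on to the `Task` returned by `WebClient.DownloadFileTaskAsync` for each link and wait on `Task.WhenAll` before the rename step runs. `DownloadHelper` and `DownloadAllAsync` are hypothetical names, and the sketch assumes the last path segment of each URL is a usable file name. `WebClient` is marked obsolete from .NET 6 on (in favor of `HttpClient`) but still works.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code.
public static class DownloadHelper
{
    // Starts one download per URL and returns a Task that completes only
    // after every file has been completely written to targetDir.
    public static Task DownloadAllAsync(IEnumerable<string> urls, string targetDir)
    {
        List<Task> downloads = urls
            .Select(url => DownloadOneAsync(new Uri(url), targetDir))
            .ToList();

        // WhenAll completes when all downloads have finished; if any of them
        // failed, awaiting (or Wait()-ing) the returned Task surfaces the error.
        return Task.WhenAll(downloads);
    }

    private static async Task DownloadOneAsync(Uri uri, string targetDir)
    {
        // Assumption: the last path segment (e.g. "a-b.pdf") is the file name.
        string target = Path.Combine(targetDir, Path.GetFileName(uri.LocalPath));

        // One WebClient per file: a single WebClient instance does not allow
        // concurrent operations. It is disposed only after its download ends.
        using (var client = new WebClient())
        {
            await client.DownloadFileTaskAsync(uri, target);
        }
    }
}
```

In the question's `Main`, the links would be collected into a list first; `DownloadHelper.DownloadAllAsync(links, @"C:\temp").Wait();` (or `await` inside an `async Task Main`, C# 7.1 or later) then blocks until the last file is on disk, and only after that does the rename loop start.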

My next step is to rename the downloaded files:

    var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
    // string prefix = "SomePrefix";
    foreach (var file in files)
    {
        // Replace dashes in the file name only, not in the directory part
        string newFileName = Path.Combine(Path.GetDirectoryName(file), Path.GetFileName(file).Replace("-", " "));
        File.Move(file, newFileName);
    }
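A minimal sketch of the rename step as a separate helper, assuming it is called only after every download into the folder has completed. `RenameHelper`, `RenamedPath`, and `RenameAll` are hypothetical names. The dash replacement is applied to the file name only, so a dash in the directory part of the path is left alone, and files whose names would not change are skipped instead of being moved onto themselves.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class RenameHelper
{
    // Same directory; dashes replaced by spaces in the file name only.
    public static string RenamedPath(string path)
    {
        string dir = Path.GetDirectoryName(path) ?? string.Empty;
        string name = Path.GetFileName(path).Replace("-", " ");
        return Path.Combine(dir, name);
    }

    // Renames every *.pdf in dir and returns how many files were moved.
    // Assumption: all downloads into dir have already finished.
    public static int RenameAll(string dir)
    {
        int moved = 0;
        foreach (string file in Directory.GetFiles(dir, "*.pdf"))
        {
            string target = RenamedPath(file);
            if (target == file) continue; // no dash in the name: nothing to do
            File.Move(file, target);
            moved++;
        }
        return moved;
    }
}
```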

The renaming runs smoothly until it reaches the last file, which has not finished downloading; that is where I get the error.

I added Thread.Sleep(5000); // wait some time between these two steps, but that is probably not the best solution: the current wait is not long enough, and the time needed can change with the internet connection.
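Instead of a fixed `Thread.Sleep`, the wait can be tied to the downloads themselves. If the code keeps `DownloadFileAsync`, one option is to count outstanding downloads with a `CountdownEvent` and signal it from each client's `DownloadFileCompleted` event, so the wait lasts exactly as long as the slowest download, whatever the connection speed. A minimal sketch follows; `EventDownloader` and `DownloadAllAndWait` are hypothetical names. It assumes a console app like the one in the question, where `DownloadFileCompleted` is raised on a thread-pool thread. In a UI app (WinForms/WPF) the event is posted back to the UI thread, so blocking that thread with `Wait()` would deadlock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Threading;

// Hypothetical helper, not part of the original code.
public static class EventDownloader
{
    // Starts every download with DownloadFileAsync, then blocks until each one
    // has raised DownloadFileCompleted (successfully or with an error).
    public static void DownloadAllAndWait(IEnumerable<string> urls, string targetDir)
    {
        // Parse every URL first, so a bad URL fails before any download starts.
        List<Uri> uris = urls.Select(u => new Uri(u)).ToList();
        var errors = new ConcurrentQueue<Exception>();

        using (var pending = new CountdownEvent(uris.Count))
        {
            foreach (Uri uri in uris)
            {
                var client = new WebClient();
                client.DownloadFileCompleted += (sender, e) =>
                {
                    if (e.Error != null) errors.Enqueue(e.Error);
                    client.Dispose();  // safe now: this client's download has ended
                    pending.Signal();  // one fewer download outstanding
                };

                try
                {
                    // Assumption: the last path segment is the file name.
                    string target = Path.Combine(targetDir, Path.GetFileName(uri.LocalPath));
                    client.DownloadFileAsync(uri, target);
                }
                catch (Exception ex)
                {
                    // Failed before the download started, so no Completed event follows.
                    errors.Enqueue(ex);
                    client.Dispose();
                    pending.Signal();
                }
            }

            // Returns only after every download has signalled.
            pending.Wait();
        }

        if (!errors.IsEmpty) throw new AggregateException(errors);
    }
}
```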

Here is the full code:

    using System;
    using System.Net;
    using HtmlAgilityPack;
    using System.IO;
    using System.Threading;

    namespace Crawler
    {
        class Program
        {
            static void Main(string[] args)
            {
                {
                    HtmlDocument htmlDoc = new HtmlWeb().Load("http://example.com");

                    // Thread.Sleep(5000); // wait some time

                    HtmlNodeCollection ProductListPage = htmlDoc.DocumentNode.SelectNodes("//div[@class='productContain padb6']//div[@class='large-4 medium-4 columns']/a");
                    foreach (HtmlNode src in ProductListPage)
                    {
                        htmlDoc = new HtmlWeb().Load(src.Attributes["href"].Value);

                        // Thread.Sleep(5000); // wait some time

                        HtmlNodeCollection LinkTester = htmlDoc.DocumentNode.SelectNodes("//div[@class='row padt6 padb4']//a");
                        if (LinkTester != null)
                        {
                            foreach (var dllink in LinkTester)
                            {
                                string LinkURL = dllink.Attributes["href"].Value;
                                Console.WriteLine(LinkURL);

                                // File name = everything after the last '/' (no leading slash)
                                string ExtractFilename = LinkURL.Substring(LinkURL.LastIndexOf('/') + 1);
                                var DLClient = new WebClient();

                                // Thread.Sleep(5000); // wait some time

                                // Starts the download and returns immediately; it does not wait for the file
                                DLClient.DownloadFileAsync(new Uri(LinkURL), Path.Combine(@"C:\temp\", ExtractFilename));
                            }
                        }
                    }
                }

                // Fixed delay: too short for large files, and the time needed depends on the connection
                Thread.Sleep(5000); // wait some time

                var files = Directory.GetFiles(@"C:\temp\", "*.pdf");
                // string prefix = "SomePrefix";
                foreach (var file in files)
                {
                    // Replace dashes in the file name only, not in the directory part
                    string newFileName = Path.Combine(Path.GetDirectoryName(file), Path.GetFileName(file).Replace("-", " "));
                    File.Move(file, newFileName);
                }
            }
        }
    }
