How to speed up Word Interop processing?

Question

I am very new to C# and have written some fairly clunky code. I have been doing a lot of courses online, and many of them say there are several ways to approach a problem. I have now made a program that loads a .doc Word file and then searches for the relevant information using if statements.

My problem with this solution is that the program takes FOREVER! I am talking about 30 minutes to 1 hour to complete the following code.

Any ideas on how to make my little program a little less clunky? I hope that solutions to this will increase my knowledge substantially, so thanks in advance, everyone!

Regards, Chris

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WindowsFormsApplication3
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        public int id = 0;
        public int[] iD = new int[100];
        public string[] timeOn = new string[100];
        public string[] timeOff = new string[100];
        public string[] dutyNo = new string[100];
        public string[] day = new string[100];

        private void button1_Click(object sender, EventArgs e)
        {



            Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document document = application.Documents.Open("c:\\Users\\Alien\\Desktop\\TESTJOBS.doc");
            //the following for will loop for all words

            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                // Look for the word "On". In the file it is followed by a time such as 04:00,
                // which Word splits into three words (hence i + 2, i + 3 and i + 4).
                if (document.Words[i].Text == "On")
                {
                    iD[id] = id;
                   // Console.WriteLine("ID Number ={0}", iD[id]);
                    dutyNo[id] = document.Words[i - 14].Text;
                   // Console.WriteLine("duty No set to:{0}", dutyNo[id]);
                    timeOn[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                   // Console.WriteLine("on time set to:{0}", timeOn[id]);
                    // The next branch runs when the current word is not "On" and looks for "Off", which follows "On" in the file format.
                    // "Off" is likewise followed by a time such as 04:00 (hence i + 2, i + 3 and i + 4).
                }
                else if (document.Words[i].Text == "Off")
                {
                    timeOff[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                    //Console.WriteLine("off time set to:{0}", timeOff[id]);
                    // The next branch runs when the current word is not "Off" and looks for "Days", which follows "Off" in the file format.
                    // The day itself is two words after "Days" (hence i + 2).
                }
                else if (document.Words[i].Text == "Days" && !(document.Words[i + 3].Text == "Type"))
                {

                    day[id] = document.Words[i + 2].Text;
                    //Console.WriteLine("day set to:{0}", day[id]);
                    //we then print the whole new duty out to ListBox1
                    listBox1.Items.Add(string.Format("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[id], timeOn[id], timeOff[id], dutyNo[id], day[id]));
                    id++;
                }


            }

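            // Print only the duties that were actually found (indices 0 to id - 1),
            // indexing with the loop variable i rather than the counter id.
            for (int i = 0; i < id; i++)
            {
                Console.WriteLine("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[i], timeOn[i], timeOff[i], dutyNo[i], day[i]);
            }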


        }
    }
}

Recommended answer

Office Interop is fairly slow.

The Open XML SDK might have been faster, but the file is a .doc, so it probably won't be able to handle it.

But just like with Excel in this question, there is a way to improve the performance: do not access each word in a Range by index, because AFAIK every such access creates a separate Range instance wrapped in an RCW (runtime callable wrapper), and that is the primary candidate for the performance bottleneck in your application.

That means your best bet for improving performance is to load all the words (.Text) into some indexable collection of Strings before the actual processing, and only then use that collection to create the output.
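As a rough sketch of the second half of that idea, the search from the question can run over a plain list of strings once the words have been copied out of Word. The names `Duty`, `DutyParser` and `At` are hypothetical, the offsets (-14, +2, +3, +4) are taken from the question's loop, and the `Trim()` call assumes that Word's word ranges may carry trailing spaces:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one parsed duty (not part of the original code).
public sealed class Duty
{
    public int Id { get; set; }
    public string DutyNo { get; set; }
    public string TimeOn { get; set; }
    public string TimeOff { get; set; }
    public string Day { get; set; }
}

public static class DutyParser
{
    // "words" holds plain strings already copied out of Word,
    // so nothing inside this loop crosses the COM boundary.
    public static List<Duty> Parse(IReadOnlyList<string> words)
    {
        var duties = new List<Duty>();
        string dutyNo = null, timeOn = null, timeOff = null;

        // Safe lookup: returns null instead of throwing when an offset
        // falls outside the list. Trim() is an assumption about the input.
        string At(int index) =>
            index >= 0 && index < words.Count ? words[index].Trim() : null;

        for (int i = 0; i < words.Count; i++)
        {
            switch (At(i))
            {
                case "On":
                    dutyNo = At(i - 14);
                    timeOn = At(i + 2) + At(i + 3) + At(i + 4);
                    break;
                case "Off":
                    timeOff = At(i + 2) + At(i + 3) + At(i + 4);
                    break;
                case "Days" when At(i + 3) != "Type":
                    duties.Add(new Duty
                    {
                        Id = duties.Count,
                        DutyNo = dutyNo,
                        TimeOn = timeOn,
                        TimeOff = timeOff,
                        Day = At(i + 2)
                    });
                    break;
            }
        }
        return duties;
    }
}
```

Using a `List<Duty>` instead of five fixed arrays of 100 elements also removes the hard limit on the number of duties. Note that this list is zero-based, whereas the Words collection in the question is indexed from 1.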

How to do it in the fastest way? I am not exactly sure, but you can try getting all the words from the _Document.Words enumerator (it may or may not be more performant, but at least you will be able to see how long it takes just to retrieve the required words):

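// Requires: using System.Linq; using Microsoft.Office.Interop.Word;
// Enumerate the Words collection once and keep only the text of each Range.
var words = document.Words
    .Cast<Range>()
    .Select(r => r.Text)
    .ToList();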

or you can try using the Text of the _Document.Content range, though you would then have to separate the individual words yourself.
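A minimal sketch of that second option follows, assuming a simple regex-based split is good enough. `WordTokenizer` is a hypothetical helper; it only approximates how Word breaks text into words (a run of letters or digits is one token, and each punctuation character is its own token), so offsets such as `i - 14` from the question would need to be re-checked against its output:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordTokenizer
{
    // A run of letters/digits/underscores is one token; every other
    // non-whitespace character becomes a token of its own.
    private static readonly Regex Token =
        new Regex(@"\w+|[^\w\s]", RegexOptions.Compiled);

    // Splits the text in memory; whitespace is dropped entirely.
    public static List<string> Split(string text) =>
        Token.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
}
```

With this helper, a call such as `WordTokenizer.Split(document.Content.Text)` would read the whole document text in one property access and do the splitting in managed code.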