How to speed up Word Interop processing?
Problem description
I am very new to C# and have written some fairly clunky code. I have been doing a lot of online courses, and many of them say that there are several ways to approach a problem. I have now made a program that loads a .doc Word file and then searches for the relevant information using if statements.
My problem with this solution is that the program takes FOREVER!!! I am talking about 30 minutes to 1 hour to complete the following code.
Any ideas on how to make my little program a little less clunky? I hope the solutions to this will increase my knowledge substantially, so thanks in advance, everyone!
Thanks, Chris
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace WindowsFormsApplication3
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        public int id = 0;
        public int[] iD = new int[100];
        public string[] timeOn = new string[100];
        public string[] timeOff = new string[100];
        public string[] dutyNo = new string[100];
        public string[] day = new string[100];

        private void button1_Click(object sender, EventArgs e)
        {
            Microsoft.Office.Interop.Word.Application application = new Microsoft.Office.Interop.Word.Application();
            Microsoft.Office.Interop.Word.Document document = application.Documents.Open("c:\\Users\\Alien\\Desktop\\TESTJOBS.doc");

            // the following for will loop over all words
            int count = document.Words.Count;
            for (int i = 1; i <= count; i++)
            {
                // the following if statement looks for the word "On"
                // this is then (in the file) followed by 04:00 (thus i+2/3/4 respectively)
                if (document.Words[i].Text == "On")
                {
                    iD[id] = id;
                    // Console.WriteLine("ID Number ={0}", iD[id]);
                    dutyNo[id] = document.Words[i - 14].Text;
                    // Console.WriteLine("duty No set to:{0}", dutyNo[id]);
                    timeOn[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                    // Console.WriteLine("on time set to:{0}", timeOn[id]);
                }
                // the following else if runs if the word was not "On" and looks for the word "Off", which follows "On" in the file format
                // this is then (in the file) followed by 04:00 (thus i+2/3/4 respectively)
                else if (document.Words[i].Text == "Off")
                {
                    timeOff[id] = document.Words[i + 2].Text + document.Words[i + 3].Text + document.Words[i + 4].Text;
                    // Console.WriteLine("off time set to:{0}", timeOff[id]);
                }
                // the following else if runs if the word was not "Off" and looks for the word "Days", which follows "Off" in the file format
                else if (document.Words[i].Text == "Days" && !(document.Words[i + 3].Text == "Type"))
                {
                    day[id] = document.Words[i + 2].Text;
                    // Console.WriteLine("day set to:{0}", day[id]);
                    // we then print the whole new duty out to listBox1
                    listBox1.Items.Add(string.Format("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[id], timeOn[id], timeOff[id], dutyNo[id], day[id]));
                    id++;
                }
            }

            // print every duty that was found (index with i, not id, and stop at the number of duties)
            for (int i = 0; i < id; i++)
            {
                Console.WriteLine("new duty ID:{0} Time on:{1} Time off:{2} Duty No:{3} Day:{4}", iD[i], timeOn[i], timeOff[i], dutyNo[i], day[i]);
            }
        }
    }
}
Recommended answer
Office Interop is fairly slow.
The Open XML SDK might have been faster, but it works with .docx files and your file is a .doc, so it probably won't be able to handle it.
But just like with Excel in this question, there is a way you can improve the performance: do not access each word in a Range by index, because AFAIK it causes the creation of a separate Range instance wrapped in an RCW, and that is the primary candidate for a performance bottleneck in your application.
That means that your best bet to improve the performance is to load all the words (.Text) into some indexable collection of Strings before the actual processing, and only then use that collection to create the output.
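As a rough sketch of the processing half of that idea, the lookups from the question can run against a plain list of strings instead of against `document.Words`. The names `Duty`, `DutyParser`, `Parse` and `At` are hypothetical and not part of any library, and the sketch assumes the list holds one entry per Word "word" in document order. The list is 0-based while `document.Words` is 1-based, but the relative offsets (`i + 2`, `i - 14`) stay the same.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one parsed duty.
public sealed class Duty
{
    public int Id;
    public string DutyNo, TimeOn, TimeOff, Day;
}

public static class DutyParser
{
    // Returns words[i], or "" when i falls outside the list.
    private static string At(IReadOnlyList<string> words, int i)
    {
        return i >= 0 && i < words.Count ? words[i] : "";
    }

    // Same matching rules as the loop in the question, but every lookup
    // is an in-memory list access instead of a COM call.
    public static List<Duty> Parse(IReadOnlyList<string> words)
    {
        var duties = new List<Duty>();
        var current = new Duty();
        for (int i = 0; i < words.Count; i++)
        {
            string w = words[i];
            if (w == "On")
            {
                current.DutyNo = At(words, i - 14);
                current.TimeOn = At(words, i + 2) + At(words, i + 3) + At(words, i + 4);
            }
            else if (w == "Off")
            {
                current.TimeOff = At(words, i + 2) + At(words, i + 3) + At(words, i + 4);
            }
            else if (w == "Days" && At(words, i + 3) != "Type")
            {
                current.Day = At(words, i + 2);
                current.Id = duties.Count;
                duties.Add(current);
                current = new Duty(); // start collecting the next duty
            }
        }
        return duties;
    }
}
```

Unlike the fixed 100-element arrays in the question, the list grows as needed, and the `At` helper returns an empty string instead of throwing when an offset such as `i - 14` points outside the document.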
How to do it in the fastest way? I am not exactly sure, but you can try getting all the words from the _Document.Words enumerator (it may or may not be more performant, but at least you will be able to see how long it takes just to retrieve the required words):
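// Enumerate the Words collection once and keep only the text of each word.
// Note: the resulting list is 0-based, while document.Words is 1-based.
var words = document.Words
    .Cast<Microsoft.Office.Interop.Word.Range>()
    .Select(r => r.Text)
    .ToList();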
Or you can try using the Text of the _Document.Content range, though you would then have to separate the individual words yourself.
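If you go the Content.Text route, one possible way to split the text is a regular expression that treats each run of letters or digits as one token and each punctuation character as its own token, so that "04:00" becomes three tokens, as it does in the question's code. This is only an approximation: `WordSplitter` and `Split` are hypothetical names, and Word's own word boundaries will likely differ (for example, Word's words can include trailing spaces), so offsets such as `i - 14` would need to be re-checked against the real file.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSplitter
{
    // A run of word characters is one token; any other
    // non-whitespace character is a token of its own.
    private static readonly Regex Token =
        new Regex(@"\w+|[^\w\s]", RegexOptions.Compiled);

    public static List<string> Split(string text)
    {
        return Token.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

It could then be called as `var words = WordSplitter.Split(document.Content.Text);`, which reads the whole document text in a single Interop call.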