How to remove junk characters while reading a Word document stored in an 'OLE Object' field in an Access database through C#?

Views: 149 | Published: 2020/5/12 22:32:19 | Tags: ms-access, ms-access-2007


Problem description

I am accessing an MS Access database through C#, and I am able to read all the fields. The problem is that when I read .txt and .doc files stored in an OLE Object field of the table, a lot of extra junk characters are read before and after the actual text, like this: ÿÿÿÿ‡€ ÿÿÿÿÿÿÿÿˆ ÿÿÿÿÿÿÿÿ€ ˆˆˆˆˆˆˆˆ€ ÿÿÿÿÿÿÿÿþ i 8 @ñÿ 8 N o r m a l CJ _H aJ mH sH tH < A@òÿ¡ < D e f a u l t P a r a g r a p h F o n t … ÿÿÿÿ ( f p ³ ú ÿ A Ä M • À ' n î 0 q Œ Ï.

My C# code looks like this:

/*Read from the query and write in a temporary file*/
var oleBytes = (Byte[])Cmd.ExecuteScalar();
MemoryStream ms = new MemoryStream();
ms.Write(oleBytes, 0, oleBytes.Length - 0);
var file = Path.GetTempFileName();
using (var fileStream = File.OpenWrite(file))
 {
    var buffer = ms.GetBuffer();
    fileStream.Write(buffer, 0, (int)ms.Length);
 }

Then I read this temporary file as a Word document:

Microsoft.Office.Interop.Word.ApplicationClass wordObject = new ApplicationClass();
object fpath = file; //this is the path
object nullobject = System.Reflection.Missing.Value;
Microsoft.Office.Interop.Word.Document docs = wordObject.Documents.Open
(ref fpath, ref nullobject, ref nullobject, ref nullobject,
ref nullobject, ref nullobject, ref nullobject, ref nullobject,
ref nullobject, ref nullobject, ref nullobject, ref nullobject,
ref nullobject, ref nullobject, ref nullobject, ref nullobject);

docs.ActiveWindow.Selection.WholeStory();

docs.ActiveWindow.Selection.Copy();

IDataObject iData = Clipboard.GetDataObject();

if (iData != null)
  data = iData.GetData(DataFormats.Text).ToString();

I don't know what is going wrong. Am I also reading the field's metadata from the table? If so, how can I avoid it? What would be an efficient way to read an OLE Object field that stores files other than images?

Recommended answer

I found the solution for Word documents (.doc files). OLE object storage in MS Access puts some header information before the actual data, so simply extracting the field contents as a byte array and saving it to disk does not work. Any OLE Object file has some standard signature. For Word documents, OLEheaderLength is 85 bytes. So I strip those 85 header bytes from the byte array (the code below skips only the leading bytes; anything after the document data is left in place):

// Con is an OleDbConnection to the Access database, created elsewhere
Con.Open();
// Column licenseDoc contains Word and text documents stored as OLE Objects
string _query = "select licenseDoc from Products where ID=56";
OleDbCommand Cmd = new OleDbCommand(_query, Con);

// Length of the OLE header that precedes a .doc payload
const int offset = 85;
var oleBytes = (Byte[])Cmd.ExecuteScalar();

// Copy everything after the header into a memory stream
MemoryStream ms = new MemoryStream();
ms.Write(oleBytes, offset, oleBytes.Length - offset);

// Write the stripped bytes to a temporary file
var file = Path.GetTempFileName();
using (var fileStream = File.OpenWrite(file))
{
  var buffer = ms.GetBuffer();
  fileStream.Write(buffer, 0, (int)ms.Length);
}

The variable file will contain the path of the .tmp file, which holds the data read from the Word document stored as an OLE object in MS Access. This file can be opened directly as a Word document, or its extension can be changed to .doc.
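As a rough sketch of the "change the extension to .doc" step, the header stripping and the file write can be folded into one helper that writes straight to a .doc path. The class name `OleDocExport`, the method `SaveAsDoc`, and the default of 85 bytes are illustrative assumptions based on the answer above, not part of any library.

```csharp
using System;
using System.IO;

public static class OleDocExport
{
    // Hypothetical helper: skips the first headerLength bytes of the OLE field
    // and writes the remainder to a new file in the temp folder with a .doc extension.
    // The default of 85 is the .doc header length quoted in the answer.
    public static string SaveAsDoc(byte[] oleBytes, int headerLength = 85)
    {
        if (oleBytes == null)
            throw new ArgumentNullException(nameof(oleBytes));
        if (headerLength < 0 || headerLength > oleBytes.Length)
            throw new ArgumentOutOfRangeException(nameof(headerLength));

        // Copy everything after the header into its own array.
        byte[] payload = new byte[oleBytes.Length - headerLength];
        Array.Copy(oleBytes, headerLength, payload, 0, payload.Length);

        // Build a unique path ending in .doc so Word recognises the file type.
        string docPath = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + ".doc");
        File.WriteAllBytes(docPath, payload);
        return docPath;
    }
}
```

With the query from the answer, the call would look like `string path = OleDocExport.SaveAsDoc((byte[])Cmd.ExecuteScalar());`. Writing the array directly also avoids the intermediate MemoryStream.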

The OLEheaderLength values for other formats are as follows:

1] JPEG/JPG=224
2] BMP=78
3] PDF=85
4] SNP=74
5] DOC=85/90
6] DOCX=87
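The list above could be kept in code as a small lookup table, for example as sketched below. The class `OleHeaderLengths` and its method `For` are hypothetical names; the numbers are copied from the list as reported and have not been verified independently. DOC has two candidate values because the list gives 85/90.

```csharp
using System;
using System.Collections.Generic;

public static class OleHeaderLengths
{
    // Candidate header lengths per format, copied from the list above.
    // DOC has two candidates (85 or 90), so each entry is an array.
    private static readonly Dictionary<string, int[]> Candidates =
        new Dictionary<string, int[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "JPEG", new[] { 224 } },
            { "JPG",  new[] { 224 } },
            { "BMP",  new[] { 78 } },
            { "PDF",  new[] { 85 } },
            { "SNP",  new[] { 74 } },
            { "DOC",  new[] { 85, 90 } },
            { "DOCX", new[] { 87 } },
        };

    // Returns the candidate offsets for a format, or an empty array if unknown.
    public static int[] For(string format)
    {
        int[] offsets;
        return Candidates.TryGetValue(format, out offsets) ? offsets : new int[0];
    }
}
```

A caller would try each candidate in turn, for instance `foreach (int offset in OleHeaderLengths.For("doc")) { ... }`, and keep the first offset that yields a file the target application can open.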

I don't know the OLEheaderLength for .txt (plain text) files. Unfortunately, the solution above works only for .doc files; it fails for .docx files and for any other file format.
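Since fixed offsets turn out to be fragile, one alternative (not the answer's method) is to search the field bytes for the file format's own signature and keep everything from that point on. The sketch below assumes the original file bytes are embedded unchanged after the header, which may not hold for every way an object can be inserted into Access. The signatures used are the well-known ones for OLE compound files (legacy .doc) and ZIP containers (.docx); the class and method names are hypothetical.

```csharp
using System;

public static class OlePayloadFinder
{
    // Signature of an OLE compound file, the container used by legacy .doc files.
    public static readonly byte[] CompoundFile =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Signature of a ZIP local file header; a .docx file is a ZIP container.
    public static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns the index of the first occurrence of signature in data, or -1.
    public static int IndexOf(byte[] data, byte[] signature)
    {
        if (data == null || signature == null || signature.Length == 0)
            return -1;
        for (int i = 0; i <= data.Length - signature.Length; i++)
        {
            int j = 0;
            while (j < signature.Length && data[i + j] == signature[j]) j++;
            if (j == signature.Length) return i;
        }
        return -1;
    }

    // Returns the bytes from the signature to the end of data,
    // or null when the signature is not present.
    public static byte[] ExtractFrom(byte[] data, byte[] signature)
    {
        int start = IndexOf(data, signature);
        if (start < 0) return null;
        byte[] payload = new byte[data.Length - start];
        Array.Copy(data, start, payload, 0, payload.Length);
        return payload;
    }
}
```

For a .doc field this would be called as `OlePayloadFinder.ExtractFrom(oleBytes, OlePayloadFinder.CompoundFile)`. Any bytes that follow the document inside the field are left in place, as in the answer's code, and plain .txt files have no signature, so this approach does not cover them.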

To find out the length of the OLE header, you can simply use the library that is described and can be downloaded from here-
