Get Row and Col for embedded Object with POI

Problem description

I'm currently working with Excel files (*.xlsm) and Apache POI, and I have been racking my brain over a task. I receive Excel files that have PDFs embedded in them, and I want to extract the PDFs and rename them based on the row and column they are in. This may seem odd, since I know embedded objects are represented as images, can span more than one cell, and technically are not "in" a cell.

The following code snippet lets me extract the embedded PDFs, but they are named OleObject[1..2..3.etc..], which doesn't give me any clue about their position.

int i = 0; // counter for the output file names (not declared in the original snippet)
inStream = new FileInputStream(file);
XSSFWorkbook workbook = new XSSFWorkbook(inStream);
for (PackagePart pPart : workbook.getAllEmbedds()) {
    String contentType = pPart.getContentType();
    if (contentType.equals("application/vnd.openxmlformats-officedocument.oleObject")) {
        // The embedded part is an OLE2 container; the PDF bytes are read from its "CONTENTS" entry
        POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
        TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));

        byte[] bytes = IOUtil.toByteArray(stream);
        stream.close();
        OutputStream outStream = new FileOutputStream(new File(ROOT_DIRECTORY.getAbsolutePath() + "\\PDF" + i + ".pdf"));
        IOUtil.copy(bytes, outStream);
        outStream.close();
        i++;
    }
}

I wanted to know whether org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet will let me see the XML of the Excel sheet, and whether I can get the info I need from it. Like this:

<oleObjects>
  <mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
    <mc:Choice Requires="x14">
      <oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4">
        <objectPr defaultSize="0" r:id="rId5">
          <anchor moveWithCells="1">
            <from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from>
            <to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to>
          </anchor>
        </objectPr>
      </oleObject>
    </mc:Choice>
    <mc:Fallback>
      <oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"/>
    </mc:Fallback>
  </mc:AlternateContent>
</oleObjects>

--

<objectPr defaultSize="0" r:id="rId5">
  <anchor moveWithCells="1">
    <from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from>
    <to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to>
  </anchor>
</objectPr>

I guess using the anchor information would be possible, but I'm unable to find out how to get it.

I hope this makes clear what I'm trying to do.

Thanks in advance.

Solution

I've looked at the source code in the current poi-ooxml-schemas sources jar, which you can find here: http://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.3/

org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet extends org.apache.xmlbeans.XmlObject, which can give you the XML as a string through the inherited toString() method. Alternatively, you can access the OLE objects in the worksheet directly by calling getOleObjects() on your CTWorksheet object.

/**
 * Gets the "oleObjects" element
 */
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObjects getOleObjects();

CTOleObjects itself extends org.apache.xmlbeans.XmlObject, so again you can get the XML with toString() for parsing, or call CTOleObjects.getOleObjectList() to get a list of org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject objects to iterate over.

/**
 * Gets a List of "oleObject" elements
 */
java.util.List<org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject> getOleObjectList();

CTOleObject doesn't seem to have getter methods for the nested child XML elements (objectPr, anchor, from, to) that would let you determine the row and column, so I think you would need to do some XML parsing or string searching to get this info, provided it is contained in the XML string representation.
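
The XML parsing step could look roughly like the sketch below. It is not part of POI: it uses only the JDK's DOM parser and assumes you already hold the worksheet XML as a string, with the namespace prefixes (r, xdr) declared on the root element as they are in a real sheet part. The names OleAnchorParser, OleAnchor and cellRef are made up for illustration. It reads the zero-based col and row from each oleObject's from element and turns them into an A1-style cell name that could be used in the output file name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not a POI class): pulls anchor positions out of worksheet XML.
class OleAnchorParser {

    /** Relationship id of one embedded object plus its zero-based top-left column and row. */
    static final class OleAnchor {
        final String relId;
        final int col;
        final int row;

        OleAnchor(String relId, int col, int row) {
            this.relId = relId;
            this.col = col;
            this.row = row;
        }

        /** A1-style name of the top-left anchor cell (column letters, then one-based row). */
        String cellRef() {
            StringBuilder letters = new StringBuilder();
            for (int c = col; c >= 0; c = c / 26 - 1) {
                letters.insert(0, (char) ('A' + c % 26));
            }
            return letters.toString() + (row + 1);
        }
    }

    private static final String REL_NS =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static List<OleAnchor> parse(String worksheetXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed so prefixed names like xdr:col resolve
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(worksheetXml)));

        List<OleAnchor> result = new ArrayList<>();
        // "*" matches any namespace, so only the local element names matter here.
        NodeList oleObjects = doc.getElementsByTagNameNS("*", "oleObject");
        for (int i = 0; i < oleObjects.getLength(); i++) {
            Element ole = (Element) oleObjects.item(i);
            NodeList from = ole.getElementsByTagNameNS("*", "from");
            if (from.getLength() == 0) {
                continue; // e.g. the mc:Fallback copy, which carries no anchor
            }
            Element fromEl = (Element) from.item(0);
            int col = Integer.parseInt(text(fromEl, "col"));
            int row = Integer.parseInt(text(fromEl, "row"));
            result.add(new OleAnchor(ole.getAttributeNS(REL_NS, "id"), col, row));
        }
        return result;
    }

    // Text of the first descendant element with the given local name.
    private static String text(Element parent, String localName) {
        return parent.getElementsByTagNameNS("*", localName).item(0).getTextContent().trim();
    }
}
```

For the XML shown in the question, this should yield a single entry with col 8 and row 11, i.e. cell I12. The r:id value (rId4 there) is presumably the sheet relationship that points at the embedded part, so it could be used to match each anchor to the PackagePart being extracted; that matching step is an assumption and is not shown here.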

Hope this helps.
