How to decode / get the encoding of a file (Power BI Desktop file)


Problem description

I have an internal file (DataMashup) from a Power BI Desktop report (pbix) that I am trying to decode. My aim is to create a Power BI Desktop report and its data model from any programming language. I am starting with Java.

The file appears to be encoded with some encoding technique.

I tried to detect the file's encoding, and the detector returns windows-1254, but reading the file with that charset does not produce readable text.

File f = new File("example.txt");

    String[] charsetsToBeTested = {"UTF-8", "windows-1254", "ISO-8859-7"};

    CharsetDetector cd = new CharsetDetector();
    Charset charset = cd.detectCharset(f, charsetsToBeTested);

    if (charset != null) {
        try {
            InputStreamReader reader = new InputStreamReader(new FileInputStream(f), charset);
            int c = 0;
            while ((c = reader.read()) != -1) {
                System.out.print((char)c);
            }
            reader.close();
        } catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        }catch(IOException ioe){
            ioe.printStackTrace();
        }

    }else{
        System.out.println("Unrecognized charset.");
    }

Unzipping the file does not work either:

public void unZipIt(String zipFile, String outputFolder)
{
    byte buffer[] = new byte[1024];
    try
    {
        File folder = new File(outputFolder);
        if(!folder.exists())
        {
            folder.mkdir();
        }
        ZipInputStream zis = new ZipInputStream(new FileInputStream(zipFile));
        System.out.println(zis);

        System.out.println(zis.getNextEntry());
        for(ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry())
        {
            String fileName = ze.getName();
            System.out.println(ze);
            File newFile = new File((new StringBuilder(String.valueOf(outputFolder))).append(File.separator).append(fileName).toString());
            System.out.println((new StringBuilder("file unzip : ")).append(newFile.getAbsoluteFile()).toString());
            (new File(newFile.getParent())).mkdirs();
            FileOutputStream fos = new FileOutputStream(newFile);
            int len;
            while((len = zis.read(buffer)) > 0) 
            {
                fos.write(buffer, 0, len);
            }
            fos.close();
        }

        zis.closeEntry();
        zis.close();
        System.out.println("Done");
    }
    catch(IOException ex)
    {
        ex.printStackTrace();
    }
}


Recommended answer

The file contains a binary header followed by XML that declares UTF-8. The header data seems to hold a file name (Config/Package.xml), so assuming a zip format is understandable. If it really were a zip, there would also be binary data at the end of the file.
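One way to test the zip assumption is to look for the ZIP local file header signature (the bytes `PK\x03\x04`) somewhere after the binary header and try to read entries from that offset. The following is a minimal sketch, not a parser for the DataMashup format: the class and method names are made up for illustration, and it simply assumes that any embedded ZIP data starts at the first occurrence of the signature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration; not part of any Power BI API.
final class EmbeddedZipProbe {

    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    /** Returns the offset of the first ZIP local file header, or -1 if none is found. */
    public static int findZipStart(byte[] data) {
        for (int i = 0; i + ZIP_SIGNATURE.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < ZIP_SIGNATURE.length; j++) {
                if (data[i + j] != ZIP_SIGNATURE[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    /** Lists the entry names of ZIP data assumed to start at the given offset. */
    public static List<String> listEntries(byte[] data, int offset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            for (ZipEntry ze = zis.getNextEntry(); ze != null; ze = zis.getNextEntry()) {
                names.add(ze.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        int start = findZipStart(data);
        if (start < 0) {
            System.out.println("No ZIP signature found.");
        } else {
            System.out.println("ZIP data may start at byte " + start
                    + ", entries: " + listEntries(data, start));
        }
    }
}
```

If the signature is found but no entries are listed, the data after the header is probably damaged or is not a zip archive after all.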

Maybe the file was downloaded using FTP in text mode, so that a line-ending conversion ("\n" to "\r\n") was applied. In that case the zip would be corrupted. Renaming the file to .zip might help you test it with zip tools.
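A rough way to check for such a line-ending conversion is to count line feed bytes that are not preceded by a carriage return. Binary data normally contains some bare 0x0A bytes, so a file in which every 0x0A follows a 0x0D is suspicious. This is only a heuristic sketch with made-up names: it can give false positives on small files or on content that is mostly CRLF text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only.
final class LineEndingProbe {

    /** Counts all line feed (0x0A) bytes. */
    public static int countLineFeeds(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    /** Counts line feed bytes that are not directly preceded by a carriage return (0x0D). */
    public static int countBareLineFeeds(byte[] data) {
        int bare = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' && (i == 0 || data[i - 1] != '\r')) {
                bare++;
            }
        }
        return bare;
    }

    /** Heuristic: line feeds exist, but none of them is bare. */
    public static boolean looksTextConverted(byte[] data) {
        return countLineFeeds(data) > 0 && countBareLineFeeds(data) == 0;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("line feeds: " + countLineFeeds(data)
                + ", bare line feeds: " + countBareLineFeeds(data)
                + ", looks text-converted: " + looksTextConverted(data));
    }
}
```

If the check fires, fetching the file again in binary mode is safer than trying to undo the conversion.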

First try the .tar format, which would be logical since the XML is not compressed: append .tar to the file name and test it as a tar archive.

Otherwise, if the content is always UTF-8 XML:

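import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path f = Paths.get("example.txt");
String start = "<?xml";
String end = ">";
byte[] bytes = Files.readAllBytes(f);
// Decode as ISO-8859-1 (one char per byte) so string indexes equal byte offsets.
String s = new String(bytes, StandardCharsets.ISO_8859_1);
int startI = s.indexOf(start);
int endI = s.lastIndexOf(end) + end.length();
if (startI < 0 || endI <= startI) {
    throw new IllegalStateException("No XML found in " + f);
}
// Re-decode only the XML slice as UTF-8.
String xml = new String(bytes, startI, endI - startI, StandardCharsets.UTF_8);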
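Because the slice runs from the first `<?xml` to the last `>`, it may still contain stray binary data in between. A quick sanity check is to feed the extracted string to the JDK's XML parser and see whether it is well-formed. The sketch below assumes the string is named `xml` as above; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration only.
final class XmlSliceCheck {

    /**
     * Parses the given XML text and returns the name of its root element.
     * Throws an exception (typically a SAXException) if the text is not well-formed.
     */
    public static String rootElementName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Hand the parser UTF-8 bytes so they match a UTF-8 declaration in the XML prolog.
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getNodeName();
    }
}
```

A parse failure here suggests the file holds more than one XML document or has binary sections between the markers, in which case the simple first-to-last slice is not enough.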
