How does Linux recognize a file as a certain file type, and how to programmatically change it?
Question
I am creating a program in Java that reads the input stream of a file, encrypts it by changing the byte values based on the password, and creates a new encrypted file.
For example:
I created a test file that contained the words:
This is a test to see if the encrypter project works.
When I read the bytes in Java, I get:
[84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 32, 116, 111, 32, 115, 101, 101, 32, 105, 102, 32, 116, 104, 101, 32, 101, 110, 99, 114, 121, 112, 116, 101, 114, 32, 112, 114, 111, 106, 101, 99, 116, 32, 119, 111, 114, 107, 115, 46, 10]
So I take the value of each byte, subtract the Unicode value of the password, and take the absolute value of the result. Then I write that to a file.
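The transformation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's actual code: the class name ByteShiftSketch and the method transform are hypothetical, the password characters are assumed to be reused cyclically across the bytes, and each byte is assumed to be treated as an unsigned value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteShiftSketch {
    // Replaces each byte with |byte - passwordChar|.
    // Assumption: the password is cycled when the input is longer than it.
    static byte[] transform(byte[] input, String password) {
        if (password.isEmpty()) {
            throw new IllegalArgumentException("password must not be empty");
        }
        byte[] out = new byte[input.length];
        for (int i = 0; i < input.length; i++) {
            int value = input[i] & 0xFF;                       // read the byte as 0..255
            int key = password.charAt(i % password.length());  // UTF-16 code unit of the char
            out[i] = (byte) Math.abs(value - key);              // keep only the distance
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "This is a test".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(transform(plain, "pw")));
    }
}
```

One consequence worth noting: taking the absolute value discards the sign, so two different input bytes can produce the same output byte (with the key 100, both 84 and 116 become 16). A transformation of this shape is therefore not always reversible.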
I was playing around with different algorithms to encrypt it, and started testing them on a test text file. I am using Linux, so there are no file extensions (e.g. .txt, .pdf, etc.). After encrypting it a few times, I noticed that the computer no longer recognized it as a text file but as an image file! (That is, when you click on it, it tries to open the file in an image editor by default.)
Here are my questions:
- I'm guessing it has something to do with certain bytes in the file, but beyond that, I'm lost.
- I would like the file to keep the same file type even after encryption. So I was thinking that if, for example, the file type information is in the first 10 bytes, I would encrypt everything after that and leave the first ten bytes alone.
- Do these bytes have a meaning that is standard across all platforms? (I.e., a PDF file is a PDF file no matter what computer you use it on. Is that because of the .pdf extension, or because of bytes somewhere in the file?)
- Where can I find a list of which bytes in a file mean what?
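The idea of leaving a fixed-length prefix untouched could look something like the sketch below. It is a minimal illustration, not a recommendation: the names HeaderPreservingSketch and transformBody are hypothetical, the prefix length of 10 is taken from the example in the question, and a plain XOR with the key bytes stands in for the real encryption step simply because it is easy to reverse.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HeaderPreservingSketch {
    // Copies the first headerLength bytes unchanged and XORs the rest with the key.
    // XOR is only a stand-in here; applying the method twice restores the input.
    static byte[] transformBody(byte[] input, int headerLength, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        int start = Math.max(0, headerLength);
        byte[] out = input.clone();
        for (int i = start; i < out.length; i++) {
            out[i] ^= key[(i - start) % key.length];  // cycle through the key bytes
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = "%PDF-1.7 example body".getBytes(StandardCharsets.US_ASCII);
        byte[] key = "pw".getBytes(StandardCharsets.US_ASCII);
        byte[] scrambled = transformBody(data, 10, key);
        byte[] restored = transformBody(scrambled, 10, key);
        System.out.println(Arrays.equals(data, restored));
    }
}
```

A fixed prefix is not guaranteed to be enough, because detection rules can match at offsets other than the start of the file, and a plain text file has no header at all. Keeping the header also does not make the scrambled body usable by the program that opens that file type.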
Recommended answer
On traditional UNIX systems, files are identified solely by looking for particular patterns of bytes appearing in the file.
The file command uses a magic configuration file (often /etc/magic or /usr/share/file/magic), which contains the rules defining those byte patterns.
That's it - there's no special extra meta-data - it's all done by analysis of the content.
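To make the content-analysis idea concrete, here is a small Java sketch that checks the leading bytes of a file against a handful of widely documented signatures (PNG, PDF, GIF, JPEG). It is a toy illustration, not a reimplementation of the file command: the class MagicSniffer, its method names, and the returned labels are all hypothetical, and the real magic database contains far more rules, many of which look at offsets other than the start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class MagicSniffer {
    // A few well-known leading signatures, chosen for illustration only.
    private static final byte[] PNG  = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] PDF  = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] GIF  = {'G', 'I', 'F', '8'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Classifies a buffer by its first bytes; returns "unknown" if nothing matches.
    static String sniffBytes(byte[] head) {
        if (startsWith(head, PNG))  return "PNG image";
        if (startsWith(head, PDF))  return "PDF document";
        if (startsWith(head, GIF))  return "GIF image";
        if (startsWith(head, JPEG)) return "JPEG image";
        return "unknown";
    }

    // Reads up to 8 bytes from the start of the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) break;  // end of file reached before 8 bytes
                n += r;
            }
        }
        return sniffBytes(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(sniffFile(Paths.get(args[0])));
        } else {
            System.out.println(sniffBytes("%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII)));
        }
    }
}
```

This also suggests why scrambled bytes can end up looking like another type: if the output happens to begin with a pattern that some rule matches, the file will be classified accordingly, whatever it originally contained.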