What causes the computer to recognize a file as a certain file type? And how can I change it (with Java)?
Problem description
I am creating a program in Java that reads the input stream of a file, encrypts it by changing the byte values based on the password, and creates a new encrypted file.
For example:
I created a test file that contained the words:
This is a test to see if the encrypter project works.
When I read the bytes in java, I get:
[84, 104, 105, 115, 32, 105, 115, 32, 97, 32, 116, 101, 115, 116, 32, 116, 111, 32, 115, 101, 101, 32, 105, 102, 32, 116, 104, 101, 32, 101, 110, 99, 114, 121, 112, 116, 101, 114, 32, 112, 114, 111, 106, 101, 99, 116, 32, 119, 111, 114, 107, 115, 46, 10]
I then take the value of each byte, subtract the Unicode value of the password, and take the absolute value of the result. Then I write that to a file.
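To make the step above concrete, here is a minimal sketch of one way it could look in Java. The class name ByteShiftSketch, the method transform, and the choice to cycle through the password one character per byte are all assumptions made for illustration; the post does not say how the password value is actually applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteShiftSketch {
    // Hypothetical helper: for each byte, subtract the code unit of one
    // password character (cycling through the password, which is an
    // assumption) and keep the absolute value of the difference.
    static byte[] transform(byte[] data, String password) {
        if (password.isEmpty()) {
            throw new IllegalArgumentException("password must not be empty");
        }
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            int key = password.charAt(i % password.length());
            // The cast keeps only the low 8 bits of the result.
            out[i] = (byte) Math.abs(data[i] - key);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] plain = "This is a test to see if the encrypter project works.\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(plain));
        // "secret" is just a placeholder password for the demo.
        System.out.println(Arrays.toString(transform(plain, "secret")));
    }
}
```

One thing this sketch makes visible: with an absolute value, two different input bytes can produce the same output byte (for example, 84 and 116 both give 16 against a key of 100), so a transform of this shape cannot always be undone.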
I was playing around with different encryption algorithms and started testing them on a text file. I am using Linux, so there are no file extensions (e.g. .txt, .pdf, etc.). I noticed that after encrypting it a few times, the computer no longer recognized it as a text file but as an image file! (Meaning that when you click on it, by default, it tries to open the file in an image editor.)
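So here are my questions:
- I am guessing it has something to do with certain bytes somewhere in the file, but beyond that, I am lost.
- I would like to be able to keep the file as the same file type even after it is encrypted. So I was thinking that if, for example, the file type information is in the first 10 bytes, I would leave the first 10 bytes alone and encrypt everything after them.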
- Do these bytes have a meaning that is standard across all platforms? (i.e. a PDF file is a PDF file no matter what computer you use it on. Is that because of the .pdf extension, or because of bytes somewhere in the file?)
- Where can I find a list of which bytes mean what in a file?
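The idea in the second question (leave a fixed-size prefix untouched and transform only the rest) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the 10-byte figure is the example number from the question rather than a real header size, and a repeating-key XOR is swapped in as a stand-in transform because it is trivially reversible. It is not the algorithm from the post and not a secure cipher.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HeaderPreservingSketch {
    // Hypothetical helper: copies the first headerLength bytes unchanged and
    // XORs every later byte with the key, repeating the key as needed.
    static byte[] transformAfterHeader(byte[] data, int headerLength, byte[] key) {
        if (key.length == 0) {
            throw new IllegalArgumentException("key must not be empty");
        }
        byte[] out = data.clone();
        // Clamp so that short files and negative lengths are handled safely.
        int start = Math.min(Math.max(headerLength, 0), data.length);
        for (int i = start; i < out.length; i++) {
            out[i] ^= key[(i - start) % key.length];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "secret".getBytes(StandardCharsets.UTF_8); // placeholder key
        byte[] data = "%PDF-1.4\nsome example content\n".getBytes(StandardCharsets.US_ASCII);

        byte[] scrambled = transformAfterHeader(data, 10, key);
        byte[] restored = transformAfterHeader(scrambled, 10, key);

        // The first 10 bytes are identical before and after the transform.
        System.out.println(Arrays.equals(Arrays.copyOf(data, 10), Arrays.copyOf(scrambled, 10)));
        // Applying the XOR a second time restores the original bytes.
        System.out.println(Arrays.equals(data, restored));
    }
}
```

Whether a file manager still recognizes the result depends on where the format keeps its identifying bytes and on whether the viewer tolerates the scrambled remainder, so a fixed prefix length is a guess rather than a guarantee.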
Recommended answer
On traditional UNIX systems, files are identified solely by looking for particular patterns of bytes appearing in the file.
The file command uses a magic configuration file (often /etc/magic, or /usr/share/file/magic) which contains the rules defining those byte patterns.
That's it - there's no special extra meta-data - it's all done by analysis of the content.
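As a rough illustration of what content-based detection means, the sketch below checks the first few bytes of a file against a handful of well-known signatures (PNG, JPEG, GIF, PDF, ZIP). The class name MagicSketch and the methods guess and startsWith are hypothetical, and the table is deliberately tiny; the real magic database contains far more rules, including ones that look at offsets other than the start of the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

final class MagicSketch {
    // Returns true if head begins with the given signature bytes.
    static boolean startsWith(byte[] head, int... signature) {
        if (head.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((head[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: maps a few leading-byte signatures to a label.
    static String guess(byte[] head) {
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG image";
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) return "JPEG image";
        if (startsWith(head, 'G', 'I', 'F', '8')) return "GIF image";
        if (startsWith(head, '%', 'P', 'D', 'F', '-')) return "PDF document";
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) return "ZIP archive";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java MagicSketch <file>");
            return;
        }
        byte[] head = new byte[8];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            n = in.readNBytes(head, 0, head.length); // requires Java 9 or later
        }
        System.out.println(guess(Arrays.copyOf(head, n)));
    }
}
```

The Java standard library also offers java.nio.file.Files.probeContentType(Path), but what it actually inspects (extension, content, or a system database) depends on the platform and the installed file type detectors, so it may not match what the file command reports.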