How to identify/handle text file newlines in Java?


Problem description

I get files in different formats from different systems, and I need to import them into our database. Part of the import process is to check the line length to make sure the format is correct. We seem to be having issues with files coming from UNIX systems, where one character is added. I suspect this is because the carriage return is encoded differently on UNIX and Windows platforms.

Is there a way to detect on which file system a file was created, other than checking the last character on the line? Or maybe a way of reading the files as text and not binary which I suspect is the issue?

Thanks, guys!

Solution

Unix systems use \n line endings, Windows uses \r\n, and classic Mac OS (before OS X) used \r. You cannot detect which system a file was created on, and it does not matter anyway: I can write \n line endings on Windows if my editor supports it, for example. These are just the conventions on those operating systems, not requirements.
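Although the originating system cannot be determined, you can inspect which line endings the content actually contains. The following is only a minimal sketch, assuming the file content has already been read into a String; the class `LineEndingCounter`, the enum `LineEnding`, and the method `count` are hypothetical names invented for this illustration.

```java
import java.util.EnumMap;
import java.util.Map;

public class LineEndingCounter {
    // Hypothetical enum naming the three line-ending conventions.
    public enum LineEnding { LF, CRLF, CR }

    // Counts how often each kind of line ending occurs in the text.
    public static Map<LineEnding, Integer> count(String text) {
        Map<LineEnding, Integer> counts = new EnumMap<>(LineEnding.class);
        for (LineEnding ending : LineEnding.values()) {
            counts.put(ending, 0);
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    counts.merge(LineEnding.CRLF, 1, Integer::sum);
                    i++; // the \n belongs to this \r\n pair, so skip it
                } else {
                    counts.merge(LineEnding.CR, 1, Integer::sum);
                }
            } else if (c == '\n') {
                counts.merge(LineEnding.LF, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // One ending of each kind in this sample string.
        System.out.println(count("a\r\nb\nc\rd"));
    }
}
```

A file with mixed counts was probably edited on more than one system, which is one more reason to handle every convention rather than guess the origin.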

The proper way, assuming you don't have a function that tokenizes lines correctly regardless of the line ending used, is to search for a \n or a \r, end the current line there, and then strip every following \r or \n character from the remaining data before you begin the next line. However, this collapses consecutive line breaks, so it causes problems if the file has blank lines that you need to keep. In that case you have to look at line breaks more carefully:

  • when reading a \n, end the current line and start the next line
  • when reading a \r, end the current line and, if the next char is \n, skip it, and start the next line, otherwise start the new line immediately.
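The two rules above can be sketched in Java roughly as follows. This is an illustration under assumptions, not a definitive implementation: the class `LineSplitter` and the method `splitLines` are hypothetical names, and the input is assumed to be already decoded into a String.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSplitter {
    // Splits text into lines, treating \n, \r\n, and a lone \r as line breaks.
    // Blank lines are kept as empty strings.
    public static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                // Rule 1: \n ends the current line.
                lines.add(current.toString());
                current.setLength(0);
            } else if (c == '\r') {
                // Rule 2: \r ends the current line; skip a directly following \n.
                lines.add(current.toString());
                current.setLength(0);
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                current.append(c);
            }
        }
        // Keep a final line that has no terminator.
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed endings plus one blank line between "b" and "c".
        System.out.println(splitLines("a\r\nb\n\nc\rd"));
    }
}
```

In practice you may not need to write this yourself: `java.io.BufferedReader.readLine()` is documented to treat a line feed, a carriage return, or a carriage return followed immediately by a line feed as a line terminator, and it returns the line without the terminator, so line-length checks on its results should not be affected by which convention the file uses.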
