C: problems when parsing a file as a stream of lines


Problem description

It's pretty common that I will use fgets() in a loop to iterate through the lines of a file and process them. Normally it works well but in some cases it sucks.

Example 1: If you need to know information about the next line(s) or previous line(s) to decide how to process the current line.

Example 2: The current line indicates you have read too far and you must send it to a different context of your parser.

I know of many workarounds, but overall they tend to be untenable hacks.

I could read the entire file in as an array of strings, and then crawl through it with relative ease.

I think this is the best solution, but it comes with a major downside: memory consumption equivalent to the file size. As far as I know, fopen() and fgets() do not do that.

What are your thoughts on a best compromise? Create a file I/O interface with a small cache?

Thanks

What I have tried:

Storing two or three lines at a time and using references to process them as a set

Using an un-gets-line function to push lines back onto a stack if they have been read and need to be unread, drawing new lines off of the stack until it runs out and then fetching fresh ones.
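
For illustration, the push-back idea described above might look roughly like the following sketch. The names `LineSource`, `ls_init`, `ls_unget` and `ls_gets`, as well as the fixed limits `PB_DEPTH` and `PB_MAXLEN`, are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PB_DEPTH  4    /* assumed maximum number of pushed-back lines */
#define PB_MAXLEN 256  /* assumed maximum line length, including '\0' */

/* Hypothetical wrapper: a FILE plus a small stack of "unread" lines. */
typedef struct {
    FILE *fp;
    char  stack[PB_DEPTH][PB_MAXLEN];
    int   top;  /* number of lines currently on the stack */
} LineSource;

void ls_init(LineSource *ls, FILE *fp)
{
    assert(ls != NULL && fp != NULL);
    ls->fp = fp;
    ls->top = 0;
}

/* Push a line back. Returns 0 on success, -1 if the stack is full
   or the line does not fit into a stack slot. */
int ls_unget(LineSource *ls, const char *line)
{
    if (ls->top == PB_DEPTH || strlen(line) >= PB_MAXLEN)
        return -1;
    strcpy(ls->stack[ls->top++], line);
    return 0;
}

/* Like fgets(), but pushed-back lines are returned first (most recent first). */
char *ls_gets(LineSource *ls, char *buf, int size)
{
    if (ls->top > 0) {
        const char *src = ls->stack[ls->top - 1];
        if (size <= 0 || strlen(src) >= (size_t)size)
            return NULL;  /* caller's buffer is too small; line stays on the stack */
        strcpy(buf, src);
        ls->top--;
        return buf;
    }
    return fgets(buf, size, ls->fp);
}
```

A parser context that discovers it has read one line too far can call `ls_unget()` and return; the next context then receives that same line from `ls_gets()`.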

Solution

How about creating an index?
In this case you will have to read the file twice, but you will have complete knowledge of your data set.

You create the index by reading each character and looking for newlines; whenever you hit a newline, record its position in memory, in an array or a linked list. With this you don't even have to cache any strings; just hop from index entry to index entry.
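
A minimal sketch of such an index, under some assumptions: the file is opened in binary mode (for example `"rb"`) so that byte counts can be used as `fseek()` offsets, and the names `LineIndex`, `line_index_build` and `line_index_read` are made up for this example. It stores the offset where each line starts rather than the newline positions themselves, which amounts to the same information.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical index type: where each line begins in the file. */
typedef struct {
    long  *offsets;  /* offsets[i] = byte offset of the start of line i */
    size_t count;    /* number of lines found */
} LineIndex;

/* First pass: read every character once and record each line start.
   Assumes fp is a seekable stream opened in binary mode.
   Returns 0 on success, -1 on allocation or read failure. */
int line_index_build(FILE *fp, LineIndex *idx)
{
    size_t cap = 64;
    long pos = 0;
    int c, at_line_start = 1;

    assert(fp != NULL && idx != NULL);
    idx->count = 0;
    idx->offsets = malloc(cap * sizeof *idx->offsets);
    if (idx->offsets == NULL)
        return -1;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (idx->count == cap) {  /* grow the array when it is full */
                long *grown = realloc(idx->offsets, 2 * cap * sizeof *grown);
                if (grown == NULL)
                    goto fail;
                idx->offsets = grown;
                cap *= 2;
            }
            idx->offsets[idx->count++] = pos;
        }
        at_line_start = (c == '\n');
        pos++;
    }
    if (ferror(fp))
        goto fail;
    return 0;

fail:
    free(idx->offsets);
    idx->offsets = NULL;
    idx->count = 0;
    return -1;
}

/* Second pass, random access: seek to line n and read it with fgets().
   Returns buf, or NULL if n is out of range or the seek/read fails. */
char *line_index_read(FILE *fp, const LineIndex *idx, size_t n,
                      char *buf, int size)
{
    if (n >= idx->count || fseek(fp, idx->offsets[n], SEEK_SET) != 0)
        return NULL;
    return fgets(buf, size, fp);
}
```

The memory cost is one `long` per line instead of the whole file, and the previous or next line is always one `line_index_read()` call away. The caller releases the index with `free(idx.offsets)`.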


If the app's job is to parse the complete file, it should be no problem. Test it.

Ideally, use some standard containers, as in this file reading example code.


Quote:

Example 1: If you need to know information about the next line(s) or previous line(s) to decide how to process the current line.

Example 2: The current line indicates you have read too far and you must send it to a different context of your parser.


Both problems are usually solved in parsers using a look-ahead mechanism (see, for instance, Parsing - Wikipedia), which, if I am not mistaken, you have already found yourself as a possible solution:

Quote:

Storing two or three lines at a time and using references to process them as a set
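
As a rough sketch of that look-ahead idea, a small ring buffer can hold the current line plus the next few. The names `Lookahead`, `la_init`, `la_peek` and `la_advance` and the limits `LA_DEPTH` and `LA_MAXLEN` are assumptions made for this example; lines longer than `LA_MAXLEN - 1` characters would be split by `fgets()`.

```c
#include <assert.h>
#include <stdio.h>

#define LA_DEPTH  3    /* assumed: current line plus two lines of look-ahead */
#define LA_MAXLEN 256  /* assumed maximum line length, including '\0' */

/* Hypothetical look-ahead reader: a ring buffer of upcoming lines. */
typedef struct {
    FILE  *fp;
    char   lines[LA_DEPTH][LA_MAXLEN];
    size_t head;    /* slot holding the current line */
    size_t filled;  /* number of lines currently buffered */
} Lookahead;

void la_init(Lookahead *la, FILE *fp)
{
    assert(la != NULL && fp != NULL);
    la->fp = fp;
    la->head = 0;
    la->filled = 0;
}

/* Return the line n positions ahead (n = 0 is the current line) without
   consuming anything. Returns NULL if the input ends before that line
   or if n is beyond the buffer depth. */
const char *la_peek(Lookahead *la, size_t n)
{
    if (n >= LA_DEPTH)
        return NULL;
    while (la->filled <= n) {  /* read only as far ahead as requested */
        size_t slot = (la->head + la->filled) % LA_DEPTH;
        if (fgets(la->lines[slot], LA_MAXLEN, la->fp) == NULL)
            return NULL;
        la->filled++;
    }
    return la->lines[(la->head + n) % LA_DEPTH];
}

/* Consume the current line. Returns 1 if a line was consumed, 0 at end of input. */
int la_advance(Lookahead *la)
{
    if (la_peek(la, 0) == NULL)
        return 0;
    la->head = (la->head + 1) % LA_DEPTH;
    la->filled--;
    return 1;
}
```

With this, the parser decides what to do with `la_peek(&la, 0)` after inspecting `la_peek(&la, 1)`, and a context that sees a line belonging to another context simply returns without calling `la_advance()`. Memory use stays fixed at `LA_DEPTH * LA_MAXLEN` bytes regardless of file size.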

