Pandas: ignore all lines following a specific string when reading a file into a DataFrame
Problem description
I have a file that I want to read into a pandas DataFrame. Its contents can be summarized like this:
[Header]
Some_info = some_info
[Data]
Col1 Col2
0.532 Point
0.234 Point
0.123 Point
1.455 Square
14.64 Square
[Other data]
Other1 Other2
Test1 PASS
Test2 FAIL
My goal is to read only the portion of text between [Data] and [Other data], which varies in length. The header always has the same length, so skiprows from pandas.read_csv can be used. However, skipfooter needs the number of lines to skip, which can change between files.
What would be the best solution here? I would like to avoid altering the file externally unless there's no other solution.
Recommended answer
This method has to read the file twice: once to count the footer lines and once to parse the data.
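import itertools as it
import pandas as pd

def get_footer(file_):
    # Count the lines from the '[Other data]' marker to the end of the file.
    with open(file_) as f:
        g = it.dropwhile(lambda x: x.strip() != '[Other data]', f)
        footer_len = sum(1 for _ in g)
    return footer_len

footer_len = get_footer('file.txt')
# Note: skipfooter is not supported by the C parser engine.
df = pd.read_csv('file.txt', … skipfooter=footer_len)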
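As a rough end-to-end illustration, the sketch below applies the two-pass idea to the sample file from the question. It also shows a single-pass variant, which is not part of the answer above: it collects the lines between the two markers and parses them from memory. The names count_footer, read_two_pass, read_single_pass and HEADER_LINES are hypothetical. The three-line header and the whitespace separator are assumptions read off the sample, and they stand in for the arguments elided in the answer's read_csv call.

```python
import io
import itertools as it
import os
import tempfile

import pandas as pd

# Sample file contents, copied from the question.
SAMPLE = """[Header]
Some_info = some_info
[Data]
Col1 Col2
0.532 Point
0.234 Point
0.123 Point
1.455 Square
14.64 Square
[Other data]
Other1 Other2
Test1 PASS
Test2 FAIL
"""

HEADER_LINES = 3  # assumption: '[Header]', one info line, '[Data]'


def count_footer(path, marker="[Other data]"):
    """Count the lines from the marker line to the end of the file."""
    with open(path) as f:
        tail = it.dropwhile(lambda line: line.strip() != marker, f)
        return sum(1 for _ in tail)


def read_two_pass(path):
    """First pass counts the footer, second pass parses the data block."""
    return pd.read_csv(
        path,
        sep=r"\s+",            # assumption: whitespace-separated columns
        skiprows=HEADER_LINES,
        skipfooter=count_footer(path),
        engine="python",       # skipfooter is not supported by the C engine
    )


def read_single_pass(path, start="[Data]", stop="[Other data]"):
    """Alternative: keep only the lines between the markers, parse in memory."""
    with open(path) as f:
        after_start = it.dropwhile(lambda line: line.strip() != start, f)
        next(after_start, None)  # discard the start marker line itself
        block = it.takewhile(lambda line: line.strip() != stop, after_start)
        return pd.read_csv(io.StringIO("".join(block)), sep=r"\s+")


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name

df_two = read_two_pass(sample_path)
df_one = read_single_pass(sample_path)
os.remove(sample_path)

print(df_two)
print(df_one)
```

The single-pass variant does not depend on the header length, since it looks for the [Data] marker instead of using skiprows. The trade-off is that it holds the data block in memory as a string before parsing.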