How can I read in values from a text file and calculate how many times a value repeats and then find the average?
Problem description
I have a text file called text.txt which looks like this:
5.H6 7.891 0.3
6.H6 7.693 0.3
7.H8 8.16859 0.3
8.H6 7.446 0.3
5.H6 7.72158 0.3
9.H8 8.1053 0.3
8.H6 7.65014 0.3
10.H6 7.54 0.3
12.H6 8.067 0.3
13.H6 8.047 0.3
14.H6 7.69624 0.3
6.H6 7.70272 0.3
17.H8 7.169 0.3
16.H8 8.27957 0.3
18.H6 7.385 0.3
19.H8 7.657 0.3
20.H8 7.78512 0.3
21.H8 8.06057 0.3
I want to create a new output text file which looks like this:
Atom nVa predppm avgppm
7.H2 2 7.674 7.853
9.H2 2 7.434 7.458
20.H2 2 7.602 7.898
21.H2 1 7.959 7.898
8.H1' 1 5.363 5.238
Essentially I want to read in values from text.txt and see if values in the first column repeat. For example, 5.H6 from text.txt repeats in rows 1 and 5. The values in the second column for 5.H6 are 7.891 and 7.72158; I want to calculate their average and put it in the avgppm column of my output file. Also, in the second column of my sample output file, called nVa, I want to count how many times each value from the first column of text.txt is repeated. For example, 5.H6 is repeated twice, so the second column should be 2 for Atom 5.H6.

Right now, I'm just trying to write code to get the first, second and fourth columns of my sample output file. Later on I want to add separate columns to my file, such as predppm, stdev, delta, etc.

This is my current code:

import pandas as pd
filename = 'text.txt'
df = pd.read_csv(filename, sep=r'/s+', header=None)
df[df.duplicated([' '], keep=False)]
df.sum(axis=1) / len(df.columns)
df.to_csv("output.txt", sep=r'/s+', header=None)

I'm not sure how to proceed; I can't test my code because I keep getting errors.

Edit: Error

gb = (df.groupby("Atom", as_index=False).agg({"ppm":["count","mean"]}).rename(columns={"count":"nVa", "mean":"avgppm"}))
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/generic.py", line 4416, in groupby
    **kwargs)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 1699, in groupby
    return klass(obj, by, **kwds)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 392, in __init__
    mutated=self.mutated)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 2690, in _get_grouper
    raise KeyError(gpr)
KeyError: 'Atom'

Answer

With df as:

     Atom      ppm  unclear
0    5.H6  7.89100      0.3
1    6.H6  7.69300      0.3
2    7.H8  8.16859      0.3
3    8.H6  7.44600      0.3
4    5.H6  7.72158      0.3
5    9.H8  8.10530      0.3
6    8.H6  7.65014      0.3
7   10.H6  7.54000      0.3
8   12.H6  8.06700      0.3
9   13.H6  8.04700      0.3
10  14.H6  7.69624      0.3
11   6.H6  7.70272      0.3
12  17.H8  7.16900      0.3
13  16.H8  8.27957      0.3
14  18.H6  7.38500      0.3
15  19.H8  7.65700      0.3
16  20.H8  7.78512      0.3
17  21.H8  8.06057      0.3

Use groupby() to collect information per Atom, then apply aggregation functions as desired:

gb = (df.groupby("Atom", as_index=False)
        .agg({"ppm": ["count", "mean"]})
        .rename(columns={"count": "nVa", "mean": "avgppm"}))
gb.head()

    Atom  ppm
          nVa   avgppm
0  10.H6    1  7.54000
1  12.H6    1  8.06700
2  13.H6    1  8.04700
3  14.H6    1  7.69624
4  16.H8    1  8.27957

That gives the workflow for grouping and aggregating, but it's not quite in the format you requested. We can drop the multi-level column structure, although it's not strictly necessary to compute the values you're interested in:

gb.columns = gb.columns.droplevel()
gb = gb.rename(columns={"": "Atom"})

     Atom  nVa   avgppm
0   10.H6    1  7.54000
1   12.H6    1  8.06700
2   13.H6    1  8.04700
3   14.H6    1  7.69624
4   16.H8    1  8.27957
5   17.H8    1  7.16900
6   18.H6    1  7.38500
7   19.H8    1  7.65700
8   20.H8    1  7.78512
9   21.H8    1  8.06057
10   5.H6    2  7.80629
11   6.H6    2  7.69786
12   7.H8    1  8.16859
13   8.H6    2  7.54807
14   9.H8    1  8.10530

See the groupby() docs for a full treatment.
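
The KeyError: 'Atom' reported in the question's edit is what pandas raises when the frame has no column with that label: reading with header=None and no names leaves the columns labelled 0, 1 and 2. The following is a minimal end-to-end sketch rather than a definitive fix. It assumes the column names Atom, ppm and unclear shown in this answer, and it assumes pandas 0.25 or later for named aggregation (the traceback shows pandas 0.20.3, where the dict-based agg call above is the one to use). A few rows of text.txt are inlined through io.StringIO so the snippet runs on its own; output.txt is the file name from the question.

```python
import io
import pandas as pd

# A few rows from text.txt, inlined so the sketch is self-contained.
# In practice, pass the file name 'text.txt' to read_csv instead.
sample = io.StringIO("""\
5.H6 7.891 0.3
6.H6 7.693 0.3
7.H8 8.16859 0.3
8.H6 7.446 0.3
5.H6 7.72158 0.3
8.H6 7.65014 0.3
6.H6 7.70272 0.3
""")

# sep=r"\s+" (backslash, not forward slash) splits on runs of whitespace.
# names= assigns the column labels; without it the columns are 0, 1, 2
# and groupby("Atom") raises KeyError: 'Atom'.
df = pd.read_csv(sample, sep=r"\s+", header=None,
                 names=["Atom", "ppm", "unclear"])

# Named aggregation produces flat column names directly,
# so no droplevel() step is needed afterwards.
gb = (df.groupby("Atom", as_index=False)
        .agg(nVa=("ppm", "count"), avgppm=("ppm", "mean")))

# to_csv expects a single-character separator, not a regular expression.
gb.to_csv("output.txt", sep=" ", index=False)
print(gb)
```

With these sample rows, 5.H6 appears twice, so its nVa is 2 and its avgppm is the mean of 7.891 and 7.72158.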
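
The question also mentions adding columns such as stdev later on. As a hedged sketch of one way to do that, pandas named aggregation (pandas 0.25 or later) can take "std" as a further aggregation function. The column name stdev and the inline sample rows are illustrative assumptions; predppm and delta are left out because the question does not say how they are computed.

```python
import pandas as pd

# Illustrative rows taken from text.txt; column names are assumed.
df = pd.DataFrame({
    "Atom": ["5.H6", "6.H6", "7.H8", "5.H6", "6.H6"],
    "ppm": [7.891, 7.693, 8.16859, 7.72158, 7.70272],
})

# One output column per (source column, aggregation function) pair.
stats = (df.groupby("Atom", as_index=False)
           .agg(nVa=("ppm", "count"),
                avgppm=("ppm", "mean"),
                stdev=("ppm", "std")))
print(stats)
```

Note that "std" is the sample standard deviation, so an Atom that occurs only once (7.H8 here) gets NaN in the stdev column.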