How can I read in values from a text file and calculate how many times a value repeats and then find the average?


Problem description

I have a text file called text.txt which looks like this:

5.H6 7.891 0.3
6.H6 7.693 0.3
7.H8 8.16859 0.3
8.H6 7.446 0.3
5.H6 7.72158 0.3
9.H8 8.1053 0.3
8.H6 7.65014 0.3
10.H6 7.54 0.3
12.H6 8.067 0.3
13.H6 8.047 0.3
14.H6 7.69624 0.3
6.H6 7.70272 0.3
17.H8 7.169 0.3
16.H8 8.27957 0.3
18.H6 7.385 0.3
19.H8 7.657 0.3
20.H8 7.78512 0.3
21.H8 8.06057 0.3

I want to create a new output text file which looks like this:

 Atom nVa  predppm   avgppm    
  7.H2   2   7.674   7.853    
  9.H2   2   7.434   7.458    
  20.H2  2   7.602   7.898   
  21.H2  1   7.959   7.898   
  8.H1'  1   5.363   5.238   

Essentially, I want to read in the values from text.txt and check whether values in the first column repeat. For example, 5.H6 appears in rows 1 and 5 of text.txt. The values in the second column for 5.H6 are 7.891 and 7.72158; I want to calculate their average and put it in the avgppm column of my output file. In the second column of the output file, nVa, I want to count how many times each value from the first column of text.txt occurs. For example, 5.H6 occurs twice, so nVa should be 2 for Atom 5.H6.

Right now, I'm just trying to write code that produces the first, second and fourth columns of my sample output file. Later on I want to add further columns to the file, such as predppm, stdev, delta, etc.

This is my current code:

import pandas as pd

filename = 'text.txt'
df = pd.read_csv(filename,sep = r'/s+', header = None)
df[df.duplicated([' '], keep=False)]
df.sum(axis=1) / len(df.columns)


df.to_csv("output.txt",sep = r'/s+',header=None)

I'm not sure how to proceed; I can't test my code because I keep getting errors.

Edit: Error

    gb = (df.groupby("Atom", as_index=False).agg({"ppm":["count","mean"]}).rename(columns={"count":"nVa", "mean":"avgppm"}))
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/generic.py", line 4416, in groupby
    **kwargs)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 1699, in groupby
    return klass(obj, by, **kwds)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 392, in __init__
    mutated=self.mutated)
  File "/Library/Python/2.7/site-packages/pandas-0.20.3-py2.7-macosx-10.11-intel.egg/pandas/core/groupby.py", line 2690, in _get_grouper
    raise KeyError(gpr)
KeyError: 'Atom'

Solution

With df as:

     Atom      ppm  unclear
0    5.H6  7.89100      0.3
1    6.H6  7.69300      0.3
2    7.H8  8.16859      0.3
3    8.H6  7.44600      0.3
4    5.H6  7.72158      0.3
5    9.H8  8.10530      0.3
6    8.H6  7.65014      0.3
7   10.H6  7.54000      0.3
8   12.H6  8.06700      0.3
9   13.H6  8.04700      0.3
10  14.H6  7.69624      0.3
11   6.H6  7.70272      0.3
12  17.H8  7.16900      0.3
13  16.H8  8.27957      0.3
14  18.H6  7.38500      0.3
15  19.H8  7.65700      0.3
16  20.H8  7.78512      0.3
17  21.H8  8.06057      0.3
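
For reference, here is one way a frame with these column names might be built from the file. This is a sketch, not necessarily how the frame above was produced: the column names Atom, ppm and unclear are simply the labels used in this answer, and the sample is inlined through io.StringIO so the snippet runs on its own. Reading with header=None and no names leaves the columns labelled 0, 1, 2, which is the likely cause of the KeyError: 'Atom' in the question's edit. Note also that the whitespace regex is r"\s+" with a backslash, not "/s+".

```python
import io
import pandas as pd

# First few rows of text.txt from the question, inlined so the sketch is self-contained.
sample = """5.H6 7.891 0.3
6.H6 7.693 0.3
7.H8 8.16859 0.3
8.H6 7.446 0.3
5.H6 7.72158 0.3
"""

# sep=r"\s+" splits on runs of whitespace.
# names=... labels the columns so that groupby("Atom") can find them;
# the labels themselves are an assumption taken from this answer.
df = pd.read_csv(
    io.StringIO(sample),  # pass "text.txt" here to read the real file
    sep=r"\s+",
    header=None,
    names=["Atom", "ppm", "unclear"],
)
print(df)
```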

Use groupby() to collect information per-Atom, then apply aggregation functions as desired:

gb = (df.groupby("Atom", as_index=False)
        .agg({"ppm":["count","mean"]})
        .rename(columns={"count":"nVa", "mean":"avgppm"}))
gb.head()
     Atom ppm         
          nVa   avgppm
0   10.H6   1  7.54000
1   12.H6   1  8.06700
2   13.H6   1  8.04700
3   14.H6   1  7.69624
4   16.H8   1  8.27957

That gives the workflow for grouping and aggregating, but it's not quite in the format you requested. We can drop the multi-level column structure, although it's not strictly necessary to compute the values you're interested in:

gb.columns = gb.columns.droplevel()
gb = gb.rename(columns={"":"Atom"})

     Atom  nVa   avgppm
0   10.H6    1  7.54000
1   12.H6    1  8.06700
2   13.H6    1  8.04700
3   14.H6    1  7.69624
4   16.H8    1  8.27957
5   17.H8    1  7.16900
6   18.H6    1  7.38500
7   19.H8    1  7.65700
8   20.H8    1  7.78512
9   21.H8    1  8.06057
10   5.H6    2  7.80629
11   6.H6    2  7.69786
12   7.H8    1  8.16859
13   8.H6    2  7.54807
14   9.H8    1  8.10530
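
As an alternative sketch, newer pandas releases (0.25 and later, so not the 0.20.3 shown in the traceback) support named aggregation, which yields flat column names directly and makes the droplevel step unnecessary. The small frame below is a hand-picked subset of the question's data, and the stdev column is only an illustration of how an extra statistic could be added.

```python
import pandas as pd

# Subset of the question's data, chosen so that two atoms repeat.
df = pd.DataFrame({
    "Atom": ["5.H6", "6.H6", "7.H8", "5.H6", "6.H6"],
    "ppm": [7.891, 7.693, 8.16859, 7.72158, 7.70272],
})

# Each keyword becomes an output column: name=(input column, aggregation).
summary = (
    df.groupby("Atom", as_index=False)
      .agg(nVa=("ppm", "count"),
           avgppm=("ppm", "mean"),
           stdev=("ppm", "std"))  # NaN for atoms that occur only once
)
print(summary)
```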

See groupby() docs for a full treatment.
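
To write the result out as a text file, something along these lines should work. It is a sketch under a few assumptions: the frame is a three-row stand-in for gb, a StringIO buffer stands in for "output.txt", and a single space is used as the delimiter. to_csv expects a single-character delimiter when writing; a regex separator such as r"\s+" is only meaningful when reading.

```python
import io
import pandas as pd

# Stand-in for the flattened gb frame from above (first few atoms only).
gb = pd.DataFrame({
    "Atom": ["5.H6", "6.H6", "7.H8"],
    "nVa": [2, 2, 1],
    "avgppm": [7.80629, 7.69786, 8.16859],
})

buf = io.StringIO()  # replace with "output.txt" to write a real file
# index=False drops the row numbers; float_format fixes the number of decimals.
gb.to_csv(buf, sep=" ", index=False, float_format="%.5f")
text = buf.getvalue()
print(text)
```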
