Python, how to count the unique commenters in a JSON text file?


Problem description

I am working on Python code that counts the number of unique commenters in a chat, using the unique id stored in the "_id" field nested inside each "commenter" field. The JSON looks like this.

Json:

{ 
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{ 
"display_name":"student1",
"name":"student1",
"type":"user",
},
"source":"chat",
"state":"published",
"message":{ 
"body":"Hi",
"fragments":[ 
{ 
"text":"Hi"
}
],
"is_action":false
},
"more_replies":false
}

{ 
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{ 
"display_name":"student2",
"name":"student2",
"type":"user",
},
"source":"chat",
"state":"published",
"message":{ 
"body":"Hey!",
"fragments":[ 
{ 
"text":"Hey"
}
],
"is_action":false
},
"more_replies":false
}

{ 
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{ 
"display_name":"student1",
"name":"student1",
"type":"user",
},
"source":"chat",
"state":"published",
"message":{ 
"body":"How are you?",
"fragments":[ 
{ 
"text":"How are you?"
}
],
"is_action":false
},
"more_replies":false
}





In all, the thread contains 3 comments. However, student1 commented more than once, so there are only two unique commenters in this thread. My question is: how do I count only the unique commenters, using the _id field in the JSON? I am able to count all the commenter fields in the text, but not the unique commenters. The initial code I wrote counts every commenter field and prints 3. However, the real answer is 2, since student1 commented twice. I am now trying to put each commenter's _id in an array/list so that I can count the unique ids, but I am having trouble storing multiple values through a loop. Please help if you can.

What I have tried:

Code that Prints Number of Commenters Field:

import json
import requests
from collections import Counter

files = "/chatinfo.txt"

with open(files) as f:
    commenters = 0
    for line in f:
        jsondata = json.loads(line)
        if "commenter" in jsondata:
            commenters += 1

print(commenters)

Output
3





An attempt at getting the Commenter _id Field value in an array/list to compare and only count unique commenters _id:

import json
files = "/chatinfo.txt"
with open(files) as f:
	num_with_field = 0
	for line in f:
		jsondata = json.loads(line)
		dictjson = json.dumps(jsondata)
		if "commenter" in jsondata:
			commenterid = []
			commenterid.append(jsondata["commenter"]["_id"])
			print(commenterid)

Output:
			
['193984934']
['157255102']
['100365638']



____________




However, after this, I try to see what's in the array/list. I get ['100365638'] instead of all three values.

print(commenterid)




Output

['100365638']






Out of the three, it looks like only 1 value was stored in the array/list commenterid.

Problem 1:
Can anyone help me with filling my array/list with the three values I need using the loop? The array/list should contain ['193984934']['157255102']['100365638'].

Problem 2:

In addition, how can I count the unique ids in that array? So far I've only seen how to count the frequency of the ids.

Counter(commenterid).values() # counts the elements' frequency.





Do you think

len(set(commenterid))

would work? Also if you have a better way of doing this other than storing the values I need in an array or list I would love to see it. Thanks in advance.

Recommended answer

You need to use a dictionary and check whether each new id is already in it before adding it, or you could simply increment a counter for each id. See "5. Data Structures" in the Python 3.7.0 documentation.
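
A minimal sketch of the counter-per-id idea, not necessarily what the answerer had in mind. It assumes the file holds one complete JSON object per line (as the loop in the question does) and that each "commenter" object carries an "_id" key. The sample lines, the "_id" values, and the function name count_comments_per_commenter are invented for illustration. The key point is that the container is created once, before the loop, so it accumulates across lines instead of being reset on every iteration.

```python
import json
from collections import Counter

# Hypothetical sample data: one JSON object per line.
# The "_id" values inside "commenter" are made up for this example.
SAMPLE_LINES = [
    '{"commenter": {"_id": "1001", "name": "student1"}, "message": {"body": "Hi"}}',
    '{"commenter": {"_id": "1002", "name": "student2"}, "message": {"body": "Hey!"}}',
    '{"commenter": {"_id": "1001", "name": "student1"}, "message": {"body": "How are you?"}}',
]


def count_comments_per_commenter(lines):
    """Return a Counter mapping each commenter _id to its number of comments."""
    counts = Counter()  # created once, outside the loop, so values accumulate
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        commenter = record.get("commenter")
        # Only count records that really have a commenter with an _id.
        if isinstance(commenter, dict) and "_id" in commenter:
            counts[commenter["_id"]] += 1
    return counts


counts = count_comments_per_commenter(SAMPLE_LINES)
unique_commenters = len(counts)  # number of distinct ids seen

print(unique_commenters)  # 2
print(dict(counts))       # {'1001': 2, '1002': 1}
```

To run this against a file, an open file object can be passed in place of the list, since iterating over it also yields one line at a time. The len(set(commenterid)) idea from the question gives the same count, provided commenterid is created before the loop rather than inside it.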


I gave you a suggestion at https://www.codeproject.com/Questions/1260731/How-do-I-count-the-unique-commeters-in-json-text-f, but for some reason you deleted it.

