How to compare two lists by filtering and sorting "repeated" values

Question

I have the following act2.txt file for an email campaign:

2021-04-02//email@example.com//Enhance your presentation skills in 15 minutes//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Open
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-11//email@example.com//Enroll in the presentations skills - FREE WEBINAR//Delivered
2021-04-16//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered
2021-04-01//email@example.com//Enhance your presentation skills in 15 minutes//Delivered
2021-04-09//email@example.com//we are here to help you improve your skills//Delivered
2021-04-12//email@example.com//(1st meeting) here is our recorded presentation skills webinar//Delivered
2021-04-13//email@example.com//YOU ARE INVITED TO THIS PROGRAMMING EVENT//Delivered

I want to track email activity by customer: I calculated the delivered emails, the opened emails, and then the open rate.

I generated two lists, one for the delivered emails and another for the opened emails:

from pprint import pprint

# Read the file; each line holds four fields separated by "//"
afile = "act2.txt"
with open(afile, "r") as afile_read:
    lines = afile_read.readlines()

activityList = []
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    # Fields: date, customer email, email title, action
    stripped_line = [s.strip() for s in line.split("//")]
    activityList.append(stripped_line)

#print(activityList)

stripped_email = 'email@example.com'
email_actions = [x for x in activityList if stripped_email in x[1]]
delivered = [x for x in email_actions if x[3] == 'Delivered']
Opened = [x for x in email_actions if x[3] == 'Open']
delcount = len(delivered)
opencount = len(Opened)
try:
    Open_rate = opencount / delcount * 100
except ZeroDivisionError:
    Open_rate = 0
print(stripped_email, ",", delcount, ",", opencount, ",", Open_rate, "%")

pprint(delivered)
pprint(Opened)

Delivered list:

[['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Delivered'],
 ['2021-04-16',
  'email@example.com',
  'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
  'Delivered'],
 ['2021-04-01',
  'email@example.com',
  'Enhance your presentation skills in 15 minutes',
  'Delivered'],
 ['2021-04-09',
  'email@example.com',
  'we are here to help you improve your skills',
  'Delivered'],
 ['2021-04-12',
  'email@example.com',
  '(1st meeting) here is our recorded presentation skills webinar',
  'Delivered'],
 ['2021-04-13',
  'email@example.com',
  'YOU ARE INVITED TO THIS PROGRAMMING EVENT',
  'Delivered']]

Opened list:

[['2021-04-02',
  'email@example.com',
  'Enhance your presentation skills in 15 minutes',
  'Open'],
 ['2021-04-11',
  'email@example.com',
  'Enroll in the presentations skills - FREE WEBINAR',
  'Open']]

I want to compare both lists and generate a third one (combined activity), filtered by email subject: if a subject is in both the delivered list and the opened list, it should be counted as one activity. However, an email subject can be repeated, for example when an email was delivered 3 times but opened only once. I cannot find the proper logic to do that, as I am still learning Python.

Edit for clarity:

If an email is found in the opened list (matched by title), then the same title should be removed from the delivered list by last date, and a new list is generated with the combined activities.

Recommended answer

You need to think of this in a different way: you are not combining lists.

If an email was opened, that means it was also received. This means that your opened list is also your combined list.

Once you realize that, all you have to do is copy the unopened emails to a result list for emails that were not opened.

Go over the opened emails list and copy the subjects into a set. Then go over the received emails and check whether each subject is in the set. If it is, do nothing; if it isn't, copy the email to the unopened emails list.

It is a very simple piece of code:

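# `opened` and `received` are the question's `Opened` and `delivered` lists
opened, received = Opened, delivered

# Collect the subject (index 2) of every opened email
opened_subjects = set()
unopened = []
for email in opened:
    opened_subjects.add(email[2])

# Keep the first received email of each subject that was never opened
unopened_subjects = set()
for email in received:
    if all(email[2] not in subj_set
           for subj_set in (opened_subjects, unopened_subjects)):
        unopened.append(email)
        unopened_subjects.add(email[2])

print('Both received and opened:', opened)
print('Unopened emails:', unopened)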
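
To see what this approach produces, here is a minimal self-contained sketch that runs the same logic on a few rows copied from the question's sample data. The function name `find_unopened` and the shortened row lists are assumptions made for illustration; they do not appear in the original post.

```python
# Rows in the question's format: [date, email, subject, action].
# Only a subset of the question's sample data is used here.
delivered = [
    ["2021-04-11", "email@example.com",
     "Enroll in the presentations skills - FREE WEBINAR", "Delivered"],
    ["2021-04-16", "email@example.com",
     "YOU ARE INVITED TO THIS PROGRAMMING EVENT", "Delivered"],
    ["2021-04-01", "email@example.com",
     "Enhance your presentation skills in 15 minutes", "Delivered"],
    ["2021-04-13", "email@example.com",
     "YOU ARE INVITED TO THIS PROGRAMMING EVENT", "Delivered"],
]
opened = [
    ["2021-04-02", "email@example.com",
     "Enhance your presentation skills in 15 minutes", "Open"],
    ["2021-04-11", "email@example.com",
     "Enroll in the presentations skills - FREE WEBINAR", "Open"],
]


def find_unopened(received, opened):
    """Return the first received row of each subject that was never opened.

    Hypothetical helper: it wraps the loop from the answer in a function.
    """
    opened_subjects = {row[2] for row in opened}
    seen_subjects = set()
    unopened = []
    for row in received:
        subject = row[2]
        if subject not in opened_subjects and subject not in seen_subjects:
            unopened.append(row)
            seen_subjects.add(subject)
    return unopened


unopened = find_unopened(delivered, opened)
for row in unopened:
    print(row[0], row[2])
```

With these rows, only the programming event invitation is never opened, and because the subject is recorded on its first occurrence, the row dated 2021-04-16 is kept and the later duplicate in the list is skipped.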

A small note:
The reason for each of the sets is different. The first set, opened_subjects, is there because a set contains only unique items, which is what is required in this case. The second set, unopened_subjects, is there because checking whether an item is in a set is faster than checking a list. Since I check membership before adding to that set anyway, its ability to store only unique items is not needed there.
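
Because sets keep only unique subjects, they discard how many times each subject was delivered or opened, which the question also mentions (an email delivered 3 times but opened once). If those counts matter, one possible alternative, not part of the original answer, is to tally subjects with `collections.Counter`. The `events` list and the `summary` name below are hypothetical, built from the question's sample data.

```python
from collections import Counter

# Hypothetical (subject, action) pairs taken from the question's sample data.
events = [
    ("Enroll in the presentations skills - FREE WEBINAR", "Delivered"),
    ("Enroll in the presentations skills - FREE WEBINAR", "Delivered"),
    ("Enroll in the presentations skills - FREE WEBINAR", "Delivered"),
    ("Enroll in the presentations skills - FREE WEBINAR", "Open"),
    ("YOU ARE INVITED TO THIS PROGRAMMING EVENT", "Delivered"),
]

# Count how often each subject was delivered and how often it was opened.
delivered_counts = Counter(s for s, action in events if action == "Delivered")
opened_counts = Counter(s for s, action in events if action == "Open")

# A Counter returns 0 for missing keys, so unopened subjects need no special case.
summary = {
    subject: (n_delivered, opened_counts[subject])
    for subject, n_delivered in delivered_counts.items()
}

for subject, (n_delivered, n_opened) in summary.items():
    print(f"{subject}: delivered {n_delivered}, opened {n_opened}")
```

This keeps one entry per subject, as the set-based answer does, while preserving the repeat counts for each subject.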
