Python loop through list and return "out of sequence" values


Question

Consider this list:

dates = [
    ('2015-02-03', 'name1'),
    ('2015-02-04', 'nameg'),
    ('2015-02-04', 'name5'),
    ('2015-02-05', 'nameh'),
    ('1929-03-12', 'name4'),
    ('2023-07-01', 'name7'),
    ('2015-02-07', 'name0'),
    ('2015-02-08', 'nameh'),
    ('2015-02-15', 'namex'),
    ('2015-02-09', 'namew'),
    ('1980-12-23', 'name2'),
    ('2015-02-12', 'namen'),
    ('2015-02-13', 'named'),
]

How can I identify the dates that are out of sequence? I don't care whether they repeat or skip; I just need the ones that are way out of line. That is, I should get back:

('1929-03-12', 'name4'),
('2023-07-01', 'name7'),
('2015-02-15', 'namex'),
('1980-12-23', 'name2'),

namex is less obvious, but it is not in the general order of the list.

My simplistic start (which I have deleted to simplify the question) is obviously woefully incomplete.

Update: Based on the comments, it seems an implementation of the longest increasing subsequence (LIS) will get me started. Python implementations can be found here:

  • https://stackoverflow.com/a/9832414/1061836
  • How to determine the longest increasing subsequence using dynamic programming?
  • https://rosettacode.org/wiki/Longest_increasing_subsequence#Python
  • https://codereview.stackexchange.com/questions/10230/python-implementation-of-the-longest-increasing-subsequence

It seems that once I get the LIS, I can compare it to the original list and see where the gaps are... Fascinating. SO is the hive-mind of awesomeness.

Recommended answer

Short answer, general solution

Using my answer to the "Longest increasing subsequence" question, this could be implemented simply as:

def out_of_sequence(seq):
  indices = set(longest_subsequence(seq, 'weak', key=lambda x: x[0], index=True))
  return [e for i, e in enumerate(seq) if i not in indices]

Longer answer, specific solution

Based on the question at Code Review and a question about non-decreasing sequences (since that's what you're after), here's a solution to your problem:

from bisect import bisect_right
from operator import itemgetter


def out_of_sequence(seq, key = None):
  if key is None: key = lambda x: x 

  lastoflength = [0] # end position of subsequence with given length
  predecessor = [None] # penultimate element of l.i.s. ending at given position

  for i in range(1, len(seq)):
    # find length j of subsequence that seq[i] can extend
    j = bisect_right([key(seq[k]) for k in lastoflength], key(seq[i]))
    # update old subsequence or extend the longest
    try: lastoflength[j] = i
    except IndexError: lastoflength.append(i)
    # record element preceding seq[i] in the subsequence for backtracking
    predecessor.append(lastoflength[j-1] if j > 0 else None)

  indices = set()
  i = lastoflength[-1]
  while i is not None:
    indices.add(i)
    i = predecessor[i]

  return [e for i, e in enumerate(seq) if i not in indices]


print(*out_of_sequence(dates, itemgetter(0)), sep='\n')

Output:

('1929-03-12', 'name4')
('2023-07-01', 'name7')
('2015-02-15', 'namex')
('1980-12-23', 'name2')
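
The function above rebuilds the list of keys on every iteration, so each step costs linear time. As a minimal sketch of a variant (not part of the original answer), the keys of the current tails can be kept in a parallel list so that bisect_right searches it directly. The names out_of_sequence_fast, tail_keys and tail_index are made up for this illustration.

```python
from bisect import bisect_right
from operator import itemgetter


def out_of_sequence_fast(seq, key=None):
    """Return the elements of seq outside one longest non-decreasing subsequence."""
    if key is None:
        key = lambda x: x

    tail_keys = []    # tail_keys[j]: key of the last element of the tracked subsequence of length j + 1
    tail_index = []   # tail_index[j]: position in seq of that last element
    predecessor = []  # predecessor[i]: position preceding i in its subsequence, or None

    for i, item in enumerate(seq):
        k = key(item)
        # number of tracked subsequences that this element can extend
        j = bisect_right(tail_keys, k)
        if j == len(tail_keys):
            tail_keys.append(k)
            tail_index.append(i)
        else:
            tail_keys[j] = k
            tail_index[j] = i
        predecessor.append(tail_index[j - 1] if j > 0 else None)

    # walk back from the end of the longest subsequence
    keep = set()
    i = tail_index[-1] if tail_index else None
    while i is not None:
        keep.add(i)
        i = predecessor[i]

    return [e for i, e in enumerate(seq) if i not in keep]


sample = [
    ('2015-02-03', 'a'),
    ('2015-02-04', 'b'),
    ('1929-03-12', 'c'),
    ('2015-02-05', 'd'),
]
print(out_of_sequence_fast(sample, itemgetter(0)))  # [('1929-03-12', 'c')]
```

Because tail_keys stays sorted, each lookup is a binary search over an existing list instead of a freshly built one. The result should match out_of_sequence for the same input, since the update rule is unchanged.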


The key parameter (inspired by the sorted builtin) specifies a function of one argument that is used to extract a comparison key from each list element. The default value is None, which gives the caller a convenient way of saying "I want to compare the elements directly". When it is None, we use lambda x: x as an identity function, so the elements are not changed in any way before the comparison.
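
The same convention can be seen on the sorted builtin itself. The snippet below is only an illustration with a made-up words list: passing key=None behaves like passing an identity function, while a real key function changes what is compared but not what is returned.

```python
words = ['pear', 'Apple', 'fig']

# key=None and an explicit identity function both compare the elements themselves
same = sorted(words, key=None) == sorted(words, key=lambda x: x)
print(same)                    # True

# a key function changes the comparison, the returned elements stay untouched
print(sorted(words))           # ['Apple', 'fig', 'pear']
print(sorted(words, key=len))  # ['fig', 'pear', 'Apple']
```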

In your case, you want to use the dates as keys for comparison, so we use itemgetter(0) as key. itemgetter(1) would use the names as key instead:

>>> print(*map(itemgetter(1), dates))
name1 nameg name5 nameh name4 name7 name0 nameh namex namew name2 namen named

Using itemgetter(k) is equivalent to lambda x: x[k]:

>>> print(*map(lambda x: x[1], dates))
name1 nameg name5 nameh name4 name7 name0 nameh namex namew name2 namen named

Using it with map is equivalent to a generator expression:

>>> print(*(x[1] for x in dates))
name1 nameg name5 nameh name4 name7 name0 nameh namex namew name2 namen named

But if we used a similar list comprehension to pass the sequence to out_of_sequence, we would get a different result from the one we want, because only the dates are returned and the names are lost:

>>> print(*out_of_sequence([x[0] for x in dates]), sep='\n')
1929-03-12
2023-07-01
2015-02-15
1980-12-23

Likewise, if we compare the date-name pairs directly, we get wrong results (because 'nameg' compares greater than 'name5'):

>>> print(*out_of_sequence(dates), sep='\n')
('2015-02-04', 'nameg')
('1929-03-12', 'name4')
('2023-07-01', 'name7')
('2015-02-15', 'namex')
('1980-12-23', 'name2')
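
The reason is that Python compares tuples element by element and only looks at the second element when the first ones are equal. A small check with the two pairs from the list that share a date:

```python
a = ('2015-02-04', 'nameg')
b = ('2015-02-04', 'name5')

dates_in_order = a[0] <= b[0]    # the dates alone are in non-decreasing order
tuples_in_order = a <= b         # the tie is broken by the names
names_reversed = 'nameg' > 'name5'

print(dates_in_order)   # True
print(tuples_in_order)  # False
print(names_reversed)   # True, 'g' sorts after '5'
```

So when whole tuples are compared, the pair with 'nameg' looks out of order relative to the pair that follows it, even though the dates are equal.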

Because we want to return dates and names, but order by dates only, we need to pass a function that extracts the dates through the key parameter.

An alternative would be to get rid of key and just write:

j = bisect_right([seq[k][0] for k in lastoflength], seq[i][0])

But since this is Stack Overflow, maybe one day another person will come across this answer and need some other key extraction, so I decided to post the more general solution here.
