Question
I want to use zip to iterate over two generators of (possibly) different lengths:

    for el1, el2 in zip(gen1, gen2):
        print(el1, el2)

However, if gen2 has fewer elements, one extra element of gen1 is "consumed".
For example,

    def my_gen(n: int):
        for i in range(n):
            yield i

    gen1 = my_gen(10)
    gen2 = my_gen(8)
    list(zip(gen1, gen2))  # Last tuple is (7, 7)
    print(next(gen1))      # printed value is "9" => 8 is missing

    gen1 = my_gen(8)
    gen2 = my_gen(10)
    list(zip(gen1, gen2))  # Last tuple is (7, 7)
    print(next(gen2))      # printed value is "8" => OK

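Apparently, a value is missing (8 in my previous example) because gen1 is read (thus producing the value 8) before zip realizes that gen2 has no more elements. That value then vanishes into thin air. There is no such "problem" when gen2 is the longer one.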
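The order in which zip pulls from its arguments can be made visible with a small trace. This is only an illustrative sketch: the helper `traced` and the `log` list are hypothetical names introduced here, not part of the question.

```python
def traced(name, n, log):
    """Yield 0..n-1, recording every value handed out in `log`."""
    for i in range(n):
        log.append((name, i))
        yield i


log = []
pairs = list(zip(traced("gen1", 3, log), traced("gen2", 2, log)))

print(pairs)    # [(0, 0), (1, 1)]
print(log[-1])  # ('gen1', 2): pulled from gen1, then dropped by zip
```

zip asks its first argument for a value before it asks the second, so the extra value has already left gen1 by the time gen2 reports that it is empty.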
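Question: is there a way to retrieve this missing value (i.e. 8 in my previous example)?
Note: I have currently implemented this another way, but I would really like to know how to get this value.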
Answer
One way is to implement an iterator wrapper that caches the last value it yielded:

    import collections.abc


    class cache_last(collections.abc.Iterator):
        """
        Wraps an iterable in an iterator that can retrieve the last value.

        .. attribute:: obj

           A reference to the wrapped iterable. Provided for convenience
           of one-line initializations.
        """
        def __init__(self, iterable):
            self.obj = iterable
            self._iter = iter(iterable)
            self._sentinel = object()

        @property
        def last(self):
            """
            The last object yielded by the wrapped iterator.

            Uninitialized iterators raise a `ValueError`. Exhausted
            iterators raise a `StopIteration`.
            """
            if self.exhausted:
                raise StopIteration
            return self._last

        @property
        def exhausted(self):
            """
            `True` if there are no more elements in the iterator.
            Violates EAFP, but convenient way to check if `last` is valid.
            Raise a `ValueError` if the iterator is not yet started.
            """
            if not hasattr(self, '_last'):
                raise ValueError('Not started!')
            return self._last is self._sentinel

        def __next__(self):
            """
            Retrieve, record, and return the next value of the iteration.
            """
            try:
                self._last = next(self._iter)
            except StopIteration:
                self._last = self._sentinel
                raise
            # An alternative that has fewer lines of code, but checks
            # for the return value one extra time, and loses the underlying
            # StopIteration:
            #self._last = next(self._iter, self._sentinel)
            #if self._last is self._sentinel:
            #    raise StopIteration
            return self._last

        def __iter__(self):
            """
            This object is already an iterator.
            """
            return self

To use this, wrap the inputs to zip:

    gen1 = cache_last(range(10))
    gen2 = iter(range(8))
    list(zip(gen1, gen2))
    print(gen1.last)   # 8, the value zip read and dropped
    print(next(gen1))  # 9

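It is important to make gen2 an iterator rather than an iterable, so that you can tell which one was exhausted. A value is lost only when gen2 runs out first; if gen1 is the one that was exhausted, there is nothing to recover and no need to check gen1.last (which raises StopIteration in that case).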
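The check can be folded into a helper that hands back both the pairs and the unconsumed remainder of the first input. This is a minimal sketch under stated assumptions: `LastSeen` is a compact stand-in for the cache_last class above, and `zip_keep_rest` is a hypothetical name. Neither comes from the original answer, and the helper builds the full list of pairs eagerly.

```python
import itertools


class LastSeen:
    """Compact stand-in for cache_last: remembers the last value yielded."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self.last = None
        self.exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.last = next(self._iter)
        except StopIteration:
            self.exhausted = True
            raise
        return self.last


def zip_keep_rest(first, second):
    """Zip eagerly; return (pairs, iterator over what is left of `first`)."""
    wrapped = LastSeen(first)
    pairs = list(zip(wrapped, second))
    # zip stops either because `first` ran out (nothing was dropped) or
    # because `second` ran out after one more value was read from `first`.
    dropped = [] if wrapped.exhausted else [wrapped.last]
    return pairs, itertools.chain(dropped, wrapped)


pairs, rest = zip_keep_rest(range(10), range(8))
print(pairs[-1])   # (7, 7)
print(list(rest))  # [8, 9]
```

Keeping the dropped value in a list rather than returning it directly avoids any ambiguity when the dropped value is itself None.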
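Another approach is to rewrite zip so that it accepts a mutable sequence of iterables instead of separate iterable arguments. That lets you replace the iterables with chained versions that include the "peeked" items: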

    import itertools


    def myzip(iterables):
        iterators = [iter(it) for it in iterables]
        while True:
            items = []
            for it in iterators:
                try:
                    items.append(next(it))
                except StopIteration:
                    # Push each already-read item back in front of its iterator.
                    for i, peeked in enumerate(items):
                        iterables[i] = itertools.chain([peeked], iterators[i])
                    return
            else:
                # No iterator ran out during this round.
                yield tuple(items)


    gens = [range(10), range(8)]
    list(myzip(gens))
    print(next(gens[0]))  # 8

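This approach is problematic for several reasons. Not only does it lose the original iterable, it also loses any useful properties the original object may have had, because that object is replaced with a chain object.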
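As a small illustration of the lost properties, assume the original iterable was a range, as in the example. The names `original` and `replaced` are illustrative only.

```python
import itertools

original = range(10)
print(len(original), original[3])  # 10 3

# What a slot in the list holds after the swap: a chain object.
replaced = itertools.chain([8], iter(range(9, 10)))
try:
    len(replaced)
except TypeError:
    print("chain objects have no len()")
```

A range supports len() and indexing, while the chain that replaces it is a one-shot iterator that supports neither.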