Iterators, Iterables & the Iterator Protocol
Iteration is a fundamental technique, supported by virtually every programming language in the form of loops. The most common loop is the for loop, and in Python's case specifically, the for loop is a for-each loop. For-each loops are powered by iterators. An iterator is an object that does the actual iterating, fetching data one item at a time, on demand.
Let's take a step back and look at some common terms that will help us understand iterators even better.
Iterables: anything that can be iterated over is called an iterable.

```python
for item in some_iterable:
    print(item)
```

Sequences: iterables whose items can be accessed by integer index, such as lists, tuples and strings.

```python
numbers = [1, 2, 3, 4]   # list
tuples = (1, 2, 3)       # tuple
word = 'Hello world'     # string
```
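To make the distinction concrete, here is a small sketch: the list, tuple and string above can all be indexed, while a set (our own choice of a non-sequence iterable) can be looped over but not indexed.

```python
numbers = [1, 2, 3, 4]
tuples = (1, 2, 3)
word = 'Hello world'

# Sequences support indexing by position
print(numbers[0], tuples[1], word[6])  # 1 2 w

# A set is iterable, so a for loop works...
unique = {1, 2, 3}
for item in unique:
    print(item)

# ...but it is not a sequence, so indexing fails
try:
    unique[0]
except TypeError as err:
    message = str(err)
print(message)  # 'set' object is not subscriptable
```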
The iter function
iter() is a built-in function: whenever the interpreter needs to iterate over an object x, it automatically calls iter(x).
The iter() function returns an iterator.
When iter() is called on an object, it does three things:
- Checks whether the object implements the __iter__ method and, if so, calls it to obtain an iterator. (To see whether an object has __iter__, call dir() on it.)
- If __iter__ is not present but __getitem__ is implemented, Python creates an iterator that fetches the items in order, starting from index 0, until __getitem__ raises IndexError.
- If both checks fail, a TypeError is raised stating that the object is not iterable (for example, "'int' object is not iterable").
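The __getitem__ fallback in the second step is easy to miss, so here is a minimal sketch of it. The class Squares is a hypothetical example: it defines only __getitem__, yet iter() can still build an iterator from it, and that iterator stops when __getitem__ raises IndexError. An int, which defines neither method, triggers the TypeError from the third step.

```python
class Squares:
    """Hypothetical sequence-like class: no __iter__, only __getitem__."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised
        if index >= self.size:
            raise IndexError(index)
        return index * index


squares = Squares(4)
print('__iter__' in dir(squares))  # False
print(list(iter(squares)))         # [0, 1, 4, 9]

try:
    iter(42)  # int has neither __iter__ nor __getitem__
except TypeError as err:
    message = str(err)
print(message)  # 'int' object is not iterable
```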
```python
numbers = [1, 2, 3, 4]
num = iter(numbers)  # builds an iterator 'num'
```
Looking at the code snippet above, we can give a better definition of an iterable:
*Any object from which the iter() built-in function can obtain an iterator is called an iterable.*
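This definition suggests a practical test for iterability: instead of looking for __iter__ yourself, ask iter() and catch the TypeError, which also covers the __getitem__ fallback. The helper name is_iterable is our own, not a built-in.

```python
def is_iterable(obj):
    """Return True if iter() can obtain an iterator from obj."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable([1, 2, 3]))  # True
print(is_iterable('Hello'))    # True
print(is_iterable(42))         # False
```

By contrast, isinstance(obj, collections.abc.Iterable) only recognizes objects that have __iter__ (or are registered as Iterable), so it misses classes that rely solely on __getitem__.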
Before moving on, let's look at a nifty way iter() works with functions, turning them into iterators.
Let's build a die roller that rolls a six-sided die (values 1 to 6) and stops when the die lands on 1.
In this usage, we need to make sure of two things:
- The first argument must be a callable that takes no arguments; it is invoked every time next() is called on the iterator.
- The second argument, called the sentinel, acts as a flag: when the callable returns a value equal to the sentinel, the iterator raises StopIteration instead of returning that value.
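```python
from random import randint

def die_roll():
    return randint(1, 6)

# Call die_roll() repeatedly until it returns the sentinel value 1
roller = iter(die_roll, 1)
print(type(roller))  # <class 'callable_iterator'>

for roll in roller:
    print(roll)

# Example output (the rolls are random, so yours will differ):
# 5
# 6
# 3
# 2
```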
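The same two-argument form is handy for reading data in fixed-size chunks. In this sketch, functools.partial builds the required zero-argument callable, io.StringIO stands in for a real file, and the empty string '' is the sentinel, since read() returns '' once the data runs out.

```python
import io
from functools import partial

stream = io.StringIO('abcdefghij')

# partial(stream.read, 4) is a no-argument callable that reads 4 characters
read_chunk = partial(stream.read, 4)

# Keep calling read_chunk() until it returns '' (end of data)
chunks = list(iter(read_chunk, ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```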
Iterable vs Iterator
Python obtains an iterator from an iterable. Let's look at the for-each loop again to see how everything fits into the picture.

```python
numbers = [1, 2, 3, 4]
for number in numbers:
    print(number)
```
In the code above we can only see the iterable, numbers. But what about the iterator? And what about iter()? Isn't the loop supposed to use both?
We can't see the iterator or iter() in action here, but they are working behind the scenes. Let's rewrite the loop as a while loop so we can see how it all fits together.

```python
numbers = [1, 2, 3, 4]
num = iter(numbers)  # builds an iterator

while True:
    try:
        print(next(num))
    except StopIteration:
        del num
        break
```
The flow of the above code is simple:
- The iterator num is created from the iterable.
- next() is used to obtain each value from the iterator.
- The iterator raises StopIteration when there are no items left.
- We delete the iterator and break out of the loop.
You may be wondering: everything works, so why did we delete the iterator?
Iterators are one-directional: once all the items have been consumed, they can't be reset to their original state.
StopIteration signals that the iterator is exhausted and has nothing more to give, so we delete it rather than keep an object we can no longer use. To loop again, call iter() on the iterable to get a fresh iterator.
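A short sketch of this one-way behaviour: once an iterator is exhausted it stays empty, while the iterable it came from can hand out a fresh iterator at any time.

```python
numbers = [1, 2, 3, 4]
num = iter(numbers)

first_pass = list(num)   # consumes every item
second_pass = list(num)  # nothing left: the iterator is exhausted
print(first_pass, second_pass)  # [1, 2, 3, 4] []

# next() accepts a default, returned instead of raising StopIteration
print(next(num, 'done'))  # done

# The list itself is untouched; iter() hands out a brand-new iterator
fresh = iter(numbers)
print(next(fresh))  # 1
```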
Writing your own iterator
Python iterator objects are required to support two methods, __iter__ and __next__:
- __iter__ returns self. This allows iterators to be used wherever an iterable is expected, e.g. with the for and in keywords.
- __next__ returns the next available item, raising StopIteration when there are no more items to loop through.
Let's put this knowledge together and build our own version of the built-in range().

```python
class _Range:
    """A minimal range()-like iterator (assumes a positive step)."""

    def __init__(self, start, end, step=1):
        self.start = start  # next value to return
        self.end = end      # exclusive upper bound, as with range()
        self.step = step

    def __iter__(self):
        # Iterators return themselves, so they can be used in for loops
        return self

    def __next__(self):
        if self.start >= self.end:
            raise StopIteration
        current = self.start
        self.start += self.step
        return current


numbers = _Range(1, 3)
print(next(numbers))  # Result -> 1
print(next(numbers))  # Result -> 2
print(next(numbers))  # Raises a StopIteration exception
```
Now that we know how an iterator works, let's revisit the definition of an iterator:
*Any object that implements the no-argument __next__ method, which returns the next item in a series or raises StopIteration when there are no more items, is called an iterator.* Python iterators also implement __iter__ (returning self), so they are iterable as well.
A quick tip before moving on: usually the best way to create your own iterator is to write a generator function rather than an iterator class like the one we built here.
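For comparison, here is a sketch of the same range-like behaviour written as a generator function. The name gen_range is hypothetical, and like our class it assumes a positive step. Python supplies __iter__ and __next__ for the generator object, and raises StopIteration automatically when the function returns.

```python
def gen_range(start, end, step=1):
    """Yield start, start + step, ... while below end (positive step only)."""
    current = start
    while current < end:
        yield current
        current += step


numbers = gen_range(1, 3)
print(next(numbers))  # 1
print(next(numbers))  # 2
# One more next(numbers) would raise StopIteration

# A generator object is an iterator: __iter__ returns the object itself
print(iter(numbers) is numbers)  # True

print(list(gen_range(0, 10, 3)))  # [0, 3, 6, 9]
```

The state that the class kept in attributes (start, end, step) lives in the generator's local variables instead, which is why the generator version is shorter.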
Iterator objects are required to support two methods, __iter__ and __next__, which together form the iterator protocol. The protocol shows up throughout Python:
- The iterator protocol powers all iteration in Python, including for loops and comprehensions.
- It also powers tuple unpacking:

```python
# Tuple unpacking
coordinates = (1, 2, 3)
x, y, z = coordinates
```
- It also powers starred assignment (extended unpacking):

```python
numbers = [1, 2, 3, 4, 5]
a, b, *rest = numbers
print(a, b, rest)  # 1 2 [3, 4, 5]
```
- Most built-in functions that loop over their argument, such as list(), sum(), max() and sorted(), use the iterator protocol.
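Because these features rely only on the protocol, any object that implements __iter__ and __next__ works with them. The Countdown class below is a hypothetical example written for this sketch; note that each use needs a new Countdown, since an iterator is consumed as it is read.

```python
class Countdown:
    """Hypothetical iterator that counts down from n to 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1


a, b, c = Countdown(3)           # tuple unpacking
first, *rest = Countdown(4)      # starred assignment
total = sum(Countdown(4))        # built-ins that loop
largest = max(Countdown(5))
as_list = list(Countdown(3))

print(a, b, c)                  # 3 2 1
print(first, rest)              # 4 [3, 2, 1]
print(total, largest, as_list)  # 10 5 [3, 2, 1]
```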
Python's tongue twister
Iterables are not necessarily iterators, but an iterator is necessarily an iterable.
Example: generators are iterators that can be looped over, while lists are iterables but not iterators.
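One way to check the tongue twister is with the abstract base classes in collections.abc: Iterator requires both __iter__ and __next__, while Iterable only requires __iter__. In this sketch a list passes the second check but not the first, and a generator expression passes both.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
gen = (n * 2 for n in numbers)  # a generator expression

print(isinstance(numbers, Iterable), isinstance(numbers, Iterator))  # True False
print(isinstance(gen, Iterable), isinstance(gen, Iterator))          # True True

# A list has no __next__, so next() refuses it...
try:
    next(numbers)
except TypeError as err:
    message = str(err)
print(message)  # 'list' object is not an iterator

# ...but iter() turns it into one
print(next(iter(numbers)))  # 1
```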
Reasons to use iterators:
- Iterators make lazy evaluation possible: items are produced only when requested, which saves memory.
- Iterators allow for infinitely long iterables, since the values never need to exist all at once.
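Both points can be sketched with the standard library. Here squares() is a hypothetical infinite generator built on itertools.count, and itertools.islice takes just the first few values from it. The size comparison only shows that the generator object stays small while the list grows with its contents; the exact byte counts depend on the Python version.

```python
import sys
from itertools import count, islice


def squares():
    """Hypothetical infinite generator: 1, 4, 9, 16, ..."""
    for n in count(1):
        yield n * n


# islice takes only the first five values from the infinite stream
first_five = list(islice(squares(), 5))
print(first_five)  # [1, 4, 9, 16, 25]

# A generator holds its state, not its results, so its size stays small
lazy = (n * n for n in range(100_000))
eager = [n * n for n in range(100_000)]
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True
```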
Not so common iterators
- Enumerate objects are also iterators.
- Zip objects are also iterators.
- Reversed objects are iterators.
- Files are also iterators.
```python
letters = ['a', 'b', 'c', 'd']

next(enumerate(letters))     # Result -> (0, 'a')
next(zip(letters, letters))  # Result -> ('a', 'a')
next(reversed(letters))      # Result -> 'd'
# Assumes iterator.txt exists and its first line is "iterator"
next(open('iterator.txt'))   # Result -> 'iterator\n'
```
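A quick way to confirm that these objects are iterators rather than plain iterables is to check that iter() returns the object itself. In this sketch io.StringIO stands in for a real file, so no file on disk is needed; a file returned by open() behaves the same way.

```python
import io

letters = ['a', 'b', 'c', 'd']

candidates = {
    'enumerate': enumerate(letters),
    'zip': zip(letters, letters),
    'reversed': reversed(letters),
    # io.StringIO stands in for a real file object here
    'file-like': io.StringIO('iterator\nprotocol\n'),
}

# An iterator's __iter__ returns the iterator itself
for name, obj in candidates.items():
    print(name, iter(obj) is obj)  # each line ends with True

# Like any iterator, they are consumed as you go
pairs = zip(letters, letters)
print(len(list(pairs)), len(list(pairs)))  # 4 0
```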