Iterators, Iterables & the Iterator Protocol

July 29, 2019

Iteration is a fundamental technique that every programming language supports in the form of loops. The most common one is the for loop, and in Python's case specifically it is a for-each loop. For-each loops are powered by iterators. An iterator is an object that does the actual iterating, fetching data one item at a time, on demand.

Let's take a step back and look at some common terms that will help us understand iterators better.

iterables: anything that can be iterated over is called an iterable.

for item in some_iterable:
    print(item)

sequences: Sequences are iterables which can be indexed.

numbers = [1,2,3,4]
tuples = (1,2,3)
word = 'Hello world'
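
As a rough illustration of the difference (the variable names are made up for this example), a sequence supports both indexing and iteration, while an iterable such as a set can be looped over but not indexed:

```python
# A list is a sequence: it supports indexing as well as iteration.
numbers = [1, 2, 3, 4]
first = numbers[0]

# A set is iterable but not a sequence, so indexing it fails.
unique = {1, 2, 3}
try:
    unique[0]
    set_is_indexable = True
except TypeError:
    set_is_indexable = False

# Both can still be looped over.
total = sum(n for n in numbers) + sum(n for n in unique)
print(first, set_is_indexable, total)
```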

The iter function

iter is a built-in function. Whenever the interpreter needs to iterate over an object, it automatically calls iter() on it.

The iter() function returns an iterator.

When the iter function is called, it does three things:

Checks whether the object implements the __iter__ method and, if so, calls it to obtain an iterator. (To see whether the method is there, just run dir() on the object.)
If the __iter__ method is not present but __getitem__ is implemented, Python creates an iterator that fetches the items in order, starting from index zero.
If that fails, a TypeError is raised, stating that the object is not iterable.
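
The last two cases can be sketched as follows. Countdown is a hypothetical class invented for this example; it defines __getitem__ but deliberately has no __iter__:

```python
class Countdown:
    """Hypothetical class with __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError  # tells the fallback iterator to stop
        return 3 - index


# iter() falls back to calling __getitem__ with 0, 1, 2, ...
countdown_items = list(iter(Countdown()))
print(countdown_items)

# An object with neither method is not iterable.
try:
    iter(42)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```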

numbers = [1,2,3,4]
num = iter(numbers) # Builds an iterator 'num'

Looking at the code snippet above, we can give a better definition of an iterable.

*Any object from which the iter() built-in function can obtain an iterator is called an iterable.*

Before moving forward, let's look at a nifty little way iter() works with functions to turn them into iterators.

Let's build a die roller that rolls a die from 1-6 and stops when the die hits 1.

In this usage we need to make sure of two things:

The first argument to iter must be a callable that takes no arguments; it is invoked every time next is called on the iterator.
The second argument, called the sentinel, acts as a flag: when the callable returns a value equal to the sentinel, the iterator raises StopIteration instead of returning that value.

from random import randint

def die_roll():
    return randint(1,6)

roller = iter(die_roll, 1)

print(type(roller)) # <class 'callable_iterator'>

for roll in roller:
    print(roll)

"""
Sample output (varies from run to run):
5
6
3
2
"""

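The same two-argument form of iter() is often used to read a stream in fixed-size chunks. The sketch below is only an illustration: BytesIO stands in for a real binary file, and the chunk size of 4 is an arbitrary choice.

```python
from functools import partial
from io import BytesIO

# BytesIO stands in for a real binary file in this sketch.
stream = BytesIO(b"abcdefghij")

# read(4) returns b"" at the end of the stream, which equals the sentinel,
# so the iterator stops there instead of yielding the empty bytes object.
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)
```

partial is needed here because the callable passed to iter() must take no arguments.
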
Iterable vs Iterator

Python obtains an iterator from an iterable. Let's look at the for-each loop again to see how everything fits in the picture.

numbers = [1,2,3,4]
for number in numbers:
    print(number)

Looking at the code above, we can only see the iterable, i.e. numbers. But what about the iterator? What about iter()? Isn't the loop supposed to use both to work?

Here we can't see the iterator or iter() in action, but they are working behind the scenes. Let's rewrite the whole statement as a while loop so we can see how it all fits together.

numbers = [1,2,3,4]
num = iter(numbers) # builds an iterator
while True:
    try:
        print(next(num))
    except StopIteration:
        del num
        break

The flow of the above code is simple:

The iterator num is created from the iterable.
next is used to obtain each value from the iterator.
The iterator raises StopIteration when there are no items left.
We delete the iterator and break out of the loop.

You might be wondering why we deleted the iterator.

Iterators are one-directional: once all the items have been iterated over, they can't be reset to their original state.

StopIteration therefore signals that the iterator is exhausted. An exhausted iterator is of no further use, so we discard it; the del is not strictly required, but it makes the point explicit.
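
A small sketch of what exhaustion looks like in practice (the variable names are chosen for this example): a second pass over the same iterator yields nothing, while the underlying list can always provide a fresh iterator.

```python
numbers = [1, 2, 3, 4]
num = iter(numbers)

first_pass = list(num)    # consumes every item
second_pass = list(num)   # the iterator is already exhausted

# The iterable itself is untouched; iter() hands out a fresh iterator.
fresh_pass = list(iter(numbers))
print(first_pass, second_pass, fresh_pass)
```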

Writing your own iterator

Python iterator objects are required to support two methods: __iter__ and __next__.

The __iter__ method returns self. This allows iterators to be used wherever an iterable is expected, e.g. in a for loop or with the in operator.

The __next__ method returns the next available item, raising StopIteration when there are no more items to loop through.
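
To see the "returns self" behaviour with a built-in type, here is a minimal sketch using a list iterator (the variable names are illustrative only):

```python
numbers = [1, 2, 3]
iterator = iter(numbers)

# An iterator's __iter__ returns the iterator itself ...
same_object = iter(iterator) is iterator

# ... whereas a list hands out a new iterator on every call.
distinct_iterators = iter(numbers) is not iter(numbers)

# Because of that, a partly consumed iterator can continue in a for loop.
next(iterator)                      # consumes 1
remaining = [n for n in iterator]   # picks up where next() left off
print(same_object, distinct_iterators, remaining)
```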

Let's bundle this knowledge and build our very own version of the built-in range.

class _Range:
    def __init__(self, start, end, step = 1):
        self.current = start  # next value to hand out
        self.end = end        # exclusive upper bound, like the built-in range
        self.step = step      # assumed to be positive

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value

numbers = _Range(1, 3)
print(next(numbers)) # Result -> 1
print(next(numbers)) # Result -> 2
print(next(numbers)) # Raise a StopIteration Exception

Now that we know how an iterator works let's look back at the definition of an iterator again:

*Any object that implements the __next__ no-argument method that returns the next item in a series or raises StopIteration when there are no more items is called an Iterator.*

Just a quick tip before moving forward: the more idiomatic way of creating your own iterator is usually to write a generator function rather than an iterator class like the one here.
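
As a minimal sketch of that tip, here is a generator function that behaves like a simplified range. The name simple_range is hypothetical, and the function assumes a positive step:

```python
def simple_range(start, end, step=1):
    """Hypothetical generator; assumes a positive step."""
    current = start
    while current < end:
        yield current
        current += step


numbers = simple_range(1, 3)
first = next(numbers)
second = next(numbers)
rest = list(numbers)                  # nothing left after 1 and 2
odds = list(simple_range(1, 8, 2))
print(first, second, rest, odds)
```

Calling the function returns a generator object, which already implements both __iter__ and __next__, so there is no class or StopIteration bookkeeping to write by hand.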

Iterator Protocol

Iterator objects are required to support the following two methods, __iter__ and __next__, which together form the iterator protocol.

iterator.__iter__()
iterator.__next__()

The iterator protocol powers all iteration in Python.
The iterator protocol also powers tuple unpacking in Python.

# Tuple unpacking
coordinates = (1, 2, 3) # sample values for illustration
x,y,z = coordinates

The iterator protocol also powers star expressions.

numbers = [1,2,3,4,5]
a,b,*rest = numbers
print(a, b, rest) # Result -> 1 2 [3, 4, 5]

Most of the built-in functions that require some kind of looping (iteration) in Python use the iterator protocol.
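
To illustrate, the sketch below feeds a small hand-written iterator to several built-ins and to a star expression. UpTo is a hypothetical class made up for this example; any object with __iter__ and __next__ would behave the same way.

```python
class UpTo:
    """Hypothetical iterator yielding 1, 2, ..., limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


# Each call below drives the iterator through __next__ until StopIteration.
as_list = list(UpTo(4))
total = sum(UpTo(4))
largest = max(UpTo(4))
joined = ", ".join(str(n) for n in UpTo(3))
a, b, *rest = UpTo(5)
print(as_list, total, largest, joined, a, b, rest)
```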

Python's tongue twister

Iterables are not necessarily iterators, but an iterator is necessarily an iterable.

Example: generators are iterators, so they can be looped over directly, whereas lists are iterables but not iterators.
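
One way to check this distinction is with the abstract base classes in collections.abc. This is a small sketch; the variable names are illustrative only.

```python
from collections.abc import Iterable, Iterator

numbers = [1, 2, 3]
squares = (n * n for n in numbers)   # a generator expression

list_is_iterable = isinstance(numbers, Iterable)
list_is_iterator = isinstance(numbers, Iterator)
gen_is_iterable = isinstance(squares, Iterable)
gen_is_iterator = isinstance(squares, Iterator)

print(list_is_iterable, list_is_iterator)   # iterable, but not an iterator
print(gen_is_iterable, gen_is_iterator)     # both iterable and an iterator
```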

Reasons to use iterators:

Iterators make lazy evaluation possible, which saves memory.
Iterators allow for infinitely long iterables.
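
Both points can be sketched with itertools from the standard library. The evens generator below never ends, yet only the items actually requested are ever computed:

```python
from itertools import count, islice

# count() yields 0, 1, 2, ... forever; the generator expression filters it lazily.
evens = (n for n in count() if n % 2 == 0)

# islice pulls just five items; nothing beyond them is computed or stored.
first_five = list(islice(evens, 5))
next_even = next(evens)   # the iterator simply continues from where it stopped
print(first_five, next_even)
```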

Not so common iterators

Enumerate objects are also iterators.
Zip objects are also iterators.
Reversed objects are iterators.
Files are also iterators.

letters = ['a','b','c','d']
next(enumerate(letters)) # Result -> (0, 'a')
next(zip(letters,letters)) #  Result -> ('a','a')
next(reversed(letters)) #  Result -> 'd'
next(open('iterator.txt')) #  Result -> the first line of the file, e.g. 'iterator\n'