Pradhvan

Python geek who loves to play around with web technologies.

Interrogating docstrings, Django's custom user model, Python's ast and tokenize module.

Oh boy! The week was more fun than I had planned.

Let's walk through each of them one at a time:

  • After finishing one of the courses on Django Celery, I wanted to put my knowledge to the test by building a knockoff version of Buttondown, so I started a project called Hermes. Currently, Hermes has all the models done, Celery set up and basic auth in place. I wanted to implement a custom user model so that users sign in with their email instead of a username. I am currently stuck on a bug related to that, which will hopefully be resolved by next week.

  • I picked up an issue in a library called interrogate. interrogate gives a coverage report of missing docstrings and currently does not have a skip option like noqa. So I picked up the issue to add that. At a quick glance I saw that the ast module is used internally, but the ast module doesn't include comments. The tokenize module can give you comments but doesn't provide the rest of the program structure. So I guess I need to mix and match both to add the feature.
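
A rough sketch of that mix-and-match idea, and not interrogate's actual implementation: tokenize finds the lines that carry a skip comment, and ast finds the functions without docstrings. The "# nodoc" marker and both helper names are made up for illustration.

```python
import ast
import io
import tokenize

SKIP_MARKER = "# nodoc"  # hypothetical marker, not an interrogate feature

SOURCE = '''
def documented():
    """I have a docstring."""

def skipped():  # nodoc
    pass

def missing():
    pass
'''


def lines_with_marker(source):
    """Use tokenize to collect the line numbers of comments carrying the marker."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {
        tok.start[0]
        for tok in tokens
        if tok.type == tokenize.COMMENT and tok.string.startswith(SKIP_MARKER)
    }


def missing_docstrings(source):
    """Use ast to find functions without docstrings, minus the skipped ones."""
    skip_lines = lines_with_marker(source)
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
        and node.lineno not in skip_lines
    ]


print(missing_docstrings(SOURCE))  # ['missing']
```

The two modules only meet on line numbers: ast reports where each function starts and tokenize reports where each comment sits.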

  • Last year I sprinted for scanapi at EuroPython sprints 2020. I moved back to the project this week. Started back again by adding some docs, adding issue templates, and docstring coverage to the project. Also, this is how I stumbled upon interrogate.

I also managed to stay on course with my yearly health goal. Meditated daily and spent a decent 40min doing some cardio/core exercises on the working weekdays.

That's it for this week. Until next week. Stay safe folks.

#weeknotes2021

Last weekend I attended the EuroPython sprints, which were conducted virtually. The communication platform for the conference was Discord and it was kept the same for the sprints too. It served as a good platform, as we were able to pair program with the maintainers by sharing our screens.

Day 1

Sprints opened at 12:30 PM IST and started with the first round of project introductions. A total of 12 projects took part in this year's sprints. Since the project maintainers were spread across time zones, and time zones are difficult to handle, only a few maintainers were present at the first opening session to talk about their projects.

The project I started with on day one of the sprints was terminusdb. I primarily contributed to terminusdb's Python client, where Cheuk Ting Ho and Kevin Chekov Feeney were around to help us out. Kevin had coded the JS client of the project and was here to work on the Python client.

The issue I picked up was increasing the test coverage of the project, and while working on that issue I also discovered some other issues. A deprecated function was still being used in the client, and the Makefile did not have a command to generate the HTML coverage report of the project.

By the end of day one, I had moved the coverage of terminusdb_client/woqlclient/connectionConfig.py from 62% to 70%, along with a PR to remove the deprecated function from the client. Doing that, I learned about graph databases and how terminusdb has git-like features for the database.

Day 2

I started late on the second day and continued to work on the test coverage PR. I fixed some minor flake8 errors in it, pushed the coverage to 75% and created a PR for the Makefile command. A lot of people in the sprints were confused by the project setup, so I opened a documentation issue for writing a wiki with setup instructions and contribution guidelines for new/first-time contributors.

Just an hour before the first closing session I moved to scanapi which is maintained by Camila Maia. I picked up some good first issues and got them merged in no time. I saw this project at the closing of the day-1 and found it very interesting.

The other projects that I really found interesting but could not contribute to were Hypothesis, strawberry GraphQL and commitizen.

Overall I had a really fun weekend and I am excited to contribute more to those projects.

I recently stumbled across a very peculiar topic called Bit Manipulation. In most of my programming days, I haven't actually relied on binary operations to get me the result. I know that under the hood everything is converted into 0's and 1's, but it was all abstracted away from me.

The case was different here. While working with Bit Manipulation, I had to actually rely on arithmetic bit operations to get me to the result. So it became real interesting real soon.

Bitwise operators

Basic operations on bits are done with bitwise operators. Since they work directly on bits, these operations are fast and are often used to speed up a solution.

The first three, &, | and ~, are fairly straightforward, so I will go over them briefly.

&: if both bit patterns are of equal size, the & operator compares each position and returns 1 (True) only if both input bits are 1. Otherwise it returns 0 (False).

    6       : 1 1 0
    5       : 1 0 1
            -------- &
              1 0 0

|: if both bit patterns are of equal size, the | operator compares each position and returns 1 (True) if at least one of the input bits is 1. It returns 0 (False) only when both bits are 0.

     5       : 1 0 1
     3       : 0 1 1
            --------  |
              1 1 1

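~: the NOT operator just complements each bit it gets. In fancy computer lingo it gives the one's complement of a number. Python integers are signed, so ~5 actually evaluates to -6, and the three-bit picture below shows only the low bits.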

    5       : 1 0 1
            -------- ~
              0 1 0
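
To check the three diagrams above in an interpreter, here is a small sketch. The variable names are only for illustration, and the 0b111 mask is an assumption that we only care about the lowest three bits.

```python
and_result = 6 & 5            # 110 & 101
or_result = 5 | 3             # 101 | 011
not_result = ~5               # Python ints are signed: ~x == -x - 1
not_low_bits = ~5 & 0b111     # keep only the lowest three bits

print(format(and_result, "03b"))    # 100
print(format(or_result, "03b"))     # 111
print(not_result)                   # -6
print(format(not_low_bits, "03b"))  # 010
```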

Now coming to more interesting operators:

Operator    Name
^           XOR
>>          Right Shift
<<          Left Shift

XOR

If two bit patterns are of equal size, the ^ of the bits in each compared position is 1 if the compared bits differ and 0 if both the compared bits are the same.

    6       : 1 1 0
    5       : 1 0 1
            -------- ^
              0 1 1
  • XOR of a number with itself is 0.

    x = 42  # any integer
    (x ^ x) == 0
    
  • XOR of a number with 0 is the number itself.

    (x ^ 0) == x
    
  • Ordering in XOR does not matter, both will give the same output.

    output = (7 ^ 3) ^ (5 ^ 4 ^ 5) ^ (3 ^ 4)
    output = 7 ^ (3 ^ (5 ^ 4 ^ 5)) ^ (3 ^ 4)
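
These three properties together give a classic trick: if every value in a list appears twice except one, XOR-ing everything leaves the odd one out. A minimal sketch, where find_unpaired is a name I made up for this example:

```python
from functools import reduce
from operator import xor

x = 42  # any integer works here
assert x ^ x == 0   # a number XOR itself is 0
assert x ^ 0 == x   # a number XOR 0 is the number itself


def find_unpaired(numbers):
    """Return the one value that appears an odd number of times."""
    return reduce(xor, numbers, 0)


# Same numbers as the expression above, in a different order.
print(find_unpaired([7, 3, 5, 4, 5, 3, 4]))  # 7
```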
    

While discussing Left Shift, <<, and Right Shift, >>, we will be talking about arithmetic shifts.

Left shift <<

  • Left shift shifts the binary digits left by n positions and pads 0's on the right.
  • Left shift is equivalent to multiplying the bit pattern by 2 to the power k (if we are shifting k bits).
1 << 1 = 2 = 1 * (2 ** 1)
1 << 2 = 4 = 1 * (2 ** 2)
1 << 3 = 8 = 1 * (2 ** 3)
1 << 4 = 16 = 1 * (2 ** 4)
…
1 << n = 2 ** n

Right shift >>

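  • Right shift shifts the binary digits right by n positions and drops the bits that fall off the right end. For non-negative numbers, 0's are padded on the left.
  • Right shift is equivalent to floor-dividing the bit pattern by 2 ** k (if we are shifting k bits).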
4 >> 1 = 2
6 >> 1 = 3
5 >> 1 = 2
16 >> 4 = 1
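
A quick way to convince yourself of the multiply/divide equivalence is to check it over a range of numbers. This is just a sketch, and shift_matches_arithmetic is a helper name invented here:

```python
def shift_matches_arithmetic(x, k):
    """Check that shifting by k matches multiplying / floor-dividing by 2 ** k."""
    left_ok = (x << k) == x * (2 ** k)
    right_ok = (x >> k) == x // (2 ** k)
    return left_ok and right_ok


all_match = all(
    shift_matches_arithmetic(x, k) for x in range(-50, 50) for k in range(6)
)
print(all_match)        # True
print(5 >> 1, -5 >> 1)  # 2 -3
```

Note the negative case: right shift rounds towards negative infinity, just like // does in Python.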

Both Right shift and Left shift operators come real handy in masking.

Masking allows the user to check/change a particular bit at a particular position.

Some of the common functions associated with masking are:

Set Bit
  • The set bit method is generally used to SET a particular bit to 1.
  • To achieve this we need to create a mask at the particular position where we want to SET.
  • The mask can be created with the help of <<, the left shift operator.
def set_bit(x, position):
    mask = 1 << position
    return x | mask

set_bit(6, 0)
  • In the above code snippet we are setting the bit at the 0th index.
    mask = 1 << 0 = 1 * (2 ** 0) = 1
    
    6       : 1 1 0
    1 << 0  : 0 0 1
            -------- |
              1 1 1
Is Bit Set
def is_bit_set(x, position):
    shifted = x >> position
    return shifted & 1
Clearing Bit
def clear_bit(x, position):
    mask = 1 << position
    return x & ~mask
Flip Bit
def flip_bit(x, position):
    mask = 1 << position
    return x ^ mask
Modify Bit
def modify_bit(x, position, state):
    """
    state is the param that tells us whether to set the bit (1)
    or clear the bit (0)
    """
    mask = 1 << position
    return (x & ~mask) | (-state & mask)
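
To see all the masking helpers working together, here is a small self-contained run on x = 6 (binary 110). The functions are repeated in compact form only so that the snippet runs on its own.

```python
def set_bit(x, position):
    return x | (1 << position)


def clear_bit(x, position):
    return x & ~(1 << position)


def flip_bit(x, position):
    return x ^ (1 << position)


def is_bit_set(x, position):
    return (x >> position) & 1


def modify_bit(x, position, state):
    mask = 1 << position
    # -1 is all ones in binary, so -state & mask is mask for state 1 and 0 for state 0
    return (x & ~mask) | (-state & mask)


x = 6  # 0b110
print(bin(set_bit(x, 0)))        # 0b111
print(bin(clear_bit(x, 1)))      # 0b100
print(bin(flip_bit(x, 2)))       # 0b10
print(is_bit_set(x, 1))          # 1
print(bin(modify_bit(x, 0, 1)))  # 0b111
print(bin(modify_bit(x, 2, 0)))  # 0b10
```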

Observations

Bit manipulation can be used to solve problems that you are already familiar with, in ways you might not have thought of. Here are some observations that I noted while using bit manipulation.

To check if the number is even
  • ANDing the number with 1 gives 0 if it's even and 1 if it's odd.
x = 42  # any integer
(x & 1) == 0

Practice Question

To check if the number is a power of two
  • If a number is x, the binary representation of (x-1) can be obtained by flipping all the bits to the right of the rightmost 1 in x, including the rightmost 1 itself.
Let, x = 4 = (100)_2
x - 1 = 3 = (011)_2
Let, x = 6 = (110)_2
x - 1 = 5 = (101)_2
  • x & (x-1) keeps all the bits of x except the rightmost 1, which is cleared. In the examples below, the bit enclosed in || is the rightmost 1 of x and the corresponding flipped bit in x-1.
  • If the number is neither zero nor a power of two, it will have 1 in more than one place, so x & (x-1) is non-zero. For a power of two the only 1 gets cleared and the result is 0.
Let, x = 6 = 1|1|0
(x- 1) = 5 = 1|0|1

Let,x = 16 = |1|0000
(x-1) = 15 = |0|1111

Let,x = 8 = |1|000
(x-1) = 7 = |0|111

Let,x = 23 = 1011|1|
(x-1) = 22 = 1011|0|
x = 16  # any integer
x > 0 and (x & (x - 1)) == 0
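
Wrapped into a function, the check could look like the sketch below. The x > 0 guard is my addition: without it 0 would pass the test, since 0 & -1 is also 0.

```python
def is_power_of_two(x):
    """True if x has exactly one bit set."""
    return x > 0 and (x & (x - 1)) == 0


powers = [n for n in range(20) if is_power_of_two(n)]
print(powers)  # [1, 2, 4, 8, 16]
```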

There are a lot more things that can be done with just bits and are definitely not limited to the above observations. Try to find your own observations. Happy coding!

I recently finished reading Python Testing with Pytest by Brian Okken and I am glad I picked this up rather than jumping into the docs. It's definitely a good introduction for people who haven't had their share of testing a Python codebase, let alone with pytest.

The book introduces a python CLI called Tasks and takes this as a base for writing all of its tests throughout the course of the book. Though eventually, the tests become more complex when you get into the latter half of the book.

The pros of the book are that it covers almost every section of the framework from fixtures, plugins, custom pytest configuration and even using pytest with tools like coverage and mock. But if you're someone like me who hasn't had his share of testing a python codebase you might find yourself with a bit of information overload at times.

I did find the book a bit overwhelming in chapters like writing your own plugin, custom configuration and using pytest with Jenkins, because these are features that I wouldn't be using right out of the box. I would definitely be coming back to these chapters in the future if I need any of the features.

Overall the book is really well-written keeping in mind beginners who are just picking up pytest as their first testing framework and also for folks who are moving towards pytest from any other testing framework. Exercises at the back of every chapter make sure you also get some hands-on experience of writing tests.

Just a personal tip for anyone who is picking this up and has less experience with pytest. Feel free to skip chapters or skim chapters that aren't useful right out of the box. You can always come back to them when you need those features.

2019 has been a year of new beginnings, both personally and professionally. This was the year I got my first job and my first salary and, on the flip side, handed in my first resignation. Yeah, that was fun!

This blog just highlights most of the things I did in the previous year.

Blog Posts

I posted 8 blogs this year. I know it's not that much. Initially, I had planned one blog a month. But towards the end of the year, while I was giving interviews for the new job, things started to fall apart and I could not commit to one blog a month.

The plan for this year is to blog more or at least be consistent with writing. Stick to at least one blog per month.

Books

The previous year was a good reading year compared to the last few years. The Kindle I bought came real handy during the long metro rides. Plus I got some tech books cheap compared to their paperback prices so I did finish some of them too.

This year I started to take reading non-tech books a bit more seriously. So I am picking up a book a month and finishing it slowly, keeping to books of less than 800-1000 pages for the initial months just to help build momentum.

Recently finished Parliamental and will be moving to The Elephant Vanishes.

Talks

I gave one talk at PyConf Hyderabad 2019, one of my favorite regional conferences in India. I also submitted one for a PyDelhi meetup, but sadly, by the time it was scheduled I had already relocated. More on that later.

Open Source Contributions

One of the major things that I want to work towards this year is towards making more upstream contributions.

Last year I submitted two documentation patches to one org, aio-libs. The project was aiopg, an asyncio library for PostgreSQL, but that happened by sheer luck. As I was going through the documentation, I found parts of it using old-style decorator-based coroutines instead of the new async def functions. So I submitted a patch to update them.

PyCon India is one of those conferences that I look forward to every year. This year marked my fourth conference in a row. I was excited to meet all my old friends and make some new ones.

ChennaiPy, the local Python user group of Chennai, hosted this year's conference. This meant two things: I would get to attend the conference in Chennai and also visit some beaches around Pondicherry. So yeah, I was super excited.

The journey to the conference started on 11 October. I was traveling from Delhi with two of my friends, Kuntal and Sakshi. Since we had planned our journey so that we would reach the night before the conference, we missed the pre-conference volunteers' meet. Kuntal and I regretted not taking the morning flight, as the pre-conference volunteers' meets are super fun. You get to see the venue beforehand, help out with swag bags and interact with all the volunteers and organizers of the conference.

On reaching the Chennai airport we met Dedipyaman, who was staying with us. His name was a bit unique so we called him twodee, which he later adopted as his nick. Traveling to our Airbnb apartment was a challenge in itself as none of us knew Tamil. We were staying with 13 other folks. I knew all of them except one, Shubo. I had seen the nick on #dgplug but hadn't met him in person. When we arrived at the apartment only Shubo was present; the rest came in an hour or two. As everyone settled, we played some rounds of Uno while enjoying pizzas just before going to bed.

The next day I left with twodee and Sakshi for the conference; we were running a bit late. When we reached the conference I saw Kuntal at the registration desk. We all got our attendee cards and proceeded to the conference. I saw all my old friends, most of whom I only meet in person during conferences as they all live in different states. So it was fun to catch up. After roaming around the sponsor booths I went to attend Pradyun's talk. The talk was titled Python Packaging – where we are and where we're headed. I was interested in the talk as only a handful of people maintain pip. Since it's such a huge ecosystem in itself, it was interesting to get some insights from Pradyun's talk about how packaging works with pip and how they are planning to move forward. Later in the tea break, I met Saurav and Haris. I learned a lot from the conversation we had during the tea break. These people have been in tech much longer than me. Saurav talked about his company Deepsource and how managing a small team of people who take up responsibility is easy. You don't have to worry about formal things like timesheets and leave policy because people take responsibility for their work. Haris was working in a two-person team and, shockingly, carried a very old cell phone which didn't even have internet. So his take on life was very interesting.

The next day we had our annual #dgplug staircase meeting. This year, since Kushal was sick, Sayan took the initiative of conducting the meeting. We discussed the first staircase meeting, what went wrong in this year's summer training (people weren't completing their tasks and weren't showing up in the IRC channel) and what needs to be done now. I met lambainsaan, who I had always thought was a bot.

The meeting concluded at noon, just in time for me to catch the talk “Let's hunt a memory leak”, so I ran to the hall to get a good spot. Sanket was the speaker. He showed us the various ways he solved memory leak problems in a Flask app in production while describing the whole memory management concept in Python. I rushed for lunch after the talk as I had to be at the open spaces for PyDelhi's session.

Anuvrat had registered the open space for PyDelhi and other communities of the north. The agenda of the open space was how to be consistent in conducting meetups, what we can do at the meetups so that people come often, and how we can increase the quality of the talks. I liked the idea of pushing all the 101 sessions to blog posts, or even hangout sessions a day before the event, so that we aren't limiting the target audience to just people who are starting out in tech. From what we have been observing in recent meetups, experienced people who can help mentor others and give great talks have stopped attending. The problem is that there were a lot of 101 sessions happening. We concluded that we can shift those 101 sessions to blog posts, and if someone wants to give a 101 session we can have themed meetups once every one or two months where they can present those talks. The open space was scheduled for half an hour but we stretched it a bit longer as more people started adding points to the discussion.

Before the closing keynote of the day I volunteered at hall-B. I was so excited for the keynote that during the tea break before it I went and sat in the second row of the hall, just so I could enjoy the talk from a good spot.

The conference ended with David Beazley's keynote. He live coded a stack machine, wrote an interpreter in Python for a WebAssembly game that was originally written in Rust, and in the end added PyGame to make it into an actual game. It was a jaw-dropping moment for me, though I got lost midway through his talk as it was a bit advanced for me. But when I looked around, most people were feeling the same. The keynote ended with a standing ovation from everyone in the hall. For me, the whole closing keynote was like a movie. It was such a joy to just watch David live code, and nothing could have been a better way to end a conference.

The last day of our stay in Chennai was a bit weird, as there was some issue with the water in our apartment, so we reached the workshop a bit late. I had bought tickets for David's workshop “Write your own Async”. In the workshop I tried to follow along with him and was writing code just as he did, but after the second half I was a bit lost, so I just focused on listening to him. It was not exactly a workshop but more of him giving us a problem, us discussing the solution, and him live coding the solution after the discussion. The solutions were so well designed that they were similar to the built-in functions that the asyncio module has. As I was trying to live code with him, I wasn't able to make detailed notes that I could revisit later. But luckily he uploaded the workshop screencast, so I can revise the concepts again.

The day ended with me saying goodbye to all the people that had stayed late during the dev sprints as workshop and devsprints were happening in parallel.

This marked the end to one more year of my PyCon India journey. It was my fourth PyCon India and the most special one. I stayed with people that I look up to in real life and had lots of fun. The funny thing is not all of them use Python as their day to day language yet they came to a conference dedicated towards the language. I guess that's the beauty of the community. You meet so many people from different backgrounds and learn from them which not only helps you be a better developer but also gives a different perspective towards your life.

This is the second part of the series, in the first part we talked about the general idea of concurrency, how it's different from parallelism and saw how Python handles concurrency.

Part 1: Talking Concurrency -1

In the second part of the blog, we will look into the modern solution towards the problem using the new Asyncio module.

Import Asyncio

In the last post, we looked at a basic code snippet showing how we can write concurrent code. We also discussed some of the basic terminology used with the asyncio module. If you don't remember it, you should take a quick recap, as we will look at those concepts in a bit more detail.

Before looking at some code, let's understand some basic terminologies that would help in understanding the code better.

  • Event loop: it's an infinite loop that keeps track of all the running tasks. It manages all the suspended functions and resumes them when the time is right. These functions are stored in a queue called the task queue; the event loop constantly polls the task queue and runs whatever is ready. When a task is passed on to the event loop, it returns a future object.

  • Future: a future is an indirect reference to a forthcoming result. It can loosely be translated as a promise you make to do something when a condition is met; when the condition is met, the future can “call back” to signal that it is ready. Since everything is an object in Python, a future is also an object, one that has the __await__() method implemented, and its job is to hold a certain state and result. The state can be one of three things:

  • Pending: it does not have a result or exception yet.
  • Cancelled: it was cancelled.
  • Finished: it was finished, either with a result or with an exception.

Futures also have a method called add_done_callback(). This method registers a function to be called as soon as the future is done. That function receives the future, which either holds the expected result as a Python object or raises the exception when asked for its result.
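
To make the states and the callback a bit more concrete, here is a minimal sketch using a bare future. The 0.1 second delay and the value 42 are arbitrary, and call_later() just stands in for some other part of the program finishing its work.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    events = [f"done at start: {future.done()}"]  # the future is still pending

    def on_done(fut):
        # Called by the event loop once the future is finished.
        events.append(f"callback saw result {fut.result()}")

    future.add_done_callback(on_done)
    # Stand-in for some other part of the program producing the result later.
    loop.call_later(0.1, future.set_result, 42)

    result = await future  # suspends main() until the result is set
    events.append(f"await returned {result}")
    return events


events = asyncio.run(main())
print(events)
```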

  • Tasks: a task executes a coroutine in an event loop. In a program, asyncio.create_task(coroutine) wraps the coroutine into a task and schedules its execution. asyncio.create_task(coroutine) returns a task object. Every time a coroutine awaits a future, the future is sent back up to the task, and the task binds itself to the future by calling add_done_callback() on it. From then on, when the state of the future changes to either cancelled or finished, whether by raising an exception or by passing the result as a Python object, the task is called back and resumes where it left off.

Since a typical program will have multiple tasks to be executed concurrently, we normally create them with asyncio.create_task(coroutine) and wait for all of them together with asyncio.gather().

  • Coroutine: asyncio was introduced in Python 3.4. Initially it started off with decorator-based coroutines, @asyncio.coroutine, which used the yield from syntax. Later, in Python 3.5, the async and await keywords were introduced, which made reading and writing concurrent code much easier. I won't go into much detail on how coroutines evolved to the new async def keyword, because I am planning to write a separate blog on that.

As we looked into the basic definition of coroutines in the last blog, we can loosely describe them as restartable functions.

You make a coroutine with the async def keyword and you can suspend the coroutine with the await keyword. Every time you await, the function gets suspended while whatever you asked to wait on happens, and when that has finished, the event loop wakes the function up again and resumes it from the await call, passing any result out. Since coroutines evolved from generators, and generators are iterators with an __iter__() method, coroutines similarly have __await__(), which allows them to continue every time await is called.

At each step a coroutine does one of three things:

  • It awaits a future
  • It awaits another coroutine
  • It returns a result.

Before moving forward, I want to talk about await. In Python, anything that can be awaited, i.e. used with the await keyword, is called an awaitable object. The most common awaitables that you would use are coroutines, futures and tasks. Thus anything that would block is awaited, which hands control back to the event loop and adds the coroutine to the list of paused coroutines.

Now let's look at a very basic async program to understand how everything fits in together.

import asyncio

async def compute(x, y):
    print("Compute %s + %s ..." % (x, y))
    await asyncio.sleep(1.0)
    return x + y

async def print_sum(x, y):
    result = await compute(x, y)
    print("%s + %s = %s" % (x, y, result))

asyncio.run(print_sum(1, 2))

The sequence diagram below describes the flow of the above program.

[Figure: tulip_coro.png, sequence diagram of the program above]

Now that we know all the basic terminology used in an async program, let's look at a slightly more complex program below to get a better understanding of all the jargon we learned above.

import asyncio


async def compute(x, y):
    """
    A coroutine that takes in two values and returns the sum.
    """
    print(f"Computing the value of {x} and {y}")
    await asyncio.sleep(1)
    return x + y


async def print_sum():
    """
    A coroutine that creates tasks.
    """
    value1 = asyncio.create_task(compute(1, 0))
    value2 = asyncio.create_task(compute(1, 0))
    value3 = asyncio.create_task(compute(1, 0))
    print(sum(await asyncio.gather(value1, value2, value3)))

asyncio.run(print_sum())

async def print_sum() and async def compute() are the two coroutines in the above program. async def print_sum() plays the role of the main function used in sync programming. The main function executes the entire program and all the functions related to it. The same approach is followed here: one coroutine awaits all the other coroutines.

Though this can easily be misunderstood. In the variant below, the program would run just fine, but in a more sequential manner.

    value1 = await asyncio.create_task(compute(1, 0))
    value2 = await asyncio.create_task(compute(1, 0))
    value3 = await asyncio.create_task(compute(1, 0))
    print(sum([value1, value2, value3]))

The above code is a good example of how not to write async code. By using await on every task as soon as it is created, we wait for each call to finish before starting the next, thus making the program sequential. To avoid this, asyncio.gather() is used in the program to gather all the tasks, value1, value2 and value3.

Finally, when all the tasks are gathered together, they are run concurrently.
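
The difference is easy to measure. The sketch below times both styles; the 0.2 second delay and the helper names sequential() and concurrent() are my own choices for the demo, and awaiting the coroutine directly behaves the same as awaiting each task right after creating it.

```python
import asyncio
import time


async def compute(x, y, delay=0.2):
    await asyncio.sleep(delay)
    return x + y


async def sequential():
    start = time.perf_counter()
    # Each await finishes before the next compute() even starts.
    values = [await compute(1, 0) for _ in range(3)]
    return sum(values), time.perf_counter() - start


async def concurrent():
    start = time.perf_counter()
    # All three tasks are scheduled first, then awaited together.
    tasks = [asyncio.create_task(compute(1, 0)) for _ in range(3)]
    values = await asyncio.gather(*tasks)
    return sum(values), time.perf_counter() - start


async def main():
    return await sequential(), await concurrent()


(seq_sum, seq_time), (con_sum, con_time) = asyncio.run(main())
print(f"sequential: sum={seq_sum} in {seq_time:.2f}s")
print(f"concurrent: sum={con_sum} in {con_time:.2f}s")
```

On my understanding the sequential version should take roughly three times the delay and the concurrent one roughly a single delay.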

Sync-Async-Sync

A lot of the time you might be in a situation where you have to call a sync function def from a coroutine async def, or have to call a coroutine async def from a sync function def. Ideally, you “shouldn't” use sync functions for calls that can be async, like a database call, because that is something that could provide further optimization. But there is nothing wrong with using a synchronous library for the database and an async library for HTTP, and gradually moving things to async.

  • Sync-Async

Calling a sync function def from a coroutine async def. In that case, you run the sync function in a different thread using a thread pool executor. The run_in_executor() method of the event loop takes an executor instance, a regular callable to invoke, and any arguments to be passed to the callable. It returns a Future that can be awaited to wait for the function to finish its work and return something.

import asyncio
import concurrent.futures

def blocking_io():
    # File operations (such as logging) can block the
    # event loop: run them in a thread pool.
    with open('/dev/urandom', 'rb') as f:
        return f.read(100)

def cpu_bound():
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    return sum(i * i for i in range(10 ** 7))

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    # 1. Run in the default loop's executor:
    result = await loop.run_in_executor(
        None, blocking_io)
    print('default thread pool', result)

    # 2. Run in a custom thread pool:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, blocking_io)
        print('custom thread pool', result)

    # 3. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

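if __name__ == '__main__':
    # The guard matters for ProcessPoolExecutor on platforms that spawn workers.
    asyncio.run(main())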
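
If you are on Python 3.9 or newer, asyncio.to_thread() is a shorter way to push a blocking call into a thread. A minimal sketch, where blocking_task and its 0.2 second sleep are stand-ins for real blocking work:

```python
import asyncio
import time


def blocking_task(seconds):
    # A plain sync function that blocks its thread.
    time.sleep(seconds)
    return f"slept {seconds}s"


async def main():
    start = time.perf_counter()
    # Each call runs in its own worker thread, so the two sleeps overlap.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_task, 0.2),
        asyncio.to_thread(blocking_task, 0.2),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```
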
  • Async-Sync

When you have to call coroutines from a normal sync function, you have to manually get an event loop, create tasks with create_task() and call the asyncio.gather() function. Since you can't await inside a sync function, one thing you can do is create a queue with asyncio.Queue() and use that queue to pass data around between the different coroutines.


import asyncio


async def compute(x, y, data):
    print(f"Computing the value of {x} and {y}")
    result = x + y
    await data.put(result)


async def process(n, data):
    processed, sumx = 0, 0
    while processed < n:
        item = await data.get()
        print(item)
        processed += 1
        value = item
        sumx += value
    print(f"The sum is:{sumx}")
    await asyncio.sleep(.5)


def main():
    # Sync code has no running loop, so create one and manage it by hand.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        data = asyncio.Queue()
        sum1 = loop.create_task(compute(1, 4, data))
        sum2 = loop.create_task(compute(0, 0, data))
        sum3 = loop.create_task(process(2, data))
        final_task = asyncio.gather(sum1, sum2, sum3)
        loop.run_until_complete(final_task)
    finally:
        loop.close()


if __name__ == '__main__':
    main()

What now?

  • Just to get a better understanding of all the new syntax you learned, you can try out the sample problem mentioned below.

Write a program that reads log files and refires those URLs that have a 5xx status code. Once the refiring is done, append &retry=True to the URL and store them in a separate log file.

The log file will be a text file, you can check out a sample file here.

  • As I am still exploring concurrency, I don't exactly know the best practices and the pitfalls you should avoid while writing async code, but I highly recommend you check out asyncio: We Did It Wrong – roguelynn. This article can be a good follow-up after you are done with this one and are comfortable with the syntax of asyncio.

Just before ending the blog I would like to thank maxking and Jason Braganza for helping me out in the blog.

In the next part of the series, I will be talking about threads and finally will conclude the series with asyncio based frameworks such as quart and aiohttp.

Happy Coding!

Whenever we think of programs or algorithms we think of steps that are supposed to be done one after the other to achieve a particular goal. Let's take a very simple example of a function that is supposed to greet a person:

def greeter(name):
    """Greeting function"""
    print(f"Hello {name}")

greeter("Guido")  #1
greeter("Luciano")  #2
greeter("Kushal")  #3
"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""

Here the function greeter() greets the person whose name is passed to it. But it does so sequentially, i.e. while greeter("Guido") runs, the rest of the program is blocked until that call finishes executing. Only if it runs successfully will the second and third function calls be made.

This familiar style of programming is called sequential programming.

Why concurrency?

Sequential programming is comparatively easy to understand and most of the time fits the use case. But sometimes you need to get the most out of your system for some reason X, and the most common X I could find is scaling your application.

Though greeter() is just a toy example, a real-world application with real users needs to work the same even under a huge amount of traffic. Every time you get a spike in your traffic or daily active users, you can't just add more hardware, so one of the best solutions at times is to utilize your current system to the fullest. This is where concurrency comes into the picture.

Concurrency is about dealing with lots of things at once. – Rob Pike

Challenges in writing concurrent programs

Before I move forward, I know what most people will say. If it's that important, why aren't people talking about it at work/college/park/metro station/..? Why do most people still use sequential programming patterns while coding?

For a very simple reason: it's not easy to wrap your head around, and it's very easy to write sequential code pretending to be concurrent code.

concurrency-comic

I got to know about this programming style very late, and when I later talked to people they said the same thing. It's not easy to code, it's easy to skip the best practices, and it's very hard to debug, so most people try to stick to the normal style of programming.

How Python handles concurrency?

The two most popular ways (techniques) of dealing with concurrency in Python are:

  1. Threading
  2. Asyncio

Threading: Python has a threading module that helps in writing multi-threaded code. You can spawn independent threads that share common state (for example, a common variable that is accessed by two independent threads).

Let's re-write that greeter() function again now with threads.

import threading 
import time
def main():
    thread1 = threading.Thread(target=greeter, args=('Guido',))
    thread2 = threading.Thread(target=greeter, args=('Luciano',))
    thread3 = threading.Thread(target=greeter, args=('Kushal',))
    thread1.start()
    thread2.start()
    thread3.start()

def greeter(name):
    print("Hello {}".format(name))
    time.sleep(1)
    
if __name__ == '__main__':
    main()

"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""
    

Here thread1, thread2, and thread3 are three independent threads that run alongside the main thread of the interpreter. This may look like it is running in parallel, but it's not. Whenever a thread waits (here it is only a time.sleep() call, but the wait can be anything: reading from a socket, writing to a socket, reading from a database), control is passed on to another thread in the queue. In threading, this switching is done by the operating system (preemptive multitasking).

Though threads seem to be a good way to write concurrent code, they do have some problems too.

  • The switch between the threads is done by the operating system. The user does not have control over it.
  • Python has a lock called the GIL (Global Interpreter Lock). Only the thread that holds the GIL can run; the others have to wait for their turn to get the GIL, and only then can they proceed. That is fine if you're doing an I/O bound task but sucks if you're doing a CPU bound task.
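
To see threads overlapping their waiting time, here is a minimal sketch that times three threads, each sleeping briefly. The helper name wait_and_greet and the 0.2 second delay are made up for illustration; join() is used so that the main thread waits for all three threads before measuring.

```python
import threading
import time

greetings = []  # shared state appended to by every thread


def wait_and_greet(name, delay=0.2):
    """Hypothetical stand-in for an I/O bound task: wait, then greet."""
    time.sleep(delay)  # the GIL is released while a thread sleeps
    greetings.append(f"Hello {name}")


start = time.perf_counter()
threads = [
    threading.Thread(target=wait_and_greet, args=(name,))
    for name in ("Guido", "Luciano", "Kushal")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # block until this thread has finished
elapsed = time.perf_counter() - start

# The three waits overlap, so the total is close to one delay, not three.
print(f"{len(greetings)} greetings in {elapsed:.2f}s")
```

Because the waits overlap, the elapsed time should be close to a single delay. A CPU bound function in place of time.sleep() would not get the same benefit, because of the GIL.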

Asyncio: Python introduced the asyncio package in 3.4, which follows a different approach to concurrency. It brought up the concept of coroutines. A coroutine is a function that can be suspended at an await and resumed later from the same point. Unlike threads, where the operating system decides when to switch, a coroutine decides for itself when to give up control. This is called cooperative multitasking.

Asyncio code is written with the async and await keywords (added in Python 3.5). A coroutine is defined with the async keyword and is awaited so that the waiting time can be utilized by other coroutines.

Let's rewrite greeter() again, but now using asyncio.

import asyncio


async def greeter(name):
    await asyncio.sleep(1)
    print(f'Hello {name}')


def main():
    # Create the loop explicitly; calling get_event_loop() without a
    # running loop is deprecated in recent Python versions.
    loop = asyncio.new_event_loop()

    task1 = loop.create_task(greeter('Guido'))
    task2 = loop.create_task(greeter('Luciano'))
    task3 = loop.create_task(greeter('Kushal'))

    final_task = asyncio.gather(task1, task2, task3)
    loop.run_until_complete(final_task)
    loop.close()


if __name__ == '__main__':
    main()

"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""

Looking at the above code we see some not-so-common jargon thrown around: event loop, tasks, and a new sleep function. Let's understand them before we dissect the code and understand its working.

  • Event loop: it's one of the most important parts of async code. It is a simple piece of code that keeps on looping and checks whether anything has finished its waiting and needs to be executed. Only a single task can run in an event loop at a time.
  • Coroutines: here greeter() is a coroutine which prints the greeting. This is a simple example, but in an I/O bound process a coroutine needs to wait, so await lets it pause and get off the event loop while it waits. The asyncio.sleep() function is different from time.sleep() because asyncio.sleep() is a non-blocking call, i.e. it does not hold up the rest of the program while it waits. The argument given to asyncio.sleep() is the number of seconds to wait.
  • Tasks: calling a coroutine function does not run it and return its value; it returns a coroutine object. Tasks wrap these coroutine objects and schedule them on the event loop so that they can run independently.
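
A small sketch of the points above: calling a coroutine function only creates a coroutine object, and nothing runs until an event loop drives it. asyncio.run() (available since Python 3.7) is used here as a convenient way to start an event loop; the variable names are only for illustration, and this greeter() returns the greeting instead of printing it.

```python
import asyncio
import inspect


async def greeter(name):
    await asyncio.sleep(0.1)  # non-blocking wait; other tasks may run meanwhile
    return f"Hello {name}"


coro = greeter("Guido")  # nothing has run yet
is_coroutine = inspect.iscoroutine(coro)
print(type(coro).__name__)  # coroutine

result = asyncio.run(coro)  # the event loop runs the coroutine to completion
print(result)  # Hello Guido
```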

Now let's move on to the code in main(). Here task1, task2, and task3 run concurrently, each wrapping a call to the coroutine. Once all the tasks are gathered, the event loop runs until all of them are completed.

I hope this gives you a brief overview of concurrency. We will be diving deeper into both threading and asyncio, and into how we can use async for web applications using aiohttp and quart.
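
On Python 3.7 and later, the same kind of program is usually written with asyncio.run(), which creates and closes the event loop for you. The following is a sketch of that style, not a replacement for the code above; the 0.2 second delay is arbitrary, and greeter() returns the greeting here instead of printing it so that the results can be collected.

```python
import asyncio
import time


async def greeter(name, delay=0.2):
    await asyncio.sleep(delay)
    return f"Hello {name}"


async def main():
    # gather() wraps each coroutine in a task and waits for all of them.
    return await asyncio.gather(
        greeter("Guido"), greeter("Luciano"), greeter("Kushal")
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)  # results come back in the order the coroutines were passed
print(f"finished in {elapsed:.2f}s")
```

Since the three sleeps overlap, the total time should be close to one delay and not the sum of all three.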

Stay tuned this will be a multi-part series.

While reading about concurrency you might come across a lot of other topics that are easy to confuse with it, so let's look at them now to see how concurrency is different.

Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once. Not the same, but related. One is about structure, one is about execution. Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable. -Rob Pike

  • Parallelism: doing tasks simultaneously. This is different from concurrency, as in parallelism all the tasks run side by side without waiting (sleeping) for other tasks, unlike concurrent tasks. In Python the usual way to achieve it is multiprocessing. Multiprocessing is well suited for CPU bound tasks as it distributes tasks over different cores of the CPU. Python's GIL does not go well with CPU bound tasks in threads, but each process has its own interpreter and its own GIL.

  • Single-Threaded/Multi-Threaded: because of the GIL, only one thread executes Python code at a time in CPython, but you can still use multiple threads. These threads run along with the main thread. So threading, in general, is a method to achieve concurrency.

  • Asynchronous: asynchrony is used to present the idea of either concurrent or parallel tasks, and when we talk about asynchronous execution the tasks can correspond to different threads, processes, or even servers.

Part 2: Talking Concurrency: Asyncio

In the last blog I talked about Iterators and Iterables, and I am assuming you're familiar with both concepts. So moving forward from there, let's talk about generators.

Simply put, generators are iterators built with the yield keyword: they do not return, they yield. Similarly, a generator function is one that has a yield keyword in its body.

Let's look at some code and find out a bit more about them so we can define them more formally.

def range_123():
    print("Start")
    yield 1
    yield 2
    yield 3
    print('End')


for number in range_123():
    print(number)
"""
OUTPUT:
Start
1
2
3
End
"""

numbers = range_123() # Assigning generator object to numbers

next(numbers) # Prints "Start", returns 1
next(numbers) # Returns 2
next(numbers) # Returns 3
next(numbers) # Prints "End", raises StopIteration

When we look closely at the above code, range_123() is a generator function. Since generators are iterators, we can iterate directly over the result of calling the generator function, or we can assign the generator object to a name and then use the next() function to step through it until it is exhausted and raises StopIteration, conforming to the iterator protocol.

Now you must be wondering what is the difference between the yield and return?

  • When a return statement is invoked inside a function, it permanently passes control back to the caller of the function and disposes of a function's local state.

  • When a yield is invoked, it also passes the control back to the caller of the function but it only does so temporarily. It suspends the function and retains its local state.

def greeter(name):
    while True:
        yield f'Hello {name}'

gen_object = greeter('Pradhvan') 
next(gen_object) # Output -> Hello Pradhvan
next(gen_object) # Output -> Hello Pradhvan
next(gen_object) # Output -> Hello Pradhvan
next(gen_object) # Output -> Hello Pradhvan

If we look at the above code we can clearly see that the local variables are stashed away temporarily: the function is suspended and control is given back to the caller while the function retains its local state.

Since it does lazy evaluation, it can be resumed anytime with next() on the generator, which makes it possible to produce an infinitely long series of greeting messages.
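
As a small illustration of lazy evaluation, the sketch below defines an endless generator and takes only the first few values from it with itertools.islice. The generator name count_up is made up for this example.

```python
from itertools import islice


def count_up(start=1):
    """Yield start, start + 1, start + 2, ... without ever finishing."""
    number = start
    while True:
        yield number
        number += 1  # local state survives between next() calls


first_five = list(islice(count_up(), 5))
print(first_five)  # [1, 2, 3, 4, 5]
```

Calling list(count_up()) directly would never finish, which is why the number of items is capped with islice().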

Let's look at one more example of a code snippet where multiple yield statements decide the flow of the function.

def repeater():
    while True:
        print("Start")
        yield 1
        yield 2
        print("end")
gen_obj = repeater()
next(gen_obj) # 1
next(gen_obj) # 2
next(gen_obj) # 3

"""
OUTPUT # 1
Start
1
OUTPUT # 2
2
OUTPUT # 3
end
Start
1
"""

The above example makes it clear that in a generator function, the point where the function suspends is decided by the yield statement. Call #2 suspends right after yielding 2, so when we call next() the third time, the function prints end, loops around, prints Start, and suspends again at yield 1.

Generator Expression

A generator function can be replaced with a generator expression. These are similar to list comprehensions, but where a list comprehension eagerly builds a list, a generator expression returns a generator that lazily produces the items.

def range_123():
    print("Start")
    yield 1
    print("Middle")
    yield 3
    print("End")

res1 = [x*3 for x in range_123()]

"""
Output res1:
Start
Middle
End
"""

for i in res1:
    print("-->",i)
"""output:
--> 3
--> 9
"""

  • The list comprehension eagerly consumes every item the generator yields, so Start, Middle, and End are printed while the list is being built.
  • When the for loop iterates over res1, it only walks the list that has already been built.

def range_123():
    print("Start")
    yield 1
    print("Middle")
    yield 3
    print("End")

res2 = (x*3 for x in range_123())

print(res2) # <generator object <genexpr> at 0x7f8be1d09150>

for i in res2:
    print("-->",i)
"""
Output
Start
--> 3
Middle
--> 9
End
"""

  • In the case of the generator expression, the body of the generator function range_123() only executes when the for loop iterates over the generator object res2.
  • Each iteration calls next() on the generator, and the loop advances until StopIteration is raised.

Comprehensions are a great way to increase the readability of your code, and if you use a generator expression instead, you also make it more memory efficient.

But sometimes we tend to overuse comprehensions, which backfires. I found a great article, Overusing list comprehensions and generator expressions in Python, which you should definitely look into.
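
On the memory point above, here is a rough sketch comparing the two forms with sys.getsizeof. Note that getsizeof only measures the container object itself, not the integers it refers to, so treat the numbers as an indication and not an exact measurement.

```python
import sys

squares_list = [n * n for n in range(100_000)]  # builds every item up front
squares_gen = (n * n for n in range(100_000))   # builds items only on demand

list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)  # the generator object is far smaller

# Both produce the same values when consumed.
total = sum(squares_gen)
print(total == sum(squares_list))  # True
```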

Iteration is a fundamental technique that is supported by every programming language in the form of loops. The most common one is the for loop, and in Python's case specifically, we have the for-each loop. For-each loops are powered by iterators. An iterator is an object that does the actual iterating and fetches data one item at a time, on demand.

Let's take a step back and look back at some of the common terms which would help us in understanding iterators even better.

iterables: anything that can be iterated over is called an iterable.

for item in some_iterable:
    print(item)

sequences: Sequences are iterables which can be indexed.

numbers = [1,2,3,4]
tuples = (1,2,3)
word = 'Hello world'

The iter function

iter is a built-in function, and whenever the interpreter needs to iterate over an object, it automatically calls iter().

The iter() function returns an iterator.

When the iter function is called it does three things:

  1. Checks whether the object implements the __iter__ method. (To see this just do dir() on the object.)
  2. If the __iter__ method is not present but __getitem__ is implemented, Python creates an iterator that fetches the items in order, starting from index zero.
  3. If that fails, a TypeError is raised stating that the object is not iterable.

numbers = [1,2,3,4]
num = iter(numbers) # Builds an iterator 'num' 

Looking at the code snippet above we can make a better definition of an iterable.

*Any object from which the iter() built-in function can obtain an iterator is called an iterable.*

Before moving forward let's look at a nifty little way iter() works with functions to make them behave as an iterator.
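
The second and third steps of iter() described above can be seen in a short sketch. The Countdown class is hypothetical; it deliberately defines only __getitem__ so that iter() has to fall back to index-based iteration.

```python
class Countdown:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # tells the fallback iterator to stop
        return self.start - index


countdown_items = list(iter(Countdown(3)))
print(countdown_items)  # [3, 2, 1]

try:
    iter(42)  # int has neither __iter__ nor __getitem__
except TypeError as error:
    message = str(error)
    print(message)
```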

Let's build a die roller that rolls a die from 1-6 and stops when the die hits 1.

In this usage we need to make sure of two things:

  1. The iter function must receive a callable that takes no arguments; it will be invoked every time next() is called on the iterator.
  2. The second argument, called the sentinel, acts as a flag: when the callable returns it, the iterator raises StopIteration instead of returning the sentinel.

from random import randint

def die_roll():
    return randint(1,6)

roller = iter(die_roll, 1)

print(type(roller)) # <class 'callable_iterator'>

for roll in roller:
    print(roll)

"""
Output (will vary between runs):
5
6
3
2
"""
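
The same two-argument form of iter() is handy outside of toy examples too. As a sketch, the snippet below reads a stream in fixed-size chunks until read() returns an empty bytes object; io.BytesIO stands in for a real file or socket, and the chunk size of 4 is arbitrary.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")  # stand-in for a file or socket

# read(4) is called repeatedly until it returns the sentinel b"".
chunks = list(iter(partial(stream.read, 4), b""))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

functools.partial is used because the callable passed to iter() must take no arguments.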

Iterable vs Iterator

Python obtains an iterator from an iterable. Let's look at the for-each loop again to see how everything fits in the picture.

numbers = [1,2,3,4]
for number in numbers:
    print(number)

Looking at the code above we can only see the iterable, i.e. numbers. But what about the iterator? What about iter()? Isn't the loop supposed to use both to work?

Here we can't see the iterator or iter() in action, but they are working behind the scenes. Let's re-write the whole statement as a while loop so we can see how it all fits together.

numbers = [1,2,3,4]
num = iter(numbers) # builds an iterator
while True:
    try:
        print(next(num))
    except StopIteration:
        del num
        break

The flow of the above code is simple:

  1. Iterator num is created from the iterable.
  2. To obtain the next value from the iterator, next() is used.
  3. The iterator raises StopIteration when there are no further items left.
  4. We delete the iterator and break out of the loop.

You must be wondering: everything is fine, but why did we delete the iterator?

Iterators are one-directional, and once all the items have been iterated over they can't be reset to the original state.

StopIteration signals that the iterator is exhausted, so it is of no further use and we delete it.
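
A quick sketch of what "exhausted" means in practice: going over the same iterator a second time produces nothing, while asking the iterable for a fresh iterator starts from the beginning again.

```python
numbers = [1, 2, 3, 4]
num = iter(numbers)

first_pass = list(num)            # consumes the iterator
second_pass = list(num)           # the iterator is exhausted, nothing is left
fresh_pass = list(iter(numbers))  # a new iterator starts from the beginning

print(first_pass)   # [1, 2, 3, 4]
print(second_pass)  # []
print(fresh_pass)   # [1, 2, 3, 4]
```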

Writing your own iterator

Python iterator objects are required to support two methods: __iter__ and __next__.

The __iter__ method returns self. This allows iterators to be used where an iterable is expected, i.e. with the "for" and "in" keywords.

The __next__ method returns the next available item, raising StopIteration when there are no more items to be looped through.

Let's bundle this knowledge and build our very own version of the built-in range.

class _Range:
    def __init__(self, start, end, step=1):
        self.current = start  # next value to hand out
        self.end = end        # exclusive upper bound, like range()
        self.step = step      # assumed to be a positive integer

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value

numbers = _Range(1, 3)
print(next(numbers)) # Result -> 1
print(next(numbers)) # Result -> 2
print(next(numbers)) # Raise a StopIteration Exception

Now that we know how an iterator works let's look back at the definition of an iterator again:

*Any object that implements the __next__ no-argument method that returns the next item in a series or raises StopIteration when there are no more items is called an Iterator.*

Just a quick tip before moving forward: the optimal way of creating your own iterator is to write a generator function, not to create an iterator class like we did here.
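
To make the tip concrete, here is a sketch of the same idea as the iterator class above, written as a generator function. The name gen_range is made up, and like the class it only handles a positive step.

```python
def gen_range(start, end, step=1):
    """Generator version of the iterator class; assumes a positive step."""
    current = start
    while current < end:
        yield current
        current += step


short_range = list(gen_range(1, 3))
stepped_range = list(gen_range(0, 10, 3))
print(short_range)    # [1, 2]
print(stepped_range)  # [0, 3, 6, 9]
```

The generator takes care of __iter__, __next__, and raising StopIteration on its own, which is why it is so much shorter.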

Iterator Protocol

The iterator objects are required to support the following two methods, which together form the iterator protocol: __iter__ and __next__.

iterator.__iter__()
iterator.__next__()

  • The iterator protocol powers all iteration in Python.
  • The iterator protocol also powers tuple unpacking in Python.

# Tuple unpacking
coordinates = (1, 2, 3)
x,y,z = coordinates

  • The iterator protocol also powers star expressions.

numbers = [1,2,3,4,5]
a,b,*rest = numbers
print(a, b, rest) # 1 2 [3, 4, 5]

  • Most of the built-in functions that require some kind of looping (iteration) in Python use the iterator protocol.
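
Because all of these features only rely on the iterator protocol, they work with any iterator, not just lists and tuples. A small sketch with a made-up generator function three_points:

```python
def three_points():
    yield 10
    yield 20
    yield 30


x, y, z = three_points()       # unpacking pulls three items from the generator
first, *rest = three_points()  # star expression collects the remainder
largest = max(three_points())  # built-ins consume iterators too

print(x, y, z)  # 10 20 30
print(rest)     # [20, 30]
print(largest)  # 30
```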

Python's tongue twister

Iterables are not necessarily iterators, but an iterator is necessarily an iterable.

Example: generators are iterators that can be looped over, whereas lists are iterables but not iterators.
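
One way to check this for yourself is with the abstract base classes in collections.abc. The sketch below compares a list with the iterator obtained from it:

```python
from collections.abc import Iterable, Iterator

letters = ["a", "b", "c"]
letters_iter = iter(letters)

print(isinstance(letters, Iterable))       # True
print(isinstance(letters, Iterator))       # False: a list has no __next__
print(isinstance(letters_iter, Iterable))  # True
print(isinstance(letters_iter, Iterator))  # True
print(iter(letters_iter) is letters_iter)  # True: __iter__ returns self
```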

Reasons to use Iterator:

  • Iterators make lazy evaluation possible, which saves memory.
  • Iterators allow for infinitely long iterables.

Not so common iterators

  • Enumerate objects are also iterators.
  • Zip objects are also iterators.
  • Reversed objects are iterators.
  • Files are also iterators.

letters = ['a','b','c','d']
next(enumerate(letters)) # Result -> (0, 'a')
next(zip(letters,letters)) #  Result -> ('a', 'a')
next(reversed(letters)) #  Result -> 'd'
next(open('iterator.txt')) #  Result -> 'iterator\n' (the first line of the file)