Talking Concurrency: Introduction
Whenever we think of programs or algorithms we think of steps that are supposed to be done one after the other to achieve a particular goal. Let's take a very simple example of a function that is supposed to greet a person:
def greeter(name):
    """Greeting function"""
    print(f"Hello {name}")

greeter("Guido")    # 1
greeter("Luciano")  # 2
greeter("Kushal")   # 3
"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""
Here the function greeter() greets the person whose name is passed to it. But it does so sequentially, i.e. while greeter("Guido") runs, the rest of the program is blocked until that call finishes. Only after it returns successfully are the second and third calls made; if it fails with an exception, they never run.
This familiar style of programming is called sequential programming.
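To make the blocking visible, here is a minimal sketch that adds an artificial wait to the greeter. The names slow_greeter and greet_all and the 0.2 second delay are made up for illustration; the sleep simply stands in for any slow call.

```python
import time

def slow_greeter(name, delay=0.2):
    """Hypothetical variant of greeter() that pretends to wait on I/O."""
    time.sleep(delay)  # stands in for a slow network or disk call
    return f"Hello {name}"

def greet_all(names, delay=0.2):
    """Greet everyone one after the other and measure the total time."""
    start = time.perf_counter()
    greetings = [slow_greeter(name, delay) for name in names]
    elapsed = time.perf_counter() - start
    return greetings, elapsed

greetings, elapsed = greet_all(["Guido", "Luciano", "Kushal"])
print(greetings)
print(f"took {elapsed:.2f}s")  # roughly 3 * delay: each call blocks the next one
```

The total time is the sum of the individual waits, which is exactly what concurrency tries to avoid.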
Why concurrency?
Sequential programming is comparatively easy to understand and fits the use case most of the time. But sometimes you need to get the most out of your system for some reason X, and the most common X I could find is scaling your application.
Though greeter() is just a toy example, a real-world application with real users needs to work just as well under the huge amount of traffic it receives. Every time you get a spike in traffic or daily active users, you can't just add more hardware, so one of the best solutions at times is to utilize your current system to the fullest. This is where concurrency comes into the picture.
Concurrency is about dealing with lots of things at once. – Rob Pike
Challenges in writing concurrent programs
Before I move forward, I know what most people will say. If it's that important, why aren't people at work/college/park/metro station/.. talking about it? Why do most people still use sequential programming patterns while coding?
For a very simple reason: it's not easy to wrap your head around, and it's very easy to write sequential code that only pretends to be concurrent.
I got to know about this programming style very late, and when I later talked to people they said the same thing. It's not easy to code, it's easy to skip the best practices, and it's very hard to debug, so most people stick to the normal style of programming.
How does Python handle concurrency?
The two most popular ways (techniques) of dealing with concurrency in Python are:
- Threading
- Asyncio
Threading: Python has a threading module that helps in writing multi-threaded code. You can spawn independent threads that share common state (for example, a variable that is accessed by two independent threads).
Let's rewrite the greeter() function, now with threads.
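As a rough sketch of what "shared state" means, the snippet below lets four threads update one counter. The names counter, counter_lock and add_many are invented for this example; the lock is there because unsynchronized updates from several threads can get lost.

```python
import threading

counter = 0                      # shared state, visible to every thread
counter_lock = threading.Lock()  # guards updates to counter

def add_many(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        with counter_lock:       # only one thread updates at a time
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every thread to finish

print(counter)  # 40000
```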
import threading
import time

def main():
    thread1 = threading.Thread(target=greeter, args=('Guido',))
    thread2 = threading.Thread(target=greeter, args=('Luciano',))
    thread3 = threading.Thread(target=greeter, args=('Kushal',))
    thread1.start()
    thread2.start()
    thread3.start()

def greeter(name):
    print("Hello {}".format(name))
    time.sleep(1)

if __name__ == '__main__':
    main()
"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""
Here thread1, thread2 and thread3 are three independent threads that run alongside the main thread of the interpreter. This may look like they are running in parallel, but they are not. Whenever a thread waits (in this simple function, the wait is the time.sleep(1) call), control is passed on to another thread in the queue. The wait can be anything: reading from a socket, writing to a socket, reading from a database. In threading, this switching is done by the operating system (preemptive multitasking).
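One way to see that the waits overlap is to time the threads. This is only a sketch: run_threads, the results list and the 0.3 second delay are assumptions made for the demonstration, and join() is added so the main thread waits for the others before measuring.

```python
import threading
import time

def greeter(name, results, delay=0.3):
    time.sleep(delay)                  # simulated wait; other threads run meanwhile
    results.append(f"Hello {name}")

def run_threads(names):
    """Start one thread per name, wait for all of them, return results and time."""
    results = []
    threads = [threading.Thread(target=greeter, args=(name, results)) for name in names]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

results, elapsed = run_threads(["Guido", "Luciano", "Kushal"])
print(sorted(results))
print(f"took {elapsed:.2f}s")  # close to one delay rather than three
```

The order in which the greetings land in results is not guaranteed, which is why the example sorts them before printing.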
Though threads seem to be a good way to write concurrent code, they do have some problems too.
- The switch between threads during the waiting period is done by the operating system. The user does not have control over it.
- Python has a lock called the GIL (Global Interpreter Lock). Only the thread that holds the GIL can run; the others have to wait their turn to get the GIL, and only then can they proceed. This is fine if you're doing an I/O-bound task, since a waiting thread releases the GIL, but it sucks if you're doing a CPU-bound task.
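If you want to observe the GIL effect yourself, a small experiment along these lines can help. It is a sketch, not a benchmark: count_down, timed and the value of N are made up for the example, and the timings depend on your machine and Python build (a free-threaded build would behave differently). On a standard CPython build with the GIL, the threaded run typically takes about as long as the sequential one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    """CPU-bound busy loop: pure Python work, no waiting."""
    while n > 0:
        n -= 1
    return n

def timed(label, fn):
    """Run fn(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

N = 1_000_000  # arbitrary amount of work per call

# Four calls, one after the other
sequential = timed("sequential", lambda: [count_down(N) for _ in range(4)])

# The same four calls spread over four threads
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = timed("4 threads", lambda: list(pool.map(count_down, [N] * 4)))
```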
Asyncio: Python introduced the asyncio package in 3.4, which takes a different approach to concurrency. It is built around coroutines. A coroutine is a function that can be paused (at an await) and resumed later from where it left off. Unlike threads, where the operating system decides when to switch, a coroutine itself decides when to give up control so that another one can run. This is cooperative multitasking.
Python 3.5 then added the async and await keywords. A coroutine is defined with the async keyword and is awaited with await, so that its waiting time can be used by other coroutines.
Let's rewrite greeter() again, now using asyncio.
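Here is a minimal sketch of the two keywords in action. It assumes Python 3.7 or later for asyncio.run(), and the short 0.1 second delay is arbitrary. Note that calling the function does not run it; it only creates a coroutine object that something else has to drive.

```python
import asyncio
import inspect

async def greeter(name):
    await asyncio.sleep(0.1)       # pause here and let other coroutines run
    return f"Hello {name}"

coro = greeter("Guido")            # nothing has run yet
is_coroutine = inspect.iscoroutine(coro)
print(is_coroutine)                # True

result = asyncio.run(coro)         # drive the coroutine to completion
print(result)                      # Hello Guido
```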
import asyncio

async def greeter(name):
    await asyncio.sleep(1)
    print(f'Hello {name}')

def main():
    # Create the loop explicitly: relying on get_event_loop() to create one
    # implicitly is deprecated in recent Python versions
    loop = asyncio.new_event_loop()
    task1 = loop.create_task(greeter('Guido'))
    task2 = loop.create_task(greeter('Luciano'))
    task3 = loop.create_task(greeter('Kushal'))
    final_task = asyncio.gather(task1, task2, task3)
    loop.run_until_complete(final_task)
    loop.close()

if __name__ == '__main__':
    main()
"""
Output:
Hello Guido
Hello Luciano
Hello Kushal
"""
Looking at the above code, we see some not-so-common terms thrown around: event loop, tasks, and a new sleep function. Let's understand them before we dissect the code and see how it works.
- Event loop: one of the most important parts of async code. It is a simple piece of code that keeps looping and checks whether anything has finished waiting and needs to be executed. Only a single task can run in an event loop at a time.
- Coroutines: here greeter() is a coroutine which prints the greeting. This is a simple example, but in an I/O-bound process a coroutine needs to wait, so await lets the program wait and get off the event loop. The asyncio.sleep() function is different from time.sleep() because asyncio.sleep() is a non-blocking call, i.e. it does not hold up the program until the wait is over. The argument given to asyncio.sleep() is the delay in seconds.
- Tasks: calling a coroutine function does not run it or return its value; it returns a coroutine object. Separate tasks are created from these coroutine objects so that they can be scheduled on the event loop and run independently.
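The difference between the two sleep functions is easiest to see by timing them. The sketch below uses made-up names (polite, rude, time_three) and an arbitrary 0.2 second delay; it runs three copies of each coroutine through asyncio.gather() and measures the total.

```python
import asyncio
import time

async def polite(delay):
    await asyncio.sleep(delay)   # yields to the event loop while waiting

async def rude(delay):
    time.sleep(delay)            # blocks the whole event loop while waiting

async def time_three(worker, delay=0.2):
    """Run three copies of `worker` concurrently and return the elapsed time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(3)))
    return time.perf_counter() - start

polite_time = asyncio.run(time_three(polite))
rude_time = asyncio.run(time_three(rude))
print(f"asyncio.sleep: {polite_time:.2f}s")  # close to one delay
print(f"time.sleep:    {rude_time:.2f}s")    # close to three delays
```

Because only one task runs in the event loop at a time, a blocking call inside a coroutine stalls every other task until it returns.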
Now let's move on to the code. Here task1, task2 and task3 run concurrently, each wrapping a call to the coroutine. Once all the tasks are gathered, the event loop runs until all of them are completed.
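On Python 3.7 and later, the same idea can be sketched without handling the loop by hand: asyncio.run() creates the event loop, runs a top-level coroutine and closes the loop afterwards. This variant returns the greetings instead of printing them inside the coroutine, which is a choice made here for illustration.

```python
import asyncio

async def greeter(name):
    await asyncio.sleep(0.1)
    return f"Hello {name}"

async def main():
    # gather() wraps each coroutine in a task and runs them concurrently;
    # results come back in the order the coroutines were passed in
    return await asyncio.gather(
        greeter("Guido"),
        greeter("Luciano"),
        greeter("Kushal"),
    )

greetings = asyncio.run(main())
print(greetings)  # ['Hello Guido', 'Hello Luciano', 'Hello Kushal']
```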
I hope this gives you a brief overview of concurrency. We will dive deeper into both threading and asyncio, and into how we can use async for web applications with aiohttp and Quart.
Stay tuned, this will be a multi-part series.
Related jargon:
While reading about concurrency you might come across a lot of other topics that you might confuse with concurrency, so let's look at them now to see how concurrency is different.
Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once. Not the same, but related. One is about structure, one is about execution. Concurrency provides a way to structure a solution to solve a problem that may (but not necessarily) be parallelizable. -Rob Pike
Parallelism: doing tasks simultaneously. This is different from concurrency in that all the tasks run side by side without waiting (sleeping) for other tasks, unlike concurrent tasks. The method to achieve it in Python is called multiprocessing. Multiprocessing is well suited for CPU-bound tasks as it distributes tasks over different cores of the CPU. Sadly, Python's GIL doesn't go well with CPU-bound tasks in threads, which is why separate processes are used for them.
Single-Threaded/Multi-Threaded: because of the GIL, Python (CPython) effectively runs one thread at a time, but you can still use multiple threads. These threads run along with the main thread. So threading, in general, is a method to achieve concurrency.
Asynchronous: asynchrony is used to present the idea of either concurrent or parallel tasks, and when we talk about asynchronous execution, the tasks can correspond to different threads, processes or even servers.
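A minimal sketch of multiprocessing for CPU-bound work, using ProcessPoolExecutor from the standard library. The function sum_of_squares, the input sizes and the choice of two workers are all assumptions made for the example; how much faster this runs depends on how many cores you have.

```python
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(n):
    """CPU-bound work: pure computation, no waiting."""
    return sum(i * i for i in range(n))

def main():
    inputs = [100_000, 200_000, 300_000, 400_000]
    # Inputs are distributed over worker processes, each with its own
    # interpreter and its own GIL
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(sum_of_squares, inputs))
    for n, total in zip(inputs, results):
        print(n, total)

if __name__ == "__main__":  # lets worker processes import this module safely
    main()
```

The guard at the bottom matters: on platforms that start workers by importing the main module, leaving it out would make every worker try to start its own pool.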
Part 2: Talking Concurrency: Asyncio