sandeepk

django

The Debug Diary – Chapter I

Lately, I was debugging an issue with the importer tasks of our codebase and came across a code block which looks fine but makes an extra database query in the loop. When you have a look at the Django ORM query

jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more_filters>
).only("manufacturer_code", "uid", "year", "model", "trim")

for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
    ymt_key = (entry.year, entry.model, entry.trim_processed)
...

you will notice we are using only, which only loads the set of fields mentioned and deferred other fields, but in the loop, we are using the field trim_processed which is a deferred field and will result in an extra database call.

Now, as we have identified the performance issue, the best way to handle the cases like this is to use values or values_list. The use of only should be discouraged in the cases like these.

Update code will look like this

jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more-filters>).values_list(
    "manufacturer_code",
    "uid",
    "year",
    "model",
    "trim_processed",
    named=True,
)

for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
    ymt_key = (entry.year, entry.model, entry.trim_processed)
...

By doing this, we are safe from accessing the fields which are not mentioned in the values_list. If anyone tries to do so, an exception will be raised.

** By using named=True we get the result as a named tuple which makes it easy to access the values :)

Cheers!

#Django #ORM #Debug

select_for_update is the answer if you want to acquire a lock on the row. The lock is only released after the transaction is completed. This is similar to the Select for update statement in the SQL query.

>>> Dealership.objects.select_for_update().get(pk='iamid')
>>> # Here lock is only required on Dealership object
>>> Dealership.objects.select_related('oem').select_for_update(of=('self',))

select_for_update have these four arguments with these default value – nowait=False – skiplocked=False – of=() – nokey=False

Let's see what these all arguments mean

nowait

Think of the scenario where the lock is already acquired by another query, in this case, you want your query to wait or raise an error, This behavior can be controlled by nowait, If nowait=True we will raise the DatabaseError otherwise it will wait for the lock to be released.

skip_locked

As somewhat name implies, it helps to decide whether to consider a locked row in the evaluated query. If the skip_locked=true locked rows will not be considered.

nowait and skip_locked are mutually exclusive using both together will raise ValueError

of

In select_for_update when the query is evaluated, the lock is also acquired on the select related rows as in the query. If one doesn't wish the same, they can use of where they can specify fields to acquire a lock on

>>> Dealership.objects.select_related('oem').select_for_update(of=('self',))
# Just be sure we don't have any nullable relation with OEM

no_key

This helps you to create a weak lock. This means the other query can create new rows which refer to the locked rows (any reference relationship).

Few more important points to keep in mind select_for_update doesn't allow nullable relations, you have to explicitly exclude these nullable conditions. In auto-commit mode, select_for_update fails with error TransactionManagementError you have to add code in a transaction explicitly. I have struggled around these points :).

Here is all about select_for_update which you require to know to use in your code and to do changes to your database.

Cheers!

#Python #Django #ORM #Database

Today we are gone to talk about the Foreign Key which is used to establish a many-to-one relationship and different args which can be passed.

Syntax to define a Foreign key relationship

from django.db import models

class GSTRate(models.Model):
    # ...
    pass

class PurchaseOrder(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE)

Different args for the Foreign Key

  • on_delete
  • limitchoicesto
  • related_name
  • relatedqueryname
  • to_field
  • db_constraints

on_delete This args handle the process, of what to do if the reference keys are deleted, we have three options – CASCADE: It deletes the objects contains the Foreign key – PROTECT: It prevents the deletion of the referenced object by raising ProtectedError – RESTRICT: Introduced in Django 3.1, It only deletes the object, if another references object is being deleted in the same operation, but with CASCADE flow otherwise raise RestrictedError

from django.db import models

class GSTRate(models.Model):
    # ...
    pass

class PurchaseOrder(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE)

limit_choices_to This is helpful in the form rendering to restrict the option for the reference field. The limited choice can either be a dict, Q object or a callable that return dict or Q object

from django.db import models

class GSTRate(models.Model):
    # ...
    pass

class PurchaseOrder(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE, limit_choices_to={'tax_type': 'gst'},)
# here we only show the tax of type GST only.

related_name It's the name that can be used for the reverse or backward reference while querying. You can also stop the backward relation by assigning + in related_name.


class GSTRate(models.Model):
    # ...
    pass

class PurchaseOrder(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE, related_name='+',) # IN this case GSTRate cannot backward realte to the Purchase Order obejct.

class PurchaseOrder(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE, related_name='invoice')

# here you can refer to Invoice inside GSTRate query by the related name.
GSTRate.objects.filter(invoice_id=1232)

related_query_name It has the same value as the related_name, unless not specified explicitly. If it is defined then you have to use that name while querying


class GSTRate(models.Model):
    # ...
    pass

class OrderConfirmation(models.Model):
    gst = models.ForeignKey(GSTRate, on_delete=models.CASCADE, related_name='order_confirmation', related_query_name='oc')

# here you can refer to Invoice inside GSTRate query by the related name.
GSTRate.objects.filter(oc_id=112)

to_field This is helpful in the case where you want to refer other the primary key of the modal, but the condition whatever key you referred it should be unique=True

db_constraints This is used to create a constraint at the database level, by default value is true and if the user sets the value to False, then accessing a related object that doesn’t exist will raise its DoesNotExist exception.

Conclusion These args are handy which gives us the possibilities of implementing the additional constraint and functionality to query data in the Django ORM query language.

Cheers!

#100DaysToOffload #Python#Django

Template is the one of powerful features of Django Framework, and we will discuss the tag and filter. If you do not have any idea about the template, please go read about that here, I will wait for you.

Template tags which we are going to discuss today are

  • URL
  • Now
  • Spaceless

URL The URL tag is the template equivalent of the reverse function. URL can accept args or kwargs for routes that expect other variables.

<a href="{% url "a_named_view" %}">Go to a named view</a>

Now Now is a convenient method to display information about the current time. It was a problem solver for the copyright year update, which I first have to update every year manually.

&copy; {% now "Y" %} sandeepk.dev.

Spaceless It removes the whitespace, newline, tab from the HTML

{% spaceless %}
    <div>
        <span>I am a good span.</span>
    </div>
{% endspaceless %}

Now let's talk about the filter which we can use in the template.

  • date
  • default
  • pluralize
  • yesno

Date This filter can be used to control the format of the datetime passed in the context of the response.

{{ birth_date|date:"Y-m-d" }}

Default Have you ever been in a condition when you want to show a text for a falsy value, then default filter is the one thing you should use

{{ value_to_check|default:"Nothing to see here." }}

Pluralize When your text considers counts of things. Pluralize can use to save you from if and else check

{ count_items }} item{{ count_items|pluralize }}

0 items
1 item
2 items
3 items
(and so on)

YesNo YesNo is good for converting True|False|None into a meaningful text message

{{ user.name }} has {{ user_accepted|yesno:"accepted,declined,not RSVPed" }} request.

These are the tags and filter which I use very often but there many more which you can check out here

Cheers!

#100DaysToOffload #Python #Django

Django provides the support of using multiple databases in your project. Let's see how we can do that, but first, let me put some use case where we might need multiple databases for our application. Why need multiple databases? In today's world, we are gathering a lot of data from user which is used for different purposes, some data is relational data and other is non-relational data. Let me put few use cases

  • Suppose you need to record all the touchpoints of a web page in your web application, for this you need a non-relation database to store that data to run some analytical result on it.
  • Read replicas, you need to set up read replicas of your default database to speed up the fetching of data from the database.
  • Saving the email metadata like how many emails were sent, open rate, error rate, link clicked to see the engagement of the emails.

Lets us see how to set up multiple databases in the Django project.

  1. Need to add the details of the databases in settings.py of Django project.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": config.get("database_default", "name"),
        "USER": config.get("database_default", "user"),
        "PASSWORD": config.get("database_default", "password"),
        "HOST": config.get("database_default", "host"),
        "PORT": "3306",
        "CONN_MAX_AGE": 0,
    },
    "replica1": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": config.get("database_replica1", "name"),
        "USER": config.get("database_replica1", "user"),
        "PASSWORD": config.get("database_replica1", "password"),
        "HOST": config.get("database_replica1", "host"),
        "PORT": "3306",
        "CONN_MAX_AGE": 0,
    },
    "mongo": {
        "ENGINE": "djongo",
        "NAME": config.get("mongo_database", "name"),
        "HOST": config.get("mongo_database", "host"),
        "USER": config.get("mongo_database", "user"),
        "PASSWORD": config.get("mongo_database", "password"),
    },
}

Here you can see we define 2 databases other than the default databases mongo and replica1. After this, you need to tell the Django router in which app you want to use which connection of database. This is one of the ways to do it, you can manually decide which database you want to use while querying.

DATABASE_ROUTERS = ['path.to.replica1', 'path.to.mongo']
  1. Now we need to define this router class to tell them which database to use, for that we need to write a class
class MongoRouter:
    """
    A router to control all database operations on models in the
    analytics and status applications.
    """
    route_app_labels = {'analytics', 'status'}

    def db_for_read(self, model, **hints):
        """
        Attempts to read analytics and status models go to mongo db.
        """
        if model._meta.app_label in self.route_app_labels:
            return 'mongo'
        return None

    def db_for_write(self, model, **hints):
        """
        Attempts to write analytics and status models go to auth_db.
        """
        if model._meta.app_label in self.route_app_labels:
            return 'mongo'
        return None

    def allow_relation(self, obj1, obj2, **hints):
        """
        Allow relations if a model in the analytics and status apps is
        involved.
        """
        if (
            obj1._meta.app_label in self.route_app_labels or
            obj2._meta.app_label in self.route_app_labels
        ):
           return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        Make sure the analytics and status apps only appear in the
        'mongo' database.
        """
        if app_label in self.route_app_labels:
            return db == 'mongo
        return None

similar goes for replica1 database


class ReplicaRouter:
    def db_for_read(self, model, **hints):
        """
        Reads go to replica1.
        """
        return 'replica1'

    def db_for_write(self, model, **hints):
        """
        Writes always go to default.
        """
        return 'default'

    def allow_relation(self, obj1, obj2, **hints):
        """
        Relations between objects are allowed if both objects are
        in the default/replica1 pool.
        """
        db_set = {'default', 'replica1'}
        if obj1._state.db in db_set and obj2._state.db in db_set:
            return True
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        """
        All non-mongo models end up in this pool.
        """
        return True

That's it, now you read to use multiples database in your project, which we early handle by the routers class you have defined.

Cheers!

#100DaysToOffload #Django #Python

In Django, we can use the abstraction concept in defining the tables for the columns which are common. We can make any modal as an abstract model by adding this meta property abstract = true.

Suppose you have some column fields which are common in all the tables, which you can abstract and have to just inherit this abstract class to add the fields in the model which can help you to follow the Don't Repeat Yourself principle. Let see an example

class Base(models.Model):
  """
  Base parent class for all the models
  """
  timestamp = models.DateTimeField(blank=True, db_index=True)
  is_active = models.BooleanField(default=True, db_index=True)

  def __init__(self, *args, **kwargs):
    super(Base, self).__init__(*args, **kwargs)

  class Meta:
    abstract = True

class OttPlatform(Base):
  """
  """

  name = models.CharField(max_length=200)
  ott_type = models.CharField(max_length=50)

  def __str__(self):
    return self.name

So, this helps you to stop duplication of code, but there is one more issue we can handle here. The is_active column is used to mark the row as deleted. Mainly in our use case, we can't delete the data from the table to keep the track of changes. So is_active field helps us with that. But now we have to use the is_active filter in every query.

We can solve this by overriding the manager, let see how


# First, define the Manager subclass.
class AtiveOTTManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().filter(is_active=True)

class OttPlatform(Base):
  """
  """

  name = models.CharField(max_length=200)
  ott_type = models.CharField(max_length=50)
  
   # the order matters, first come default manager, then custom managers.
  objects = models.Manager() # The default manager.
  active_objects = AtiveOTTManager() # The active OTT manager.

  def __str__(self):
    return self.name

# Now you have to do OttPlatform.active_objects.all(), to get all the active OTT platform name.

So, with overriding the manager we don't have to write a filter for is_active in every query.

Cheers!

#100DaysToOffload #django #python

Django Q() object helps to define SQL condition on the database and can be combined with the &(AND) and |(or) operator. Q() helps in the flexibility of defining and reusing the conditions.

  • Using Q() objects to make an AND conditions.
  • Using Q() objects to make an OR conditions.
  • Using Q() objects to make reusable conditions.

Using Q() objects to make an AND conditions We can use Q() objects to combine multiple filter conditions into one condition as filter conditions always perform AND operations.

from django.db.models import Q

# Without Q() object
document_obj = Document.objects.filter(created_by=1282).filter(doc_type='purchase_order').filter(edit=0).filter(cancelled=0)

#With Q() object
q_filter_document = Q(created_by=1282) & Q(doc_type='purchase_order') & Q(cancelled=0) &(edit=0)

# can also be written as
q_filter_document_another_way = Q(created_by=1282, doc_type='purchase_order', cancelled=0, edit=0)

document_obj = Document.objects.filter(q_filter_document)

Using Q() objects to make an OR conditions

from django.db.models import Q


#With Q() object
q_filter_document = Q(created_by=1282) | Q(created_by=1282)
document_obj = Document.objects.filter(q_filter_document)

Q() to make reusable filter condition The best use of Q() objects is reusability, we define the Q() once and can use them to combine with different Q() objects with help of &, |, and ~ operators.

Let's consider a use case, in which the user can generate a report based on certain filters. User can filter report based on these values documenttype, isdraft, createdby, documentstatus


def get_document_object(document_type, is_draft, created_by, document_status):
    base_query = Q(active=1, cancelled=0, document_tye=document_type, is_draft=is_draft)

    # based on condition we can different Q() objects to filter the tables
    if document_status = 'in_progress':
        base_query = base_query & Q(document_status=document_status, completed=0)
    else if document_status = 'completed':
        base_query = base_query & Q(document_status=document_status, completed=1)


   return Documents.objects.filter(base_query)

In Q() objects we can use the same conditional operator which we use in filter objects like in operator, startswith, endswith, etc.

Conclusion Q() objects contribute to clean code and reusability. It helps to define the condition with &, |, and ~ relation operator to simplify the complex queries.

Cheers!

#django #python #100DaysToOffload

In the previous blog post, we have discussed F() Expression, we will now explore more query expression in Django, to name few that we will discuss in this post are

  • Func() Expression
  • Subquery Expression
  • Aggregation () Expression

Func() Expression Func () Expression is the base of all the expressions and can be used to create your custom expression for the database level function.

# The table that we using for our query is the *Student* which keeps records of the students for the whole school.

from django.db.models import F, Func
student_obj = Student.objects.annotate(full_name=Func(F('first_name') + F('last_name'), function='UPPER')

# This will give a student object with a new field that is *full_name* of the student in upper case.

Subquery Expression Subquery are like nested condition in the query filter which helps you to make a complex query into a clean concise query. But you need to know the order of the sequence the query will be executed to use effectively. While using a Subquery you will also need to know about the OuterRef, which is like an F() Expression but points to the parent query value, let see both Subquery and OuterRef in action

# you are given a task to get the name of the student whose name starts with *S* and whose fees are due.

from django.db.models import OuterRef, Subquery
fee_objects = Fees.objects.filter(payment_due_gt=0)
student_obj = Student.objects.filter(name__startswith='S').filter(id__in=Subquery(fee_objects.values('student_id')))

# Get the lastest remarks for the students
remark = Remark.objects.filter(student_id=OuterRef('pk')).order_by('-created_at')
student_obj = Student.objects.annotate(newest_remark=Subquery(remark.values('remark_strl')[:1]))

Aggregation () Expression

Aggregation Expression is the Func Expression with GroupBy clause in the query filter.

# get the total student enrolled in the *Blind Faith* subject.

student_obj = Student.objects.filter(subject_name='blind_faith').annotate(total_count=Count('id'))

Note: All queries mentioned above in the code are not tested. So if you see any typo, a query that does not make sense, feel free to reach out to me at sandeepchoudhary1507[at]gmail[DOT]com.

Cheers!

#100DaysToOffload #django #python

What is the F() Expression? First let me explain to you what are Query Expressions are, these expressions let you use value or computation to be used in the update, create and filters, order by, annotation, aggregation. F() object represent the value of the model fields or annotated columns. It lets you help to not load the value of the field into the python memory rather directly handles in the Database query.

How to use the F() Expression? To use the F expression you have to import them from the from django.db.models import F and have to pass the name of the field or annotated column as argument, and it will return the value of the field from the database, without letting know the python any value. Let some example.

from django.db.models import F

# Documents is the table which have the details of the document submitted by user from the registrey portal for GYM membership

# We need update the count of the document submitted by the user with pk=10091

# without using F Expression

document = Documents.objects.get(user_id=10091)
document.document_counts += 1
document.save()

# Using F expression
document = Documents.objects.get(user_id=10091)
document.document_counts = F('document_counts') + 1
document.save()

Benefits of the F() Expression.

  • With the help of F expression we can make are query clean and concise.
    from django.db.models import F
  document = Documents.objects.get(user_id=10091)
  document.update(document_counts=F('document_counts') + 1)

   #Here we also have achieved some performance advantage
    #1. All the work is done at database level, rather than throwing the value from the database in the python memory to do the computation.
    #2. Save queries hit on the database.
  • F Expression can save you from the race condition. Consider a scenario where multiple user access your database and when bother user access the Document object for the user 10091, the count value is two, when user updates the value and save it and other user does the same the value will be saved as three not Four because when both user fetches the value its two.

  # user A fetch the document object, and value of document_counts is two.
  document = Documents.objects.get(user_id=10091)
  document.document_counts += 1
  document.save()
  # after the operation value of document_counts is three

  # Code running prallerly, User B also fetch the object, and value of document_counts is two.
  document = Documents.objects.get(user_id=10091)
  document.document_counts  += 1
  document.save()
  # after the operation value of document_counts is three

  # But actually value should be Four, but this is not the case using F expression here will save use from this race condition.
  • F Expression are persistent, which means the query persist after the save operation, so you have to use the refreshfromdb to avoid the persistence.
  document = Documents.objects.get(user_id=10091)
  document.document_counts = F('document_counts') + 1
  document.save()

  document.document_validation = 0
  document.save()

  # This will increase the value of *document_counts* by two rather then one as the query presists and calling save will trigger the increment again.

  • More example of F Expression in action with filter, annotate query.
from django.db.models import F

# annotation example
annotate_document = Document.objects.filter(created_by=F('created_by_first_name') + F('created_by_last_name')


# filter example
filter_document = Documents.objects.filter(total_documents_count__gt=F('valid_documents_count'))

That's all about the F Expression, it is pretty handy and helps you improve the performance, by reducing value loading in the memory and query hit optimization, but you have to keep an eye over the persistent issue, which I assume will not be the case if you are writing your code in structured way.

Cheers!

#100DaysToOffload #django #python