The Debug Diary – Chapter I

Lately, I was debugging an issue with the importer tasks of our codebase and came across a code block which looks fine but makes an extra database query in the loop. When you have a look at the Django ORM query

jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more_filters>
).only("manufacturer_code", "uid", "year", "model", "trim")

for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
    ymt_key = (entry.year, entry.model, entry.trim_processed)
...

you will notice we are using only, which only loads the set of fields mentioned and deferred other fields, but in the loop, we are using the field trim_processed which is a deferred field and will result in an extra database call.

Now, as we have identified the performance issue, the best way to handle the cases like this is to use values or values_list. The use of only should be discouraged in the cases like these.

Update code will look like this

jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more-filters>).values_list(
    "manufacturer_code",
    "uid",
    "year",
    "model",
    "trim_processed",
    named=True,
)

for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
    ymt_key = (entry.year, entry.model, entry.trim_processed)
...

By doing this, we are safe from accessing the fields which are not mentioned in the values_list. If anyone tries to do so, an exception will be raised.

** By using named=True we get the result as a named tuple which makes it easy to access the values :)

Cheers!

#Django #ORM #Debug