The Debug Diary – Chapter I
Recently, I was debugging an issue in the importer tasks of our codebase and came across a code block that looks fine but makes an extra database query inside a loop. Have a look at this Django ORM query:
jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more_filters>
).only("manufacturer_code", "uid", "year", "model", "trim")
for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
        ymt_key = (entry.year, entry.model, entry.trim_processed)
        ...
You will notice that we are using only(), which loads only the listed fields and defers the rest. Inside the loop, however, we read trim_processed, which is a deferred field, so Django makes an extra database query to fetch it for every row where it is accessed.
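To make the cost concrete, here is a toy model of deferred loading in plain Python. It is not Django code: FakeDeferredRow, its query_count counter, and the sample rows are all made up for illustration. It only mimics the behaviour described above, where reading a field that was not loaded up front costs one extra lookup per row.

```python
class FakeDeferredRow:
    """Toy stand-in for a model instance loaded with only(); not Django code."""

    query_count = 0  # counts simulated extra database round trips

    def __init__(self, stored, loaded_fields):
        self._stored = stored  # the full row as it exists "in the database"
        for name in loaded_fields:
            # Fields named in only() are available immediately.
            setattr(self, name, stored[name])

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails,
        # i.e. for fields that were deferred.
        if name.startswith("_") or name not in self._stored:
            raise AttributeError(name)
        FakeDeferredRow.query_count += 1  # simulate the extra query
        value = self._stored[name]
        setattr(self, name, value)  # cache so a second read is free
        return value


# Made-up sample data standing in for the vehicle table.
table = [
    {"year": 2020, "model": "A", "trim": "x", "trim_processed": "X"},
    {"year": 2021, "model": "B", "trim": "y", "trim_processed": "Y"},
    {"year": 2022, "model": "C", "trim": "z", "trim_processed": "Z"},
]

rows = [FakeDeferredRow(r, ["year", "model", "trim"]) for r in table]
keys = [(row.year, row.model, row.trim_processed) for row in rows]
extra_queries = FakeDeferredRow.query_count
```

After the loop, extra_queries equals the number of rows, which is why this pattern gets expensive on a large table.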
Now that we have identified the performance issue, the best way to handle cases like this is to use values() or values_list(). Using only() should be discouraged in cases like these.
The updated code looks like this:
jato_vehicles = JatoVehicle.objects.filter(
    year__in=available_years,<more-filters>
).values_list(
    "manufacturer_code",
    "uid",
    "year",
    "model",
    "trim_processed",
    named=True,
)
for entry in jato_vehicles.iterator():
    if entry.manufacturer_code:
        <logic>
        ymt_key = (entry.year, entry.model, entry.trim_processed)
        ...
With this change, we can no longer silently access fields that are not listed in values_list(). If anyone tries to, an AttributeError is raised instead of an extra query being made.
Note: with named=True, each row is returned as a named tuple, which makes the values easy to access by name :)
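The failure mode can be tried without a database. The sketch below uses a plain collections.namedtuple as a stand-in for one row returned by values_list(..., named=True). The Row class and the sample values are assumptions for illustration only.

```python
from collections import namedtuple

# Stand-in for one row from values_list(..., named=True); values are made up.
Row = namedtuple(
    "Row", ["manufacturer_code", "uid", "year", "model", "trim_processed"]
)
entry = Row("M01", "uid-1", 2020, "A", "X")

# Selected fields are available by name.
ymt_key = (entry.year, entry.model, entry.trim_processed)

# A field that was not selected fails loudly instead of triggering a query.
try:
    entry.trim
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
```

Here error_name ends up as "AttributeError", so a forgotten field shows up as an immediate failure during development rather than as a hidden slowdown in production.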
Cheers!