In this blog I demonstrate how you can implement and take advantage of caching in your Python scripts/applications.
Caching allows you to complete tasks more rapidly by storing and reusing results for repeated operations using the same criteria. For example, consider a function that takes several arguments and performs a complex calculation. What if you passed the same arguments to this function ten times; well, without caching, the same operation and complex calculation will be performed ten times.
Fortunately, if a caching scheme is implemented correctly, the application will perform the complex calculation only once and save the arguments and calculated results; if the same exact arguments are passed to the function again, the application will know that the calculation has been done in the past and can just return the prior results instead of performing a recalculation. As you can imagine, application performance can greatly be improved by caching; this is the reason most all high-traffic commercial web applications implement some caching scheme.
For this blog, I created a Python application that calculates the molecular/formula mass of some simple compounds. The molecular/formula mass of a compound is the sum of atomic masses of all its component elements. For example, the molecule H2O or water is equal to a molecular mass of 18.016 amu [(2 x atomic mass of H) + (1 x atomic mass of O)].
Below is the Python function I created that takes a molecular/formula string as an argument, parses the respective information, and calculates the molecular/formula mass. The element data is pulled from an external text file which holds the symbol and atomic mass for the respective element.
In the above function, the second to last line saves the results to a dictionary and saves the respective passed in arguments as the keys. Now, if you look at the first four lines of the function, you can see that upon entering the function the same dictionary is checked to see if the arguments passed in were already passed in prior – if so, the saved results are returned, skipping the entire recalculation and saving time.
To prove that the caching mechanism is really improving performance, I wrote a function called ‘calculateTime’ which takes the molecular formula as a string and times how long it takes to calculate the molecular/formula mass. Below is the function code.
In the test code below, you can see I call the ‘calculateTime’ function four times. The first three times, the molecular formula passed in as an argument is different in each case. However, in the fourth call to the ‘calculateTime’ function, you can see I used the same argument as the third call to the function. I would expect the fourth call to return much faster than the other three as the same arguments were used for the call prior and the results should have been cached.
The ‘calculateTime’ function prints out the time in seconds it took to return a result. You can see in the fourth call there is a huge jump in calculation time for the ‘calculateMolecularMass’ function. You’ve just witnessed caching at work. This is a relatively simple program; just imagine how much time can be saved in a huge commercial system.
You can download the full source code for this application from the download section of this website.