Using the openair R package in a Python notebook
‘openair’ is a fantastic R package for air quality analysis, with many extremely useful visualisations ready to use, often in a single line of code.
I first learned openair at the start of my PhD (literally a week or so in), before I had any particular affinity for a programming language, so I dived head first into R and RStudio.
I happily used R, RStudio and openair for about a year, but I couldn’t help noticing the plethora of easy-to-use, well-documented, well-troubleshooted (troubleshot?) Python packages that seemed to lend themselves to the kind of work I was doing. Matplotlib just seemed so much easier than ggplot, so I decided to give Python a go.
Using SciPy, pandas and Matplotlib felt a lot more intuitive than the equivalents in R, so for a while I used both Python and R. Eventually, though, the simplicity of Python won: I slowly stopped using R, and with it openair.
The problem is that openair is really, really good, so if I wanted a quick wind rose or polar plot, I’d have to go back to R. Or would I? True to form, there is of course a Python package, rpy2, that lets you run R code from a Python kernel.
I am sure there is far more to rpy2 than I could ever get my head around, but all I care about is that I now have a way of using openair from within Python.
After installing openair and rpy2, you must set some environment variables so that rpy2 knows where to find the R executable and library. After that, loading openair is simple.
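A minimal sketch of that setup, assuming R_HOME (the R installation) and R_LIBS_USER (the user package library) are the variables your setup needs; the helper names are illustrative. The rpy2 import is deferred into a function because rpy2 locates R when it is first imported, so the variables have to be set beforehand.

```python
import os


def configure_r(r_home, r_libs_user=None):
    """Point rpy2 at an R installation. Call this before importing rpy2."""
    # R_HOME tells rpy2 which R installation to embed.
    os.environ["R_HOME"] = r_home
    # R_LIBS_USER tells R where user-installed packages (such as openair) live.
    if r_libs_user is not None:
        os.environ["R_LIBS_USER"] = r_libs_user


def load_openair():
    """Import the openair R package through rpy2 and return it."""
    # Imported here rather than at module level so configure_r() can run first.
    from rpy2.robjects.packages import importr

    return importr("openair")
```

In a notebook you would call configure_r() with your own (hypothetical here) paths, for example configure_r(r"C:\Program Files\R\R-4.3.1"), and then openair = load_openair().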
The openair package is now accessible as a Python object, so its plotting functions can be called directly. However, the dataframe passed to a plotting function must be an R dataframe, not a pandas dataframe, so it has to be converted first.
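A sketch of that conversion, assuming rpy2 3.x's pandas2ri converter and a pandas dataframe with a DatetimeIndex. openair expects timestamps in a column called date, so the index is moved into a column of that name first; both helper names are illustrative.

```python
import pandas as pd


def prepare_for_openair(df):
    """Return a copy of df with its DatetimeIndex moved into a 'date' column."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("expected a DataFrame with a DatetimeIndex")
    # rename_axis returns a new frame, so the caller's index name is untouched.
    return df.rename_axis("date").reset_index()


def to_r_dataframe(df):
    """Convert a time-indexed pandas DataFrame into an R data.frame."""
    # rpy2 is imported lazily so that R_HOME can be configured beforehand.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    converter = ro.default_converter + pandas2ri.converter
    # The pandas rules are only active inside this block, so the conversion
    # of individual columns also goes through pandas2ri.
    with localconverter(converter):
        return converter.py2rpy(prepare_for_openair(df))
```

Scoping the converter with localconverter, rather than switching pandas conversion on globally, keeps the rest of the rpy2 session behaving as usual.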
Now we can run the plotting function.
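For illustration, a thin hypothetical wrapper that looks up an openair function by name and calls it on an R dataframe. It relies on two rpy2 behaviours: Python keyword arguments become R named arguments, and packages loaded with importr expose dotted R argument names with underscores, so openair's key.position is written key_position.

```python
def run_openair(openair, function_name, r_df, **kwargs):
    """Call an openair plotting function such as 'polarPlot' on an R data.frame.

    Keyword arguments are forwarded as R named arguments; with importr,
    dotted R names like key.position are spelled key_position.
    """
    plot_function = getattr(openair, function_name)
    return plot_function(r_df, **kwargs)
```

For example, run_openair(openair, "polarPlot", r_df, pollutant="nox", key_position="bottom") would draw a polar plot, assuming your data has openair's default ws and wd columns plus a nox column.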
This opens a new window displaying the graphic. That’s great: we now have access to the plot. However, I would prefer to put the plot inline in a notebook so that I can integrate it into my analyses.
After much stackoverflowing, I found a solution using the grdevices module in rpy2.robjects.lib. I wrapped it in a function that takes the openair plotting function as a callback and, hey presto, inline openair plotting.
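A minimal sketch of that approach, assuming rpy2's grdevices.render_to_bytesio context manager and IPython's display tools; the name plot_inline and its default image size are illustrative.

```python
def plot_inline(plot_func, *args, width=800, height=600, res=120, **kwargs):
    """Render an R plotting function to PNG and show it inline in a notebook.

    *args and **kwargs go to plot_func; width and height (pixels) and res
    (dots per inch) configure R's png device.
    """
    # Deferred imports: rpy2 must only be imported once R_HOME is set, and
    # IPython is only available inside a notebook or IPython session.
    from IPython.display import Image, display
    from rpy2.robjects.lib import grdevices

    # Draw into an in-memory PNG instead of opening a separate window.
    with grdevices.render_to_bytesio(grdevices.png, width=width,
                                     height=height, res=res) as img:
        plot_func(*args, **kwargs)
    display(Image(data=img.getvalue(), format="png", embed=True))
```

With openair as the imported package and r_df as a converted R dataframe, plot_inline(openair.polarPlot, r_df, pollutant="nox") would then draw the plot under the notebook cell. Making the device settings keyword-only keeps them from colliding with positional arguments meant for the plotting function.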
The *args and **kwargs are passed straight to the callback.
Somewhat unnecessarily, I wrapped much of this functionality into a module that I can import, which makes it cleaner to use in a notebook. It handles loading openair, converting a pandas dataframe to an R dataframe (assuming the dataframe has a time-series index) and displaying the plot inline.
The only problem I have encountered so far is that the error messages can be rather opaque. The most common error I encounter indicates that an incorrect argument has been passed to the R function.
Typically the culprit is the pollutant keyword, especially when the column name contains any sort of punctuation. Punctuation characters are rendered as ‘.’ in an R dataframe, so the pollutant has to be referred to by its converted name. For example, a column heading of “SO4(AMS)/ugm3” would be converted to “SO4.AMS..ugm3”.
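The renaming in that example matches the rules of R's make.names(), so a small Python approximation of those rules can predict the R-side name before openair is called. This is a sketch: both helper names are hypothetical, and R's handling of reserved words and non-ASCII letters is not reproduced.

```python
import re


def r_column_name(name):
    """Approximate R's make.names() for a single column name.

    Characters other than ASCII letters, digits, '.' and '_' become '.', and
    'X' is prepended unless the name starts with a letter or with a '.' that
    is not followed by a digit.
    """
    if not re.match(r"[A-Za-z]|\.(?![0-9])", name):
        name = "X" + name
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


def openair_pollutant(columns, pollutant):
    """Check that pollutant is a column and return the name R will use."""
    if pollutant not in columns:
        raise KeyError(f"{pollutant!r} is not one of the dataframe columns")
    return r_column_name(pollutant)
```

Passing pollutant=openair_pollutant(df.columns, "SO4(AMS)/ugm3") then hands openair "SO4.AMS..ugm3", and a misspelt column fails with a readable Python KeyError instead of an R error.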
Below is an example using openair’s pollutionRose function, which shows how often different concentrations of a particular pollutant occur for each wind direction.
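A minimal self-contained sketch of such a plot, assuming openair's bundled mydata dataset (hourly data from a London monitoring site, with date, ws, wd and nox columns) rather than your own data; the function name and image size are illustrative.

```python
def show_pollution_rose(pollutant="nox", width=800, height=800, res=120):
    """Draw openair's pollutionRose for its bundled mydata dataset inline."""
    # Deferred imports: rpy2 needs R_HOME set first, and IPython is only
    # present inside a notebook or IPython session.
    from IPython.display import Image, display
    from rpy2.robjects.lib import grdevices
    from rpy2.robjects.packages import data, importr

    openair = importr("openair")
    # mydata ships with openair and is already an R data.frame.
    mydata = data(openair).fetch("mydata")["mydata"]

    # Render to an in-memory PNG rather than a separate graphics window.
    with grdevices.render_to_bytesio(grdevices.png, width=width,
                                     height=height, res=res) as img:
        openair.pollutionRose(mydata, pollutant=pollutant)
    display(Image(data=img.getvalue(), format="png", embed=True))
```

Calling show_pollution_rose() in a notebook cell would draw the NOx pollution rose; for your own data, convert the pandas dataframe first and pass it in place of mydata.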