Mammals dataset, Tidy data, & Streamlit
I got hold of a nice dataset about the presence of various mammal species in Europe. I thought it would be fun to play with it, practice some data tidying with pandas, and show the result in Streamlit. Each row of the data describes a specific grid cell, with bioclimatic features such as the average temperature in a given month of the year or the precipitation of the wettest month. Each grid cell also has one binary column per mammal species: ‘found here or not’.
Following ideas from this great PyData talk on YouTube, Daniel Chen: Cleaning and Tidying Data in Pandas | PyData DC 2018, I extracted three tables from the original table. To be able to reconnect them, I introduced a new ‘cell_id’ column, which is included in each of the three.
Why did I need three different dataframes? Think of it as database normalization. The information about the grid can be kept in one table. Then there is information per month, e.g. mean_temp_march_utm. And finally there is the relation recording the presence of a specific mammal species in a specific grid cell. I did not create a table with a row per mammal species, but one could be added, holding, for example, the common name, Latin name, and a link to the Wikipedia article.
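To make the normalization idea concrete, here is a minimal sketch of such a split. It is not the code from my notebook: the tiny `raw` table, the species column names, and the choice to keep only the "present" rows in the relation table are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature of the original wide table: one row per grid cell.
raw = pd.DataFrame({
    "longitude": [10.0, 10.5],
    "latitude": [50.0, 50.0],
    "mean_temp_march_utm": [4.5, 6.1],
    "mean_temp_july_utm": [17.2, 19.8],
    "Sciurus_vulgaris": [1, 0],  # binary 'found here or not' columns
    "Lutra_lutra": [1, 1],
})

# Introduce the key that lets us reconnect the tables later.
raw.insert(0, "cell_id", range(len(raw)))

grid_columns = ["longitude", "latitude"]
monthly_columns = [c for c in raw.columns if c.endswith("_utm")]
species_columns = [
    c for c in raw.columns
    if c not in ["cell_id", *grid_columns, *monthly_columns]
]

# Table 1: information about the grid itself.
cells = raw[["cell_id", *grid_columns]]

# Table 2: per-month information (still wide at this point).
monthly_wide = raw[["cell_id", *monthly_columns]]

# Table 3: the (cell, species) relation; only rows where the species is present.
presence = (
    raw[["cell_id", *species_columns]]
    .melt(id_vars="cell_id", var_name="species", value_name="present")
    .query("present == 1")
    .drop(columns="present")
    .reset_index(drop=True)
)

# Reconnecting through 'cell_id': where is the otter found?
otter_cells = presence[presence["species"] == "Lutra_lutra"].merge(cells, on="cell_id")
print(otter_cells)
```

Every table carries ‘cell_id’, so any pair of them can be joined back together with a plain `merge`.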
For example, for the per month information, I had the following code:
As you can see above, I took ‘cell_id’ and the relevant columns. I then used ‘melt’, keeping only ‘cell_id’ as the identifier, which yields the columns ‘variable’ and ‘value’. The values in ‘variable’ are then split into two columns, ‘statistics’ and ‘month’, and finally I use ‘pivot’ to give each statistic its own column.
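The melt, split, and pivot steps can be sketched roughly as follows. This is a reconstruction under assumptions rather than the original snippet: I assume column names of the form `<statistic>_<month>_utm` (as in mean_temp_march_utm), and the `mean_prec_*` columns and all values are invented.

```python
import pandas as pd

# Hypothetical wide table: one column per (statistic, month) pair.
wide = pd.DataFrame({
    "cell_id": [0, 1],
    "mean_temp_march_utm": [4.5, 6.1],
    "mean_temp_july_utm": [17.2, 19.8],
    "mean_prec_march_utm": [55.0, 40.0],
    "mean_prec_july_utm": [70.0, 35.0],
})

monthly_columns = [c for c in wide.columns if c.endswith("_utm")]

# Step 1: melt everything except 'cell_id' into 'variable' / 'value'.
melted = wide[["cell_id", *monthly_columns]].melt(id_vars="cell_id")

# Step 2: split 'variable' into 'statistics' and 'month'.
# Drop the assumed '_utm' suffix, then split on the last underscore.
parts = (
    melted["variable"]
    .str.replace(r"_utm$", "", regex=True)
    .str.rsplit("_", n=1, expand=True)
)
melted["statistics"] = parts[0]
melted["month"] = parts[1]

# Step 3: pivot so that each statistic gets its own column.
monthly = melted.pivot(
    index=["cell_id", "month"], columns="statistics", values="value"
).reset_index()
monthly.columns.name = None  # remove the leftover 'statistics' axis label

print(monthly)
```

The result has one row per (cell, month) and one column per statistic, which is the tidy shape the talk argues for.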
Consider the following aggregation:
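The aggregation itself is not reproduced here, so the following is only one aggregation that this table layout makes easy, assumed for illustration: counting how many species are recorded in each grid cell. The `presence` and `cells` frames and the `n_species` column name are hypothetical.

```python
import pandas as pd

# Hypothetical tidy tables, linked through 'cell_id'.
cells = pd.DataFrame({
    "cell_id": [0, 1, 2],
    "longitude": [10.0, 10.5, 11.0],
    "latitude": [50.0, 50.0, 50.5],
})
presence = pd.DataFrame({
    "cell_id": [0, 0, 1],
    "species": ["Sciurus_vulgaris", "Lutra_lutra", "Lutra_lutra"],
})

# Count the species rows per cell.
species_per_cell = (
    presence.groupby("cell_id").size().rename("n_species").reset_index()
)

# Left-join onto the grid so cells without any recorded species get a count of 0.
richness = (
    cells.merge(species_per_cell, on="cell_id", how="left")
    .fillna({"n_species": 0})
    .astype({"n_species": int})
)
print(richness)
```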
We can draw a (distorted) map by pretending that (longitude, latitude) are Cartesian coordinates:
I then moved from the Jupyter notebook to a script (Streamlit), and made use of pydeck to display a proper map:
Still in Streamlit, I added a selector for the mammal species, which fetches its Wikipedia page (displayed in an iframe) and shows a map of the grid cells where this species is found.
Overall it was a very nice experience; pandas and Streamlit do the job.
You can find a link to the GitHub repository below, as well as a link to a two-part YouTube walkthrough of the above.
The credit for the data goes to:
Mitchell-Jones, A.J., Amori, G., Bogdanowicz, W., Krystufek, B., Reijnders, P.J.H., Spitzenberger, F., Stubbe, M., Thissen, J.B.M., Vohralik, V. & Zima, J. (1999) The atlas of European mammals. Academic Press, London.
The GitHub link is: zbenmo/mammals: mammals dataset with streamlit (github.com)
And the YouTube videos are in: https://youtube.com/playlist?list=PL9WgRVRjJLlYXGnHCso7SQAegjbQ8yio6