Mammals dataset, Tidy data, & Streamlit

Oren Zeev Ben Mordehai
3 min read · Jan 16, 2021

I got hold of a nice dataset about the presence of various mammal species in Europe. I thought it would be nice to play with it, practice some data tidying with pandas, and show it in Streamlit. Each row of the data describes a specific grid cell, with bioclimatic features such as the average temperature in a specific month of the year or the precipitation of the wettest month. Each grid cell also has a binary column per mammal species: ‘found here or not’.

Loading the dataset from ARFF format to a pandas dataframe
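As a minimal sketch of such a loading step, assuming SciPy's ARFF reader is used (the original snippet may well have used a different library), the following reads a tiny inline ARFF sample into a dataframe. The attribute names and the `load_arff` helper are made up for illustration and are not the real dataset's.

```python
import io

import pandas as pd
from scipy.io import arff

# Tiny hypothetical ARFF sample; the real file has many more attributes.
ARFF_TEXT = """@relation mammals_sample
@attribute latitude numeric
@attribute longitude numeric
@attribute mean_temp_march numeric
@attribute crete_spiny_mouse {0,1}
@data
35.2,24.9,12.5,1
48.1,11.6,4.0,0
"""


def load_arff(source):
    """Read an ARFF file (path or file-like object) into a pandas dataframe."""
    data, _meta = arff.loadarff(source)
    df = pd.DataFrame(data)
    # Nominal attributes come back as bytes (e.g. b'1'); decode them to str.
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].str.decode("utf-8")
    return df


df = load_arff(io.StringIO(ARFF_TEXT))
```

With a real file, passing its path to `load_arff` instead of the `StringIO` object should work the same way.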

Following ideas from this great PyData talk on YouTube, Daniel Chen: Cleaning and Tidying Data in Pandas | PyData DC 2018, I extracted three tables out of the original table and made sure they can be reconnected by introducing a new ‘cell_id’ column, which is included in each of them.

Splitting into ‘df_grid’, ‘df_monthly’, and ‘df_mammals’.
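The split can be sketched roughly as follows. This is an illustration on a toy table, not the original code: every column name except ‘cell_id’ is a hypothetical stand-in, and the way the columns are grouped is an assumption.

```python
import pandas as pd

# Toy stand-in for the original wide table; column names are hypothetical.
df = pd.DataFrame({
    "latitude": [35.2, 48.1],
    "longitude": [24.9, 11.6],
    "mean_temp_march": [12.5, 4.0],
    "mean_temp_april": [15.0, 8.5],
    "crete_spiny_mouse": [1, 0],
    "red_fox": [1, 1],
})

# One identifier per grid cell, so the pieces can be joined back later.
df = df.reset_index(drop=True)
df.insert(0, "cell_id", range(len(df)))

grid_cols = ["latitude", "longitude"]
monthly_cols = [c for c in df.columns if c.startswith("mean_temp_")]
mammal_cols = ["crete_spiny_mouse", "red_fox"]

# Each table carries cell_id plus its own group of columns.
df_grid = df[["cell_id"] + grid_cols]
df_monthly = df[["cell_id"] + monthly_cols]
df_mammals = df[["cell_id"] + mammal_cols]

# Reconnecting through cell_id brings all the columns back together.
rejoined = df_grid.merge(df_monthly, on="cell_id").merge(df_mammals, on="cell_id")
```

Because every table keeps ‘cell_id’, any pair of them can be merged on demand instead of carrying the full wide table around.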

Why did I need three different dataframes? Think of it as database normalization. The information about the grid can be kept in one table. Then we have some information per month, e.g. mean_temp_march_utm. And finally, there is the relation recording the presence of a specific mammal species in a specific grid cell. I did not have a table with a row per mammal species, but one can be added, holding for example the common name, Latin name, a link to the Wikipedia article, etc.

For example, for the per month information, I had the following code:

You can see above that I took ‘cell_id’ and the relevant columns. Then I used ‘melt’, keeping only ‘cell_id’ out of the melting, so that we now have the columns ‘variable’ and ‘value’. The values in ‘variable’ are then split into two columns, ‘statistics’ and ‘month’, and finally I use ‘pivot’ to put each statistic in its own column.
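A minimal sketch of that melt, split, and pivot sequence on toy data is shown here. The wide column names are hypothetical and assumed to follow a `<statistic>_<month>` pattern; the real dataset's names differ (they carry extra suffixes, for instance), so the splitting rule would need adjusting.

```python
import pandas as pd

# Toy wide table: one column per (statistic, month); names are hypothetical.
df_wide = pd.DataFrame({
    "cell_id": [0, 1],
    "mean_temp_march": [12.5, 4.0],
    "mean_temp_april": [15.0, 8.5],
    "precip_march": [40.0, 55.0],
    "precip_april": [30.0, 60.0],
})

# Melt everything except cell_id into 'variable' / 'value' pairs.
long = df_wide.melt(id_vars="cell_id")

# Split 'variable' on its last underscore into the statistic and the month.
long[["statistics", "month"]] = long["variable"].str.rsplit("_", n=1, expand=True)

# Pivot so that each statistic gets its own column, one row per (cell, month).
df_monthly = (
    long.pivot(index=["cell_id", "month"], columns="statistics", values="value")
    .reset_index()
)
df_monthly.columns.name = None  # drop the leftover 'statistics' axis label
```

The result has one row per cell and month, with `mean_temp` and `precip` as ordinary columns, which is the tidy shape the talk recommends.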

Consider the following aggregation:

Aggregation: how many different mammal species?
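One way such a count could be computed is sketched here. It assumes the presence table is in long form, with one row per (cell, species) pair where the species was found; the species names and the `n_species` column are made up for the example.

```python
import pandas as pd

# Hypothetical long-form presence table: one row per species found in a cell.
df_mammals = pd.DataFrame({
    "cell_id": [0, 0, 1, 2, 2, 2],
    "species": [
        "red_fox", "crete_spiny_mouse",
        "red_fox",
        "red_fox", "brown_bear", "wolf",
    ],
})

# Number of distinct species recorded in each grid cell.
species_per_cell = (
    df_mammals.groupby("cell_id")["species"]
    .nunique()
    .rename("n_species")
    .reset_index()
)

# Number of distinct species across the whole dataset.
total_species = df_mammals["species"].nunique()
```

Merging `species_per_cell` with the grid table on ‘cell_id’ would attach coordinates to each count, which is what a map of species richness needs.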

We can draw a (distorted) map, pretending that (longitude, latitude) are Cartesian coordinates:

Distorted map based on (longitude, latitude) as (x, y).

I then moved from the Jupyter notebook to a script (Streamlit), and made use of pydeck to display a proper map:

And while in Streamlit, I added a selector for the mammal species, which fetches the species' Wikipedia page (displayed in an iframe), followed by a map showing the grid cells where this species is found.

Crete spiny mouse is found in..

Overall it was a very nice experience, and pandas and Streamlit do the job.

You can find a link to the GitHub repository below, and also a link to a two-part YouTube walkthrough of the above.

The credit for the data goes to:

Mitchell-Jones, A.J., Amori, G., Bogdanowicz, W., Krystufek, B., Reijnders, P.J.H., Spitzenberger, F., Stubbe, M., Thissen, J.B.M., Vohralik, V. & Zima, J. (1999) The atlas of European mammals. Academic Press, London.

The GitHub link is: zbenmo/mammals: mammals dataset with streamlit (github.com)

And the YouTube videos are in: https://youtube.com/playlist?list=PL9WgRVRjJLlYXGnHCso7SQAegjbQ8yio6

Oren Zeev Ben Mordehai

MSc in computer science, an MBA, Practicing Data Science, Data Mining, Machine Learning, and loving it.