Mapping Tasiyagnunpa (Western Meadowlark) migration¶
Introduction to vector data operations
Tasiyagnunpa (or Western Meadowlark, Sturnella neglecta) migrates each year to nest on the Great Plains in the United States. Using crowd-sourced observations of these birds, we can see that migration happening throughout the year.
Read more about the Lakota connection to Tasiyagnunpa from Native Sun News Today
Set up your reproducible workflow¶
Import Python libraries¶
We will be getting data from a source called GBIF (Global Biodiversity
Information Facility). We need a package called
pygbif
to access the data, which is not included in your environment.
Install it by running the cell below:
# For GBIF (Global Biodiversity Information Facility) biology data
!pip install pygbif
Your Task: Import packages
Add imports for packages that will help you:
- Work with tabular data
- Work with geospatial vector data
- Make an interactive plot of tabular and/or vector data
# Imports adapted from the textbook unit 3 solutions:
# interactive maps, occurrence data, spatial joins, and other spatial data operations
import calendar
import os
import pathlib
import requests
import time
import zipfile
from getpass import getpass
from glob import glob
import cartopy.crs as ccrs
import geopandas as gpd
import hvplot.pandas
import pandas as pd
import panel as pn
import pygbif.occurrences as occ
import pygbif.species as species
Create a folder for your data¶
For this challenge, you will need to save some data to your computer. We
suggest saving to somewhere in your home folder
(e.g. /home/username
), rather than to your GitHub repository, since
data files can easily become too large for GitHub.
Warning
The home directory is different for every user! Your home directory probably won’t exist on someone else’s computer. Make sure to use code like
pathlib.Path.home()
to compute the home directory on the computer the code is running on. This is key to writing reproducible and interoperable code.
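As a minimal sketch of this pattern (the folder names are the ones used later in this lesson):

```python
import pathlib

# Build the path relative to the current user's home directory,
# so the same code works on any machine
data_path = pathlib.Path.home() / 'earth-analytics' / 'data'
print(data_path)
```

Because `pathlib.Path.home()` is computed at run time, the resulting path is valid for whichever user runs the notebook.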
Your Task: Create a project folder
The code below will help you get started with making a project directory.

- Replace 'your-project-directory-name-here' and 'your-gbif-data-directory-name-here' with descriptive names
- Run the cell
- (OPTIONAL) Check in the terminal that you created the directory using the command ls ~/earth-analytics/data
# Create data directory in the home folder
data_dir = os.path.join(
    # Home directory
    pathlib.Path.home(),
    # Earth analytics data directory
    'earth-analytics',
    'data',
    # Project directory
    'species_distributions_esiil',
)
os.makedirs(data_dir, exist_ok=True)  # exist_ok avoids an error on re-runs
# Define the directory name for GBIF data
gbif_dir = os.path.join(data_dir, 'tasiyagnunpa_data')
Define your study area – the ecoregions of North America¶
Track observations of Tasiyagnunpa across the different ecoregions of North America! You should be able to see changes in the number of observations in each ecoregion throughout the year.
Download and save ecoregion boundaries¶
Your Task
- Find the URL for the level III ecoregion boundaries. You can get ecoregion boundaries from the Environmental Protection Agency (EPA).
- Replace your/url/here with the URL you found, making sure to format it so it is easily readable.
- Change all the variable names to descriptive variable names.
- Run the cell to download and save the data.
# Set up the ecoregions level III boundary URL
ecoregion3_url = ("https://gaftp.epa.gov/EPADataCommons/ORD/Ecoregions/"
"cec_na/NA_CEC_Eco_Level3.zip")
# Set up a path to save the data on your machine
ecoregion3_path = os.path.join(data_dir, 'NA_CEC_Eco_Level3.zip')
# Don't download twice
if not os.path.exists(ecoregion3_path):
    # Download, and don't check the certificate for the EPA
    a_response = requests.get(ecoregion3_url, verify=False)
    # Save the binary data to a file
    with open(ecoregion3_path, 'wb') as a_file:
        a_file.write(a_response.content)
# Open up the ecoregions boundaries
ecoregions_gdf = (
gpd.read_file(ecoregion3_path)
.rename(columns={
'NA_L3NAME': 'name',
'Shape_Area': 'area'})
[['name', 'area', 'geometry']]
)
# We'll name the index so it will match the other data
ecoregions_gdf.index.name = 'ecoregion'
# Plot the ecoregions to check download
ecoregions_gdf.plot()
Create a simplified GeoDataFrame for plotting¶
Plotting larger files can be time consuming. The code below will
streamline plotting with hvplot
by simplifying the geometry,
projecting it to a Mercator projection that is compatible with
geoviews
, and cropping off areas in the Arctic.
Your task

Simplify the ecoregion boundaries for faster plotting:

- Make a copy of your ecoregions GeoDataFrame with the .copy() method, and save it to another variable name. Make sure to do everything else in this cell with your new copy!
- Simplify the ecoregions with .simplify(1000), and save the result back to the geometry column.
- Change the Coordinate Reference System (CRS) to Mercator with .to_crs(ccrs.Mercator())
- Use the plotting code in the cell to check that the plotting runs quickly and looks the way you want, making sure to change gdf to YOUR GeoDataFrame name.
# Make a copy of the ecoregions
ecoregions_plot_gdf = ecoregions_gdf.copy()
# Simplify the geometry to speed up processing
ecoregions_plot_gdf.geometry = ecoregions_plot_gdf.simplify(1000)
# Change the CRS to Mercator for mapping
ecoregions_plot_gdf = ecoregions_plot_gdf.to_crs(ccrs.Mercator())
# Check that the plot runs
ecoregions_plot_gdf.hvplot(geo=True, crs=ccrs.Mercator())
Access locations and times of Tasiyagnunpa encounters¶
For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.
Your task: Explore GBIF
Before you get started, go to the GBIF occurrences search page and explore the data.
Contribute to open data
You can get your own observations added to GBIF using iNaturalist!
Register and log in to GBIF¶
You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.
Tip
If you accidentally enter your credentials wrong, you can set reset_credentials=True instead of reset_credentials=False.
reset_credentials = False
# GBIF needs a username, password, and email
credentials = dict(
    GBIF_USER=(input, 'enter_user_name'),
    GBIF_PWD=(getpass, 'enter_password'),
    GBIF_EMAIL=(input, 'enter_email'),
)
for env_variable, (prompt_func, prompt_text) in credentials.items():
    # Delete credential from environment if requested
    if reset_credentials and (env_variable in os.environ):
        os.environ.pop(env_variable)
    # Ask for credential and save to environment
    if env_variable not in os.environ:
        os.environ[env_variable] = prompt_func(prompt_text)
# Query species
species_info = species.name_lookup('Sturnella neglecta', rank='SPECIES')
# Get the first result
first_result = species_info['results'][0]
# Get the species key (nubKey)
species_key = first_result['nubKey']
# Check the result
first_result['species'], species_key
('Sturnella neglecta', 9596413)
Download data from GBIF¶
Your task
Replace csv_file_pattern with a string that will match any .csv file when used in the glob function. HINT: the character * represents any number of any characters except the file separator (e.g. /).

Add parameters to the GBIF download function, occ.download(), to limit your query to:

- Sturnella neglecta observations
- in North America (NORTH_AMERICA)
- from 2023
- with spatial coordinates.

Then, run the download. This can take a few minutes.
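As a quick illustration of the * wildcard from the hint (using a throwaway temporary directory, so it runs anywhere):

```python
import os
import tempfile
from glob import glob

# Scratch directory with a mix of file types
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['occurrences.csv', 'notes.txt', 'more_data.csv']:
        open(os.path.join(tmp_dir, name), 'w').close()

    # '*' matches any run of characters except the path separator,
    # so this pattern matches every .csv file in the directory
    csv_names = sorted(
        os.path.basename(path)
        for path in glob(os.path.join(tmp_dir, '*.csv')))

print(csv_names)  # ['more_data.csv', 'occurrences.csv']
```

The same idea applied to the GBIF directory gives the `gbif_pattern` used in the next cell.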
# Only download once (run this cell even after downloading;
# the code later in the cell is still needed)
gbif_pattern = os.path.join(gbif_dir, '*.csv')
if not glob(gbif_pattern):
    # Submit query to GBIF
    gbif_query = occ.download([
        "continent = NORTH_AMERICA",
        "speciesKey = 9596413",
        "hasCoordinate = TRUE",
        "year = 2023",
    ])
    download_key = gbif_query[0]
    # Save the download key so the query is not resubmitted
    if 'GBIF_DOWNLOAD_KEY' not in os.environ:
        os.environ['GBIF_DOWNLOAD_KEY'] = download_key
    # Wait for the download to build
    status = occ.download_meta(download_key)['status']
    while status != 'SUCCEEDED':
        time.sleep(5)
        status = occ.download_meta(download_key)['status']
    # Download GBIF data
    download_info = occ.download_get(
        os.environ['GBIF_DOWNLOAD_KEY'],
        path=data_dir)
    # Unzip GBIF data
    with zipfile.ZipFile(download_info['path']) as download_zip:
        download_zip.extractall(path=gbif_dir)

# Find the extracted .csv file path (take the first result)
gbif_path = glob(gbif_pattern)[0]
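The status-polling loop in the cell above can also be written with a timeout, so a stalled download doesn't hang forever. A stdlib-only sketch; `wait_for_status` and `check_status` are illustrative names, not part of pygbif:

```python
import time

def wait_for_status(check_status, target='SUCCEEDED', interval=0.01, timeout=5):
    """Poll check_status() until it returns target, or raise after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = check_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f'still {status!r} after {timeout} s')
        time.sleep(interval)

# Simulate a download that becomes ready on the third status check
statuses = iter(['PREPARING', 'RUNNING', 'SUCCEEDED'])
result = wait_for_status(lambda: next(statuses))
print(result)  # SUCCEEDED
```

For a real GBIF download you would pass something like `lambda: occ.download_meta(download_key)['status']` with a longer `interval` and `timeout`.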
Load the GBIF data into Python¶
Your task
- Look at the beginning of the file you downloaded using the code below. What do you think the delimiter is?
- Run the following code cell. What happens?
- Uncomment and modify the parameters of
pd.read_csv()
below until your data loads successfully and you have only the columns you want.
You can use the following code to look at the beginning of your file:
!head $gbif_path
gbifID	datasetKey	occurrenceID	kingdom	phylum	class	order	family	genus	species	...	decimalLatitude	decimalLongitude	...	eventDate	day	month	year	taxonKey	speciesKey	basisOfRecord	...
4726720538	4fa7b334-ce0d-4e88-aaae-2e0c138d049e	URN:catalog:CLO:EBIRD:OBS1679802956	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta	...	41.515686	-103.94431	...	2023-04-08	8	4	2023	9596413	9596413	HUMAN_OBSERVATION	...
(output truncated: each record is one long line, with columns separated by tab characters)
# Load the GBIF data
gbif_df = pd.read_csv(
gbif_path,
delimiter='\t',
index_col='gbifID',
usecols=['gbifID', 'decimalLatitude', 'decimalLongitude', 'month']
)
gbif_df.head()
| gbifID | decimalLatitude | decimalLongitude | month |
|---|---|---|---|
| 4726720538 | 41.515686 | -103.94431 | 4 |
| 4727158102 | 45.879200 | -121.03192 | 4 |
| 4834728050 | 50.958218 | -113.37701 | 5 |
| 4668925094 | 49.931126 | -103.71253 | 6 |
| 4786905120 | 35.311220 | -119.34128 | 2 |
Convert the GBIF data to a GeoDataFrame¶
To plot the GBIF data, we need to convert it to a GeoDataFrame
first.
Your task
- Replace your_dataframe with the name of the DataFrame you just got from GBIF
- Replace longitude_column_name and latitude_column_name with column names from your DataFrame
- Run the code to get a GeoDataFrame of the GBIF data.
gbif_gdf = (
gpd.GeoDataFrame(
gbif_df,
geometry=gpd.points_from_xy(
gbif_df.decimalLongitude,
gbif_df.decimalLatitude),
crs="EPSG:4326")
# Select the desired columns
[[ 'month', 'geometry']]
)
gbif_gdf.geometry
gbifID
4726720538    POINT (-103.94431 41.51569)
4727158102    POINT (-121.03192 45.87920)
4834728050    POINT (-113.37701 50.95822)
4668925094    POINT (-103.71253 49.93113)
4786905120    POINT (-119.34128 35.31122)
                          ...
4659909364    POINT (-117.78588 33.65364)
4767048893    POINT (-93.20390 46.77750)
4803174022    POINT (-122.44387 37.45439)
4831633599    POINT (-108.23702 35.04276)
4808227896    POINT (-119.50544 35.00428)
Name: geometry, Length: 249046, dtype: geometry
Count the number of observations in each ecoregion, during each month of 2023¶
Identify the ecoregion for each observation¶
You can combine the ecoregions and the observations spatially using
a method called .sjoin()
, which stands for spatial join.
Further reading
Check out the geopandas documentation on spatial joins to help you figure this one out. You can also ask your favorite LLM (Large Language Model, like ChatGPT).
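To build intuition for the how= and predicate= parameters before running the real join, here is a toy spatial join with two square "ecoregions" and three observation points (a sketch assuming geopandas and shapely are available, as in this lesson's environment):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two square "ecoregions" and three observation points
regions = gpd.GeoDataFrame(
    {'name': ['west', 'east']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs='EPSG:4326')
points = gpd.GeoDataFrame(
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs='EPSG:4326')

# predicate='contains': keep region/point pairs where the region
# contains the point; how='inner': drop points matching no region
joined = regions.sjoin(points, how='inner', predicate='contains')
print(joined['name'].tolist())  # ['west', 'east']
```

The point at (5, 5) falls in neither square, so with how='inner' it disappears from the result, just as observations outside every ecoregion are dropped below.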
Your task
- Identify the correct values for the how= and predicate= parameters of the spatial join.
- Select only the columns you will need for your plot.
- Run the code.
gbif_ecoregion_gdf = (
ecoregions_gdf
# Match the CRS of the GBIF data and the ecoregions
.to_crs(gbif_gdf.crs)
# Find ecoregion for each observation
.sjoin(
gbif_gdf,
how='inner',
predicate='contains')
# Select the required columns
[['month','name']]
)
gbif_ecoregion_gdf
| ecoregion | month | name |
|---|---|---|
| 57 | 6 | Thompson-Okanogan Plateau |
| 57 | 9 | Thompson-Okanogan Plateau |
| 57 | 6 | Thompson-Okanogan Plateau |
| 57 | 4 | Thompson-Okanogan Plateau |
| 57 | 8 | Thompson-Okanogan Plateau |
| ... | ... | ... |
| 2545 | 6 | Eastern Cascades Slopes and Foothills |
| 2545 | 6 | Eastern Cascades Slopes and Foothills |
| 2545 | 5 | Eastern Cascades Slopes and Foothills |
| 2545 | 5 | Eastern Cascades Slopes and Foothills |
| 2545 | 4 | Eastern Cascades Slopes and Foothills |

248063 rows × 2 columns
Count the observations in each ecoregion each month¶
Your task:
- Replace columns_to_group_by with a list of columns. Keep in mind that you will end up with one row for each group – you want to count the observations in each ecoregion by month.
- Select only month/ecoregion combinations that have more than one occurrence recorded, since a single occurrence could be an error.
- Use the .groupby() and .mean() methods to compute the mean occurrences by ecoregion and by month.
- Run the code – it will normalize the number of occurrences by month and ecoregion.
occurrence_df = (
gbif_ecoregion_gdf
# For each ecoregion, for each month...
.groupby(['ecoregion', 'month'])
# ...count the number of occurrences
.agg(occurrences=('name', 'count'))
)
display(occurrence_df)  # Optional: display() intermediate results to help track down bugs
# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df.occurrences>1]
# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
occurrence_df
.groupby(['ecoregion'])
.mean()
)
display(mean_occurrences_by_ecoregion)  # Optional: display() intermediate results to help track down bugs
# Take the mean by month
mean_occurrences_by_month = (
occurrence_df
.groupby(['month'])
.mean()
)
display(mean_occurrences_by_month)  # Optional: display() intermediate results to help track down bugs
# Normalize the observations by the monthly mean throughout the year
occurrence_df['norm_occurrences'] = (
occurrence_df
/ mean_occurrences_by_ecoregion
/ mean_occurrences_by_month
)
occurrence_df
| ecoregion | month | occurrences |
|---|---|---|
| 57 | 2 | 1 |
| 57 | 3 | 132 |
| 57 | 4 | 397 |
| 57 | 5 | 660 |
| 57 | 6 | 481 |
| ... | ... | ... |
| 2545 | 8 | 76 |
| 2545 | 9 | 63 |
| 2545 | 10 | 78 |
| 2545 | 11 | 45 |
| 2545 | 12 | 61 |

1120 rows × 1 columns
| ecoregion | occurrences |
|---|---|
| 57 | 234.555556 |
| 59 | 7.714286 |
| 60 | 799.416667 |
| 61 | 495.083333 |
| 62 | 270.100000 |
| ... | ... |
| 2538 | 30.833333 |
| 2540 | 21.125000 |
| 2541 | 3.000000 |
| 2544 | 19.272727 |
| 2545 | 186.333333 |

154 rows × 1 columns
| month | occurrences |
|---|---|
| 1 | 222.380282 |
| 2 | 199.342466 |
| 3 | 186.316832 |
| 4 | 364.680000 |
| 5 | 569.465909 |
| 6 | 396.635294 |
| 7 | 221.263889 |
| 8 | 134.344262 |
| 9 | 129.160494 |
| 10 | 155.305263 |
| 11 | 176.628205 |
| 12 | 196.833333 |
| ecoregion | month | occurrences | norm_occurrences |
|---|---|---|---|
| 57 | 3 | 132 | 0.003020 |
| 57 | 4 | 397 | 0.004641 |
| 57 | 5 | 660 | 0.004941 |
| 57 | 6 | 481 | 0.005170 |
| 57 | 7 | 182 | 0.003507 |
| ... | ... | ... | ... |
| 2545 | 8 | 76 | 0.003036 |
| 2545 | 9 | 63 | 0.002618 |
| 2545 | 10 | 78 | 0.002695 |
| 2545 | 11 | 45 | 0.001367 |
| 2545 | 12 | 61 | 0.001663 |

983 rows × 2 columns
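The normalization step above relies on pandas index alignment: dividing a DataFrame indexed by (ecoregion, month) by DataFrames indexed by just ecoregion or just month broadcasts each mean across the matching index level. A toy sketch with made-up numbers:

```python
import pandas as pd

# Toy occurrence counts indexed by (ecoregion, month)
counts = pd.DataFrame(
    {'occurrences': [10, 30, 20, 40]},
    index=pd.MultiIndex.from_product(
        [['A', 'B'], [1, 2]], names=['ecoregion', 'month']))

mean_by_ecoregion = counts.groupby('ecoregion').mean()  # A: 20, B: 30
mean_by_month = counts.groupby('month').mean()          # month 1: 15, month 2: 35

# Each count is divided by its ecoregion mean and its month mean;
# pandas lines the divisors up by matching index level names
norm = counts / mean_by_ecoregion / mean_by_month
print(norm)
```

For example, the value for ecoregion A in month 1 is 10 / 20 / 15. Because the normalization is relative, these values are small and only meaningful compared to each other.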
Plot the Tasiyagnunpa observations by month¶
Your task
- If applicable, replace any variable names with the names you defined previously.
- Replace column_name_used_for_ecoregion_color and column_name_used_for_slider with the column names you wish to use.
- Customize your plot with your choice of title, tile source, color map, and size.
# First version: uses the default month slider
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregions_plot_gdf.join(occurrence_df)
# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds
# Plot occurrence by ecoregion and month
migration_plot = (
occurrence_gdf
.hvplot(
c='norm_occurrences',
groupby= 'month',
# Use background tiles
geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
title="Meadowlark Occurrence per Month",
xlim=(xmin, xmax), ylim=(ymin, ymax),
frame_height=600,
widget_location='bottom'
)
)
# Save the plot
migration_plot.save('migration.html', embed=True)
# Show the plot
migration_plot
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregions_plot_gdf.join(occurrence_df)
# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds
# Define the slider widget
slider = pn.widgets.DiscreteSlider(
name='month',
options={calendar.month_name[i]: i for i in range(1, 13)}
)
# Plot occurrence by ecoregion and month
migration_plot = (
occurrence_gdf
.hvplot(
c='norm_occurrences',
groupby='month',
# Use background tiles
geo=True, crs=ccrs.Mercator(), tiles='CartoLight',
title="Tasiyagnunpa migration",
xlim=(xmin, xmax), ylim=(ymin, ymax),
frame_height=600,
colorbar=False,
widgets={'month': slider},
widget_location='bottom'
)
)
# Save the plot
migration_plot.save('migration.html', embed=True)
# Show the plot
migration_plot
Want an EXTRA CHALLENGE?
Notice that the month slider displays numbers instead of the month name. Use pn.widgets.DiscreteSlider() with the options= parameter set to give the months names. You might want to try asking ChatGPT how to do this, or look at the documentation for pn.widgets.DiscreteSlider(). This is pretty tricky!
%%capture
%%bash
jupyter nbconvert *.ipynb --to html