Climate change is impacting the way people live around the world¶

::: {.cell .markdown}

Higher highs, lower lows, storms, and smoke – we’re all feeling the effects of climate change. In this workflow, you will take a look at trends in temperature over time in Rapid City, SD.

Important

Read about some of the effects of – and solutions for – climate change on the Pine Ridge Reservation southeast of Rapid City.

Get started with open reproducible science!¶

Open reproducible science makes scientific methods, data, and outcomes available to everyone. That means that everyone who wants to should be able to find, read, understand, and run your workflows for themselves.


Image from https://www.earthdata.nasa.gov/esds/open-science/oss-for-eso-workshops

Few if any science projects are 100% open and reproducible (yet!). However, members of the open science community have developed open source tools and practices that can help you move toward that goal. You will learn about many of those tools in the Intro to Earth Data Science textbook. Don’t worry about learning all the tools at once – we’ve picked a few for you to get started with.


Further reading

Read our textbook chapter about open reproducible science.

What does open reproducible science mean to you?

Create a new Markdown cell below this one using the + Markdown button in the upper left.

In the new cell, answer the following questions using a numbered list in Markdown:

  1. In 1-2 sentences, define open reproducible science.
  2. In 1-2 sentences, choose one of the open source tools that you have learned about (i.e. Shell, Git/GitHub, Jupyter Notebook, Python) and explain how it supports open reproducible science.

Human-readable and Machine-readable¶

Create a new Markdown cell below this one using the ESC + b keyboard shortcut.

In the new cell, answer the following question in a Markdown quote: In 1-2 sentences, does this Jupyter Notebook file have a machine-readable name? Explain your answer.

Negatory. Spaces, uppercase letters, an exclamation mark! The machine would be so confused!


What the fork?! Who wrote this?¶

Below is a scientific Python workflow. But something’s wrong – the code won’t run! Your task is to follow the instructions to clean and debug the Python code below so that it runs.

Tip

Don’t worry if you can’t solve every bug right away. We’ll get there! The most important thing is to identify problems with the code and write high-quality GitHub Issues.

At the end, you’ll repeat the workflow for a location and measurement of your choosing.

Alright! Let’s clean up this code. First things first…

Machine-readable file names

Rename this notebook (if necessary) with an expressive and machine-readable file name
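For example, a hypothetical name like climate-change-rapid-city.ipynb is machine-readable: all lowercase, no spaces or special characters, and still descriptive of what the notebook contains.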


Python packages let you use code written by experts around the world¶

Because Python is open source, lots of different people and organizations can contribute (including you!). Many contributions are in the form of packages which do not come with a standard Python download.

Read more

Packages need to be installed and imported.

In the cell below, someone was trying to import the pandas package, which helps us to work with tabular data such as comma-separated values (csv) files.

Your task

  1. Correct the typo below to properly import the pandas package under its alias pd.
  2. Run the cell to import pandas

NOTE: Run your code in the right environment to avoid import errors

We’ve created a coding environment for you to use that already has all the software and libraries you will need! When you try to run some code, you may be prompted to select a kernel. The kernel refers to the version of Python you are using. You should use the base kernel, which should be the default option.

In [2]:
# Import pandas
import pandas as pd

Once you have run the cell above and imported pandas, run the cell below. It is a test cell that will tell you if you completed the task successfully. If a test cell isn’t working the way you expect, check that you ran your code immediately before running the test.

In [3]:
# DO NOT MODIFY THIS TEST CELL
points = 0
try:
    pd.DataFrame()
    points += 5
    print('\u2705 Great work! You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
print('You earned {} of 5 points for importing pandas'.format(points))
✅ Great work! You correctly imported the pandas library.
You earned 5 of 5 points for importing pandas

There are more Earth Observation data online than any one person could ever look at¶

NASA’s Earth Observing System Data and Information System (EOSDIS) alone manages over 9PB of data. 1 PB is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It’s all available to you once you learn how to download what you want.

Here we’re using the NOAA National Centers for Environmental Information (NCEI) Access Data Service application programming interface (API) to request data from their web servers. We will be using data collected as part of the Global Historical Climatology Network daily (GHCNd), available through NOAA’s Climate Data Online program.

For this example we’re requesting daily summary data in Rapid City, SD (station ID USC00396947).

Your task:

  1. Research the Global Historical Climatology Network - Daily data source.
  2. In the cell below, write a 2-3 sentence description of the data source. You should describe:
    • who takes the data
    • where the data were taken
    • what the maximum temperature units are
    • how the data are collected
  3. Include a citation of the data (HINT: See the ‘Data Citation’ tab on the GHCNd overview page).

YOUR DATA DESCRIPTION AND CITATION HERE

The Global Historical Climatology Network - Daily (GHCN-Daily) dataset combines daily observations from about 30 different sources, including more than 90,000 weather stations. Roughly 60,000 of those stations mostly collect precipitation data, while the others collect various meteorological measurements, including daily maximum and minimum temperature, temperature at the time of observation, snowfall, and snow depth. The data are regularly synced and maintained.¶

This is the data being accessed by the ncei_weather_url¶

Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ [access date].¶

You can access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️¶

The cell below contains the URL for the data you will use in this part of the notebook. We created this URL by generating what is called an API endpoint using the NCEI API documentation.

Note

An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software (Wikipedia).

However, we still have a problem - we can’t get the URL back later on because it isn’t saved in a variable. In other words, we need to give the url a name so that we can request it from Python later (sadly, Python has no ‘hey what was that thingy I typed yesterday?’ function).

Read more

Check out the textbook section on variables

Your task

  1. Pick an expressive variable name for the URL. HINT: click on the Variables button up top to see all your variables. Your new url variable will not be there until you define it and run the code
  2. Reformat the URL so that it adheres to the 79-character PEP-8 line limit. You should see two vertical lines in each cell - don’t let your code go past the second line
  3. At the end of the cell where you define your url variable, call your variable (type out its name) so it can be tested.
In [4]:
ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00396947'
    '&startDate=1949-10-01'
    '&endDate=2024-02-18'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
ncei_weather_url
Out[4]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00396947&startDate=1949-10-01&endDate=2024-02-18&includeStationName=true&includeStationLocation=1&units=standard'
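If you later adapt this workflow for another station (as you will toward the end of this notebook), it can help to build the endpoint from parameters rather than editing the string by hand. Here is a minimal sketch, using hypothetical parameter variables and the same station and dates as above, that produces an equivalent URL with f-strings:

# Sketch: assemble the NCEI daily-summaries endpoint from parameters
station_id = 'USC00396947'   # hypothetical: swap in any GHCNd station ID
start_date = '1949-10-01'
end_date = '2024-02-18'

ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    f'&stations={station_id}'
    f'&startDate={start_date}'
    f'&endDate={end_date}'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')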
In [5]:
# DO NOT MODIFY THIS TEST CELL
resp_url = _
points = 0

if type(resp_url)==str:
    points += 3
    print('\u2705 Great work! You correctly called your url variable.')
else:
    print('\u274C Oops - your url variable was not called correctly.')

if len(resp_url)==218:
    points += 3
    print('\u2705 Great work! Your url is the correct length.')
else:
    print('\u274C Oops - your url variable is not the correct length.')

print('You earned {} of 6 points for defining a url variable'.format(points))
✅ Great work! You correctly called your url variable.
✅ Great work! Your url is the correct length.
You earned 6 of 6 points for defining a url variable

Download and get started working with NCEI data¶

The pandas library you imported can download data from the internet directly into a type of Python object called a DataFrame. In the code cell below, you can see an attempt to do just this. But there are some problems…

You’re ready to fix some code!

Your task is to:

  1. Leave a space between the # and text in the comment and try making the comment more informative

  2. Make any changes needed to get this code to run. HINT: The my_url variable doesn’t exist - you need to replace it with the variable name you chose.

  3. Modify the .read_csv() statement to include the following parameters:

    • index_col='DATE' – this sets the DATE column as the index. Needed for subsetting and resampling later on
    • parse_dates=True – this lets Python know that you are working with time-series data, and that the values in the index column are datetime objects
    • na_values=['NaN'] – this lets Python know how to handle missing values
  4. Clean up the code by using expressive variable names, expressive column names, PEP-8 compliant code, and descriptive comments

Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.

In [6]:
rapid_df = pd.read_csv(
    ncei_weather_url, 
    index_col="DATE", 
    parse_dates=True, 
    na_values=["NaN"]
)
rapid_df
Out[6]:
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TOBS
DATE
1949-10-01 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 51.0
1949-10-02 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 51.0
1949-10-03 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 52.0
1949-10-04 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 45.0
1949-10-05 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 50.0
... ... ... ... ... ... ... ...
2024-02-14 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.15 24.0
2024-02-15 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.03 21.0
2024-02-16 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.20 8.0
2024-02-17 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 NaN
2024-02-18 USC00396947 RAPID CITY 4 NW, SD US 44.12055 -103.28417 1060.4 0.00 NaN

26042 rows × 7 columns

In [7]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

print('You earned {} of 2 points for downloading data'.format(points))
✅ Great work! You called a DataFrame.
You earned 1 of 2 points for downloading data

HINT: Check out the type() function below - you can use it to check that your data is now in a DataFrame type object

In [8]:
# Check that the data was imported into a pandas DataFrame
type(rapid_df)
Out[8]:
pandas.core.frame.DataFrame
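Beyond type(), a couple of quick checks can confirm that the data loaded the way you expect. This is an optional sketch using standard pandas DataFrame methods:

# Summarize the index, column data types, and non-null counts
rapid_df.info()

# Compute basic summary statistics for the numeric columns
rapid_df.describe()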
In [ ]:
 

Clean up your DataFrame

Use double brackets to only select the columns you want in your DataFrame

Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.

In [ ]:
 
In [9]:
rapid_df = rapid_df[['TOBS', 'PRCP']]
rapid_df
Out[9]:
TOBS PRCP
DATE
1949-10-01 51.0 0.00
1949-10-02 51.0 0.00
1949-10-03 52.0 0.00
1949-10-04 45.0 0.00
1949-10-05 50.0 0.00
... ... ...
2024-02-14 24.0 0.15
2024-02-15 21.0 0.03
2024-02-16 8.0 0.20
2024-02-17 NaN 0.00
2024-02-18 NaN 0.00

26042 rows × 2 columns

In [10]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53]:
    points += 4
    print('\u2705 Great work! You correctly downloaded data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for downloading data'.format(points))
❌ Oops - your data are not correct.
You earned 0 of 5 points for downloading data

Plot the precipitation column (PRCP) vs time to explore the data¶

Plotting in Python is easy, but not quite this easy:

In [11]:
rapid_df.plot()
Out[11]:
<Axes: xlabel='DATE'>
[Plot: TOBS and PRCP plotted together against the DATE index]
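Calling .plot() with no arguments plots every numeric column at once. To focus on precipitation, you can pass the column name to the y parameter (a sketch; titles and axis labels still need to be added as described below):

# Plot only the precipitation column against the DATE index
rapid_df.plot(y='PRCP')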

Label and describe your plots

Source: https://xkcd.com/833

Make sure each plot has:

  • A title that explains where and when the data are from
  • x- and y- axis labels with units where appropriate
  • A legend where appropriate

You’ll always need to add some instructions on labels and how you want your plot to look.

Your task:

  1. Change dataframe to your DataFrame name.
  2. Change y= to the name of your observed temperature column name.
  3. Use the title, ylabel, and xlabel parameters to add key text to your plot.
  4. Adjust the size of your figure using figsize=(x,y) where x is figure width and y is figure height

HINT: labels have to be a type in Python called a string. You can make a string by putting quotes around your label, just like the column names in the sample code (e.g. y='TOBS').

In [12]:
# Convert the observed temperature from Fahrenheit to Celsius
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
rapid_df
/tmp/ipykernel_2524/1770224429.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
Out[12]:
TOBS PRCP TCel
DATE
1949-10-01 51.0 0.00 10.555556
1949-10-02 51.0 0.00 10.555556
1949-10-03 52.0 0.00 11.111111
1949-10-04 45.0 0.00 7.222222
1949-10-05 50.0 0.00 10.000000
... ... ... ...
2024-02-14 24.0 0.15 -4.444444
2024-02-15 21.0 0.03 -6.111111
2024-02-16 8.0 0.20 -13.333333
2024-02-17 NaN 0.00 NaN
2024-02-18 NaN 0.00 NaN

26042 rows × 3 columns

In [13]:
# Plot the data using .plot

rapid_df.plot(
    y='TOBS',
    title='Observed Temperature Over Time, Rapid City, 1949-2024',
    xlabel='Date',
    legend= False,
    ylabel='Temperature (F)')
Out[13]:
<Axes: title={'center': 'Observed Temperature Over Time, Rapid City, 1949-2024'}, xlabel='Date', ylabel='Temperature (F)'>
[Plot: daily observed temperature (TOBS) over time in Rapid City]

Want an EXTRA CHALLENGE?

There are many other things you can do to customize your plot. Take a look at the pandas plotting galleries and the documentation of plot to see if there’s other changes you want to make to your plot. Some possibilities include:

  • Remove the legend since there’s only one data series
  • Increase the figure size
  • Increase the font size
  • Change the colors
  • Use a bar graph instead (usually we use lines for time series, but since this is annual it could go either way)
  • Add a trend line (see the sketch after this list for one way)

Not sure how to do any of these? Try searching the internet, or asking an AI!
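For example, a trend line can be added with a simple least-squares fit. The sketch below assumes numpy and matplotlib are available in your environment (they are not imported elsewhere in this part of the notebook, so the imports are included here):

import matplotlib.pyplot as plt
import numpy as np

# Drop missing values so the fit has clean inputs
tobs = rapid_df['TOBS'].dropna()

# Fit a degree-1 polynomial (a line) to temperature as a function of time
x = np.arange(len(tobs))
slope, intercept = np.polyfit(x, tobs.values, 1)

# Plot the observations and overlay the fitted trend line
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(tobs.index, tobs.values, alpha=0.5, label='Observed temperature (F)')
ax.plot(tobs.index, slope * x + intercept, color='red', label='Trend line')
ax.set(title='Observed Temperature and Trend, Rapid City, SD',
       xlabel='Date', ylabel='Temperature (F)')
ax.legend()
plt.show()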


Convert units

Modify the code below to add a column that includes temperature in Celsius. The code below was written by your colleague. Can you fix this so that it correctly calculates temperature in Celsius and adds a new column?

In [14]:
# Convert to Celsius
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
rapid_df
/tmp/ipykernel_2524/869984472.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
Out[14]:
TOBS PRCP TCel
DATE
1949-10-01 51.0 0.00 10.555556
1949-10-02 51.0 0.00 10.555556
1949-10-03 52.0 0.00 11.111111
1949-10-04 45.0 0.00 7.222222
1949-10-05 50.0 0.00 10.000000
... ... ... ...
2024-02-14 24.0 0.15 -4.444444
2024-02-15 21.0 0.03 -6.111111
2024-02-16 8.0 0.20 -13.333333
2024-02-17 NaN 0.00 NaN
2024-02-18 NaN 0.00 NaN

26042 rows × 3 columns
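The SettingWithCopyWarning above appears because the DataFrame was created by selecting columns out of another DataFrame, so pandas cannot tell whether the new column is being added to a view or to a copy. One common way to avoid it – a sketch, not something the test cells require – is to take an explicit copy back when the columns are first selected:

# An explicit .copy() makes later column assignments unambiguous
rapid_df = rapid_df[['TOBS', 'PRCP']].copy()
rapid_df['TCel'] = (rapid_df['TOBS'] - 32) * (5 / 9)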

In [15]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53, 12.52]:
    points += 4
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for converting to Celcius'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for converting to Celcius

Want an EXTRA CHALLENGE?

  1. As you did above, rewrite the code to be more expressive
  2. Using the code below as a framework, write and apply a function that converts to Celsius (functions let you reuse code you have already written); see the completed sketch after the framework cell below.
  3. You should also rewrite the function and parameter names to be more expressive.
In [16]:
def a_function(a_parameter):
    """Convert temperature to Celcius"""
    return a_parameter # Put your equation in here

dataframe['celcius_column'] = dataframe['fahrenheit_column'].apply(convert)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 5
      2     """Convert temperature to Celcius"""
      3     return a_parameter # Put your equation in here
----> 5 dataframe['celcius_column'] = dataframe['fahrenheit_column'].apply(convert)

NameError: name 'dataframe' is not defined
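For reference, here is a minimal sketch of how that framework could be completed, assuming your DataFrame is named rapid_df and the Fahrenheit column is TOBS (swap in your own variable and column names):

def fahrenheit_to_celsius(temperature_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temperature_f - 32) * (5 / 9)

# Apply the conversion to every value in the Fahrenheit column
rapid_df['celsius_temperature'] = rapid_df['TOBS'].apply(fahrenheit_to_celsius)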
In [ ]:
 

Subsetting and Resampling¶

Often when working with time-series data you may want to focus on a shorter window of time, or look at weekly, monthly, or annual summaries to help make the analysis more manageable.

Read more

Read more about subsetting and resampling time-series data in our Learning Portal.

For this demonstration, we will look at the last 40 years’ worth of data and resample it to explore a summary from each year that data were recorded.
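As a quick illustration of the difference, subsetting selects rows by date label while resampling aggregates them to a coarser frequency. A sketch, assuming the DataFrame has a DatetimeIndex ('YS' is the pandas alias for year-start frequency):

# Subset: keep only the rows between two dates (inclusive with a DatetimeIndex)
last_decade = rapid_df.loc['2014':'2023']

# Resample: aggregate the subset to one mean value per year
annual_means = last_decade.resample('YS').mean()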

Your task

  1. Replace start-year and end-year with 1983 and 2023
  2. Replace dataframe with the name of your data
  3. Replace new_dataframe with something more expressive
  4. Call your new variable
  5. Run the cell
In [17]:
# Subset the data
weather1989to2023 = rapid_df.loc['1989':'2023']
weather1989to2023
Out[17]:
TOBS PRCP TCel
DATE
1989-01-01 7.0 0.00 -13.888889
1989-01-02 25.0 0.00 -3.888889
1989-01-03 19.0 0.00 -7.222222
1989-01-04 47.0 0.00 8.333333
1989-01-05 27.0 0.00 -2.777778
... ... ... ...
2023-12-27 32.0 0.31 0.000000
2023-12-28 17.0 0.00 -8.333333
2023-12-29 28.0 0.00 -2.222222
2023-12-30 NaN 0.00 NaN
2023-12-31 NaN 0.00 NaN

12054 rows × 3 columns

In [ ]:
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0

if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.67, 13.15]:
    points += 5
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for subsetting'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for subsetting

Now we are ready to calculate annual statistics¶

Here you will resample the 1983-2023 data to look at the annual mean values.

Resample your data

  1. Replace new_dataframe with the variable you created in the cell above where you subset the data
  2. Replace 'TIME' with a 'W', 'M', or 'Y' depending on whether you’re doing a weekly, monthly, or yearly summary
  3. Replace STAT with sum, min, max, or mean depending on what kind of statistic you’re interested in calculating.
  4. Replace resampled_data with a more expressive variable name
  5. Call your new variable
  6. Run the cell
In [24]:
# Resample the data to look at yearly mean values
minwet89to23 = weather1989to2023.resample('YS').mean()
minwet89to23
Out[24]:
TOBS PRCP TCel
DATE
1989-01-01 38.072829 0.056359 3.373794
1990-01-01 40.363112 0.039068 4.646174
1991-01-01 39.945869 0.056875 4.414372
1992-01-01 39.525862 0.036714 4.181034
1993-01-01 35.522581 0.055881 1.956989
1994-01-01 39.479769 0.034540 4.155427
1995-01-01 39.150568 0.063609 3.972538
1996-01-01 36.547486 0.058785 2.526381
1997-01-01 38.825073 0.057634 3.791707
1998-01-01 40.563739 0.068343 4.757633
1999-01-01 41.688202 0.073104 5.382335
2000-01-01 39.750751 0.050771 4.305973
2001-01-01 43.371134 0.049639 6.317297
2002-01-01 33.482143 0.036126 0.823413
2003-01-01 40.455253 0.039186 4.697363
2004-01-01 38.877828 0.030242 3.821016
2005-01-01 40.627119 0.044620 4.792844
2006-01-01 40.873278 0.042870 4.929599
2007-01-01 34.806931 0.038515 1.559406
2008-01-01 34.204969 0.025892 1.224983
2009-01-01 35.871324 0.053828 2.150735
2010-01-01 39.012384 0.056767 3.895769
2011-01-01 40.313846 0.060282 4.618803
2012-01-01 42.008746 0.019341 5.560415
2013-01-01 38.392638 0.060685 3.551466
2014-01-01 39.211310 0.057726 4.006283
2015-01-01 41.351275 0.057260 5.195153
2016-01-01 42.161644 0.039508 5.645358
2017-01-01 41.013889 0.034082 5.007716
2018-01-01 36.670732 0.057335 2.594851
2019-01-01 36.159544 0.085056 2.310858
2020-01-01 41.023438 0.044006 5.013021
2021-01-01 40.363248 0.032225 4.646249
2022-01-01 39.331395 0.028421 4.072997
2023-01-01 40.144578 0.046313 4.524766
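The same pattern works for other summaries. For example, a monthly maximum instead of an annual mean (a sketch reusing the subset variable from above):

# Resample the subset to monthly maximum values instead of annual means
monthly_max = weather1989to2023.resample('M').max()
monthly_max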
In [ ]:
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0

if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.37, 12.99]:
    points += 5
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for resampling'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for resampling

Plot your resampled data

In [26]:
# Plot mean annual temperature 
minwet89to23.plot(
    y='TCel',
    title='Observed Mean Temperature Over Time, Rapid City, 1989-2023',
    xlabel='Date',
    ylabel='Temperature (C)',
    legend= False)
Out[26]:
<Axes: title={'center': 'Observed Mean Temperature Over Time, Rapid City, 1989-2023'}, xlabel='Date', ylabel='Temperature (C)'>
[Plot: mean annual observed temperature (°C) in Rapid City, 1989-2023]

Describe your plot

We like to use an approach called “Assertion-Evidence” for presenting scientific results. There are lots of video tutorials and example talks available on the Assertion-Evidence web page. The main thing you need to do now is to practice writing a message or headline rather than descriptions or topic sentences for the plot you just made (what they refer to as “visual evidence”).

For example, it would be tempting to write something like “A plot of maximum annual temperature in Rapid City, SD over time (1983-2023)”. However, this doesn’t give the reader anything to look at, or explain why we made this particular plot (we know, you made this one because we told you to).

Some alternatives for different plots of Rapid City temperature that are more of a starting point for a presentation or conversation are:

  • Rapid City, SD experienced cooler than average temperatures in 1995
  • Temperatures in Rapid City, SD appear to be on the rise over the past 40 years
  • Maximum annual temperatures in Rapid City, SD are becoming more variable over the past 40 years

We could back up some of these claims with further analysis included later on, but we want to make sure that our audience has some guidance on what to look for in the plot.


Temperatures in Rapid City, SD are trending upwards¶

The yearly mean temperature in Rapid City appears to have been rising since 1989. A trend line would help confirm this, and the rise matches what we would expect from a warming climate.

Graph of temperature lows in Rapid City, SD
Lows in Rapid City over the decades

THIS ISN’T THE END! 😄¶

Don’t forget to reproduce your analysis in a new location or time!


Image source: https://www.independent.co.uk/climate-change/news/by-the-left-quick-march-the-emperor-penguins-migration-1212420.html


Your turn: pick a new location and/or measurement to plot 🌏 📈¶

Below (or in a new notebook!), recreate the workflow you just did in a place that interests you OR with a different measurement. See the instructions above to adapt the URL that we created for Rapid City, SD using the NCEI API. You will need to make your own new Markdown and Code cells below this one, or create a new notebook.


Congratulations, you’re almost done with this coding challenge 🤩 – now make sure that your code is reproducible¶


Image source: https://dfwurbanwildlife.com/2018/03/25/chris-jacksons-dfw-urban-wildlife/snow-geese-galore/

Your task

  1. If you didn’t already, go back to the code you modified above and write more descriptive comments so the next person to use this code knows what it does.
  2. Make sure to Restart and Run all up at the top of your notebook. This will clear all your variables and make sure that your code runs in the correct order. It will also export your work in Markdown format, which you can put on your website.

Always run your code start to finish before submitting!

Before you commit your work, make sure it runs reproducibly by clicking:

  1. Restart (this button won’t appear until you’ve run some code), then
  2. Run All

BONUS: Create a shareable Markdown of your work¶

Below is some code that you can run that will save a Markdown file of your work that is easily shareable and can be uploaded to GitHub Pages. You can use it as a starting point for writing your portfolio post!

In [ ]:
# This cell is an overview of the entire process

# Import pandas
import pandas as pd

# Build the NCEI API request URL and download the data into a DataFrame
ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00396947'
    '&startDate=1949-10-01'
    '&endDate=2024-02-18'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
rapid_df = pd.read_csv(
    ncei_weather_url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])

# Check that the data were imported into a pandas DataFrame
type(rapid_df)

# Select the variables of interest
rapid_df = rapid_df[['TOBS', 'PRCP']]

# Convert the observed temperature from Fahrenheit to Celsius
rapid_df['TCel'] = (rapid_df['TOBS'] - 32) * (5 / 9)

# Subset the data to the time frame of interest and give it a name
weather1983to2023 = rapid_df.loc['1983':'2023']

# Resample the subset: use 'W', 'M', or 'YS' for weekly, monthly, or yearly
# summaries, and sum(), min(), max(), or mean() for the statistic of interest;
# store the result in a new, expressively named variable
minwet83to23 = weather1983to2023.resample('M').min()
minwet83to23
In [27]:
# Import pandas, plus numpy and matplotlib for plotting with a trend line
import pandas as pd
import numpy as np  # for adding trendline to plot
import matplotlib.pyplot as plt  # for plotting
In [28]:
# Build the NCEI API request URL for the Lakewood, CO station
lkwd_ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00054762'
    '&startDate=1962-07-28'
    '&endDate=2024-05-05'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
lkwd_ncei_weather_url
Out[28]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00054762&startDate=1962-07-28&endDate=2024-05-05&includeStationName=true&includeStationLocation=1&units=standard'
In [30]:
lakewood_df = pd.read_csv(
    lkwd_ncei_weather_url, 
    index_col="DATE", 
    parse_dates=True, 
    na_values=["NaN"])
lakewood_df
Out[30]:
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TOBS
DATE
1962-07-28 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 76.0
1962-07-29 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 78.0
1962-07-30 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 74.0
1962-07-31 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 73.0
1962-08-01 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 75.0
... ... ... ... ... ... ... ...
2024-04-27 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.88 NaN
2024-04-28 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.76 NaN
2024-04-29 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 NaN
2024-04-30 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 NaN
2024-05-05 USC00054762 LAKEWOOD, CO US 39.7488 -105.1205 1715.4 0.00 46.0

22449 rows × 7 columns

In [31]:
# Check that the data was imported into a pandas DataFrame
type(lakewood_df)
Out[31]:
pandas.core.frame.DataFrame
In [32]:
# select variables of interest
lakewood_df = lakewood_df[['PRCP']]
lakewood_df
Out[32]:
PRCP
DATE
1962-07-28 0.00
1962-07-29 0.00
1962-07-30 0.00
1962-07-31 0.00
1962-08-01 0.00
... ...
2024-04-27 0.88
2024-04-28 0.76
2024-04-29 0.00
2024-04-30 0.00
2024-05-05 0.00

22449 rows × 1 columns

In [ ]:
# Convert to Celsius (this fails below because the TOBS column was dropped when selecting only PRCP)
lakewood_df['TCel'] = ((lakewood_df['TOBS'] - 32) * (5 / 9))
lakewood_df
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:

File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'TOBS'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[34], line 2
      1 #Convert to celcius
----> 2 lakewood_df['TCel'] = ((lakewood_df['TOBS'] - 32) * (5 / 9))
      3 lakewood_df

File /opt/conda/lib/python3.11/site-packages/pandas/core/frame.py:4090, in DataFrame.__getitem__(self, key)
   4088 if self.columns.nlevels > 1:
   4089     return self._getitem_multilevel(key)
-> 4090 indexer = self.columns.get_loc(key)
   4091 if is_integer(indexer):
   4092     indexer = [indexer]

File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)

KeyError: 'TOBS'
In [35]:
# Subset the data to the time frame of interest for a more focused analysis
lakewood_TP_1970to2023 = lakewood_df.loc['1970':'2023']
lakewood_TP_1970to2023
# Run to check
Out[35]:
PRCP
DATE
1970-01-01 0.00
1970-01-02 0.00
1970-01-03 0.00
1970-01-04 0.00
1970-01-05 0.05
... ...
2023-12-27 0.03
2023-12-28 0.00
2023-12-29 0.00
2023-12-30 0.00
2023-12-31 0.00

19618 rows × 1 columns

In [38]:
# Resample the subset: use 'W', 'M', or 'YS' for weekly, monthly, or yearly
# summaries, and sum(), min(), max(), or mean() for the statistic of interest;
# store the result in a new, expressively named variable
lakewood_prcp70to23 = lakewood_TP_1970to2023.resample('YS').sum()
lakewood_prcp70to23
Out[38]:
PRCP
DATE
1970-01-01 13.61
1971-01-01 13.84
1972-01-01 15.95
1973-01-01 24.98
1974-01-01 13.29
1975-01-01 18.07
1976-01-01 16.51
1977-01-01 8.97
1978-01-01 12.72
1979-01-01 19.75
1980-01-01 13.59
1981-01-01 11.18
1982-01-01 18.14
1983-01-01 21.99
1984-01-01 19.66
1985-01-01 15.20
1986-01-01 15.96
1987-01-01 24.27
1988-01-01 15.92
1989-01-01 16.79
1990-01-01 17.79
1991-01-01 19.30
1992-01-01 15.87
1993-01-01 14.46
1994-01-01 16.46
1995-01-01 20.08
1996-01-01 14.65
1997-01-01 18.66
1998-01-01 19.91
1999-01-01 21.25
2000-01-01 13.60
2001-01-01 16.06
2002-01-01 10.45
2003-01-01 17.75
2004-01-01 22.84
2005-01-01 16.72
2006-01-01 14.92
2007-01-01 16.33
2008-01-01 11.25
2009-01-01 23.48
2010-01-01 12.48
2011-01-01 20.43
2012-01-01 14.36
2013-01-01 23.59
2014-01-01 20.43
2015-01-01 27.45
2016-01-01 14.00
2017-01-01 16.04
2018-01-01 14.56
2019-01-01 17.91
2020-01-01 11.38
2021-01-01 15.30
2022-01-01 13.44
2023-01-01 22.04
In [39]:
lakewood_prcp70to23.plot(
    y='PRCP',
    title='Yearly Precipitation, Lakewood, CO, 1970-2023',
    xlabel='Date',
    kind= 'bar',
    legend= False,
    ylabel='Precipitation (in.)')
#run to check
Out[39]:
<Axes: title={'center': 'Yearly Precipitation, Lakewood, CO, 1970-2023'}, xlabel='Date', ylabel='Precipitation (in.)'>
[Plot: bar chart of yearly precipitation in Lakewood, CO]
In [40]:
# Resetting the index
lakewood_prcp70to23 = lakewood_prcp70to23.reset_index()
lakewood_prcp70to23
#run to check
Out[40]:
DATE PRCP
0 1970-01-01 13.61
1 1971-01-01 13.84
2 1972-01-01 15.95
3 1973-01-01 24.98
4 1974-01-01 13.29
5 1975-01-01 18.07
6 1976-01-01 16.51
7 1977-01-01 8.97
8 1978-01-01 12.72
9 1979-01-01 19.75
10 1980-01-01 13.59
11 1981-01-01 11.18
12 1982-01-01 18.14
13 1983-01-01 21.99
14 1984-01-01 19.66
15 1985-01-01 15.20
16 1986-01-01 15.96
17 1987-01-01 24.27
18 1988-01-01 15.92
19 1989-01-01 16.79
20 1990-01-01 17.79
21 1991-01-01 19.30
22 1992-01-01 15.87
23 1993-01-01 14.46
24 1994-01-01 16.46
25 1995-01-01 20.08
26 1996-01-01 14.65
27 1997-01-01 18.66
28 1998-01-01 19.91
29 1999-01-01 21.25
30 2000-01-01 13.60
31 2001-01-01 16.06
32 2002-01-01 10.45
33 2003-01-01 17.75
34 2004-01-01 22.84
35 2005-01-01 16.72
36 2006-01-01 14.92
37 2007-01-01 16.33
38 2008-01-01 11.25
39 2009-01-01 23.48
40 2010-01-01 12.48
41 2011-01-01 20.43
42 2012-01-01 14.36
43 2013-01-01 23.59
44 2014-01-01 20.43
45 2015-01-01 27.45
46 2016-01-01 14.00
47 2017-01-01 16.04
48 2018-01-01 14.56
49 2019-01-01 17.91
50 2020-01-01 11.38
51 2021-01-01 15.30
52 2022-01-01 13.44
53 2023-01-01 22.04
In [41]:
# Extract the year from the DATE column and add it as a new column
lakewood_prcp70to23['YEAR'] = lakewood_prcp70to23['DATE'].dt.year
lakewood_prcp70to23
#run to check
Out[41]:
DATE PRCP YEAR
0 1970-01-01 13.61 1970
1 1971-01-01 13.84 1971
2 1972-01-01 15.95 1972
3 1973-01-01 24.98 1973
4 1974-01-01 13.29 1974
5 1975-01-01 18.07 1975
6 1976-01-01 16.51 1976
7 1977-01-01 8.97 1977
8 1978-01-01 12.72 1978
9 1979-01-01 19.75 1979
10 1980-01-01 13.59 1980
11 1981-01-01 11.18 1981
12 1982-01-01 18.14 1982
13 1983-01-01 21.99 1983
14 1984-01-01 19.66 1984
15 1985-01-01 15.20 1985
16 1986-01-01 15.96 1986
17 1987-01-01 24.27 1987
18 1988-01-01 15.92 1988
19 1989-01-01 16.79 1989
20 1990-01-01 17.79 1990
21 1991-01-01 19.30 1991
22 1992-01-01 15.87 1992
23 1993-01-01 14.46 1993
24 1994-01-01 16.46 1994
25 1995-01-01 20.08 1995
26 1996-01-01 14.65 1996
27 1997-01-01 18.66 1997
28 1998-01-01 19.91 1998
29 1999-01-01 21.25 1999
30 2000-01-01 13.60 2000
31 2001-01-01 16.06 2001
32 2002-01-01 10.45 2002
33 2003-01-01 17.75 2003
34 2004-01-01 22.84 2004
35 2005-01-01 16.72 2005
36 2006-01-01 14.92 2006
37 2007-01-01 16.33 2007
38 2008-01-01 11.25 2008
39 2009-01-01 23.48 2009
40 2010-01-01 12.48 2010
41 2011-01-01 20.43 2011
42 2012-01-01 14.36 2012
43 2013-01-01 23.59 2013
44 2014-01-01 20.43 2014
45 2015-01-01 27.45 2015
46 2016-01-01 14.00 2016
47 2017-01-01 16.04 2017
48 2018-01-01 14.56 2018
49 2019-01-01 17.91 2019
50 2020-01-01 11.38 2020
51 2021-01-01 15.30 2021
52 2022-01-01 13.44 2022
53 2023-01-01 22.04 2023
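As an aside, the reset_index step is optional here: before resetting, the year could also have been read straight from the DatetimeIndex. A sketch of that alternative:

# Alternative: take the year directly from the DatetimeIndex (no reset_index needed)
lakewood_prcp70to23['YEAR'] = lakewood_prcp70to23.index.year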
In [43]:
# Plot PRCP using .plot()
lakewood_prcp70to23.plot(y='PRCP',
                            x='YEAR')
#run to check
Out[43]:
<Axes: xlabel='YEAR'>
[Plot: annual precipitation (PRCP) vs. YEAR for Lakewood, CO]
In [56]:
# From ChatGPT

# Define our figure and axis objects 
fig, ax = plt.subplots(figsize=(6,4))

# Compute linear regression
x = lakewood_prcp70to23['YEAR']
y = lakewood_prcp70to23['PRCP']

# Compute the slope (m) and intercept (b) of the line y = mx + b
m, b = np.polyfit(x, y, 1)

# Plot PRCP vs. YEAR as a bar plot
ax.bar(x, y, color='skyblue', edgecolor='white')

# Plot trend line
ax.plot(x, m*x + b, color='blue', label=f'Trend Line (R-squared = {np.corrcoef(x, y)[0,1]**2:.2f})')

# Add legend
ax.legend()

# Add title and axis label
ax.set(title="Total Annual Precipitation\nLakewood, CO (1970-2023)",
       ylabel="Precipitation (in.)")
#run to check
Out[56]:
[Text(0.5, 1.0, 'Total Annual Precipitation\nLakewood, CO (1970-2023)'),
 Text(0, 0.5, 'Precipitation (in.)')]
[Plot: bar chart of total annual precipitation in Lakewood, CO with trend line]
In [ ]:
# The Global Historical Climatology Network - Daily (GHCN-Daily) dataset combines daily observations from about 30 different sources, including more than 90,000 weather stations. Roughly 60,000 of those stations mostly collect precipitation data, while the others collect various meteorological measurements, including daily maximum and minimum temperature, temperature at the time of observation, snowfall, and snow depth. The data are regularly synced and maintained.
# These are the data accessed by the ncei_weather_url.
# Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ [access date].
In [ ]:
# Summary of the full workflow: building the Lakewood precipitation bar graph with
# a trend line, reproducibly, from a URL linking to NCEI NOAA weather data

# Import pandas
import pandas as pd
import numpy as np  # for adding a trend line to the plot
import matplotlib.pyplot as plt  # for plotting
# Reminder: the kernel is the version of Python you are using; use the base kernel, which should be the default option.

# Get the data: build the NCEI API request URL for the Lakewood, CO station
lkwd_ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'  # base API endpoint
    '?dataset=daily-summaries'   # the remaining parameters select the data
    '&dataTypes=TOBS,PRCP'       # observed temperature and precipitation
    '&stations=USC00054762'      # station found via https://www.ncdc.noaa.gov/cdo-web/search
    '&startDate=1962-07-28'      # dates can be set here, or subset more easily later
    '&endDate=2024-05-05'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
lkwd_ncei_weather_url

# Read the data from the URL into a pandas DataFrame
lakewood_df = pd.read_csv(
    lkwd_ncei_weather_url,
    index_col="DATE",    # set the date as the index; the year is extracted later
    parse_dates=True,    # parse the index values as dates
    na_values=["NaN"])   # treat 'NaN' as missing values
lakewood_df
# Run this cell to check

# Check that the data were imported into a pandas DataFrame
type(lakewood_df)
# Run to check

# Select the variable of interest (precipitation only)
lakewood_df = lakewood_df[['PRCP']]
lakewood_df
# Run to check

# Convert to Celsius. If working with temperature data, this may be useful, but it
# is commented out here because TOBS was dropped from lakewood_df above (running it
# as written raises a KeyError). It would create a TCel column with the
# Fahrenheit-to-Celsius conversion (TOBS stands for temperature at observation time).
# lakewood_df['TCel'] = (lakewood_df['TOBS'] - 32) * (5 / 9)

# Subset the data to the time frame of interest for a more focused analysis
lakewood_TP_1970to2023 = lakewood_df.loc['1970':'2023']
lakewood_TP_1970to2023
# Run to check

# Resample the subset: use 'W', 'M', or 'YS' for weekly, monthly, or yearly
# summaries, and sum(), min(), max(), or mean() for the statistic of interest;
# store the result in a new, expressively named variable
lakewood_prcp70to23 = lakewood_TP_1970to2023.resample('YS').sum()
lakewood_prcp70to23
# Run to check

# Quick bar plot of annual precipitation to preview the resampled data
lakewood_prcp70to23.plot(
    y='PRCP',
    title='Yearly Precipitation, Lakewood, CO, 1970-2023',
    xlabel='Date',
    kind='bar',
    legend=False,
    ylabel='Precipitation (in.)')

# Reset the index so DATE becomes a regular column
lakewood_prcp70to23 = lakewood_prcp70to23.reset_index()
lakewood_prcp70to23
# Run to check

# Extract the year from the DATE column and add it as a new column
lakewood_prcp70to23['YEAR'] = lakewood_prcp70to23['DATE'].dt.year
lakewood_prcp70to23
# Run to check

# Plot PRCP vs. YEAR using .plot()
lakewood_prcp70to23.plot(y='PRCP', x='YEAR')
# Run to check

# From ChatGPT

# Define our figure and axis objects 
fig, ax = plt.subplots(figsize=(6,4))

# Compute linear regression
x = lakewood_prcp70to23['YEAR']
y = lakewood_prcp70to23['PRCP']

# Compute the slope (m) and intercept (b) of the line y = mx + b
m, b = np.polyfit(x, y, 1)

# Plot PRCP vs. YEAR as a bar plot
ax.bar(x, y, color='skyblue', edgecolor='white')

# Plot trend line
ax.plot(x, m*x + b, color='blue', label=f'Trend Line (R-squared = {np.corrcoef(x, y)[0,1]**2:.2f})')

# Add legend
ax.legend()

# Add title and axis label
ax.set(title="Total Annual Precipitation\nLakewood, CO (1970-2023)",
       ylabel="Precipitation (in.)")
#run to check
In [ ]:
%%capture
%%bash
jupyter nbconvert *.ipynb --to markdown
In [ ]:
%%capture
%%bash
jupyter nbconvert *.ipynb --to html