Climate change is impacting the way people live around the world
::: {.cell .markdown}
Higher highs, lower lows, storms, and smoke – we’re all feeling the
effects of climate change. In this workflow, you will take a look at
trends in temperature over time in Rapid City, SD.
Open reproducible science makes scientific methods, data and outcomes
available to everyone. That means that everyone who wants should be able
to find, read, understand, and run your workflows for themselves.
Few if any science projects are 100% open and reproducible (yet!).
However, members of the open science community have developed open
source tools and practices that can help you move toward that goal. You
will learn about many of those tools in the Intro to Earth Data Science
textbook.
Don’t worry about learning all the tools at once – we’ve picked a few
for you to get started with.
Create a
new Markdown cell below this one using the + Markdown button in the
upper left.
In the
new cell, answer the following questions using a numbered list in
Markdown:
In 1-2 sentences, define open reproducible science.
In 1-2 sentences, choose one of the open source tools that you
have learned about (i.e. Shell, Git/GitHub, Jupyter Notebook,
Python) and explain how it supports open reproducible science.
Create a
new Markdown cell below this one using the ESC + b
keyboard shortcut.
In the new
cell, answer the following question in a Markdown quote: In 1-2
sentences, does this Jupyter Notebook file have a machine-readable name?
Explain your answer.
Negatory. Spaces, upper case letters, exclamation mark! The machine would be so confused!
Below is a scientific Python workflow. But something’s wrong – The code
won’t run! Your task is to follow the instructions below to clean and
debug the Python code below so that it runs.
Tip
Don’t worry if you can’t solve every bug right away. We’ll get there!
The most important thing is to identify problems with the code and
write high-quality GitHub
Issues.
At the end, you’ll repeat the workflow for a location and
measurement of your choosing.
Alright! Let’s clean up this code. First things first…
Python packages let you use code written by experts around the world¶
Because Python is open source, lots of different people and
organizations can contribute (including you!). Many contributions are in
the form of packages which do not come with a standard Python
download.
Correct the typo below to properly import the pandas package under
its alias pd.
Run the cell to import pandas
NOTE: **Run your code in the right environment to avoid import errors**
We’ve created a coding environment for you to use that already has
all the software and libraries you will need! When you try to run some
code, you may be prompted to select a kernel. The kernel
refers to the version of Python you are using. You should use the
base kernel, which should be the default option.
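If you are unsure which Python your kernel is running, a quick check like the one below can help. This is a minimal sketch that uses only the standard library; the variable names are illustrative and not part of the original notebook.

```python
import sys
from importlib.util import find_spec

# Path of the Python interpreter behind the current kernel
python_path = sys.executable

# Version of that interpreter as 'major.minor.micro'
python_version = '.'.join(str(part) for part in sys.version_info[:3])

# True when pandas can be imported from this environment
pandas_available = find_spec('pandas') is not None

print(python_path)
print(python_version)
print('pandas available:', pandas_available)
```

If the last line reports False, the kernel is probably not the one that has the course libraries installed.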
In [2]:
# Import pandas
import pandas as pd
Once you have run the cell above and imported pandas, run the cell
below. It is a test cell that will tell you if you completed the task
successfully. If a test cell isn’t working the way you expect, check
that you ran your code immediately before running the test.
In [3]:
# DO NOT MODIFY THIS TEST CELL
points = 0
try:
    pd.DataFrame()
    points += 5
    print('\u2705 Great work! You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
print('You earned {} of 5 points for importing pandas'.format(points))
✅ Great work! You correctly imported the pandas library.
You earned 5 of 5 points for importing pandas
There are more Earth Observation data online than any one person could ever look at
Here we’re using the NOAA National Centers for Environmental Information
(NCEI) Access Data Service application programming interface (API) to
request data from their web servers. We will be using data collected as
part of the Global Historical Climatology Network daily (GHCNd) from
their Climate Data Online library program at NOAA.
In the cell below, write a 2-3 sentence description of the data
source. You should describe:
who takes the data
where the data were taken
what the maximum temperature units are
how the data are collected
Include a citation of the data (HINT: See the ‘Data Citation’
tab on the GHCNd overview page).
YOUR DATA DESCRIPTION AND CITATION HERE
The Global Historical Climatology Network - Daily (GHCN-Daily) dataset integrates daily observations from 30 different sources. It includes 90,000 weather stations: 60,000 mostly collect precipitation data, while the others collect various meteorological data including daily maximum and minimum temperature, temperature at the time of observation, snowfall, snow depth, etc. The data are regularly synced and maintained.
This is the data being accessed by the ncei_weather_url.
Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ [access date].
You can access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️
The cell below contains the URL for the data you will use in this part
of the notebook. We created this URL by generating what is called an
API endpoint using the NCEI API
documentation.
Note
An application programming interface (API) is a way for two or
more computer programs or components to communicate with each other.
It is a type of software interface, offering a service to other pieces
of software (Wikipedia).
However, we still have a problem - we can’t get the URL back later on
because it isn’t saved in a variable. In other words, we need to
give the url a name so that we can request it from Python later (sadly,
Python has no ‘hey what was that thingy I typed yesterday?’ function).
Pick an expressive variable name for the URL. HINT: click on the
Variables button up top to see all your variables. Your new url
variable will not be there until you define it and run the code
Reformat the URL so that it adheres to the 79-character PEP-8 line
limit. You should see two vertical lines in each cell - don’t let your
code go past the second line.
At the end of the cell where you define your url variable, call
your variable (type out its name) so it can be tested.
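One way to keep a long API URL readable and within the line limit is to assemble it from named parts. The sketch below is an illustration, not the notebook's required answer: the parameter values are copied from the Lakewood, CO URL that appears later in this notebook (the Rapid City station ID is not shown here), and `base_url`, `query_params`, and `ncei_weather_url` are names chosen for this example.

```python
from urllib.parse import urlencode

# Base address of the NCEI Access Data Service
base_url = 'https://www.ncei.noaa.gov/access/services/data/v1'

# Query parameters copied from the Lakewood URL later in this notebook
query_params = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TOBS,PRCP',
    'stations': 'USC00054762',
    'startDate': '1962-07-28',
    'endDate': '2024-05-05',
    'includeStationName': 'true',
    'includeStationLocation': '1',
    'units': 'standard',
}

# safe=',' keeps the comma in 'TOBS,PRCP' from being percent-encoded
ncei_weather_url = base_url + '?' + urlencode(query_params, safe=',')
ncei_weather_url
```

Splitting the string across several quoted lines inside parentheses, as the Lakewood cell does, works just as well; the dictionary form simply makes each parameter easier to change.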
# DO NOT MODIFY THIS TEST CELL
resp_url = _
points = 0

if type(resp_url)==str:
    points += 3
    print('\u2705 Great work! You correctly called your url variable.')
else:
    print('\u274C Oops - your url variable was not called correctly.')

if len(resp_url)==218:
    points += 3
    print('\u2705 Great work! Your url is the correct length.')
else:
    print('\u274C Oops - your url variable is not the correct length.')

print('You earned {} of 6 points for defining a url variable'.format(points))
✅ Great work! You correctly called your url variable.
✅ Great work! Your url is the correct length.
You earned 6 of 6 points for defining a url variable
The pandas library you imported can download data from the internet
directly into a type of Python object called a DataFrame. In the
code cell below, you can see an attempt to do just this. But there are
some problems…
You’re ready to fix some code!
Your task is to:
Leave a space between the # and text in the comment and try
making the comment more informative
Make any changes needed to get this code to run. HINT: The
my_url variable doesn’t exist - you need to replace it with the
variable name you chose.
Modify the .read_csv() statement to include the following
parameters:
index_col='DATE' – this sets the DATE column as the index.
Needed for subsetting and resampling later on
parse_dates=True – this lets python know that you are
working with time-series data, and values in the indexed
column are date time objects
na_values=['NaN'] – this lets python know how to handle
missing values
Clean up the code by using expressive variable names,
expressive column names, PEP-8 compliant code, and
descriptive comments
Make sure to call your DataFrame by typing its name as the last
line of your code cell. Then, you will be able to run the test cell
below and find out if your answer is correct.
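The three `.read_csv()` parameters listed above can be tried out without a network connection. The sketch below is an illustration under stated assumptions: instead of the real URL it reads a three-row, in-memory CSV whose values are copied from the Rapid City output shown in this notebook, and `sample_csv` and `sample_weather_df` are names made up for the example.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: three rows copied from the
# Rapid City output in this notebook
sample_csv = io.StringIO(
    'DATE,TOBS,PRCP\n'
    '2024-02-15,21.0,0.03\n'
    '2024-02-16,8.0,0.20\n'
    '2024-02-17,NaN,0.00\n'
)

# Same parameters as in the task list above
sample_weather_df = pd.read_csv(
    sample_csv,
    index_col='DATE',    # use the DATE column as the index
    parse_dates=True,    # parse the index as dates
    na_values=['NaN'])   # treat 'NaN' as a missing value
sample_weather_df
```

With a real download, the first argument would be your URL variable instead of `sample_csv`.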
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

print('You earned {} of 2 points for downloading data'.format(points))
✅ Great work! You called a DataFrame.
You earned 1 of 2 points for downloading data
HINT: Check out the type() function below - you can use it to check
that your data is now a DataFrame type object
In [8]:
# Check that the data was imported into a pandas DataFrame
type(rapid_df)
Out[8]:
pandas.core.frame.DataFrame
Clean up your DataFrame
Use double brackets to only select the columns you want in your
DataFrame
Make sure to call your DataFrame by typing its name as the last
line of your code cell. Then, you will be able to run the test cell
below and find out if your answer is correct.
In [9]:
rapid_df = rapid_df[['TOBS', 'PRCP']]
rapid_df
Out[9]:
            TOBS  PRCP
DATE
1949-10-01  51.0  0.00
1949-10-02  51.0  0.00
1949-10-03  52.0  0.00
1949-10-04  45.0  0.00
1949-10-05  50.0  0.00
...          ...   ...
2024-02-14  24.0  0.15
2024-02-15  21.0  0.03
2024-02-16   8.0  0.20
2024-02-17   NaN  0.00
2024-02-18   NaN  0.00
26042 rows × 2 columns
In [10]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53]:
    points += 4
    print('\u2705 Great work! You correctly downloaded data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for downloading data'.format(points))
❌ Oops - your data are not correct.
You earned 0 of 5 points for downloading data
Plot the precipitation column (PRCP) vs time to explore the data
Plotting in Python is easy, but not quite this easy:
In [11]:
rapid_df.plot()
Out[11]:
<Axes: xlabel='DATE'>
**Label and describe your plots**
Source: https://xkcd.com/833
Make sure each plot has:
A title that explains where and when the data are from
x- and y- axis labels with units where appropriate
A legend where appropriate
You’ll always need to add some instructions on labels and how you want
your plot to look.
Your task:
Change dataframe to your DataFrame name.
Change y= to the name of your observed temperature column
name.
Use the title, ylabel, and xlabel parameters to add key text
to your plot.
Adjust the size of your figure using figsize=(x,y) where x is
figure width and y is figure height
HINT: labels have to be a type in Python called a string.
You can make a string by putting quotes around your label, just like
the column names in the sample code (eg y='TOBS').
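The labeling parameters described above can be combined in a single `.plot()` call. The following is a minimal sketch, assuming a tiny DataFrame built from five temperature values copied from the Rapid City output in this notebook; `sample_df` and the title text are illustrative choices, and the `matplotlib.use('Agg')` line is only there so the example also runs outside Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Five observed temperatures (degrees F) copied from the Rapid City output
sample_df = pd.DataFrame(
    {'TOBS': [51.0, 51.0, 52.0, 45.0, 50.0]},
    index=pd.to_datetime([
        '1949-10-01', '1949-10-02', '1949-10-03',
        '1949-10-04', '1949-10-05']))

# Every label is a string; figsize is (width, height) in inches
ax = sample_df.plot(
    y='TOBS',
    title='Observed Temperature, Rapid City, SD, 1-5 October 1949',
    xlabel='Date',
    ylabel='Temperature (F)',
    legend=False,
    figsize=(8, 4))
```

The returned `ax` object can be customized further, for example with `ax.set(...)`.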
In [12]:
# Convert to Celsius
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
rapid_df
/tmp/ipykernel_2524/1770224429.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
Out[12]:
            TOBS  PRCP        TCel
DATE
1949-10-01  51.0  0.00   10.555556
1949-10-02  51.0  0.00   10.555556
1949-10-03  52.0  0.00   11.111111
1949-10-04  45.0  0.00    7.222222
1949-10-05  50.0  0.00   10.000000
...          ...   ...         ...
2024-02-14  24.0  0.15   -4.444444
2024-02-15  21.0  0.03   -6.111111
2024-02-16   8.0  0.20  -13.333333
2024-02-17   NaN  0.00         NaN
2024-02-18   NaN  0.00         NaN
26042 rows × 3 columns
In [13]:
# Plot the data using .plot
rapid_df.plot(
    y='TOBS',
    title='Observed Temperature Over Time, Rapid City, 1949-2024',
    xlabel='Date',
    legend=False,
    ylabel='Temperature (F)')
Out[13]:
<Axes: title={'center': 'Observed Temperature Over Time, Rapid City, 1949-2024'}, xlabel='Date', ylabel='Temperature (F)'>
Want an EXTRA CHALLENGE?
There are many other things you can do to customize your plot. Take a
look at the pandas plotting galleries and the documentation of plot
to see if there are other changes you want to make to your plot. Some
possibilities include:
Remove the legend since there’s only one data series
Increase the figure size
Increase the font size
Change the colors
Use a bar graph instead (usually we use lines for time series, but
since this is annual it could go either way)
Add a trend line
Not sure how to do any of these? Try searching the internet, or asking
an AI!
Convert units
Modify the code below to add a column that includes temperature in
Celsius. The code below was written by your colleague. Can you fix
this so that it correctly calculates temperature in Celsius and adds a
new column?
In [14]:
# Convert to Celsius
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
rapid_df
/tmp/ipykernel_2524/869984472.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
Out[14]:
            TOBS  PRCP        TCel
DATE
1949-10-01  51.0  0.00   10.555556
1949-10-02  51.0  0.00   10.555556
1949-10-03  52.0  0.00   11.111111
1949-10-04  45.0  0.00    7.222222
1949-10-05  50.0  0.00   10.000000
...          ...   ...         ...
2024-02-14  24.0  0.15   -4.444444
2024-02-15  21.0  0.03   -6.111111
2024-02-16   8.0  0.20  -13.333333
2024-02-17   NaN  0.00         NaN
2024-02-18   NaN  0.00         NaN
26042 rows × 3 columns
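The SettingWithCopyWarning printed with this output appears because the column is being added to a DataFrame that was itself selected from another DataFrame. One common way to avoid it, sketched below, is to take an explicit `.copy()` when selecting columns. The data here are a three-row stand-in with values copied from the output above; `full_df`, `selected_df`, and the `STATION` placeholder column are assumptions made for the example.

```python
import pandas as pd

# Small stand-in for the downloaded data; STATION is a placeholder
# for the extra columns that get dropped during selection
full_df = pd.DataFrame(
    {
        'STATION': ['placeholder'] * 3,
        'TOBS': [51.0, 45.0, 50.0],
        'PRCP': [0.00, 0.00, 0.00],
    },
    index=pd.to_datetime(['1949-10-01', '1949-10-04', '1949-10-05']))
full_df.index.name = 'DATE'

# .copy() makes the selection an independent DataFrame, so adding a
# column to it later is unambiguous
selected_df = full_df[['TOBS', 'PRCP']].copy()
selected_df['TCel'] = (selected_df['TOBS'] - 32) * 5 / 9
selected_df
```

The original `full_df` is left untouched, which also makes it easier to re-run the cell.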
In [15]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53, 12.52]:
    points += 4
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for converting to Celcius'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for converting to Celcius
Want an EXTRA CHALLENGE?
As you did above, rewrite the code to be more expressive
Using the code below as a framework, write and apply a
function that converts to Celsius. Functions let you
reuse code you have already written
You should also rewrite this function and parameter names to be
more expressive.
In [16]:
def a_function(a_parameter):
    """Convert temperature to Celcius"""
    return a_parameter # Put your equation in here

dataframe['celcius_column'] = dataframe['fahrenheit_column'].apply(convert)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 5
      2     """Convert temperature to Celcius"""
      3     return a_parameter # Put your equation in here
----> 5 dataframe['celcius_column'] = dataframe['fahrenheit_column'].apply(convert)
NameError: name 'dataframe' is not defined
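The framework above fails because `dataframe` and `convert` are placeholders that were never defined. A filled-in version might look like the sketch below. It is one possible answer, not the only one: the function name, the column names `temp_f` and `temp_c`, and the three sample values (copied from the Rapid City output) are choices made for this illustration.

```python
import pandas as pd


def convert_fahrenheit_to_celsius(temperature_f):
    """Convert a temperature from degrees Fahrenheit to Celsius."""
    return (temperature_f - 32) * 5 / 9


# Three observed temperatures (degrees F) copied from the output above
example_df = pd.DataFrame({'temp_f': [51.0, 45.0, 50.0]})

# .apply() calls the function once for every value in the column
example_df['temp_c'] = example_df['temp_f'].apply(
    convert_fahrenheit_to_celsius)
example_df
```

Note that the name passed to `.apply()` has to match the name used in the `def` line.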
Often when working with time-series data you may want to focus on a
shorter window of time, or look at weekly, monthly, or annual summaries
to help make the analysis more manageable.
Read more
Read more about subsetting and resampling time-series data in our
Learning Portal.
For this demonstration, we will look at the last 40 years worth of data
and resample to explore a summary from each year that data were
recorded.
Your task
Replace start-year and end-year with 1983 and 2023
Replace dataframe with the name of your data
Replace new_dataframe with something more expressive
Call your new variable
Run the cell
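The subsetting step relies on the fact that a DataFrame with a date index can be sliced with year strings. Here is a minimal sketch, assuming a placeholder daily series (the values are just a counter, not real observations); `example_df` and `subset_df` are names invented for the example.

```python
import pandas as pd

# Placeholder daily series spanning four years; the values are just
# a counter, not real observations
dates = pd.date_range('1982-01-01', '1985-12-31', freq='D')
example_df = pd.DataFrame({'value': range(len(dates))}, index=dates)

# Slicing a date index with year strings includes both end years,
# so this keeps every day of 1983 and 1984
subset_df = example_df.loc['1983':'1984']
print(subset_df.index.min(), subset_df.index.max())
```

Unlike ordinary Python slices, the end of a `.loc` label slice is included in the result.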
In [17]:
# Subset the data
weather1989to2023 = rapid_df.loc['1989':'2023']
weather1989to2023
Out[17]:
            TOBS  PRCP        TCel
DATE
1989-01-01   7.0  0.00  -13.888889
1989-01-02  25.0  0.00   -3.888889
1989-01-03  19.0  0.00   -7.222222
1989-01-04  47.0  0.00    8.333333
1989-01-05  27.0  0.00   -2.777778
...          ...   ...         ...
2023-12-27  32.0  0.31    0.000000
2023-12-28  17.0  0.00   -8.333333
2023-12-29  28.0  0.00   -2.222222
2023-12-30   NaN  0.00         NaN
2023-12-31   NaN  0.00         NaN
12054 rows × 3 columns
In [ ]:
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0

if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.67, 13.15]:
    points += 5
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for subsetting'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for subsetting
Here you will resample the 1983-2023 data to look at the annual mean
values.
Resample your data
Replace new_dataframe with the variable you created in the cell
above where you subset the data
Replace 'TIME' with a 'W', 'M', or 'Y' depending on
whether you’re doing a weekly, monthly, or yearly summary
Replace STAT with a sum, min, max, or mean call depending on
what kind of statistic you’re interested in calculating.
Replace resampled_data with a more expressive variable name
Call your new variable
Run the cell
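Resampling groups the rows by a time period and then applies a summary statistic to each group. The sketch below uses a placeholder daily series (1.0 for every day of 2022 and 3.0 for every day of 2023) so that the results are easy to check by hand; the variable names are illustrative. It uses the `'YS'` (year start) code that the notebook's own answer uses.

```python
import pandas as pd

# Placeholder daily series: 1.0 for every day of 2022, 3.0 for 2023
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
example_df = pd.DataFrame(
    {'value': [1.0 if day.year == 2022 else 3.0 for day in dates]},
    index=dates)

# 'YS' groups by calendar year and labels each group by its first day
annual_mean_df = example_df.resample('YS').mean()
annual_total_df = example_df.resample('YS').sum()
annual_mean_df
```

A mean suits temperature, while a sum is usually the more meaningful yearly summary for precipitation. Depending on your pandas version, the `'M'` and `'Y'` codes may produce a deprecation warning that suggests `'ME'` and `'YE'` instead.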
In [24]:
# Resample the data to look at yearly mean values
minwet89to23 = weather1989to2023.resample('YS').mean()
minwet89to23
Out[24]:
                 TOBS      PRCP      TCel
DATE
1989-01-01  38.072829  0.056359  3.373794
1990-01-01  40.363112  0.039068  4.646174
1991-01-01  39.945869  0.056875  4.414372
1992-01-01  39.525862  0.036714  4.181034
1993-01-01  35.522581  0.055881  1.956989
1994-01-01  39.479769  0.034540  4.155427
1995-01-01  39.150568  0.063609  3.972538
1996-01-01  36.547486  0.058785  2.526381
1997-01-01  38.825073  0.057634  3.791707
1998-01-01  40.563739  0.068343  4.757633
1999-01-01  41.688202  0.073104  5.382335
2000-01-01  39.750751  0.050771  4.305973
2001-01-01  43.371134  0.049639  6.317297
2002-01-01  33.482143  0.036126  0.823413
2003-01-01  40.455253  0.039186  4.697363
2004-01-01  38.877828  0.030242  3.821016
2005-01-01  40.627119  0.044620  4.792844
2006-01-01  40.873278  0.042870  4.929599
2007-01-01  34.806931  0.038515  1.559406
2008-01-01  34.204969  0.025892  1.224983
2009-01-01  35.871324  0.053828  2.150735
2010-01-01  39.012384  0.056767  3.895769
2011-01-01  40.313846  0.060282  4.618803
2012-01-01  42.008746  0.019341  5.560415
2013-01-01  38.392638  0.060685  3.551466
2014-01-01  39.211310  0.057726  4.006283
2015-01-01  41.351275  0.057260  5.195153
2016-01-01  42.161644  0.039508  5.645358
2017-01-01  41.013889  0.034082  5.007716
2018-01-01  36.670732  0.057335  2.594851
2019-01-01  36.159544  0.085056  2.310858
2020-01-01  41.023438  0.044006  5.013021
2021-01-01  40.363248  0.032225  4.646249
2022-01-01  39.331395  0.028421  4.072997
2023-01-01  40.144578  0.046313  4.524766
In [ ]:
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0

if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')

summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.37, 12.99]:
    points += 5
    print('\u2705 Great work! You correctly converted to Celcius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for resampling'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for resampling
Plot your resampled data
In [26]:
# Plot mean annual temperature
minwet89to23.plot(
    y='TCel',
    title='Observed Mean Temperature Over Time, Rapid City, 1989-2023',
    xlabel='Date',
    ylabel='Temperature (C)',
    legend=False)
Out[26]:
<Axes: title={'center': 'Observed Mean Temperature Over Time, Rapid City, 1989-2023'}, xlabel='Date', ylabel='Temperature (C)'>
Describe your plot
We like to use an approach called “Assertion-Evidence” for presenting
scientific results. There are a lot of video tutorials and example talks
available on the Assertion-Evidence web page. The main thing you need to
do now is to practice writing a message or headline rather
than descriptions or topic sentences for the plot you just made (what
they refer to as “visual evidence”).
For example, it would be tempting to write something like “A plot of
maximum annual temperature in Rapid City, South Dakota over time
(1983-2023)”. However, this doesn’t give the reader anything to look
at, or explain why we made this particular plot (we know, you made
this one because we told you to).
Some alternatives for different plots of Rapid City temperature that
are more of a starting point for a presentation or conversation are:
Rapid City, SD experienced cooler than average temperatures in
1995
Temperatures in Rapid City, SD appear to be on the rise over the
past 40 years
Maximum annual temperatures in Rapid City, SD are becoming more
variable over the previous 40 years
We could back up some of these claims with further analysis included
later on, but we want to make sure that our audience has some guidance
on what to look for in the plot.
**Temperatures in Rapid City, SD are trending upwards**
The yearly mean temperature in Rapid City appears to have been rising since 1989. A trend line would be helpful, yet the rise would also match what we expect with the warming climate.
Your turn: pick a new location and/or measurement to plot 🌏 📈
Below (or in a new notebook!), recreate the workflow you just did in a
place that interests you OR with a different measurement. See the
instructions above to adapt the URL that we created for Rapid City, SD
using the NCEI API. You will need to make your own new Markdown and Code
cells below this one, or create a new notebook.
Congratulations, you’re almost done with this coding challenge 🤩 – now make sure that your code is reproducible
If you didn’t already, go back to the code you modified above and
write more descriptive comments so the next person to use this
code knows what it does.
Make sure to Restart and Run all up at the top of your
notebook. This will clear all your variables and make sure that
your code runs in the correct order. It will also export your work
in Markdown format, which you can put on your website.
Always run your code start to finish before submitting!
Before you commit your work, make sure it runs reproducibly by
clicking:
Restart (this button won’t appear until you’ve run some code),
then
Run all
Below is some code that you can run that will save a Markdown file of
your work that is easily shareable and can be uploaded to GitHub Pages.
You can use it as a starting point for writing your portfolio post!
In [ ]:
# This cell is an overview of the entire process

# 1st: import pandas
import pandas as pd

# Check that the data was imported into a pandas DataFrame
type(rapid_df)

# Select variables of interest
rapid_df = rapid_df[['TOBS', 'PRCP']]
rapid_df

# Convert to Celsius
rapid_df['TCel'] = ((rapid_df['TOBS'] - 32) * (5 / 9))
rapid_df

# Subset the data for more focused analysis, set time frame and name it
weather1983to2023 = rapid_df.loc['1983':'2023']
weather1983to2023

# Resample the data to look at a `sum`, `min`, `max`, or `mean`;
# use `'W'`, `'M'`, or `'Y'` depending on whether you’re doing a
# weekly, monthly, or yearly look.
# Make sure to resample the subset created above.
# xxx = is the new data set / variable that contains the changes
minwet83to23 = weather1983to2023.resample('M').min()
minwet83to23
In [27]:
# 1st: import pandas
import pandas as pd
import numpy as np # for adding trendline to plot
import matplotlib.pyplot as plt # for plotting
In [28]:
# get data
lkwd_ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TOBS,PRCP'
    '&stations=USC00054762'
    '&startDate=1962-07-28'
    '&endDate=2024-05-05'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
lkwd_ncei_weather_url
# Check that the data was imported into a pandas DataFrame
type(lakewood_df)
Out[31]:
pandas.core.frame.DataFrame
In [32]:
# select variables of interest
lakewood_df = lakewood_df[['PRCP']]
lakewood_df
Out[32]:
            PRCP
DATE
1962-07-28  0.00
1962-07-29  0.00
1962-07-30  0.00
1962-07-31  0.00
1962-08-01  0.00
...          ...
2024-04-27  0.88
2024-04-28  0.76
2024-04-29  0.00
2024-04-30  0.00
2024-05-05  0.00
22449 rows × 1 columns
In [ ]:
#Convert to celcius
lakewood_df['TCel'] = ((lakewood_df['TOBS']-32) * (5/9))
lakewood_df
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805, in Index.get_loc(self, key)
   3804 try:
-> 3805     return self._engine.get_loc(casted_key)
   3806 except KeyError as err:
File index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7081, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7089, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'TOBS'
The above exception was the direct cause of the following exception:
KeyError                                  Traceback (most recent call last)
Cell In[34], line 2
      1 #Convert to celcius
----> 2 lakewood_df['TCel'] = ((lakewood_df['TOBS']-32) * (5/9))
      3 lakewood_df
File /opt/conda/lib/python3.11/site-packages/pandas/core/frame.py:4090, in DataFrame.__getitem__(self, key)
   4088 if self.columns.nlevels > 1:
   4089     return self._getitem_multilevel(key)
-> 4090 indexer = self.columns.get_loc(key)
   4091 if is_integer(indexer):
   4092     indexer = [indexer]
File /opt/conda/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
   3807     if isinstance(casted_key, slice) or (
   3808         isinstance(casted_key, abc.Iterable)
   3809         and any(isinstance(x, slice) for x in casted_key)
   3810     ):
   3811         raise InvalidIndexError(key)
-> 3812     raise KeyError(key) from err
   3813 except TypeError:
   3814     # If we have a listlike key, _check_indexing_error will raise
   3815     #  InvalidIndexError. Otherwise we fall through and re-raise
   3816     #  the TypeError.
   3817     self._check_indexing_error(key)
KeyError: 'TOBS'
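The KeyError above is raised because the `TOBS` column was dropped when only `PRCP` was selected, so there is nothing left to convert. One defensive pattern, sketched here with a three-row stand-in whose values are copied from the Lakewood output, is to check that the column exists before using it. The name `prcp_only_df` is made up for this illustration.

```python
import pandas as pd

# Stand-in for the Lakewood data after keeping only precipitation
prcp_only_df = pd.DataFrame(
    {'PRCP': [0.88, 0.76, 0.00]},
    index=pd.to_datetime(['2024-04-27', '2024-04-28', '2024-04-29']))

# Only convert when the Fahrenheit column is actually present
if 'TOBS' in prcp_only_df.columns:
    prcp_only_df['TCel'] = (prcp_only_df['TOBS'] - 32) * 5 / 9
else:
    print("No 'TOBS' column, skipping the Celsius conversion")
```

Alternatively, keeping `TOBS` in the column selection (`[['TOBS', 'PRCP']]`) would let the conversion run.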
In [35]:
# Subset the data for more focused analysis, set time frame and name it
lakewood_TP_1970to2023 = lakewood_df.loc['1970':'2023']
lakewood_TP_1970to2023 # run to check
Out[35]:
            PRCP
DATE
1970-01-01  0.00
1970-01-02  0.00
1970-01-03  0.00
1970-01-04  0.00
1970-01-05  0.05
...          ...
2023-12-27  0.03
2023-12-28  0.00
2023-12-29  0.00
2023-12-30  0.00
2023-12-31  0.00
19618 rows × 1 columns
In [38]:
# Resample the data to look at a `sum`, `min`, `max`, or `mean`;
# use `'W'`, `'M'`, or `'Y'` depending on whether you’re doing a
# weekly, monthly, or yearly look.
# Make sure to resample the subset created above.
# xxx = is the new data set / variable that contains the changes
lakewood_prcp70to23 = lakewood_TP_1970to2023.resample('YS').sum()
lakewood_prcp70to23
Out[38]:
             PRCP
DATE
1970-01-01  13.61
1971-01-01  13.84
1972-01-01  15.95
1973-01-01  24.98
1974-01-01  13.29
1975-01-01  18.07
1976-01-01  16.51
1977-01-01   8.97
1978-01-01  12.72
1979-01-01  19.75
1980-01-01  13.59
1981-01-01  11.18
1982-01-01  18.14
1983-01-01  21.99
1984-01-01  19.66
1985-01-01  15.20
1986-01-01  15.96
1987-01-01  24.27
1988-01-01  15.92
1989-01-01  16.79
1990-01-01  17.79
1991-01-01  19.30
1992-01-01  15.87
1993-01-01  14.46
1994-01-01  16.46
1995-01-01  20.08
1996-01-01  14.65
1997-01-01  18.66
1998-01-01  19.91
1999-01-01  21.25
2000-01-01  13.60
2001-01-01  16.06
2002-01-01  10.45
2003-01-01  17.75
2004-01-01  22.84
2005-01-01  16.72
2006-01-01  14.92
2007-01-01  16.33
2008-01-01  11.25
2009-01-01  23.48
2010-01-01  12.48
2011-01-01  20.43
2012-01-01  14.36
2013-01-01  23.59
2014-01-01  20.43
2015-01-01  27.45
2016-01-01  14.00
2017-01-01  16.04
2018-01-01  14.56
2019-01-01  17.91
2020-01-01  11.38
2021-01-01  15.30
2022-01-01  13.44
2023-01-01  22.04
In [39]:
lakewood_prcp70to23.plot(
    y='PRCP',
    title='Yearly Precipitation, Lakewood, CO, 1970-2023',
    xlabel='Date',
    kind='bar',
    legend=False,
    ylabel='Precipitation (in.)') # run to check
# Resetting the index
lakewood_prcp70to23 = lakewood_prcp70to23.reset_index()
lakewood_prcp70to23 # run to check
Out[40]:
          DATE   PRCP
0   1970-01-01  13.61
1   1971-01-01  13.84
2   1972-01-01  15.95
3   1973-01-01  24.98
4   1974-01-01  13.29
5   1975-01-01  18.07
6   1976-01-01  16.51
7   1977-01-01   8.97
8   1978-01-01  12.72
9   1979-01-01  19.75
10  1980-01-01  13.59
11  1981-01-01  11.18
12  1982-01-01  18.14
13  1983-01-01  21.99
14  1984-01-01  19.66
15  1985-01-01  15.20
16  1986-01-01  15.96
17  1987-01-01  24.27
18  1988-01-01  15.92
19  1989-01-01  16.79
20  1990-01-01  17.79
21  1991-01-01  19.30
22  1992-01-01  15.87
23  1993-01-01  14.46
24  1994-01-01  16.46
25  1995-01-01  20.08
26  1996-01-01  14.65
27  1997-01-01  18.66
28  1998-01-01  19.91
29  1999-01-01  21.25
30  2000-01-01  13.60
31  2001-01-01  16.06
32  2002-01-01  10.45
33  2003-01-01  17.75
34  2004-01-01  22.84
35  2005-01-01  16.72
36  2006-01-01  14.92
37  2007-01-01  16.33
38  2008-01-01  11.25
39  2009-01-01  23.48
40  2010-01-01  12.48
41  2011-01-01  20.43
42  2012-01-01  14.36
43  2013-01-01  23.59
44  2014-01-01  20.43
45  2015-01-01  27.45
46  2016-01-01  14.00
47  2017-01-01  16.04
48  2018-01-01  14.56
49  2019-01-01  17.91
50  2020-01-01  11.38
51  2021-01-01  15.30
52  2022-01-01  13.44
53  2023-01-01  22.04
In [41]:
# Extract the year from the DATE column and add it as a new variable
lakewood_prcp70to23['YEAR'] = lakewood_prcp70to23['DATE'].dt.year
lakewood_prcp70to23 # run to check
Out[41]:
          DATE   PRCP  YEAR
0   1970-01-01  13.61  1970
1   1971-01-01  13.84  1971
2   1972-01-01  15.95  1972
3   1973-01-01  24.98  1973
4   1974-01-01  13.29  1974
5   1975-01-01  18.07  1975
6   1976-01-01  16.51  1976
7   1977-01-01   8.97  1977
8   1978-01-01  12.72  1978
9   1979-01-01  19.75  1979
10  1980-01-01  13.59  1980
11  1981-01-01  11.18  1981
12  1982-01-01  18.14  1982
13  1983-01-01  21.99  1983
14  1984-01-01  19.66  1984
15  1985-01-01  15.20  1985
16  1986-01-01  15.96  1986
17  1987-01-01  24.27  1987
18  1988-01-01  15.92  1988
19  1989-01-01  16.79  1989
20  1990-01-01  17.79  1990
21  1991-01-01  19.30  1991
22  1992-01-01  15.87  1992
23  1993-01-01  14.46  1993
24  1994-01-01  16.46  1994
25  1995-01-01  20.08  1995
26  1996-01-01  14.65  1996
27  1997-01-01  18.66  1997
28  1998-01-01  19.91  1998
29  1999-01-01  21.25  1999
30  2000-01-01  13.60  2000
31  2001-01-01  16.06  2001
32  2002-01-01  10.45  2002
33  2003-01-01  17.75  2003
34  2004-01-01  22.84  2004
35  2005-01-01  16.72  2005
36  2006-01-01  14.92  2006
37  2007-01-01  16.33  2007
38  2008-01-01  11.25  2008
39  2009-01-01  23.48  2009
40  2010-01-01  12.48  2010
41  2011-01-01  20.43  2011
42  2012-01-01  14.36  2012
43  2013-01-01  23.59  2013
44  2014-01-01  20.43  2014
45  2015-01-01  27.45  2015
46  2016-01-01  14.00  2016
47  2017-01-01  16.04  2017
48  2018-01-01  14.56  2018
49  2019-01-01  17.91  2019
50  2020-01-01  11.38  2020
51  2021-01-01  15.30  2021
52  2022-01-01  13.44  2022
53  2023-01-01  22.04  2023
In [43]:
# Plot PRCP using .plot()
lakewood_prcp70to23.plot(y='PRCP', x='YEAR') # run to check
Out[43]:
<Axes: xlabel='YEAR'>
In [56]:
# From ChatGPT
# Define our figure and axis objects
fig, ax = plt.subplots(figsize=(6, 4))

# Compute linear regression
x = lakewood_prcp70to23['YEAR']
y = lakewood_prcp70to23['PRCP']

# Compute the slope (m) and intercept (b) of the line y = mx + b
m, b = np.polyfit(x, y, 1)

# Plot PRCP vs. YEAR as a bar chart
ax.bar(x, y, color='skyblue', edgecolor='white')

# Plot trend line
ax.plot(
    x, m * x + b, color='blue',
    label=f'Trend Line (R-squared = {np.corrcoef(x, y)[0, 1]**2:.2f})')

# Add legend
ax.legend()

# Add title and axis label
ax.set(
    title="Total Annual Precipitation\nLakewood, CO (1970-2023)",
    ylabel="Precipitation (in.)") # run to check
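The trend-line label in the cell above reports R-squared as the squared correlation coefficient between year and precipitation. For a straight-line least-squares fit, that is the same number you get from the residuals of the fitted line. The sketch below checks this on the first five annual totals from the Lakewood table; the variable names are illustrative, and a five-year sample is far too short to say anything about the real trend.

```python
import numpy as np

# First five annual totals (inches) from the Lakewood table above
years = np.array([1970, 1971, 1972, 1973, 1974])
precip = np.array([13.61, 13.84, 15.95, 24.98, 13.29])

# Least-squares line: precip is approximately slope * year + intercept
slope, intercept = np.polyfit(years, precip, 1)
fitted = slope * years + intercept

# R-squared from the residuals: 1 - SS_res / SS_tot
ss_res = np.sum((precip - fitted) ** 2)
ss_tot = np.sum((precip - precip.mean()) ** 2)
r_squared_residuals = 1 - ss_res / ss_tot

# R-squared as the squared correlation coefficient
r_squared_corr = np.corrcoef(years, precip)[0, 1] ** 2

print(round(r_squared_residuals, 4), round(r_squared_corr, 4))
```

A value near 0 means the line explains little of the year-to-year variation, so a low R-squared on the full series would suggest treating the precipitation trend with caution.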
# The Global Historical Climatology Network - Daily (GHCN-Daily) dataset
# integrates daily observations from 30 different sources. It includes
# 90,000 weather stations: 60,000 mostly collect precipitation data,
# while the others collect various meteorological data including daily
# maximum and minimum temperature, temperature at the time of
# observation, snowfall, snow depth, etc. The data are regularly synced
# and maintained.
# This is the data being accessed by the ncei_weather_url.
# Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill,
# Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose,
# Byron E. Gleason, and Tamara G. Houston (2012): Global Historical
# Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset
# used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ
# [access date].
In [ ]:
# The full code for creating the precipitation bar graph with trend line
# for Lakewood. The data are reproducible because they are read from a
# URL linking to NCEI NOAA weather data.

# 1st: import pandas
import pandas as pd
import numpy as np # for adding trendline to plot
import matplotlib.pyplot as plt # for plotting

# The kernel refers to the version of Python you are using. You should
# use the base kernel, which should be the default option.

# Get data
lkwd_ncei_weather_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1' # links to main page
    '?dataset=daily-summaries' # the rest of these show where the data is from
    '&dataTypes=TOBS,PRCP' # I used https://www.ncdc.noaa.gov/cdo-web/search
                           # to search stations and see what data is available
    '&stations=USC00054762' # and used the station number USC****etc
    '&startDate=1962-07-28' # can choose dates here, but easier to manage later
    '&endDate=2024-05-05'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=standard')
lkwd_ncei_weather_url

# This makes it readable in pandas
lakewood_df = pd.read_csv(
    lkwd_ncei_weather_url,
    index_col="DATE", # sets the date as index; year is extracted later
    parse_dates=True,
    na_values=["NaN"])
lakewood_df # run this

# Check that the data was imported into a pandas DataFrame
type(lakewood_df) # run to check

# Select variables of interest
lakewood_df = lakewood_df[['PRCP']]
lakewood_df # run to check

# Convert to Celsius. If working with temperature data, this may be
# useful: it creates a TCel column with the calculated F to C
# temperatures (TOBS stands for temperature observed). It is commented
# out here because TOBS was not kept in the selection above.
# lakewood_df['TCel'] = ((lakewood_df['TOBS'] - 32) * (5 / 9))

# Subset the data for more focused analysis, set time frame and name it
lakewood_TP_1970to2023 = lakewood_df.loc['1970':'2023']
lakewood_TP_1970to2023 # run to check

# Resample the data to look at a `sum`, `min`, `max`, or `mean`;
# use `'W'`, `'M'`, or `'Y'` depending on whether you’re doing a
# weekly, monthly, or yearly look.
# Make sure to resample the subset created above.
# xxx = is the new data set / variable that contains the changes
lakewood_prcp70to23 = lakewood_TP_1970to2023.resample('YS').sum()
lakewood_prcp70to23 # run to check

lakewood_prcp70to23.plot(
    y='PRCP',
    title='Yearly Precipitation, Lakewood, CO, 1970-2023',
    xlabel='Date',
    kind='bar',
    legend=False,
    ylabel='Precipitation (in.)') # run to check

# Resetting the index
lakewood_prcp70to23 = lakewood_prcp70to23.reset_index()
lakewood_prcp70to23 # run to check

# Extract the year from the DATE column and add it as a new variable
lakewood_prcp70to23['YEAR'] = lakewood_prcp70to23['DATE'].dt.year
lakewood_prcp70to23 # run to check

# Plot PRCP using .plot()
lakewood_prcp70to23.plot(y='PRCP', x='YEAR') # run to check

# From ChatGPT
# Define our figure and axis objects
fig, ax = plt.subplots(figsize=(6, 4))

# Compute linear regression
x = lakewood_prcp70to23['YEAR']
y = lakewood_prcp70to23['PRCP']

# Compute the slope (m) and intercept (b) of the line y = mx + b
m, b = np.polyfit(x, y, 1)

# Plot PRCP vs. YEAR as a bar chart
ax.bar(x, y, color='skyblue', edgecolor='white')

# Plot trend line
ax.plot(
    x, m * x + b, color='blue',
    label=f'Trend Line (R-squared = {np.corrcoef(x, y)[0, 1]**2:.2f})')

# Add legend
ax.legend()

# Add title and axis label
ax.set(
    title="Total Annual Precipitation\nLakewood, CO (1970-2023)",
    ylabel="Precipitation (in.)") # run to check