12232267 蒋夏婷
In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
1. Significant earthquakes since 2150 B.C.
1.1 [5 points] Compute the total number of deaths caused by earthquakes since 2150 B.C. in each country, and then print the top 20 countries along with the total number of deaths.
In [ ]:
data1[['Country', 'location']] = data1['Location Name'].str.split(':', n=1, expand=True)
total_deaths_city = data1.groupby(['Country']).sum().Deaths
print('The top 20 countries along with the total number of deaths are as follows:',
      total_deaths_city.sort_values(ascending=False).head(20))
The top 20 countries along with the total number of deaths are as follows:
Country
CHINA           2075019.0
TURKEY          1094479.0
IRAN             995403.0
ITALY            498477.0
SYRIA            369224.0
HAITI            323474.0
AZERBAIJAN       317219.0
JAPAN            277142.0
ARMENIA          191890.0
ISRAEL           160120.0
PAKISTAN         145080.0
ECUADOR          135479.0
IRAQ             120200.0
TURKMENISTAN     117412.0
PERU             101511.0
PORTUGAL          83506.0
GREECE            79278.0
CHILE             64269.0
INDIA             61940.0
TAIWAN            57134.0
Name: Deaths, dtype: float64
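The split-then-aggregate pattern used above can be sketched on toy data (the column names mirror the earthquake file; the rows are invented for illustration):

```python
import pandas as pd

# Toy stand-in for data1: 'Location Name' is 'COUNTRY: place', as in the NOAA file.
df = pd.DataFrame({
    'Location Name': ['CHINA: SHAANXI', 'CHINA: GANSU', 'IRAN: TABRIZ'],
    'Deaths': [830000.0, 200000.0, 40000.0],
})
# Split the country off the front, sum deaths per country, sort for a top-N view.
df[['Country', 'location']] = df['Location Name'].str.split(':', n=1, expand=True)
top = df.groupby('Country')['Deaths'].sum().sort_values(ascending=False).head(20)
```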
1.2 [10 points] Compute the total number of earthquakes with magnitude larger than 3.0 (use column Ms as the magnitude) worldwide each year, and then plot the time series. Do you observe any trend? Explain why or why not?
total_earth_mag30 = data1[data1.Mag > 3.0].groupby('Year').count()
In [ ]:
plt.plot(total_earth_mag30.index, total_earth_mag30['Location Name'])
plt.ylabel('total number of earthquakes')
plt.xlabel('year')

Out[ ]: Text(0.5, 0, 'year')
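The filter-then-count step can be checked on a small synthetic frame (`Year`/`Mag` columns assumed as in `data1`; `.size()` is used here since it counts rows without picking a column):

```python
import pandas as pd

# Synthetic quakes: two years, magnitudes above and below the 3.0 cutoff.
df = pd.DataFrame({'Year': [2000, 2000, 2001, 2001],
                   'Mag':  [5.1, 2.9, 4.0, 3.5]})
# Keep only Mag > 3.0, then count events per year.
counts = df[df.Mag > 3.0].groupby('Year').size()
```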
In [ ]:
def CountEq_LargestEq(country, df):
    data_country = df.loc[df.Country == country.upper(),
                          ['Year', 'Mo', 'Dy', 'Hr', 'location', 'Mag']]
    if data_country.Mag.isnull().all():
        return len(data_country), '', ''
    else:
        max_ind = data_country.loc[data_country.Mag.idxmax()].fillna('')
        total_number_eq = len(data_country)
        largest_date = (str(max_ind['Year']) + '-' + str(max_ind['Mo']) + '-'
                        + str(max_ind['Dy']) + '-' + str(max_ind['Hr']))
        largest_loca = max_ind['location'].lstrip()
        return total_number_eq, largest_date, largest_loca
In [ ]:
CountEq_LargestEq('china', data1)
print(data1.Mag.loc[data1.Country == 'SYRIAN COASTS'])
7 NaN
520 NaN
Name: Mag, dtype: float64
In [ ]:
all_country = []
countries = data1.Country.drop_duplicates(keep='first').drop(0, axis=0)
for i in countries:
    if i == i:  # NaN != NaN, so this skips missing country names
        all_country.append(CountEq_LargestEq(i, data1))
countries
0                            JORDAN
1                  BRITISH COLUMBIA
2                             KENYA
3                    ISRAEL; JORDAN
4                CALIFORNIA, MEXICO
                   ...
339                      COSTA RICA
340    W. LUZON ISLAND, PHILIPPINES
341    E. LUZON ISLAND, PHILIPPINES
342                      BANGLADESH
343                  NORTH CAROLINA
Name: country, Length: 344, dtype: object
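The `if i == i:` test in the loop above works because NaN is the only value that is not equal to itself, so the comparison is False exactly for missing entries. A minimal check:

```python
import numpy as np
import pandas as pd

# NaN != NaN, so `v == v` is False only for missing values;
# the country loop above relies on this to skip NaN entries.
values = pd.Series(['JORDAN', np.nan, 'KENYA'])
kept = [v for v in values if v == v]
```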
2. Air temperature in Shenzhen during the past 25 years
In this problem set, we will examine how air temperature changes in Shenzhen during the past 25 years using the hourly weather data measured at the BaoAn International Airport. The data set is from NOAA Integrated Surface Dataset. Download the file Baoan_Weather_1998_2022.csv, move the .csv file to your working directory.
Read pages 10-11 (POS 88-92 and POS 93-93) of the comprehensive user guide for the detailed format of the air temperature data (use column TMP). Explain how you filter the data in your report.
[10 points] Plot monthly averaged air temperature against the observation time. Is there a trend in monthly averaged air temperature in the past 25 years?
In [ ]:
c:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3444: DtypeWarning: Columns (4,8,9,10,11,14,15,24,25,27,29,31,34,37,38,40,41,45,49,50) have mixed types. Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
3. Global collection of hurricanes
The International Best Track Archive for Climate Stewardship (IBTrACS) project is the most complete global collection of tropical cyclones available. It merges recent and historical tropical cyclone data from multiple agencies to create a unified, publicly available, best-track dataset that improves inter-agency comparisons. IBTrACS was developed collaboratively with all the World Meteorological Organization (WMO) Regional Specialized Meteorological Centres, as well as other organizations and individuals from around the world.
In this problem set, we will use all storms available in the IBTrACS record since 1842. Download the file ibtracs.ALL.list.v04r00.csv, move the .csv file to your working directory. Read Column Variable Descriptions for variables in the file. Examine the first few lines of the file.
Below we provide an example to load the file as a pandas dataframe. Think about the options being used and why, and modify when necessary.
In [ ]:
data3 = pd.read_csv('ibtracs.ALL.list.v04r00.csv',
                    parse_dates=['ISO_TIME'],
                    na_values=['NOT_NAMED', 'NAME'])  # replace these two terms with NaN
data3
c:\ProgramData\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3444: DtypeWarning: Columns (5) have mixed types. Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
Out[ ]:
                  SID  SEASON  NUMBER BASIN SUBBASIN NAME             ISO_TIME NATURE     LAT
0       1842298N11080    1842       1    NI       BB  NaN  1842-10-25 06:00:00     NR  10.870
1       1842298N11080    1842       1    NI       BB  NaN  1842-10-25 09:00:00     NR  10.843
2       1842298N11080    1842       1    NI       BB  NaN  1842-10-25 12:00:00     NR  10.818
3       1842298N11080    1842       1    NI       BB  NaN  1842-10-25 15:00:00     NR  10.800
4       1842298N11080    1842       1    NI       AS  NaN  1842-10-25 18:00:00     NR  10.788
...               ...     ...     ...   ...      ...  ...                  ...    ...     ...
707176 rows × 17 columns
3.1 [5 points] Group the data on Storm Identifier (SID), and report the names (NAME) of the 10 largest hurricanes according to wind speed (WMO_WIND).
In [ ]:
data3_SID = data3.groupby('SID', as_index=False).max().sort_values('WMO_WIND', ascending=False)
print('10 largest hurricanes according to wind speed after grouping by storm identifier are:',
      data3_SID.NAME.head(10))

10 largest hurricanes according to wind speed after grouping by storm identifier are:
11015            RHONDA
11909             TALIM
11865             PERCY
11867            INGRID
11872    ADELINE:JULIET
11877             NESAT
11887             EMILY
11888           HAITANG
11905             MAWAR
11908           KATRINA
Name: NAME, dtype: object
C:\Users\duck\AppData\Local\Temp/ipykernel_16592/1194367031.py:1: FutureWarning: Dropping invalid columns in DataFrameGroupBy.max is deprecated. In a future version, a TypeError will be raised. Before calling .max, select only columns which should be valid for the function.
  data3_SID = data3.groupby('SID', as_index=False).max().sort_values('WMO_WIND', ascending=False)
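The FutureWarning above arises because `.max()` is applied to every column, including ones where it is invalid. Selecting the needed columns before aggregating avoids it; a sketch on invented stand-in values for `data3`:

```python
import pandas as pd

# Toy SID/NAME/WMO_WIND values; select columns first, then take the group max.
df = pd.DataFrame({'SID':      ['a', 'a', 'b', 'b'],
                   'NAME':     ['RHONDA', 'RHONDA', 'TALIM', 'TALIM'],
                   'WMO_WIND': [100.0, 165.0, 140.0, 155.0]})
peak = (df.groupby('SID', as_index=False)[['NAME', 'WMO_WIND']].max()
          .sort_values('WMO_WIND', ascending=False))
```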
3.2 [5 points] Make a bar chart of the wind speed (WMO_WIND) of the 20 strongest-wind hurricanes.
Assume the data processed in Q3.1 can be reused here.

In [ ]:
data3_lg20 = pd.to_numeric(data3_SID.WMO_WIND.head(20))
data3_lg20.index = data3_SID.NAME.head(20)
data3_lg20.plot.bar(ylabel='wind speed')

Out[ ]: <AxesSubplot:xlabel='NAME', ylabel='wind speed'>
3.3 [5 points] Plot the count of all datapoints by Basin as a bar chart.
In [ ]:
basin = data3.groupby('BASIN').count().SID
basin.plot.bar(ylabel='count of all datapoints')

Out[ ]: <AxesSubplot:xlabel='BASIN', ylabel='count of all datapoints'>
3.4 [5 points] Make a hexbin plot of the location of datapoints in Latitude and Longitude.
In [ ]:
# longitude on the x-axis, latitude on the y-axis
hb = plt.hexbin(data3.LON, data3.LAT, gridsize=180, cmap='YlGnBu')
plt.colorbar(hb)
plt.title('location of datapoints')
plt.show()
In [ ]:
3.6 [5 points] Create a filtered dataframe that contains only data since 1970 from the Western North Pacific (“WP”) and Eastern North Pacific (“EP”) Basin. Use this for the rest of the problem set.
In [ ]:
data3_6 = data3.loc[((data3.BASIN == 'WP') | (data3.BASIN == 'EP'))
                    & (data3.ISO_TIME >= '1970-01-01')]
data3_6
Out[ ]:
                  SID  SEASON  NUMBER BASIN SUBBASIN   NAME             ISO_TIME NATURE   LAT
350393  1970050N07151    1970      22    WP       MM  NANCY  1970-02-19 00:00:00     TS  7.00
350394  1970050N07151    1970      22    WP       MM  NANCY  1970-02-19 03:00:00     TS  7.24
350395  1970050N07151    1970      22    WP       MM  NANCY  1970-02-19 06:00:00     TS  7.50
350396  1970050N07151    1970      22    WP       MM  NANCY  1970-02-19 09:00:00     TS  7.75
350397  1970050N07151    1970      22    WP       MM  NANCY  1970-02-19 12:00:00     TS  8.00
...               ...     ...     ...   ...      ...    ...                  ...    ...   ...
176352 rows × 17 columns
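The same basin-and-time filter can be written with `.isin()`, which reads a little more directly; checked here on a toy frame whose `BASIN`/`ISO_TIME` columns mimic IBTrACS:

```python
import pandas as pd

# Keep WP/EP basins with timestamps from 1970 onward.
df = pd.DataFrame({'BASIN': ['WP', 'EP', 'NI', 'WP'],
                   'ISO_TIME': pd.to_datetime(['1970-02-19', '1965-01-01',
                                               '1980-05-05', '1999-12-31'])})
subset = df[df.BASIN.isin(['WP', 'EP']) & (df.ISO_TIME >= '1970-01-01')]
```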
3.7 [5 points] Plot the number of datapoints per day.
In [ ]:
data3_6['day'] = data3_6.ISO_TIME.dt.strftime('%Y-%m-%d')
datapoint_day = data3_6.groupby('day').count()
datapoint_day.SID.plot()
C:\Users\duck\AppData\Local\Temp/ipykernel_16592/2140946348.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data3_6['day'] = data3_6.ISO_TIME.dt.strftime('%Y-%m-%d')

Out[ ]: <AxesSubplot:xlabel='day'>
In [ ]:
def day_of_year(date):
    """Return the ordinal day of the year for a 'YYYY-MM-DD' string."""
    y, m, d = date.split('-')
    arr = np.array([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
    if (int(y) % 4 == 0 and int(y) % 100 != 0) or int(y) % 400 == 0:
        arr[2] = 29
    else:
        arr[2] = 28
    total = 0
    for i in range(1, int(m)):
        total = total + arr[i]
    total = total + int(d)
    return total
In [ ]:
In [ ]:
C:\Users\duck\AppData\Local\Temp/ipykernel_16592/3118200995.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data3_6['day_of_year'] = data3_6.apply(lambda x: day_of_year(x.day), axis=1)
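As an aside, pandas can produce the same ordinal directly and vectorized, without the hand-rolled `day_of_year` helper or the row-wise `apply` (leap years included):

```python
import pandas as pd

# .dt.dayofyear gives the ordinal day of the year for a datetime column.
ts = pd.Series(pd.to_datetime(['2000-03-01', '2001-03-01', '1970-12-31']))
doy = ts.dt.dayofyear
```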
3.9 [5 points] Calculate the anomaly of daily counts from the climatology.
In [ ]:
day_anomaly = data3_6.groupby('day_of_year').count()
day_anomaly['anomaly'] = day_anomaly['SID'] - day_anomaly.SID.mean()
day_anomaly
Out[ ]: SID SEASON NUMBER BASIN SUBBASIN NAME ISO_TIME NATURE LAT LON W
day_of_year
1 83 83 83 83 83 72 83 83 83 83
2 72 72 72 72 72 64 72 72 72 72
3 74 74 74 74 74 58 74 74 74 74
4 93 93 93 93 93 57 93 93 93 93
5 105 105 105 105 105 65 105 105 105 105
… … … … … … … … … … …
362 158 158 158 158 158 118 158 158 158 158
363 132 132 132 132 132 93 132 132 132 132
364 104 104 104 104 104 81 104 104 104 104
365 93 93 93 93 93 77 93 93 93 93
366 13 13 13 13 13 8 13 13 13 13
366 rows × 19 columns
3.10 [5 points] Resample the anomaly timeseries at annual resolution and plot. So which years stand out as having anomalous hurricane activity?
In [ ]:
data3_resample = data3_6.resample('Y', on='ISO_TIME').count()[['SID']]
data3_resample['anomaly'] = data3_resample['SID'] - data3_resample.SID.mean()
data3_resample[['anomaly']].plot()

Out[ ]: <AxesSubplot:xlabel='ISO_TIME'>

In [ ]:
print('The year that stands out as having anomalous hurricane activity is:',
      data3_resample['anomaly'].idxmax())

The year that stands out as having anomalous hurricane activity is: 1992-12-31 00:00:00
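The resample-then-`idxmax` pattern above can be checked on invented timestamps where one year clearly has the most datapoints:

```python
import pandas as pd

# Five datapoints, three of them in 1992; resample annually and find the peak.
s = pd.Series(1, index=pd.to_datetime(
    ['1991-06-01', '1992-06-01', '1992-07-01', '1992-08-01', '1993-06-01']))
annual = s.resample('Y').count()
anomaly = annual - annual.mean()
peak_year = anomaly.idxmax().year
```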
4
Download data from https://www.ncei.noaa.gov/access/search/data-search/global-summary-of-the-month?pageNum=2: station WATTON 2 WSW, MI US (USC00208706.csv).
In [ ]:
data4

Out[ ]:
                STATION  LATITUDE  LONGITUDE  ELEVATION                 NAME    …    …     …
DATE
2005-07-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN  49.1
2005-08-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN  38.6
2005-09-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN  18.9
2005-10-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN   6.4
2005-11-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN   0.0
...                 ...       ...        ...        ...                  ...  ...  ...   ...
2022-05-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US 11.7  NaN  11.7
2022-06-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US 46.7  NaN  35.0
2022-07-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN   NaN
2022-08-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN  27.0
2022-09-01  USC00208706  46.52664  -88.64243      424.9  WATTON 2 WSW, MI US  NaN  NaN  10.2

In [ ]:
In [ ]:
4.3 [5 points] Conduct at least 5 simple statistical checks with the variable, and report your findings.
In [ ]:
# check the basic distribution of cooling degree days (CLDD)
print(data4['CLDD'].describe())

count 185.000000
mean 9.860000
std 16.944379
min 0.000000
25% 0.000000
50% 0.000000
75% 13.000000
max 81.200000
Name: CLDD, dtype: float64
In [ ]:
# min-max normalization of the data
data4_std = data4[['TAVG', 'PRCP', 'SNOW']].apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)))
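The min-max scaling in that lambda maps each column onto [0, 1]; a quick sanity check on toy values:

```python
import numpy as np
import pandas as pd

# Each column is rescaled so its minimum becomes 0 and its maximum becomes 1.
df = pd.DataFrame({'TAVG': [10.0, 20.0, 30.0], 'PRCP': [0.0, 50.0, 100.0]})
df_std = df.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)))
```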
In [ ]:
# calculate annual statistics (resample('Y') aggregates by year)
month_avg = data4_std.resample('Y').mean()
month_max = data4_std.resample('Y').max()
month_min = data4_std.resample('Y').min()
In [ ]:
# plot three main variables
month_avg[['TAVG', 'PRCP', 'SNOW']].plot(title='annual average', kind='bar')
# report the years with the extreme annual averages
print('The maximum annual average temperature year is:', month_avg[['TAVG']].idxmax()[0].year)
print('The minimum annual average temperature year is:', month_avg[['TAVG']].idxmin()[0].year)
print('The maximum annual average snow depth year is:', month_avg[['SNOW']].idxmax()[0].year)
print('The minimum annual average snow depth year is:', month_avg[['SNOW']].idxmin()[0].year)

The maximum annual average temperature year is: 2005
The minimum annual average temperature year is: 2014
The maximum annual average snow depth year is: 2014
The minimum annual average snow depth year is: 2015
In [ ]:
month_max[['TAVG', 'PRCP', 'SNOW']].plot(title='annual maximum')
print('The minimum annual maximum year is:', month_max[['TAVG']].idxmin()[0].year)

The minimum annual maximum year is: 2009