Preliminaries: package installation and mounting Google Drive
!pip install pandas fiona shapely pyproj rtree folium matplotlib mapclassify
!pip install geopandas
!pip install openturns
from google.colab import drive
drive.mount('/content/drive')
import geopandas as gpd
import pandas as pd
In analyzing extreme rainfall, we can start by examining the historical maximum rainfall that occurred at each station.
# other preliminaries to mapping
import matplotlib.pyplot as plt
# provincial shape file for mapping
province_loc = "drive/MyDrive/Colab Notebooks/Data/provinces/provinces.geojson"
province = gpd.read_file(province_loc)
# reading historical rainfall maxima per station
stations = pd.read_csv("drive/MyDrive/Colab Notebooks/Data/NOAA PH Weather Stations.csv")
stations.head()
# plotting historical maximum 1-day rainfall (Rx1day) per station
fig, ax1 = plt.subplots(figsize=(10, 10))
# basemap
province.plot(ax=ax1, alpha = 0.1)
# layer
plot_stations = gpd.GeoDataFrame(stations, geometry=gpd.points_from_xy(stations.LON, stations.LAT), crs = 4326)
plot_stations.plot(ax = ax1,column = 'Rx1day (mm)', legend = True)
plt.title("Historical Single Day Rainfall Amount (mm) Recorded by PH Weather Stations")
Data Sources:
An initial observation is that the northern and eastern parts of the country experienced the highest rainfall magnitudes compared to the rest of the country.
However, this does not answer the question: how frequent are these extreme events?
Obtaining probabilities is essential for risk assessment in the Philippines, especially since the country is ranked first in the World Risk Index (World Risk Report 2022), and the analysis becomes more insightful when we quantify the probabilities of extreme events.
In probabilistic risk assessment in hydrology and flood modelling, the concept of "return periods" or "recurrence intervals" is commonly used.
For example, if 400 mm of rainfall is a 50-year return period event, then it has a 1/50 = 2% chance of being exceeded within a year. Another intuitive interpretation is that 400 mm of rainfall is expected to occur about once every 50 years.
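These two interpretations can be checked numerically. The sketch below (plain Python, no project-specific names; the function names are ours) converts a return period T into an annual exceedance probability, and also computes the chance of at least one exceedance over a planning horizon, assuming independent years:

```python
# Relationship between return period and exceedance probability.

def annual_exceedance_prob(T):
    """Annual probability that a T-year event is exceeded."""
    return 1.0 / T

def prob_at_least_one_exceedance(T, n_years):
    """Probability of at least one exceedance of a T-year event
    within n_years, assuming independent years."""
    return 1.0 - (1.0 - 1.0 / T) ** n_years

print(annual_exceedance_prob(50))            # 0.02 (2% per year)
print(prob_at_least_one_exceedance(50, 50))  # ~0.64, not a certainty
```

Note that a 50-year event is not guaranteed to occur within 50 years; the probability of seeing at least one is only about 64%.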
Another example: how can we ensure that a dam to be constructed will last at least 100 years if we do not have 100 years' worth of data? Using the available data, we extrapolate and estimate the single-day rainfall magnitude expected to occur once every 100 years.
With this approach, even if the observed record is short (i.e., it does not cover 50-100 years), we can estimate the amount of rainfall that is exceeded once every 50 years, once every 100 years, and so on, to support the planning of big-ticket infrastructure such as dams and bridges.
This is also helpful for estimating more frequent extreme events, such as the 5-, 10-, and 15-year return levels, since these are the events people encounter most often, which has great utility in disaster risk reduction and mitigation.
rainfall_max = pd.read_csv("drive/MyDrive/Colab Notebooks/Data/max.spread.csv")
rainfall_max.head()
We fit a univariate Generalized Extreme Value (GEV) distribution per station by estimating the location $\mu$, scale $\sigma$, and shape $\xi$ parameters using Maximum Likelihood Estimation. We will use the `openturns` package to fit the distribution.
import openturns as ot
import numpy as np
After estimating the distribution, return levels can be computed. As a preliminary, note that the $p^{th}$ quantile of a random variable that follows GEV distribution is
\begin{equation} \hat{x}_p=\left\{ \begin{array}{@{}ll@{}} \hat\mu+\dfrac{\hat\sigma\left(1-(-\log(p))^{\hat\xi}\right)}{\hat\xi}, & \text{if}\ \hat\xi \neq 0 \\ \hat\mu-\hat\sigma\log(-\log(p)), & \text{if}\ \hat\xi = 0 \end{array}\right. \end{equation}

Then, the T-year return level $x_T$ is calculated accordingly by setting $$ p = 1-\alpha=1-\frac{1}{T} $$
The following function `qgevd` computes the $p^{th}$ quantile from an estimated distribution.
# return level function
def qgevd(p, location, scale, shape):
if shape !=0:
quantile = location + (scale*(1-(-np.log(p))**shape))/shape
elif shape ==0:
quantile = location - scale*(np.log(-np.log(p)))
return quantile
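As a quick sanity check (with hypothetical parameter values, not fitted to any station), the Gumbel case $\hat\xi = 0$ reduces to $\hat\mu - \hat\sigma\log(-\log p)$, and the return levels should grow monotonically with the return period. The function is redefined below so the cell is self-contained:

```python
import numpy as np

def qgevd(p, location, scale, shape):
    if shape != 0:
        return location + (scale * (1 - (-np.log(p)) ** shape)) / shape
    return location - scale * np.log(-np.log(p))

# Hypothetical Gumbel-like station: mu = 100 mm, sigma = 50 mm, xi = 0
levels = [qgevd(1 - 1 / T, 100, 50, 0) for T in (5, 10, 25, 50, 100)]
print(np.round(levels, 1))

# Return levels must increase with the return period T
assert all(a < b for a, b in zip(levels, levels[1:]))
```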
The following code outputs a dataframe of parameters and T-year rainfall return level per station.
# collect one row of parameters and return levels per station
pars_rows = []
for i in range(1, len(rainfall_max.columns)):
    # getting the column containing data for station i
    station_data = rainfall_max.iloc[:, i]
    # removing NaN
    station_data = station_data[np.logical_not(np.isnan(station_data))]
    # converting to an ot.Sample object
    sample = ot.Sample([[p] for p in station_data])
    # fitting a GEVD to the data
    gev = ot.GeneralizedExtremeValueFactory().buildAsGeneralizedExtremeValue(sample)
    # extracting the parameters
    loc0 = list(gev.getParameter())[0]
    scale0 = list(gev.getParameter())[1]
    shape0 = -list(gev.getParameter())[2]  # negated to match the sign convention used in qgevd
    # T-year return level calculation
    rl_5 = qgevd(p=1 - 1 / 5, location=loc0, scale=scale0, shape=shape0)
    rl_10 = qgevd(p=1 - 1 / 10, location=loc0, scale=scale0, shape=shape0)
    rl_25 = qgevd(p=1 - 1 / 25, location=loc0, scale=scale0, shape=shape0)
    rl_50 = qgevd(p=1 - 1 / 50, location=loc0, scale=scale0, shape=shape0)
    rl_100 = qgevd(p=1 - 1 / 100, location=loc0, scale=scale0, shape=shape0)
    pars_rows.append([rainfall_max.columns[i], loc0, scale0, shape0,
                      rl_5, rl_10, rl_25, rl_50, rl_100])
# building the parameter dataframe (DataFrame.append was removed in pandas 2.0)
pars = pd.DataFrame(pars_rows,
                    columns=['station_name', 'location', 'scale', 'shape',
                             'rl_5', 'rl_10', 'rl_25', 'rl_50', 'rl_100'])
# we concatenate the parameter dataframe with the initial dataframe containing the PAGASA station coordinates
stations_pars = pd.concat([stations, pars], axis=1)
plot_stations_pars = gpd.GeoDataFrame(stations_pars, geometry=gpd.points_from_xy(stations_pars.LON, stations_pars.LAT), crs = 4326)
plot_stations_pars
The visualization below shows the amount of rainfall (mm) that comes once in every 5 years and once in every 100 years per weather station in the Philippines.
fig, (ax1, ax2) = plt.subplots(1,2)
# basemap
province.plot(ax=ax1, alpha = 0.1)
# point layer
plot_stations_pars.plot(column = 'rl_5',legend = True,vmin = 100, vmax = 600, ax = ax1)
# Title
ax1.set_title("5-year Rainfall Return Levels (mm)")
# basemap
province.plot(ax=ax2, alpha = 0.1)
# point layer
plot_stations_pars.plot(column = 'rl_100',legend = True,vmin = 100, vmax = 600, ax = ax2)
# Title
ax2.set_title("100-year Rainfall Return Levels (mm)")
# setting chart size
plt.gcf().set_size_inches(20, 10)
The left chart shows the amount of rainfall that comes every 5 years. The chart on the right is the amount of rainfall that comes every 100 years. Regions in the northern and eastern part receive more extreme rainfall levels.
A major caveat is that the data has not been cleaned yet, and the date range and data availability differ across stations, which makes some of the estimates inaccurate.
Now that we have estimated the parameters and the return levels per weather station, it would be more useful if the estimates were at the provincial level. For this, we perform spatial interpolation of the estimated parameters.
province.explore()
In the plot below, the red points are the provincial centroids while the green points are the locations of the weather stations.
province_centroid = province.centroid
fig, ax1 = plt.subplots(figsize=(10, 8))
province.plot(ax = ax1, alpha = 0.2, linewidth = 1)
province_centroid.plot(ax = ax1, color = 'red', label = 'Province Centroid')
plot_stations.plot(ax = ax1,color = 'green', label = 'PAGASA Weather Station')
leg = ax1.legend()
To recall, we have two geodataframes: `province`, which contains the boundaries of each province; and `plot_stations_pars`, which contains the location of each weather station and the GEVD parameters to be used in computing T-year return levels. For now, we will assume that the values estimated at the provincial centroid represent the whole province.
First, we define the interpolation method: Ordinary Kriging. The interpolated value at an unsampled point $x_0$ is a weighted average of the observed values:
$$ \hat{Z}(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i), \qquad \sum_{i=1}^{n} \lambda_i = 1 $$
where:
- $Z(x_i)$ is the observed value (here, a GEVD parameter) at weather station $x_i$;
- $\lambda_i$ are the kriging weights, obtained from the fitted variogram model; and
- $n$ is the number of stations.
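To make the method concrete, here is a minimal sketch of the ordinary kriging system solved for a single target point, using a linear variogram $\gamma(h) = b\,h$ with a hypothetical slope and made-up station values; pykrige does this (plus variogram fitting) internally:

```python
import numpy as np

# Hypothetical station coordinates and observed values (not real data)
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([10.0, 12.0, 11.0, 15.0])
target = np.array([0.5, 0.5])

def linear_variogram(h, slope=1.0):
    # assumed variogram model: gamma(h) = slope * h
    return slope * h

n = len(z)
# Pairwise distances among stations, and station-to-target distances
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
d0 = np.linalg.norm(xy - target, axis=1)

# Ordinary kriging system: variogram matrix bordered by a
# row/column of ones for the Lagrange multiplier (unbiasedness)
A = np.ones((n + 1, n + 1))
A[:n, :n] = linear_variogram(d)
A[n, n] = 0.0
b = np.append(linear_variogram(d0), 1.0)

weights = np.linalg.solve(A, b)[:n]
estimate = weights @ z
print(weights.sum(), estimate)  # weights sum to 1; here estimate = mean by symmetry
```

Because the target sits at the center of a symmetric station layout, each weight comes out to 0.25 and the estimate equals the simple mean (12.0); with irregular station spacing, nearer stations receive larger weights.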
We will use the `pykrige` package to accomplish the spatial interpolation.
!pip install pykrige
import pykrige as pyk
Now, create the ordinary kriging object by plugging in the required data as arrays: the x and y coordinates, and the value to be interpolated.
We perform this three times, one per parameter (location $\mu$, scale $\sigma$, and shape $\xi$) to be interpolated.
# Create the ordinary kriging objects
# location \mu
OK_loc = pyk.OrdinaryKriging(
np.array(plot_stations_pars['LON']),
np.array(plot_stations_pars['LAT']),
np.array(plot_stations_pars['location']),
variogram_model="linear",
verbose=False,
enable_plotting=False,
)
# scale \sigma
OK_scale = pyk.OrdinaryKriging(
np.array(plot_stations_pars['LON']),
np.array(plot_stations_pars['LAT']),
np.array(plot_stations_pars['scale']),
variogram_model="linear",
verbose=False,
enable_plotting=False,
)
# shape \xi
OK_shape = pyk.OrdinaryKriging(
np.array(plot_stations_pars['LON']),
np.array(plot_stations_pars['LAT']),
np.array(plot_stations_pars['shape']),
variogram_model="linear",
verbose=False,
enable_plotting=False,
)
We now use the ordinary kriging object for the interpolation on arbitrary points (provincial centroids).
# z is the estimate, ss is the variance
z_loc, ss_loc = OK_loc.execute(style = "points",
xpoints = np.array(province.centroid.x),
ypoints = np.array(province.centroid.y))
z_scale, ss_scale = OK_scale.execute(style = "points",
xpoints = np.array(province.centroid.x),
ypoints = np.array(province.centroid.y))
z_shape, ss_shape = OK_shape.execute(style = "points",
xpoints = np.array(province.centroid.x),
ypoints = np.array(province.centroid.y))
province["location"] = z_loc
province["scale"] = z_scale
province["shape"] = z_shape
We then apply the `qgevd` function per province to estimate the corresponding 5-, 10-, 25-, 50-, and 100-year return levels.
Recall that each row in the `province` dataframe represents a province.
# 5-year return level
province['rl_5'] = province.apply(lambda z: qgevd(1-1/5, z['location'], z['scale'], z['shape']),
axis = 1,result_type='expand')
# 10-year return level
province['rl_10'] = province.apply(lambda z: qgevd(1-1/10, z['location'], z['scale'], z['shape']),
axis = 1,result_type='expand')
# 25-year return level
province['rl_25'] = province.apply(lambda z: qgevd(1-1/25, z['location'], z['scale'], z['shape']),
axis = 1,result_type='expand')
# 50-year return level
province['rl_50'] = province.apply(lambda z: qgevd(1-1/50, z['location'], z['scale'], z['shape']),
axis = 1,result_type='expand')
# 100-year return level
province['rl_100'] = province.apply(lambda z: qgevd(1-1/100, z['location'], z['scale'], z['shape']),
axis = 1,result_type='expand')
province.head()
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1,4, figsize = (28,7))
# 5-year return level
province.plot(column = 'rl_5', legend = True, vmin = 120, vmax = 500,ax = ax1)
ax1.set_title("5-year Rainfall Return Levels (mm)")
#10-year return level
province.plot(column = 'rl_10', legend = True, vmin = 120, vmax = 500,ax = ax2)
ax2.set_title("10-year Rainfall Return Levels (mm)")
#25-year return level
province.plot(column = 'rl_25', legend = True, vmin = 120, vmax = 500,ax = ax3)
ax3.set_title("25-year Rainfall Return Levels (mm)")
# 50-year return level
province.plot(column = 'rl_50', legend = True, vmin = 120, vmax = 500,ax = ax4)
ax4.set_title("50-year Rainfall Return Levels (mm)")
province.plot(figsize = (8,8),column = 'rl_100', legend = True, vmin = 120, vmax = 500)
plt.title("100-year Rainfall Return Levels (mm)")
Luzon and the eastern portion of the Visayas experience the most extreme rainfall in the country, and more frequently.
For future studies, we can downscale this to the city level while considering landscape features such as mountains, water bodies, infrastructure, land-cover characteristics, and components of the climate system.
This is particularly useful for the Philippines given its complex topography. For example, the rainfall intensities recorded in Port Area, Manila differ from those recorded in Science Garden, Quezon City, despite both stations being in the same region (NCR).
To summarize, these are the steps we took:
1. Read the historical maximum 1-day rainfall (Rx1day) per weather station.
2. Fit a GEV distribution per station via Maximum Likelihood Estimation using openturns.
3. Compute the 5-, 10-, 25-, 50-, and 100-year return levels from the fitted parameters.
4. Interpolate the GEVD parameters to the provincial centroids using Ordinary Kriging (pykrige).
5. Compute and map the return levels at the provincial level.