Solar energy¶
The majority of RES assets comes from solar energy plants in Germany. Further data updates will show total numbers but until now, more than 1.4 million of approximately 1.9 millions solar plants are recorded (2020). Solar energy includes classic solar installations on private roofs as well as solar parks and photovoltaic (PV) systems on commercial estates. The following section defines data structure of requested solar data.
Data structure¶
Requested solar data from MaStR can be divided into categories with the most important features shown below:
general data: plant name, MaStR ID, location (federal state, county, municipality), date of notification, date of initial operation (ddei), date of decommissioning (ddes), status, verification by grid operator, etc.
technical data: gross electrical output (installed power), modules, connected inverter, type of feed-in, module orientation etc.
grid connection: voltage level, location
EEG (German Renewable Energies Act): EEG ID
EEDATEN uses MariaDB as database management system (DBMS) to save and manage requested data. It should be noted that received data may contain errors and missing values what makes data preprocessing and optionally further analyses necessary.
Historical development¶
This section describes the process of creating choropleth maps to plot development of solar energy in Germany historically.
Filter preprocessed data¶
With SQLAlchemy preprocessed data can be requested and loaded from connected database to Python.
import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
import pandas as pd
import geopandas as gpd
SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://db_user:db_pw@localhost/database'
engine = sqlalchemy.create_engine(SQLALCHEMY_DATABASE_URI, echo=True)
Base = automap_base()
Base.prepare(engine, reflect=True)
SolareStrahlungsenergie = Base.classes.SolareStrahlungsenergie
session = Session(engine)
#filter solar, year not null, location not null, status not null, mastr id like 'SEE%'
df_solar = pd.DataFrame(data=session.query(SolareStrahlungsenergie.mastrnummer, SolareStrahlungsenergie.betriebsstatus, SolareStrahlungsenergie.bruttoleistung, SolareStrahlungsenergie.bundesland, SolareStrahlungsenergie.landkreis, SolareStrahlungsenergie.gemeinde, SolareStrahlungsenergie.gemeindeschluessel, SolareStrahlungsenergie.ddei, SolareStrahlungsenergie.ddes).filter(
SolareStrahlungsenergie.betriebsstatus is not None, SolareStrahlungsenergie.mastrnummer.like('SEE%'), SolareStrahlungsenergie.bundesland is not None, SolareStrahlungsenergie.landkreis is not None, SolareStrahlungsenergie.gemeinde is not None).all())
As result we get a pandas dataframe of preprocessed and filtered data for solar energy. The following command prints the first rows of the dataframe
print(df_solar.head().to_string())
The head of the dataframe with the columns MaStR ID, status, installed power (kW), federal state, county, municipality, key, date of initial operation and date of decommissioning looks like:
mastrnummer betriebsstatus bruttoleistung bundesland landkreis gemeinde gemeindeschluessel ddei ddes
0 SEE984033548619 In Betrieb 3.96 Nordrhein-Westfalen Münster Münster 05515000 2007-07-20 NaT
1 SEE901901460125 In Betrieb 7.41 Baden-Württemberg Ostalbkreis Schwäbisch Gmünd 08136065 2013-01-31 NaT
2 SEE983679054270 In Betrieb 5.04 Brandenburg Havelland Nauen 12063208 2016-02-19 NaT
3 SEE978732598938 In Betrieb 6.36 Bayern Regensburg Pentling 09375180 2016-12-16 NaT
4 SEE970592691989 In Betrieb 7.20 Saarland Saarlouis Saarlouis 10044115 2011-08-12 NaT
Mapping¶
After we loaded RES master data to Python, mapping process follows next. Therefore RES data are connected with spatial data. Firstly, we load county map data in shapefile or geojson format to Python.
#geoinformation
gdf_county = gpd.read_file("/path/to/.shp")
# new columns for aggregated electrical capacity (important for choropöeth later)
gdf_county['bruttoleistung_je_landkreis_2020'] = 0
gdf_county['bruttoleistung_je_landkreis_2015'] = 0
gdf_county['bruttoleistung_je_landkreis_2010'] = 0
gdf_county['bruttoleistung_je_landkreis_2005'] = 0
#loop geodataframe to get information of each feature
for index, row in gdf_county.iterrows():
#location filter for adminstration level county (distinguish between Landkreis, kreisfreie Stadt, etc.)
if (row.BEZ == "Kreisfreie Stadt") or (row.BEZ == "Stadtkreis"):
filter = (df_solar.landkreis == row.GEN) & (df_solar.gemeinde == row.GEN)
#lookup_kfstadt list of kreisfreie Städte
elif ((row.BEZ == "Landkreis") or (row.BEZ == "Kreis")) & lookup_kfstadt(row.GEN) == True:
filter = (df_solar.landkreis == row.GEN) & (df_solar.gemeinde != row.GEN)
else:
filter = (df_solar.landkreis == row.GEN)
# incoming/ outgoing until 2005
solar_2005_zu = sum(df_solar.bruttoleistung[filter & (df_solar.ddei <= '2005-12-31')])
solar_2005_ab = sum(df_solar.bruttoleistung[filter & (df_solar.ddes <= '2005-12-31')])
# incoming/ outgoing until 2010
solar_2010_zu = sum(df_solar.bruttoleistung[filter & (df_solar.ddei <= '2010-12-31')])
solar_2010_ab = sum(df_solar.bruttoleistung[filter & (df_solar.ddes <= '2010-12-31')])
# incoming/ outgoing until 2015
solar_2015_zu = sum(df_solar.bruttoleistung[filter & (df_solar.ddei <= '2015-12-31')])
solar_2015_ab = sum(df_solar.bruttoleistung[filter & (df_solar.ddes <= '2015-12-31')])
# status "active"
solar_2020_zu = sum(df_solar.bruttoleistung[filter & (df_solar.betriebsstatus =='In Betrieb')])
#calculate and assign difference
gdf_county['bruttoleistung_je_landkreis_2005'].iloc[index] = solar_2005_zu - solar_2005_ab
gdf_county['bruttoleistung_je_landkreis_2010'].iloc[index] = solar_2010_zu - solar_2010_ab
gdf_county['bruttoleistung_je_landkreis_2015'].iloc[index] = solar_2015_zu - solar_2015_ab
gdf_county['bruttoleistung_je_landkreis_2020'].iloc[index] = solar_2020_zu
#geopandas dataframe to geojson
gdf_county.to_file('/path/to/.geojson', driver='GeoJSON')
Mapping is done by looping through geodaframe of county spatial data and assign to each feature (county) electrical capacity based on the difference between incoming and outgoing plants for specific years. For each county in gdf_county incoming and outgoing electrical capacity is calculated and afterwards the difference is assigned to new created column for defined years. Finally, processed geodataframe is saved as geojson file in order to take the file as input for plotting choropleth maps in the next section.
Plot choropleth¶
In choropleth maps, features are colored according their individual specification. In this example each county as feature is colored according to its electrical capacity. Therefore we load the above created geodataframe to Python.
#package to plot choropleth map
import matplotlib.pyplot as plt
#load geodataframe with mapped electrical capacity for each feature for each year
gdf_mapped = gpd.read_file('/path/to/.geojson')
jahr = ['2005', '2010', '2015', '2020']
fig, axs = plt.subplots(2, 2, figsize=(12, 9),
facecolor='w',
constrained_layout=True,
subplot_kw=dict(aspect='equal'))
axs = axs.ravel()
#generates plot for each year
for index in range(0,4):
axs[index].set_title("Installierte Leistung PV in " + jahr[index])
#analyzed column in geodataframe
gdf_mapped.plot(column=gdf_mapped.columns[26-index], ax=axs[index], cmap='YlOrBr', vmin=0, vmax=350000)
#highlight boundary
gdf_mapped.boundary.plot(ax=axs[index], edgecolor='gainsboro', linewidth=0.2)
axs[index].get_xaxis().set_visible(False)
axs[index].get_yaxis().set_visible(False)
# legend, labels
patch_col = axs[0].collections[0]
cb = fig.colorbar(patch_col, ax=axs, shrink=0.5, ticks=[0, 50000, 100000, 150000, 200000, 250000, 300000, 350000])
cb.set_label('Leistung in MW')
cb.set_ticklabels([' 0', '50', '100', '150', '200', '250', '300', '350'])
plt.savefig('/path/to/.jpg', dpi=400)
The next step consists of creating 2x2 subplots and loop through it. The length of the loop is determined by the amount of years that will be analyzed. For each year we plot the geodataframe and specify the column for colormap (cmap) parameter. Next, we highlight the boundaries of the dataframe and finally set legend parameters. The result looks like:
Interactive choropleth map¶
In the presented example above we used matplotlib.pyplot to plot choropleth maps. Another option to create interactive choropleth maps offers the package plotly.express. The mentioned package contains choropleth function to plot interactive maps and for this purpose firstly we have to create another dataframe in addition to existing geoinformation (geojson).
#read geoinformation
gdf_county = gpd.read_file("/path/to/.shp")
#df_solar: requested RES data (see section "Filter preprocessed data")
#empty list
df = []
#years under investigation
jahre = ['2020-12-31']
#loop features (counties) and assign electrical power for each year
for index, row in gdf_county.iterrows():
#location filter for adminstration level county (distinguish between Landkreis, kreisfreie Stadt, etc.)
if (row.BEZ == "Kreisfreie Stadt") or (row.BEZ == "Stadtkreis"):
filter = (df_solar.landkreis == row.GEN) & (df_solar.gemeinde == row.GEN)
elif ((row.BEZ == "Landkreis") or (row.BEZ == "Kreis")) & lookup_kfstadt(row.GEN) == True:
filter = (df_solar.landkreis == row.GEN) & (df_solar.gemeinde != row.GEN)
else:
filter = (df_solar.landkreis == row.GEN)
#incoming plants
zu = np.zeros(len(jahre))
#outgoing plants
ab = np.zeros(len(jahre))
#loop years
for jahr in range(0, len(jahre)):
#electrical capacity incoming plants for specific year
zu[jahr] = sum(df_solar.bruttoleistung[filter & (df_solar.ddei <= jahre[jahr])])
#electrical capacity outgoing plants for specific year
ab[jahr] = sum(df_solar.bruttoleistung[filter & (df_solar.ddes <= jahre[jahr])])
df.append([row.AGS.zfill(5), pd.to_datetime(jahre[jahr]).year, round(zu[jahr] - ab[jahr], 0)])
#create dataframe with columns key, date, year capacity
dff = pd.DataFrame(data=df, columns=['AGS', 'datum', 'leistung'])
#save dataframe as .csv
dff.to_csv('/path/to/.csv')
The code above loops through the counties (features) and assigns the electrical capacity for the year (years) considered in this analysis (year 2020). We get a list with feature key, year and electrical capacity. The results are saved in format .csv and serves as input for choropleth function as well as geoinformation in format .geojson. Additionally, we specify “featureidkey” as column in gdf_county that matches with key column from .csv file.
import plotly.express as px
#electrical capacity mapped for each feature in separate .csv file
#contains column (in this example AGS) as key for mapping
df = pd.read_csv('/path/to/.csv')
#geoinformation in .geojson format with same column as key for mapping
gdf_county = gpd.read_file('/path/to/.geojson')
# Choropleth
fig = px.choropleth(
#imported .csv with mapped electrical capacity for each feature
df, locations='AGS',
#geoinformation .geojson
geojson=gdf_county,
#column that will be colored
color='leistung',
#mapping column
featureidkey='properties.AGS',
animation_frame='datum',
color_continuous_scale='YlGn',
projection='mercator',
)
fig.update_geos(fitbounds='locations', visible=False)
fig.show()
The browser opens after the code finished and displays the map. As you move the mouse over the counties, popup box opens containing information about the selected feature.