Data preprocessing

In section Data source mentioned errors when dealing with data of RES will be discussed in this section. The focus lies on the detection and elimination of wrong location data and wrong indication of installed power. Issues that may arise with missing voltage level information are related to Machine Learning (ML) and RES data.

Location data of RES

Location data of RES consists of federal state, county, municipality and key. Because of territorial reform issues arise during mapping process of RES data and spatial data. Detected issues base on wrong location data. Territorial reforms in a federal state of Germany lead to wrong location data because entered location data during initial registration of RES in MaStR has changed. Typical errors are:

  • location has changed federal state

  • location has changed related county, municipality

Changed location goes hand in hand with changed key that maps location of RES and spatial data and finally assigned electrical capacity for each federal state, county, municipality is wrong (mismatch between key from location and key in spatial data). Let’s take two short examples:

  • old: “Gehlberg”, “Ilm-Kreis”,”16070017”, Thüringen

Territorial reform 2018/ 2019 in Thüringen:

  • new: “Gehlberg”, “Suhl”, “16054000”, “Thüringen”

The following commands update location “Gehlberg” regarding territorial reform.

import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://db_user:db_pw@localhost/database'

engine = sqlalchemy.create_engine(SQLALCHEMY_DATABASE_URI, echo=True)
Base = automap_base()
Base.prepare(engine, reflect=True)

session = Session(engine)
#e.g. wind energy
Wind = Base.classes.Windenergie

#dict location name: old state [0], new state [1], old county [2], new county [3], new key [4]
gemeinden = {
      "Gehlberg": ["Thüringen", "", "Ilm-Kreis", "Suhl", "16054000"]
}
#update entries in database for wind energy
for key, value in gemeinden.items():
  #updates columns in table wind energy
  session.query(Wind).filter(Wind.bundesland == value[0], Wind.landkreis == value[2], Wind.gemeinde == key).update({Wind.landkreis: value[3], Wind.gemeindeschluessel: value[4]}, synchronize_session = 'fetch')
  ...

The dictionary “gemeinden” saves location details of municipalities that are affected by territorial reform. In th next step we loop through the dictionary and update the database entries for each RES that belongs to the municipality. That process has to be repeated for each energy source (wind, solar, …)

Let’s consider example two:

  • old: “Wohlsborn”, “Weimarer Land”, “16071097”, “Thüringen”

Territorial reform 2018/ 2019 in Thüringen:

  • new: “Wohlsborn”, “Weimarer Land”, “16071102”, “Thüringen”

The following commands update location “Wohlsborn” regarding territorial reform.

# dict location name: old state [0], new state [1], old county [2], new county [3], old key [4], new key [5])
gemeinden = {
      "Wohlsborn": ["Thüringen", "", "Weimarer Land", "Weimarer Land", "16071097", "16071102"]
}
#update entries in database for wind energy
for key, value in gemeinden.items():
      session.query(Wind).filter(Wind.bundesland == value[0], Wind.landkreis == value[2], Wind.gemeinde == key).update({Wind.landkreis: value[3], Wind.gemeindeschluessel: value[5]}, synchronize_session = 'fetch')
      ...

We apply the same procedure as mentioned in the first example except that the dictionary contains different information about territorial reform.

The examples above demonstrate how to overcome wrong location data. Next, we will take a closer look how to handle wrong electrical capacity data.

Electrical capacity of RES

Wrong data of electrical capacity is a huge problem when dealing with data from MaStR. In the majority of cases people who register power plants in MaStR incorrectly mix up thousand separator with serious impact on data quality. Using the example of solar energy plants demonstrates this issue. For example we detect a plant with 5 MW electrical capacity connected to low voltage grid. Actually, the solar plant has an installed power of 5 kW but because of mixing up the thousand separator the plant is registered as a 5 MW plant. As a simple approach we filter all plants with assigned status “data correction” from grid operator to detect plants with wrong installed power. Unfortunately, grid operators have not yet verified all RES data with the effect that many plants with status “unverified” and incorrect data are part of MaStR (cf. verification process grid operator, Data source).

Another approach will be presented next with wind power plants as an example.

import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

import pandas as pd
import geopandas as gpd

SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://db_user:db_pw@localhost/database'

engine = sqlalchemy.create_engine(SQLALCHEMY_DATABASE_URI, echo=True)
Base = automap_base()
Base.prepare(engine, reflect=True)

Wind = Base.classes.Windenergie
session = Session(engine)

#dataframe of wind plants with filter location not null, status not null, mastr id like 'SEE%'
df_wind = pd.DataFrame(data=session.query(Wind.mastrnummer, Wind.netzbetreiberpruefung, Wind.betriebsstatus, Wind.bruttoleistung, Wind.bundesland, Wind.landkreis, Wind.gemeinde, Wind.gemeindeschluessel, Wind.lage, Wind.breitengrad, Wind.laengengrad, Wind.ausrichtung, Wind.nabenhoehe, Wind.rdurchmesser, Wind.meer, Wind.spannungsebene).filter(
                          Wind.betriebsstatus is not None, Wind.mastrnummer.like('SEE%'), Wind.bundesland is not None, Wind.bundesland is not None, Wind.landkreis is not None, Wind.gemeinde is not None))

#filter dataframe based on features installed power, rotor diameter and hub height
df_wind_filtered = (df_wind[(df_wind_on["bruttoleistung"] < 50) & (df_wind["rdurchmesser"] > 10) & (df_wind["nabenhoehe"] < 200)])

#loop through df_wind_filtered with decision rule regarding thousand separator
for index, row in df_wind_filtered.iterrows():

    # installed power < 50 kW and hub height > 40 m and rotor diameter > 35 --> installed capacity * 1000
    if (row["bruttoleistung"] < 50) & (row["nabenhoehe"] > 40) & (row["rdurchmesser"] > 35):
        # print(index)
        df_wind_filtered.loc[index, "bruttoleistung"] = row["bruttoleistung"] * 1000
    else:
        df_wind_filtered.loc[index, "bruttoleistung"] = row["bruttoleistung"]

Firstly, we load wind power plants to python, adding some filter options on top. Next, incorrect wind power plants are selected in df_wind_filtered because plants that fulfill those conditions are unrealistic. Afterwards we loop through the dataframe and apply thousand separator.

The presented approach has to be applied to further energy sources as well but with different decision rules e.g. based on voltage level or amount of modules. The following section outlines an approach to verify the mapping process and also detect wrong location data.

Verification

On EEDATEN we implemented a test function that applies the basic concept of power balance to the mapping process. The advantage of this approach is that the mapping process is verified and issues because of territorial reforms are detected.

The installed power of a federal state is determined by:

\[P_{i, e} = \sum_{n=1}^N P_n\]

with N plants in federal state i, energy source e

The sum of installed power for all counties in a state is calculated by:

\[P_{i,e} = \sum_{j=1}^J P_j \text{ with } P_j = \sum_{n=1}^N P_n\]

and N plants in county j in federal state i, energy source e

On municipality level we get finally the following power balance:

\[P_{i,e} = \sum_{j=1}^J \sum_{k=1}^K \sum_{n=1}^N P_{j,k,n}\]

with N plants in municipality k in county j in federal state i, energy source e

On municipality level EEDATEN requests municipality keys for all location in a list that are assigned to that municipality from an external reference database. In case the municipality key of the RES location under investigation is not part of the mentioned list, the test function provides the incorrect assigned location key for further investigations.