Understanding the code

Code from inhabit_table.py

Code from calibrate.py

calibrate.aggregated_plots_parameter_variation(ip_changes, ip, default_foldername, show_plots=False)

calibrate.collect_metrics(ip_changes, ip, show_plots=False)

calibrate.get_inhabit(ip_new, create_calibrate)

calibrate.main(delete_files=True, scen_run=False, *args, **flags)

calibrate.scenario_runner(ip, scenarios): Runs through given scenarios and creates plots and data

Code from allocation.py

Create a matrix of move-in preferences per household category.

The matrix has the same dimensions as the inhabit matrix. Each row represents one household category and each column one dwelling type. The cells in each row sum to 1. A cell value is the probability that a household with the given configuration (the row) wants to move into the dwelling type of the respective column.

A variant for generating the preference matrix must be chosen in the input sheet “inputs.csv”.

Currently available variant:
- “current_quintile”: the preferences of the household category are equal to the distribution of households in the inhabit matrix in the same household category (including the same quintile)

Variants in the future:
- “quintile_above”: the preference of the household category in quintile qx is equal to the distribution of households in the inhabit matrix of the same household category but in the quintile above (q{x+1})
- “highest_quintile”: the preference of the household category in quintile qx is equal to the distribution of households in the inhabit matrix of the same household category but in the highest quintile (q5)
- “avg_highest_current_quintile”: the preference of the household category in quintile qx is equal to the average of a) the distribution of households in the inhabit matrix of the same household category and b) the distribution of households in the inhabit matrix in the same household category but in the highest quintile ((qx+q5)/2)
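As an illustration of the “current_quintile” variant, the sketch below row-normalises a small inhabit matrix so that every row sums to 1. The data, the row and column labels and the helper name current_quintile_preferences are made up for this example and are not taken from allocation.py.

```python
import pandas as pd

# Hypothetical inhabit matrix: rows are household categories,
# columns are dwelling types, cells are household counts.
inhabit = pd.DataFrame(
    {"rooms_1": [30, 5], "rooms_2": [50, 15], "rooms_3": [20, 30]},
    index=["single_q1", "couple_q1"],
)


def current_quintile_preferences(inhabit: pd.DataFrame) -> pd.DataFrame:
    """Divide every row by its total so that the row sums to 1."""
    row_totals = inhabit.sum(axis=1)
    return inhabit.div(row_totals, axis=0)


pref = current_quintile_preferences(inhabit)
print(pref)
```

Rows without any households would produce NaN in this sketch; how the real code treats such rows is not stated here.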

scripts.allocation.get_all_cases(ip)

scripts.allocation.get_alloc_dwell_order(ip, needed_dwellings, ds_dict, hh_configuration, alloc_limiter)

scripts.allocation.get_args_and_handler(ip, all_cases, dwell_var, dwell_order, changed_dwellings, params)

scripts.allocation.get_move_in_want(pref_v, ip, hh_stock, inhabit_v, alloc_rate_it)

scripts.allocation.get_osciallating_vals(changed_dwellings, dwell_order, ip): Generate all [current_room, target_room] combinations in oscillating order.

dwell_order:
- ‘oscillate_larger’ starts up then down
- ‘oscillate_lower’ starts down then up
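One possible reading of the oscillating order is sketched below: for a given current room count, target room counts are visited at increasing distance, alternating between larger and smaller dwellings. The function name oscillating_targets, its parameters and this interpretation are assumptions for illustration, not the actual implementation.

```python
def oscillating_targets(current_room, rooms, start_up=True):
    """Return [current_room, target_room] pairs, alternating above and below."""
    pairs = []
    max_distance = max(rooms) - min(rooms)
    for distance in range(1, max_distance + 1):
        up = current_room + distance
        down = current_room - distance
        # start_up=True mimics 'oscillate_larger', False mimics 'oscillate_lower'.
        candidates = (up, down) if start_up else (down, up)
        for target in candidates:
            if target in rooms:
                pairs.append([current_room, target])
    return pairs


larger_first = oscillating_targets(2, rooms=[1, 2, 3, 4], start_up=True)
lower_first = oscillating_targets(2, rooms=[1, 2, 3, 4], start_up=False)
print(larger_first)
print(lower_first)
```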

scripts.allocation.get_quintile_limits(df, alloc_pref_limit)

scripts.allocation.get_regtype(df, regtyp, to_group, hh_dwell='dwell')

scripts.allocation.get_underoccupation(ip, needed_dwellings, hh_configuration)

scripts.allocation.other_handler(option1, option2, changed_dwellings)

scripts.allocation.pref_current_quintile(inhabit_v, ip, alloc_pref_limit)

Create the preference matrix from the inhabit vector, focusing on the same quintile. The whole household configuration stays the same: households want to move into the same kinds of dwellings they already live in.

Parameters

inhabit_v (dataframe) – inhabit vector
ip (dict) – inputs

Returns

preference matrix

Return type

dataframe

scripts.allocation.pref_no_underoccupation(inhabit_v, ip, alloc_pref_limit)

Function that calculates the preference matrix for the preference setting ‘no_preferred_underoccupation’, in which the “current_quintile” preferences are modified so that households prefer dwellings with at most hh_size+1 rooms.

Call the calc_shares function, which calculates the historic preferences.

Go through the household sizes.
Go through the room sizes larger than hh_size+1. For every entry, add its preference factor to the dwelling size without underoccupation, then overwrite the entry with 0.

Args:: inhabit: cols_dwell:
Returns:: preference matrix with dimensions of the inhabit matrix
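The reallocation step described above can be sketched as follows, assuming a simplified preference table indexed only by household size with room counts as columns. The table, the helper name shift_underoccupation and the choice of hh_size+1 as the target column are illustrative assumptions.

```python
import pandas as pd

# Hypothetical preferences: index = household size, columns = number of rooms.
pref = pd.DataFrame(
    [[0.2, 0.3, 0.3, 0.2],
     [0.1, 0.2, 0.3, 0.4]],
    index=pd.Index([1, 2], name="hh_size"),
    columns=[1, 2, 3, 4],
)


def shift_underoccupation(pref: pd.DataFrame) -> pd.DataFrame:
    """Move preference mass from rooms > hh_size+1 onto the hh_size+1 column."""
    out = pref.copy()
    for hh_size in out.index:
        limit = hh_size + 1
        if limit not in out.columns:
            continue
        for rooms in out.columns:
            if rooms > limit:
                out.loc[hh_size, limit] += out.loc[hh_size, rooms]
                out.loc[hh_size, rooms] = 0.0
    return out


result = shift_underoccupation(pref)
print(result)
```

Because mass is only moved within a row, every row still sums to 1 afterwards.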

scripts.allocation.pref_q4_aspiration(inhabit_v, ip, alloc_pref_limit)

Function that calculates the preference matrix for the preference setting ‘quintile 4 aspiration’, a mix of “current quintile” for all quintiles except q4, which takes the preferences of the quintile above. Steps:

Call the calc_shares function, which calculates the historic preferences.

Go through the quintiles, starting from the second highest, and copy the preferences from the quintile above.
Use the .xs method of the MultiIndex to access the quintile preferences.

Args:: inhabit: cols_dwell:
Returns:: preference matrix with dimensions of the inhabit matrix

scripts.allocation.pref_quintile_above(inhabit_v, ip, alloc_pref_limit)

Function that calculates the preference matrix for the preference setting ‘quintile above’. Steps:

Call the calc_shares function, which calculates the historic preferences.

Go through the quintiles, starting from the second highest, and copy the preferences from the quintile above.
Use the .xs method of the MultiIndex to access the quintile preferences.

Args:: inhabit: cols_dwell:
Returns:: preference matrix with dimensions of the inhabit matrix
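A minimal sketch of the copy step, assuming a preference table with a two-level MultiIndex (income_quintile, hh_type) and only three quintiles for brevity. The level names, data and helper name quintile_above are assumptions. The sketch reads from the unmodified input so that a quintile never inherits an already overwritten row; whether the real function does the same is not documented here.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["q1", "q2", "q3"], ["couple", "single"]],
    names=["income_quintile", "hh_type"],
)
pref = pd.DataFrame(
    {"rooms_1": [0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
     "rooms_2": [0.3, 0.4, 0.5, 0.6, 0.7, 0.8]},
    index=index,
)


def quintile_above(pref: pd.DataFrame, quintiles: list) -> pd.DataFrame:
    """Give every quintile except the highest the preferences of the next one up."""
    out = pref.copy()
    # Walk down from the second-highest quintile; the highest keeps its own rows.
    for lower, upper in zip(reversed(quintiles[:-1]), reversed(quintiles[1:])):
        above = pref.xs(upper, level="income_quintile")
        for hh_type, row in above.iterrows():
            out.loc[(lower, hh_type), :] = row.values
    return out


shifted = quintile_above(pref, ["q1", "q2", "q3"])
print(shifted)
```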

scripts.allocation.standard_handler(check_value, check_list, changed_dwellings)

Code from alloc_calibration.py

This script evaluates the yearly results of the dwelling allocation process. It therefore compares the inhabit matrices of the allocation output with the respective empirical inhabit matrix.

The metric is the root-mean-square error (RMSE), calculated over the whole matrix.

scripts.alloc_calibration.inhabit_metrics(inhabit_1_v, inhabit_2_v)

Calculates a metric between two inhabit matrices

Args:: inhabit_1: inhabit_2:

Returns:
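For reference, an RMSE over two equally shaped matrices can be computed as in the sketch below. The helper name rmse and the sample values are illustrative; the actual inhabit_metrics function may align, weight or normalise the matrices differently.

```python
import numpy as np
import pandas as pd


def rmse(matrix_a: pd.DataFrame, matrix_b: pd.DataFrame) -> float:
    """Root-mean-square error over all cells of two equally shaped matrices."""
    diff = matrix_a.to_numpy(dtype=float) - matrix_b.to_numpy(dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


modelled = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
empirical = pd.DataFrame([[1.0, 4.0], [3.0, 2.0]])
error = rmse(modelled, empirical)
print(error)
```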

scripts.alloc_calibration.main(ip)

Code from check_values_labels.py

Code from dwelling.py

Load dwelling information from SOEP.

Disaggregation: House type (EFH, MFH, …) –> Restoration status –> House Owner –> Number of Rooms

scripts.dwelling.building_type(df, ip, col_name)

Return building type namings from tabula.

SFH = Single Family House, 1-2 dwellings
RH = Apartment Building with 1-2 dwellings as double house, row house, …
MFH = Multi Family House, 3-12 dwellings
GMH (AB) = Apartment Building with 13+ dwellings

scripts.dwelling.house_condition(df, ip, col_name)

Translate house condition information from int to string.

1: In a good condition –> renovated –> tabula: ambitiously sanitized
2: Some renovations –> not renovated –> tabula: sanitized
3: Full renovations –> not renovated –> tabula: not sanitized
4: Dilapidated –> not renovated –> tabula: not sanitized

scripts.dwelling.owner_type(df, ip, col_name)

Translate owning status info from int to string from hlf0013_h and hgowner.

hlf0013_h from hl (low data availability):
1: Communal Dwelling
2: Co-Operative Apt.
3: Company Apt.
4: Private Owner
5: Do Not Know
6: Private Company
7: Non Profit Organization (Church, Foundation, etc.)
8: NaN

hgowner:
1: owner
2: main tenant
3: sub-tenant
4: tenant
5: living in a home (Heim) or shared accommodation

scripts.dwelling.region_type(df, ip, col_name): Assign region type to hids.

scripts.dwelling.room_num(df, ip, col_name)

Limit the number of rooms to 1, 2, 3 and 4+.

1: 1
2: 2
3: 3
…
ip[‘max_rooms’]: ip[‘max_rooms’]+ (e.g. 4+, or 7+)
…
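The capping rule can be illustrated with a short sketch. The helper name cap_rooms is hypothetical, and the max_rooms argument stands in for ip[‘max_rooms’]; the real room_num function takes different arguments.

```python
import pandas as pd


def cap_rooms(rooms: pd.Series, max_rooms: int) -> pd.Series:
    """Label room counts as strings and collapse everything >= max_rooms."""
    capped = rooms.clip(upper=max_rooms)
    labels = capped.astype(int).astype(str)
    # Keep plain labels below the cap, append "+" at the cap.
    return labels.where(capped < max_rooms, labels + "+")


labels = cap_rooms(pd.Series([1, 2, 3, 4, 6]), max_rooms=4)
print(labels.tolist())
```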

Code from evaluate_missings.py

Analyse inhabit table or empirical data of variables from SOEP Core.

scripts.evaluate_missings.main(): Check whether the variables have enough data for every year.

Code from filters.py

Define filters for soep dataframes

scripts.filters.age_filter(df, ip)

Add column "sage" to dataframe. min_age <= age <= max_age.

Parameters:

df: Dataframe, loaded soep csv.
min_age: int, minimum age.
max_age: int, maximal age.

scripts.filters.filter_df(df, ip, set_year)

scripts.filters.households_filter(df, ip)

Keep the first non-NaN row per household for all variables other than gebjahr, hid and syear. Because rows are grouped by household (and syear) and df contains only household-level variables apart from gebjahr, the values in the other columns are identical for all people (rows) in the same household. All entries are kept; an added column marks the oldest person in each household with True and everyone else with False.
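Marking the oldest person per household and survey year could look like the sketch below. The column name oldest and the sample rows are assumptions; hid, syear and gebjahr are the variable names mentioned above.

```python
import pandas as pd

df = pd.DataFrame({
    "hid": [1, 1, 2],
    "syear": [2019, 2019, 2019],
    "gebjahr": [1980, 1950, 1990],
})

# The oldest person has the earliest birth year within the household and year.
earliest_birth = df.groupby(["hid", "syear"])["gebjahr"].transform("min")
df["oldest"] = df["gebjahr"].eq(earliest_birth)
print(df)
```

If two people in a household share the earliest birth year, this sketch marks both; how the real filter breaks such ties is not documented here.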

Parameters:

df: Dataframe, loaded soep csv.
ip:

scripts.filters.netto_filter(df)

Limit hh interviews to those that were conducted successfully.

Parameters:

df: Dataframe, loaded soep csv.

scripts.filters.year_filter(df, set_year=2019)

Select only the interviews from a given year.

Parameters:

df: Dataframe, loaded soep csv.
set_year: int, specific year to be used.

Code from household.py

Define functions to count households by income, household type, age and household size.

Functions defined in this script:
- Load data on person level
- Define changes in disaggregation of different SOEP variables and replace encoding
- Calculate income quintiles
- Calculate weighted shares for one variable (e.g. income)

Disaggregation of households in inhabit matrix: Region –> Income Quintile –> household type –> age –> number of people

Functions are applied in “inhabit_matrix.py”.

scripts.household.age_class(df, ip, col_name): Assign age classes to every person.

scripts.household.household_types(df, ip, col_name)

Replace encoding of household types with own encoding.

Merge “single parent” (value 3) or “couple with children” (values 4-6); merge “multiple generation-HH” and “other combination”.

1: 1-Pers.-HH –> single
2: Couple Without Children –> couple_no_child
3: Single Parent –> single_parent
4: Couple With Children LE 16 –> couple_parent
5: Couple With Children GT 16 –> couple_parent
6: Couple With Children LE And GT 16 –> couple_parent
7: Multiple Generation-HH –> other
8: Other Combination –> other
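The recoding above amounts to a dictionary lookup. The sketch below applies it to a pandas Series; the constant name HH_TYPE_LABELS and the sample codes are illustrative, while the mapping itself is the one listed above.

```python
import pandas as pd

# Mapping from the original household type codes to the merged labels.
HH_TYPE_LABELS = {
    1: "single",
    2: "couple_no_child",
    3: "single_parent",
    4: "couple_parent",
    5: "couple_parent",
    6: "couple_parent",
    7: "other",
    8: "other",
}

codes = pd.Series([1, 4, 6, 7])
hh_labels = codes.map(HH_TYPE_LABELS)
print(hh_labels.tolist())
```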

scripts.household.income_quintiles(df, ip, col_name)

Calculate income quintiles for df and add them in "income_quintile" column.

q1: the quintile with the lowest income
q2
q3
q4
q5: the quintile with the highest income
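An unweighted version of the quintile assignment can be sketched with pandas.qcut. The income values are made up, and the sketch ignores survey weights, which the real income_quintiles function may take into account.

```python
import pandas as pd

income = pd.Series([900, 1500, 2100, 2800, 3500, 4200, 5000, 6100, 7500, 9800])

# Split the incomes into five equally populated groups, lowest to highest.
income_quintile = pd.qcut(income, q=5, labels=["q1", "q2", "q3", "q4", "q5"])
print(income_quintile.tolist())
```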

scripts.household.size_class(df, ip, col_name): Assign Household size class to every person.

Code from inputs.py

Inputs for the model.

scripts.inputs.load_inputs(**flags): Load input csv and add path to composita file

Code from misc.py

Define several global variables, paths and basic functions.

scripts.misc.add_mean_ls(ip, inhabit)

scripts.misc.check_absolute_path(path): Check whether the current working directory is inhabit. If so, the path is kept; if not, ‘../’ is added to the path.

scripts.misc.check_empty(df, source)

scripts.misc.clean_nan(df, on_cols=None)

Remove rows which contain NaN value in specific columns from df.

Args:
- df: pandas.DataFrame to be cleaned
- on_cols: list of strings containing the column names to consider; if on_cols=None (default value), all columns of the df are considered

Returns:
- cleaned dataframe
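The documented behaviour matches pandas' DataFrame.dropna with a subset argument. The sketch below is an assumption about how such a helper could be written, named clean_nan_sketch to keep it apart from the real function.

```python
import numpy as np
import pandas as pd


def clean_nan_sketch(df: pd.DataFrame, on_cols=None) -> pd.DataFrame:
    """Drop rows with NaN in on_cols, or in any column if on_cols is None."""
    return df.dropna(subset=on_cols)


sample = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 3.0]})
only_a = clean_nan_sketch(sample, on_cols=["a"])
all_cols = clean_nan_sketch(sample)
print(only_a)
print(all_cols)
```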

scripts.misc.create_stocks(inhabit, ip)

scripts.misc.debug_messages(message, ip): Print debugging information. If the flag is set, save the messages to a string and return it.

scripts.misc.get_age_classes(ip)

scripts.misc.get_full_inhabit(all_dimensions, ip, df_all_v, index_var)

scripts.misc.get_negative_dict(): Create NaN entries for the values in range(-8, 0).
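Such a dictionary can be built with a comprehension, as sketched below. The helper name negative_codes_to_nan is hypothetical. The dictionary is presumably used to replace negative codes, which SOEP uses to mark missing values, with NaN, but that use is an assumption.

```python
import math


def negative_codes_to_nan() -> dict:
    """Map every integer in range(-8, 0) to NaN."""
    return {code: math.nan for code in range(-8, 0)}


mapping = negative_codes_to_nan()
print(sorted(mapping))
```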

scripts.misc.get_save_path(folder1, folder2)

scripts.misc.get_split(ip, split_large_dwell)

scripts.misc.load_inhabit_moving(inhabit_or_move, year, ip, evidence_folder, load_abs=False, alt_path=None, use_weights=True)

Load the data from inhabit_{year}.csv or move_out_{year}.csv to dataframes.

Columns in matrices: hid, building_type, ownership, condition, rooms, region_type, growth_type, income_quintile, hh_type, hh_size, age, weights. Data are loaded with one household per row, then grouped.

Args:
- inhabit_or_move: “inhabit” or “move_out” (subfolder in output)
- year: year for which to load the data

Return:
- df_grouped: grouped and weighted dataframe, but not unstacked
- df_absolute: grouped and unweighted dataframe, but not unstacked

scripts.misc.order_move_in_want(move_in_want_v, ip): Allocate searching percentiles of households to free dwellings according to their preferences. The order of allocation is saved in ip. The hh_type values are mapped to a, b, c, d, e, … in the order in which they are given, and mapped back after sorting.

scripts.misc.printd(str, ip): Prints debugging information if flag is set.

scripts.misc.save_params_to_file(ip)

scripts.misc.stock_assertions(dwell_stock, hh_stock)

scripts.misc.timer_func(func): Decorator for adding time measurements.
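A timing decorator of this kind is commonly written as in the sketch below. It illustrates the general pattern, not the actual timer_func; the printed message format and the example function add are assumptions.

```python
import functools
import time


def timer_func(func):
    """Decorator that prints how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} s")
        return result
    return wrapper


@timer_func
def add(a, b):
    return a + b


total = add(2, 3)
```

functools.wraps keeps the name and docstring of the decorated function, which helps when several decorated functions appear in the same log.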

Code from move_out_rate.py

Evaluate the inhabit matrices and move out matrices created by inhabit.py.

scripts.move_out_rate.apply_factors(mor, ip, year, factors1, factors2)

scripts.move_out_rate.create_mor(ip, mor, year, all_disaggs, mors_factors, mors_disaggs, keys, factors1_def, factors2_def, factors1_scen, factors2_scen)

scripts.move_out_rate.interpolate(year_1, year_2, factors)

Interpolate from year_1 to year_2 on a yearly basis. First create the years between year_1 and year_2, then read the factor values for year_1 and year_2 and interpolate the values for the years in between.

Args::
year_1 (str): year based on columns from factors dataframe.
year_2 (str): year based on columns from factors dataframe.
factors (df): loaded factors for each column and year for margins.
Returns:: factors: updated with values for the years between year_1 and year_2
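The yearly interpolation between two known values can be sketched in plain Python, assuming it is linear. The helper name interpolate_years and its scalar arguments are simplifications; the real function works on a factors dataframe with year columns given as strings.

```python
def interpolate_years(year_1: int, year_2: int, value_1: float, value_2: float) -> dict:
    """Linearly interpolate a value for every year from year_1 to year_2."""
    span = year_2 - year_1  # assumes year_2 > year_1
    return {
        year: value_1 + (value_2 - value_1) * (year - year_1) / span
        for year in range(year_1, year_2 + 1)
    }


factors = interpolate_years(2020, 2025, 1.0, 2.0)
print(factors)
```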

scripts.move_out_rate.linear_regression(x_regression, y_regression, x_new)

scripts.move_out_rate.load_factors(ip, scen_def)

Load factors from move_out_rate_input.xlsx file and apply yearly interpolation.

Args:: ip (dict): contains user-input from xlsx and in-program added variables.
Returns:: factors (df): loaded and interpolated factors to apply to yearly move_out_rates.

Code from occupation_charts.py

Create charts to evaluate all inhabit tables over time.

scripts.occupation_charts.create_charts(ip, plot_year, ms_year, t_year, s_year, incr, decr)

scripts.occupation_charts.get_chart_vars(ip, ms_year, t_year, s_year, incr_decr, all_avg_disaggs, all_disaggs)

scripts.occupation_charts.get_inh_all(inh_path, ms_year, t_year, s_year)

scripts.occupation_charts.plot_all_scenarios(dfs, s_year, ms_year, t_year, save_path, all_avg_disaggs, incr_decr)

scripts.occupation_charts.plot_all_yrs(df_all, avg_disagg, disagg, start_year, model_start_year, end_year, save_path, inh_scen, scen)

Plots the average value of “avg_disagg” by “disagg” for all years.

Params:
- df: weighted DataFrame with columns year, “avg_disagg”, “disagg”
- avg_disagg: str, main avg_disagg (e.g. “underoccupation”)
- disagg: list of str, disaggregation avg_disagg, e.g. [“income_group”]
- start_year: int
- end_year: int
- save_path: str
- multilevel: bool, if True: multiple groupby avg_disagg can be passed as a list of strings, if only one str is passed in the list: leave out multilevel parameter (defaults to False)
- share: bool, if True: title is called “Share of underoccupation => 1” resp. “>=2”

Return: Plots a line graph with
- x-axis: years
- y-axis: main variable (weighted)
- lines for each disaggregation (groups of groupby variable)

scripts.occupation_charts.plot_line(df_incr, df_decr, df_d, avg_disagg, disagg, plot_year, save_path, scen, incr_decr, year): Return: Plots a line graph with - x-axis: years - y-axis: main variable (weighted) - lines

scripts.occupation_charts.plot_mor(ip, incr_decr)

scripts.occupation_charts.underoccup(ip, inh_all, disagg, start_year, end_year, save_path, average_disagg, plot_year): Calculate the underoccupation over the years for different disaggregations

scripts.occupation_charts.underoccup_col(ip, df, average_disagg)

Code from soep_loader.py

Load soep data

scripts.soep_loader.attachVariable(soep_path, add_vars, merge_from, merge_to, match_keys, new_names=None)

Merge add_vars+match_keys from merge_from to merge_to by match_keys.

Parameters:

soep_path: path to soep folder respectively the data folder
add_vars: list of variable names to merge to Dataframe
merge_from: Dataframe to be loaded, name of the dataset to merge from
merge_to: Dataframe, base dataset to merge variables to (e.g. df)
match_keys: list of variable names to use as keys for merging datasets
new_names: directly rename variables before merging them

Returns:

updated dataset “merge_to” (e.g. df) with new variables

Attention: Should be run only once.
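The merge described above can be sketched with pandas.DataFrame.merge. The helper name attach, the left join, the sample frames and the assumption that new_names is a list parallel to add_vars are illustrative; the real attachVariable also loads merge_from from soep_path, which is skipped here.

```python
import pandas as pd

base = pd.DataFrame({"hid": [1, 2, 3], "syear": [2019, 2019, 2019]})
extra = pd.DataFrame({
    "hid": [1, 2],
    "syear": [2019, 2019],
    "hgowner": [1, 2],
    "unused": [0, 0],
})


def attach(merge_to, merge_from, add_vars, match_keys, new_names=None):
    """Merge add_vars from merge_from onto merge_to using match_keys."""
    selected = merge_from[match_keys + add_vars]
    if new_names:
        # Rename the added variables before merging.
        selected = selected.rename(columns=dict(zip(add_vars, new_names)))
    return merge_to.merge(selected, on=match_keys, how="left")


merged = attach(base, extra, ["hgowner"], ["hid", "syear"], new_names=["owner"])
print(merged)
```

Calling such a merge twice with the same variables makes pandas add suffixed duplicate columns, which may be one reason for the note about running it only once.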

scripts.soep_loader.movers(df, ip): Add variable for residential move to hids.