Understanding the code

Code from inhabit_table.py

Code from household.py

Code from dwelling.py

Load dwelling information from SOEP.

Disaggregation: House type (EFH, MFH, …) –> Restoration status –> House Owner –> Number of Rooms

Author: guuug

scripts.dwelling.house_condition(df, cols_used)

Translate house condition information from int to string.

  • 1: In a good condition –> renovated
  • 2: Some renovations –> renovated
  • 3: Full renovations –> not renovated
  • 4: Dilapidated –> not renovated
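
The mapping above can be sketched with a plain dictionary and pandas' Series.map. This is only an illustration: the function name translate_condition and the column name "house_condition" are hypothetical, and the actual house_condition(df, cols_used) may work differently.

```python
import pandas as pd

# Codes and labels as listed in the description; everything else is assumed.
CONDITION_LABELS = {
    1: "renovated",      # In a good condition
    2: "renovated",      # Some renovations
    3: "not renovated",  # Full renovations
    4: "not renovated",  # Dilapidated
}


def translate_condition(df, col="house_condition"):
    """Return a copy of df with the integer codes in `col` replaced by labels."""
    out = df.copy()
    # Series.map turns any code missing from the dict into NaN.
    out[col] = out[col].map(CONDITION_LABELS)
    return out


example = pd.DataFrame({"house_condition": [1, 2, 3, 4]})
translated = translate_condition(example)
```

Working on a copy keeps the input frame unchanged, so the translation can be repeated safely.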

scripts.dwelling.load_data(soep_path)

Generate main df.

scripts.dwelling.owner_type(df, soep_path, cols_used)

Translate owner/tenant information from int to string from hl/hlf0013_h.

  • 1: Communal Dwelling
  • 2: Co-Operative Apt.
  • 3: Company Apt.
  • 4: Private Owner
  • 5: Do Not Know
  • 6: Private Company
  • 7: Non Profit Organization (Church, Foundation, etc.)
  • 8: NaN

scripts.dwelling.room_num(df, cols_used)

Limit the number of rooms to 1, 2, 3 and 4+.

  • 1: 1
  • 2: 2
  • 3: 3
  • 4: 4+
  • 5: 4+
  • …
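
One way to get this capping is to clip the counts and relabel the top bucket. The helper name cap_rooms is hypothetical, and the sketch assumes the room counts contain no missing values.

```python
import pandas as pd


def cap_rooms(rooms, cap=4):
    """Turn integer room counts into the labels "1", "2", "3", "4+"."""
    capped = rooms.clip(upper=cap)           # 5, 6, ... collapse to 4
    labels = capped.astype(int).astype(str)  # "1", "2", "3", "4"
    # Keep labels below the cap, replace everything else with "4+".
    return labels.where(capped < cap, f"{cap}+")


room_labels = cap_rooms(pd.Series([1, 2, 3, 4, 5, 7]))
```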

scripts.dwelling.tabula_building_types(df, soep_path, cols_used)

Return building type names from tabula.

  • SFH = Single Family House, 1-2 dwellings
  • MFH = Multi Family House, 3-12 dwellings
  • AB = Apartment Building with 13+ dwellings
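
The dwelling-count thresholds can be written as a small classifier. This sketch only illustrates the thresholds listed here; the helper tabula_type is hypothetical, and how the real function derives the type from the SOEP data is not shown in this documentation.

```python
def tabula_type(n_dwellings):
    """Classify a building by its number of dwellings."""
    if n_dwellings < 1:
        raise ValueError("a building needs at least one dwelling")
    if n_dwellings <= 2:
        return "SFH"  # Single Family House, 1-2 dwellings
    if n_dwellings <= 12:
        return "MFH"  # Multi Family House, 3-12 dwellings
    return "AB"       # Apartment Building, 13+ dwellings
```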

Code from misc.py

Define several global variables, paths and basic functions.

scripts.misc.attachVariable(soep_path, add_vars, merge_from, merge_to, match_keys, new_names=None)

Merge add_vars+match_keys from merge_from to merge_to by match_keys.

Parameters:
  • add_vars: list of variable names to merge to Dataframe

  • merge_from: Dataframe to be loaded, name of the dataset to merge from

  • merge_to: Dataframe, base dataset to merge variables to (e.g. df)

  • match_keys: list of variable names to use as keys for merging datasets

  • new_names: directly rename variables before merging them

Returns:
  • updated dataset merge_to (e.g. df) with new variables

Attention: Should be run only once.
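
A rough sketch of the merge step, working on in-memory DataFrames instead of loading from soep_path. Several things are assumptions: that new_names is a list parallel to add_vars, that the merge is a left join, and the key columns "hid" and "syear" used in the example.

```python
import pandas as pd


def attach_variables(add_vars, merge_from, merge_to, match_keys, new_names=None):
    """Left-merge the columns add_vars from merge_from onto merge_to."""
    # Only carry the keys and the requested variables across.
    subset = merge_from[match_keys + add_vars]
    if new_names is not None:
        # Assumed convention: new_names[i] renames add_vars[i].
        subset = subset.rename(columns=dict(zip(add_vars, new_names)))
    return merge_to.merge(subset, on=match_keys, how="left")


base = pd.DataFrame({"hid": [1, 2, 3], "syear": [2019, 2019, 2019]})
extra = pd.DataFrame({
    "hid": [1, 2],
    "syear": [2019, 2019],
    "hlf0013_h": [4, 1],
    "unused": [0, 0],
})
merged = attach_variables(
    ["hlf0013_h"], extra, base, ["hid", "syear"], new_names=["owner_code"]
)
```

In this sketch, merging the same variables a second time would make pandas add _x/_y suffixes to the duplicated columns, which is one plausible reason for the run-once warning.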

scripts.misc.clean_nan(df, on_cols=None)

Remove rows which contain NaN value in specific columns from df.

Args:
  • df: pandas.DataFrame to be cleaned

  • on_cols: list of strings containing the column names to consider. If on_cols=None (default value), all columns of the df are considered.

Returns:
  • cleaned dataframe
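
The described behaviour matches pandas' DataFrame.dropna with its subset argument, so a minimal equivalent might look like the following. The name clean_nan_sketch is used to make clear this is not necessarily the project's implementation.

```python
import pandas as pd


def clean_nan_sketch(df, on_cols=None):
    """Drop rows with NaN in on_cols, or in any column if on_cols is None."""
    return df.dropna(subset=on_cols)


frame = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 2.0, 3.0]})
only_a = clean_nan_sketch(frame, on_cols=["a"])  # keeps rows 0 and 2
all_cols = clean_nan_sketch(frame)               # keeps row 2 only
```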

scripts.misc.filters(df, only_households=False, age_filter=False, min_age=15, max_age=120, netto_filter=False, year_filter=False, set_year=2019, analysis_file=None)

Define and apply filters to Dataframe. Add column "sage" to dataframe.

Parameters:
  • df: Dataframe, loaded soep csv

  • only_households: only keep the oldest person per household and survey year

  • age_filter: filter person age. min_age <= age <= max_age.

  • min_age: int, minimum age.

  • max_age: int, maximum age.

  • netto_filter: Boolean, only keep data that is complete etc.

  • year_filter: Boolean, only keep data from specified year.

  • analysis_file: run analysis and write results to file
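
The age, year and household filters could be combined roughly as follows. The column names "hid", "syear" and "age" are assumptions made for the example, and the netto filter, the "sage" column and the analysis file are left out because their details are not documented here.

```python
import pandas as pd


def apply_filters(df, only_households=False, age_filter=False, min_age=15,
                  max_age=120, year_filter=False, set_year=2019):
    """Apply the selected row filters and return a re-indexed DataFrame."""
    out = df
    if age_filter:
        # between() is inclusive on both ends: min_age <= age <= max_age
        out = out[out["age"].between(min_age, max_age)]
    if year_filter:
        out = out[out["syear"] == set_year]
    if only_households:
        # Index label of the oldest person per household and survey year.
        oldest = out.groupby(["hid", "syear"])["age"].idxmax()
        out = out.loc[oldest]
    return out.reset_index(drop=True)


people = pd.DataFrame({
    "hid": [1, 1, 2, 2],
    "syear": [2019, 2019, 2019, 2018],
    "age": [40, 12, 70, 69],
})
heads_2019 = apply_filters(people, only_households=True,
                           age_filter=True, year_filter=True)
```

With these inputs, the 12-year-old and the 2018 record are dropped first, leaving one person per household.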

scripts.misc.get_negative_dict()

Create a NaN entry for each value in range(-8, 1), i.e. the integers -8 through 0.
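
A dictionary of this shape can be built with a comprehension. Whether the real function returns exactly this structure is an assumption; the variable name negative_to_nan is hypothetical.

```python
import math

# One NaN entry per code in range(-8, 1), i.e. -8, -7, ..., 0.
negative_to_nan = {code: float("nan") for code in range(-8, 1)}
```

Such a dictionary could then be passed to pandas' DataFrame.replace to turn those codes into missing values.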

scripts.misc.load_inputs()

scripts.misc.timer_func(func)

Decorator for adding time measurements.
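
A typical timing decorator looks like the sketch below. The name timer_sketch and the printed message format are assumptions; the project's timer_func may report the time differently.

```python
import functools
import time


def timer_sketch(func):
    """Print how long each call to func takes, then return its result."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.4f} s")
        return result
    return wrapper


@timer_sketch
def add(a, b):
    return a + b
```

Calling add(2, 3) prints the elapsed time and still returns the sum.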

Code from check_values_label.py