Understanding the code
Code from inhabit_table.py
Code from household.py
Code from dwelling.py
Load dwelling information from SOEP.
Disaggregation: House type (EFH, MFH, …) → Restoration status → House Owner → Number of Rooms
@Author: guuug
- scripts.dwelling.house_condition(df, cols_used)
Translate house condition information from int to string.
1: In good condition → renovated
2: Some renovations → renovated
3: Full renovations → not renovated
4: Dilapidated → not renovated
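The mapping above can be expressed as a lookup table. This is a minimal sketch, not the actual implementation: HOUSE_CONDITION and house_condition_label are hypothetical names, the sketch works on single codes rather than on df and cols_used, and returning None for codes outside 1 to 4 is an assumption.

```python
# Hypothetical lookup table built from the house_condition docstring.
HOUSE_CONDITION = {
    1: "renovated",      # In good condition
    2: "renovated",      # Some renovations
    3: "not renovated",  # Full renovations
    4: "not renovated",  # Dilapidated
}


def house_condition_label(code):
    """Return the renovation label for a condition code, or None if the code is unknown."""
    return HOUSE_CONDITION.get(code)


print([house_condition_label(c) for c in (1, 2, 3, 4, -1)])
# ['renovated', 'renovated', 'not renovated', 'not renovated', None]
```

On a pandas column, the same table could be applied with df[col].map(HOUSE_CONDITION), in which case unmapped codes become NaN instead of None.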
- scripts.dwelling.load_data(soep_path)
Generate the main DataFrame.
- scripts.dwelling.owner_type(df, soep_path, cols_used)
Translate the owner/tenant information in hl/hlf0013_h from int to string.
1: Communal Dwelling
2: Co-Operative Apt.
3: Company Apt.
4: Private Owner
5: Do Not Know
6: Private Company
7: Non Profit Organization (Church, Foundation, etc.)
8: NaN
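The owner codes can be translated with a similar lookup table. In this sketch, OWNER_TYPE and owner_label are hypothetical names, and mapping every code outside 1 to 7 (not only code 8) to NaN is an assumption.

```python
import math

# Hypothetical lookup table built from the owner_type docstring (hl/hlf0013_h codes).
OWNER_TYPE = {
    1: "Communal Dwelling",
    2: "Co-Operative Apt.",
    3: "Company Apt.",
    4: "Private Owner",
    5: "Do Not Know",
    6: "Private Company",
    7: "Non Profit Organization",
}


def owner_label(code):
    """Translate an owner code to a string; code 8 and any unknown code become NaN."""
    return OWNER_TYPE.get(code, math.nan)


print([owner_label(c) for c in (4, 8)])  # ['Private Owner', nan]
```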
- scripts.dwelling.room_num(df, cols_used)
Limit the number of rooms to 1, 2, 3 and 4+.
1: 1
2: 2
3: 3
4: 4+
5: 4+
…
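The room limit amounts to capping the count at 4. In this sketch, room_category is a hypothetical helper; returning string labels and rejecting counts below 1 are assumptions about how such values are handled, not documented behavior.

```python
def room_category(n_rooms):
    """Bin a room count into '1', '2', '3' or '4+' (hypothetical helper)."""
    if n_rooms < 1:
        # Assumption: counts below 1 are treated as invalid input.
        raise ValueError("room count must be at least 1")
    return str(n_rooms) if n_rooms <= 3 else "4+"


print([room_category(n) for n in (1, 2, 3, 4, 5, 9)])
# ['1', '2', '3', '4+', '4+', '4+']
```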
- scripts.dwelling.tabula_building_types(df, soep_path, cols_used)
Return building type names from TABULA:
SFH = Single Family House, 1-2 dwellings
MFH = Multi Family House, 3-12 dwellings
AB = Apartment Building, 13+ dwellings
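Assuming the number of dwellings in the building is available as an integer, the TABULA classes reduce to two thresholds. tabula_type is a hypothetical name; the real function reads its input via soep_path and cols_used, possibly from already categorized survey answers, so this only illustrates the class boundaries.

```python
def tabula_type(n_dwellings):
    """Return the TABULA building type for the number of dwellings in a building."""
    if n_dwellings < 1:
        raise ValueError("a building has at least one dwelling")
    if n_dwellings <= 2:
        return "SFH"  # Single Family House, 1-2 dwellings
    if n_dwellings <= 12:
        return "MFH"  # Multi Family House, 3-12 dwellings
    return "AB"       # Apartment Building, 13+ dwellings


print([tabula_type(n) for n in (1, 2, 3, 12, 13)])
# ['SFH', 'SFH', 'MFH', 'MFH', 'AB']
```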
Code from misc.py
Define several global variables, paths and basic functions.
- scripts.misc.attachVariable(soep_path, add_vars, merge_from, merge_to, match_keys, new_names=None)
Take add_vars and match_keys from merge_from and merge them into merge_to, joining on match_keys.
- Parameters:
add_vars: list of variable names to merge into the DataFrame
merge_from: DataFrame to be loaded, given as the name of the dataset to merge from
merge_to: DataFrame, base dataset to merge the variables into (e.g. df)
match_keys: list of variable names to use as keys for merging datasets
new_names: optional, rename the variables directly before merging them
- Returns:
updated dataset merge_to (e.g. df) with the new variables
Attention: Should be run only once.
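The merge described above can be sketched with pandas. Assumptions: attach_variable_sketch is a hypothetical name, merge_from is passed as an already loaded DataFrame instead of a dataset name resolved against soep_path, new_names is a list parallel to add_vars, and the merge is a left join. The example column names hid and syear are only illustrative.

```python
import pandas as pd


def attach_variable_sketch(add_vars, merge_from, merge_to, match_keys, new_names=None):
    """Left-merge add_vars from merge_from into merge_to on match_keys (hypothetical sketch)."""
    subset = merge_from[add_vars + match_keys]
    if new_names is not None:
        # Assumption: new_names is a list with one new name per entry of add_vars.
        subset = subset.rename(columns=dict(zip(add_vars, new_names)))
    return merge_to.merge(subset, on=match_keys, how="left")


df = pd.DataFrame({"hid": [1, 2], "syear": [2019, 2019]})
hl = pd.DataFrame({"hid": [1, 2], "syear": [2019, 2019], "hlf0013_h": [4, 1]})
out = attach_variable_sketch(["hlf0013_h"], hl, df, ["hid", "syear"], new_names=["owner"])
print(out)
```

Calling the sketch a second time with the same variables would make pandas add the suffixes _x and _y to the overlapping columns, which is one plausible reason for the "run only once" warning.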
- scripts.misc.clean_nan(df, on_cols=None)
Remove rows which contain NaN value in specific columns from df.
- Parameters:
df: pandas.DataFrame to be cleaned
on_cols: list of column names to consider; if on_cols=None (default), all columns of df are considered
- Returns:
cleaned DataFrame
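pandas already provides this behavior through DataFrame.dropna, whose subset argument also considers all columns when left as None. This sketch assumes the real function does the equivalent; clean_nan_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def clean_nan_sketch(df, on_cols=None):
    """Drop rows with a NaN in any of on_cols (all columns if on_cols is None)."""
    # subset=None makes dropna check every column, matching the documented default.
    return df.dropna(subset=on_cols)


frame = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 2.0, 3.0]})
print(clean_nan_sketch(frame))                 # only the last row survives
print(clean_nan_sketch(frame, on_cols=["a"]))  # rows where "a" is present
```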
- scripts.misc.filters(df, only_households=False, age_filter=False, min_age=15, max_age=120, netto_filter=False, year_filter=False, set_year=2019, analysis_file=None)
Define and apply filters to the DataFrame. Add the column "sage" to the DataFrame.
- Parameters:
df: DataFrame, loaded SOEP CSV
only_households: Boolean, only keep the oldest person per household and survey year
age_filter: Boolean, filter by person age, keeping min_age <= age <= max_age
min_age: int, minimum age.
max_age: int, maximum age.
netto_filter: Boolean, only keep data that is complete etc.
year_filter: Boolean, only keep data from the year set_year.
set_year: int, year to keep when year_filter is True.
analysis_file: run an analysis and write the results to this file
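The year, age and household filters can be sketched with boolean masks and a groupby. This covers only part of filters(): netto_filter, analysis_file and the "sage" column are left out, and the column names hid, syear and age, the order in which the filters run, and the absence of missing ages are all assumptions. filters_sketch is a hypothetical name.

```python
import pandas as pd


def filters_sketch(df, only_households=False, age_filter=False, min_age=15,
                   max_age=120, year_filter=False, set_year=2019):
    """Hypothetical subset of filters(); assumes columns 'hid', 'syear' and 'age'."""
    out = df
    if year_filter:
        out = out[out["syear"] == set_year]
    if age_filter:
        # Series.between is inclusive on both ends: min_age <= age <= max_age.
        out = out[out["age"].between(min_age, max_age)]
    if only_households:
        # Index label of the oldest person in each (household, survey year) group.
        oldest = out.groupby(["hid", "syear"])["age"].idxmax()
        out = out.loc[oldest.to_list()]
    return out


persons = pd.DataFrame({
    "hid": [1, 1, 2, 2],
    "syear": [2019, 2019, 2019, 2018],
    "age": [40, 12, 70, 33],
})
print(filters_sketch(persons, only_households=True, age_filter=True, year_filter=True))
```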
- scripts.misc.get_negative_dict()
Create a dictionary that maps each value in range(-8, 1), i.e. -8 through 0, to NaN.
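A dictionary comprehension produces such a table, presumably for SOEP's negative missing-value codes; passing it to DataFrame.replace then turns those codes into NaN. get_negative_dict_sketch is a hypothetical name, and the assumption is that the dictionary maps codes to NaN rather than the reverse.

```python
import math


def get_negative_dict_sketch():
    """Map each code in range(-8, 1), i.e. -8 through 0, to NaN."""
    return {code: math.nan for code in range(-8, 1)}


codes = get_negative_dict_sketch()
print(sorted(codes))  # [-8, -7, -6, -5, -4, -3, -2, -1, 0]
```

With a DataFrame df, df.replace(get_negative_dict_sketch()) would replace every occurrence of these codes with NaN.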
- scripts.misc.load_inputs()
- scripts.misc.timer_func(func)
Decorator for adding time measurements.
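A timing decorator of this kind is commonly built with functools.wraps and time.perf_counter. This sketch is not the original timer_func: timer_func_sketch and load_example are hypothetical names, and printing the elapsed seconds in this format is an assumption.

```python
import functools
import time


def timer_func_sketch(func):
    """Print how long each call to func takes, then return its result."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} executed in {elapsed:.4f}s")
        return result
    return wrapper


@timer_func_sketch
def load_example(n):
    return sum(range(n))


print(load_example(1_000))  # prints the timing line, then 499500
```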