Inhabit Data Calibration Workflow

This document describes the calibration workflow for inhabit data, which calibrates multiple years to Census 2022 targets without rerunning the full fit for each year.

Overview

The calibration workflow avoids rerunning the expensive IPF (Iterative Proportional Fitting) calibration for every year by:

  1. Calculating calibration factors once from a base year (2019) to Census 2022 targets

  2. Applying the same factors to all other years (2000-2018) by simple multiplication

  3. Handling dwelling stock calibration consistently across all years

  4. Saving factors for future reuse

Key Benefits

  • Performance: ~100x faster for additional years (simple multiplication vs. IPF iterations)

  • Consistency: Same calibration logic applied to all years

  • Reusability: Factors saved to disk for future runs

  • Flexibility: Multiple workflow options for different use cases

How It Works

1. Factor Calculation (Base Year 2019)

# Calculate factors from 2019 data using Census 2022 targets
factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)

This runs the full IPF calibration on 2019 data and extracts calibration factors for each combination of:

  • region_type_dwell (rural/urban)

  • building_type (MFH/SFH)

  • ownership (private owner/private tenant)

  • rooms (1, 2, 3, 4, 5, 6, 7+)

  • condition (renovated/not renovated)

2. Factor Application (Other Years)

# Apply pre-calculated factors to any year
load_census.calibrate_inhabit_final(2018, ip, factor_lookup=factor_lookup)

This multiplies each row’s weight by the factor for its category combination; no iterative calculation is needed.
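As a minimal sketch of this multiplication step (not the actual calibrate_inhabit_final implementation), the lookup could work as follows. The dict-based row layout, the "weight" field, and the helper name apply_factors are assumptions made for illustration.

```python
# Hypothetical illustration: rows are plain dicts, and the lookup key is the
# tuple of the five category values in the order listed above.
KEY_COLUMNS = ("region_type_dwell", "building_type", "ownership", "rooms", "condition")


def apply_factors(rows, factor_lookup):
    """Return copies of rows with each weight multiplied by its category factor."""
    calibrated = []
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        factor = factor_lookup[key]  # raises KeyError if the category is unknown
        calibrated.append({**row, "weight": row["weight"] * factor})
    return calibrated


factor_lookup = {
    ("rural", "SFH", "private owner", "4", "renovated"): 1.15,
    ("urban", "MFH", "private tenant", "2", "not renovated"): 0.92,
}
rows = [
    {"region_type_dwell": "rural", "building_type": "SFH", "ownership": "private owner",
     "rooms": "4", "condition": "renovated", "weight": 200.0},
    {"region_type_dwell": "urban", "building_type": "MFH", "ownership": "private tenant",
     "rooms": "2", "condition": "not renovated", "weight": 100.0},
]
calibrated_rows = apply_factors(rows, factor_lookup)
```

The input rows are left unchanged, so the original weights remain available for comparison.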

3. Factor Structure

factor_lookup = {
    ('rural', 'SFH', 'private owner', '4', 'renovated'): 1.15,
    ('urban', 'MFH', 'private tenant', '2', 'not renovated'): 0.92,
    # ... for all combinations
}

Usage Examples

Option 2: Individual Year with Saved Factors

# Use previously saved factors
success = load_census.calibrate_year_with_saved_factors(
    year=2020,
    ip=ip,
    base_year=2019
)

Option 3: Manual Control

# Calculate factors
factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)

# Save factors
load_census.save_calibration_factors(factor_lookup, ip, 2019)

# Apply to other years
for year in [2018, 2017, 2016]:
    load_census.calibrate_inhabit_final(year, ip, factor_lookup=factor_lookup)

Integration Points

1. inhabit_matrix.py

The calibrate_to_census(ip) function now:

  • Uses 2019 as base year for calibration

  • Calculates factors once from 2019 → Census 2022

  • Applies same factors to years 2000-2018

  • Returns factors for use in dwelling stock calibration

def calibrate_to_census(ip):
    base_year = 2019
    cen_factor = load_census.calibrate_inhabit_final(base_year, ip)
    load_census.save_calibration_factors(cen_factor, ip, base_year)

    for year in range(base_year - 1, ip["empirical_start_year"] - 1, -1):
        load_census.calibrate_inhabit_final(year, ip, cen_factor)

    return cen_factor
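
To make the descending loop concrete, the short example below lists the years it visits. The value 2000 for ip["empirical_start_year"] is an assumption chosen to match the 2000-2018 span mentioned above; the real value comes from the project configuration.

```python
base_year = 2019
ip = {"empirical_start_year": 2000}  # assumed value, for illustration only

# Same range expression as in calibrate_to_census: it starts one year before
# the base year and counts down to the empirical start year, inclusive.
years = list(range(base_year - 1, ip["empirical_start_year"] - 1, -1))
print(years[0], years[-1], len(years))
```

The base year itself is not in the list because it was already calibrated by the full IPF run.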

2. dwelling_stock.py

The get_dwelling_stock() function now:

  • Calculates dwelling stock factors once for base year 2019

  • Reuses same factors for all other years

  • Handles both inhabit and dwelling stock calibration consistently

def get_dwelling_stock(inhabit, ip, year, mor, split_large_dwell, cen_fac):
    # ... existing code ...

    if year == 2019 and cen_fac is None:
        # Calculate dwelling stock factors for base year
        cen_fac = load_census.load_census_factor(ip, ds, aggweights="dwells")
        load_census.save_calibration_factors(cen_fac, ip, f"dwelling_stock_{year}")
    elif cen_fac is None:
        # Load saved factors for other years
        cen_fac = load_census.load_calibration_factors(ip, "dwelling_stock_2019")

    # Apply factors
    ds = misc.apply_calibrate_factor(ds, cen_fac, ip, "dwells")
    # ... rest of function ...

File Structure

Input Files

  • data/evidence/inhabit/inhabit_2019.csv (base year data)

  • data/evidence/inhabit/inhabit_2018.csv (other years)

  • data/evidence/move_out/move_out_2019.csv (move data)

Output Files

  • data/factorized_evidence/inhabit/inhabit_2019.csv (calibrated data)

  • data/factorized_evidence/move_out/move_out_2019.csv (calibrated move data)

  • data/calibration_factors/calibration_factors_2019.pkl (saved factors)

  • data/calibration_factors/calibration_factors_dwelling_stock_2019.pkl (dwelling factors)
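
The .pkl extension suggests the factors are stored with Python's pickle module. Assuming that is the case, a save/load round trip could be sketched as below. The helper names factor_path, save_factors, and load_factors are hypothetical and are not the project's save_calibration_factors or load_calibration_factors; only the file naming pattern is taken from the list above.

```python
import pickle
import tempfile
from pathlib import Path


def factor_path(data_dir, label):
    """Build the path following the calibration_factors_<label>.pkl pattern."""
    return Path(data_dir) / "calibration_factors" / f"calibration_factors_{label}.pkl"


def save_factors(factor_lookup, data_dir, label):
    path = factor_path(data_dir, label)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(factor_lookup, fh)
    return path


def load_factors(data_dir, label):
    path = factor_path(data_dir, label)
    if not path.exists():
        # Include the expected path so the problem is easy to diagnose.
        raise FileNotFoundError(f"Calibration factors file not found: {path}")
    with path.open("rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory standing in for the data/ folder.
with tempfile.TemporaryDirectory() as data_dir:
    factors = {("rural", "SFH", "private owner", "4", "renovated"): 1.15}
    saved_name = save_factors(factors, data_dir, 2019).name
    restored = load_factors(data_dir, 2019)
```

Tuple keys survive pickling unchanged, which is one reason pickle is convenient here; pickle files should only be loaded from trusted sources.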

Technical Details

IPF Calibration Process (Base Year Only)

  1. Data Preparation: Create contingency table from inhabit data

  2. Target Loading: Load Census 2022 targets for dwelling categories

  3. IPF Iterations: Iteratively adjust weights to match marginal totals

  4. Convergence: Stop when changes are below tolerance (1e-6)

  5. Factor Extraction: Calculate factor = calibrated_weight / original_weight
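
The steps above can be illustrated with a deliberately simplified two-dimensional IPF. This is a sketch only: the actual calibration fits five dimensions, and the function name ipf_2d, the example table, and the target totals are all made up for illustration. The tolerance of 1e-6 is the one stated above.

```python
def ipf_2d(table, row_targets, col_targets, tol=1e-6, max_iter=1000):
    """Scale a 2D table until its row and column sums match the targets."""
    fitted = [row[:] for row in table]
    for iteration in range(1, max_iter + 1):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(fitted[i])
            if total > 0:
                fitted[i] = [value * target / total for value in fitted[i]]
        # Column step: scale each column to its target total and track how
        # far the largest scaling factor still is from 1.
        max_change = 0.0
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fitted)
            if total > 0:
                scale = target / total
                max_change = max(max_change, abs(scale - 1.0))
                for row in fitted:
                    row[j] *= scale
        if max_change < tol:
            return fitted, iteration
    raise RuntimeError("IPF did not converge")


# Made-up contingency table (e.g. region type x building type) and targets.
original = [[40.0, 60.0], [30.0, 70.0]]
row_targets = [120.0, 110.0]
col_targets = [90.0, 140.0]

fitted, iterations = ipf_2d(original, row_targets, col_targets)

# Factor extraction: factor = calibrated_weight / original_weight per cell.
factors = [
    [f / o for f, o in zip(fitted_row, original_row)]
    for fitted_row, original_row in zip(fitted, original)
]
```

The row and column targets must sum to the same grand total, otherwise no table can satisfy both sets of margins and the loop runs until max_iter.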

Factor Application Process (Other Years)

  1. Factor Loading: Load pre-calculated factors from disk or memory

  2. Row Matching: Match each data row to appropriate factor by characteristics

  3. Weight Multiplication: new_weight = original_weight * factor

  4. Consistency Check: Verify totals match expected Census proportions

Performance Comparison

  Method              Time per Year   Iterations       Accuracy
  Full IPF            ~30 seconds     ~50 per region   Exact
  Factor Application  ~0.3 seconds    None             Same as base year

Assumptions

  1. Temporal Stability: Calibration adjustments needed are similar across years

  2. Structural Consistency: Same dwelling categories exist in all years

  3. Base Year Quality: 2019 data is representative for factor calculation

  4. Census Relevance: 2022 Census targets are appropriate for all years

Error Handling

  • Missing Factors: Warning logged, original weight retained

  • Missing Files: Clear error messages with expected file paths

  • Invalid Data: Validation checks for required columns and data types

  • Convergence Issues: Maximum iteration limits with progress reporting
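
The "Missing Factors" behaviour can be sketched as a lookup with a fallback. This is an assumed shape, not the project's code: the function name weight_with_fallback and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("calibration_sketch")  # hypothetical logger name


def weight_with_fallback(key, weight, factor_lookup):
    """Multiply weight by its factor, or keep it unchanged if no factor exists."""
    factor = factor_lookup.get(key)
    if factor is None:
        logger.warning("No calibration factor found for %s; keeping original weight", key)
        return weight
    return weight * factor


lookup = {("rural", "SFH", "private owner", "4", "renovated"): 1.5}
known = weight_with_fallback(("rural", "SFH", "private owner", "4", "renovated"), 10.0, lookup)
unknown = weight_with_fallback(("urban", "SFH", "private owner", "7+", "renovated"), 10.0, lookup)
```

Retaining the original weight keeps the row in the data, at the cost of leaving that category uncalibrated, so the warnings are worth reviewing after a run.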

Validation

To validate the calibration:

  1. Total Consistency: Sum of calibrated weights should match Census totals

  2. Marginal Accuracy: Totals by building type, ownership, rooms should match targets

  3. Distribution Preservation: Relative relationships within categories preserved

  4. Year Consistency: Similar adjustment patterns across all years
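
The marginal accuracy check, for example, could be sketched as below. The helper names, the row layout, and all numbers are made up for illustration; real targets would come from the Census 2022 data.

```python
from collections import defaultdict


def marginal_totals(rows, column):
    """Sum calibrated weights per category of one column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row["weight"]
    return dict(totals)


def max_relative_deviation(totals, targets):
    """Largest relative gap between calibrated totals and target totals."""
    return max(
        abs(totals.get(category, 0.0) - target) / target
        for category, target in targets.items()
    )


# Made-up calibrated rows and targets, for illustration only.
calibrated_rows = [
    {"building_type": "SFH", "weight": 230.0},
    {"building_type": "SFH", "weight": 115.0},
    {"building_type": "MFH", "weight": 92.0},
    {"building_type": "MFH", "weight": 460.0},
]
building_type_targets = {"SFH": 350.0, "MFH": 550.0}

totals = marginal_totals(calibrated_rows, "building_type")
deviation = max_relative_deviation(totals, building_type_targets)
print(f"Largest relative deviation by building type: {deviation:.2%}")
```

The same two helpers can be reused for ownership and rooms by changing the column name and the target dictionary.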

Troubleshooting

Common Issues

  1. “No calibration factor found”:

       • Check if all required columns exist in data

       • Verify category values match expected format

  2. “Calibration factors file not found”:

       • Run base year calibration first

       • Check file permissions in data directory

  3. “IPF did not converge”:

       • Check Census data quality

       • Verify target totals are reasonable

       • Increase maximum iterations if needed

Debug Mode

Enable debug mode in config:

debug: true

This provides detailed logging of:

  • Calibration factor calculations

  • Weight adjustments by category

  • Convergence progress

  • File operations
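
How the debug flag is consumed is not shown in this document. One plausible wiring, given purely as an assumption, maps the flag to a logging level; the function name configure_logging and the logger name are hypothetical.

```python
import logging


def configure_logging(config):
    """Set DEBUG level when config["debug"] is true, INFO otherwise."""
    level = logging.DEBUG if config.get("debug", False) else logging.INFO
    logger = logging.getLogger("calibration_sketch")  # hypothetical logger name
    logger.setLevel(level)
    return logger


debug_level = configure_logging({"debug": True}).level
default_level = configure_logging({}).level
```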