Inhabit Data Calibration Workflow
This document describes the calibration workflow for inhabit data, which calibrates multiple years to Census 2022 targets without re-running the full calibration for every year.
Overview
The calibration workflow solves the problem of repeatedly running expensive IPF (Iterative Proportional Fitting) calibration for each year by:
Calculating calibration factors once from a base year (2019) to Census 2022 targets
Applying the same factors to all other years (2000-2018) with a single multiplication per row
Handling dwelling stock calibration consistently across all years
Saving factors for future reuse
Key Benefits
Performance: ~100x faster for additional years (simple multiplication vs. IPF iterations)
Consistency: Same calibration logic applied to all years
Reusability: Factors saved to disk for future runs
Flexibility: Multiple workflow options for different use cases
How It Works
1. Factor Calculation (Base Year 2019)
# Calculate factors from 2019 data using Census 2022 targets
factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)
This runs the full IPF calibration on 2019 data and extracts calibration factors for each combination of:
region_type_dwell (rural/urban)
building_type (MFH/SFH)
ownership (private owner/private tenant)
rooms (1, 2, 3, 4, 5, 6, 7+)
condition (renovated/not renovated)
2. Factor Application (Other Years)
# Apply pre-calculated factors to any year
load_census.calibrate_inhabit_final(2018, ip, factor_lookup=factor_lookup)
This multiplies each row’s weight by the factor for its category combination; no iterative calculation is needed.
3. Factor Structure
factor_lookup = {
    ('rural', 'SFH', 'private owner', '4', 'renovated'): 1.15,
    ('urban', 'MFH', 'private tenant', '2', 'not renovated'): 0.92,
    # ... for all combinations
}
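As a rough illustration of how a lookup of this shape can be applied, the sketch below multiplies each row's weight by the factor for its category combination and keeps the original weight when no factor is found (the behaviour listed under Error Handling). The function `apply_factors`, the `weight` column name, and the one-dict-per-row layout are assumptions made for this example; the actual code in `load_census` may be organised differently.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed column order; it must match the order of the tuple keys in factor_lookup.
KEY_COLUMNS = ("region_type_dwell", "building_type", "ownership", "rooms", "condition")


def apply_factors(rows, factor_lookup, weight_col="weight"):
    """Return copies of the rows with weights scaled by the matching factor."""
    calibrated = []
    for row in rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        factor = factor_lookup.get(key)
        if factor is None:
            # Missing factor: log a warning and leave the weight unchanged.
            logger.warning("No calibration factor found for %s", key)
            factor = 1.0
        new_row = dict(row)
        new_row[weight_col] = row[weight_col] * factor
        calibrated.append(new_row)
    return calibrated


factor_lookup = {
    ("rural", "SFH", "private owner", "4", "renovated"): 1.15,
    ("urban", "MFH", "private tenant", "2", "not renovated"): 0.92,
}
rows = [
    {"region_type_dwell": "rural", "building_type": "SFH", "ownership": "private owner",
     "rooms": "4", "condition": "renovated", "weight": 100.0},
    # No factor exists for this combination, so its weight is retained.
    {"region_type_dwell": "urban", "building_type": "SFH", "ownership": "private owner",
     "rooms": "3", "condition": "renovated", "weight": 50.0},
]
result = apply_factors(rows, factor_lookup)
```

The input rows are copied rather than modified, so the uncalibrated weights remain available for comparison.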
Usage Examples
Option 1: Batch Processing (Recommended)
# Calibrate multiple years efficiently
factor_lookup = load_census.calibrate_with_factor_reuse(
    base_year=2019,
    other_years=list(range(2000, 2019)),  # 2000 through 2018
    ip=ip
)
Option 2: Individual Year with Saved Factors
# Use previously saved factors
success = load_census.calibrate_year_with_saved_factors(
    year=2020,
    ip=ip,
    base_year=2019
)
Option 3: Manual Control
# Calculate factors
factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)
# Save factors
load_census.save_calibration_factors(factor_lookup, ip, 2019)
# Apply to other years
for year in [2018, 2017, 2016]:
    load_census.calibrate_inhabit_final(year, ip, factor_lookup=factor_lookup)
Integration Points
1. inhabit_matrix.py
The calibrate_to_census(ip) function now:
Uses 2019 as base year for calibration
Calculates factors once from 2019 → Census 2022
Applies same factors to years 2000-2018
Returns factors for use in dwelling stock calibration
def calibrate_to_census(ip):
    base_year = 2019
    cen_factor = load_census.calibrate_inhabit_final(base_year, ip)
    load_census.save_calibration_factors(cen_factor, ip, base_year)
    for year in range(base_year - 1, ip["empirical_start_year"] - 1, -1):
        load_census.calibrate_inhabit_final(year, ip, cen_factor)
    return cen_factor
2. dwelling_stock.py
The get_dwelling_stock() function now:
Calculates dwelling stock factors once for base year 2019
Reuses same factors for all other years
Handles both inhabit and dwelling stock calibration consistently
def get_dwelling_stock(inhabit, ip, year, mor, split_large_dwell, cen_fac):
    # ... existing code ...
    if year == 2019 and cen_fac is None:
        # Calculate dwelling stock factors for base year
        cen_fac = load_census.load_census_factor(ip, ds, aggweights="dwells")
        load_census.save_calibration_factors(cen_fac, ip, f"dwelling_stock_{year}")
    elif cen_fac is None:
        # Load saved factors for other years
        cen_fac = load_census.load_calibration_factors(ip, "dwelling_stock_2019")
    # Apply factors
    ds = misc.apply_calibrate_factor(ds, cen_fac, ip, "dwells")
    # ... rest of function ...
File Structure
Input Files
data/evidence/inhabit/inhabit_2019.csv (base year data)
data/evidence/inhabit/inhabit_2018.csv (other years)
data/evidence/move_out/move_out_2019.csv (move data)
Output Files
data/factorized_evidence/inhabit/inhabit_2019.csv (calibrated data)
data/factorized_evidence/move_out/move_out_2019.csv (calibrated move data)
data/calibration_factors/calibration_factors_2019.pkl (saved factors)
data/calibration_factors/calibration_factors_dwelling_stock_2019.pkl (dwelling factors)
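The factor files listed above follow a `calibration_factors_<label>.pkl` naming pattern. The sketch below shows one way such files could be written and read with the standard `pickle` module, including an error message that names the expected path. The helpers `factors_path`, `save_factors` and `load_factors` are hypothetical stand-ins and are not the signatures of `load_census.save_calibration_factors` or `load_census.load_calibration_factors`.

```python
import pickle
import tempfile
from pathlib import Path


def factors_path(data_dir, label):
    """Build the factor file path; label is e.g. 2019 or 'dwelling_stock_2019'."""
    return Path(data_dir) / "calibration_factors" / f"calibration_factors_{label}.pkl"


def save_factors(factor_lookup, data_dir, label):
    """Pickle the factor dict, creating the target directory if needed."""
    path = factors_path(data_dir, label)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(factor_lookup, fh)
    return path


def load_factors(data_dir, label):
    """Load a pickled factor dict, failing with the expected path if it is missing."""
    path = factors_path(data_dir, label)
    if not path.exists():
        raise FileNotFoundError(
            f"Calibration factors file not found: {path}. "
            "Run the base year calibration first."
        )
    with path.open("rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory standing in for data/.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_factors({("rural", "SFH"): 1.15}, tmp, 2019)
    restored = load_factors(tmp, 2019)
```

Pickle files should only be loaded from trusted locations, since unpickling can execute arbitrary code.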
Technical Details
IPF Calibration Process (Base Year Only)
Data Preparation: Create contingency table from inhabit data
Target Loading: Load Census 2022 targets for dwelling categories
IPF Iterations: Iteratively adjust weights to match marginal totals
Convergence: Stop when changes are below tolerance (1e-6)
Factor Extraction: Calculate factor = calibrated_weight / original_weight
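The steps above can be sketched on a small two-dimensional table. This is a simplified illustration only: the real calibration works across five dimensions and uses Census 2022 targets, while the seed weights, targets, and the function name `ipf_2d` here are made up for the example. The tolerance of 1e-6 follows the convergence criterion stated above; the iteration cap is an assumption.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-6, max_iter=1000):
    """Fit a 2-D table to row and column totals by iterative proportional fitting."""
    table = [row[:] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for iteration in range(1, max_iter + 1):
        previous = [row[:] for row in table]
        # Scale each row so that it sums to its target.
        for i in range(n_rows):
            row_sum = sum(table[i])
            if row_sum > 0:
                scale = row_targets[i] / row_sum
                table[i] = [value * scale for value in table[i]]
        # Scale each column so that it sums to its target.
        for j in range(n_cols):
            col_sum = sum(table[i][j] for i in range(n_rows))
            if col_sum > 0:
                scale = col_targets[j] / col_sum
                for i in range(n_rows):
                    table[i][j] *= scale
        # Converged when no cell moved by more than the tolerance.
        change = max(
            abs(table[i][j] - previous[i][j])
            for i in range(n_rows)
            for j in range(n_cols)
        )
        if change < tol:
            return table, iteration
    raise RuntimeError(f"IPF did not converge within {max_iter} iterations")


# Made-up contingency table: rows = building type, columns = ownership.
seed = [[40.0, 10.0], [20.0, 30.0]]
row_targets = [60.0, 60.0]
col_targets = [70.0, 50.0]

fitted, n_iter = ipf_2d(seed, row_targets, col_targets)

# Factor extraction: calibrated weight divided by original weight, per cell.
factors = [[fitted[i][j] / seed[i][j] for j in range(2)] for i in range(2)]
```

The row and column targets must sum to the same grand total (120 in this example); otherwise the two scaling steps keep pulling the table in different directions and the loop may not converge.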
Factor Application Process (Other Years)
Factor Loading: Load pre-calculated factors from disk or memory
Row Matching: Match each data row to appropriate factor by characteristics
Weight Multiplication: new_weight = original_weight * factor
Consistency Check: Verify totals match expected Census proportions
Performance Comparison
| Method | Time per Year | Iterations | Accuracy |
|---|---|---|---|
| Full IPF | ~30 seconds | ~50 per region | Exact |
| Factor Application | ~0.3 seconds | None | Same as base year |
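A quick back-of-the-envelope calculation puts these figures in context. It assumes the approximate timings from the table and the 19 non-base years (2000-2018) mentioned earlier; actual run times will vary. The ~100x figure is the gain per additional year, while the end-to-end gain is smaller because the base year still runs the full IPF: roughly 600 seconds for full IPF on all 20 years versus roughly 36 seconds with factor reuse.

```python
FULL_IPF_SECONDS = 30.0      # approximate time per year for full IPF
FACTOR_APPLY_SECONDS = 0.3   # approximate time per year for factor application

n_other_years = len(range(2000, 2019))  # 2000 through 2018

# Gain for each additional (non-base) year.
per_year_speedup = FULL_IPF_SECONDS / FACTOR_APPLY_SECONDS

# Total cost if every year, including the base year, ran the full IPF.
all_ipf_total = FULL_IPF_SECONDS * (1 + n_other_years)

# Total cost with one base-year IPF run plus factor application elsewhere.
reuse_total = FULL_IPF_SECONDS + FACTOR_APPLY_SECONDS * n_other_years

print(f"per-year speedup: ~{per_year_speedup:.0f}x")
print(f"full IPF for all years: ~{all_ipf_total:.0f} s")
print(f"base-year IPF plus factor reuse: ~{reuse_total:.1f} s")
```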
Assumptions
Temporal Stability: Calibration adjustments needed are similar across years
Structural Consistency: Same dwelling categories exist in all years
Base Year Quality: 2019 data is representative for factor calculation
Census Relevance: 2022 Census targets are appropriate for all years
Error Handling
Missing Factors: Warning logged, original weight retained
Missing Files: Clear error messages with expected file paths
Invalid Data: Validation checks for required columns and data types
Convergence Issues: Maximum iteration limits with progress reporting
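As one possible shape for the invalid-data check, the sketch below reports rows with missing columns or non-numeric weights instead of failing on the first problem. The column names are taken from the factor key described earlier; the `weight` column name and the function `validate_rows` are assumptions for illustration.

```python
KEY_COLUMNS = ("region_type_dwell", "building_type", "ownership", "rooms", "condition")


def validate_rows(rows, weight_col="weight"):
    """Return a list of problem descriptions; an empty list means no issues found."""
    problems = []
    for index, row in enumerate(rows):
        missing = [col for col in (*KEY_COLUMNS, weight_col) if col not in row]
        if missing:
            problems.append(f"row {index}: missing columns {missing}")
            continue
        weight = row[weight_col]
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(weight, bool) or not isinstance(weight, (int, float)):
            problems.append(
                f"row {index}: {weight_col} must be numeric, got {type(weight).__name__}"
            )
    return problems


good = {"region_type_dwell": "rural", "building_type": "SFH", "ownership": "private owner",
        "rooms": "4", "condition": "renovated", "weight": 1.0}
bad_missing = {"region_type_dwell": "rural", "weight": 1.0}
bad_type = dict(good, weight="1.0")

problems = validate_rows([good, bad_missing, bad_type])
```

Collecting every problem before reporting makes it easier to fix an input file in one pass.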
Validation
To validate the calibration:
Total Consistency: Sum of calibrated weights should match Census totals
Marginal Accuracy: Totals by building type, ownership, rooms should match targets
Distribution Preservation: Relative relationships within categories preserved
Year Consistency: Similar adjustment patterns across all years
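The marginal accuracy check can be sketched as a comparison of weighted totals per category against target totals. The helper names, the `weight` column, the example targets, and the relative tolerance of 0.1% are all assumptions chosen for illustration, not values taken from the workflow.

```python
from collections import defaultdict


def marginal_totals(rows, column, weight_col="weight"):
    """Sum the weights for each category of one column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[column]] += row[weight_col]
    return dict(totals)


def check_marginal(rows, column, targets, rel_tol=1e-3):
    """Return {category: (actual, target)} for categories outside the tolerance."""
    actual = marginal_totals(rows, column)
    mismatches = {}
    for category, target in targets.items():
        value = actual.get(category, 0.0)
        if abs(value - target) > rel_tol * abs(target):
            mismatches[category] = (value, target)
    return mismatches


# Made-up calibrated rows and targets.
rows = [
    {"building_type": "SFH", "weight": 115.0},
    {"building_type": "MFH", "weight": 46.0},
    {"building_type": "MFH", "weight": 46.0},
]
targets = {"SFH": 115.0, "MFH": 100.0}

mismatches = check_marginal(rows, "building_type", targets)
```

In this example the SFH total matches its target while the MFH total falls short, so only MFH appears in `mismatches`. The same check can be repeated for ownership and rooms.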
Troubleshooting
Common Issues
“No calibration factor found”:
- Check that all required columns exist in the data
- Verify that category values match the expected format
“Calibration factors file not found”:
- Run the base year calibration first
- Check file permissions in the data directory
“IPF did not converge”:
- Check Census data quality
- Verify that target totals are reasonable
- Increase the maximum number of iterations if needed
Debug Mode
Enable debug mode in config:
debug: true
This provides detailed logging of:
Calibration factor calculations
Weight adjustments by category
Convergence progress
File operations