Inhabit Data Calibration Workflow
=================================

This document explains the new calibration workflow for inhabit data, which efficiently calibrates multiple years to Census 2022 targets.

Overview
--------

The calibration workflow avoids repeatedly running expensive IPF (Iterative Proportional Fitting) calibration for each year by:

1. **Calculating calibration factors once** from a base year (2019) to Census 2022 targets
2. **Applying the same factors** to all other years (2000-2018) without further iteration
3. **Handling dwelling stock calibration** consistently across all years
4. **Saving factors** for future reuse

Key Benefits
------------

- **Performance**: ~100x faster for additional years (simple multiplication vs. IPF iterations)
- **Consistency**: Same calibration logic applied to all years
- **Reusability**: Factors saved to disk for future runs
- **Flexibility**: Multiple workflow options for different use cases

How It Works
------------

1. Factor Calculation (Base Year 2019)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Calculate factors from 2019 data using Census 2022 targets
    factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)

This runs the full IPF calibration on 2019 data and extracts calibration factors for each combination of:

- ``region_type_dwell`` (rural/urban)
- ``building_type`` (MFH/SFH)
- ``ownership`` (private owner/private tenant)
- ``rooms`` (1, 2, 3, 4, 5, 6, 7+)
- ``condition`` (renovated/not renovated)

2. Factor Application (Other Years)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Apply pre-calculated factors to any year
    load_census.calibrate_inhabit_final(2018, ip, factor_lookup=factor_lookup)

This multiplies each row's weight by the appropriate factor; no iterative calculations are needed.

3. Factor Structure
~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    factor_lookup = {
        ('rural', 'SFH', 'private owner', '4', 'renovated'): 1.15,
        ('urban', 'MFH', 'private tenant', '2', 'not renovated'): 0.92,
        # ... for all combinations
    }

Usage Examples
--------------

Option 1: Batch Processing (Recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Calibrate multiple years efficiently
    factor_lookup = load_census.calibrate_with_factor_reuse(
        base_year=2019,
        other_years=list(range(2000, 2019)),  # 2000 through 2018
        ip=ip,
    )

Option 2: Individual Year with Saved Factors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Use previously saved factors
    success = load_census.calibrate_year_with_saved_factors(
        year=2020,
        ip=ip,
        base_year=2019,
    )

Option 3: Manual Control
~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Calculate factors
    factor_lookup = load_census.calibrate_inhabit_final(2019, ip, factor_lookup=None)

    # Save factors
    load_census.save_calibration_factors(factor_lookup, ip, 2019)

    # Apply to other years
    for year in [2018, 2017, 2016]:
        load_census.calibrate_inhabit_final(year, ip, factor_lookup=factor_lookup)

Integration Points
------------------

1. inhabit_matrix.py
~~~~~~~~~~~~~~~~~~~~

The ``calibrate_to_census(ip)`` function now:

- Uses 2019 as the base year for calibration
- Calculates factors once from 2019 → Census 2022
- Applies the same factors to years 2000-2018
- Returns the factors for use in dwelling stock calibration

.. code-block:: python

    def calibrate_to_census(ip):
        base_year = 2019

        # Full IPF calibration for the base year only
        cen_factor = load_census.calibrate_inhabit_final(base_year, ip)
        load_census.save_calibration_factors(cen_factor, ip, base_year)

        # Reuse the factors for every earlier year, counting down
        for year in range(base_year - 1, ip["empirical_start_year"] - 1, -1):
            load_census.calibrate_inhabit_final(year, ip, cen_factor)

        return cen_factor

2. dwelling_stock.py
~~~~~~~~~~~~~~~~~~~~

The ``get_dwelling_stock()`` function now:

- Calculates dwelling stock factors once for base year 2019
- Reuses the same factors for all other years
- Handles both inhabit and dwelling stock calibration consistently

.. code-block:: python

    def get_dwelling_stock(inhabit, ip, year, mor, split_large_dwell, cen_fac):
        # ... existing code ...

        if year == 2019 and cen_fac is None:
            # Calculate dwelling stock factors for base year
            cen_fac = load_census.load_census_factor(ip, ds, aggweights="dwells")
            load_census.save_calibration_factors(cen_fac, ip, f"dwelling_stock_{year}")
        elif cen_fac is None:
            # Load saved factors for other years
            cen_fac = load_census.load_calibration_factors(ip, "dwelling_stock_2019")

        # Apply factors
        ds = misc.apply_calibrate_factor(ds, cen_fac, ip, "dwells")

        # ... rest of function ...

File Structure
--------------

Input Files
~~~~~~~~~~~

- ``data/evidence/inhabit/inhabit_2019.csv`` (base year data)
- ``data/evidence/inhabit/inhabit_2018.csv`` (other years)
- ``data/evidence/move_out/move_out_2019.csv`` (move data)

Output Files
~~~~~~~~~~~~

- ``data/factorized_evidence/inhabit/inhabit_2019.csv`` (calibrated data)
- ``data/factorized_evidence/move_out/move_out_2019.csv`` (calibrated move data)
- ``data/calibration_factors/calibration_factors_2019.pkl`` (saved factors)
- ``data/calibration_factors/calibration_factors_dwelling_stock_2019.pkl`` (dwelling factors)

Technical Details
-----------------

IPF Calibration Process (Base Year Only)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Data Preparation**: Create contingency table from inhabit data
2. **Target Loading**: Load Census 2022 targets for dwelling categories
3. **IPF Iterations**: Iteratively adjust weights to match marginal totals
4. **Convergence**: Stop when changes are below tolerance (1e-6)
5. **Factor Extraction**: Calculate ``factor = calibrated_weight / original_weight``

Factor Application Process (Other Years)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **Factor Loading**: Load pre-calculated factors from disk or memory
2. **Row Matching**: Match each data row to the appropriate factor by its characteristics
3. **Weight Multiplication**: ``new_weight = original_weight * factor``
4. **Consistency Check**: Verify totals match expected Census proportions

Performance Comparison
~~~~~~~~~~~~~~~~~~~~~~

+--------------------+---------------+----------------+-------------------+
| Method             | Time per Year | Iterations     | Accuracy          |
+====================+===============+================+===================+
| Full IPF           | ~30 seconds   | ~50 per region | Exact             |
+--------------------+---------------+----------------+-------------------+
| Factor Application | ~0.3 seconds  | None           | Same as base year |
+--------------------+---------------+----------------+-------------------+

Assumptions
-----------

1. **Temporal Stability**: Calibration adjustments needed are similar across years
2. **Structural Consistency**: Same dwelling categories exist in all years
3. **Base Year Quality**: 2019 data is representative for factor calculation
4. **Census Relevance**: 2022 Census targets are appropriate for all years

Error Handling
--------------

- **Missing Factors**: Warning logged, original weight retained
- **Missing Files**: Clear error messages with expected file paths
- **Invalid Data**: Validation checks for required columns and data types
- **Convergence Issues**: Maximum iteration limits with progress reporting

Validation
----------

To validate the calibration:

1. **Total Consistency**: Sum of calibrated weights should match Census totals
2. **Marginal Accuracy**: Totals by building type, ownership, and rooms should match targets
3. **Distribution Preservation**: Relative relationships within categories are preserved
4. **Year Consistency**: Similar adjustment patterns across all years

Troubleshooting
---------------

Common Issues
~~~~~~~~~~~~~

1. **"No calibration factor found"**:

   - Check if all required columns exist in data
   - Verify category values match expected format

2. **"Calibration factors file not found"**:

   - Run base year calibration first
   - Check file permissions in data directory

3. **"IPF did not converge"**:

   - Check Census data quality
   - Verify target totals are reasonable
   - Increase maximum iterations if needed

Debug Mode
~~~~~~~~~~

Enable debug mode in config:

.. code-block:: yaml

    debug: true

This provides detailed logging of:

- Calibration factor calculations
- Weight adjustments by category
- Convergence progress
- File operations
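
Illustrative Sketch
-------------------

The factor extraction and factor application steps listed under Technical Details can be illustrated with a small, self-contained sketch. It does not use the project's ``load_census`` module. The helper names (``ipf_2d``, ``extract_factors``, ``apply_factors``), the reduction to two dimensions (``region_type_dwell`` and ``building_type``), and all numbers are assumptions made purely for illustration; the real calibration covers five dimensions and may differ in detail.

.. code-block:: python

    import logging

    logger = logging.getLogger(__name__)


    def ipf_2d(table, row_targets, col_targets, tol=1e-6, max_iter=100):
        """Fit a 2-D table to row/column targets (hypothetical helper)."""
        fitted = [row[:] for row in table]
        for _ in range(max_iter):
            max_change = 0.0
            # Scale each row so its total matches the row target.
            for i, target in enumerate(row_targets):
                total = sum(fitted[i])
                scale = target / total if total else 1.0
                for j in range(len(fitted[i])):
                    new = fitted[i][j] * scale
                    max_change = max(max_change, abs(new - fitted[i][j]))
                    fitted[i][j] = new
            # Scale each column so its total matches the column target.
            for j, target in enumerate(col_targets):
                total = sum(row[j] for row in fitted)
                scale = target / total if total else 1.0
                for row in fitted:
                    new = row[j] * scale
                    max_change = max(max_change, abs(new - row[j]))
                    row[j] = new
            # Stop once no cell moved by more than the tolerance.
            if max_change < tol:
                break
        return fitted


    def extract_factors(original, fitted, row_labels, col_labels):
        """factor = calibrated_weight / original_weight per category cell."""
        return {
            (r, c): fitted[i][j] / original[i][j]
            for i, r in enumerate(row_labels)
            for j, c in enumerate(col_labels)
            if original[i][j] > 0
        }


    def apply_factors(rows, factor_lookup):
        """Multiply each weight by its factor; keep the weight if none exists."""
        calibrated = []
        for row in rows:
            key = (row["region_type_dwell"], row["building_type"])
            factor = factor_lookup.get(key)
            if factor is None:
                logger.warning("No calibration factor found for %s", key)
                factor = 1.0  # original weight retained
            calibrated.append({**row, "weight": row["weight"] * factor})
        return calibrated


    # Toy base-year table: rows are rural/urban, columns are SFH/MFH.
    row_labels = ["rural", "urban"]
    col_labels = ["SFH", "MFH"]
    base_year = [[40.0, 10.0], [20.0, 30.0]]
    fitted = ipf_2d(base_year, row_targets=[60.0, 60.0], col_targets=[70.0, 50.0])
    factor_lookup = extract_factors(base_year, fitted, row_labels, col_labels)

    # Another year reuses the factors without running IPF again.
    other_year = [
        {"region_type_dwell": "rural", "building_type": "SFH", "weight": 2.0},
        {"region_type_dwell": "urban", "building_type": "other", "weight": 5.0},
    ]
    calibrated = apply_factors(other_year, factor_lookup)

The second row of ``other_year`` has a category with no factor, so it triggers the warning path and keeps its original weight, mirroring the behaviour described under Error Handling. Keying the lookup by a tuple of category values mirrors the ``factor_lookup`` structure shown earlier, only with fewer dimensions.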