by Ryan Boyle, Senior Economist, Northern Trust
Data analysts must appreciate the difference between zero and null. Zero is a number, a measured empty quantity. Null is the absence of data, a missing value; mathematical operations cannot be applied to null values.
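A minimal sketch of the distinction, using Python's `None` to stand in for a null. The price lists and the helper `mean_observed` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical readings: zero is a real observation, None is a missing one.
prices_with_zero = [4.0, 0.0, 2.0]
prices_with_null = [4.0, None, 2.0]

def mean_observed(values):
    """Average only the values that were actually observed."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None

# A zero counts toward the average and pulls it down.
mean_with_zero = mean(prices_with_zero)

# A null has to be skipped (or imputed) before any average can be taken.
mean_skipping_null = mean_observed(prices_with_null)

# Arithmetic applied directly to a null fails outright.
null_arithmetic_fails = False
try:
    4.0 + None
except TypeError:
    null_arithmetic_fails = True
```

Here the zero lowers the average to 2.0, while skipping the null leaves an average of 3.0 over the two observed values.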
Last week’s discussion of the Bureau of Labor Statistics (BLS) highlighted the poor survey responses affecting employment estimates. The BLS estimate of inflation is also challenged by missing data for prices. At a time when economic aggregates are so vital, it is helpful to understand how the agency accounts for nulls.
Filling data gaps creates the potential for data fluctuations.
Imputation is the process of replacing missing data with plausible values. While not strictly necessary, imputation helps analysts calculate economic aggregates consistently and avoid gaps in historical time series. Analysts make an effort to replace missing values with the most comparable value available.
For the prices that feed into the monthly consumer price index (CPI) inflation measure, the BLS imputes according to a hierarchy: In home-cell imputation, null prices are estimated from observed prices for a similar item and geographic area, like indexing the price of wheat bread to the price of all bread; rents may be similarly derived. In different-cell imputation, missing prices are imputed from collected prices of the same item in other geographies. If neither approach is workable, carry-forward imputation assumes no change and copies the value from the prior month.
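The fallback order of that hierarchy can be sketched in a few lines. This is an illustration under stated assumptions, not the BLS procedure: the function `impute_price`, its parameters, and the use of a simple average of month-over-month price ratios are all hypothetical.

```python
from statistics import mean

def impute_price(prior_price, home_cell_relatives, other_area_relatives):
    """Return (estimate, method) for a missing price.

    Each *relatives* list holds hypothetical month-over-month price ratios
    (current / prior) for prices that were actually collected. A simple
    average is an assumption made for this sketch.
    """
    # First choice: similar items in the same geographic area.
    if home_cell_relatives:
        return prior_price * mean(home_cell_relatives), "home cell"
    # Second choice: the same item observed in other geographies.
    if other_area_relatives:
        return prior_price * mean(other_area_relatives), "different cell"
    # Last resort: assume no change from the prior month.
    return prior_price, "carry forward"

# Hypothetical inputs: last month's price was 4.0.
home = impute_price(4.0, [1.5, 1.0], [1.1])
different = impute_price(4.0, [], [1.5])
carried = impute_price(4.0, [], [])
```

With these made-up inputs, the three calls fall through to the home-cell, different-cell, and carry-forward branches in turn, which shows how each step down the hierarchy relies on less directly comparable information.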
The BLS does not disclose which of its readings are imputed or the total number of imputations, but the agency announced in June that roughly 15% of prices are not being directly collected. Because data gathering has also been halted in three geographies due to staff shortages, we assume more than 15% of prices are imputed. And the quality of imputations is declining. As BLS resources were curtailed this year, a rising share of imputations was made using the less precise different-cell method.
Properly calibrated, estimation can work well, but it does increase the margin of error in the short run. As with the employment surveys, price collection methods could be improved. Many values are gathered through consumer interviews or observed by price-checkers walking through stores. Automation would require development of a whole new approach to data gathering.
Null values are a natural outcome in data collection, but their rise is a sign of declining data quality. Insufficient investment in the statistical agencies risks nullifying the value of critical statistics.
Copyright © Northern Trust