Venue: The Fuqua School of Business, Duke University, 1 Towerview Drive, Durham, NC 27708-0120

Presentation

The probabilistic reduction approach to the specification of binary regression models: simulation and regression analysis

Authors: Ebere Onukwugha (University of Maryland); Jason Bergtold (Kansas State University); C. Daniel Mullins (University of Maryland)

Presenter: Ebere Onukwugha (University of Maryland)

Discussant: Ralph Bradley (US Bureau of Labor Statistics)

Session: Modeling

Room: Classroom F

When: Tuesday 10:30 a.m. - noon

Purpose: The probabilistic reduction (PR) approach formalizes the structure underlying conditional mean models and the interrelationships among the included covariates. Resulting specifications, such as the Bernoulli Regression Model (BRM) of binary choice, are typically more complex than the traditional logistic specification (TLS). We compared the fit of the BRM with that of the TLS in a Monte Carlo simulation framework and in a regression model of disparities in hospital discharge disposition.

Methods: The analysis included two simulations and an empirical application. The two simulations differed in the distribution (Gamma or Normal) of the continuous covariate. Simulations were run with 1,000 replications at sample sizes ranging from 100 to 50,000. The statistic of interest was the proportion of replicates for which the p-value of the Hosmer-Lemeshow (HL) test exceeded 0.05, that is, the proportion of well-calibrated replicates. The empirical application used data on 48,254 live discharges in Maryland following a hospitalization for stroke. The dependent variable was an indicator for discharge to a medical care facility. Covariates included race, gender, age, marital status, insurance status, emergency room admission, and stroke type. Model fit was assessed with the likelihood ratio, HL, modified deviance (MD), and modified Pearson (MP) tests. The applied research question concerned the evidence for racial disparities in discharges to step-down care following a stroke admission.

Results: Simulation results, reported as (TLS; BRM) proportions of well-calibrated replicates, showed that the relative advantage of the BRM increased with sample size. At a sample size of 100, both models performed well for the Gamma covariate (0.948; 0.953) and the Normal covariate (0.773; 0.913). However, their performance diverged as the sample size increased: the BRM outperformed the TLS at a sample size of 50,000 for the Gamma (0.083; 0.957) and at a sample size of 5,000 for the Normal (0.000; 0.967). In the empirical application, the diagnostic tests (p-values) indicated that the BRM fit the data better than the TLS. The likelihood ratio test comparing the two models yielded a test statistic of 611.82 with 19 degrees of freedom, rejecting the null hypothesis at the 5% level. The HL (p<0.0001), MD (p<0.0001), and MP (p<0.0001) tests indicated that the TLS is misspecified, whereas the HL (p=0.26), MD (p=0.10), and MP (p=0.06) tests indicated that the BRM is well calibrated to the data. The estimated effect of African-American race (adjusted odds ratio [AOR]; 95% confidence interval) on the odds of discharge to step-down care following an admission for stroke was smaller and more precisely estimated in the BRM (AOR=1.19; 1.17-1.21) than in the TLS (AOR=1.25; 1.20-1.31).

Conclusion: In the settings examined, the TLS was more likely than the BRM to fail model specification tests with both simulated and observed data. While these findings may not generalize to other datasets or simulation designs, they underscore the importance of specification testing for binary choice models and suggest that the BRM specification may support more reliable inference than the traditional specification of binary choice models.
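The point that PR-based specifications are typically richer than the TLS can be seen in a standard Bayes'-rule derivation. As a hedged illustration only (the abstract does not report the simulation parameters), suppose P(Y=1)=π and, conditional on Y=j, the continuous covariate X is either Normal with mean μ_j and variance σ_j², or Gamma with shape α_j and rate λ_j. The log-odds η(x) then takes the following forms:

```latex
\begin{align*}
\text{General:}\quad \eta(x) &\equiv \ln\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
  = \ln\frac{\pi}{1-\pi} + \ln\frac{f(x\mid Y=1)}{f(x\mid Y=0)},\\
\text{Normal:}\quad \eta(x) &= \beta_0 + \beta_1 x + \beta_2 x^2,\\
\beta_1 &= \frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2},\qquad
\beta_2 = \frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2},\\
\beta_0 &= \ln\frac{\pi}{1-\pi} + \ln\frac{\sigma_0}{\sigma_1}
  - \frac{\mu_1^2}{2\sigma_1^2} + \frac{\mu_0^2}{2\sigma_0^2},\\
\text{Gamma:}\quad \eta(x) &= \gamma_0 + (\alpha_1-\alpha_0)\ln x - (\lambda_1-\lambda_0)\,x,\\
\gamma_0 &= \ln\frac{\pi}{1-\pi} + \alpha_1\ln\lambda_1 - \alpha_0\ln\lambda_0
  - \ln\Gamma(\alpha_1) + \ln\Gamma(\alpha_0).
\end{align*}
```

Under these assumptions the TLS, which is linear in x, is recovered only in special cases (σ_0 = σ_1 in the Normal case, α_0 = α_1 in the Gamma case); otherwise the PR-implied index adds an x² or ln x term.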
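The simulation statistic of interest, the share of replicates whose HL p-value exceeds 0.05, can be sketched in dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the data-generating values (`pi1`, `mu`, `sd`), and the 10-group HL test are hypothetical choices. Data are drawn the PR way (Y first, then X | Y Normal with unequal variances), for which the log-odds is quadratic in x. The "BRM-like" model is therefore a logit in x and x², and the "TLS-like" model uses x alone. The chi-squared tail uses the closed form that holds only for even degrees of freedom, which covers a 10-group HL test (8 df).

```python
import math
import random

# Hypothetical helper names and parameter values: an illustrative sketch,
# not the authors' simulation code.


def logistic(t):
    # Numerically stable inverse logit.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small system a @ x = b.
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(rows, y, iters=25):
    # Maximum-likelihood logit by Newton-Raphson; each row already includes the intercept.
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for r, yi in zip(rows, y):
            mu = logistic(sum(b * v for b, v in zip(beta, r)))
            w = mu * (1.0 - mu)
            for i in range(k):
                grad[i] += (yi - mu) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def chi2_sf_even_df(x, df):
    # P(chi2_df > x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, valid for even df only.
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    # HL statistic over near-equal groups ordered by predicted risk; df = groups - 2.
    # Assumes len(y) >= groups and predicted probabilities strictly inside (0, 1).
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        pbar = expected / len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1.0 - pbar))
    return stat, chi2_sf_even_df(stat, groups - 2)


def simulate_pr_normal(n, rng, pi1=0.4, mu=(0.0, 1.0), sd=(1.0, 2.0)):
    # PR-style draw: Y ~ Bernoulli(pi1), then X | Y=j ~ Normal(mu[j], sd[j]) (hypothetical values).
    y = [1 if rng.random() < pi1 else 0 for _ in range(n)]
    x = [rng.gauss(mu[yi], sd[yi]) for yi in y]
    return x, y


def compare(n, replications, seed=2024, alpha=0.05):
    # Share of replicates with HL p-value > alpha for a linear logit (TLS-like)
    # and for a logit that adds x**2 (BRM-like for this data-generating process).
    rng = random.Random(seed)
    passed = [0, 0]
    for _ in range(replications):
        x, y = simulate_pr_normal(n, rng)
        for idx, make_row in enumerate((lambda v: (1.0, v), lambda v: (1.0, v, v * v))):
            rows = [make_row(v) for v in x]
            beta = fit_logit(rows, y)
            p_hat = [logistic(sum(b * c for b, c in zip(beta, r))) for r in rows]
            if hosmer_lemeshow(y, p_hat)[1] > alpha:
                passed[idx] += 1
    return passed[0] / replications, passed[1] / replications
```

For example, `compare(n=2000, replications=100)` returns the (TLS-like; BRM-like) pass rates in the same format as the Results. Pure Python is slow, so replicating the abstract's design of 1,000 replications at sample sizes up to 50,000 would call for a vectorized implementation.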