Modelling health care cost is often problematic because costs are distributed in a non-normal manner. Typically, there are a large number of $0 observations (i.e., individuals who don't use any health care) and a cost distribution that is strongly right skewed among health care users due to a disproportionate number of individuals with very high health care costs. This observation is well known to health economists, but a complicating factor for modelers is mapping disease cost to specific health states. For instance, the cost of cancer care may vary based on disease stage and whether or not the cancer has progressed; the cost of cardiovascular disease will differ if the patient has a myocardial infarction (MI).
A paper by Zhou et al. (2023) provides a nice tutorial on how to estimate costs associated with disease model states using generalized linear models. The tutorial contains four main steps.
Step 1: Preparing the dataset:
- The dataset typically requires calculating cost for discrete time periods. For instance, if you have claims data, you may have information on cost by date, but for analytic purposes you may want a dataset with cost information by person (rows), with the columns being the cost by year (or month). Alternatively, you could make the unit of observation the person-year (or person-month), so that each row is a separate person-year record.
- Next, one must specify the disease states. In each time period, the person is assigned to a disease state. Challenges include determining how granular to make the states (e.g. just MI vs. time since MI) and how to handle multi-state scenarios.
- When data are censored, one can (i) add a covariate to indicate that the data are censored or (ii) exclude observations with partial data. If cost data are missing (but the patient is not otherwise censored), multiple imputation methods may be used. Forming the time periods of analysis requires mapping to the decision model's cycle length, handling censoring appropriately, and potentially transforming the data.
- A sample data set is shown below.
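As a rough illustration of the data preparation described in Step 1, the sketch below collapses claim-level records into person-year rows and assigns a disease state to each row. Everything here is hypothetical and not taken from the paper: the `claims`, `follow_up`, and `mi_date` inputs, the state labels, and the rule that the calendar year containing the MI counts as the event year.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claim-level records: (person_id, service_date, cost).
claims = [
    ("A", date(2020, 3, 14), 1200.0),
    ("A", date(2020, 11, 2), 300.0),
    ("A", date(2021, 6, 30), 8500.0),
    ("B", date(2021, 1, 9), 450.0),
]

# Hypothetical follow-up window (first year, last year) per person, so that
# years with no claims still appear as $0 rows.
follow_up = {"A": (2020, 2021), "B": (2020, 2021), "C": (2020, 2020)}

# Hypothetical date of a first myocardial infarction (MI), if any.
mi_date = {"A": date(2021, 6, 28)}


def build_person_years(claims, follow_up, mi_date):
    """Return one row per person-year with total cost and a disease state."""
    totals = defaultdict(float)
    for person, day, cost in claims:
        totals[(person, day.year)] += cost

    rows = []
    for person, (first, last) in sorted(follow_up.items()):
        event = mi_date.get(person)
        for year in range(first, last + 1):
            # Assumed state definition: before, during, or after the MI year.
            if event is None or year < event.year:
                state = "no_mi"
            elif year == event.year:
                state = "mi_event_year"
            else:
                state = "post_mi"
            rows.append({
                "person": person,
                "year": year,
                "state": state,
                "cost": totals.get((person, year), 0.0),
            })
    return rows


person_years = build_person_years(claims, follow_up, mi_date)
for row in person_years:
    print(row)
```

Person C has no claims at all, which is exactly the kind of $0 observation that motivates the two-part model in Step 2.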
Step 2: Model selection:
- The paper recommends using a two-part model within a generalized linear model (GLM) framework, since the OLS assumptions of normality and homoscedasticity in the residuals are often violated.
- With the GLM, the expected value of cost is transformed non-linearly, as shown in the formula below. You are required to specify both a link function and the distribution of the error term. “The most popular ones (combinations of link function and distribution) for healthcare costs are linear regression (identity link with Gaussian distribution) and Gamma regression with a natural logarithm link.”
- To combine the GLM with a two-part model, one simply estimates the equation above on all positive values and then estimates a logit or probit model for the likelihood that an individual has positive cost.
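To make the two-part idea concrete, here is a minimal sketch in plain Python. It assumes a logit model for whether cost is positive and a Gamma GLM with a log link for the positive costs, both fitted by iteratively reweighted least squares (IRLS). The data, the single MI indicator, and all function names are made up for illustration; in practice you would use a statistics package (the authors' own code is in R) rather than a hand-rolled fitter.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_ls(X, z, w):
    """Weighted least squares: solve (X'WX) beta = X'Wz."""
    p = len(X[0])
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
            for j in range(p)]
    xtwz = [sum(wi * xi[j] * zi for xi, zi, wi in zip(X, z, w)) for j in range(p)]
    return solve(xtwx, xtwz)


def dot(x, beta):
    return sum(a * b for a, b in zip(x, beta))


def fit_logit(X, y, iters=25):
    """Part 1: logistic regression by IRLS (assumes no perfect separation)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        eta = [dot(x, beta) for x in X]
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]
        z = [e + (yi - pi) / wi for e, yi, pi, wi in zip(eta, y, p, w)]
        beta = weighted_ls(X, z, w)
    return beta


def fit_gamma_log(X, y, iters=25):
    """Part 2: Gamma GLM with log link by IRLS (working weights are all 1)."""
    ones = [1.0] * len(y)
    beta = weighted_ls(X, [math.log(yi) for yi in y], ones)  # starting values
    for _ in range(iters):
        eta = [dot(x, beta) for x in X]
        z = [e + (yi - math.exp(e)) / math.exp(e) for e, yi in zip(eta, y)]
        beta = weighted_ls(X, z, ones)
    return beta


def predict_cost(x, beta_logit, beta_gamma):
    """Expected cost = P(cost > 0 | x) * E[cost | cost > 0, x]."""
    p_positive = 1.0 / (1.0 + math.exp(-dot(x, beta_logit)))
    return p_positive * math.exp(dot(x, beta_gamma))


# Hypothetical annual costs: 8 person-years without MI, 6 with MI.
mi = [0] * 8 + [1] * 6
cost = [0, 0, 0, 120, 300, 80, 0, 500, 0, 9000, 15000, 4000, 12000, 20000]
X = [[1.0, float(m)] for m in mi]  # intercept plus MI indicator

beta_logit = fit_logit(X, [1.0 if c > 0 else 0.0 for c in cost])
beta_gamma = fit_gamma_log([x for x, c in zip(X, cost) if c > 0],
                           [float(c) for c in cost if c > 0])

print("no MI:", predict_cost([1.0, 0.0], beta_logit, beta_gamma))
print("MI:   ", predict_cost([1.0, 1.0], beta_logit, beta_gamma))
```

Because the only covariate here is a single indicator, the fitted expected costs should reproduce the raw group means (125 without MI, 10,000 with MI). That makes it a handy sanity check before adding further covariates.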
Step 3: Selecting the final model.
- Model selection must first consider which covariates to include in the regression. These can be chosen by stepwise selection using a pre-specified statistical significance level; however, this can lead to overfitting. Alternative covariate selection techniques include bootstrapped stepwise selection and penalized techniques (e.g. the least absolute shrinkage and selection operator, LASSO). Interactions between covariates could also be considered.
- Overall fit can be evaluated using the mean error, mean absolute error and root mean squared error (the last is most commonly used). Better fitting models have smaller errors.
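The three fit statistics mentioned above are simple to compute once you have observed and predicted costs side by side. The sketch below uses made-up numbers and defines the error as predicted minus observed; that sign convention is an assumption here, not something the post specifies.

```python
import math


def fit_metrics(observed, predicted):
    """Mean error, mean absolute error, and root mean squared error."""
    errors = [p - o for o, p in zip(observed, predicted)]  # assumed sign convention
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Hypothetical observed and model-predicted annual costs.
observed = [0.0, 100.0, 400.0, 1500.0]
predicted = [50.0, 150.0, 300.0, 1300.0]

metrics = fit_metrics(observed, predicted)
print(metrics)
```

Note that the mean error can be close to zero even when individual predictions are poor, because positive and negative errors cancel. This is one reason the RMSE tends to be the headline measure.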
Step 4: Model prediction
- While predicted costs are easy to generate, estimating the impact of a disease state on cost is more complex. The authors recommend the following:
For a one-part non-linear model or a two-part model, marginal effects can be derived using recycled prediction. It includes the following two steps: (1) run two scenarios across the target population by setting the disease state of interest to be (a) present (e.g. recurrent cancer) or (b) absent (e.g. no cancer recurrence); (2) calculate the difference in mean costs between the two scenarios. Standard errors of the mean difference can be estimated using bootstrapping.
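The recycled prediction recipe can be sketched as follows. To keep the example short and self-contained, the "model" is just the mean cost within each combination of disease state and an age-group flag, which stands in for a fitted GLM or two-part model. The data, the `older` covariate, and the function names are all hypothetical. The bootstrap refits the stand-in model on every resample, so the standard error reflects estimation uncertainty as well as sampling of the population.

```python
import random
import statistics


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def fit_cell_model(rows):
    """Stand-in for a fitted cost model: mean cost per (state, older) cell.

    Returns a predict(state, older) function. Empty cells fall back to the
    state-level mean, then to the overall mean.
    """
    cells, by_state = {}, {}
    for r in rows:
        cells.setdefault((r["state"], r["older"]), []).append(r["cost"])
        by_state.setdefault(r["state"], []).append(r["cost"])
    overall = mean(r["cost"] for r in rows)

    def predict(state, older):
        if (state, older) in cells:
            return mean(cells[(state, older)])
        if state in by_state:
            return mean(by_state[state])
        return overall

    return predict


def recycled_difference(rows, fit=fit_cell_model):
    """Mean predicted cost with the state present minus with it absent."""
    predict = fit(rows)
    # (1a) everyone set to state present, other covariates left as observed
    present = mean(predict(1, r["older"]) for r in rows)
    # (1b) everyone set to state absent
    absent = mean(predict(0, r["older"]) for r in rows)
    # (2) difference in mean predicted cost between the two scenarios
    return present - absent


def bootstrap_se(rows, n_boot=500, seed=0):
    """Bootstrap standard error: resample rows, refit, recompute the difference."""
    rng = random.Random(seed)
    draws = [recycled_difference(rng.choices(rows, k=len(rows)))
             for _ in range(n_boot)]
    return statistics.stdev(draws)


def make_rows(state, older, costs):
    return [{"state": state, "older": older, "cost": float(c)} for c in costs]


# Hypothetical person-year data; state 1 is more common among older people.
rows = (make_rows(0, 0, [0, 100, 200, 100]) + make_rows(0, 1, [300, 500])
        + make_rows(1, 0, [5000, 7000]) + make_rows(1, 1, [9000, 11000, 10000, 10000]))

effect = recycled_difference(rows)
se = bootstrap_se(rows)
print("recycled difference:", effect)
print("bootstrap SE:", se)
```

With these made-up data the recycled difference is 7,750, whereas the naive difference in group means is about 8,467. The gap arises because the naive comparison also picks up the older age mix of people in the disease state, while recycled prediction holds everyone's other covariates fixed across the two scenarios.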
The authors also provide an illustrative example applying this approach to modeling hospital costs associated with cardiovascular events in the UK. They also provide sample code in R, which you can download here.