![]() The data contains influential outliers.For example, if the data has two roughly identical peaks, (a bimodal data set), the mean model will estimate a point inside the valley in between the two peaks which is not a good representative of the data. it has more than one highly frequently occurring value. Such is often the case in insurance claims or healthcare costs data where most claims are small valued but the data has a long tail of claims of increasing value. The data are highly skewed to the left or to the right.Here are three examples of data sets for which the mean may not be the appropriate statistic to estimate: In all cases, we are asking the model to estimate the conditional mean.īut in some data sets, the mean is not a good exemplar of the data. ![]() The red dots are the estimated conditional mean prices while the gold dots are the observed conditional mean prices (Image by Author)įor other kinds of data, a more complex model such as a Poisson or a Cox Proportional Hazards model may be employed. The trend line of the trained OLS model plotted on top of the raw data. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |