Principal Components Regression

This is a Master's thesis from Lund University, Department of Statistics

Abstract: Principal components regression (PCREG), which is sometimes classed among the "biased regression methods," is used to model data with highly correlated explanatory variables. This well-known method has been discussed by many previous authors (Marquardt 1970, Hoerl and Kennard 1970, Lott 1973, Hawkins 1973, Webster, Grace and Mason 1974, Hoerl, Kennard and Baldwin 1975, Marquardt and Snee 1975, Hocking, Speed and Lynn 1976, Smith and Campbell 1980, Park 1981). In recent decades, when modeling data with highly correlated explanatory variables, statisticians and researchers have frequently used regression models related to partial least squares regression (PLSREG) and the SIMPLS algorithm to cope with multicollinearity in the data. PCREG uses the estimated components in the same way as PLSREG to estimate the regression coefficients. Both of these biased regression methods are discussed in detail by Wold 1966, Hoerl and Kennard 1970, Dijkstra 1983, 1985, de Jong 1993 (the SIMPLS algorithm), gutters et al. 1994, and van der Voet 1994. Most available software, however, only gives users the opportunity to run PLSREG analyses. Examples include SAS, Minitab, Statistica, and SPSS (although SPSS requires that a special component be downloaded and installed to run PLSREG analyses). All of these programs are under development, and each provides slightly different output. All programs that run PLSREG analyses automatically rank principal components in descending order of their eigenvalues. One purpose of this thesis is to clarify why this descending order is not always the most useful one for every kind of research. A second purpose is to show that a newly developed program that runs PCREG in SAS makes it possible to choose components on the basis of their weight and importance in relation to the explanatory variables. Additionally, the general theory of PCREG and ridge regression (RREG) will be discussed.
Two data sets (one large and one small) will be used to compare the results of analyses run using PCREG, PLSREG, and RREG. Two data sets were necessary because the more variables a data set contains, the more difficult it is to choose how many estimated components to use in the analysis, and the more potentially misleading the automatic eigenvalue-based choice made by existing software can be. The PCREG analyses of the two data sets (run using PROC IML in SAS 9.2) will include estimates of the regression coefficients, the standard errors of the coefficients, t-statistics, and p-values. Regression coefficients in PCREG are interpreted in exactly the same way as in ordinary regression analysis. The numerical results of the PCREG analyses suggest that the PCREG algorithm used to write the new program can also be used to improve PLSREG estimates based on rotated principal components. Although this work was motivated by the problem of analyzing quantitative variables in socio-economic and demographic data sets, the method is also applicable to other problems that involve estimating regression coefficients for highly correlated quantitative explanatory variables. Thus, the method is probably relevant to a wide range of problems in the physical, chemical, medical, financial, and technical sciences.
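
As a rough illustration of the kind of computation described above, the following Python sketch fits a principal components regression on standardized predictors and returns coefficients, standard errors, and t-statistics for a user-chosen set of components. It is not the SAS PROC IML program developed in the thesis: the function name `pcr_fit`, the argument `keep`, and the use of NumPy are assumptions made only for this example, and the formulas are the standard textbook ones for regression on orthogonal components.

```python
import numpy as np

def pcr_fit(X, y, keep):
    """Principal components regression on standardized predictors.

    `keep` lists the indices of the components to retain, where index 0
    is the component with the largest eigenvalue. Hypothetical helper,
    not the thesis's SAS implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors and centre the response.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()

    # Eigen-decomposition of the correlation matrix, sorted so that
    # eigenvalues are in descending order.
    R = Xs.T @ Xs / (n - 1)
    eigval, eigvec = np.linalg.eigh(R)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Scores on the retained components. These are mutually orthogonal,
    # so Z'Z is diagonal with entries (n - 1) * eigenvalue.
    V = eigvec[:, keep]
    Z = Xs @ V
    zz = (n - 1) * eigval[keep]

    # Regress the centred response on the retained scores, then map the
    # component coefficients back to the standardized predictors.
    gamma = (Z.T @ yc) / zz
    beta = V @ gamma

    # Residual variance and the covariance matrix of beta.
    resid = yc - Z @ gamma
    df = n - 1 - len(keep)
    sigma2 = resid @ resid / df
    cov_beta = sigma2 * (V / zz) @ V.T
    se = np.sqrt(np.diag(cov_beta))

    return beta, se, beta / se, eigval
```

Because `keep` is an explicit list, a call such as `pcr_fit(X, y, [0, 2])` retains the first and third components and skips the second, which is one way to depart from the strict descending-eigenvalue order discussed above. Keeping every component reproduces ordinary least squares on the standardized predictors.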
