Determining the number of significant factors in Principal Component Analysis (PCA)
PCA is a denoising method that is widely used in data analysis across
many fields, including computational biology, demography, and finance.
The main idea, going back to the 1990s and earlier, is
to use the singular value decomposition of the empirical covariance of
the data and then to split it into its "significant" factors plus a
residual. As the number of significant factors increases, the residual
should look more and more like a purely random matrix, with the
Marchenko-Pastur (1967) law serving as the criterion. There are many reasons why
this simple idea is too simple and a better algorithm is needed.
These reasons will be discussed, and a method that allows for correlations
in the residual will be presented and then applied to both equity returns
data and implied volatility surface data. This is joint work with
G. Bonnell, B. Healy and A. Papanicolaou.
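
For concreteness, the simple baseline criterion described above can be sketched as follows. This is only an illustration of the naive Marchenko-Pastur thresholding idea, not the method developed in this work: it assumes an uncorrelated residual with unit variance, works on the sample correlation matrix, and uses synthetic data. The function names mp_upper_edge and count_factors_mp, and all parameter values, are hypothetical choices made for the example.

```python
import numpy as np


def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur support.

    Assumes i.i.d. noise with variance sigma2 and aspect ratio
    gamma = n_features / n_samples.
    """
    gamma = n_features / n_samples
    return sigma2 * (1.0 + np.sqrt(gamma)) ** 2


def count_factors_mp(X):
    """Count eigenvalues of the sample correlation matrix above the MP edge.

    X has shape (n_samples, n_features). This is the naive criterion:
    it ignores any correlation structure in the residual.
    """
    n, p = X.shape
    # Standardize each column so that Z.T @ Z / n is the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    C = Z.T @ Z / n
    eigvals = np.linalg.eigvalsh(C)  # real eigenvalues, ascending order
    return int(np.sum(eigvals > mp_upper_edge(n, p)))


# Synthetic example (assumed setup): 3 planted factors plus white noise.
rng = np.random.default_rng(0)
n, p, k = 2000, 100, 3
F = rng.standard_normal((n, k))   # factor realizations
B = rng.standard_normal((k, p))   # factor loadings
X = F @ B + rng.standard_normal((n, p))

n_factors = count_factors_mp(X)
print(n_factors)
```

In this clean synthetic setting the planted factors are strong and the noise is uncorrelated, so the threshold separates them easily. With real data the residual is typically correlated and its variance is unknown, which is one reason a fixed Marchenko-Pastur edge can miscount factors.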