# Determining the number of significant factors in Principal Component Analysis (PCA)

## Location

Principal component analysis (PCA) is a denoising method widely used in many fields, including computational biology, demography, and finance. The main idea, going back to the nineties and earlier, is to take the singular value decomposition of the empirical covariance matrix of the data and split it into its "significant" factors plus a residual. As the number of significant factors increases, the residual should look more and more like a purely random matrix, with the Marchenko-Pastur (1967) law serving as the criterion. There are many reasons why this rather simple idea is too simple and a better algorithm is needed. These reasons will be discussed, and a method allowing for correlations in the residual will be presented and then applied to both equity returns data and implied volatility surface data. This is joint work with G. Bonnell, B. Healy and A. Papanicolaou.
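For orientation, the simple baseline described in the abstract can be sketched as follows. This is only an illustration of the eigenvalue-thresholding idea, not the method presented in the talk: it counts the eigenvalues of the sample correlation matrix that exceed the Marchenko-Pastur upper edge $(1 + \sqrt{p/n})^2$, which assumes a residual with independent, unit-variance entries. The function name `count_factors_mp` and the synthetic three-factor data are assumptions made for the example.

```python
import numpy as np


def count_factors_mp(X):
    """Count eigenvalues of the sample correlation matrix above the
    Marchenko-Pastur upper edge.

    X has shape (n_samples, n_variables). The threshold assumes the
    residual is uncorrelated noise with unit variance, which is exactly
    the simplification the abstract calls "too simple".
    """
    n, p = X.shape
    # Standardise each column so the matrix below is a correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = Z.T @ Z / n
    eigvals = np.linalg.eigvalsh(corr)  # ascending order
    # Upper edge of the Marchenko-Pastur support for aspect ratio p / n.
    upper_edge = (1.0 + np.sqrt(p / n)) ** 2
    return int(np.sum(eigvals > upper_edge)), upper_edge


# Synthetic data (hypothetical): three strong factors plus white noise.
rng = np.random.default_rng(0)
n, p, k = 2000, 100, 3
factors = rng.standard_normal((n, k))
signs = rng.choice([-1.0, 1.0], size=(k, p))
loadings = signs * rng.uniform(0.5, 1.5, size=(k, p))
X = factors @ loadings + rng.standard_normal((n, p))

n_factors, edge = count_factors_mp(X)
print(f"MP upper edge: {edge:.3f}, factors detected: {n_factors}")
```

On this synthetic example the residual really is white noise, so the threshold separates the factors cleanly. With a correlated residual the noise spectrum no longer follows the Marchenko-Pastur law, and a fixed edge of this kind can miscount the factors, which is the situation the abstract is concerned with.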