[edit]

# Principal Component Regression with Semirandom Observations via Matrix Completion

*Proceedings of The 24th International Conference on Artificial Intelligence and Statistics*, PMLR 130:2665-2673, 2021.

#### Abstract

Principal Component Regression (PCR) is a popular method for prediction from data, and is one way to address the so-called multi-collinearity problem in regression. It was shown recently that algorithms for PCR such as hard singular value thresholding (HSVT) are also quite robust, in that they can handle data that has missing or noisy covariates. However, such spectral approaches require strong distributional assumptions on which entries are observed. Specifically, every covariate is assumed to be observed with probability (exactly) $p$, for some value of $p$. Our goal in this work is to weaken this requirement, and as a step towards this, we study a “semi-random” model. In this model, every covariate is revealed with probability $p$, and then an adversary comes in and reveals additional covariates. While the model seems intuitively easier, it is well known that algorithms such as HSVT perform poorly. Our approach is based on studying the closely related problem of Noisy Matrix Completion in a semi-random setting. By considering a new semidefinite programming relaxation, we develop new guarantees for matrix completion, which is our core technical contribution.