Reading report of Multi-view Intact Space Learning

Introduction

Most of the objects have different views, so it is suitable to assume that an individual view is more likely to be insufficient for our learning. Therefore, integration of multi-view information is necessary and useful. In the paper, the authors propose the Multi-view Intact Space Learning(MISL) algorithm to integrate the information from multiple views.

Exisiting multi-view learning methods have some limitations: the issue of single-view insufficiency is not clearly addressed, fail to discover latent intact spaces, no theoratical supports to guarantee their performance.

Main Algorithm

The authors assume “view insufficiency” that each view only captures partial information. In multi-view learning, each view zv is a particular reflection of the object x, and we can obtain the view generation function fv on x:
image.png
And fv(x)=Wvx, since the function is non-invertible, Wv is not column full-rank. The object is to learn a series of view generation functions 〖{Wv}〗_(v=1)^mto generate multi-view data points from a latent intact space. It is clear to see that a straightforward approach is to minimize the empirical risk using the L1 or L2 loss. But the authors found that both two losses were not robust to outliers, therefore they deploy Cauchy estimator in MISL:
image.png
And the upper bounded influence function is:
image.png

MISL

Using the Cauchy loss and regularization terms, the resulting objective function is:
image.png
Given an input of multiple insuffiency views, the latent intact space and view generation funtions can be found. And we can infer the new data point x with optimal Wv. When the view space Z is in an infinite-dimensional space, we need to extense the kernal:
image.png
(not quite understand)

Optimization

The authors develop an IRR algorithm to optimize the sub-problems over the view generation function and the latent intact space X. The process is simple. Take finding the latent intact space X as an example.

First, we given fixed view generation functions 〖{Wv}〗_(v=1)^m. After the simplification of the objective, setting the gradient of the expression with respect to x to 0, and then using the residual of the example x on each view, we can gain weight function. Finally, we have the resulting formula:
image.png
The calculation of W is almost the same. Following is the pseudo code of IRR algorithm:
image.png

Theoretical Analysis

In this section, the authors give three analysis: convergence analysis, view sufficiency analysis and the generalization error analysis. Convergence analysis mainly analyze the convergence of IRR algorithm, and I will focus on the latter two analysis.

View insufficiency analysis

The authors use the information theory to explain the view insufficiency assumption, which can be described as:
image.png
From proposition 1, we can know:
image.png
which is very clear. And we can also obtain that more views will bring in more information. And later the authors use an information theoretical framwork developed by Sridharan and Kakade to analyze multi-view learning algorithms.

Generalization Error Analysis

In this section, the authors propose a new definition of multi-view stability.
image.png
And this stability has an implicit connection with robustness. Besides, the authors use Rademacher complexity. Finally, they give the calculated multi-view stability and Rademacher complexity of their algorithm, which is also the generalization error bound of MISL.

My Idea

To my best knowledge, the objective function of MISL algorithm is a bit simple compared with other papers (Although I do not quite understand about the kernel extension part). In the paper, the authors seem not to clearly explain about the view generation function W, which I think needed consideration. Maybe there is better way to give the clear definition of this part. And I think we can draw lessons from the heterogeneous network embedding.

Summary

The authors first introduce their proposed algorithm and their optimization method, and then give three kinds of analysis of the algorithms, including convergence analysis, view sufficiency analysis and the generalization error analysis. Finally, comes the experiment part.