Reading report of Robust continuous clustering

Movitivation

Clustering has been studied for decades, but the existing clustering algorithms still have many shortcomings, like having limited effectiveness in high dimensions and often requiring tuning parameters for different domains and datasets. In order to solve the above problems, the authors present a new clustering algorithm called RCC(Robust continuous clustering), which achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets.

Main Algorithm

RCC

The fundamental principle of the algorithm is to find the appropriate representatives U for the corresponding data point X. And we need to operate the optimization on U.
The RCC objective 1 has the following form:
image.png
The funtion ρ(∙) is a robust penalty function on the regularization terms, which is central to the method. There are two expectations on this term: (1)we want representative ui from the same latent cluster to collapse into a single point; (2) we want the penalty function to be robust enough. Therefore, the authors decided to use robust estimators. The weight wp,q and λ are the balance factors, and ε is the set of edges in the graph.

The second term in the objective is related to the mean shift objective. Later, the authors provide a reformulated objective 2.
image.png
The new objective adds an auxiliary variable lp,q for each connection in ε, and together optimize both the representative U and the line process L, where is a penalty on the connection. Why the authors use objective 2 is because its form enables efficient and scalable optimization by iterative solution of linear least-square systems, which yields a general approach, for three reasons:

  1. accommodate many robust nonconvex functions ρ.
  2. reduce clustering to the application of linear system solvers.
  3. easily scale to high dimensions and large datasets.

In the paper, ρ(y) uses the Geman-McClure estimater, and μ is a scale parameter:
image.png
And the corresponding penalty function that equals objective 1 and 2 is :
image.png
Later, the authors give the optimization of the above objectives, which I don’t introduce much here. When U are fixed, the optimal value of each lp,q is given by:
image.png
And when L are fixed, we can rewrite the expression to solve U, which turns into a linear least-squares problem:
image.png
The pseudo code of RCC is followed:
image.png

RCC-DR

In RCC, the dimensionality of the embedding U is the same as the dimensionality of the data X. And since RCC optimizes a continuous and differentiable objective, it can be used within end-to-end feature learning pipelines. In this section, the authors extend RCC to perform joint clustering and dimensionality reduction, called RCC-DR.
The final formulation of the RCC-DR objective is:
image.png

Where D∈R(Dxd) is a dictionary, zi∈R(Dxd) is a sparse code corresponding to the ith data sample, and ui∈R(d) is the low dimensional embedding of xi. The optimization is performed by block coordinate descent over U, Z, D, L1 and L2. The variables U are updated by solving linear system.
image.png
image.png
And the dictionary D and codes Z are initialized using PCA and updated by different methods.
image.png
image.png

Contribution

In the paper, the authors propose an algorthm called RCC, which optimizes a continuous objective based on robust estimation. And its end-to-end feature learning pipelines allow it to perform joint clustering and dimensionality reduction. This method achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets.

My Idea

Actually, I don’t quite understand about this paper. I don’t know how the authors comp up with such wonderful idea about robust estimators, and how can they smartly turn a complicated optimization problem into linear least-squares solvers. Maybe it will be better if we can get the code of the Algorithm to study.

Summary

The authors first give us the principle of RCC, and then extend it by to perform joint clustering and dimensionality reduction, called RCC-DR. And most of the thesis is talking about the objectives of the algorithm, and the optimization method they use. In the final part, they do a lot of experiments compared with RCC, RCC-DR and other 13 baselines about the results and running time on different datasets.