Abstract
In many domains, several different representations or “views” describe the same set of objects. Taken alone, each view is often deficient or incomplete. A key problem for exploratory data analysis is therefore the integration of multiple views to discover the underlying structures in a domain. This problem becomes more difficult when the views disagree. We introduce a new unsupervised algorithm for combining information from related views, using a late integration strategy. Combination is performed by applying an approach based on matrix factorization to group related clusters produced on individual views. This yields a projection of the original clusters in the form of a new set of “meta-clusters” covering the entire domain. We also provide a novel model selection strategy for identifying the correct number of meta-clusters. Evaluations performed on a number of multi-view text clustering problems demonstrate the effectiveness of the algorithm.
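The late integration idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes each view has already been clustered, encodes every cluster as a binary membership row over the objects, stacks the rows from all views into one matrix, and factorizes that matrix with standard multiplicative-update non-negative matrix factorization. The toy views, the function names, and the choice of two meta-clusters are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(u * v for u, v in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, p, h):
    """Squared Frobenius norm of x - p*h."""
    approx = matmul(p, h)
    return sum((u - v) ** 2
               for x_row, a_row in zip(x, approx)
               for u, v in zip(x_row, a_row))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative x (m x n) as p (m x k) times h (k x n),
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    p = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (p^T x) / (p^T p h)
        pt = transpose(p)
        num = matmul(pt, x)
        den = matmul(matmul(pt, p), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # p <- p * (x h^T) / (p h h^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(p, matmul(h, ht))
        p = [[p[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return p, h


# Hypothetical toy input: two views of six objects, each already clustered.
# The views disagree about the object at index 2.
view_a = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
view_b = [[1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 1, 1]]

x = view_a + view_b          # one row per original cluster, across all views
p, h = nmf(x, k=2)           # k=2 meta-clusters is an assumption of this toy

# Rows of p relate original clusters to meta-clusters;
# columns of h relate objects to meta-clusters.
cluster_to_meta = [max(range(2), key=lambda c: row[c]) for row in p]
object_to_meta = [max(range(2), key=lambda c: h[c][j]) for j in range(len(x[0]))]
print(cluster_to_meta, object_to_meta)
```

In this sketch the number of meta-clusters is fixed by hand. The abstract mentions a model selection strategy for choosing that number, but it does not describe the strategy, so none is attempted here.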
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Greene, D., Cunningham, P. (2009). A Matrix Factorization Approach for Integrating Multiple Data Views. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_45
DOI: https://doi.org/10.1007/978-3-642-04180-8_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer Science (R0)