Research on Document Clustering
Clustering algorithms
- Agglomerative hierarchical clustering algorithm (AHC)
- Algorithm
1. Put each document in the collection into its own cluster
2. Identify the two closest clusters and combine them into a new cluster
3. Repeat Step 2 until the halting criterion is met
- O(N^2)
- K-Means algorithm
- Buckshot algorithm
- Fast, linear-time algorithm
- A K-Means algorithm where the initial cluster centroids are created by applying AHC to a sample of the documents in the collection
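
The three algorithms above can be sketched together in a few lines of Python. This is a minimal illustration, not a reference implementation, and it rests on several assumptions that the outline does not state: documents are already represented as numeric vectors, closeness is Euclidean distance between cluster centroids, AHC halts when k clusters remain, and the Buckshot sample size is sqrt(k * N). The function names (`centroid`, `distance`, `ahc`, `kmeans`, `buckshot`) are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def distance(a, b):
    """Euclidean distance between two vectors (an assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ahc(points, k):
    """Agglomerative clustering: merge the two closest clusters until k remain."""
    clusters = [[p] for p in points]          # Step 1: one cluster per document
    while len(clusters) > k:                  # Step 3: assumed halting criterion
        best = None
        for i in range(len(clusters)):        # Step 2: find the closest pair
            for j in range(i + 1, len(clusters)):
                d = distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                       # j > i, so index i is unaffected
    return clusters


def kmeans(points, centroids, iterations=10):
    """Plain K-Means from the given centroids; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        # Recompute each centroid from its members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                       # keep the old centroid if a cluster empties
                centroids[c] = centroid(members)
    return centroids, labels


def buckshot(points, k, seed=0):
    """K-Means whose initial centroids come from AHC on a random sample."""
    rng = random.Random(seed)
    # Assumed sample size: sqrt(k * N), clamped to the range [k, N].
    sample_size = min(len(points), max(k, math.ceil(math.sqrt(k * len(points)))))
    sample = rng.sample(points, sample_size)
    seeds = [centroid(cluster) for cluster in ahc(sample, k)]
    return kmeans(points, seeds)
```

Calling `buckshot(vectors, k)` returns the final centroids and one cluster label per document. Because the quadratic-cost AHC step runs only on the sample, the expensive pairwise comparison stays small while K-Means handles the full collection.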