Input:
- Dataset: {x_1, x_2, ..., x_N}, where x_i ∈ ℝ^d
- Number of clusters: K
- Convergence threshold: ε (e.g., 10^-4)
Output:
- K cluster centroids {μ_1, μ_2, ..., μ_K} and cluster assignments {C_1, C_2, ..., C_K}
Steps:
1. (initialize centroids):
- Randomly select K data points as the initial centroids {μ_1, μ_2, ..., μ_K}.
2. (repeat until convergence):
| a. (assign points to nearest centroid):
| - For each data point x_i:
| - Assign x_i to the cluster C_k where k = argmin_j ||x_i - μ_j||^2.
| b. (update centroids):
| - For each cluster C_k, recompute the centroid:
| - Set μ_k = (1 / |C_k|) Σ_{x_i ∈ C_k} x_i.
| c. Check for convergence:
| - Compute the change in centroids Δ = Σ_k ||μ_k^new - μ_k^old||.
| - If Δ < ε, stop the iteration.
3. (return the final centroids and cluster assignments)