Inventors:
Nina Mishra - San Ramon CA, US
Liadan O'Callaghan - Mountain View CA, US
Sudipto Guha - Chatham NJ, US
Rajeev Motwani - Palo Alto CA, US
International Classification:
G06F015/00
Abstract:
A technique that uses a weighted divide-and-conquer approach for clustering a set S of n data points to find k final centers. The technique comprises: 1) partitioning the set S into P disjoint pieces S_1, . . . , S_P; 2) for each piece S_i, determining a set D_i of k intermediate centers; 3) assigning each data point in each piece S_i to the nearest one of the k intermediate centers; 4) weighting each of the k intermediate centers in each set D_i by the number of points in the corresponding piece S_i assigned to that center; and 5) clustering the weighted intermediate centers together to find said k final centers, the clustering performed using a specific error metric and a clustering method A.
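The five steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the claimed implementation: the abstract leaves the error metric and the clustering method A unspecified, so the code assumes squared Euclidean distance as the metric and uses a small weighted Lloyd-style k-means as a stand-in for A. All function names (sq_dist, nearest, weighted_kmeans, weighted_divide_and_conquer) and the strided partitioning scheme are hypothetical choices made for the example, and each piece is assumed to hold at least k points.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance (assumed error metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Stand-in for clustering method A: Lloyd iterations on weighted points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires len(points) >= k
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its points;
        # a center with no assigned weight stays where it is.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def weighted_divide_and_conquer(S, k, P, seed=0):
    # 1) Partition S into P disjoint pieces (strided split, an assumption).
    pieces = [S[i::P] for i in range(P)]
    inter_centers, inter_weights = [], []
    for piece in pieces:
        # 2) Find k intermediate centers D_i for this piece.
        D = weighted_kmeans(piece, [1.0] * len(piece), k, seed=seed)
        # 3) Assign each point in the piece to its nearest intermediate center,
        # 4) and weight each center by the number of points assigned to it.
        counts = [0] * k
        for p in piece:
            counts[nearest(p, D)] += 1
        inter_centers.extend(D)
        inter_weights.extend(counts)
    # 5) Cluster the P*k weighted intermediate centers into k final centers.
    return weighted_kmeans(inter_centers, inter_weights, k, seed=seed)


# Usage: two well-separated groups of points, split into P = 2 pieces.
S = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final_centers = sorted(weighted_divide_and_conquer(S, k=2, P=2))
print(final_centers)
```

On this toy input the two final centers should land at roughly (0.5, 0.5) and (10.5, 10.5), the means of the two groups. Only the P*k weighted intermediate centers are passed to the final clustering step, so that step works on far fewer points than the full set S when n is large.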