Project 2
Praveen Wadikar pwadikar@asu.edu
Abstract—This project implements the k-means clustering algorithm on a 2-D dataset and analyzes how the choice of k affects the clustering loss.
Keywords—statistical machine learning; k-means
I. TASK AND DATASET
You are required to implement the K-means algorithm and apply your implementation to the given dataset (AllSamples.npy), which contains a set of 2-D points. You are also required to implement the following strategy for choosing the initial cluster centers.
A. Algorithm
• 1. Randomly choose k points as the initial cluster centers.
• 2a. Assign each point the label of its closest center.
• 2b. Compute new centers as the means of the points in each cluster.
• 3. Repeat 2a and 2b until convergence (no change in the means).
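The steps above can be sketched compactly in vectorized NumPy. This is only a minimal illustration (not the report's implementation, which appears in Section IV); it assumes the points are rows of a NumPy array and that no cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, rng=np.random.default_rng(0)):
    # Step 1: randomly choose k distinct points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    while True:
        # Step 2a: label each point with the index of its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2b: new centers are the means of the points in each cluster
        # (assumes every cluster keeps at least one point).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: stop once the means no longer change.
        if np.allclose(new_centers, centers):
            return new_centers, labels
        centers = new_centers
```

Because the assignments can only take finitely many configurations and the loss never increases, the loop always terminates.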
II. ANALYSIS
Figure 1 shows that the initial means (brown triangles) were selected randomly; the means later converged to the points marked with an x. As seen, the yellow points converged around a mean that lies between two clusters. For k=5, however, that region is covered by its own mean, which now classifies those points correctly. Increasing k further improves the separation of the classes, but beyond a certain point larger values of k yield only a small improvement in the loss. This flattening should be the cue for the maximum useful value of k, as shown in Figure 3.
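The diminishing improvement in loss can be seen with an elbow-style sweep over k. The sketch below uses scikit-learn's KMeans on synthetic blobs (the blob centers are assumptions standing in for AllSamples.npy; the inertia_ attribute is the same sum-of-squared-distances loss the report computes with find_loss in Section IV):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for AllSamples.npy: three well-separated 2-D blobs.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ((0, 0), (5, 5), (10, 0))])

losses = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    losses[k] = km.inertia_  # sum of squared distances to closest center

# The loss drops sharply until k reaches the true number of clusters,
# then flattens; the knee of this curve is the cue for choosing k.
```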
Fig. 1. Plot of the clustered data with k=3
III. RESULTS
Fig. 2. Plot of the clustered data with k=5
check for k=3
[[2.61946868 5.96519477]
 [6.49724962 7.52297293]
 [5.55524182 2.18980958]]
21944.1174733
check for k=5
[[6.57957643 7.57333595]
 [7.51004923 2.29128354]
 [2.8337661  6.9189569 ]
 [2.25525758 2.94637009]
 [4.26556254 2.36877056]]
k=5 loss 2693.53143865
Fig. 3. Plot of the clustered data with k=3,5,8
IV. CODE

import numpy as np

def calculate_distance(u, x):
    # Euclidean distance between two 2-D points
    x1 = u[0] - x[0]
    y1 = u[1] - x[1]
    d = (x1**2 + y1**2) ** 0.5
    return d

def find_cl(X, n_clusters, centers):
    # Alternate between assignment (step 2a) and update (step 2b)
    # until the centroids stop changing (step 3).
    centroids = centers
    a = []
    labels = []
    while True:
        for i in range(len(X)):
            for j in range(n_clusters):
                a.append(calculate_distance(X[i], centroids[j]))
            arr = np.array(a)
            result = np.where(arr == np.amin(arr))
            labels.append(result[0][0])
            a.clear()
        labels_new = np.array(labels)
        new_centroids = np.array([X[labels_new == i].mean(0)
                                  for i in range(n_clusters)])
        if np.all(centroids == new_centroids):
            break
        labels.clear()
        centroids = new_centroids
    return centroids, labels

def find_loss(X, n_clusters, labels, centroids):
    # Loss: sum of squared distances from each point to its centroid
    loss = 0
    for j in range(len(centroids)):
        for i in range(len(X)):
            if labels[i] == j:
                loss = loss + calculate_distance(X[i], centroids[j]) ** 2
    return loss

print('check for k=3')
centers, labels = find_cl(data, 3, i_point1)
print(centers)
loss = find_loss(data, 3, labels, i_point1)
print(loss)

print('check for k=5')
centers, labels = find_cl(data, 5, i_point2)
print(centers)
loss = find_loss(data, 5, labels, i_point2)
print('k=5 loss', loss)
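The driver code in the listing assumes that data, i_point1, and i_point2 were defined beforehand; their definitions are never shown. A minimal setup sketch, assuming the initial centers are simply k distinct samples drawn at random from the data (the synthetic array below stands in for AllSamples.npy):

```python
import numpy as np

def initial_centers(data, k, rng=np.random.default_rng(0)):
    # Randomly choose k distinct samples as the initial cluster centers.
    return data[rng.choice(len(data), size=k, replace=False)]

# With the real dataset this would be:
#   data = np.load('AllSamples.npy')
data = np.random.default_rng(1).random((300, 2))  # synthetic stand-in
i_point1 = initial_centers(data, 3)  # initial centers for k=3
i_point2 = initial_centers(data, 5)  # initial centers for k=5
```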



