CSC 5800 Intelligent Systems Homework 4
Due Date: November 11th, 2015
Total: 100 Points
Problem 1. Bayes' Theorem (10 Points; 2 + 3 + 5)
Suppose the fraction of undergraduate students who smoke is 15% and the fraction of
graduate students who smoke is 23%. It is also given that one-fifth of the college students
are graduate students and the rest are undergraduates.
(a) What is the probability that a student who smokes is a graduate student?
(b) Is a randomly chosen smoker more likely to be a graduate or an undergraduate student?
(c) Suppose 30% of the graduate students live in a dorm but only 10% of the
undergraduate students live in a dorm. If a student smokes and lives in a dorm, is he or
she more likely to be a graduate or an undergraduate student? You can assume that living
in a dorm is independent of smoking.
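Part (a) is a direct application of Bayes' theorem together with the law of total probability. A quick numerical sketch in Python (the variable names are mine, not part of the assignment):

```python
# Given: P(smoke|grad) = 0.23, P(smoke|undergrad) = 0.15,
#        P(grad) = 1/5, P(undergrad) = 4/5
p_smoke_given_grad = 0.23
p_smoke_given_ugrad = 0.15
p_grad, p_ugrad = 0.2, 0.8

# Law of total probability: P(smoke)
p_smoke = p_smoke_given_grad * p_grad + p_smoke_given_ugrad * p_ugrad

# Bayes' theorem: P(grad|smoke) = P(smoke|grad) * P(grad) / P(smoke)
p_grad_given_smoke = p_smoke_given_grad * p_grad / p_smoke
print(round(p_grad_given_smoke, 4))
```

Comparing this posterior with its complement answers part (b); part (c) multiplies in the dorm likelihoods, using the stated independence assumption.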
Problem 2. Bayesian Classification (15 Points; 2 + 4 + 3 + 4 + 2)
Consider the data set shown in the following Table:
(a) Estimate the conditional probabilities P(A|+), P(B|+), P(C|+), P(A|−), P(B|−),
and P(C|−).
(b) Use these estimates of the conditional probabilities to predict the class label for the
test sample (A = 0, B = 1, C = 0) using the naive Bayes approach.
(c) Estimate the conditional probabilities using the m-estimate approach, with p = 1/2
and m = 4.
(d) Repeat part (b) using the conditional probabilities from part (c).
(e) Compare the two methods for estimating probabilities. Which method is better,
and why?
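The m-estimate in part (c) smooths the raw fraction n_c/n toward the prior p, which matters when a count is zero. A minimal sketch of the formula (the counts in the example call are hypothetical, not taken from the Table):

```python
def m_estimate(n_c, n, m=4, p=0.5):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m),
    where n_c is the count of records with the attribute value in the class
    and n is the total number of records in that class."""
    return (n_c + m * p) / (n + m)

# Hypothetical example: 3 of 5 positive records have the attribute value
print(m_estimate(3, 5))  # (3 + 4*0.5) / (5 + 4) = 5/9
```

With n_c = 0 the plain estimate 0/n would zero out the whole naive Bayes product, while the m-estimate gives m*p/(n + m) > 0; that contrast is the heart of part (e).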
Problem 3. K-means Clustering (10 Points)
Consider the following six points (with (x, y) representing location in a 2-D space) and let
us try to group them into three clusters.
The distance function is Euclidean distance. Suppose initially we assign A1, B1, and C1
as the centers of the three clusters. Using the k-means algorithm, show:
(i) the cluster assignment of each data point after the first iteration;
(ii) the centroids after the first iteration.
Problem 4. K-means Clustering (20 Points; 8 + 12)
PART I. Consider the one-dimensional dataset {1, 2, 3, 5, 9}. Perform the k-means
algorithm with 2 clusters and initial centroids 0 and 9. Compute the following: (i) the
final centroids; (ii) the cohesion; (iii) the separation.
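Part I can be checked with a minimal 1-D k-means sketch. Cohesion is taken here as the within-cluster SSE and separation as the between-cluster sum of squares about the overall mean; the helper names are mine, and other textbook definitions of these two quantities exist:

```python
def kmeans_1d(data, centroids, iters=10):
    """Lloyd's algorithm on 1-D points; returns final centroids and clusters."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in data:
            # assign each point to its nearest centroid
            i = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        # recompute each centroid as its cluster mean (keep old one if empty)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

data = [1, 2, 3, 5, 9]
centroids, clusters = kmeans_1d(data, [0, 9])
cohesion = sum((x - c) ** 2 for c, cl in zip(centroids, clusters) for x in cl)
overall = sum(data) / len(data)
separation = sum(len(cl) * (c - overall) ** 2
                 for c, cl in zip(centroids, clusters))
print(centroids, cohesion, separation)
```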
PART II. Consider the following set of one-dimensional data points: {0.1, 0.2, 0.4, 0.5,
0.6, 0.8, 0.9}. (i) Suppose we apply k-means clustering to obtain three clusters, A, B, and
C. If the initial centroids are located at {0, 0.25, 0.6}, respectively, show the cluster
assignments and the locations of the centroids after the first three iterations. Compute the
SSE of the k-means solution (after 3 iterations). (ii) Apply bisecting k-means (with k = 3)
to the data. First, apply k-means on the data with k = 2 using initial centroids located at
{0.1, 0.9}. Next, compute the SSE for each cluster (make sure you indicate the SSE values
in your answer). Choose the cluster with the larger SSE value and split it further into 2
sub-clusters. You can choose the two data points with the smallest and largest values as
your initial centroids. For example, if the cluster to be split contains the data points (0.20,
0.40, 0.60, and 0.80), then the centroids should be initialized to 0.20 and 0.80. Show the
clustering solution obtained by applying bisecting k-means.
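The bisecting procedure in part (ii) can be sketched by reusing plain k-means on the higher-SSE cluster. One caveat: the point 0.5 is mathematically equidistant from the initial centroids 0.1 and 0.9, so a tie-breaking rule is needed; the sketch below assigns ties to the first (lower) centroid, which is my assumption, not something the assignment specifies:

```python
def kmeans_1d(data, centroids, iters=20):
    """Lloyd's algorithm on 1-D points; ties go to the first centroid."""
    clusters = [[] for _ in centroids]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in data:
            i = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

def sse(cluster):
    """Within-cluster sum of squared errors about the cluster mean."""
    m = sum(cluster) / len(cluster)
    return sum((x - m) ** 2 for x in cluster)

data = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
# Step 1: plain k-means with k = 2 and initial centroids {0.1, 0.9}
_, (c1, c2) = kmeans_1d(data, [0.1, 0.9])
# Step 2: split the higher-SSE cluster, seeding with its min and max points
big, keep = (c1, c2) if sse(c1) > sse(c2) else (c2, c1)
_, (s1, s2) = kmeans_1d(big, [min(big), max(big)])
print(sorted([keep, s1, s2], key=min))
```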
Problem 5. WEKA – K-means Clustering (10 Points)
Load the iris.arff file into Weka. Click on the Cluster tab, choose the "SimpleKMeans"
algorithm, and set "numClusters" to 3. Select "Classes to clusters evaluation", click on
"Ignore attributes", and select "class". Start the clustering.
(a) How many instances were clustered incorrectly? Provide the confusion matrix.
(b) How many instances are in cluster2? How many of these instances were
incorrectly clustered, and to which cluster should they belong?
(c) Right-click on the result list and click on "Visualize cluster assignments". Set the
x-axis to instance_number and the y-axis to sepallength. Change the color to class.
Which type of iris flower has all of its instances clustered correctly?
Problem 6. Hierarchical Clustering (20 Points; 5 + 7 + 8)
(a) Perform hierarchical clustering (single linkage) on the following one-dimensional
dataset: {0.1, 1, 1.7, 3.4, 3.9, 4.7}. (i) If we want to obtain two clusters, show the cluster
membership for each data point. (ii) Draw the dendrogram.
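Single linkage merges, at each step, the two clusters whose closest members are nearest to each other. A naive agglomerative sketch for part (a) (quadratic search per merge, fine at this size; the function name is mine):

```python
def single_linkage(points, k):
    """Agglomerative clustering with single (minimum) linkage on 1-D data,
    stopping when k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # find the pair of clusters with the smallest single-link distance
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(abs(a - b) for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

result = single_linkage([0.1, 1, 1.7, 3.4, 3.9, 4.7], 2)
print(result)
```

Recording the distance at which each merge happens gives exactly the heights needed to draw the dendrogram in part (ii).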
(b) Consider the following four data points. Using the cosine similarity measure,
perform hierarchical clustering with the single linkage algorithm. Give the
proximity matrix and draw the corresponding dendrogram obtained after clustering.
A: (0 2 0 0); B: (2 0 1 2); C: (2 1 0 2); D: (2 2 1 0)
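The proximity matrix for part (b) can be checked numerically. A small sketch computing the pairwise cosine similarities (remember that with a similarity measure, the pair with the largest value is merged first):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

points = {"A": (0, 2, 0, 0), "B": (2, 0, 1, 2),
          "C": (2, 1, 0, 2), "D": (2, 2, 1, 0)}
for x in points:
    for y in points:
        if x < y:
            print(x, y, round(cosine(points[x], points[y]), 3))
```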
(c) Use the similarity matrix in the following Table to perform single and complete
linkage hierarchical clustering. Show your results by drawing a dendrogram that
clearly shows the order in which the points are merged. Also, give the updated similarity
matrix after each merge.
Problem 7. DBSCAN Clustering (15 Points)
Consider the data set shown in Figure 3. Suppose we apply the DBSCAN algorithm with
Eps = 0.15 (in Euclidean distance) and MinPts = 3.
List all the core points in the diagram (you can use the labels of the data points in the
diagram). Note: a point is considered a core point if there are more than MinPts points
(including the point itself) within a neighborhood of radius Eps. List all the
border points in the diagram. List all the noise points in the diagram.
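The core/border/noise classification step can be sketched as follows. Since Figure 3 is not reproduced here, the coordinates below are hypothetical, chosen only to exercise each category; the sketch also uses the common "at least MinPts points, the point itself included" reading of the core-point condition, so adjust the comparison if you read the note above more strictly:

```python
from math import dist

def dbscan_labels(points, eps=0.15, min_pts=3):
    """Label each point core / border / noise (classification step only,
    not the full cluster-expansion phase of DBSCAN)."""
    # core: at least min_pts points (itself included) within eps
    core = {p for p, xy in points.items()
            if sum(dist(xy, q) <= eps for q in points.values()) >= min_pts}
    labels = {}
    for p, xy in points.items():
        if p in core:
            labels[p] = "core"
        elif any(dist(xy, points[c]) <= eps for c in core):
            labels[p] = "border"  # not core, but inside a core point's Eps
        else:
            labels[p] = "noise"
    return labels

# Hypothetical points standing in for Figure 3
pts = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.1),
       "d": (0.22, 0.0), "e": (1.0, 1.0)}
labels = dbscan_labels(pts)
print(labels)
```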