Identify and discuss the concepts of data mining and knowledge discovery.

Question description

Learning Objectives:

1. Apply various data mining methods and tools for knowledge discovery and decision-making.

2. Critique and interpret the results of data mining experiments.

3. Compare data mining algorithms and their performance.

4. Identify and discuss the concepts of data mining and knowledge discovery.

In this assignment, you are required to do the following tasks:

1. Download the student performance dataset from SULMS. In one paragraph, describe this dataset.

2. Inspect this dataset and identify whether any outliers exist. Justify your answer. [5 marks]

3. Create plots showing the following:

a. Average study time Male vs Female. [10 marks]

b. Average travel time Male vs Female. [10 marks]

c. Average grade Male vs Female. [10 marks]

d. Average number of failures Male vs Female. [10 marks]

e. Total pass and fail. The chart should show the total number of students who have a pass grade and the total number who have a fail grade. A grade below 11 is a fail; otherwise it is a pass. [10 marks]

f. Create a scatter plot to show whether there is a correlation between age and grade. Summarize your findings. [10 marks]

4. Create a classification model (use Weka) to classify students into two groups (fail and pass). Compare at least three different classifiers. [10 marks]

5. Create a testing set to evaluate your classifiers (use Weka). [5 marks]

6. Provide a detailed analysis of your results (use Weka).

a. Describe each classifier you used, in no more than one paragraph per classifier. [9 marks]

b. Describe which classifier performed best and justify your answer. Explain the strengths and weaknesses of the classifiers you selected for this type of data. [6 marks]

7. Write a conclusion (at most two paragraphs) summarizing the most important findings of the assignment; in particular, address the data analysis and the results obtained from your predictions. [5 marks]
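The brief only requires Weka for tasks 4 to 6, so tasks 2 and 3 can be prepared with any tool. The following is a minimal, standard-library-only Python sketch of the numbers behind those plots. It makes several assumptions: the column names (sex, age, studytime, traveltime, failures, G3) follow the public UCI "Student Performance" dataset and may not match the SULMS file; "grade" is taken to mean the final grade G3; the sample rows are hypothetical; and the IQR (Tukey fence) rule is one common outlier check, not a method the brief prescribes. The sketch computes the values to plot without drawing them. The UCI copies of the data are semicolon-separated, so reading them would need `csv.DictReader(..., delimiter=";")`.

```python
import csv
import io
import statistics

# Hypothetical sample rows. Column names follow the UCI "Student Performance"
# dataset (an assumption); the SULMS file may use different names.
SAMPLE_CSV = """sex,age,studytime,traveltime,failures,G3
F,15,2,1,0,14
M,16,1,2,1,9
F,17,3,1,0,12
M,15,2,1,0,11
F,18,1,3,2,8
M,19,1,2,3,6
"""

PASS_THRESHOLD = 11  # from the brief: a grade below 11 is a fail


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {"sex": rec["sex"]}
        for col in ("age", "studytime", "traveltime", "failures", "G3"):
            row[col] = int(rec[col])
        rows.append(row)
    return rows


def label(grade):
    """Task 3e rule: grade < 11 -> 'fail', otherwise 'pass'."""
    return "fail" if grade < PASS_THRESHOLD else "pass"


def mean_by_sex(rows, col):
    """Average of one column for each value of 'sex' (tasks 3a to 3d)."""
    groups = {}
    for r in rows:
        groups.setdefault(r["sex"], []).append(r[col])
    return {sex: statistics.mean(vals) for sex, vals in groups.items()}


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    for col in ("studytime", "traveltime", "G3", "failures"):
        print(col, mean_by_sex(rows, col))
    labels = [label(r["G3"]) for r in rows]
    print("pass:", labels.count("pass"), "fail:", labels.count("fail"))
    print("age outliers:", iqr_outliers([r["age"] for r in rows]))
    ages = [r["age"] for r in rows]
    grades = [r["G3"] for r in rows]
    print("r(age, G3):", round(pearson(ages, grades), 3))
```

The per-sex averages map directly onto bar charts for tasks 3a to 3d, the pass/fail counts onto the task 3e chart, and the correlation coefficient gives a numeric summary to accompany the task 3f scatter plot.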