You will use R to mine actual data for a problem of interest. The data could come from a problem at your current job (if you have one), from something of interest to the School of Management or the College, from the web, etc. (the electronic reading list for this course suggests places to find relevant data). You will design the data mining task, mine the data, and describe your results. You will also research existing solutions to the problem, if any have been proposed or documented. Your own data and results need not be on a par with actual industry results; the goal is for you to get as realistic a hands-on experience as possible, given the constraints of what you have learned.
You should use the CRISP-DM data mining process to structure your research and report. Keep in mind that it may be ineffective simply to proceed linearly through the steps, and this may need to be reflected in your analysis. You should interact with me from the preparation of your initial ideas through to the preparation of your report, as a consultant would interact with a firm or funding source in preparing a research report. Use your imagination or prior experience, or ask for help, to fill in any gaps between the material available and what you would be able to find out if you could actually interact with the client firm.
• Identify, define, and motivate the business problem that you are addressing.
• How (precisely) will a data mining solution address the business problem?
(NB: I’d like to see a good definition/motivation of the business problem and a precise statement of how a data mining solution will address the problem. It’s not so important that the hands-on results match perfectly. It’s more important that you have the experience of working through a realistic problem definition.)
• Identify and describe the data (and data sources) that will support data mining to address the business problem. Include those aspects of the data that we talk about in class and/or in the quizzes.
• Specify how these data are integrated to produce the format required for data mining.
(NB: data preparation can be time consuming. Get started early. Talk to me if you need advice.)
• Specify the type of model(s) built and/or patterns mined.
• Discuss the choice of data mining algorithm: what are the alternatives, and what are their pros and cons?
• Discuss why and how this model should “solve” the business problem (i.e., improve along some dimension of interest to the firm).
• Discuss how the result of the data mining is/should be evaluated. How should a business case be developed to project expected improvement? ROI? If this is impossible/very difficult, explain why and identify any viable alternatives.
• Discuss how the result of the data mining will be deployed.
• Discuss any issues the firm should be aware of regarding deployment.
• Are there important ethical considerations?
• Identify the risks associated with your proposed plan and how you would mitigate them.
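As one way to approach the evaluation and ROI question in the list above, a business case can be framed as expected value per customer, compared against a simple baseline. The R sketch below is only an illustration: the confusion-matrix counts, the per-cell benefits and costs, the population size, and the project cost are all hypothetical numbers, and the "target everyone" baseline is an assumed comparison, not a required one.

```r
# Hypothetical confusion-matrix counts from a holdout set (illustrative only).
counts <- matrix(c(80,  20,
                   40, 860),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(predicted = c("target", "skip"),
                                 actual    = c("responder", "non_responder")))

# Hypothetical value of each outcome per customer:
# a targeted responder earns 50, a targeted non-responder costs 5,
# and customers who are skipped earn or cost nothing.
value <- matrix(c(50, -5,
                   0,  0),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(counts))

# Expected profit per customer when following the model's predictions.
model_profit <- sum(counts * value) / sum(counts)

# Assumed baseline: target every customer regardless of the model.
baseline_counts <- colSums(counts)
baseline_profit <- sum(baseline_counts * value["target", ]) / sum(counts)

# Projected improvement and ROI under hypothetical deployment figures.
gain_per_customer <- model_profit - baseline_profit
population   <- 10000   # hypothetical number of customers scored
project_cost <- 10000   # hypothetical cost of building and deploying the model
total_gain   <- gain_per_customer * population
roi          <- (total_gain - project_cost) / project_cost

print(c(model = model_profit, baseline = baseline_profit, roi = roi))
```

If the costs and benefits cannot be estimated credibly, the same structure can still be used for a sensitivity analysis by varying the entries of `value`.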
The RStudio script for the data set linked below should apply Logistic Regression and Clustering. Link: https://archive.ics.uci.edu/ml/datasets/wholesale+customers
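A minimal sketch of what such a script might look like follows. To keep it self-contained, it simulates a stand-in data frame using the column names of the wholesale customers file (Channel, Region, Fresh, Milk, Grocery, Frozen, Detergents_Paper, Delicassen); the simulated values, the choice of Channel as the target, the reading of Channel 2 as retail, the chosen predictors, and the use of three clusters are all assumptions for illustration. With the real file downloaded, the simulated block would be replaced by a `read.csv()` call.

```r
set.seed(42)
n <- 200

# Stand-in data with the same column names as the wholesale customers file.
# All values here are simulated; replace this block with read.csv() on the real file.
channel <- sample(1:2, n, replace = TRUE)
wholesale <- data.frame(
  Channel          = channel,
  Region           = sample(1:3, n, replace = TRUE),
  Fresh            = round(rlnorm(n, 9.0, 1)),
  Milk             = round(rlnorm(n, 8.0 + 0.5 * (channel == 2), 1)),
  Grocery          = round(rlnorm(n, 8.0 + 0.7 * (channel == 2), 1)),
  Frozen           = round(rlnorm(n, 7.5, 1)),
  Detergents_Paper = round(rlnorm(n, 6.5 + 1.0 * (channel == 2), 1)),
  Delicassen       = round(rlnorm(n, 7.0, 1))
)

# --- Logistic regression: predict the channel from spending ---
# Assumption: Channel 2 is treated as the positive class ("retail").
wholesale$is_retail <- as.integer(wholesale$Channel == 2)

train_idx <- sample(n, size = round(0.7 * n))   # 70/30 train/test split
train <- wholesale[train_idx, ]
test  <- wholesale[-train_idx, ]

fit <- glm(is_retail ~ log1p(Milk) + log1p(Grocery) + log1p(Detergents_Paper),
           data = train, family = binomial)

prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)                  # 0.5 cutoff is an arbitrary choice
confusion <- table(predicted = pred, actual = test$is_retail)
accuracy  <- mean(pred == test$is_retail)

# --- Clustering: k-means on log-transformed, standardized spending ---
spend  <- c("Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen")
scaled <- scale(log1p(wholesale[, spend]))
km <- kmeans(scaled, centers = 3, nstart = 25)  # 3 clusters is an assumption
wholesale$cluster <- km$cluster

# Median spending per cluster, as a starting point for describing segments.
cluster_profile <- aggregate(wholesale[, spend],
                             by = list(cluster = wholesale$cluster),
                             FUN = median)

print(summary(fit))
print(confusion)
print(cluster_profile)
```

The log transform and standardization are included because spending variables of this kind tend to be right-skewed and on different scales, which would otherwise let one or two columns dominate the k-means distances. The number of clusters should be justified in the report, for example by comparing the within-cluster sum of squares across several values of `centers`.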