Data Mining – part 1

This model will be used by a real estate agency to help its clients understand what their house should sell for, so that they can make an educated decision about the listing price. Secondarily, the model will be used by a home contractor, who would like to be able to tell clients how much adding a bathroom would increase a home's selling value.
Part 1 of the project involves the first three steps in the data mining process: sample, explore and modify. You will be preparing the data for model building, which will be done in Part 2 of the project. You will need to make decisions regarding data that is in text form, missing data, potentially incorrect data, the inclusion of potential outliers, binning strategy and variable transformation. Please make sure your decisions are justified. Note that the specific requirements and relative weights are outlined in the grade sheet.
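As an illustration only, the sketch below shows one possible way to handle four of the decisions listed above (missing data, potential outliers, binning and transformation) on a single column. It is written in Python rather than Excel, the values are made-up toy numbers and not the assignment's data, and the column name, bin cut-offs and the choice of median imputation and 1.5 × IQR fences are all assumptions. Other choices may be just as defensible; whatever you choose still needs to be justified.

```python
import math
import statistics

# Hypothetical toy square-footage column; None marks a missing value.
# These numbers are invented for illustration, not taken from the data set.
sqft = [1200, 1400, None, 1500, 1600, 1800, 1900, 2100, 2200, 9000]

# 1. Missing data: replace missing entries with the median of the observed
#    values (one option among several; it is an assumption, not a requirement).
observed = [v for v in sqft if v is not None]
median_sqft = statistics.median(observed)
imputed = [median_sqft if v is None else v for v in sqft]

# 2. Potential outliers: flag values outside the 1.5 * IQR fences.
#    Flagging is not the same as deleting; the decision to keep or drop
#    a flagged value is a separate judgment call.
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outlier_flags = [not (low_fence <= v <= high_fence) for v in imputed]

# 3. Binning: assign each value to a size band (cut-offs are hypothetical).
def size_bin(value):
    if value < 1500:
        return "small"
    if value < 2500:
        return "medium"
    return "large"

bins = [size_bin(v) for v in imputed]

# 4. Transformation: natural log, often used to reduce right skew.
log_sqft = [math.log(v) for v in imputed]

for value, flag, band, logged in zip(imputed, outlier_flags, bins, log_sqft):
    print(value, flag, band, round(logged, 3))
```

Keeping each step in its own list (rather than overwriting the original column) makes it easier to document what was changed and why.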
After the data has been coded, compute descriptive statistics on all of the continuous variables. Briefly discuss.
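For reference, the usual descriptive statistics for one continuous variable could be computed as in the minimal sketch below. It uses Python's standard library instead of Excel, and the `sqft` values and the `describe` helper are hypothetical and invented for illustration.

```python
import statistics

# Hypothetical toy values; not taken from the assignment's data set.
sqft = [1200, 1500, 1800, 2100, 2400]

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }

summary = describe(sqft)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean with the median, and the minimum and maximum with what is plausible for the variable, is a quick way to spot skew or suspicious values worth discussing.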
After the data has been coded, compute the frequencies (including both count and %) of ALL of the categorical variables. Briefly discuss.
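A frequency table with both count and percentage could look like the sketch below. This is a Python illustration rather than Excel output; the `garage` column and its categories are hypothetical toy values.

```python
from collections import Counter

# Hypothetical toy categorical column; not from the assignment's data set.
garage = ["attached", "none", "attached", "detached", "attached"]

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total)
            for category, n in counts.most_common()]

table = frequency_table(garage)
for category, count, percent in table:
    print(f"{category:<10} {count:>3} {percent:6.1f}%")
```

Categories with very small counts are worth noting in the discussion, since they may be candidates for combining.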
Produce a correlation table. Discuss at least 3 correlations.
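To make clear what each cell of a correlation table contains, the sketch below computes pairwise Pearson correlations by hand in Python. The three columns and their values are hypothetical toy data, and using the Pearson coefficient is an assumption about which correlation is intended.

```python
import math

# Hypothetical toy columns; not from the assignment's data set.
data = {
    "sqft": [1200, 1500, 1800, 2100, 2400],
    "bathrooms": [1, 2, 2, 3, 3],
    "price": [150000, 185000, 210000, 260000, 275000],
}

def pearson(x, y):
    """Pearson correlation of two equal-length lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Correlation of every column with every other column.
names = list(data)
corr = {(a, b): pearson(data[a], data[b]) for a in names for b in names}

for a in names:
    print(a.ljust(10), "  ".join(f"{corr[(a, b)]:6.3f}" for b in names))
```

The table is symmetric with 1s on the diagonal, so only the cells above (or below) the diagonal carry information to discuss.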
There is quite a bit of discussion in this assignment. Rather than putting the discussion in Excel, it is preferred that you prepare this assignment as a Word (or PDF) document and include the relevant Excel output as figures. You should submit both the Word (or PDF) file and the Excel file that contains your work. Please note that only the printed Word (or PDF) file will be graded. The Excel file will only be opened if necessary.
Different people will make different decisions, which may ultimately affect the model they develop. While there are wrong things you could do (like using a 0 for all missing values for square footage), there is not one “right” answer. Make sure you document and justify the decisions you make. It is fine (perhaps even ideal) to note decisions that were made because this is an academic project. For example: “Given the appropriate resources, I would have __ to get the missing values for ___. Lacking the resources for this option, I elected to ___, recognizing that this decision ___.”