You just started working for a real estate company and they are looking to make a huge investment into the growing Nashville area. They’ve acquired a dataset about recent sales and want you to build a model to help them accurately find the best value deals when they go to visit next week. There is a concern that houses are going over their asking price and this dataset will help us observe that. Hint: You will have to create the dependent variable to understand whether it is over/under price (you can have multiple categories but remember the limitations of logistic vs decision tree type models).
Use proper data cleansing techniques to ensure that you have the highest quality data to model this problem. Detail your process and discuss the decisions you made to clean the data.
Build a logistic regression model to accurately identify overpricing/underpricing and determine what is driving those prices.
Build a decision tree model and compare the results with the results of the previous model.
Build a Random Forest model and compare the results with the results of the previous models.
Build a Gradient Boost model and compare the results with the results of the previous models.
Use multiple benchmarking metrics to compare and contrast the three models. Based on your findings, provide evidence of which model you believe the real estate company should use and what are the key variables to focus on to drive value of the houses they should be targeting.