
STA 9708: Statistics Project

 

Instructions: 

  • Write in the first person.
  • Share your thought-processes, miscues, work-arounds, and insights.
  • Ground your work in the concepts and approaches taken in class and lecture notes.
  • Use Microsoft Word and Excel because these are dominant tools of business; handwritten work inserted into Word is acceptable where reasonable.
  • See LN12A and the Inclass LN12 Excel files as examples of
    • (i) writing in the first person;
    • (ii) expressing thought-processes, miscues, work-arounds, and insights;
    • (iii) presenting the project in Microsoft Word, including tables, graphics, and summaries of Microsoft Excel work;
    • (iv) using an Excel file to show details of computations.
  • Sharing your work with anyone else is cheating: Don’t do it.
  • Do not plagiarize.

 

 

Required for all students:  I(a), II(a), III(a), and IV(a)

 

Required for students trying to get an A- or A:   I(b), I(c), II(b), III(b), and IV(b)

 

Explanation:  I have never before employed the above division and I do not mean to discourage any student from working on all the problems.  My intention is to create a “reduced anxiety plateau” in light of the exceptional challenges this semester has presented, including the wide range of impacts on individuals.

 

I(a) Monopoly (see LN3 & LN4 materials)         

Details about Monopoly are given in photos on pages 2, 3, and 5.

 

(i)  Explain in your own words how the probability distribution of the sum of two fair dice is computed.  Write this up as if you are explaining it to a friend who is not in this class. As an experiment, you could try explaining to another person how to find the probability of getting a 5, and then report and reflect on what happened when you asked them to show how to compute the probability of getting a 10.
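
As a cross-check on the hand computation, here is a minimal Python sketch (Python is my assumption; the course itself works in Excel) that lists all 36 equally likely outcomes of two fair dice and counts how many give each sum. The function name two_dice_distribution is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    # Each ordered pair (first die, second die) has probability 1/36.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(ways, 36) for total, ways in sorted(counts.items())}


dist = two_dice_distribution()
for total, probability in dist.items():
    print(total, probability)
```

The sketch only counts outcomes; the explanation in words of why the counting works is still the point of this question.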

 

(ii) Select your own starting position on the Monopoly board (not one used earlier) and place at least one house and one hotel within the range of your roll, where “a roll” is the sum of two dice.  The rent you must pay is shown on the Deed cards. Assume you do not own any of the properties in that neighborhood.  For simplicity, here, count landing on Chance or on Community Chest to result in a $0 outcome.

 

(iii) Let the term payout denote the amount of money you pay on the next roll.  Construct the payout distribution by roll.  Display that in four columns:  (a) the roll – values 2 to 12, (b) the property name, (c) probability of the roll, and (d) the payout.

 

(iv) Show how the expected value and variance of the payout are computed from the payout probability distribution.
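
The computation asked for here can be sketched in Python as follows. This is an illustration only: the three-outcome payout table is hypothetical (it is not taken from any Deed card), and the function name is my own. The expected value is the probability-weighted sum of the payouts, and the variance is the probability-weighted sum of squared deviations from that expected value.

```python
from fractions import Fraction


def expected_value_and_variance(distribution):
    """distribution: list of (payout, probability) pairs; probabilities must sum to 1."""
    if sum(p for _, p in distribution) != 1:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in distribution)
    variance = sum((x - mean) ** 2 * p for x, p in distribution)
    return mean, variance


# Hypothetical payout table, for illustration only.
example = [(0, Fraction(1, 2)), (100, Fraction(1, 4)), (200, Fraction(1, 4))]
mean, variance = expected_value_and_variance(example)
print(mean, variance)
```

With your own table, each row of the four-column display in (iii) supplies one (payout, probability) pair.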

 

(v) Suppose a player could choose to buy insurance for the payout of their next roll.  Explain how expected value is related to the pricing of that insurance.  Discuss in the context of your particular gamble.

 

(vi) How is the variability of the payout related to the pricing of the above insurance?  Be specific to your setting.

 

I(b) Monopoly

 

Pick a starting place within reach of a Chance square.

 

(i) Find the probability that on the next roll you will end up in Jail or pass Go.  Include the possibilities presented by the Chance cards, some of which will send you to jail or cause you to pass Go.

 

(ii) For one of your Chance cards, explain how conditional probability is involved in finding the probability that your roll results in getting that card.  Define your terms.

 

I(c) Monopoly Simulation

(I am presenting a drastic rewrite of this question.  If you made substantial progress on the original version, feel free to stick with it.)

Rewrite

One of the best pieces of advice in mathematics is this:  If you cannot solve the problem as given, make it simpler and simpler until you can solve that modified problem.  Then try to work back up. That may be ancient wisdom, or it may be one of the many excellent tips from George Polya, a great mathematician and a great teacher.

In the spirit of that advice, I am offering a rewrite of the Monopoly simulation question.  This will be a step-by-step approach.

To the above simplification principle, I will add an equally important principle: first write a simulation concerning a question to which you already know the answer.  That allows for trouble-shooting.  In our case, you already computed the expected value of a Monopoly starting position in I(a).  That expected value will give you a target to shoot at with the sample average of your simulations, below.   I will now walk you towards writing a simulation of your I(a) game.

I have been working in that direction in class during the past week:  I intend to turn that material into a lecture note simulation guide, written entirely in Excel, using our original in-class Monopoly example as the set-up.

 

(i) We can simulate rolls of a single fair die with the Excel command

=Roundup(6*Rand(),0)

Pull that command apart and explain how it works.
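
For comparison, here is a rough Python analogue of that Excel command, written one step per line. It assumes that Python's random() behaves like RAND() (a uniform draw that is at least 0 and strictly less than 1) and that math.ceil matches ROUNDUP(..., 0) for positive numbers. The name roll_die is hypothetical.

```python
import math
import random


def roll_die(rng):
    """Simulate one fair die by scaling a uniform draw and rounding up."""
    u = rng.random()          # like RAND(): uniform on [0, 1)
    scaled = 6 * u            # stretches the draw to [0, 6)
    return math.ceil(scaled)  # like ROUNDUP(scaled, 0): rounds up to 1, 2, ..., 6
    # Edge case: if u were exactly 0 the result would be 0, not a die face.
    # That event is so rare that it is ignored in this sketch.


rng = random.Random(1)  # fixed seed so the run is repeatable
rolls = [roll_die(rng) for _ in range(6000)]
print(sorted(set(rolls)))
```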

 

(ii) Let a “roll” denote the sum of the roll of two fair dice.  How would a roll be simulated?

 

(iii) Your goal now is to simulate outcomes from the starting position you employed in Question I(a). Using Excel’s index function (or vlookup if you prefer), show how you can map simulated rolls to rents.  What should now happen is that a simulated roll lands you on a property and your function displays the rent for that property!  This does not happen by itself – trial and error is required.

 

(iv) Fill in 100 rows with the simulation from above.  Compute their sample average in an Excel cell.  Using Multiple Spincycle, write that cell value to 1,000 rows.  Compute the grand average of those 1,000 rows (which summarizes 100,000 trials).  How close does this come to the value you computed in I(a) for the expected value?  Would you get the same value if you repeated this simulation?  Do we have a name for that kind of variation?
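
The structure of this step (100 simulated payouts per sample average, 1,000 sample averages, one grand average) can be sketched in Python as below. Everything specific here is an assumption made for illustration: the rent table is hypothetical and should be replaced by your own I(a) table, the dictionary lookup stands in for Excel's index or vlookup, and the outer loop stands in for Multiple Spincycle.

```python
import random

# Hypothetical rents by roll (2 to 12); replace with your own I(a) table.
RENT_BY_ROLL = {2: 0, 3: 0, 4: 50, 5: 0, 6: 200, 7: 0,
                8: 0, 9: 750, 10: 0, 11: 0, 12: 0}


def simulate_roll(rng):
    """Sum of two simulated fair dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def sample_average(rng, n_rows=100):
    """Average payout over n_rows simulated rolls (one block of 100 rows)."""
    return sum(RENT_BY_ROLL[simulate_roll(rng)] for _ in range(n_rows)) / n_rows


def grand_average(rng, n_repeats=1000, n_rows=100):
    """Average of n_repeats sample averages (1,000 x 100 = 100,000 trials)."""
    return sum(sample_average(rng, n_rows) for _ in range(n_repeats)) / n_repeats


def theoretical_expected_value():
    """Expected payout from the exact two-dice distribution, as the target."""
    ways = {roll: 6 - abs(roll - 7) for roll in range(2, 13)}  # out of 36
    return sum(RENT_BY_ROLL[roll] * ways[roll] for roll in ways) / 36


rng = random.Random(0)  # change or remove the seed to see run-to-run variation
print(grand_average(rng), theoretical_expected_value())
```

Rerunning with a different seed gives a slightly different grand average, which is the variation this question asks you to name.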

 

(v) Now, you have a Monopoly simulation platform with which you can tackle expected value problems that are too hard to work through by theory. Tackle a challenge of your choice.  For example, you could try to introduce the doubles rule, simplified as needed to start.  Or, introduce Chance cards starting with just one “motion” card, such as Advance to Go.  Or, a player wants to buy insurance for the next cycle around the board:  given the current layout of ownership, houses, and hotels, use simulation to estimate a fair price for the insurance.

 

 

II. Forecast Intervals for Daily Stock Returns (based on LN6)

 

II(a) Use data from 1-1-2006 to the present of the daily returns for your stock.

 

(i) Construct a moving window of length n=30 from which you compute a 95% prediction interval for the 31st  day.  I am posting a template into which you can paste your data if you don’t have time to work out the details yourself.

 

(ii) Compute that running forecast interval across all the data.  Show a graph of the time series and the upper and lower 95% prediction limits across time, as in the template.

 

(iii) Compute the coverage rate.  Does it look to be reasonably close to 95%?
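
The moving-window logic of (i) to (iii) can be sketched in Python as follows. Several things here are assumptions rather than the template's method: the interval formula (sample average plus or minus t times s times the square root of 1 + 1/n), the hard-coded t value of about 2.045 for 29 degrees of freedom, and the data, which are synthetic normal draws and not real stock returns.

```python
import random
import statistics

T_CRIT_29 = 2.045  # approx. 97.5th percentile of Student's t, 29 df (n = 30)


def prediction_interval(window, t_crit=T_CRIT_29):
    """95% prediction interval for the next value, from one window of data."""
    n = len(window)
    mean = statistics.fmean(window)
    sd = statistics.stdev(window)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * sd * (1 + 1 / n) ** 0.5
    return mean - half_width, mean + half_width


def coverage_rate(returns, n=30):
    """Share of days whose return falls inside the interval built from the
    previous n days. If n changes, t_crit must be changed to match n - 1 df."""
    hits = 0
    trials = 0
    for i in range(n, len(returns)):
        lower, upper = prediction_interval(returns[i - n:i])
        hits += lower <= returns[i] <= upper
        trials += 1
    return hits / trials


# Synthetic data for illustration only; paste in your own daily returns.
rng = random.Random(42)
synthetic_returns = [rng.gauss(0.0, 0.01) for _ in range(2000)]
rate = coverage_rate(synthetic_returns)
print(rate)
```

Real returns are not independent normal draws, so the coverage you find for your stock may differ from what this synthetic series produces.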

 

 

 

II(b) It must be that the prediction interval will take time to respond to a sudden increase in risk or to a sudden decrease in risk.

 

(i) Using the squared forecast error, explore that topic graphically and in terms of a mean-squared forecast error.  Do you find evidence that increases in risk happen more suddenly than decreases?

 

(ii) If you reduce or increase the time span from 30 days, what happens to the performance of the forecasts?

 

(iii) Can you improve performance using different time spans for the sample average and the sample standard deviation?

 

 

III(a) Two-Sample t-test (based on LN8, Sections 8.1 and 8.2)

 

(i) Create your own data set consisting of two different samples, drawn either from Census data or from Ellis Island immigration data, that contains numeric values – such as age.  Show details of the sourcing and context.

 

(ii) State the null and alternate hypotheses in words that apply to the particular topic you are addressing.  Define the populations you are referencing, their approximate sizes, and how the population averages would (in principle) be computed.  Perform a two-sided, two-sample t-test.  In so far as common sense allows, explain what you are doing as you go.
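
As a way to check the arithmetic of the test statistic, here is a minimal Python sketch. It assumes the unequal-variance (Welch) form of the two-sample t-test; if LN8 uses a pooled-variance form, the standard error and degrees of freedom will differ. The two samples are hypothetical ages made up for illustration. The two-sided p-value can then be found in Excel with T.DIST.2T applied to the absolute value of t and the degrees of freedom.

```python
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two samples,
    without assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_sq_a = statistics.variance(sample_a) / n_a  # squared std. error, sample A
    se_sq_b = statistics.variance(sample_b) / n_b  # squared std. error, sample B
    diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    t_stat = diff / (se_sq_a + se_sq_b) ** 0.5
    df = (se_sq_a + se_sq_b) ** 2 / (
        se_sq_a ** 2 / (n_a - 1) + se_sq_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical ages, for illustration only.
ages_group_1 = [23, 31, 27, 45, 19, 38, 29]
ages_group_2 = [34, 41, 28, 52, 37, 30, 44]
t_stat, df = welch_t(ages_group_1, ages_group_2)
print(t_stat, df)
```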

 

(iii) Show the graph of the test with all features shown and labeled.

 

(iv) State the conclusion of the test and the grounds.

 

 

III(b) Do a second two-sample test, using hypotheses on population proportions rather than population averages.  That material is covered in Lecture Note 12.

 

(i) For a two-sample test for proportions, develop a one-sided research hypothesis.

 

(ii) Explain why it is crucial to distinguish between the sample proportions and the population proportions.

 

(iii) Systematically walk the reader through the hypothesis test in the context of your data.  Explain under what conditions the null hypothesis will be rejected in terms of your sample proportions.  Explain how the mechanics of the test provides the answer to the question, “How far is far?” in the context of your setting.
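
The mechanics of the test can be sketched in Python as below. This assumes the usual pooled-proportion z statistic and an upper-tailed research hypothesis (population proportion 1 is larger than population proportion 2); your own hypothesis may point the other way. The counts in the example are hypothetical.

```python
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Upper-tailed two-sample z-test for proportions.

    x1, x2: number of "successes" in each sample; n1, n2: sample sizes.
    Returns (z statistic, one-sided p-value for H1: p1 > p2).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2       # sample proportions
    pooled = (x1 + x2) / (n1 + n2)          # estimate of the common p under H0
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1_hat - p2_hat) / se              # "how far is far" in standard errors
    p_value = 1 - NormalDist().cdf(z)       # area to the right of z
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z(60, 100, 50, 100)
print(z, p_value)
```

The statistic z measures the gap between the sample proportions in units of its standard error, which is one way to put a number on "How far is far?"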

 

(iv) Graph the test, labeling all relevant portions.

 

(v) State the probabilistic meaning of your p-value, referencing (a) the numeric value of your p-value, (b) the value of test-statistic found, and (c) the relevant area in the graph of the test.

 

(vi) Go to the MathCracker website and run the appropriate program to check your z-test work.

  

IV(a) (See LN9 and LN10). A random sample of 31 bags of Pretzel M&M’s was taken from a production run of 500,000 bags.  Treat that production run as a population.  The number of Pretzel M&M’s (Count) and the net weight (NetWt) were recorded for each of the 31 bags and are shown on the right, together with the column Totals.  NetWt (y) was regressed on Count (x) and the Excel output is shown below.

 

(i) Give a 95% prediction interval for the net weight of a bag with 13 Pretzel M&M’s.
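
For reference, the general shape of a simple-regression prediction interval can be sketched in Python as follows. This assumes the standard formula (fitted value plus or minus t times the standard error of the estimate times the square root of 1 + 1/n + (x0 - x-bar)^2 / Sxx); check it against LN9 and LN10. The five data points are hypothetical and are not the M&M's data. For the 31 bags in this problem the t value would have 29 degrees of freedom.

```python
import statistics


def regression_prediction_interval(xs, ys, x0, t_crit):
    """Prediction interval for a new y at x = x0 from a least-squares line.

    t_crit is the t value with n - 2 degrees of freedom for the chosen level.
    """
    n = len(xs)
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = (sse / (n - 2)) ** 0.5            # standard error of the estimate
    y_hat = intercept + slope * x0          # fitted value at x0
    half_width = t_crit * s_e * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx) ** 0.5
    return y_hat - half_width, y_hat + half_width


# Hypothetical data, for illustration only (n = 5, so df = 3, t about 3.182).
xs = [10, 12, 13, 15, 16]
ys = [25.1, 29.8, 32.4, 37.0, 39.9]
lower, upper = regression_prediction_interval(xs, ys, x0=13, t_crit=3.182)
print(lower, upper)
```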

 

(ii) State the null and alternate hypotheses in words for the usual slope test.

 

(iii) State your conclusion for the slope test and on what grounds you could justify that conclusion.

 

(iv) A mischievous child is given two additional bags of Pretzel M&M’s, one containing 13 M&M’s and the other 16.  She randomly selects and eats three M&M’s from the bag with 16, so both bags now have 13 M&M’s.  Both bags are now weighed: one of the bags is found to have net weight of 34.1 grams; the other of 31.5 grams.  Which of those bags do you think originally contained 16 M&M’s?  Explain your reasoning and estimate the original net weight of the bag that initially held 16.

 

IV(b) Continuation of IV(a)

 

Explain what is being estimated by the standard error of the slope, above, starting with the following thought:  “There are zillions of possible samples of 31 bags that could be taken from the production run of 500,000 bags.  Each sample would yield its own …”