Data for this and subsequent assignments come from the Breast Cancer Care in Chicago study, a population-based study of breast cancer patients. The final sample comprised 989 eligible patients (415 non-Hispanic Black, 398 non-Hispanic White, and 176 Hispanic). The goal of this assignment is, once again, to examine whether lacking private health insurance influences the mode of breast cancer detection, separately for non-Hispanic (NH) White, NH Black, and Hispanic patients. For this assignment, you will (1) re-evaluate and, if needed, re-estimate your final model from Assignment 1 based on feedback, and (2) compare your multivariable model results with results from adjustment methods that use propensity scores and inverse probability of treatment weights (IPTW).
This assignment should be written as if it were a draft manuscript submitted to a major epidemiology journal. You do not need to write an abstract, introduction, or discussion section; you only need to write the methods and results sections and include the appropriate tables.
Hand in your assignment as a MS Word document.
Hand in your Stata code as a .do file.
Do not convert these files to PDF before submitting.
Methods section (50 points): Your final regression model should incorporate corrections based on any feedback from Assignment 1.
20 points: Please describe your approach for building your final multivariable model for your outcome.
Your methods section should include the following information, in this order:
Steps involved in assessing and incorporating prior knowledge into your analysis, including the determination of a priori variables to include or exclude based on prior knowledge.
Description of your stratified analysis for assessing confounding, selection bias, and precision, and a brief description of its results (how they inform your model-building approach).
Description of your main effects model building approach and a brief description of the results of that analysis.
Description of how you examined additional interactions and a brief description of the results of that analysis.
Description of how you examined potential inclusion of set-asides in your final model and a brief description of the results of that analysis.
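The stratified analysis described above usually comes down to comparing a crude prevalence difference (PD) with a stratum-adjusted summary PD. The sketch below is written in Python rather than Stata only to make the arithmetic explicit; it computes stratum-specific PDs, the crude PD from the collapsed table, and the Mantel-Haenszel summary PD. The counts and function names are hypothetical, and reading a large crude-versus-adjusted gap as a sign of confounding is an assumed screening rule, not a requirement stated in this assignment.

```python
"""Crude vs. Mantel-Haenszel prevalence difference (PD) across strata.

Each stratum is a tuple (a, n1, b, n0):
  a  = symptomatic cases among exposed (lacking private insurance)
  n1 = number of exposed patients
  b  = symptomatic cases among unexposed (privately insured)
  n0 = number of unexposed patients
All counts below are hypothetical.
"""


def stratum_pds(strata):
    """PD within each stratum: exposed prevalence minus unexposed prevalence."""
    return [a / n1 - b / n0 for a, n1, b, n0 in strata]


def crude_pd(strata):
    """PD from the collapsed 2x2 table, ignoring the stratifying variable."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    b = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return a / n1 - b / n0


def mh_pd(strata):
    """Mantel-Haenszel summary PD, weighting each stratum by n1*n0/N."""
    num = den = 0.0
    for a, n1, b, n0 in strata:
        total = n1 + n0
        num += (a * n0 - b * n1) / total
        den += n1 * n0 / total
    return num / den


# Hypothetical strata of a candidate confounder (e.g., two age groups).
strata = [(20, 40, 4, 10), (2, 10, 4, 40)]
crude = crude_pd(strata)
adjusted = mh_pd(strata)
print("stratum PDs:", [round(pd, 3) for pd in stratum_pds(strata)])
print(f"crude PD = {crude:.3f}, MH PD = {adjusted:.3f}")
```

In this made-up example both strata have a PD of 0.10 while the crude PD is 0.28, the pattern expected when the stratifying variable confounds the association.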
10 points: Please describe your approach for building your propensity score (PS).
Your methods section should include the following information, in this order:
Details about your exposure model, including which generalized linear model form you used.
What variables were considered for inclusion in your exposure model, and why.
Decision criteria for retaining or removing these variables from your final exposure model.
Note: because you are interested in subgroup associations by ethnicity, you should include ethnicity and important interaction terms with ethnicity in your exposure model.
How you dealt with off-support observations (PS values outside the region of common support) and residual confounding.
Which single approach you chose for controlling confounding with the PS (as a model covariate or as a stratification variable), and a defense of that approach compared with other approaches.
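As a minimal sketch of the off-support and stratification steps (Python for illustration; the assignment itself uses Stata), the code below assumes the propensity scores have already been predicted from the exposure model. It assumes one common definition of common support, the range of PS values shared by exposed and unexposed patients, and forms PS quintiles for use as a stratification variable. The data and function names are hypothetical; Stata's `xtile` creates quantile categories in a similar way, although its cut-point rule may differ slightly.

```python
import bisect
import statistics


def common_support(ps, exposed):
    """PS range shared by exposed (1) and unexposed (0) patients."""
    ps1 = [p for p, a in zip(ps, exposed) if a == 1]
    ps0 = [p for p, a in zip(ps, exposed) if a == 0]
    return max(min(ps1), min(ps0)), min(max(ps1), max(ps0))


def on_support(ps, exposed):
    """Indices of observations whose PS falls inside the common support."""
    lo, hi = common_support(ps, exposed)
    return [i for i, p in enumerate(ps) if lo <= p <= hi]


def ps_strata(ps, n_strata=5):
    """Quantile-based PS strata numbered 0..n_strata-1 (quintiles by default)."""
    cuts = statistics.quantiles(ps, n=n_strata)
    return [bisect.bisect_right(cuts, p) for p in ps]


# Hypothetical propensity scores (probability of lacking private insurance).
ps = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.95]
exposed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

print("common support:", common_support(ps, exposed))
print("on-support indices:", on_support(ps, exposed))
print("PS quintile:", ps_strata(ps))
```

Observations outside the common support have no comparable counterparts in the other exposure group. With only five strata some confounding can remain within strata, which is one reason to check covariate balance stratum by stratum.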
10 points: Please describe your approach for using your PS to control for confounding via weighting.
Your methods section should include the following information, in this order:
What target population you chose to weight the distribution of your PS to, and why.
What type of generalized linear model you used to estimate the weighted prevalence difference associations of interest, and why.
The specific independent variable terms used in the weighted model of your outcome.
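The choice of target population determines how the weights are built. The Python sketch below (names and data are hypothetical) shows the standard inverse probability weights for three common targets: the whole sample (ATE), the exposed, i.e., uninsured, patients (ATT), and the unexposed (ATU). It then computes the weighted PD as the difference in weighted prevalences. A weighted identity-link model of the outcome on exposure alone gives the same point estimate, and ethnicity-specific PDs correspond to a saturated model with ethnicity and exposure-by-ethnicity terms.

```python
def iptw(ps, a, target="ate"):
    """Inverse probability of treatment weights for a chosen target population.

    ps: propensity scores P(A=1 | covariates); a: exposure (1 = lacks private
    insurance, 0 = privately insured).
    """
    weights = []
    for p, ai in zip(ps, a):
        if target == "ate":    # weight both groups to the total sample
            weights.append(1 / p if ai == 1 else 1 / (1 - p))
        elif target == "att":  # weight the unexposed to the exposed
            weights.append(1.0 if ai == 1 else p / (1 - p))
        elif target == "atu":  # weight the exposed to the unexposed
            weights.append((1 - p) / p if ai == 1 else 1.0)
        else:
            raise ValueError(f"unknown target: {target}")
    return weights


def weighted_pd(y, a, w):
    """Weighted prevalence among exposed minus weighted prevalence among unexposed."""
    def prevalence(group):
        num = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == group)
        den = sum(wi for ai, wi in zip(a, w) if ai == group)
        return num / den
    return prevalence(1) - prevalence(0)


def pd_by_group(y, a, w, group):
    """Weighted PD within each level of a grouping variable (e.g., ethnicity)."""
    result = {}
    for g in sorted(set(group)):
        idx = [i for i, gi in enumerate(group) if gi == g]
        result[g] = weighted_pd([y[i] for i in idx], [a[i] for i in idx],
                                [w[i] for i in idx])
    return result


# Hypothetical data: PS, exposure, outcome (1 = symptomatic detection), ethnicity.
ps = [0.50, 0.50, 0.25, 0.25]
a = [1, 0, 1, 0]
y = [1, 0, 1, 1]
eth = ["NHW", "NHW", "NHB", "NHB"]

w_ate = iptw(ps, a, "ate")
print("ATE weights:", w_ate)
print("weighted PD (ATE):", weighted_pd(y, a, w_ate))
print("weighted PD (ATT):", weighted_pd(y, a, iptw(ps, a, "att")))
print("PD by ethnicity (ATE):", pd_by_group(y, a, w_ate, eth))
```

Stabilizing the weights (multiplying by the marginal exposure prevalence) rescales each exposure group's weights by a constant, so it leaves this point estimate unchanged. Confidence intervals need a robust (sandwich) or bootstrap variance; in Stata, for example, `regress` with `pweight`s reports robust standard errors.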
10 points: Please include the following table as part of your results.
Table 1. Prevalence differences for the association of lacking private health insurance with symptomatic detection, separately for NHW, NHB, and Hispanic patients:
Using a traditional multivariable model for the outcome;
Using the PS-adjusted or stratified method of your choice;
Using the PS-weighted method of your choice.
Results (35 points)
Please include the following text as part of your results section.
10 points: A brief description of the prevalence difference results of your traditional multivariable outcome model (Table 1);
10 points: A brief description of the prevalence difference results of the PS-adjusted or stratified method of your choice (Table 1);
10 points: A brief description of the results of the PS-weighted method of your choice (Table 1).
5 points: A brief description of the sensitivity analyses conducted to determine the optimal truncation of weights for obtaining minimally biased, maximally precise prevalence difference estimates.
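A truncation sensitivity analysis typically re-estimates the weighted PD after capping extreme weights at progressively tighter percentiles, then examines how the estimate and its precision shift. The Python sketch below is illustrative only: the percentile pairs are an assumed grid, the data are hypothetical, and Kish's effective sample size, (Σw)²/Σw², stands in as a rough precision proxy for the confidence-interval width you would actually report from the weighted model.

```python
import statistics


def truncate(w, lower_pct, upper_pct):
    """Cap weights at integer percentiles of their own distribution.

    lower_pct <= 0 or upper_pct >= 100 means no cap on that side.
    """
    cuts = statistics.quantiles(w, n=100, method="inclusive")  # 1st..99th pct
    lo = min(w) if lower_pct <= 0 else cuts[lower_pct - 1]
    hi = max(w) if upper_pct >= 100 else cuts[upper_pct - 1]
    return [min(max(wi, lo), hi) for wi in w]


def effective_n(w):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(w) ** 2 / sum(wi * wi for wi in w)


def weighted_pd(y, a, w):
    """Weighted prevalence among exposed minus among unexposed."""
    def prevalence(group):
        num = sum(wi * yi for yi, ai, wi in zip(y, a, w) if ai == group)
        den = sum(wi for ai, wi in zip(a, w) if ai == group)
        return num / den
    return prevalence(1) - prevalence(0)


# Hypothetical outcome, exposure, and untruncated weights.
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
w = [1.2, 1.5, 9.0, 1.1, 1.3, 6.5, 1.4, 1.2, 2.0, 1.1]

for lo, hi in [(0, 100), (1, 99), (5, 95), (10, 90)]:
    tw = truncate(w, lo, hi)
    print(f"{lo}-{hi} pct: PD = {weighted_pd(y, a, tw):.3f}, "
          f"max w = {max(tw):.2f}, ESS = {effective_n(tw):.1f}")
```

Tighter truncation usually gains precision at the cost of some bias, so the chosen level is typically the one where the PD stops moving appreciably while precision is still acceptable.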
15 points: Your Stata .do file
Please submit a clean copy of your Stata code for this assignment. Use the template .do file for Assignment 2 that is included. Either start with this template or copy and paste relevant code from your working .do file into the correct locations in this clean .do file. Do not alter the top of the program. Save it as lastname_firstname_hw2.do and submit it along with your write-up. Before you submit it, run it and make sure that it runs completely from start to finish using the class dataset bccc as the starting dataset. Remove or comment out any code that produces extraneous output. Please include plenty of white space and clear comments describing what you are doing and why.
The appendix (your .do file) is worth 15 points. For example, code that is (1) clean, (2) in the correct order, (3) very well annotated, and (4) runs from start to finish would receive 15 points; if any one of these is unsatisfactory, you would lose 5 points and receive 10 points; if two are unsatisfactory, you would receive 5 points, and so on.
Table 1. Association of lacking private health insurance with prevalence of symptomatic breast cancer.

                        N     PD     (95% CI)     P-value     Pδ⁴
Crude
   NHW
   NHB
   Hispanic
Final model¹
   NHW
   NHB
   Hispanic
PS-adjusted model²
   NHW
   NHB
   Hispanic
IPTW model³
   NHW
   NHB
   Hispanic

¹ Final model adjusted for . . .
² Propensity scores and inverse probability of treatment weights are based on a logistic regression of lacking private health insurance on . . .
³ Analyses weighted to the distribution of propensity in . . .
⁴ Assumes a null hypothesis interval of (-0.05, 0.05) and a support interval equal to the 95% CI.
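Footnote 4 appears to describe the second-generation p-value (pδ) of Blume and colleagues, which measures how much of the support interval (here the 95% CI) overlaps the null-hypothesis interval of (-0.05, 0.05). Assuming that definition, the Python sketch below computes pδ = (|I ∩ H0| / |I|) × max(|I| / (2|H0|), 1): a value of 0 means the CI excludes the null interval, 1 means it lies entirely inside it, and intervals more than twice as wide as the null interval cannot exceed 1/2, flagging inconclusive data. The function name and example CIs are hypothetical.

```python
def sgpv(ci_lo, ci_hi, null_lo=-0.05, null_hi=0.05):
    """Second-generation p-value for I = [ci_lo, ci_hi] and H0 = [null_lo, null_hi].

    p_delta = |I & H0| / |I| * max(|I| / (2 * |H0|), 1)
    """
    len_i = ci_hi - ci_lo
    len_h = null_hi - null_lo
    if len_i == 0:  # degenerate (point) interval
        return 1.0 if null_lo <= ci_lo <= null_hi else 0.0
    overlap = max(0.0, min(ci_hi, null_hi) - max(ci_lo, null_lo))
    return overlap / len_i * max(len_i / (2 * len_h), 1.0)


# Hypothetical 95% CIs for a prevalence difference.
for ci in [(0.10, 0.20), (-0.02, 0.03), (0.00, 0.10), (-0.20, 0.20)]:
    print(ci, round(sgpv(*ci), 3))
```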

