Question
Posted: 2014-08-31 11:42:06
Please answer the following FOUR questions. There is a word limit of 500 words per question. You may include selected SPSS output where appropriate; it does not count toward the word limit.
The file WeD_Peru_Nonfood.sav contains variables that can be used to predict the percentage of their expenditure that people spend on non-food items. The data come from 219 participants in seven Peruvian communities studied by the ESRC Research Group on Wellbeing in Developing Countries (WeD). The variables are:
COMMUNITY: categorical variable identifying the seven WeD-Peru communities.
EDYEARS: Years of formal education
NFOODEXP: participant’s monthly non-food expenditure
FOODEXP: participant’s monthly food expenditure
TOTALEXP: participant’s monthly total expenditure
AVGEXP: average expenditure of the community in which the participant lives
INDI: Intermediate needs deprivation index (0 = the household is not deprived in any of the 10 intermediate needs, 1 = the household is deprived in 1 of the 10 intermediate needs, …, 10 = the household is deprived in all 10 intermediate needs).
Because we have a priori knowledge of what determines the percentage of expenditure on non-food items, use regression analysis with the ‘Enter’ method to estimate the following model:
$$\text{PERNONFOOD}_i = b_0 + b_1\,\text{EDYEARS}_i + b_2\,\text{INDI}_i + b_3\,\text{AVGEXP}_i + b_4\,\text{URBAN}_i + \varepsilon_i$$
A. Prior to running the analysis:
1. Generate the dependent variable (PERNONFOOD) by computing the percentage of expenditure spent on non-food items (Transform → Compute variable).
2. Create the dichotomous variable URBAN (1=urban community, 0=non-urban) by recoding COMMUNITY.
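The two preparation steps can be mirrored outside SPSS. This minimal Python sketch assumes PERNONFOOD = 100 × NFOODEXP / TOTALEXP. The question does not say which community codes are urban, so `URBAN_CODES` below is a hypothetical placeholder, as are the two example records.

```python
# Sketch of the two data-preparation steps in part A (assumptions noted inline).

def per_nonfood(nfoodexp: float, totalexp: float) -> float:
    """Non-food spending as a percentage of total spending (assumed definition)."""
    if totalexp <= 0:
        raise ValueError("TOTALEXP must be positive to compute a percentage")
    return 100.0 * nfoodexp / totalexp


def urban_dummy(community: int, urban_codes: set) -> int:
    """Recode COMMUNITY into URBAN: 1 if the community is urban, else 0."""
    return 1 if community in urban_codes else 0


# Hypothetical: the question does not list which community codes are urban.
URBAN_CODES = {1, 2}

# Hypothetical records, for illustration only.
records = [
    {"COMMUNITY": 1, "NFOODEXP": 150.0, "TOTALEXP": 600.0},
    {"COMMUNITY": 5, "NFOODEXP": 40.0, "TOTALEXP": 200.0},
]
for r in records:
    r["PERNONFOOD"] = per_nonfood(r["NFOODEXP"], r["TOTALEXP"])
    r["URBAN"] = urban_dummy(r["COMMUNITY"], URBAN_CODES)
    print(r["COMMUNITY"], r["PERNONFOOD"], r["URBAN"])
```

In SPSS the same logic corresponds to a Compute step for PERNONFOOD and a Recode into Different Variables step for URBAN.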
B. Conduct the multiple regression analysis, then report and comment on your results, covering:
∙ The coefficients and their significance
∙ The extent to which our model predicts the percentage people spend on non-food items
∙ The meaning of the ANOVA table included in the SPSS output
∙ Whether there is multicollinearity in the data set
∙ Whether the residuals are homoscedastic and whether they are normally distributed
∙ The risk of biased coefficients
∙ Can this model generalise beyond our sample?
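The sketch below shows roughly what lies behind the Coefficients, ANOVA and collinearity output. It is a minimal pure-Python illustration, not the SPSS algorithm, and covers three things:

- ordinary least squares via the normal equations;
- the ANOVA statistic F = (SSR/k) / (SSE/(n − k − 1));
- variance inflation factors VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors.

The data are made up. With the real file, X would hold EDYEARS, INDI, AVGEXP and URBAN, and y would be PERNONFOOD.

```python
# Minimal pure-Python OLS with R^2, ANOVA F and VIFs (illustrative sketch).

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular matrix (perfect multicollinearity?)")
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares; returns (b, SSR, SSE, SST)."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    b = solve(ztz, zty)
    fitted = [sum(bi * zi for bi, zi in zip(b, z)) for z in Z]
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, sst - sse, sse, sst


def anova_f(ssr, sse, n, k):
    """F = (SSR/k) / (SSE/(n-k-1)) for k predictors and n cases."""
    if sse == 0:
        return float("inf")
    return (ssr / k) / (sse / (n - k - 1))


def vifs(X):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on the others."""
    out = []
    for j in range(len(X[0])):
        xj = [row[j] for row in X]
        others = [[v for i, v in enumerate(row) if i != j] for row in X]
        _, ssr, _, sst = ols(others, xj)
        r2 = ssr / sst
        out.append(float("inf") if r2 >= 1 else 1 / (1 - r2))
    return out


# Hypothetical toy data with two predictors (not the WeD-Peru file).
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9.5, 7.5, 19.2, 18.1, 28.7]
b, ssr, sse, sst = ols(X, y)
print("coefficients:", b)
print("R^2:", ssr / sst)
print("F:", anova_f(ssr, sse, n=len(y), k=len(X[0])))
print("VIFs:", vifs(X))
```

A common rule of thumb treats VIF values well above 10 as a sign of serious multicollinearity. Residual diagnostics (homoscedasticity, normality) would start from `y` minus the fitted values.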
The file WeD_Thai_Health.sav contains the following variables concerning 588 household heads from the seven WeD Thai communities:
AGE: age in years
SEX: (1=male, 0=female)
COMMUNITY: categorical variable identifying the seven WeD-Thai communities. It is included in the regression because it captures the structural characteristics of the sites (for example, availability of health care services, local health risks or proximity to hospitals).
CHRONIC ILLNESS: The household has a member suffering from a chronic illness (1=yes, 0=no)
HEALTHCARE: The household has an ill person who has not been treated or has no access to vaccination (1=yes, 0=no)
WEALTH: number of assets owned by the household
Additionally, the file contains information on people’s satisfaction with the health care (SATHEALTH) available to the household. This was captured through the following question:
‘Concerning the health care your family gets which of the following is true? The health care your family get is:
A. Using a stepwise method of your choice, conduct an exploratory analysis to explain what determines people’s satisfaction with health care in Thailand. But first, remember:
1. To transform SATHEALTH into a dichotomous variable (1 = more than adequate or just adequate, 0 = not adequate)
2. To generate dummy variables representing the communities (take site 41 as the baseline group)
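Both recodes are shown in the minimal Python sketch below, which rests on a few assumptions:

- SATHEALTH is stored as the text labels quoted in the question. In the .sav file it is more likely a numeric code with value labels, so the mapping would need adjusting.
- The site codes other than 41 are hypothetical.
- The `SITE_` column prefix is a hypothetical naming choice.

Only k − 1 dummies are created for k sites, so site 41 becomes the reference category against which the other sites' coefficients are compared.

```python
# Sketch of the SATHEALTH recode and baseline-coded community dummies.

# Assumed text labels; the real file may use numeric codes instead.
SATISFIED = {"more than adequate", "just adequate"}


def dichotomise_sathealth(label: str) -> int:
    """1 = more than adequate or just adequate, 0 = not adequate."""
    label = label.strip().lower()
    if label in SATISFIED:
        return 1
    if label == "not adequate":
        return 0
    raise ValueError(f"unexpected SATHEALTH response: {label!r}")


def community_dummies(sites, baseline=41):
    """One 0/1 column per non-baseline site; the baseline site gets no column."""
    levels = sorted(set(sites) - {baseline})
    return {f"SITE_{s}": [1 if x == s else 0 for x in sites] for s in levels}


# Hypothetical site codes for five households.
sites = [41, 42, 43, 41, 42]
for name, column in community_dummies(sites).items():
    print(name, column)
print(dichotomise_sathealth("Just adequate"), dichotomise_sathealth("not adequate"))
```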
B. Run the logistic regression, justify your choice of stepwise method, report and interpret the results. You should consider:
∙ Goodness-of-fit
∙ Hosmer and Lemeshow test
∙ Predictors left in the final equation
∙ The regression coefficients, their significance and the meaning of exp(β)
∙ Outliers and influential cases
∙ Whether the model correctly classifies most cases or not
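The minimal Python sketch below makes exp(β) and the classification table concrete. Here exp(β) is the factor by which the odds of being satisfied are multiplied for a one-unit increase in a predictor, holding the others constant. For a community dummy, the comparison is with the baseline site. The coefficients and cases in the usage lines are hypothetical, not estimates from WeD_Thai_Health.sav. The 0.5 cut-off is, we believe, the default classification cut value in SPSS logistic regression.

```python
import math

# Sketch: odds ratios, predicted probabilities and a classification table.


def odds_ratio(b: float) -> float:
    """exp(b): multiplicative change in the odds of Y=1 per one-unit rise in x."""
    return math.exp(b)


def predicted_prob(coefs, x):
    """Logistic model: P(Y=1) = 1 / (1 + exp(-(b0 + b1*x1 + ...)))."""
    z = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def classification_table(observed, probs, cut=0.5):
    """Observed vs predicted counts (predict 1 when prob >= cut) and % correct."""
    table = {(o, p): 0 for o in (0, 1) for p in (0, 1)}
    for obs, prob in zip(observed, probs):
        table[(obs, 1 if prob >= cut else 0)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100.0 * correct / len(observed)


# Hypothetical model: intercept and a WEALTH coefficient (not real estimates).
coefs = [-0.4, 0.7]
print("exp(b) for WEALTH:", odds_ratio(coefs[1]))
probs = [predicted_prob(coefs, [w]) for w in (0, 1, 2, 3)]
table, pct = classification_table([0, 1, 1, 1], probs)
print(table, f"{pct:.1f}% correctly classified")
```

A high overall percentage correct should be compared with the percentage obtained by simply predicting the majority category for everyone.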
3. Patient Satisfaction and the ‘Annual Health Check’ (25% of the marks)
The file MRes_PatSat.sav contains data on 168 English acute hospitals (identified by codes such as REM, RCF, RTK …). The data were obtained from a large sample survey conducted at the individual patient level but analysed at the hospital level. The eight variables (Admission, Ward, Doctors … Discharge) can each be thought of as a rating, in the theoretical range 0–100, of each hospital on one aspect of an inpatient stay. It is proposed to combine these eight variables into a (summative) scale of overall Patient Satisfaction with the hospitals.
A. What do you think of this proposal, particularly in the sense of the reliability of the resulting scale?
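Cronbach's alpha is the usual index of a summative scale's internal-consistency reliability (in SPSS: Analyze → Scale → Reliability Analysis). The minimal Python sketch below uses sample variances in the formula α = k/(k − 1) · (1 − Σ var(item) / var(total)). The formula is defined on the summed total. Because a mean is just that sum divided by k, a mean-based scale such as PS_ovr has the same reliability. The ratings are made up.

```python
from statistics import variance

# Sketch: Cronbach's alpha for a summative scale of k items.


def cronbach_alpha(items):
    """items: list of k columns (one per variable), each a list of n ratings."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(vals) for vals in zip(*items)]  # summed scale per hospital
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical ratings for three aspects across four hospitals.
admission = [72.0, 80.0, 65.0, 90.0]
ward = [70.0, 78.0, 68.0, 88.0]
doctors = [75.0, 82.0, 70.0, 85.0]
print(round(cronbach_alpha([admission, ward, doctors]), 3))
```

Values of roughly 0.7 or above are often quoted as acceptable. A very high alpha can also mean the items are largely redundant.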
B. The values of the resulting scale are given in PS_ovr, computed via Transform → Compute as PS_ovr = mean(Admission, Ward, …, Discharge). Variable Xcllnt gives the proportion (%) of each hospital's patients who answered ‘Excellent’ to the question: “Overall, how would you rate the care you have received?” (the other possible responses were ‘Very good’, ‘Good’, ‘Fair’ and ‘Poor’). How does knowledge of this latter measure affect any tentative conclusions you may have reached in A? Which, if either, of PS_ovr and Xcllnt do you think would be preferable as a measure of overall Patient Satisfaction?
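The minimal sketch below shows how PS_ovr and its association with Xcllnt could be computed. `scale_mean` is intended to mirror SPSS's MEAN(), which we understand averages only the non-missing arguments. Missing values are represented here as None. The Pearson correlation is one simple way to compare the two candidate measures. All numbers are made up.

```python
import math

# Sketch: mean-based scale score and its Pearson correlation with Xcllnt.


def scale_mean(*ratings):
    """Average of the non-missing ratings (None = missing); None if all missing."""
    valid = [r for r in ratings if r is not None]
    return sum(valid) / len(valid) if valid else None


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical hospitals: aspect ratings (one missing) and % 'Excellent'.
ps_ovr = [scale_mean(80, 75, None), scale_mean(70, 72, 68), scale_mean(85, 88, 90)]
xcllnt = [40.0, 32.0, 51.0]
print(ps_ovr, round(pearson_r(ps_ovr, xcllnt), 3))
```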
C. A regulatory body (the Healthcare Commission) carries out an annual assessment of the Service Quality of each hospital (the so-called ‘Annual Health Check’). Performance is assessed against a large number of indicators (standards and targets) and summarised as: ‘Poor’, ‘Fair’, ‘Good’ or ‘Excellent’. These ratings for the hospitals are given in variable SQ. Investigate the relationship between Patient Satisfaction (as seen by patients) and Service Quality (as seen by the Healthcare Commission).
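SQ is an ordinal rating with four levels, so one possible starting point (not the only one) combines two steps. First, compare mean PS_ovr across the SQ categories. Second, compute Spearman's rank correlation, which handles the many tied SQ values with average ranks. The 1–4 numeric coding of SQ below is an assumption, and the data are hypothetical.

```python
import math
from collections import defaultdict

# Assumed ordinal coding of the Annual Health Check ratings.
SQ_ORDER = {"Poor": 1, "Fair": 2, "Good": 3, "Excellent": 4}


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def mean_by_group(scores, groups):
    """Mean score within each group label."""
    acc = defaultdict(list)
    for s, g in zip(scores, groups):
        acc[g].append(s)
    return {g: sum(v) / len(v) for g, v in acc.items()}


# Hypothetical hospitals.
sq = ["Fair", "Good", "Good", "Excellent", "Poor", "Fair"]
ps = [64.0, 70.0, 74.0, 81.0, 58.0, 66.0]
print(mean_by_group(ps, sq))
print(round(spearman_rho(ps, [SQ_ORDER[s] for s in sq]), 3))
```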
4. Getting ahead (25% of the marks)
A sample of 1819 people responded to a survey conducted by a well-known (American) business magazine. The survey asked them to rate the importance of the variables listed below to ‘getting ahead in life’. The scale was: 1 = essential, 2 = very important, 3 = fairly important, 4 = not very important, 5 = not important at all.
The data are in the file MRes_gettingahead.sav (the other two variables, id and sex, record the identity and gender of each respondent). Use factor analysis to explore and interpret the structure of these data. Does there seem to be a difference between the responses of males and females? (NB: as well as the values 1…5 mentioned above, the values 0, 8 and 9 are specified as Missing Values: 0 = “NAP” (Not Applicable), 8 = “CANT CHOOSE” (i.e. Don’t Know), 9 = “NA” (Not Available). Some rows have 0 in every field apart from id and sex. Why bother entering “non-data”? Because these data are a subset of a larger dataset, the rows that appear empty here aren’t really empty. Just ignore the missing data in your analysis.)
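Factor analysis works from the correlation matrix of the items. The minimal Python sketch below covers only the preparatory steps, not a full factor extraction:

- It drops any respondent with a missing code (0, 8 or 9) on any item. This mirrors listwise exclusion, which we believe is the default in SPSS's Factor procedure.
- It builds the correlation matrix of the remaining responses.
- It estimates the largest eigenvalue by power iteration. Under the Kaiser criterion, factors with eigenvalues above 1 are candidates to retain.

The rows are hypothetical item responses with the id and sex columns already removed.

```python
import math

MISSING = {0, 8, 9}  # user-missing codes listed in the question


def listwise(rows):
    """Keep only respondents with a non-missing answer on every item."""
    return [r for r in rows if not any(v in MISSING for v in r)]


def corr_matrix(rows):
    """Pearson correlation matrix of the item columns (assumes no constant item)."""
    cols = list(zip(*rows))
    n = len(rows)
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(cols, means)]
    p = len(cols)
    return [[sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))
             / (sds[i] * sds[j]) for j in range(p)] for i in range(p)]


def largest_eigenvalue(R, iters=200):
    """Power iteration on a symmetric positive semi-definite matrix."""
    p = len(R)
    v = [1.0 / math.sqrt(p)] * p  # unit-length starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(rij * vj for rij, vj in zip(row, v)) for row in R]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam


# Hypothetical responses to three items; the last two rows contain missing codes.
rows = [[1, 2, 2], [2, 2, 3], [3, 4, 4], [4, 5, 4], [0, 0, 0], [2, 8, 1]]
kept = listwise(rows)
R = corr_matrix(kept)
print(len(kept), "complete cases; largest eigenvalue:", round(largest_eigenvalue(R), 3))
```

Running the same steps separately on the male and female subsets is one rough way to see whether the structure differs between the two groups.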