Question

Published: 2014-08-31 11:42:06

Please answer the following FOUR questions. There is a word limit of 500 words per question. You can include selected SPSS output where appropriate - this is not included in the word limit.

1. Predicting percentage spent on non-food items in Peru (25% of the marks)

The file WeD_Peru_Nonfood.sav contains variables that can be used to predict the percentage people spend on non-food items. The data are from 219 participants in seven Peruvian communities that were investigated by the ESRC Wellbeing in Developing Countries (WeD) research group. The variables are the following:

COMMUNITY: categorical variable identifying the seven WeD-Peru communities.

EDYEARS: Years of formal education

NFOODEXP: participant’s monthly non-food expenditure

FOODEXP: participant’s monthly food expenditure

TOTALEXP: participant’s monthly total expenditure

AVGEXP: average expenditure of the community where the participant lives

INDI: Intermediate needs deprivation index (0 = the household is not deprived in terms of intermediate needs, 1 = the household is deprived in 1 out of 10 intermediate needs, ..., 10 = the household is deprived in 10 out of 10 intermediate needs).

As we have a priori knowledge of what determines the percentage of expenditure on non-food items, estimate via regression analysis the following model using the ‘Enter’ method:

PERNONFOOD = b0 + b1 EDYEARS + b2 INDI + b3 AVGEXP + b4 URBAN + εi

A. Prior to running the analysis:

1. Generate the dependent variable (PERNONFOOD) by computing the percentage of expenditure spent on non-food items (Transform > Compute Variable).

2. Create the dichotomous variable URBAN (1=urban community, 0=non-urban) by recoding COMMUNITY.
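The logic of the two preparation steps can be sketched outside SPSS as follows. This is only an illustration in Python, not SPSS syntax. It assumes that the percentage is taken relative to TOTALEXP, and the set of urban community codes is hypothetical, since the question does not state which communities are urban. The example rows are made up.

```python
# Hypothetical: the question does not say which COMMUNITY codes are urban.
URBAN_COMMUNITIES = {6, 7}


def per_nonfood(nfoodexp, totalexp):
    """Percentage of total expenditure spent on non-food items."""
    if nfoodexp is None or totalexp is None or totalexp <= 0:
        return None  # no meaningful share without a positive total
    return 100.0 * nfoodexp / totalexp


def urban(community):
    """1 if the community code is in the assumed urban set, otherwise 0."""
    return 1 if community in URBAN_COMMUNITIES else 0


# Made-up example rows, not values from WeD_Peru_Nonfood.sav.
rows = [
    {"COMMUNITY": 1, "NFOODEXP": 50.0, "TOTALEXP": 200.0},
    {"COMMUNITY": 7, "NFOODEXP": 150.0, "TOTALEXP": 300.0},
]
for row in rows:
    row["PERNONFOOD"] = per_nonfood(row["NFOODEXP"], row["TOTALEXP"])
    row["URBAN"] = urban(row["COMMUNITY"])
```

Returning None for a non-positive total keeps such cases out of the regression rather than producing a division error.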

B. Conduct the multiple regression analysis, report and comment on your results including the following:

The coefficients and their significance

The extent to which our model predicts the percentage people spend on non-food items

Meaning of the ANOVA included in the SPSS output

The existence of multicollinearity in the data set

The presence of homoscedasticity and whether the residuals are normal or not

Assess the risk of having biased coefficients

Can this model generalise beyond our sample?

2. Predicting satisfaction with health in Thailand (25% of the marks)

The file WeD_Thai_Health.sav contains the following variables concerning 588 household heads from the seven WeD Thai communities:

AGE: age in years

SEX: (1=male, 0=female)

COMMUNITY: categorical variable identifying the seven WeD-Thai communities. It will be included in the regression as it represents the structural characteristics of the sites (availability of health care services, local health risks or proximity to hospitals for example).

CHRONIC ILLNESS: The household has a member suffering from a chronic illness (1=yes, 0=no)

HEALTHCARE: The household has an ill person who has not been treated or has no access to vaccination (1=yes, 0=no)

WEALTH: number of assets owned by the household

Additionally, the file contains information on people’s satisfaction with the health care (SATHEALTH) available to the household. This was captured through the following question:


‘Concerning the health care your family gets, which of the following is true? The health care your family gets is:’

A. Conduct an exploratory analysis to explain what determines people’s satisfaction with healthcare in Thailand (using a stepwise method of your choice). But first remember:

1. To transform SATHEALTH into a dichotomous variable (1=more than adequate + just adequate, 0= not adequate)

2. To generate dummy variables representing each community (take site 41 as baseline group)
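The two recoding steps above can be illustrated with a short Python sketch. The answer labels are an assumption based on the categories named in step 1, and every site code other than 41 is hypothetical, because the question does not list the codes stored in the file.

```python
BASELINE_SITE = 41  # baseline group named in the question

# Assumed answer labels; the actual coding in WeD_Thai_Health.sav may differ.
SATISFIED = {"more than adequate", "just adequate"}
NOT_SATISFIED = {"not adequate"}


def dichotomise(answer):
    """Map SATHEALTH to 1 (adequate or better), 0 (not adequate), or None."""
    if answer in SATISFIED:
        return 1
    if answer in NOT_SATISFIED:
        return 0
    return None  # missing or unexpected answer


def community_dummies(site, sites, baseline=BASELINE_SITE):
    """One 0/1 indicator per site except the baseline."""
    return {f"SITE_{s}": int(site == s) for s in sorted(sites) if s != baseline}


# Hypothetical site codes apart from 41.
sites = {41, 42, 43}
example = community_dummies(43, sites)
```

A household in the baseline site gets 0 on every dummy, so each dummy coefficient is read as a contrast with site 41.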

B. Run the logistic regression, justify your choice of stepwise method, report and interpret the results. You should consider:

Goodness-of-fit

Hosmer and Lemeshow test

Predictors left in the final equation

The regression coefficients, their significance and the meaning of exp(β)

Outliers and influential cases

Whether the model correctly classifies most cases or not
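Two of the points above, the meaning of exp(β) and the classification of cases, can be sketched in Python. The observed outcomes and predicted probabilities below are made up for illustration, and the cutoff of 0.5 is an assumption.

```python
import math


def odds_ratio(beta):
    """exp(beta): multiplicative change in the odds for a one-unit increase."""
    return math.exp(beta)


def classification_table(y_true, p_hat, cutoff=0.5):
    """Counts keyed by (observed, predicted) using the given cutoff."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        table[(y, predicted)] += 1
    return table


def percent_correct(table):
    """Overall percentage of cases on the diagonal of the table."""
    total = sum(table.values())
    return 100.0 * (table[(0, 0)] + table[(1, 1)]) / total


# Made-up outcomes and fitted probabilities.
y_true = [1, 1, 0, 0, 1]
p_hat = [0.9, 0.4, 0.2, 0.6, 0.7]
table = classification_table(y_true, p_hat)
overall = percent_correct(table)
```

An odds ratio above 1 means the odds of being satisfied rise with the predictor, and a value below 1 means they fall. The overall percentage correct should be compared with the percentage obtained by always predicting the more frequent category.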

3. Patient Satisfaction and the ‘Annual Health Check’ (25% of the marks)

The file MRes_PatSat.sav contains data on 168 English acute hospitals (identified by the codes: REM, RCF, RTK …). The data were obtained from a large sample survey conducted at the individual patient level but analysed at the hospital level. The eight variables: Admission, Ward, Doctors … Discharge can be thought of as a rating (in the theoretical range 0 – 100) for each hospital for each of the various aspects of an inpatient stay. It is proposed to combine these eight variables into a (summative) scale of overall Patient Satisfaction with the hospitals.

A. What do you think of this proposal, particularly in the sense of the reliability of the resulting scale?
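One common way to quantify the reliability of a summative scale is Cronbach's alpha. The following Python sketch shows the calculation; the choice of alpha as the reliability measure is an assumption, since the question does not name a statistic, and the scores are made up.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists, each holding one item's scores for the same cases.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale total per case
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Made-up ratings: three items that rank four hospitals identically.
consistent = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# Made-up ratings: two items that are uncorrelated.
unrelated = [[1, 2, 1, 2], [1, 1, 2, 2]]

alpha_high = cronbach_alpha(consistent)
alpha_low = cronbach_alpha(unrelated)
```

Alpha approaches 1 when the items move together and falls towards 0 when they are unrelated, so it indicates whether averaging the eight ratings yields a coherent score.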

B. The values of the resulting scale are given in PS_ovr, computed via Transform > Compute Variable as PS_ovr = mean(Admission, Ward, …, Discharge). Variable Xcllnt gives the proportion (%) of patients of each hospital who have given the response ‘Excellent’ to the question: “Overall, how would you rate the care you have received?” (other possible responses were: ‘Very good’, ‘Good’, ‘Fair’ and ‘Poor’). How does knowledge of this latter measure affect any tentative conclusions you may have come to in A.? Which (PS_ovr or Xcllnt), if either, do you think would be preferable as a measure of overall Patient Satisfaction?

C. A regulatory body (the Healthcare Commission) carries out an annual assessment of the Service Quality of each hospital (the so-called ‘Annual Health Check’). Performance is assessed against a large number of indicators (standards and targets) and summarised as: ‘Poor’, ‘Fair’, ‘Good’ or ‘Excellent’. These ratings for the hospitals are given in variable SQ. Investigate the relationship between Patient Satisfaction (as seen by patients) and Service Quality (as seen by the Healthcare Commission).

4. Getting ahead (25% of the marks)

A sample of 1819 people responded to a survey conducted by a well-known (American) business magazine. The survey asked them to rate the importance of the variables listed below to ‘getting ahead in life’. The scale was: 1 = essential, 2 = very important, 3 = fairly important, 4 = not very important, 5 = not important at all.

The data is in the file MRes_gettingahead.sav (the other two variables, id and sex, record the identity and gender of the respondent respectively). Use factor analysis to explore and interpret the structure of this data. Does there seem to be a difference between the responses of males and females? (NB: You will note that, as well as the values 1…5 mentioned above, the values 0, 8 and 9 are specified as Missing Values: 0 = “NAP” (Not Applicable), 8 = “CANT CHOOSE” (i.e. Don’t Know), 9 = “NA” (Not Available). Some rows have 0 in every field apart from id and sex! Why bother entering “non-data”? Well, this data is a subset of a larger dataset, so the rows which appear empty here aren’t really empty. Just ignore the missing data for your analysis.)
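The treatment of the codes 0, 8 and 9 as missing can be sketched in Python as below. The item names are hypothetical, because the variable list is not reproduced here, and dropping incomplete rows (listwise deletion) is only one possible way of ignoring missing data.

```python
MISSING_CODES = {0, 8, 9}  # NAP, CANT CHOOSE, NA


def clean(value):
    """Replace a missing-value code with None; keep valid ratings 1 to 5."""
    return None if value in MISSING_CODES else value


def complete_cases(rows, items):
    """Recode missing codes, then keep rows with a valid value on every item."""
    cleaned = [{**row, **{v: clean(row[v]) for v in items}} for row in rows]
    return [row for row in cleaned if all(row[v] is not None for v in items)]


# Hypothetical item names and made-up responses.
items = ["item1", "item2"]
rows = [
    {"id": 1, "sex": 1, "item1": 2, "item2": 5},
    {"id": 2, "sex": 2, "item1": 0, "item2": 0},  # appears empty in this subset
    {"id": 3, "sex": 2, "item1": 3, "item2": 8},  # one "can't choose" answer
]
usable = complete_cases(rows, items)
```

Only the first row survives here, which shows how quickly listwise deletion can shrink the sample when many items are involved.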
