Measuring Quality of Life in Nursing Homes: The Search Continues

M. Powell Lawton
Philadelphia Geriatric Center

Invited address, Divisions 34, 5, and 20, annual meeting of the American Psychological Association, Boston, August 23, 1999.
C:\WPWIN60\XLAWTON\DAYCARE'\MeasuringQOLinNursingHomes.doc (8/19/99)

In order to talk meaningfully about defining and measuring QOL in nursing homes, I have to begin with an extended appreciation of the work of Rudy Moos and his colleagues. Moos was one of the most creative psychologists of our time. So perceptive and omnivorous was his mind that he was able to create a basic model of the person-environment system that fit settings as diverse as mental hospitals, community programs for substance abusers, and residential environments for elders. Although most environmental psychologists are no doubt familiar with his work, I shall describe it in some detail because it was such an important influence on what I shall present today. I'll then explain why his Multiphasic Environmental Assessment Procedure (MEAP), despite its many strengths, did not meet the need for a nursing home evaluation procedure to use in the new millennium. In the process of describing the procedure on which my colleagues and I are still at work, I shall emphasize the Moos theme that person and environment constitute a system whose operation is composed of multiple facets of the person and the physical, social, and organizational environment.

[p. 87-88 of PL et al.] [need examples of RESIF, SCES, & Rating Sc]

Every comprehensive environmental assessment package must necessarily build on the MEAP. There are a number of reasons, however, why it is not suitable for all purposes, including that of serving as a measure of QOL in nursing homes. The first reason is that the MEAP is not completely evaluative in its approach; pure description without the implication of evaluation composes some of the content.
The RESIF (show slide again) is almost completely descriptive, as is much of the PAF. The mixture of evaluative and descriptive modes occurs because the MEAP was designed to describe person-environment processes that are more general than simply evaluative. By contrast, the task taken on by me and my colleagues is to design an evaluative system that may be used to regulate quality on a national scale. I suggest that the stake for all actors in the maintenance of quality in nursing homes is great enough to demand a new look in assessment.

The specific research now in process is being done under a contract from the Health Care Financing Administration (HCFA) to Rosalie Kane, Robert Kane, and myself to design a QOL assessment procedure that can be used by state Medicare and Medicaid surveyors in their regular rounds of certifying nursing homes. This procedure is required every 2 years and more often if serious deficiencies have been identified. All deficiencies ("citations") must be corrected by the next inspection and every home's last performance is a matter of public record. Surveyors are usually nurses trained for this task; some work full time and others only occasionally. Although the survey process often locates serious life- and health-threatening infractions, few homes are closed, presumably because of the shortage of subsidized beds and consequent hardship to residents who must be relocated. Probationary status and many of the citable deficiencies, however, put owners in substantial financial jeopardy, both through bad public relations and the direct costs of correcting the deficiencies.

Another consideration makes the survey process threatening to high-quality facilities: Although deficiencies result in loss of licensure, no "points" are given for attributes that exceed the minimum. Some such innovative features may actually improve quality but not conform with the letter of the regulations.
There is thus often a disincentive for innovation; the argument is often advanced that the direct penalty system should be replaced by one that accommodates some degree of tradeoff whereby above-average quality in some attributes may compensate for minor failures in other attributes. In the present certification process, failures are potentially damaging and a major burden is thus placed on the surveyor to recognize and document clear infractions of the regulations. The result is that criteria emphasize concrete aspects of the physical environment such as dirt or safety hazards, and aspects of the care environment such as documentation in medical records or presence of decubiti. (Slide) Here is an excerpt from the list of federal regulations to which institutions must conform. You can see that most refer to matters of administrative record. In HCFA's defense, however, QOL considerations are very prominent in the surveyor's guide to resident interviews (slide). The problem is that this interview is not standardized and is required to be done on too few residents to make any tabulation useful. Real congratulations are due HCFA for their realization of the importance of the "softer"--i.e., psychological and social--aspects of QOL and their willingness to take a major step in developing criteria usable in the regular survey process. The goal of their contract is to enhance the entire survey process with whatever can reflect quality beyond sanitation, safety, and documentation.

The relevance of this project to environmental psychology is obvious. It affords one of the first opportunities to apply our environmental assessment technologies to a large sample of nursing homes, with a chance of strongly influencing the future development of American nursing homes.
This opportunity also demands a macro perspective on person-environment relationships, one that does its best to recognize the indivisibility of person, social environment, and physical environment. Rudy Moos' example made this task much easier. I'll begin with the enumeration of a set of basic problems facing the research team and then move on to describe the early portion of this case study of measurement. The study is still being designed and the only data available are from a small pilot sample of residents and facility-level data from two nursing homes.

Problems (slide)

1. Documenting indicators of quality. If assessing quality were to be limited to what is measurable in c.g.s. and therefore easily documented, our measures of quality would be impoverished. We thus must move from such objectivity to an expanded concept of quality defined by consensus.

2. Incorrect citations have a high cost. The survey process must protect the consumer, the professional, and the sponsor while it monitors quality. Surveyors themselves may need to be recruited selectively and must be trained in new concepts such as quality defined by consensus if QOL is to be represented in a broader fashion.

3. The absence of an obvious ultimate validity criterion. "Performance criteria" for medical care facilities traditionally include death rate, morbidity rate, and the use of appropriate types of treatment. Most other criteria are "process criteria" in Donabedian's terms. Can expert judgments of quality be devised which will be more satisfactory than performance criteria simply imported from acute-care settings?

4. By whose perspective is quality to be judged? Recent thinking in both professional and legislative circles has emphasized the consumer--the nursing home resident--as the ultimate judge of quality.
In this view QOL is seen as an idiosyncratic judgment that can be made only by an individual judging by his or her subjective standards. Another possible view is that QOL may also be an attribute of an environment or context, where the criterion is expressible variously in c.g.s. terms, expert judgments, or aggregated consensual judgments, under a probabilistic hypothesis that higher contextual quality will result in greater prevalence of high individual QOL. QOL thus may be estimated for the individual or for the aggregate.

5. Representing both positive and negative quality of life. Most of the survey process is built around identifying deficits, risks, and noxious aspects of care. The equal and opposite positive valence is the essence of QOL. Improvements above the adequate or ordinary level are the neglected aspect of nursing home life.

6. Does quality vary by user group? As with any large group, residents in nursing homes are not all alike. Do differences characterizing separate subgroups correspond with different indicators of QOL for each group? The major recent development of special care units for people with dementia is an example of a possible result of the conviction that QOL may differ between cognitively intact and impaired residents. Although dementing illness defines the major such subgroup, one may ask similar questions about short-stay residents, younger brain-damaged people, or people with severe mental illness.

THE CURRENT HCFA PROJECT

These 6 problems have been ever-present during the year in which we have been engaged in this work. Let me begin with an overview of our strategy, which will be followed by as much measurement detail as I can fit into our time. Virtually everything may change, however, because HCFA staff have not yet signed off on our preliminary drafts.
We hope to begin gathering data on 40 facilities in October and test instrument revisions on another 40 facilities a year later. Thus my purpose today is to walk through some of our thought processes rather than to present a final method.

The structure of the research (slide).

Evaluative quality. There is much in the environment and in human behavior that is stylistic, preferentially determined, or otherwise neutral from a value-judgmental perspective. Everything subsumed under QOL is defined as evaluative, with clearly positive and negative poles.

Need. Our research began with the decision to organize the entire process around the concept of the major needs that the nursing home might serve. In the environmental gerontology arena, we had many predecessors, with fair consensus among investigators (slide). We also went back to Murray, Edwards, and others to enlarge the range of needs beyond those especially relevant to the physical environment, as in Figure 1 (slide). "Needs" constitute one of those open-ended lists. "Dignity" and "spiritual well-being" are 2 of those requested by HCFA that might not have appeared on a list that we constructed, but the others seem congruent with psychology's usual short lists of needs.

Person and environment. The need list is phrased in terms of complementary environmental press, that is, what the environment provides the client that may be relevant to each need. We thus seek parallel elements in person and environment. (Structure slide again).

Multiple perspectives. Every method for assessing quality will be seen to have shortcomings. Therefore, to the extent possible, multiple perspectives will be used to assess each quality indicator.

Subjective judgment and observation. The subjective-objective duality is operationalized in terms of attributes that individuals must judge (subjective) and those that may be observed.
The physical and social environment are conceived and measured separately for heuristic purposes.

Use of these assumptions and strategies resulted in SEVEN SOURCES OF QUALITY OF LIFE ASSESSMENT DATA (slide), which are divided into the aggregate (facility or unit) and the individual (resident) levels. I shall move quickly through each of these approaches, noting their special character and how they deal with or fail to deal with the problematic features noted.

QUALITY IN THE AGGREGATE

A. Physical Environment. The best known recent environmental measure is Philip Sloane's Therapeutic Environment Screening Scale (TESS). The full TESS contains much nonevaluative content, and it was not based on any theoretical model of person-environment transaction. The approach we are taking began with each personal need and developed an exhaustive list of environmental features serving each need, as gleaned from the literature of several decades of published clinical experience and research. That theoretically based list of environmental features nested under needs was then organized by physical space types for use in observation. The needs to which each indicator is suggested to relate are shown in the code at the top of Figure 2 (slide). All that is included is observable, but some items nonetheless require observer judgments ("unsightly trash," "attractive"), thus requiring explicit criteria to be used in training researchers and in establishing reliability. Six such pages plus 4 for each unit are estimated to require about 1.5 hours for a single systematic walk-through.

B. Psychosocial Observation. (Moos' term is "social climate"). For the most part, the relevant indicators are individual or collective staff behaviors or resident behavior reflecting particular aspects of the milieu. This section is by far the most problematic and at the same time the most central to the construct.
Although all indicators are observable (examples for Autonomy shown in Fig. 3), many require subjective judgments of their meaning. Every list of indicators for the 11 need domains is open-ended in giving only examples of members of a larger class, which are meant to be augmented by new indicators that the researcher judges to fall into a need class. A quota of time- and place-specific observations is specified, with incidences of indicators like these being tallied. The researcher will use this information plus all other data from the other instruments to make the bottom-line subjective 10-point ratings (Fig. 4) at the end of her stay. Because these observations are both highly subjective and highly salient, all actors in the system--HCFA, the surveyors, and the owners--are very concerned about the possibility that devastating errors might be made and the softness of the documentation that might arise from this approach. Yet most experts individually feel that they are capable of perceiving such "social climate" as they visit facilities. Once again, the confidence of all needs to be fortified by a rigorous psychometric treatment of such material. These 10-point ratings will constitute one of 3 major facets of overall quality that result from a facility survey.

C. Psychosocial Environment (informant and archival data). Another aspect of the psychosocial environment is not easily revealed by direct observation but affords indicators that are objective in the sense of being stated on paper, in training materials, or as administrative policies and practices (Fig. 5). Although some, such as payroll hours or printed modules used in staff training programs, are truly objective, a great many are of the type where actual practice differs from what is espoused in policies and published statements of practice, an obvious source of error.
A particularly bothersome issue is the social-desirability aspect of reported policies and practices, which in turn leads to what has been called "gaming," deliberate manicuring of information known to be relevant to inspection standards or reimbursement levels. The examples shown in Fig. 5, like those from section B, illustrate for the researcher what needs to be asked about, with open-ended notes being taken for ultimate use in making the 10-point needs-meeting ratings shown in the earlier slide.

QUALITY FOR THE INDIVIDUAL

D. Staff ratings of each resident are accounted for partially by the regular, mandated Minimum Data Set (MDS), which, among other features, provides measures of cognitive impairment, functional health, and behavioral disturbances. Although nurses, social workers, and activity staff are sought for some other ratings (Fig. 6), the certified nursing assistant (CNA) is the primary source of such information. All staff time is in very short supply and an evaluative procedure that demands large amounts of their time would fail. This section thus consists of a very short part (about 5 minutes), which does not systematically evaluate each need; it is to be completed for every subject. A longer part (about 20 minutes) will be used only for residents who are too cognitively impaired to respond to the resident interview. Utilizing individual resident attributes such as depression, activity involvement, or positive affect as indicators of QOL assumes that the care setting contributes to such outcomes over and above the effects of such stable influences as lifelong personality or irreversible pathology. One way in which this distinction between what may and may not be affected by the type of care given may be drawn is to look at change over time, rather than at these attributes on a single occasion.

E. Resident Observation. (a) Physical Environment (Fig. 7).
The resident's own room and furnishings are surveyed while the researcher administers the resident interview (Section F), in the same fashion that domain A was observed for the shared environmental features. (b) Observed Behavior. We begin with 3 specified 10-minute observation periods. One is done unobtrusively wherever the resident happens to be at a time immediately preceding the researcher's introduction to the resident and request for an individual interview. Another is done at a meal, and the third in a setting and time yet to be determined. (Fig. 8 shows one section of the mealtime observation). It is unlikely that behavior relevant to all needs will occur; thus only a few of the domains will be attempted. One behavior sample that can be obtained across all residents in every setting, however, also has the great advantage of representing both positive and negative states: The Philadelphia Geriatric Center Apparent Affect Rating Scale (Fig. 9), which has demonstrable reliability and validity when used with both intact and cognitively impaired residents. F. The Resident Interview (Fig. 10) attempts to elicit what many believe to be the heart of QOL, the evaluation of each need area by individual residents. Although current ideology often presents the consumer's point of view as the gold standard against which other quality indicators should be validated, there are a number of considerations that limit the usefulness of such a conclusion. First, people in general have a positive response bias. Second, nursing home residents in particular are inclined to give rosy responses as the result of their relative powerlessness in relation to administration and staff. Third, some types of knowledge about the relationship between type of care and outcome are in the domain of expert knowledge and not necessarily known to residents.
Finally, with 50-60% of nursing home residents experiencing major cognitive loss, a large percentage are unable to perform the cognitive reporting and judgmental tasks required by a consumer survey. All of these problems may be ameliorated to some extent by careful choice of phrasing and by using the frequency distributions obtained by other investigators to select questions that provide relatively broad response distributions. Uman et al. (in press), for example, designed an interview requiring only yes or no answers, and reported that 70% of their nursing home sample was able to respond to this format. Such a dichotomous format, however, both reduces variance and alienates many relatively intact people. We have therefore elected to use 4-point "never" to "often" frequency ratings, which have worked in about two-thirds of our pilot subjects. We have also decided to reduce 2 basic measures used for intact residents as 4-point scales (Fig. 11) to dichotomous ratings for the less-intact. Whatever the method, some sizable minority will be unable to respond. Is it possible to interview a family member as a proxy respondent for the resident?

G. Family Questionnaire. The literature is not very affirming of the ability of either a family member or professional to agree with the evaluations elicited from relatively intact residents. It thus seems inappropriate to assume that such proxy responding would be any more accurate for people with dementia. On the other hand, many family members are important members of the care system and are in a position to make independent estimates of quality (Fig. 12). We therefore will use a modified version of the Resident Interview to obtain an additional perspective on quality.

VALIDITY AND THE CONVERGENCE OF EVALUATIONS.
Two of the problems mentioned initially require further discussion following this presentation of the structure of the QOL assessment: Validity and convergence.

Validity. Having rejected the use of all residents' aggregated judgments as an ultimate validity criterion, what other possibilities remain? One possibility is to use the subset of the highest cognitive-functioning residents as experts to define ultimate quality. Such an approach has the fatal defect of assuming that what is best for the intact is also best for all, including the cognitively impaired. A more promising approach would be to use the traditional psychometric procedure of seeking criterion judgments of each facility from independent experts, against which all other facility-level data and aggregated resident data would be correlated. Specifically, a researcher with knowledge of nursing homes and long-term care issues will be sought in each state, and paid to spend 3 days in each facility to observe and interview residents and staff in open-ended style, without awareness of the forms of data being gathered for the research evaluation. The final task would be for the expert to complete the 10-point rating scales, where each facility receives a rating on each domain (back to Fig. 4). Objections to this method are, first, that some experts may have some knowledge (either first-hand or reputational) of facilities predating their on-site observations, thus risking rater bias. Second, many experts in the policy and advocacy networks object to the use of an ultimate criterion that ignores consumer judgments completely.

Convergence. Does the approach of convergent validity offer an alternative to a single-criterion validity? A relatively high degree of convergence was built into the research by organizing all sources of data around 11 need domains.
Exact resident-level instrument replications were built into the resident interview, the family questionnaire, and a few staff ratings (back to Fig. 11). Each replication consists of a single question (4-point, poor to excellent) about each domain. Parallel facility-level 10-point domain ratings will be generated by multiple researchers at the end of their on-site time and by the outside experts (Fig. 4). Theoretically, then, each facility can be represented by a combination of sources: The means of the resident-level 11 domain ratings, the single mean of 4 to 8 researchers' ratings, and the one expert's rating, all across 11 domains. If these 3 perspectives were simply weighted equally and merged, we should have an unusually broad convergent measure of quality.

Item analysis of the specific indicators. Realistically, however, when translated into the ongoing survey process, what I have described is far too detailed, long, and undoubtedly cluttered with poor items from a psychometric point of view. Therefore, for ultimate use in the survey process, some form of shortening and culling of items for quality is required. I'd be anxious to have your opinion on how you would proceed in this task. On the resident level, we could use the domain-specific assessments as dependent variables for item analysis within the resident interview, the family interview, and the staff-rating data sources, thereby identifying a shorter set of most-discriminating resident-level items. A similar process could be used on the facility level, where a combination of the 3 sets of domain ratings produced by residents, researchers, and experts are the dependent variables against which the items are analyzed.

Given the realities of empirical data, convergence is certain to be incomplete.
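As a minimal illustration of the equal-weight merging of the 3 perspectives described above, the following Python sketch computes a composite for a single need domain in one facility. The function names, the 1-4 and 1-10 codings of the rating scales, and the linear rescaling of each source to a common 0-1 range are all assumptions made for illustration; none of them is specified in the procedure described here.

```python
from statistics import mean


def rescale(value, low, high):
    """Map a rating on [low, high] linearly onto [0, 1] (assumed rescaling)."""
    return (value - low) / (high - low)


def merge_domain_ratings(resident_ratings, researcher_ratings, expert_rating):
    """Equal-weight composite for ONE need domain in ONE facility.

    resident_ratings:   4-point ratings from individual residents, coded 1..4 (assumed)
    researcher_ratings: 10-point ratings from the on-site researchers, coded 1..10 (assumed)
    expert_rating:      the single outside expert's 10-point rating, coded 1..10 (assumed)
    """
    resident = rescale(mean(resident_ratings), 1, 4)       # mean of resident-level ratings
    researcher = rescale(mean(researcher_ratings), 1, 10)  # single mean of researchers' ratings
    expert = rescale(expert_rating, 1, 10)                 # the one expert's rating
    # Each perspective counts once, regardless of how many raters it contains.
    return mean([resident, researcher, expert])


# Hypothetical ratings for one domain in one facility.
composite = merge_domain_ratings([2, 3], [5, 6, 5, 6], 5.5)
print(round(composite, 3))
```

Averaging within each source before merging keeps the single expert from being swamped by the larger number of residents, which is one reading of "weighted equally"; weighting every individual rater equally would be a different rule.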
What to do in computing a facility-level validity criterion when the relative rankings done by residents, researchers, and experts disagree is uncertain. As alternatives to the mathematical combination of resident, researcher, and expert judgments, 2 other approaches could be taken. One is to use only the resident aggregate ratings when there is no convergence among the data sources, thereby giving precedence to the consumer's judgments. The other is to accept the poorest domain rating among the 3 sources, on the theory that identifying what needs to be improved is the most important task.

I could go on with a book full of interesting issues. (One example: Some would like us to produce a single number indicating the QOL in a facility. What do we do with that one?) Instead I shall end by going beyond behavioral science to a policy issue. Is a QOL audit of the kind described here likely to improve the QOL of nursing homes? If our project is successful, surveyors might become equipped to diagnose QOL deficiencies and cite them to the point of removal of licensure. To me, this appears problematic because QOL has so many subjective aspects. I predict that surveyors will be reluctant to make such citations and that political pressure from owners will minimize the legal clout of such citations. One possible outcome is that the present survey process will continue to be used to correct the most egregious lapses in minimum quality of care. Beyond the merely adequate level attained by correcting basic deficiencies, improvement in QOL up through the positive to excellent ranges may be a matter better controlled by the marketplace than by legal enforcement. Our research results could lead toward intra-facility self-assessment, staff training, and growth, articulating possible avenues for improvement on which administration and staff could work proactively.
The greatest weakness of the marketplace hypothesis, however, is that the present market is responsive primarily to the upper income range of client families. If market-driven improvement in QOL is to occur, we should have to see greater equalization of opportunity and increasing competition for the patronage of Medicaid-level residents as well as those who pay in full for their care. Despite such difficulties, we should leave room for the possibility that improved QOL in nursing homes may emerge better with indigenous rather than legal motivation.