Eric Grosch, Letter to Dr. Morgenstern on LOR - Semmelweis ...

DR. ERIC N GROSCH - A new approach to LOR

Dear Dr. Morgenstern:

I read your article[1] with interest. You wrote: 

    In considering a new approach to the pediatric LOR, COMSEP and the APPD welcome the input of the broader pediatric community.

Thanks for expressing an interest in receiving my comments, even though I'm an internist, not a pediatrician. I'm pleased to contribute to the dialogue, which I think is important. I apologize in advance for the length of my text, but I think the topic warrants it. Quoted text is indented and my own is unindented.

The LOR and its close relative, the performance-appraisal, are sacred cows in medical education, training and job-placement. They purport to provide means of communicating a candidate's traits among mentors -- the performance-appraisal among mentors within an institution; the LOR from mentors in one institution to those in another. The approach for each is much the same.

That purpose seems analogous to that of the medical record in patient-care, which provides a means of communicating a patient's disease-traits among the patient's physicians. The analogy is false for reasons I cite in the chart below:

Clinical chart | LOR/performance-appraisal
Appraisal and appropriate action at least daily | Appraisal summative, end of rotation, presented usually after the fact, too late for improvement
Goal is improvement of the patient | Goal varies from promotion of the candidate to his elimination from consideration
Relies on objective evidence for decision-making | Often relies on rumor, innuendo, scuttlebutt for decision-making
Documentation is as long as is necessary | Documentation is as brief as possible to save reading time
Documentation is in terms of specific clinical events | Documentation is in terms of unsubstantiated opinions, couched in generalities, of mentors, peers, etc.

On the evidence that I've examined, I believe that there are better ways than the LOR and the performance-appraisal to accomplish the mission. I don't consider that a flippant belief. I've arrived at my opposition to the LOR/performance-appraisal through considerable thought, reading and anecdotal experience, both as an author and as a subject of LORs. Most of what I say here is in the public domain and obvious. I get the impression that nobody ever puts it together, so I've done that, though perhaps incompletely. If you have any objections to what I've said here, please let me know them.

I divide my reasons into generic sections:

1. Golden Rule: Do unto others as you would have others do unto you. The Golden Rule[2], alone, should persuade anyone with any insight into the treatment that he would ideally prefer for himself that the performance-appraisal/LOR can never work.

2. Comparisons are always odious.

3. Deleterious effect: The idea of performance-appraisal is fundamentally flawed, even dysfunctional, because of the often deleterious effect it has on those trainees that appraisers rate as less than the very best, even though quality of performance is a lottery, governed in large part by random chance. Accordingly, rating people who belong to the system makes no sense.

4. Improper substitute for “where do I stand”: LORs and performance-appraisals serve the organization or institution, not the individual appraised.

5. Inaccuracy:

a. misapplication of the Likert-scale principle

b. inevitability of rating-inflation

c. popularity-contest

d. mismeasure of “excellence”

e. mentor-inattention: Mentors, who are supposed to do the evaluations and LORs, don't pay enough attention to their trainees' performance to fulfill that function adequately because their contact with trainees is minimal and sporadic, so their appraisal of the performance of their trainees is most often inaccurate and may even reverse the reality on the ground.

f. self-fulfilling prophecy

g. absence of evidence-basis

h. glittering generalities: Even if mentors paid attention to trainees' performance, the rating systems that they use address only glittering generalities, such as “general medical knowledge,” require presentation of no supporting evidence and rarely if ever address the only index of work-performance in medicine, namely, clinical outcomes of patients under the trainees' care.

i. distortion from “confidentiality,” under perpetual tension

6. The ultimate goal: communion of “top” talent in “top” institutions

7. Illustrative anecdote which is more typical than it should be

1. Golden Rule[2]

The performance-appraisal and the LOR are exercises in disregarding the needs of others and in attributing to others the character and nature of objects. The psychic mechanisms that prompt those in authority to impose performance-appraisal/LOR on others -- what they would not want for themselves -- are obscure but the most likely reason seems to be that the very act of imposition may, in and of itself, provide a pleasurable and ego-boosting exercise of arbitrary authority.

Whatever the psychic mechanisms, the proof of the observation appears in the contrast between the AMA's consistent endorsement of peer-review for physicians it presumes, generically, to be “bad doctors” and its disparagement of Professional Standards Review Organizations (PSROs). For example, it is a matter of record that the AMA was a strong supporter of enactment of the Health Care Quality Improvement Act of 1986 (HCQIA), which codified peer-review provisions from hospital bylaws into federal law. The AMA initially supported the HCQIA because it supports peer-review of “bad doctors” but opposed one feature of that act, the National Practitioner Data Bank (NPDB), and withdrew its support altogether from the HCQIA over that issue. It eventually obtained a quid pro quo: the NPDB as well as absolute immunity from liability for hospital-level peer-reviewers, a provision that has led to the proliferation of bad-faith peer-review.[3]

At the same time, JAMA and other journals have published articles that have impugned the accuracy of the findings of PSROs, the agents of which may conduct peer-review on any practicing physician, including one whom the AMA would presume to be a “good doctor.” Neither JAMA nor any other medical journal has published even one article that has examined the accuracy or validity of hospital-based peer-review, which the AMA enthusiastically approves because statutes render such peer-review privileged/confidential.

2. Comparisons are always odious

Farrell[4] noted the emotional effect of sex-reversed beauty-contests among men. The winner was, of course, ecstatic, enthusiastic and high in self-esteem but the runners-up felt devastated at the relative rejection and they experienced an epiphany: why women (at least those of runner-up grade [or less] in physical appearance) dislike customary beauty-contests. Comparisons are always odious because the comparison-game is a zero-sum proposition. Performance-appraisal is an appraisal of something that the evaluee can control to some extent by his conscious will, as opposed to appearance, which he can't, so it's marginally less pernicious than a beauty-contest but not much. The fact remains that the better one individual rates, the worse others do. It's inescapable.

The performance-appraisal/LOR in most fields, including medical education, always hinges on comparing one trainee, by various criteria, with others. The comparison may appear as a class-rank or as a comparison of the subject's rating with an ideal rating, e.g., 6 of a possible 10 points, 4 of a possible 5 points, etc. The message is always: “You don't measure up.”

That message is especially demoralizing to the usual medical trainee, since the very fact that he's survived to the stage of medical training means that he has already survived very stringent selection/exclusion filters and thus become accustomed to superlative accolades in his early education, through schooling and his undergraduate years. Once he is thrown in the midst of similar high-achievers, the normative performance is likely to be uniformly high and he may rate merely “average.”

3. Deleterious effect:

Deming lists the performance-appraisal (and, by implication, also the customary LOR) among the deadly diseases of business-organizations. The same ideas apply, in spades, to medical organizations and to LORs, which are retrospective, summative performance-appraisals, frozen and immutable, in perpetuity:

    ...the deadly diseases...

    3. Evaluation of performance, merit rating...Many companies in America have systems by which everyone...receives from his superior...a rating...(101) Management by objective leads to the same evil...Management by fear would be a better name...(Deming 1986, 107)

    Fair rating is impossible. A common fallacy is the supposition that it is possible to rate people; to put them in rank order of performance for next year, based on performance last year.

    The performance of anybody is the result of a combination of many forces -- the person himself, the people that he works with, the job, the material that he works on, his equipment, his customer, his management, his supervision, environmental conditions (noise, confusion, poor food in the company's cafeteria). (109) These forces...[which]...arise almost entirely from action of the system...will produce...large differences between people....A man not promoted is unable to understand why his performance is lower than someone else's. No wonder; his rating was the result of a lottery. Unfortunately, he takes his rating seriously...(Deming 1986, 110)

    The effect is devastating:

    It nourishes short-term performance, annihilates long-term planning, builds fear, demolishes teamwork, nourishes rivalry and politics.

    It leaves people bitter, crushed, bruised, battered, desolate, despondent, dejected, feeling inferior, some even depressed, unfit for work for weeks after receipt of rating, unable to comprehend why they were inferior. It is unfair, as it ascribes to the people in a group differences that may be caused totally by the system that they work in.

    ...what is wrong is that the performance appraisal or merit rating focuses on the end product, at the end of the stream, not on leadership to help people. This is a way to avoid the problems of people. A manager becomes, in effect, manager of defects.

    The idea of merit rating is alluring. The sound of the words captivates the imagination: pay for what you get; get what you pay for; motivate people to do their best, for their own good.

    The effect is exactly the opposite of what the words promise. Everyone propels himself forward, or tries to, for his own good, on his own life preserver. The organization is the loser.

    Merit rating rewards people that do well in the system. It does not reward attempts to improve the system. Don't rock the boat.

    ...a merit rating is meaningless as a predictor of performance, except for someone that falls outside the limits of dif- (102) ferences attributable to the system that the people work in...

    Traditional appraisal systems increase the variability of performance of people. The trouble lies in the implied preciseness of rating schemes...Somebody is rated below average, takes a look at people that are rated above average; naturally wonders why the difference exists. He tries to emulate people above average. The result is impairment of performance. (103)

    ...The problem lies in the difficulty to define a meaningful measure of performance. The only verifiable measure is a short-term count of some kind...(Deming 1986, 103)

    Degeneration to counting. One of the main effects of evaluation of performance is nourishment of short-term thinking and short-time performance...(103) A man must have something to show. His superior is forced into numerics. It is easy to count. Counts relieve management of the necessity to contrive a measure with meaning.

    ...people that are measured by counting are deprived of pride of workmanship. Number of designs that an engineer turns out in a period of time would be an example of an index that provides no chance for pride of workmanship. He dare not take time to study and amend the design just completed. To do so would decrease his output. (105)

    A good rating for work on new product and new service that may generate new business five or eight years hence, and provide better material living, requires enlightened management. He that engages in such work would study changes in education, changes in style of living, migration in and out of urban areas. He would attend meetings of the American Sociological Society, the Business Section of the American Statistical Association, the American Marketing Association. He would write professional papers to deliver at such meetings, all of which are necessary for the planning of product and service of the future. He would not for years have anything to show for his labors. Meanwhile, in the absence of enlightened management, other people getting good ratings on short-run projects would leave him behind. (Deming 1986, 106)

    Stifling teamwork. Evaluation of performance explains...why it is difficult for staff areas to work together for the good of the company. They work instead as prima donnas, to the defeat of the company. Good performance on a team helps the company but leads to less tangible results to count for the individual. The problem on a team is: who did what?

    How could the people in the purchasing department, under the present system of evaluation, take an interest in improvement of quality of materials for production, service, tools, and other materials for nonproductive purposes? This would require cooperation with manufacturing. It would impede productivity in the purchasing department, which is often measured by the number of contracts negotiated per man-year, without regard to performance of materials or services purchased. If there be an accomplishment to boast about, the people in manufacturing might get the credit, not the people in purchasing. Or, it could be the other way around. Thus...teamwork so highly desirable, can not thrive under the annual rating. Fear grips everyone. Be careful; don't take a risk; go along.

    Heard in a seminar. One gets a good rating for fighting a fire. The result is visible: can be quantified. If you do it right the first time, you are invisible. You satisfied the requirements. That is your job. Mess it up, and correct it later, you become a hero.

    Two chemists work together on a project, and write up their work as a scientific paper. The paper is accepted for a meeting in Hamburg...only one of the pair may go to Hamburg to deliver the paper -- viz., the one with the higher rating. The one with the lower rating vows never again to work close with anyone else.

    Result: every man for himself.

    Evaluation of performance nourishes fear. People are afraid to ask questions that might indicate any possible doubt about the boss's ideas and decisions, or about his logic. The game becomes one of politics. Keep on the good side of the boss. Anyone that presents another point of view or asks questions runs the risk of being called disloyal, not a team player, trying to push himself ahead. Be a yes man.

    Top levels of salaries and bonuses are in many American companies sky-high. It is human nature for a young man to aspire...to...one of these positions. The only chance to reach a high level is by consistent, unfailing promotion, year after year. The aspiring man's quest is not how to serve the company with whatever knowledge he has, but how to get a good rating. Miss one raise, you won't make it: Someone else will. (108)

    A man dare not take a risk. Don't change a procedure. Change might not work well. What would happen to him that changed it? He must guard his own security. It is safer to stay in line.

    The manager, under the review system, like the people that he manages, works as an individual for his own advancement, not for the company. He must make a good showing for himself.

    Another Irving Langmuir? Can American history, under handicap of the annual rating, produce another Irving Langmuir, a Nobel Prize winner, or another W. D. Coolidge? Both these men were with the General Electric Company. Could the Siemens company produce another Ernst Werner von Siemens?

    ...It is worthy of note that the 80 American Nobel prize winners all had tenure, security. They were answerable only to themselves. (Deming 1986, 109)

    “It can't be all bad.”...top management delay[s abolition]...of the annual rating of performance...by refuge in the...corollary that “It can't be all bad. It put me into this position.”...He reached this position by coming out on top in every annual rating, at the ruination of the lives of a score of other men. There is a better way.

    Modern Principles of Leadership...will replace the annual performance review. The first step...will be to provide education in leadership. The annual perfor- (116) mance review may then be abolished. Leadership will take its place...

    The annual performance review sneaked in and became popular because it does not require anyone to face the problems of people. It is easier to rate them; focus on the outcome...Western industry needs...methods that will improve the outcome. Suggestions follow.

    1. Institute education in leadership; obligations, principles, and methods.

    2. More careful selection of the people in the first place.[5]

It seems difficult to imagine selecting medical trainees by methods any more careful than current ones.

    3. Better training and education after selection.

    4. A leader, instead of being a judge, will be a colleague, counseling and leading his people on a day-to-day basis, learning from them and with them. Everybody must be on a team to work for improvement of quality in the four steps of the Shewhart cycle:...

    In the absence of numerical data, a leader must make subjective judgment. A leader will spend hours with every one of his people. They will know what kind of help they need. There will sometimes be incontrovertible evidence of excellent performance, such as patents, publication of papers, invitations to give lectures.

    People that are on the poor side of the system will require individual help...(Deming 1986, 117)

    a. What could be the most important accomplishments of this team? What changes might be desirable? What data are available? Are new observations needed? If yes, plan a change or test. Decide how to use the observations.

    b. Carry out the change or test decided upon, preferably on a small scale.

    c. Observe the effects of the change or test.

    d. Study the results. What did we learn? What can we predict?...(Deming 1986, 88)

    5. A leader will discover who if any of his people is (a) outside the system on the good side, (b) outside on the poor side, (c) belonging to the system. The calculations required...are...simple if numbers are used for measures of performance. Ranking of people...that belong to the system violates scientific logic and is ruinous as a policy,...

    In the absence of numerical data, a leader must make subjective judgment. A leader will spend hours with every one of his people. They will know what kind of help they need...

    People...on the poor side of the system will require individual help....(Deming 1986, 117)...

    7. Hold a long interview...three or four hours, at least...not for criticism, but for help and...everybody[‘s]...better understanding...

    8. Figures on performance should be used not to rank the people...that fall within the system, but to assist the leader to accomplish improvement of the system...(118)

    ...Running a company on visible figures alone (counting the money). One can not be successful on visible figures alone...he that would run his company on visible figures alone will in time have neither company nor figures.

    ...the most important figures...are unknown and unknowable..., but successful management must nevertheless take account of them. Examples.

    Fallacies of reward for winning in a lottery. A man in the personnel department of a large company came forth with an idea, held as brilliant...to reward the top (274) man of the month on a certain production line (the man that made the lowest proportion defective over the month) with a citation. There would be a small party on the job in his honor, and he would get half a day off. This might be a great idea if he were indeed an unusual performer for the month. There were 50 men on the production line.

    Do the results of inspection of their work form a statistical system...? If the work of the group forms a statistical system, then the prize would be merely a lottery...if the top man is a special cause on the side of low proportion defective, then he is indeed outstanding. He would deserve recognition, and he could be a focal point for teaching men how to do the job.

    There is no harm in a lottery...provided it is called a lottery. To call it an award of merit when the selection is merely a lottery...is to demoralize the whole force, prize winners included. Everybody will suppose that there are good reasons for the selection and will be trying to explain and reduce differences between men. This would be a futile exercise when the only differences are random deviations, as is the case when the performance of the 50 men form[s] a statistical system. (Deming 1986, 275) [5]
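
The calculation that Deming calls simple can be sketched with a conventional p-chart for proportion defective. What follows is a minimal sketch, not Deming's own worked example: the function name classify_workers, the assumption that every man has the same number of items inspected, the three-sigma limits and the figures in the usage note are all illustrative assumptions.

```python
import math

def classify_workers(defects, inspected):
    """Classify each worker against 3-sigma p-chart limits.

    defects   -- list of defective-item counts, one per worker
    inspected -- items inspected per worker (assumed equal for all)
    Returns (p_bar, lower, upper, labels).
    """
    # Overall proportion defective across the whole line
    p_bar = sum(defects) / (len(defects) * inspected)
    # Binomial standard error for one worker's proportion
    sigma = math.sqrt(p_bar * (1 - p_bar) / inspected)
    lower = max(0.0, p_bar - 3 * sigma)
    upper = p_bar + 3 * sigma

    labels = []
    for d in defects:
        p = d / inspected
        if p < lower:
            labels.append("outside, good side")   # possible special cause
        elif p > upper:
            labels.append("outside, poor side")   # needs individual help
        else:
            labels.append("within system")        # differences are a lottery
    return p_bar, lower, upper, labels

# Hypothetical month: 50 men, 400 items inspected each
defects = [20] * 48 + [2, 45]
p_bar, lower, upper, labels = classify_workers(defects, 400)
```

In this hypothetical month, only the last two men fall outside the limits. Ranking the other 48 against one another would amount to ranking the results of a lottery.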

In a similar vein, Ierodiakonou and Vandenbroucke term medicine a stochastic art:

    Ancient Greek philosophers thought that medicine was an art with peculiar characteristics, and they called medicine a stochastic art. A doctor might treat a patient conscientiously according to all learned precepts; yet the patients' condition might deteriorate. Another patient might be treated rather carelessly by another doctor; yet the patient might regain full health. Thus, in medicine there exists unpredictability between means and ends. By contrast with other arts a diligent execution of the tasks does not guarantee a good outcome, and vice versa...

    ...we have long witnessed a debate on the right way to measure the quality of medical care: should we use outcome or process criteria?...For instance, a few years ago, (542) a series of outcome investigations was started in the USA. Presumably, some administrators had been convinced that even in health care, quality of performance should be measured according to strict outcome criteria, as is practised in the Japanese car industry, for example. The simplest outcome measure was mortality in hospital. Third party payers such as the Health Care Financing Administration, which administrates Medicare, started to rank hospitals according to mortality rates for specific procedures. The mere idea sent ripples of alarm through the American Medical Association (AMA). Do not we intuitively know that medical centres with the highest reputation attract patients whose illnesses are close to being beyond rescue? Advanced epidemiological techniques have proved that differences in hospital mortality can be explained away by adjustment for differences in patient mix. To use outcome as a means to monitor quality would necessitate continuous evaluation of all individual patient characteristics. This process would be a gigantic research effort, close to the examination of treatments in randomised controlled trials, and would defy all realistic efforts at quality assurance. The whole armamentarium of epidemiology and statistics, such as randomisation, matching, blinding, placebo-procedures, strict selection criteria, and modelling, aims at mastering the stochastic elements that confound our judgment...[6]

Why should the performance of trainees be deterministic, not stochastic, inasmuch as the stochastic vagaries of patient-characteristics (case-mix) must influence it, in any individual instance? The difference is that hospital-administrators have an influential voice in such decision-making. The individual trainee does not.
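
The effect of case-mix can be illustrated by direct standardization, a standard epidemiological adjustment. The sketch below uses entirely hypothetical figures and names (referral_center, community, reference_mix). It shows how a crude mortality comparison can reverse once each risk-stratum carries the same weight for both parties, and the same arithmetic would apply to a trainee whose patients happen to be sicker.

```python
def crude_rate(strata):
    """Overall deaths / overall patients, ignoring case-mix."""
    deaths = sum(d for d, n in strata.values())
    patients = sum(n for d, n in strata.values())
    return deaths / patients

def standardized_rate(strata, reference_mix):
    """Weight each stratum-specific death rate by a common reference mix."""
    return sum(reference_mix[k] * (d / n) for k, (d, n) in strata.items())

# Hypothetical data: stratum -> (deaths, patients)
referral_center = {"low_risk": (2, 200), "high_risk": (80, 800)}
community = {"low_risk": (16, 800), "high_risk": (30, 200)}

# Hypothetical reference population: half low-risk, half high-risk
reference_mix = {"low_risk": 0.5, "high_risk": 0.5}

crude_ref = crude_rate(referral_center)        # 82 / 1000
crude_com = crude_rate(community)              # 46 / 1000
adj_ref = standardized_rate(referral_center, reference_mix)
adj_com = standardized_rate(community, reference_mix)
```

On these invented numbers the referral center looks worse on crude mortality but better in every stratum, so it looks better after adjustment. A rating that ignores case-mix would reward the wrong party.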

Under pervasive fear of rating, the trainee dares not ask a question that he thinks the rater might consider foolish. He has to confine his questions only to what he considers “intelligent,” posed with the purpose in mind of impressing the rater with his insight and wisdom, beyond his years. The good regard of the rater is the trainee's life-preserver, on a stormy sea of insecurity. The trainee walks on eggshells, fearing that his every move, his every word is an element in a cumulative body of chit-marks that may eventually torpedo his reputation. If he should get a black mark against him, for any reason, and the rater learns of it, the trainee shall thereby have lost the rater's support and enter career free-fall. Accordingly, he pre-edits all questions for acceptability before letting them out of his mouth. If he can't think of a zinger of a question, he'll most likely stay mum and live in ignorance about a broad variety of subjects, out of fear of asking a “stupid question.”

MIT, Caltech and other high-prestige institutions have experimented with a pass-fail grading system because they understood the inherent absurdity of grading-systems, with their arbitrary cut-off points for each letter-designation. That represented a rejection of the very notion of grading. I'm not certain of the status at those institutions at the moment. Maybe their graduates have had difficulty translating their academic performance into terms that other institutions that recognize grading understand, so maybe they've gone back to grading.

4. Improper substitute for “where do I stand”

Coens and Jenkins deplore performance-appraisals and, by implication, also LORs:

    At a recent quality conference, a CEO was questioned as to why his organization continued to use appraisals after shifting to a quality management culture of system and process improvement...“We think we owe it to people to let them know where they stand.”...(27)

    What people really want is access to the knowledge and information that influences the organization's pay, promotion, and status systems and how these affect or apply to them...People are insatiably curious about Where do I stand? because, in most organizations, this query is decided with a maze of unspoken rules, inscrutable political influences and other dynamics of organizational life. Appraisal is not the system that drives pay, careers, and status; it is an incidental effect of those dynamic systems. Appraisal is...the paper-shuffling that sanctifies decisions already made.(28)[7]

The cognate of pay and promotion, in the corporate setting, is gaining acceptance into a “top” (whatever that might mean) training program, in the medical-educational setting.

Too often, a trainee finds out, to his surprise or shock, where he stands only when he reads his retrospective performance-appraisal/LOR and, by then, it's too late to do anything about it. LORs and performance-appraisals have an especially pernicious effect on medical students, at that vulnerable stage in their development, but they're bad for any trainee and for any person.

5. Inaccuracy:

a. misapplication of the Likert-scale

“Likert” seems an unlikely choice for naming the method, since Likert used the scale in canvassing members of population-samples to obtain aggregate ratings of their attitudes in his 1932-article[8], the presumed basis of the eponym. Likert, himself, had the good sense not to apply such rating scales to important matters that could affect people's livelihoods, though his predecessors already had and his successors still do. Likert[9] credited prior authors, Fechner and Galton, without citing a reference, for the origination of such questionnaires, circa 1888. Scott introduced the system to the United States Army in the early part of the last (20th) century[10]. Paterson, an employee of the Scott-Company, described a later adaptation of Scott's method[11,12] for “objective” evaluation of job-performance, the purpose of interest here.

With no apparent insight in<strong>to</strong> the inherent vagueness of the method, Paterson claimed <strong>to</strong> 

distinguish objective from subjective qualities without doing so: 

objective qualities . . . “efficiency,” “originality,” “perseverance,” and “quickness” . . . subjective 

qualities . . . “courage,” “cheerfulness” and “kindliness.” . . .[12] 

The criteria he cited for rating workers were similarly vague: 

Ability <strong>to</strong> Learn, Quantity of Work, Quality of Work, Industry, Initiative, Co-operativeness, 

Knowledge of Work[13] 

Strangely, Paterson disregarded the opportunity for evidence-based assessment of the criterion 

most amenable to objective evaluation, namely quantity of work, in terms, say, of number of units

the worker produces per unit-time. Instead, the instructions bade the rater give a worker on that

criterion a rating-score, presumably to foster “uniformity”[14] with ratings of other criteria, not

amenable to objective assessment.

Other authors described errors and pitfalls inherent in the method. Thorndike first described the 

halo effect as a 

. . . constant error toward suffusing ratings of special features with a halo belonging to the

individual as a whole[15] 

He found even the most capable rater 

unable to treat an individual as a compound of separate qualities and to assign a magnitude of

each . . . in independence of the others.[16] 

As a countermeasure to minimize the halo-effect, he exhorted:

. . . the observer should report the evidence, not a rating, and the rating should be given on the 

evidence to each quality separately without knowledge of the evidence concerning any other

quality in the same individual.[16] 

Thorndike did not explain how a rater could avoid having knowledge of ratings he had given on 

other criteria listed on the same form. The “evidence” Thorndike had in mind consisted in vague 

descriptive adjectives, similar to those that Paterson cited,[17] that the rater had a general

impression might apply to the ratee.

Kingsbury addressed accuracy: 

. . . ratings as ordinarily made are . . . unreliable, and . . . only under what may be called ideally 

favorable conditions will they approximate accuracy, even on a scale so gross as one of five

divisions.[18]

Kingsbury enumerated those allegedly ideal conditions: 

Ratings, to be reliable, necessitate (1) averaging three independent ratings, each made on an

objective scale; (2) these scales must be comparable and equivalent, made in conference under

expert supervision; (3) the three raters must be competent to rate.[18]

Paterson joined, with slightly different wording, in affirming Kingsbury's ideal conditions (2) and 

(3) and in thus implicitly alluding to pitfalls of the method:

. . . Ratings should be accepted and filed for use only from those who have proved themselves 

capable of accurately judging human qualities. . . a rating scheme will not work automatically. It

must be closely supervised preferably by trained personnel research workers who must continually

subject the ratings to critical analysis and assist in training executives in proper use of the method.

There is no escape from this requirement.[19] 

Paterson and Kingsbury omitted mention of what specifics the training they proposed for the 

personnel-research workers should comprise and accomplish but they presumably intended, 

among other things, that the trained supervisors should somehow ensure separate evaluation of 

labeled traits to exclude Thorndike's halo-effect; then, by averaging, fine-tuning, adjustment and

manipulation of the scores from at least three raters, all of whom knew how to provide accurate

ratings (presumably assessed by the raters' mutual agreement on each candidate's score on each 

criterion) obtain a set of ratings consistent with the aggregate global impression each candidate 

made on the raters (the candidate's halo). The circularity of the rationale seems inescapable. 

Prior to receiving requests to fill out forms consisting of Likert-scale ratings on others'

performance, I have never received any of the extensive training or testing to prove myself

“capable of accurately judging human qualities,” nor, I daresay, has any appraiser of my

performance received such training and testing, to my knowledge. The originators of such forms

seemed to assume that the rating schemes would “work automatically,” contrary to Paterson's

admonition.

Rugg may have had more insight: 

. . . The unordered -- yes, the chaotic -- character of the judgments appears, irrespective of what 

traits are considered or of what kinds of scales are compared. I now believe that the evidence 

establishes the futility of obtaining single “ratings” on point scales of such dynamic qualities as 

“intelligence,” “personal qualities,” “general work,” and the like.[20] 

Paterson cautioned and predicted: 

These rating methods should not be looked upon as perfect or final. Further research is necessary, 

and industry will profit . . . as progressive, experimentally minded executives realize the scope of 

the problem and engage in the necessary research . . . to develop newer and more reliable methods

than we now possess.[21] 

The progress Paterson envisioned has been slow in developing, as the medical-education

evaluation-literature amply shows[22-28]. The Likert-scale remains alive, well and unimproved

since Paterson, Kingsbury and Thorndike fretted over it and tortured it and since Rugg dismissed

it as inherently invalid over eighty years ago. 

Rating-criteria in medical education continue to be as vague as Paterson's, e.g., “general medical

knowledge (1-5),” “procedural skill (1-5),” “rapport with patients (1-5),” “rapport with nurses 

(1-5),” “overall general impression (1-5)” (the most global “halo”-criterion of all) and the like. 

Many,[22-28] though not all,[29,30] current users and discussants of the Likert-scale treat it as an

axiomatically good and self-explanatory scheme.

In current medical-education usage, mentors rate trainees and trainees rate mentors without any

expert supervision -- in disregard of Kingsbury's ideal conditions[18], Paterson's precautions[19]

and Rugg's skepticism[20] -- perhaps in imitation of Likert[8], who may have felt justified in 

ignoring the precautions, conditions and invalidity of the method for objective evaluation because 

he pursued only subjective attitudes rather than purportedly objective traits. Yet, those who 

publish studies based on Likert-scale “data” apply the numerical scores derived as if they were 

facts and manipulate them with parametric statistics as if they were not ordinal[22-28].
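
A minimal sketch in Python may make the point about ordinal scores concrete. The ratings and the second spacing below are hypothetical, invented purely for illustration; nothing here comes from the cited studies. Because a 1-5 rating conveys only order, any relabeling of the five categories that preserves their order is equally defensible. In this example, a comparison of means reverses under such a relabeling, whereas a comparison of medians does not.

```python
from statistics import mean, median

# Hypothetical ratings on a 1-5 scale for two trainees (made-up numbers).
ratings_a = [1, 4, 4]
ratings_b = [2, 3, 3]

# Two numeric codings of the same five ordered categories. Both preserve
# the order; only the assumed spacing between categories differs.
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
other_spacing = {1: 1, 2: 4, 3: 5, 4: 6, 5: 7}


def summarize(ratings, coding):
    """Return (mean, median) of the ratings under the given numeric coding."""
    coded = [coding[r] for r in ratings]
    return mean(coded), median(coded)


def a_ranks_above_b(coding):
    """Report whether trainee A outranks trainee B by mean and by median."""
    mean_a, median_a = summarize(ratings_a, coding)
    mean_b, median_b = summarize(ratings_b, coding)
    return {"by_mean": mean_a > mean_b, "by_median": median_a > median_b}


if __name__ == "__main__":
    print(a_ranks_above_b(equal_spacing))
    print(a_ranks_above_b(other_spacing))
```

Under equal spacing, trainee A has the higher mean; under the alternative spacing, trainee B does, although no rating changed. The ranking by mean thus depends on an arbitrary choice of spacing that the scale itself does not supply.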

Guilford[30] appears to equate Likert-scales with formal psychometric tests by including them in

his book, entitled, “Psychometric Methods.” Worse, many seem to follow Thorndike[9] in

attributing traits to ratings, a tendency Tryon deplores even for psychological tests, which are

more formal than Likert-ratings. Yet purveyors of Likert-ratings attribute traits to them:

The test-trait fallacy [consists in presuming] that test scores provide measures of enduring and 

generalized characteristics of the person, called traits. . . 

The test-trait fallacy begins with the assumption that test scores are trait measures. The second 

assumption is that trait measures are basic properties of the person. It easily follows that test 

scores reflect basic properties of the person. . . hence a measurement is reified into a causal force.

. . the unsound logic of drawing inferences about ability on the basis of observed performance is 

integral to the test-trait fallacy. . .[32]

Traits are alluring because they are . . . compatible with the stimulus-organism-response paradigm

to which virtually all psychologists subscribe. . . To presume that psychological tests . . . measure

organismic traits and to further presume that such traits are the basic properties that cause

behavior is to place the psychologist in an attractively powerful theoretical and clinical position.

The volume of psychological tests . . . is evidence of their allure for clinicians and researchers 

alike.[33] 

Authors even apply statistical methods to aggregate number-scores from a group of raters,

compute inter-observer correlations and the like. Literature-approval of Likert-scale “data” 

encourages decision-makers to attach unwarranted worth to Likert-scale merit-ratings and

serenely to apply them in life-altering decisions touching subordinate trainees,[22] such as

recommendation for certifying examinations, and employment, and even in promoting 

faculty-members[25]. 

Albanes[34] suggests that “real life” ratings, presumably of qualified physicians, are objective and 

based on outcomes, yet Carey[35] asserts that evaluations of physician-faculty must be subjective. 

Codman[36] and his spiritual successors[37-41] have called for outcome-based rating of 

performance and, by extension, of competence, but physicians and hospitals have pointed out the

deficits of that method and prevented its spread, to date, by citing the multiplicity of factors,

unrelated to institutional or physician-competence, that determine outcome.[42]

The champions of rating attribute two roles to it, evaluative or summative (entailing punitive and

deterrent purposes) and formative.[42] Paterson touted the formative purpose:

I. Rating methods have been developed because of a recognition of the educational value of 

ratings . . . 

a. . . . on those who make the ratings. . . insures the analysis of subordinates in terms of the traits 

essential for success in the work. 

b. . . . on the employee. . . encourages self-analysis and provides an incentive for 

self-improvement in . . . traits in which he is weakest.[35] 

As educational feedback, rating fails to fulfill Ziegenfuss' proposed criteria for adequacy and

efficacy: 

. . . the art of feeding back quality-related data is a critical point of quality improvement work. . . 

Feedback is effective when the following conditions are met: 

1. Clarity of Purpose. Data can be used for development or for rendering judgment (formative 

versus summative . . .). . . for . . . organizational development, . . . the purpose is . . . formative . . 

. Learning and change to improve processes is the goal. A judgmental purpose (summative) offers

a . . . grade of pass or fail and is designed for accountability. . .[45] 

Since “accountability” entails punishment[46], it does not belong in any workplace.[4] In 

education, by definition, the only appropriate purpose of feedback is the formative one. The 

Likert-rating, in its customary application, succeeds in the summative, punitive goal of criterion 1

but fails in its formative goal. 

2. Clear and Specific Data. Data . . . must be . . . relevant to the . . . recipient.[45]

The vague expression, “general medical knowledge, 3” (or any other number) is unclear and 

non-specific, so rating fails criterion 2 and is not relevant to the recipient (see criterion 5).

3. Descriptive, Not Evaluative. Useful feedback describes what is happening but does not offer an 

evaluative judgment (unless that is the intended purpose). The presenters must not rush to

judgment without some interactive discussion with the audience.[45] 

The Likert-scale rating substitutes for relevant evidence and thus fails criterion 3. 

4. Timely. How close to the action . . . reviewed are the data describing the events? The golden

rule is quick feedback . . . Old data is useful for historical and longitudinal purposes but is not

supportive of behavior change in the near term.[45] 

The team, employed long-term, may tolerate a monthly, quarterly or semi-annual feedback-cycle.

The medical student or other trainee, who often has monthly rotations in clinical departments, 

needs a shorter feedback-cycle. Feedback should be continuous, its formal aspect should be at 

least weekly, and, preferably, daily. The commonest Likert-rating comes to the ratee's attention as

a summative, end-of-rotation event, delivered too late for him to implement improvement, so it

fails criterion 4. 

5. Limited. How great is the scope of the data? . . . tailor and focus the data to fit the specific,

targeted needs of users. . .[45] 

A Likert-rating, e.g., “general medical knowledge, 3,” is too vague and global to serve a ratee's

needs. It invites the so-called halo-effect and fails criterion 5. 

6. Comparative. . . To leave out comparative information is to deprive the recipients of

knowledge about their progress or lack thereof. . .[45] 

Albanes[34] deplored the rater's “failure to discriminate” among trainees in awarding them equal

marks. He thereby pursued a similar goal of making distinctions for distinctions' sake alone and

disregarded the “lottery”-nature of rating people who operate “within the system.”[5]

Kingsbury likewise suggested: 

. . . we do have to make distinctions between people . . .

. . . and the rater should realize that it is not so disastrous to make some employees 2 who are not

much worse than some he marks 3, as it is to mark them all alike to avoid seeming to magnify the

difference. . .[47] 

As Deming eloquently explains,[5] it's disastrous for an individual to suffer a low rating. A low

rating may be especially crushing to a medical student, accustomed as such a tender soul often is,

from the experience of a lifetime, to high academic ratings.

If two or more employees or trainees perform equally well and very well, say, 5 of 5, they would 

deserve equal marks because equality of their performance reflects truth. The company, to which

marking two or more employees alike, e.g. 5 of 5, may seem disastrous, can stand the gaff more

easily than an individual arbitrarily marked down, despite his best effort, merely to “make

distinctions between people.” Neither Albanes[34] nor Kingsbury[47] justified the need to make

such distinctions. Each presumably considered the principle axiomatic and self-evident.

Since the end-of-rotation Likert-scale rating provides no progressive comparisons and since 

mentors might balk at the administrative burden of completing Likert-scale ratings more often

than once monthly, it provides no sequential comparison and fails criterion 6. 

7. Participative Interpretation. . . . final . . . analysis can[not] be conducted without audience 

involvement. Joint interpretation is consistent with the developmental/formative purpose, as

together we discuss meaning and follow-up action . . .[45]

In the medical-education context, the Likert-scale rater rarely discusses his rating with his ratee(s) 

prior to entering it. It comes most often to the recipient's attention as a fait accompli, too late for

him to improve it. The Likert-scale rating fails criterion 7.

8. Safety and Security. Receiving performance feedback is . . . technical and . . . psychological . . . 

We need first to have the data correct (technical). . . Presenters must be sensitive to the

psychology of the process and offer language and behavior that protect the recipients.[45] 

The Likert-scale rating inherently fails the technical criterion, since it consists of a set of numerical 

scores which obscures the evidence that purports to form its basis. Various errors, to wit, the

halo-effect (supra) and tendency toward the mean[48,49] inhere in the Likert-scale.

As applied, it most often fails the psychological criterion since the social-control function, which 

Albanes[34] advocated, is crucial to its deterrent/punitive function. To pull a punch at the moment

of delivery would diminish or annihilate the crushing impact the rater can otherwise accomplish. 

9. Practical and Action Oriented. To be useful, the data should suggest some followup action and 

should be practical enough to be used by professionals in the field. . . [32,50]

Having received a rating of, e.g., “general medical knowledge, 3,” the recipient can discern no 

idea from the rating how to improve. The Likert-rating fails criterion 9.

The evidence seems clear that ratings fail all of Ziegenfuss's rational criteria for effective feedback. 

b. inevitability of rating-inflation 

A universal human conceit holds that everybody's a fool and a moral pervert except for thee and 

me and I'm not so sure about thee. The individual expects others to rate him in a manner

consonant with the intrinsic, superlative characteristics that he attributes to himself. When

mentors, in a medical-education setting, rate him harshly, he feels helpless and often nonplussed

and feels an urge to press his raters to improve his rating.

Some years ago, Sissela Bok, philosopher and wife of Derek Bok, former President of Harvard

University, addressed merit-ratings on “fitness-reports” in the US Army. Her context was “lying” 

and her example of a liar was the supervisor who rated his subordinates too highly on traits, such

as “leadership,” “appearance,” etc., which are at least as nebulous as entities that raters in

medicine attempt to address, e.g., “general medical knowledge,” “rapport with staff,” etc. They're

all manifestations of the great tendency to generalize from skillful execution of a narrow scope of

activities, such as getting high scores on tests, to global “excellence,” “outstandingness” or

“bestness,” in general. 

Bok's description shows that your observation that “excellent” is a third-tier rating has a history:

...Those who rate officers are asked to give them scores of “outstanding,” “superior,” “excellent,”

“effective,” “marginal,” and “inadequate.” Raters know...that those who are ranked anything less 

than “outstanding” (say “superior” or “excellent”) are then at a great disadvantage, and become 

likely candidates for discharge...superficial verbal harmlessness combines with the harsh realities 

of the competition for advancement and job retention to produce an inflated set of standards to

which most feel bound to conform. (Bok 73)

...The US Army tried to scale down evaluations by publishing the evaluation report...cited. It

suggested mean scores for the different ranks, but few felt free to follow these means in individual

cases, for fear of hurting the persons being rated. As a result, the suggested mean scores once 

again lost all value. (Bok 74) [51] 

Professor Bok writes as a member of the establishment. LORs and performance-evaluations

cause little to no worry to her and her husband, who have made it to the top of the academic

heap, from which pinnacle, they may comment on us, herebelow: 

In elite . . . organizations, the evaluation model tends to be elitism. Two lines of argument are

involved. First, since the organizations have selected the best people, evaluation of performance is 

irrelevant. After all, if the best people could not succeed, who could do better? Second, since the 

quality of the organizations and their output is determined primarily by the quality of their

people, attention to system, methods, or management is inconsequential. It follows that, if the

organizations already have the best people, “the opportunities for increased productivity in them

are small and come slowly.” Finally, . . . elitism tends to create self-perpetuating closed circles

whose members are exempt from review except by peers within. 

Converting work problems into people problems is a process of denying organizational

accountability. It is a process of establishing a hierarchy of special privilege and immunity to rank

with the hierarchy of authority. It is a process of maintaining the status quo; it denies both the 

need for change and the possibility. (27)[52] 

Accordingly, Professor Bok focused on the “lies” perpetrated to help the plebeian but omitted any

mention of organizational dishonesty: the rumor-grapevines, chiefly by telephone, which leave no

paper-trail, and which circumvent and subvert the normal channels of committed, transparent,

written communication, to which subjects of ratings on LORs might obtain access.[53]

Personnel-managers use such underhanded means to evade legal liability for defamation of

character to find out from former employers “what applicants are really like.”

You wrote in your article[1] in nearly identical terms of the inevitable tendency toward rating

inflation, your “hierarchy of superlatives,” the Lake Wobegon effect, in which everybody is

“above average,” and the tendency toward rating-fragmentation, which permits raters to apply such phrases as

"is one of the finest medical students of the year," "...one of the best medical students I have ever

worked with," "richly deserves the honors awarded in the rotation," or "receives my highest

recommendation"[54] to distinguish among the best, the very best in the past year, the

best ever, etc., etc. Speer et al cited grade-inflation in internal medicine as well:

. . . a significant number of clerkship directors (43%) felt that we are unable to appropriately

identify students with failing performances. The implication for our ability to certify students as

clinically competent is concerning. . . (116)[55] 

That's evidently not their concern. They express more concern with labeling trainees clinically

incompetent.

. . . faculty were the key to both the cause and solution. (116)[55]

That is a truer statement than Speer et al perhaps realized, though faculty would probably prefer

to blame the trainee-victims.

Yet, clinical medicine simply doesn't contain tasks of sufficient sophistication that trainees could

perform and that would enable a trainee to distinguish himself from his fellows to the extent

depicted in all the finely nuanced and ever mounting expressions of enthusiasm. The difficulty

would be quite similar to the difficulty of rating a patient in similar terms, according to his

response to treatment. Objectively, he either gets better, stays the same or gets worse. It's difficult

to imagine that an evaluator of patients could find rational criteria for appraising a patient's

recovery as “excellent,” “outstanding,” “one of the best on the ward,” “one of the best in the past

year,” “the best ever,” etc. If a rater can't do it for a patient, how can he do it for a trainee? 

Gould attributed the fallacy of confusing objects with labels to John Stuart Mill:

The tendency has always been strong to believe that whatever received a name must be an entity

or being, having an independent existence of its own. And if no real entity answering to the name

could be found, men did not for that reason suppose that none existed, but imagined that it was 

something peculiarly abstruse and mysterious.[56] 

Gould cited the fallacy in noting that Binet, originator of IQ, intended none of the social elitism to

which it has given rise.[56] Such reification of jargon is a prominent feature also of rating 

practice. 

c. popularity-contest 

What feats of clinical derring-do can a trainee, at any level, perform that would make him so much

better than any of his contemporaries that he would qualify for such sterling and distinctive

accolades as "is one of the finest medical students of the year," "is one of the best medical

students I have ever worked with," "richly deserves the honors awarded in the rotation," or

"receives my highest recommendation",[54] in contradistinction to his fellows, whose

performance might rate a mere “excellent”?

Did pediatric resident A miraculously heal a girl with Friedreich's ataxia so she never progressed

and even achieved a normal gait? If so, how did he do it? By Divine Intervention? By Black

Magic? By weird science? Miracle-healing, if accomplished, would obviously exceed customary

expectations and be well above the upper control-limit of performance that Deming defines as “within

the system.” Miracle-healing may thus warrant the highest accolades but even outstanding

residents rarely to never perform it.
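
The notion of a control-limit can be sketched in Python. The sketch assumes the common convention of placing limits three standard deviations on either side of the mean; the scores are hypothetical and the function names are illustrative only, not taken from Deming.

```python
from statistics import mean, stdev

# Hypothetical performance scores for a group of trainees (made-up numbers).
scores = [78, 82, 80, 85, 79, 81, 83, 77]


def control_limits(values, k=3.0):
    """Return (lower, upper) limits at k sample standard deviations from the mean.

    k = 3 is the conventional choice for control charts; it is an assumption here.
    """
    center = mean(values)
    spread = stdev(values)
    return center - k * spread, center + k * spread


def is_within_system(values, candidate, k=3.0):
    """True if the candidate score lies inside the limits computed from values."""
    lower, upper = control_limits(values, k)
    return lower <= candidate <= upper


if __name__ == "__main__":
    print(control_limits(scores))
    print(is_within_system(scores, 85))  # an ordinary high score
    print(is_within_system(scores, 95))  # a hypothetical miracle-level score
```

On these made-up numbers, every observed score falls within the limits, so the differences among them would count as variation within the system; only a score far beyond the upper limit would stand out.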

The only realistic answer that comes to my mind is that the highly regarded trainee manufactures

his high regard by ingratiating himself, through force of intrinsic personality or insidious, political

means, into the rater's favor. The rater then comes to like the trainee personally so much that he's

willing to go out on a limb for him with various superlative terms of enthusiasm, presumably

assuming his performance be at least adequate. In other words, rating of trainees for 

personnel-records and the LOR are popularity-contests. Who has the bubbliest personality? Who 

is the most “well liked?”[57] 

Such a system could select for those who go along to get along and who may rate pleasing their

administrative superiors to enhance the chance of their own advancement as more important than

performing what's right for a patient, perhaps contrary to the will of their superiors. Such disregard

of objectively correct performance may lead to deterioration of quality of patient-care, ostensibly

the opposite of rational goals for a health-care system. 

d. mismeasure of “excellence” 

Mere prattle without practice doesn't necessarily transfer well to good real-world outcomes.

Howard Zinn spoke of “the best and the brightest”: 

The New York Times did a survey of high-school students to see how much history they knew.

They do this every few years. They do a survey of young people to prove how dumb they are and

to prove how smart are the givers of the tests and so they gave this test to high-school seniors and

corroborated what they thought. Young people don't know anything about history. They asked

questions like, “Who was the President during the War of 1812?” “Who was the President during 

the Mexican War?”...We're in a great quiz-culture...“What came first the Homestead Act or the 

Civil Service Act?” You recognize questions like that because those are the questions that appear 

on tests which enable you to get into graduate-school. You can go very far if you know enough of

those answers. You'll be Phi Beta Kappa. You'll become an advisor to the President of the United

States. You remember the book, The Best and the Brightest, which was precisely about that 

point, that the people surrounding the President were...the people who got the highest scores. 

They were Phi Beta Kappa and they were the architects of the War in Vietnam.[58]

Holman cited an analogous problem related to inflated self-esteem, the ‘excellence’ deception in

medicine[59]. 

Simpson addressed the examination-system but his remarks apply at least as well to any rating

system: 

...the traditional examination system...achieves...pseudo-precision, for it has chosen the accurate 

measurement of the barely relevant in preference to the less precise measurement of the most

highly relevant...our cultural bias towards believing that anything expressed in numbers must be

significantly more true than the same thing expressed in words...allows the student to accumulate

a sequence of numerical ascriptions and grades, often of very dubious reliability and

validity...added together and averaged to help us guess at whether he is fit to leave medical

school. This is as logical as making a pre-operative surgical assessment by adding and averaging

your patient's haemoglobin, potassium, urea and blood sugar levels. It produces results...of little

or no predictive validity and...neither tell the student who has passed the exam why he has done 

well (so that we can be reasonably sure he can do it again) nor tell the student who has failed 

anything of much use to him in avoiding further failure...[60]

e. mentor-inattention

The descriptions of how recipients of LORs perpetrate Mill's reification-fallacy in an attempt to

attach specific meanings to various phrases that the phrases themselves don't necessarily

denote[61] seem especially anomalous in a context in which the author may be a

department-chairman who may even concede that he has never had any contact with the trainees

about whom he has a duty to write LORs, and who lacks even what Albanes called

The episodic, fragmented, and...small amount of contact that clinical faculty have with 

students...(Albanes 653)[34] 

Albanes claimed that that circumstance 

...leaves them [raters] reluctant to make ratings that would call attention to students' performance

deficits...(Albanes 653)[34] 

In those circumstances, faculty-members' reluctance to make ratings of any sort, at all, would

bespeak their simple honesty. Yet, somehow, most faculty-members, whether in good conscience 

or not, rate their students and other trainees after clinical rotations and later when they write 

LORs for them. 

Kefalides affirms faculty-expectations and complains that insurance-rules newly require 

faculty-members to take care of patients and thus provide golden opportunities for clinical

teaching, which he seems to disparage.[62]

Cydulka et al present time-consuming, close observation of trainees as a startling new

departure.[63]

In the industrial setting, in which TQM arose, nobody could ever confuse the manufactured 

product with the worker whose efforts produce it. In another article,[64] Albanes did just that. He 

attempted to apply TQM to medical education but, in the process, he conflated students as human

beings with students as objects, products of the education-process and got his ideas twisted. As a 

result, in one section of his article, grading is good, while in another, it's bad. The very fact that 

Academic Medicine published his article indicates the likelihood that the thinking, among many 

academics, about rating and evaluating the performance of trainees and others is confused. 

f. self-fulfilling prophecy 

Bosk noted: 

One striking feature of the clinical judgment of residents is how easily the whole process may turn 

into self-fulfilling prophesy (sic)....good reputations exercise a protective or deviance-reducing

effect while bad ones generate a destructive or deviance-amplifying one. If a resident is considered 

trustworthy, moni<strong>to</strong>ring by attendings is decreased. Therefore, deficiencies are less likely <strong>to</strong> be 

discovered. Conversely, if a resident is suspect, moni<strong>to</strong>ring increases. Convinced that they are 

there for the finding, an attending is more likely <strong>to</strong> find evidence of sloppy work. When found, 

these only increase surveillance, which again increases the probability of mistakes. Clearly 

suspicion does not create residents who are unfit-- after all, something creates the suspicion. 

Nonetheless, being suspect is for a resident a very vulnerable and demoralizing position. Not only 

that, being above suspicion gives a fair amount of protection, especially when mistakes need not 

be seen as innocent error. Given these dynamics, it is not surprising that those who fall on the 

short end of evaluation (or their attorneys) often characterize it as arbitrary and capricious.[65]

Strangely, when a physician so abuses ancillary personnel that they lose their self-confidence in an 

analogous manner, he becomes a “disruptive physician,”[66] fit only for expulsion. Yet, in the 

setting of medical education, such abuse is <strong>to</strong>lerable, even cus<strong>to</strong>mary. 

g. absence of evidence-basis

Rating/evaluation is particularly vulnerable <strong>to</strong> charges of resting on an inadequate evidence-basis: 

...In perusing the folders of the residents in the training program that I studied, I found only one 

evaluation that mentioned a specific incident. This leads me <strong>to</strong> suspect that residents who are 

dismissed from programs could easily argue that their “due-process rights” were violated, which 

raises a very thorny issue. Surgery <strong>to</strong> a large degree rests on peer trust, and it is unclear what 

degree of formal, concrete evaluation is consistent with that trust. (12)[67] 

Two pediatric core-curricula have come out for emergency-pediatrics,[68,69] one for pediatric 

interventional cardiology,[70] one core-content inven<strong>to</strong>ry for adult emergency-medicine[71] and a 

retrospective inven<strong>to</strong>ry of diagnoses encountered in internal-medicine residency.[72] 

Other core-curricula may exist in other specialties, yet, in no specialty, do recommendations for 

the LOR relate in any manner <strong>to</strong> specific elements of any defined core-curriculum. If the LOR is

supposed <strong>to</strong> reflect job-performance, what justification is there for omitting any mention of 

job-performance criteria, delineated in national core-curricula, core-content statements or 

otherwise? 

In all the literature on LORs and evaluations, none that I've seen suggests including the cumulative

statistics on clinical outcomes of patients under the care of the subject of the LOR. Yet, without 

such evidence of actual job-performance, in terms of numbers and proportions of patients saved, 

lost and improved, the rest is nothing. 

The medical literature is replete with accounts of physicians' inaccurate performance-appraisals of 

their colleagues and of trainees.[73-86] Those accounts render the idea of entrusting 

performance-appraisal of anyone <strong>to</strong> physicians patently absurd. 

Perhaps the most concrete, objectively verifiable category is “procedural skills.” The trainee either 

succeeds at the lumbar puncture by obtaining CSF or not, succeeds in intubating a patient or not. 

No performance-evaluation I've ever seen has any space devoted <strong>to</strong> citing the specific number of 

procedures that the men<strong>to</strong>r observed the subject performing, far less a score-card that documents 

how many he performed successfully and in how many he failed. What would be the distinction in 

a rating of 3/10 vs. a rating of 7/10 in the category, “procedural skills?” One might imagine that 

the evaluee succeeded in 30% or 70%, respectively, of the procedures he performed during a 

clinical rotation. Did a month-long rotation provide even ten opportunities for each of, say, three

trainees to perform lumbar punctures or intubations? It seems unlikely.

If the trainee's score was low, where is the documentation of the help that the men<strong>to</strong>r provided <strong>to</strong> 

the trainee <strong>to</strong> improve his performance? I've never seen it and the proposed standard <strong>Letter</strong> of 

Recommendation (SLOR), in emergency-medicine, omits mention of anything like it.[54,63] 

Where is the documentation of the progress in the trainee's score during the month? Was his score 

2 of 10 at the beginning of his rotation and 8 of 10 at the end? I've never seen anything like that, 

either, possibly because a trainee may have one opportunity <strong>to</strong> perform one clinical procedure in a 

month, if he's lucky. The cus<strong>to</strong>mary evaluation is post hoc, delivered as a summative accolade or 

condemnation, long after the trainee can do anything about his scores. 

What was the quality of his performance? What did he do <strong>to</strong> succeed in the procedure, if he 

succeeded? Did he fracture teeth of patients he intubated? If so, how many each? How many of 

the patients on whom he performed a lumbar puncture required a blood-patch afterwards <strong>to</strong> stem 

post-procedure CSF-leakage? I've never seen any such evaluation in writing. 

How many of the procedures that the evaluee performed did the men<strong>to</strong>r personally observe? 

SLOR has no space for any such entry[54,63]. Is the rating based on a “general impression” of the 

evaluee's procedural skill, as an intrinsic trait, derived from rumor? If so, upon what specific 

evidence or criteria did the evalua<strong>to</strong>r base the score that he assigned? 

Is one of the men<strong>to</strong>r's considerations his own anxiety over giving the evaluee a big head? Did the 

evaluee, rather, need a low score <strong>to</strong> give him a harsh dose of “reality?” If so, on what evidentiary

criteria did the evalua<strong>to</strong>r base his concept of “reality,” such that a low rating would give the 

evaluee a dose thereof and, in some sense (what sense?) improve the evaluee's outlook? Did the 

evalua<strong>to</strong>r apply his dose of reality <strong>to</strong> all evaluees consistently? If not, why not? Did he condemn 

those whom he personally disliked (perhaps because they asked him <strong>to</strong>ugh questions <strong>to</strong> which the 

men<strong>to</strong>r felt embarrassed at not knowing the answers) and favor those whom he personally liked 

(perhaps because they never asked him any <strong>to</strong>ugh questions)? If he applied his dose of reality 

consistently, without regard <strong>to</strong> the evaluee's actual performance (which the evalua<strong>to</strong>r may never 

have observed -- my consistent experience, throughout “training”), isn't that practice arbitrary, 

unreasonable and capricious, i.e., a manifestation of chaos and irrationality, in a setting where 

rational thought is supposed <strong>to</strong> prevail? 

Most important, what does the rating score tell the relevant candidate about what he should do <strong>to</strong> 

improve his performance? 

One might argue that success in procedures, like medicine, itself, is a s<strong>to</strong>chastic matter[5,6], i.e., 

that some procedures fail even in the best of hands and some succeed even in the worst of hands, 

in whatever sense of “best” and “worst” one might choose <strong>to</strong> apply. I would reply that that's 

correct. Success in procedures is, at least to some extent, what Deming terms a lottery,[5] no

question. Given that truism, what's the point of making “procedural skills” a ratable category, in

the first place? 

h. glittering generalities 

Greenburg et al wrote, in relation <strong>to</strong> LORs: 

Brevity and generality...come across as distinctly negative features, causing the reader <strong>to</strong> 

wonder...whether the writer actually knows the applicant. (197)[87] 

Yet, the evaluation-criteria, upon which LORs are most often based, rely upon brevity and 

generality, presumably on the assumption that evaluators' general opinions of candidates reflect the

truth. No evidence supports that proposition and my personal observation is that it is false. If 

brevity and generality be negative features of a LOR, how can the same features be acceptable in 

the underlying evaluation-criteria? 

Bosk terms vague indices of “quality” of the candidate, such as “general medical knowledge,” 

“rapport with staff” and the like “essentially-contested concepts.”[67] They are summative, 

glittering generalities, intended <strong>to</strong> make the evaluation-form brief, that have no necessary 

evidentiary relation to the subject-physician's actual performance, to his clinical acumen or to

clinical outcomes of his patients. 

In the field of academic emergency-medicine, Harwood et al referred <strong>to</strong> various elements of 

evaluative jargon: 

Of the applicants submitting SLORs <strong>to</strong> our EM residency program, 49% or more received the 

superlative response in the categories of "commitment," "work ethic," and "personality." In

contrast, only 35% of the applicants received the superlative response regarding their "differential 

diagnosis ability." The "global assessment" operated similarly, with 37% of the applicants 

receiving the superlative response. The least common superlative response was the "match 

rating," with only 23% of the applicants receiving a "guaranteed match." 

These data can serve as a reference for both interpreting and writing SLORs. The data show that 

EM applicants least commonly receive the superlative response in the categories of "differential 

diagnosis ability," "global assessment," and "match rating," making these key categories for 

residency selection committees. These results suggest that authors can justifiably evaluate most 

applicants in the highest categories of personal traits, but that they should be more discerning with 

assessing "differential diagnosis ability," "global assessment," and "match rating."[88] 

Harwood et al seem to pretend that ratings are objective facts, rather than what they are,

subjective appraisals based on the author's claimed but unverifiable (and probably negligible) 

familiarity with the trainee. 

In the foregoing passage, Harwood et al urged institutional authors of SLORs (standardized 

letters of reference) <strong>to</strong> manipulate their performance-appraisals in various sections of the SLOR, 

on the premise that the evidentiary basis of such appraisals doesn't matter, with a view to pandering

<strong>to</strong> the selection-committees for emergency-medicine residencies and manipulating the outcomes 

of their deliberations over trainee-selection. Harwood et al seem to ignore the possibility that

fewer authors rate trainees highly in the glittering-generality categories of "differential diagnosis

ability," "global assessment," and "match rating" than in the glittering-generality categories of

"commitment," "work ethic," and "personality" because the authors could be inappropriately

ungenerous with their ratings in the first three categories. Those categories are the clinically

oriented ones, and authors would very likely believe that they weren't performing

their watchdog/gatekeeper function properly (to keep bad doctors from practicing

emergency-medicine) unless they had condemned a certain quota of trainee-candidates with each

batch that left their respective institutions. Heaton depicts that practice in terms of what he calls

the “basic process”: 

The Basic Process 

The basic process of an individual in a hierarchy is <strong>to</strong> avoid mistakes. . . individuals are rated by 

their errors, for their tasks are predetermined. There is no premium for achievement outside 

assigned hierarchical tasks but there are penalties for every shortfall from perfection. 

The normal distribution in a hierarchy includes a percentage of failures, so grading on a curve 

means that students making the most mistakes are given failing grades. . . When failing students 

are eliminated, those next above them succeed <strong>to</strong> the failing category. The rule of thumb is for 

one-third <strong>to</strong> leave between the fifth and twelfth grades, . . . The next third become 

failure-threatened, declining in rank regardless of effort or improvement. Apprehension then 

blocks learning so there can only be unskilled repetition. Thus this middle third is taught 

submission and place within the hierarchy. . . (32)

Is it true or false that for every winner there has <strong>to</strong> be a loser? False – there has <strong>to</strong> be a continuing 

supply of losers if a winner is <strong>to</strong> keep on winning. In schools, grading on a curve . . . means that 

the A student needs an F student at the other end of the normal distribution; then annually or 

more often, when the F student is eliminated or drops out, another student must be pushed in<strong>to</strong> 

the failing position. . . companies seem <strong>to</strong> survive only by establishing a large pool of marginal 

workers who can be picked up when needed and dropped when business is slow. . . 

. . . Schools in exclusive suburbs do not produce so many failures . . . Instead they assume their 

students are mostly in the upper half of a normal (59) distribution. . . there are schools which 

assume their students are mostly in the lower half of a normal distribution. In one vocational high 

school in New York, no teacher could give a grade above C without special approval by the 

principal. In a ghet<strong>to</strong> high school a department head <strong>to</strong>ld me that only one student in a . . . class of 

twenty was capable of learning. I knew the students were capable and interested, but sure enough, 

nineteen dropped out and failed. . . grading in schools is a process that produces failures and 

accomplishes rejecting. 

Winners are cus<strong>to</strong>m-made, but losers are mass-produced. . . (62)[52] 

Pursuing a similar line of “reasoning,” raters in medical education may believe that they can 

enhance the reputation and credibility of their respective institutions by making a big show of 

being “<strong>to</strong>ugh graders” and the clinically oriented rating criteria are the most attractive targets for 

that sort of behavior. 

i. distortion from “confidentiality,” under perpetual tension.

In both edi<strong>to</strong>rial peer-review and performance-appraisal/LOR, the thesis is that the rater cannot 

deliver an “honest and accurate”[89] rating unless he labors under the protection of 

“confidentiality,”[61,89,90] meaning that everybody, except for the subject, gets to see the rating.

Decades of organizational oppression, in which ratees had <strong>to</strong> <strong>to</strong>lerate the in<strong>to</strong>lerable, finally 

prompted the US Congress <strong>to</strong> enact the enlightened Buckley Amendment, a federal law that 

requires schools that receive federal funding <strong>to</strong> make student records available for viewing by 

parents and the students themselves if they are 18 or older.[89,91] Yet, even though

federal law mandates that the trainee should be able to see his rating, those in medical education

prefer the old oppression. They recommend that the organization should compel the trainee to

“waive” his legal right, under the Buckley Amendment, to see his rating, in the interest of

“honesty,”[1,90] “authenticity,”[89] and “objectivity” (read: freedom of the rater to give an

adverse rating with the security of knowing that the ratee cannot learn of it and therefore will not have

grounds for retaliation) of the letter and of its “value”[1] to the receiving institution.

In edi<strong>to</strong>rial peer-review, the author receives the rating but not the identity of the rater. In 

performance-appraisal, the trainee knows the identity of the rater but, ideally, not the rating. 

Yet, dissenting voices resist such organizational oppression, and for good reason, in my view:

...One of our wisest and most experienced faculty members, <strong>Dr</strong>. Douglas Lindsey, offers <strong>to</strong> write 

letters for every medical student. He writes them honestly. He then shows the student the letter. It 

is up to the student to decide whether it is sent. This is an excellent policy of a great teacher.

Unfortunately, it is probably unique. (320) 

A few students have been asked <strong>to</strong> sign statements that they have not seen their reference letters. 

This is ridiculous and unenforceable. Don't sign...it is common practice for students <strong>to</strong> be asked <strong>to</strong> 

sign a waiver of their right <strong>to</strong> request <strong>to</strong> see referee letters. If you are forced in<strong>to</strong> this type of 

situation, you may have <strong>to</strong> sign it and hope for good letters. If possible you do want <strong>to</strong> see those 

letters before they go out. (321)[53] 

The practice of “confidentiality,” under compulsion and under false color of “honesty,” in the 

rater, may thus spawn duplicity and dishonesty in the ratee. 

On the subject of so-called honesty, one naturally wonders whether the “honesty” will be 

even-handed or biased. A few obvious questions spring <strong>to</strong> mind: 

Will the rater be as “honest” about how he himself prioritized the needs of trainees lower than his 

own personal needs and therefore devoted insufficient time to those in need of guidance to

foster their improvement as he claims <strong>to</strong> be about the shortcomings of those trainees, whom the 

rater thus abandoned? Will he be honest about his own failures <strong>to</strong> implement and incorporate core 

content of his specialty (e.g., emergency-medicine) in training and rating his trainees? Will he be 

honest about his own failure <strong>to</strong> provide daily feedback <strong>to</strong> trainees <strong>to</strong> keep them informed of what 

specific performances they needed <strong>to</strong> demonstrate the following day <strong>to</strong> show improvement? Will 

the rater be honest about his own failure <strong>to</strong> document daily or weekly improvement or otherwise 

and reasons therefor in his rating-comments? Will the rater be honest about his own failure <strong>to</strong> 

define behavioral educational objectives,[92] <strong>to</strong>ward which the trainees might strive? Will the 

rater be honest about how he exchanged gossip with other faculty about various trainees and 

thereby formed a collective, united, homogenized opinion of trainees, instead of expressing his

own opinion, based on his personal observations? Will the rater be honest about casting the 

evaluation in terms only of the trainee's failures, not in terms of systematic failures of the 

institution? 

Tonesk provides a twisted view of objectivity vs. subjectivity and authority-relationships in 

medical education.[93] 

In the realm of edi<strong>to</strong>rial peer-review, Walsh et al found referees more considerate and courteous 

toward authors if their names were attached to their reports.[94] What's wrong, therefore, with

accountability in LORs? 

Flacks wrote: 

. . . maintaining the confidentiality of the contents of evaluations and letters of reference would 

[not] improve the quality of such assessments. On the contrary, . . . I've become convinced . . . 

that the reverse is true. New state laws and university regulations have opened the process . . --

and the results . . . have been good. Faculty members and departments now have the opportunity 

<strong>to</strong> respond <strong>to</strong> negative reviews . . . timely . . . and with some understanding of the arguments that 

may merit rebuttal. The review process is now more cumbersome, but it is . . . less Kafkaesque. . . 

A new law that would require full disclosure has passed the legislature but is being contested in 

the courts by the University. I am quite sure that the . . . motivation for the University's resistance 

is not so much <strong>to</strong> protect the quality of the review process as it is <strong>to</strong> protect the discretionary 

powers of the administration. 

. . . The need for open evaluations is not simply that such openness promotes due process. The 

due process argument applies <strong>to</strong> all institutions in their treatment of workers. . . open access helps 

ensure that each member can benefit from critical feedback and also ensures that criticisms are 

made in a way that is responsible <strong>to</strong> canons of scholarly objectivity. . .[95] 

Fashing wrote: 

. . . If we allow people <strong>to</strong> require anonymity as the price for the exercise of candor and 

professional responsibility, then surely we encourage a pernicious form of cowardice. Are our 

sensibilities so delicate that they cannot contend with the requirement <strong>to</strong> render our negative 

judgments openly and honestly with whatever risk that entails? And if they are, should we 

continue <strong>to</strong> encourage such delicacy or should we begin <strong>to</strong> require a modicum of courage <strong>to</strong> go 

with our “candor”? I for one believe we should. . . the requirement of anonymity raises serious 

questions of credibility in its own right. Why should we believe that anonymity is the price of 

honesty any more than that it is an opportunity for dishonesty? . . . 

. . . there are compelling reasons for confronting intellectual, professional, and . . . personal 

differences as a minimal requirement for the development of any serious sense of community. This 

will no doubt produce some unpleasant moments in the context of whatever conflicts surface, but 

what group that constitutes a serious community, or perhaps more importantly, a community <strong>to</strong> 

be taken seriously, especially in intellectual terms, is without conflict? That consensus about all 

issues is unnecessary <strong>to</strong> the maintenance of a healthy community is recognized by all but the most 

resolutely conservative members of the academy. To address such differences and <strong>to</strong> resolve 

them, or in the case of intellectual differences, <strong>to</strong> provide a climate in which debate and conflict of 

opposing ideas are a catalyst for intellectual growth and creativity, strikes me as the essence of 

academic community and a primary requirement for intellectual and academic freedom. In this 

sense disclosure should promote rather than retard intellectual excellence. (222)[96] 

6. The ultimate goal: communion of “top” talent in “top” institutions

The counter-argument <strong>to</strong> the foregoing is that the most competitive programs have <strong>to</strong> select the 

most competitive trainees. 

Why? Even assuming that the selection-process be valid, a dubious proposition, what ultimate 

utility is there in aid of quality of patient-care in concentrating “<strong>to</strong>p talent” in “<strong>to</strong>p institutions?” 

Isn't that just elitism run amok? What about spreading the wealth, if that's what it is (a dubious

proposition), around a little? Wouldn't the “non-competitive” trainees gain from exposure <strong>to</strong> “<strong>to</strong>p

institutions” and wouldn't “competitive trainees,” if they offer any genuine advantage over 

non-competitive trainees, be able <strong>to</strong> work their magic in institutions in more humble locations? 

I've interacted with finished physicians from a broad range of institutions and I'm constantly 

impressed with how alike they are. Physicians from Harvard, Yale and other Ivy League 

institutions are no great shakes and some of the most impressive come from the hinterlands. What 

was all the fuss about during education and training, then? 

7. Illustrative anecdote which is more typical than it should be 

When I worked as a civilian in the ER of the military hospital, Fort Stewart, GA, my military 

supervisor, a Major in the Army Medical Corps, liked me pretty well at first but seemed <strong>to</strong> dislike 

me more and more as time went on, evidently because of conflicts that swirled around me. 

He criticized my handwriting, so I brought in a word-processor <strong>to</strong> write up my charts and make 

them optimally legible. He didn't s<strong>to</strong>p me from doing that but, long after I'd left there, I obtained 

copies of my personnel-records, including documentation of his commentary on the episode. 

Without explaining what he intended, he put an exclamation point after the statement, “he brought in a

word-processor!” I gather he disapproved of my constructive response to his criticism, yet he

suggested no alternative. What did he want from me? Did he expect me suddenly to develop

handwriting like his? He never explained. 

In perhaps the emblematic episode of my tenure there, I pissed off one of his fellow Army-officers 

by calling him in at night to attend a female patient of his, whom I wanted admitted for evaluation and

monitoring of chest-pain that I suspected had a cardiac origin. He chewed me out for

disturbing his sleep and wanted me <strong>to</strong> release her home without forcing him <strong>to</strong> come in and 

examine her. He claimed <strong>to</strong> know her so well that he KNEW that her chest-pain was not cardiac 

but, instead, was from her COPD. The rules, not of my making, required him <strong>to</strong> come in and 

examine a patient whom the ER-physician suspected of requiring admission. Under protest, he 

came in, chewed me out some more in front of nurses and other personnel and released her home. 

A few weeks later, her cardiac catheterization at Fort Gordon revealed severe coronary artery 

disease. I had committed an unpardonable sin: being right when an Army-doctor was wrong.

It's not as if this were a diagnostic coup. It could hardly have been more stereotypical. She had 

chest-pain, reminiscent of cardiac chest-pain. It was bread-and-butter medicine. She needed 

admission for the sake of safety. The officer fulfilled his paper-duty under protest by getting out 

of bed and examining the patient. He failed in his duty <strong>to</strong> admit her for moni<strong>to</strong>ring. 

I pissed off a pediatrician by calling him in at night a few times <strong>to</strong> attend febrile infants who I 

thought might need admission, as a posted directive required me <strong>to</strong> do. Whether the patient's 

condition is serious enough <strong>to</strong> warrant admission is a matter of judgment and, if I think the patient 

needs admission, the pediatrician may disagree. I assumed that <strong>to</strong> be in the realm of disagreement 

among reasonable people. He evidently disagreed, even with that principle, probably because he 

was the pediatrician on call and fulfilling his duty required him <strong>to</strong> exert unwelcome effort. He 

impugned my “judgment,” as a tactic in his campaign. He sent all the patients I referred <strong>to</strong> him

home, possibly as a way of accumulating incompetence-points against me. Those incidents 

illustrate the principle, universal, in my observation, that hospital-personnel pay abundant 

lip-service <strong>to</strong> concern for quality of patient-care but their actions bespeak only concern for their 

own convenience. 

Thus, I accumulated “complaints” against me but the hospital never preferred any charges against 

me or offered me a peer-review hearing at which to rebut such charges, presumably because the

notion would have been absurd, even <strong>to</strong> Army-brass. 

Hypothetical charge 1: diagnosing chest-pain as cardiac which later proved <strong>to</strong> be cardiac but 

pissing off Army-Officer in the meantime by calling him in at night <strong>to</strong> do his duty. Charge 2: 

complying with posted hospital-directive by calling in Army-Officers in relevant specialties 

“unnecessarily,” and thereby pissing them off, on nights when they're on call <strong>to</strong> attend patients, 

possibly appropriate for admission, and <strong>to</strong> render their opinions. 

Instead of taking a formal route, they chose a typical bureaucratic route: my supervisor completed 

consecutive evaluation-reports in secret and never discussed them with me. The personnel-records 

I obtained years later exhibited an unmistakable halo-effect: In all components, from “medical

knowledge” and “rapport with staff” <strong>to</strong> “health” and “appearance,” the ratings descended in 

parallel from 9 or 10 of 10, steadily downward, <strong>to</strong> end at about 3 or 4 out of 10, under the 

influence of multiple complaints of pissing off Army-physicians by asking them <strong>to</strong> do their duty. 

That is, each evaluation-cycle, my supervisor assigned all components the same rating: all 9s, all 

8s, all 7s, all 6s and so forth. Yet, my “appearance” and “health” were verifiably the same 

throughout that time: fine and stable. He presented not a scintilla of evidence of my deteriorating 

health, for example, yet he “documented” its deterioration in his numerical ratings. This person 

had an MD-degree! 

Thereupon, enough poor pseudo-ratings had accumulated against me <strong>to</strong> “justify” my termination 

and <strong>to</strong> provide an ironclad “paper-trail,” in case I should have decided, at some point, <strong>to</strong> contest 

my termination legally. 

LORs that I requested from Fort Stewart stated only the dates of my employment there but made 

no mention whatever of my performance, e.g., my thoroughness and my diligence, for the benefit 

of patients, against the odds of dysfunctional military-bureaucratic obfuscation. Those LORs 

illustrate a fundamental principle of all LORs: LORs accommodate the needs of the ambient 

power-hierarchy, not of the subject thereof. That makes them inherently inaccurate. If the 

academic is honest with himself, he will concede that academic power-hierarchies exhibit similar 

manifestations. 

I could provide other anecdotes with similar import but I've gone on far <strong>to</strong>o long already, so I'll 

s<strong>to</strong>p. 

When will decision-makers cop themselves on <strong>to</strong> the inherent unfeasibility of rating human 

beings?
