If people carry out this task one pair at a time and are not able to go back and adjust prior preferences, often they wind up producing a chain of preferences that violates transitivity. When questioned about this they often will "correct" some items. When asked the cause of the "inconsistency" they will often say something in the following spirit. The person prefers sweeter things to less sweet things, but the person also prefers "firmer" fruit to less firm fruit, and when one "factors" in what may be "conflicting" ways of viewing the same objects from different points of view, they realize they were not "consistent" about the whole collection of fruits.
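A transitivity violation of this kind can be detected mechanically once the pairwise choices are written down. The following is a minimal sketch, assuming choices are stored as ordered pairs; the fruit names and the function name `transitivity_violations` are illustrative and not taken from any study.

```python
from itertools import permutations


def transitivity_violations(prefers):
    """Return triples (a, b, c) where a>b and b>c were recorded but a>c was not."""
    items = {x for pair in prefers for x in pair}
    return [
        (a, b, c)
        for a, b, c in permutations(sorted(items), 3)
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers
    ]


# Hypothetical pairwise choices: (x, y) means "x was preferred to y".
choices = {("apple", "pear"), ("pear", "peach"), ("peach", "apple")}

# A non-empty result means the chain of choices is not transitive.
print(transitivity_violations(choices))
```

An empty list means the recorded choices are consistent with a single ranking. A non-empty list points to the specific triples a person might want to "correct."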

Human behavior is complex to analyze, and the motivations humans have for their behaviors are complex. Often the way that information is framed about a decision that has to be made results in different reactions. Not only have people interested in mathematical psychology studied "utility" and preference behavior from a theoretical and behavioral point of view, they have also looked at game theory and the way that people play games.

No mathematician has done more to obtain insight into these matters than R. Duncan Luce. Luce was a student at MIT, where he received a doctorate in algebra, and went on to publish papers on a wide range of topics related to the interface of mathematics and psychology. One of his most important publications was a joint book with Howard Raiffa called Games and Decisions.

This book, though written decades ago, remains a wonderful exposition and survey of game theory and decision theory as well as other topics related to mathematical psychology. While lots of progress has been made since the book was written, it is still a remarkably cogent and valuable entry point to these fields. When I first read this book it was like reading a novel, filled with exciting ideas which dealt with matters that I had never been exposed to or thought about before. Although the theory of games has antecedents in the famous work of John von Neumann and Oskar Morgenstern, relatively soon after this seminal work the ideas were latched on to by researchers in both political science and psychology.

Psychology after all is concerned in part with what behavior people exhibit in various situations. Since there are games which have paradoxical properties, perhaps experiments with such games would provide a window into the way rational people behave when faced with choices that "stress" rationality. So let me begin with an example of a paradoxical game, or more precisely a family of games, often called Prisoner's Dilemma. The game in Table 1 has two players, Row and Column, who independently choose one of the actions available to them (Row I or Row II for Row, Column 1 or Column 2 for Column).

The entries in the table are the payoffs, which are known to both players, who must make their choices without consulting each other. How should Row play? The troubling feature of this game is that whatever Row's opponent does, it seems to make sense to play Row II, because for either choice by Column, Row's payoff is higher. Because of the symmetry of the game, Column reasons equivalently. If Row and Column both reason in this way, they play Row II and Column 2, with the seemingly pleasant result that they always win.

However, they would win much more if they always played Row I and Column 1 respectively. The "paradoxical" aspects are strengthened when the payoffs in Table 1 are altered to the values shown in Table 2. As before, "rationality" suggests Row play Row II and Column play Column 2 because doing this does better for each regardless of the choice of their opponent. However, now rational play yields regular losses if this game is played over and over again for both players when a sizable gain can accrue for playing Row I by Row and Column 1 by Column. However, suppose after 33 plays of the game Column reasons this way: If my opponent Row plays Row I on this last play, and by now he "trusts" that I will play Column 1, I can get a modest improvement in payoff, and "punish" my opponent by playing Column 2.

Some players will reason this way. This suggests that behavior in playing the game in Table 2 may depend on:

a. How much the players "trust" each other.
b. How many times the players play the game, which may be once, exactly K times where K is more than 1, or a finite number of times but with no specific value after which play stops.

Thus, the way people behave when faced with games such as the ones in Tables 1 and 2 may depend greatly on many "variables" going beyond the actual numbers in the tables.
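The "rational" argument for Row II and Column 2 is a dominance argument, and it can be checked directly from a payoff table. The sketch below uses hypothetical payoffs chosen only to have the qualitative shape described for Table 2 (losses when both players choose the dominant action, gains when both choose the other); they are not the article's actual table entries, and the function names are illustrative.

```python
# Hypothetical payoffs as (row_payoff, column_payoff); NOT the values of Table 1 or Table 2.
PAYOFFS = {
    ("I", "1"): (3, 3),
    ("I", "2"): (-5, 5),
    ("II", "1"): (5, -5),
    ("II", "2"): (-1, -1),
}


def row_dominant_action(payoffs):
    """Return Row's action that pays strictly more against every Column action, or None."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    for r in rows:
        others = [o for o in rows if o != r]
        if all(payoffs[(r, c)][0] > payoffs[(o, c)][0] for o in others for c in cols):
            return r
    return None


def column_dominant_action(payoffs):
    """Return Column's action that pays strictly more against every Row action, or None."""
    rows = sorted({r for r, _ in payoffs})
    cols = sorted({c for _, c in payoffs})
    for c in cols:
        others = [o for o in cols if o != c]
        if all(payoffs[(r, c)][1] > payoffs[(r, o)][1] for o in others for r in rows):
            return c
    return None


r, c = row_dominant_action(PAYOFFS), column_dominant_action(PAYOFFS)
print("dominant play:", r, c, "payoffs:", PAYOFFS[(r, c)])
print("cooperative play: I 1 payoffs:", PAYOFFS[("I", "1")])
```

With these assumed numbers each player's dominant action leaves both worse off than joint play of Row I and Column 1, which is the paradox described above.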

A psychologist who was a pioneer in using the way people play games as a laboratory to understand behavior was Anatol Rapoport. With coauthors, he wrote the books The 2x2 Game and Prisoner's Dilemma. These books looked at the classification of two-person games, where each player had two choices of actions when the game was played, and discussed the results of experiments when these games were played under different circumstances.

For example, would men play Prisoner's Dilemma differently when playing against other men as compared with when they played against women? How did the relative sizes of the payoffs, and whether some of them were negative, affect the play of the game? There is a large literature, including some funded by the U.S. Defense Department during the "Cold War," that tried to get insight into the way paradoxical games play out. These include Prisoner's Dilemma and the even more "volatile" game known as Chicken, which can be used as a model for various confrontation games that occur in "real life."

There have been many attempts to convey measurement information using "scales" of different kinds. The person who popularized and theorized about this approach was the psychologist Stanley Smith Stevens, who tried to organize a "hierarchy" of scales for "measuring," which showed that richer information was obtainable for the "stronger scales" in the hierarchy.

The scales Stevens called attention to were: nominal, ordinal, interval, and ratio scales. His work was to some extent a reaction to work by the physicist N. Campbell, who argued that psychology could never carry out "measurement" in the same way that was done in a science like physics. Stevens reacted to this by trying to show that there were different "levels" of measurement. Even after further mathematical insight into measurement through axiomatic approaches, there continues to be discussion about "overselling" the level at which various quantities in psychology can be "measured."

Here in informal language are some of the commonly discussed "levels" of measurement. The underlying idea is to assign either a non-negative real number or a real number to some object that one encounters in the world outside of mathematics, in the hope of using these numbers to better understand the phenomena or objects one is studying.

So one might want to measure spiciness, length, intelligence, time, artistic ability, temperature, strength of a hurricane, or mass.

Nominal scale: Numbers are used as names only and don't connote size information. Example: Human chromosomes are named using numbers, hence, chromosomes 6 and 8.

Ordinal scale: Numbers are used to indicate size order, but one can't divide the numbers or subtract them with any meaning. One also has to be careful about whether higher numbers should be interpreted to mean a "stronger" or "weaker" signal. Example: In the Saffir-Simpson Hurricane Wind Scale, a storm at the 1 level is less severe than a storm with higher numbers. In some scales designed, for example, to have people rank movies, one uses from 1 to 5 stars, the higher number of stars indicating a "better" movie.

So typically regular moviegoers know that 4-star movies have been rated as "better" than 2-star movies. In some situations, however, number "1" is the best.

Interval scale: An interval scale is one in which the numbers used are such that the differences between the numbers at different values in the scale can be properly compared. Example: The most familiar example of an interval scale is temperature.

Whether one uses degrees Fahrenheit or degrees Centigrade, equal differences between readings correspond to equal differences in temperature. For example, when working with Fahrenheit degrees, there is the same temperature difference between 5 and 11 degrees as there is between 70 and 76 degrees. However, the 70-degree temperature is not 14 times as large as the 5-degree temperature.

When an interval scale is used, the 0 point on the scale is chosen arbitrarily.

Ratio scale: A ratio scale is one in which the numbers used are such that not only are the differences between the numbers at different points of the scale comparable, but one can also compare the results of dividing the numbers. Thus, 80 kilograms is twice 40 kilograms and 30 kilograms is twice 15 kilograms. Example: Mass and length are examples of things that can be measured on a ratio scale.
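The difference between interval and ratio scales can be seen by changing units. The sketch below, using the standard Fahrenheit-to-Celsius formula and an approximate kilogram-to-pound factor, checks that a change of temperature unit keeps equal differences equal but does not preserve ratios, while a change of mass unit preserves ratios. The function names are illustrative.

```python
import math


def fahrenheit_to_celsius(f):
    """Standard conversion; note the shift by 32 moves the zero point."""
    return (f - 32) * 5 / 9


def kilograms_to_pounds(kg):
    """Approximate conversion; a pure rescaling, so zero stays at zero."""
    return kg * 2.20462


# Interval scale: equal differences stay equal after changing units.
diff_low = fahrenheit_to_celsius(11) - fahrenheit_to_celsius(5)
diff_high = fahrenheit_to_celsius(76) - fahrenheit_to_celsius(70)
print("differences equal in Celsius:", math.isclose(diff_low, diff_high))

# ...but ratios are not preserved: 70 / 5 is 14 in Fahrenheit, not in Celsius.
ratio_f = 70 / 5
ratio_c = fahrenheit_to_celsius(70) / fahrenheit_to_celsius(5)
print("ratio in Fahrenheit:", ratio_f, "ratio in Celsius:", ratio_c)

# Ratio scale: 80 kg is twice 40 kg, and the same holds in pounds.
ratio_lb = kilograms_to_pounds(80) / kilograms_to_pounds(40)
print("mass ratio preserved:", math.isclose(ratio_lb, 2.0))
```

Because the Celsius ratio depends on where zero was placed, a statement like "14 times as hot" carries no meaning on an interval scale.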

Although it might not seem apparent at first glance, scale type has an important relationship with statistics. Suppose that one writes down the number of the chromosome that a list of genes associated with the development of cancers of different kinds is located on. If one computes the mean of these numbers what has one found out?

The answer is nothing. Since the number used for chromosomes is from a nominal scale, it makes no sense to compute the mean of these numbers. One can compute the mode for these numbers and this may provide some "insight." Thus, statisticians worry about what scale type the numbers they are being asked to obtain insight about are drawn from. Students are often asked to evaluate their teachers on a scale that runs from 1 to 5 with 5 being a better rating.

Does it make sense for one to compute the arithmetic mean of these ratings to determine "teacher quality"? For ordinal scales one can "legally" use the median of the scores but not the arithmetic mean. A teacher who receives 1's from half the class and 5's from the other half is different from a teacher who receives 3's from an entire class. When mathematicians (see below for some of these individuals) started to investigate scale type, it was realized that what Stevens had done was not complete, and that it helped to clarify whether the numbers in the scale were all real numbers or only non-negative real numbers.
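One way to see why the mean is suspect for ordinal ratings is to relabel the categories in an order-preserving way and watch which conclusions survive. The sketch below uses made-up ratings for the two teachers just described and an arbitrary relabeling (5 becomes 10); both are assumptions for illustration only.

```python
from statistics import mean, median_low

# Hypothetical ratings: half 1's and half 5's versus all 3's.
split_class = [1, 1, 1, 5, 5, 5]
uniform_class = [3, 3, 3, 3, 3, 3]

# An arbitrary order-preserving relabeling of the five categories.
RELABEL = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def relabel(ratings):
    """Apply the order-preserving relabeling to every rating."""
    return [RELABEL[r] for r in ratings]


# The means agree on the original labels but disagree after relabeling.
print(mean(split_class), mean(uniform_class))
print(mean(relabel(split_class)), mean(relabel(uniform_class)))

# The comparison of (low) medians gives the same answer before and after.
print(median_low(split_class) < median_low(uniform_class))
print(median_low(relabel(split_class)) < median_low(relabel(uniform_class)))
```

The equality of the two means depends on the particular numbers chosen as labels, whereas the ordering of the medians does not. `median_low` is used here so that the median is always one of the observed categories.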

Another issue that mathematics called attention to was which functions mapping the scale values to themselves were allowed. The controversy about scale types has emerged because building on Stevens' scale types, some individuals have argued that when one has modeled some behavioral phenomenon e. Interval scales allow one to use the arithmetic mean and standard deviation while the ordinal scale does not. For many people this blurring of the use of ordinal and interval scales has caused much confusion and allowed policy decisions regarding educational practices to be made on what looks like scientific evidence when, in fact, the critics argue that what has been done is not scientific.
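In the measurement literature the allowed functions mentioned above are usually called admissible transformations. A standard textbook summary for the four scale types, given here as background in notation that is not taken from the text, is:

```latex
\begin{align*}
\text{nominal:}  \quad & x \mapsto f(x), && f \text{ one-to-one} \\
\text{ordinal:}  \quad & x \mapsto f(x), && f \text{ strictly increasing} \\
\text{interval:} \quad & x \mapsto a x + b, && a > 0 \\
\text{ratio:}    \quad & x \mapsto a x, && a > 0
\end{align*}
```

The Fahrenheit-to-Centigrade conversion is of the interval form $a x + b$, while a change from kilograms to pounds is of the ratio form $a x$. A statistic is usually regarded as appropriate for a scale type when conclusions drawn from it do not change under that type's admissible transformations.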

The essential step in doing this rigorously was to show that addition of real numbers corresponds to something that one can do with masses in the real world: the process of combining masses can be modeled with an "operation" that corresponds to adding the real numbers associated with the masses.

This process can be done using the idea of a balance scale. In the discussion here, I am using English words in an informal way, and sometimes in mathematics these words are used in a technical way that gets blurred in an informal discussion. One will often hear a phrase like: we have a "test" that measures, say, artistic ability.

This connotes using numbers to measure ability in the same way that numbers measure mass. But no one seems to have successfully shown how to measure a quantity like artistic ability so that what one gets is an interval scale. To do this we would have to be able to say that if Jack, after taking a certain training session, increased his artistic ability from 30 to 40 while Mary increased her artistic ability from 50 to 60, these 10-point growths represent equal accomplishments.

However, no one has been able to show how to interpret growth of artistic ability in situations of this kind as numbers that can be added in a way that mirrors addition of real numbers. Using words like "intelligence" or "mathematical ability" or "being career-ready" may make some people believe that what is being done is on the same footing as measuring the distance from the earth to Mars compared with measuring the distance of the earth to the Moon.

Physics works in the sense that the time when an eclipse of the Moon will start in Providence, Rhode Island can be calculated with extraordinary accuracy. However, some people may be tempted to use a "measure of college readiness" with the same assurance that they can use "the eclipse will occur at p. Measurement theory as a special subdivision of mathematics is assigned the value 91C05 (a nominal scale) in the classification taxonomy of mathematics that is used by the American Mathematical Society.

Duncan Luce in the late s.

It is noteworthy that the backgrounds of these individuals were very diverse: logic, philosophy, educational philosophy, and algebra and game theory. The culmination of this line of research was the three-volume series of books by David Krantz, R. Duncan Luce, P. Suppes, and A. Tversky. David Krantz was an undergraduate mathematics major at Yale but got his doctorate in psychology from the University of Pennsylvania. More recently, important contributions to mathematical psychology have been made by Fred Roberts and Jean-Claude Falmagne.

In this look at the way mathematics and psychology have interacted I have just scratched the surface. I have not even mentioned whole areas of concern such as the way learning occurs. Much work has been done on this topic, notably by William K. Other important contributors were R. Bush and Frederick Mosteller. One way to get a good idea of the richness of what is going on in this field is to look at the contents of the articles in two premier journals of the field: Journal of Mathematical Psychology and British Journal of Mathematical and Statistical Psychology.

The fruitful collaboration between mathematics and statistics will no doubt grow and continue to get stronger.

This, as it happens, is not always obviously true; this point will be revisited in a later section. In some cases, alternative methods may be explicitly included as design elements in a testing situation. Seen this way, the use of different methods to assess a trait represents an attempt at triangulation, or, alternatively put, an attempt to establish that inferences about a targeted attribute are robust i.

Campbell and Fiske gave a related but slightly different motivation for the deliberate inclusion of different methods in their seminal article on multi-trait multi-method (MTMM) studies. If such a trait has truly been identified, and is indeed invariant to a variety of transformations in the incidentals of its assessment, different methods of observation should give consistent results. Conversely, when a particular method of observation is applied to the assessment of distinct traits, the fact that they share a method in common should not inflate their apparent association.

Thus convergent validity is seen as the extent to which different methods agree on the trait values of individuals, and discriminant validity is the extent to which different traits are empirically distinguishable, even when they are assessed via the same method. Clearly, the second of these concerns is unique to measurement situations in which more than one attribute is of interest i. Such situations are more common in some fields, such as personality, organizational, management, and marketing research, than in others, such as educational testing and experimental psychology. Within the former collection of fields, the potential biasing of observed associations among theoretically distinct traits seems to be thought of as the primary reason why one should be concerned with method effects e.

For example, in situations in which multiple raters score performances of individuals, the specific rater or raters to which one was assigned is generally considered an incidental, rather than essential, feature of the testing procedure.

Though steps are usually taken to maximize the interchangeability of raters e. Discussion of rater effects often takes place under the umbrella concept of facets of a testing procedure e. Within the context of generalizability theory Cronbach et al. In some other contexts, such as in many-facets Rasch measurement (Linacre), the concern is sometimes worded in terms of accounting for the fixed effects of facets on item difficulty e. The concept of local item dependence (LID), commonly encountered in the literature on item response theory (IRT), is also related to the concept of a method effect. Specifically, on a unidimensional test, LID is said to occur when responses to individual items share more in common with one another than just that their probabilities jointly depend on the latent variable.

The main concern that is usually given regarding LID is that failing to adequately model it will lead to overestimation of measurement precision reflected in, for example, upwardly biased reliability estimates and downwardly biased standard errors; e. Finally, the concept of measurement invariance e. Although invariance studies more commonly examine such things as the invariance of parameters across groups of persons e.

Here, concerns about methods may be worded in terms of construct bias, or the idea that a test may measure something different depending on its mode of delivery if group invariance does not hold. This hints at the possibility that methods may do more than simply introduce variance in observations over and above what can be attributed to variance in the measured attribute: they may, in fact, change the interpretation of the measured attribute.

Thus there is considerable variation in the conceptual vocabulary surrounding the concept of method-related dependencies in observations, and the motivations given for attending to such issues. This variation is intertwined with the semantics of the statistical techniques commonly employed in different fields to model method effects.

These models and their semantics now deserve a closer inspection. Distinct methodological traditions have arisen in various areas of research in the human sciences. Included in these traditions are often strongly institutionalized preferences for particular sorts of statistical models. Although recent advances in generalized latent variable modeling e.

By way of illustration, two popular classes of models will be described here, with particular attention to both their formal semantics and the manners in which they are commonly interpreted: MTMM confirmatory factor analysis (CFA) models and random-effects IRT testlet models. The correlated-trait correlated-method (CTCM) model is commonly presented as follows:. Under the assumption that T_j, M_j, and e_j are mutually uncorrelated, the implied covariance structure is:.
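In one common notation, the CTCM model for person $j$ and its implied covariance structure can be sketched as below. The loading matrices $\Lambda_T$ and $\Lambda_M$, the factor covariance matrices $\Phi_T$ and $\Phi_M$, and the unique-variance matrix $\Theta$ are assumed labels for illustration and are not necessarily the symbols used in Eqs 1 and 2.

```latex
\begin{align*}
\mathbf{y}_j &= \Lambda_T \mathbf{T}_j + \Lambda_M \mathbf{M}_j + \mathbf{e}_j \\
\Sigma &= \Lambda_T \Phi_T \Lambda_T^{\top} + \Lambda_M \Phi_M \Lambda_M^{\top} + \Theta
\end{align*}
```

Here $\mathbf{y}_j$ collects the observed indicators, $\mathbf{T}_j$ and $\mathbf{M}_j$ are the trait and method factors, and $\mathbf{e}_j$ holds the unique factors. The covariance expression has no cross terms because $\mathbf{T}_j$, $\mathbf{M}_j$, and $\mathbf{e}_j$ are taken to be mutually uncorrelated.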

This model, as a special case of linear confirmatory factor-analytic models more generally, models the conditional mean of the indicator variables, making it appropriate when said indicator variables are continuous.

When estimated using maximum likelihood (ML) techniques (as is common), it must additionally be assumed that these variables are normally distributed. Since responses to individual test questions can rarely be scored continuously, item-level data are almost never modeled using CTCM models.

This model, again like factor models more generally, also assumes that the latent variables are continuous. Formally, the CTCM model states that population-level variance in each observed variable has three causes, each of which is modeled as a person-level random variable: the trait and method dimensions, and a specific factor. There are no formal semantic differences in the modeling of the trait and method causes other than their specific patterns of loadings the fact that T_j and M_j are separated out in Eq.

However, in practice, it is common for such dimensions to be interpreted primarily as nuisance dimensions, where only population-level parameters are of interest (similarly to how unique factors are commonly interpreted). Partly in reaction to this, modified forms of the CTCM model have been proposed, such as models with orthogonal method factors (correlated-trait uncorrelated-method, CTUM).

The method factor in the CTUM model could only be interpreted as denoting those attributes of persons that are truly specific to a particular method of measurement, and in many cases it may not be clear exactly what such attributes would be. Another model proposed as an alternative to the CTCM model is the correlated uniqueness (CU) model (Marsh and Grayson), which drops the method factors entirely and allows the disturbances of observed variables that share a common method to correlate. There are a number of within-item multidimensional IRT models that have been developed that could be considered models for method effects.

As mentioned previously, the concept of LID shares a conceptual relation with the concept of method effects: LID occurs when variation in some subset of item responses shares more in common than just their common cause represented by the primary latent variable, and methodological similarities among items are an obvious possible source of such shared variance. Thus it could be said that method effects are one possible cause of LID, and, therefore, that models developed for LID may be used to model method effects. Various constrained versions of full information bi-factor models (Gibbons and Hedeker; see also Holzinger and Swineford) have been proposed to model tests with testlet structure, one of the more famous of which is Bradlow et al.

Rijmen, testlet response model. The bi-factor model and Bradlow et al. Their model is as follows:. A separate equation for the covariance structure analogous to Eq. In the case where slopes are estimated, there are various constraints that can be placed on the model for purposes of identification and interpretability: Bradlow et al.

The logit link function makes these models appropriate for ordinal indicator variables. The indicator variables are typically assumed to have Bernoulli distributions if dichotomous, and multinomial distributions if polytomous. As with all IRT models, it is assumed that the latent variables are continuous.
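For concreteness, a random-effects Rasch testlet model of the kind discussed above is often written as follows for a dichotomous response of person $j$ to item $i$. This is a sketch in assumed notation ($\theta_j$ for the primary latent variable, $b_i$ for item difficulty, $d(i)$ for the testlet containing item $i$, and $\gamma_{j d(i)}$ for the person-specific testlet effect) and is not necessarily the exact form of Eq. 3.

```latex
\[
\mathrm{logit}\, P\bigl(y_{ij} = 1 \mid \theta_j, \gamma_{j d(i)}\bigr)
  = \theta_j - b_i + \gamma_{j d(i)},
\qquad
\gamma_{j d(i)} \sim N\bigl(0, \sigma^2_{\gamma_{d(i)}}\bigr)
\]
```

The testlet effect $\gamma_{j d(i)}$ is shared by all items in the same testlet and is typically taken to be uncorrelated with $\theta_j$, so items within a testlet are more strongly related to one another than $\theta_j$ alone would imply.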

If the items within a testlet loaded onto more than one substantive dimension, this model would be equivalent to the CTUM model discussed earlier, albeit with a logit rather than identity link. Wang and Wilson b also refer to Eq. In terms of its formal semantics, the model is even more general than that. A random testlet or facet effect is equivalent to a second dimension of individual differences that causes variation in a subset of item responses, and thus, induces stronger dependence amongst those items than would be expected due to their primary common cause(s).

There is no reason why the logic of this cannot extend to other sources of variation in specific subsets of items on a test, such as the fact that different subsets of items represent different sources of information e. The previous two sections have illustrated how the formal semantics of models for method effects depend on the particulars of model specifications in terms of constraints, numbers of dimensions, choice of link functions, etc.

Informally, as discussed previously, interpretations of method effects depend largely on differences among research traditions in vocabulary, in the subject matter typically dealt with, and in the motivations commonly given for why method effects or LID are worthy of attention. These differences, combined with differences in the way the models are commonly presented (illustrated by the different choices of symbols and the switch between vector and scalar notation between Eqs 1 and 3, as well as the differences in baseline assumptions concerning the link function and the number of substantive dimensions in the model), may give rise to the perception that these models and their associated semantics are entirely dissimilar.

This is, however, not the case: the models share a high degree of commonality at both the syntactic and semantic levels. It is easier to see the connections between the models if one starts with a more general model, and then derives the earlier models. Using the notation of Skrondal and Rabe-Hesketh, a generalized latent variable model can be formulated thusly:. This response model can be combined with a structural model:. Although this account leaves out many details, it summarizes the essence of a generalized latent variable model.
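A simplified sketch of such a response model and structural model, in notation that follows the general style of generalized latent variable modeling but should be read as an assumption rather than as the exact equations referred to above, is:

```latex
\begin{align*}
g(\boldsymbol{\mu}_j) &= \mathbf{X}_j \boldsymbol{\beta} + \Lambda \boldsymbol{\eta}_j \\
\boldsymbol{\eta}_j &= \mathbf{B} \boldsymbol{\eta}_j + \Gamma \mathbf{w}_j + \boldsymbol{\zeta}_j
\end{align*}
```

Here $g(\cdot)$ is a link function, $\boldsymbol{\mu}_j$ is the conditional expectation of the responses of person $j$, $\mathbf{X}_j \boldsymbol{\beta}$ is a fixed part, $\Lambda$ holds the loadings, $\boldsymbol{\eta}_j$ collects the latent variables, $\mathbf{w}_j$ holds observed covariates, and $\boldsymbol{\zeta}_j$ is a disturbance. Under this reading, an identity link with $\boldsymbol{\eta}_j$ containing trait and method factors gives a linear factor model of the MTMM kind, and a logit link with loadings fixed to one and item difficulties in the fixed part gives a Rasch-type testlet model.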

On the other hand, the random-effects Rasch testlet model Eq. Path diagrams such as these are traditionally silent as to the nature of the link function represented by the arrows — in classical CFA models, the arrows represent linear effects i. These two omissions aside, however, it can be seen that in both cases variance in each indicator is influenced by two primary dimensions of individual differences, one of which is typically interpreted as denoting an attribute that the test was designed to measure, and the other of which denotes sources of variation associated with a particular method.

Once one is aware that (a) there is no upper bound on the number of indicator variables that load onto each dimension and (b) there can be multiple substantive dimensions in IRT models just as in linear CFA models, it becomes clear that both models can easily be represented by the same path diagrams, with the exception of the absence of indicator-specific unique factors or error terms in the model with a non-linear link function.

Thus, perhaps despite appearances to the contrary, latent variable models employed in different research traditions share deep syntactic connections, and, accordingly, share much of their formal semantics as well. In addition to the norms of statistical and interpretive practice associated with particular research traditions, thinking about method effects is also affected by beliefs (many of which may not be explicitly recognized by researchers) regarding the meaning of measurement itself. There is not a single consensus definition of measurement accepted by all human scientists, or indeed by all physical scientists, and debates over the meaning of measurement will likely not see resolution any time soon.

Obviously, unclear semantics about measurement can propagate to unclear semantics about any measurement-related concept, including but not limited to method effects. It is worth reviewing some of the most influential lines of thought concerning measurement, and exploring how each of them has contributed to discourse on method effects, sometimes in contradictory ways. The various ways of thinking about measurement can broadly be categorized as either empiricist or realist.

The term empiricism can refer to a broad range of philosophical positions; they share in common a commitment to direct observation as the basis for knowledge (though what counts as observation is a perennially unsettled issue). Empiricism has been a major force in shaping Western thinking about science and natural philosophy since at least as far back as Aristotle, and standard accounts of the history of Western science emphasize how, over the centuries, empiricist lines of thinking have dovetailed with other views in epistemology (particularly those based on rationalism).

In the early twentieth century, the movement known as logical positivism synthesized many ideas from classical empiricism along with then-current advances in the philosophy of language and mathematics. Positivism was associated with a strong emphasis on direct observation as the basis for knowledge and a categorical rejection of metaphysics; statements regarding unobservable theoretical entities or forces were only regarded as meaningful if such statements could be linked to observations in a clear and consistent manner.

There are two major strands of thought on measurement that are consistent with much of positivist thinking. The first is representational measurement theory (RMT), which is characterized by the stance that measurement is the construction of morphisms between numerical relations and empirical relations e.

## Method Effects and the Meaning of Measurement

The second is operationalism, which is characterized by the stance that the meaning of any theoretical concept is exhausted by the operations undertaken to measure instances of the concept (Bridgman). Representationalism is regularly described as the mainstream position on measurement in the general literature on the philosophy of science. It has also had a significant influence on thinking about measurement in the human sciences; however, with the exception of the relatively small body of literature in mathematical psychology from which the theory originated, most of this influence has been indirect.

Representational measurement theory holds that to measure is to construct a representation of an empirical relational system via a numerical relational system. On this view, the starting point for measurement is the determination of empirical relations amongst objects e. Michell, ; Borsboom, Once empirical relations are determined, numbers are assigned to empirical entities in such a way as to preserve the qualities of their empirical relations.

Relational systems can possess different sorts of structures, and the particular sort of mapping of empirical onto numerical relations determines the scale properties. This is an example of the aforementioned indirect influence of RMT on thinking about measurement in the human sciences. One of the principal reasons that RMT has not been more widely influential in the human sciences is that standard accounts of the theory have difficulty accounting for the role of measurement error.

RMT holds that relations must be directly observable; in contrast, statistical models employed in the human sciences such as those discussed in the previous section take observations to be error-prone reflections of latent variables with idealized structures. One could formulate this hypothesis in at least two ways. In the first case, the method acts as a perfect conduit running from true relations in the world to sensed relations. In this case, one could either hold that the world does not exist apart from our perceptions of it, or that its existence is simply irrelevant.

Where, then, do methods play a role in RMT? Without an account of how observations can contain error, it seems the only answer can be that either (a) the method of measurement plays a trivial role in being a perfect conduit from the real to the sensed world, or (b) the very concept of a method of measurement is unnecessary, as measurement is simply the mapping of directly experienced relations onto numerical relations. In either case, if two different methods of measurement yield two different relational systems, they cannot be said to be measuring the same attribute.

Operationalism (or operationism; Bridgman) shares with RMT a focus on observables as the basis of knowledge and a rejection of metaphysics. Operationalism was proposed as a semantic doctrine about the meaning of theoretical terms rather than a theory of measurement per se: it holds that the meaning of theoretical terms is exhausted by the particular operations undertaken to observe them, which means that the results of a particular set of operations (or measurement procedure) are interpreted as measurements simply by fiat.

Operationalism was originally proposed as a form of extreme epistemic humility in reaction to the upending of seemingly basic concepts such as length by the special theory of relativity: Bridgman felt that one of the reasons that it had been so difficult to see that the Newtonian notion of absolute time and space was flawed was that our theoretical terms came with too much baggage. Thus, asking why the lengths of objects seemed to be different depending on the speed with which they were traveling was already an ill-formed question, in that hidden within it was a false assumption about the nature of space.

Operationalism has since been almost uniformly rejected as irreconcilable with general scientific practice and vocabulary. Following the collapse of logical positivism and an associated general retreat from extreme forms of empiricism, many scholars became increasingly willing to accept that the interpretation of concepts like temperature and intelligence outruns their associated measurement procedures; in fact, it is very difficult to make sense of both scientific and lay discourse about such concepts without this belief.

Operationalism had a strong influence on psychology, and in particular on behaviorism, especially through Boring. More generally, and again like RMT, the concept of measurement error is ill-fitting with operationalism: if the results of applying a procedure are by definition a measurement of the theoretical term, what is there to be in error about?

If one were willing to accept that repeated applications of the same measurement procedure under the same conditions could yield different results, and to define the theoretical term as the average of a series of replications of a procedure rather than the outcome of a single application, one could define measurement error as random deviations from a true long-run average. In fact, this is exactly how measurement error is defined in Classical Test Theory, a point argued by Borsboom. Moreover, given our lack of access to the true counterfactual of running the same procedure under the same conditions, it is unclear why results should actually be expected to differ over identical applications.
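The definition of error as deviation from a long-run average can be illustrated with a short simulation. This is a minimal sketch under loud assumptions: the true score of 50, the error standard deviation of 2, the normal error distribution, and the helper name `replicate` are all invented for illustration, and the very possibility of identical replications is exactly what the paragraph above calls into question.

```python
import random
import statistics

def replicate(true_score, error_sd, n, seed=0):
    """Simulate n hypothetical replications of one procedure as
    observed = true_score + random error (assumed normal, mean zero)."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n)]

# Assumed values, chosen only for illustration.
scores = replicate(true_score=50.0, error_sd=2.0, n=10_000)

# The attribute value is *defined* as the long-run average of replications.
long_run_average = statistics.fmean(scores)

# Measurement error is then each replication's deviation from that average.
errors = [x - long_run_average for x in scores]

print(round(long_run_average, 2))
print(round(statistics.pstdev(errors), 2))
```

By construction the errors average to zero around the long-run mean, so on this definition "error" is a property of the series of replications rather than of any independently existing attribute.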

If results differ because the conditions are themselves different, then, according to the doctrine of operationalism, one does not have measurement error — one has distinct theoretical concepts. Thus, at least in their original, strict forms, the two major lines of empiricist thought on measurement have little room for the concept of a method effect, as it is commonly interpreted in human science measurement.

As soon as one has formulated the idea that an attribute of an object or person can be observed in more than one way, it seems one has also assigned an independent identity to the attribute, and embraced at least some version of a realist stance on measurement. The term realism also refers to a broad range of positions; what they share in common is the belief that a natural world exists independently of observation. Scientific realism further proposes that at least one aim of science is to promote the acquisition of knowledge about this natural world.

In the context of measurement e. It should be noted that while the strict forms of empiricism discussed in the previous section are either antirealist or simply arealist, there is nothing inherently contradictory about a commitment to observation as the basis of knowledge and the belief that a natural world exists independently of observation; thus, realist philosophies are often compatible with more moderated forms of empiricism.

There are various possible ways to conceive of the relationship between a measured attribute and the outcomes of a measurement procedure.

Borsboom et al. For example, a mercury thermometer measures temperature because variation in temperature (the attribute) causally produces variation in the expansion of mercury in precisely calibrated glass tubes (the observations). The link of causality from the attribute to the outcomes of the procedure justifies the inference from those observed outcomes back to the unobserved attribute.

The validity of such a procedure is clearly threatened to the extent to which anything besides the targeted attribute can causally produce variation in the outcomes of the procedure. That is, if there is some other attribute of objects e. Another source of such variance would be actual variance in methods, insofar as the outcomes of different methods applied to the same objects may have different expectations.

Decoupling method-specific variance from attribute variance under a realist framework thus requires nothing more than knowing what attribute is the target of measurement, and how variance in this attribute is transmitted to variance in the outcomes of the measurement procedure. If it is possible to give a complete account of the causal processes leading from variation in the attribute to variation in observations, any additional causes of variation in observations can be clearly identified as attribute-irrelevant, and threats to the validity of the measurement procedure.

To the extent to which such additional sources of variation are associated with the particular method of observation used, they could be termed method effects. This account raises an important conceptual point about method effects: inherent in the idea of a method effect is that, at least in principle, more than one measurement procedure i. However, it may not always be clear to what extent an attribute is conceptually independent of the methods of measurement, especially in human science applications.

The definition of temperature as an attribute of objects or systems is now very precise, and thermodynamic theory can specify in great detail the causal mechanisms that lead from variation in temperature to variation in the outcomes of a range of specific measurement procedures (including but not limited to the aforementioned mercury thermometer).

Arguably, there are no cognitive theories so precisely developed, and the causal mechanisms that link attributes to observations are rarely if ever specified in such detail. More generally, it is not always clear to what extent the method of observation is truly attribute-irrelevant, and to what extent the methods of observation help inform or even construct the meaning of the attribute. Though such interpretive difficulties have been acknowledged by a number of scholars, including Cronbach e. In part, this may be because researchers are intuitively working from a metaphysical position that might be termed constructive-realism rather than a stricter form of realism that holds that attributes exist fully independently of human-designed measurement procedures.

The concept of realism applied to psychological attributes is often taken to imply that the attributes in question are hypothesized to exist independently of human intentionality. That is, stating that an attribute exists or is real is taken to imply that it exists in an observer-independent (ontologically objective) fashion, just like supposedly physical attributes such as temperature and mass. This, in turn, is often interpreted as implying physical i. However, it is not necessary for psychological attributes to be ontologically objective for them to be real components of the natural world.

Briefly, psychological attributes can (a) be to some extent ontologically subjective, in that they involve conscious phenomena with subjective first-person ontology, and (b) to some extent be composites delineated by contextually and pragmatically driven linguistic frames of reference, rather than being natural kinds (or natural attributes, as the case may be) in the classic sense.

From this perspective, what constitutes a method effect is a contextualized and pragmatic issue, and methodological features of the very same procedure may be considered method effects or not relative to the conception of the attribute(s) being measured by the test. A contemporary example comes from the renewed interest on the part of the U.

On performance tasks, students may be asked to, for example, perform short experiments or produce specified products. Suppose that there is some degree of disparity between the results of performance events and multiple-choice items concerning the relative levels of knowledge of the students. It could be said that the two testing modalities each require a different set of method-specific ancillary skills in addition to the attribute intended to be measured.

Interestingly, much of the past and current rhetoric around the use of performance events in educational assessments is consistent with both possibilities. Still more troublingly, both possibilities could be true at once, in which case method-specific ancillary skills would be inseparable from attributes of interest. More generally, it may often be the case that a method of measurement is completely confounded with an intended target attribute. For example, any instance in which an attitudinal or motivational attribute is assessed entirely via positively worded Likert items is a situation in which each item response is potentially caused both by the attribute and by any ancillary person characteristics that influence how the person responds to positively worded Likert items.

Under typical circumstances, it will not be possible to estimate such a model. A unidimensional model will therefore likely be fit to the data, and it is likely to produce inflated estimates of the degree of dependence of the observations on the latent variable, insofar as an additional method-related source of dependence amongst the items is confounded with dependence due to the common causal attribute. Such situations are rarely discussed in terms of method effects, perhaps largely because they usually cannot be modeled as such, but the conceptual problem with method effects is very much present, and all the more intractable for being un-modelable.
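The inflation described here can be seen in a small simulation. The sketch below is an illustration only, not a fitted psychometric model: it uses the raw correlation between two items as a rough proxy for their shared dependence, and the loadings (0.6 on the attribute, 0.8 on the method factor), the normal distributions, and the function names are all assumptions made for the example.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def inter_item_correlation(method_loading, n=5000, seed=1):
    """Simulate two items that share one attribute and, optionally,
    one method-related person characteristic (e.g. a response style)."""
    rng = random.Random(seed)
    item1, item2 = [], []
    for _ in range(n):
        attribute = rng.gauss(0.0, 1.0)  # target attribute (assumed normal)
        method = rng.gauss(0.0, 1.0)     # shared method factor (assumed)
        item1.append(0.6 * attribute + method_loading * method + rng.gauss(0.0, 1.0))
        item2.append(0.6 * attribute + method_loading * method + rng.gauss(0.0, 1.0))
    return pearson(item1, item2)

r_without = inter_item_correlation(method_loading=0.0)  # attribute only
r_with = inter_item_correlation(method_loading=0.8)     # attribute + method

print(round(r_without, 2), round(r_with, 2))
```

When every item shares the same method, the data contain nothing that separates the two sources of covariation, so a single-factor account would attribute the higher correlation entirely to the attribute.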

Thus, in any given measurement situation, clarity regarding method effects depends on clarity regarding the ontological status of the attribute being measured. As was illustrated in the previous section, if an attribute is not taken to exist independently of a measurement procedure, the very concept of a method effect is incoherent; on the other hand, if it is taken to have independent existence, a coherent account of how the choice of methods influences the outcomes of the measurement procedure, and the selection of appropriate psychometric models, depends on being able to specify a priori what the attribute itself is and is not, and how the methods of measurement serve to transmit variance in the attribute into variance in observations.

Borsboom , p. However, the formal semantics of models employed in different human science research traditions are in fact quite similar. Such a hypothesis may be compatible with a range of shades of realism, including versions of constructive-realism that allow for the possibility that the existence of an attribute is not independent of human intentionality. However, strictly interpreted antirealist theories, such as those derived from severe forms of empiricism popular in the early twentieth century, are not compatible with the concept of a method effect.

If theoretical attributes were truly nothing more than the operations undertaken to measure them (operationalism), or if measurement were nothing more than the construction of morphisms from directly experienced empirical relations to numerical relations (representationalism), or indeed, if measurement were truly nothing more than the assignment of numerals to objects according to a rule (Stevens), method effects would be a non-issue. Measured attributes must exist independently of a specific measurement procedure if method effects are defined as sources of variance beyond what is attributable to the measured attribute.