Alex Biedermann, Silvia Bozza, Franco Taroni, Paolo Garbolino

Abstract

In this article, we review and analyse common understandings of the degree to which forensic inference of source—also called identification or individualization—can be approached with statistics and is referred to, increasingly often, as a decision. We also consider this topic from the strongly empirical perspective of PCAST (2016) in its recent review of forensic science practice. We will point out why and how these views of forensic identification as a decision, and empirical approaches to it (namely experiments by multiple experts under controlled conditions), provide only descriptive measures of expert performance and of general scientific validity regarding particular forensic branches (e.g. fingermark examination). Although relevant to help assess whether the identification practice of a given forensic field can be trusted, these empirical accounts do not address the separate question of what ought to be a sensible, or ‘good’ in some sense, (identification-)decision to make in a particular case. The latter question, as we will argue, requires additional considerations, such as decision-making goals. We will point out that a formal approach to qualifying and quantifying the relative merit of competing forensic decisions can be considered within an extended view of statistics in which data analysis and inference are a necessary but not sufficient preliminary.

It is not uncommon to hear of nonsense about conclusions being ‘scientifically proved’. —Chernoff and Moses (1959, p. 10)

1. Introduction

1.1 Is forensic identification (only) a case of statistics?

The mindset of many forensic scientists associates forensic identification of source with statistics. According to this view, forensic identification of source—also sometimes referred to more briefly as identification or individualization (Champod, 2000)1—can benefit from statistics.
A more stringent view would consider that forensic identification is, by its nature, a statistical procedure. This, however, might be criticized as being a restrictive view. It is interesting to note in this context that at the first ICFIS conference, held in Edinburgh (2–4 April 1990), then running under the shorter title International Conference on Forensic Statistics, Dennis Lindley noted: ‘Our topic might better be described as “forensic inference”, rather than “forensic statistics”, since not all evidence is based on statistics (or data)’ (Lindley, 1991, p. 83). According to Lindley’s account, the questions that arise in forensic science invoke, but do not reduce to, statistics. This connects to the view expressed by Evett according to whom ‘people call this [the interpretation of evidence] statistics, but it is not actually statistics, it is inference, it is reasonable reasoning in the face of uncertainty’.2 In the above quote, Lindley uses the term ‘statistics’ in the focused sense of analysing data and using them to help people revise their beliefs about propositions about which they are uncertain. On other occasions, Lindley favoured a broader definition that involves data analysis only as a first step: ‘… the first task of a statistician is to develop a (probability) model to embrace the client’s interests and uncertainties. It will include the data and any parameters that are judged necessary. Once accomplished, the mechanics of the calculus take over and the required inference is made’ (Lindley, 2000, p. 318). Inference, however, is not the end of the matter; Lindley also noted that ‘[i]f decisions are involved, the model needs to be extended to include utilities, followed by another mechanical operation of maximizing expected utility. One attractive feature is that the whole procedure is well defined and there is little need for ad hoc assumptions’ (Lindley, 2000, p. 318). 
Lindley refers here to the logical extension of probability to decision theory, a point that is found systematically in his writings (e.g. Lindley, 1985, 2014). In the legal literature, decision theory was raised some 50 years ago (Kaplan, 1968). At the time, decision theory was in the course of becoming a core area in statistics, as noted in Chernoff and Moses (1959):

Years ago a statistician might have claimed that statistics deals with the processing of data. As a result of relatively recent formulations of statistical theory, today’s statistician will be more likely to say that statistics is concerned with decision making in the face of uncertainty. Its applicability ranges from almost all inductive sciences to many situations that people face in everyday life when it is not perfectly obvious what they should do. (p. 1)

In this article, we will discuss forensic individualization from the point of view of this extended, decision-oriented definition of statistics. The reasons for this are twofold. The first is that forensic individualization cannot be resolved by inference alone, as argued previously by Stoney (1991).3 The second is that the consideration of forensic individualization as a decision—using elements of decision theory—is a novel, contemporary development and raises new points of discussion (Cole, 2014), mainly regarding applicability. We will consider several of these discussion points in this article, but leave more technical developments (e.g. Biedermann et al., 2008, 2016) aside, except for short summaries in Appendices A and B. Specifically, we will analyse and discuss the conceptual insights that derive from the proposed decision-theoretic perspective with respect to the conclusions and recommendations of PCAST (2016) in its recent review of current forensic science practice, especially PCAST’s call for an empirical approach to scientific validity—a call reiterated recently in an AAAS report (Thompson et al., 2017).
This article is organized as follows. Section 2 will address the following points: the distinction between the formal properties of decision theory and accounts that focus on the empirical study of how people actually make decisions (Section 2.1); and formal approaches to decision analysis in individual cases, with and without taking into account uncertainty through probability (Sections 2.2 and 2.3). This outline will allow us to state and comment on the formal properties of decision theory—the theory focusing on the logic of decision—in a broad perspective, clarifying conceptual demarcations with respect to other decision approaches. In particular, we will point out in what sense these properties allow one to define precisely, qualitatively and quantitatively, what may generically be called the ‘goodness’ of decisions. The latter term is otherwise often used in a more colloquial sense, and hence left undefined. Discussion and conclusions are presented in Section 3, with an emphasis on the questions that arise when relating the formal concepts of decision theory to the applied context of forensic individualization. We will argue that our decision-theoretic account of forensic individualization is not intended to constrain forensic practice, but to provide all discussants with a powerful analytic instrument that favours insight, improves understanding and helps articulate what is at stake in the forensic individualization process.

2. Formal properties of forensic decisions and forensic decision analysis

2.1 Distinguishing decision empiricism from the logic of decision

The notion of decision has become a popular word in forensic science, not least because some branches have given it the prominent role of labelling their conclusions. For example, experts in forensic identification or individualization (e.g.
of fingermarks) now refer to their conclusions as individualization decisions.4 While this provides evidence of the field’s continuous endeavour to rethink its principles and practice, critical commentators (e.g. Cole, 2014) argue that the renaming of friction ridge mark examiners’ conclusions as decisions was a mere change of label, leaving aside fundamental developments in the understanding of expert conclusions in formal decision-theoretic terms (e.g. Biedermann et al., 2008). Although this is a trenchant criticism, it should not be interpreted as a discouragement of the promotion of research in this area. We can distinguish two aspects related to forensic uses of the word decision, referred to here as decision empiricism and the logic of decision, respectively. Below, we provide a brief sketch of each in turn (see also Table 1 for a summary). Decision empiricism, as it is understood here, places a strong focus on, first, recognizing that what forensic expert reporting ultimately amounts to is decision-making. Secondly, decision empiricism focuses on studying—by observation—how human experts actually make decisions. The ‘how’ of observable decision-making is explored, however, to varying depths, and often the study reduces to merely recording factual output without actually trying to understand how the scientists arrived at given outputs (i.e. their reported conclusions). For example, in its most basic form, the empirical study involves the ‘testing’ of a procedure (under controlled conditions), as applied by a scientist, and recording the scientist’s performance during this task. This is typically done when developing a new method or technique. On a more elaborate level, decision empiricism extends to a behavioural perspective that has led to novel avenues of interdisciplinary research with areas such as cognitive (neuro-)science (Dror, 2015), generating new and valuable knowledge about the conditions and factors that influence human decision-making.
Broadly speaking, such research aims to improve decision-making in the sense of helping people make the ‘right’ decisions, and hopefully more often. An idea underlying this perspective is that the expert decision tasks refer to determinations5 (e.g. individualizing or not a person of interest [POI]) for which there is a ground truth against which particular expert conclusions can be compared (Ulery et al., 2011). The crucial point here is that such research strives to characterize the performance of experts6 across multiple instances—under controlled conditions—of a particular forensic expert task (e.g. fingermark examination).7 Clearly, such information is valuable for assessing questions such as whether experts, including the methods and techniques they use, are able to do what they claim to do, and whether experts are better in their area of expertise than lay persons.8 The answers to such questions will also help assess whether an expert ought to be trusted and be assigned to a particular scientific task in a given case of interest. However, and this is an essential point, the overall performance of an expert—as monitored across repeated exercises (tests) under controlled conditions—is not a direct measure of the ‘goodness’ of a particular decision (to be) made in a given case at hand. Stated otherwise, general performance indicators of an expert, including any technical system or device that is used during the analysis process, do not answer the question of how ‘good’—or accurate (when an underlying truth-state can be considered)—an individual decision is. Nor do expert performance measures answer the question of how to actually make a ‘good’ decision in any individual case that an examiner may face. These questions fall into the domain of the logic of decision, as outlined below.

Table 1 Summary of main differences between the notions of decision empiricism and the logic of decision

Decision empiricism:
- Empirical study of human decision behaviour
- Comparison with ground truth (controlled conditions)
- Focus on performance across multiple exercises (decisions)
- External descriptive viewpoint on experts (or a method)

Logic of decision:
- Focus on the logical structure of the decision problem
- Unknown state of nature (ground truth)
- Focus on the relative ‘merit’ of the individual decision
- Analytical viewpoint of the individual decision-maker

The logic of decision is the second aspect related to the forensic use of the word decision that we consider here (Table 1). Instead of taking an external point of view as in the empirical approach mentioned above (i.e. looking at how people behave in the face of particular decision tasks), the logic of decision reverses the viewpoint and focuses on a personalized perspective: it asks you (the reader, us, or any other person in their own way) to imagine yourself in the position of an individual decision-maker facing the problem of choosing one from a list of feasible decisions, also sometimes called alternative courses of action. Stated otherwise, instead of looking at multiple decisions made by one or several experts, and comparing each of these decisions (e.g. expert conclusions in fingermark examination) with the respective ground truth, the logic of decision relates to decision-making under uncertainty9 in the individual case. It asks the question: given the basic elements of your (our, anybody’s) decision problem, what is the best possible course of action to choose? A defining feature of the kind of situations dealt with here is that the true state of affairs is unknown, and not observable, which introduces uncertainty about the decision outcome to be obtained. For example, in real cases, it cannot be known with certainty whether a given POI, or an unknown person, is the source of an examined fingermark. By making an individualization decision in such a situation we must accept that, potentially, our conclusion amounts to a false identification.
The question of interest thus becomes: how can a decision-maker—in any given case—qualify10 and compare the relative ‘goodness’, or merit, of rival courses of action when their outcomes cannot be known with certainty? Later sections of this article will elaborate further on this question. We will present and compare formal ways to approach it. The above two accounts of the notion of decision are neither original nor new. They are elementary and largely uncontested, yet—interestingly—ongoing discussions in forensic science and the law, in many instances, do not properly distinguish between the two perspectives. Many of the current controversies over the thoroughness (e.g. validity) of forensic science, as evidenced by the reports of the President's Council of Advisors on Science and Technology (PCAST, 2016), and previously the NAS (National Research Council, 2009), focus on aspects of decision empiricism—notably performance in multiple experiments under controlled conditions. Posing this challenge to forensic science theory and practice is laudable and necessary, but it is not the end of the matter, because it deals only with a general measure of how trustworthy an expert and/or an employed method is. As noted above, it is a completely different matter to devise a coherent approach to decision-making in an individual case, and to qualify the relative merit of the rival decisions available in a particular instance. Forensic literature on this topic is rather scarce.11 Throughout this article, we will argue that decision empiricism and the logic of decision are related, and that the former is an essential,12 but not a sufficient, preliminary for the latter. Sections 2.2 and 2.3 outline stepwise features of standard approaches to decision analysis in the individual case.
2.2 Decision-making in the individual case (I): deciding without considering uncertainty about states of nature

Recognizing that the study of how to coherently approach decision-making in the individual case is a topic worthy of investigation, we now turn our attention to different decision methods. As a first approach, consider one that invokes a limited number of elements, namely decisions, states of nature and a measure for the desirability or undesirability of decision consequences. Note, however, that for the time being, we leave aside an important feature of our decision problem, i.e. uncertainty—expressed by probability13—about the states of nature. The method to be considered in this section is known as the minimax method (e.g. Chernoff and Moses, 1959). This decision rule has been developed extensively in game theory, but those developments will not be considered here, mainly because the assumption of an active adversarial player—who will react to the decision-maker’s choice—does not correspond to a feature of our decision problem. Here, we consider only the nonprobabilistic minimax decision rule for individual decisions. While we do not endorse this method for the problem of individualization, for reasons explained in due course, the method is useful for providing a formal description of some, but not all, features of a decision problem as seen from the viewpoint of the individual decision-maker. It will help us make a series of prima facie reasonable formalizations that we can use to connect the analysis with the reality of forensic decision-making. We use the term ‘state of nature’ here as a synonym for proposition or, more generally, one version of an event of interest. Suppose two propositions, denoted by θ1 and θ2, short for the propositions that a given POI is the source of a trace found on a crime scene, and that an unknown person is the source, respectively. Let the decisions (i.e.
available courses of action) be d1, individualize the POI, and d2, do not individualize the POI.14 This leads to the basic representation in terms of a decision matrix (Table 2), with Cij denoting the consequence of deciding di when the state of nature θj holds.

Table 2 Decision and loss matrices, with d1 and d2 denoting, respectively, the decisions to individualize and not to individualize a POI. The states of nature θ1 and θ2 denote, respectively, the proposition that the POI is the source of the trace and that an unknown person is the source of the trace. The notation L(Cij), with i, j = {1, 2}, represents the loss associated with the consequence Cij, i.e. when taking decision di and the state of nature θj holds

                                  States of nature: The trace comes from ...
Decisions                         ... the person of interest (θ1)    ... an unknown person (θ2)
Decision matrix:
  individualize (d1)              C11                                C12
  do not individualize (d2)       C21                                C22
Loss matrix:
  individualize (d1)              L(C11)                             L(C12)
  do not individualize (d2)       L(C21)                             L(C22)

As a last ingredient of the analysis, qualify the decision consequences Cij in terms of their undesirability, using the concept of loss, written L(Cij) for short. In particular, assume what is known as a 0–1 loss function, and assign zero losses to a correct individualization (i.e. consequence C11) and a correct nonindividualization (i.e. consequence C22). This translates the idea that nothing is ‘lost’ by making accurate decisions (e.g. deciding d1, individualizing the POI, when in fact this person is the source of the trace, θ1). Further, assign the loss 1 to the worst consequence,15 C12, i.e. deciding d1, identifying the POI, when in fact θ2 is true (i.e. an unknown person is the source of the trace). This leaves us with the consequence C21, a missed individualization. Assume that this consequence is less adverse than a false individualization (C12), so that we can assign a loss L(C21) < 1. A qualitative assignment of this kind will suffice for our discussion.16 Table 3 summarises all loss assignments.

Table 3 Loss matrix, with d1 and d2 denoting, respectively, the decisions to individualize and not to individualize a POI. The states of nature θ1 and θ2 denote, respectively, the proposition that the POI is the source of the trace and that an unknown person is the source of the trace. The maximum loss for each decision is marked with an asterisk

                                  States of nature: The trace comes from ...
Decisions                         ... the person of interest (θ1)    ... an unknown person (θ2)
Loss matrix:
  individualize (d1)              0                                  1*
  do not individualize (d2)       L(C21) < 1*                        0

To select a course of action, the minimax method proceeds as follows. Start by highlighting, for each decision, d1 and d2, the worst consequence. In Table 3, this is shown by marking the highest loss value that may be incurred for each decision. In the case of decision d1, individualizing the POI, this is the loss L(C12) = 1 for a false identification. For decision d2, not individualizing the POI, this is the loss L(C21) < 1 for a missed individualization. Next, the minimax method says to choose the decision that minimizes the maximum loss over the different states of nature. Stated otherwise, the method says to look at the worst consequence(s) associated with each decision, and then choose the decision(s) implying the minimum loss if the worst case occurs. Applying this method to our running example of forensic individualization would mean always deciding d2, not individualizing the POI.
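The minimax rule can be made concrete with a short sketch (our illustration, not part of the original analysis). The numeric value 0.5 below is an assumed placeholder for the unspecified loss L(C21) < 1 of a missed individualization:

```python
# Loss matrix in the spirit of Table 3.
# Decisions: d1 = individualize, d2 = do not individualize.
# States: theta1 = POI is the source, theta2 = unknown person is the source.
losses = {
    "d1": {"theta1": 0.0, "theta2": 1.0},  # L(C11) = 0, L(C12) = 1 (false individualization)
    "d2": {"theta1": 0.5, "theta2": 0.0},  # L(C21) < 1 (assumed 0.5), L(C22) = 0
}

def minimax(losses):
    """Choose the decision whose worst-case (maximum) loss is smallest."""
    worst = {d: max(row.values()) for d, row in losses.items()}
    return min(worst, key=worst.get)

print(minimax(losses))  # -> 'd2': minimax always declines to individualize
```

Note that the choice of 0.5 is immaterial: any L(C21) < 1 makes d2 the minimax decision, which is precisely the point made in the text.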
Proceeding in this way would ensure that, should one’s determination be erroneous, the loss incurred would be minimal. It is minimal with respect to the loss associated with the worst case (consequence C12) of the alternative decision, d1, set to 1. Is this method a feasible approach for forensic individualization? Intuitively, it may seem attractive to say that one strives to ensure that decision-making avoids excessive losses, but for forensic purposes, following this principle would be the end of individualization. Completely avoiding the possibility of incurring the worst consequence can only be achieved by never selecting the individualization decision d1. But this does not map to current decision policies, which consider individualizations among the admissible conclusions. Indeed, forensic practitioners do reach individualizations, and they do so—presumably—when there is a high probability that the POI is the source (θ1), given the information I available. The standard argument is that if Pr(θ1|I) is high, there is a high ‘prospect’ of the consequence being C11, a correct individualization. Since this consequence has a minimum loss, the decision to individualize (d1) does not appear to be an unreasonable course of action. Conversely, the argument also says that even though the decision to individualize, d1, may lead to a false identification—i.e. the worst consequence (and hence incurring the highest loss)—the decision d1 may be considered acceptable because the probability of the POI not being the source of the trace, Pr(θ2|I), is (sufficiently) low. It is readily seen that in the above paragraph, we have extended our considerations about decision-making to a further element that is not part of the decision method considered so far in this section: the probabilities for the states of nature. In essence, probabilities translate the extent of available information about propositions of interest.
Thus, we can see that the minimax method would lead us to ignore information, and even make the quest for and coherent use of information—which is at the heart of forensic science17 and the legal process—meaningless: whatever the amount of information that we could gather in support of the POI being the source of the trace (proposition θ1), compared to the alternative proposition of an unknown person being the source of the trace (proposition θ2), the minimax method would instruct us not to individualize. Thus, as an intermediate conclusion, we can note that one’s preferences among decision consequences (i.e. what one prefers) are not the only essential feature of decision-making. It is equally essential to consider one’s beliefs about the states of nature (i.e. what one thinks is the case), conditioned on all the information I available at the time when a decision needs to be made. A relevant question, thus, is how we can make this intuition formally precise. The next section will deal with this topic in more detail (see also Appendix A for technical developments). This intermediate conclusion is not meant, however, to dismiss the minimax method in principle. It may find consideration in contexts where forensic scientists either are not able, for practical reasons, to get acquainted with the relevant information for assessing their personal probabilities, or are not allowed, in principle, to have access to such information, so that they must take a decision under a ‘veil of ignorance’, so to speak (see Appendix B for an example).

2.3 Decision-making in the individual case (II): deciding while taking into account uncertainty about states of nature

In the previous section, we considered a decision criterion that is based on the highest loss that may be incurred for each decision.
However, we have concluded that this may be unsatisfactory to the extent that there is no clear acknowledgment of the fact that there is more than one potential outcome associated with each decision, and that we have case-specific probabilities and losses for those outcomes. Taking these elements a step further leads to the idea of weighing, for each decision, the loss of each consequence (not only a decision’s worst consequence) by the probability of its occurrence, and summing these weighted losses. The result is the mathematical concept of expected loss, which provides a formal criterion for the ‘goodness’ of a decision: it is a criterion that allows us to quantify the loss we ‘expect’ from a particular decision. In our case, we work with expected losses, rather than utilities (for which one can also compute the expectation), because we chose to characterize decision consequences in terms of losses. Further, since we can consider the expected loss for each decision, we have a criterion that allows us to compare different decisions, and then make a choice based on the result of this comparison. The decision rule here is to select the decision that offers the minimum expected loss. Note that this differs from the minimax rule discussed in the previous section, which chooses the decision that ensures minimal loss in the worst case. Although the minimax method can be applied easily in the contexts considered here, because it only requires one to look at the loss matrix and sort out the appropriate decision, the expected loss method is slightly more technical, but only at first sight. The reason for this is that it involves sums of probability-weighted losses. These may not be readily obtained without computational effort.
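In practice, the computational effort is modest. The following sketch (our illustration; the probabilities and the loss 0.5 for a missed individualization are assumed numbers, not values from the article) applies the minimum-expected-loss rule to the running example:

```python
# Loss matrix as before: d1 = individualize, d2 = do not individualize.
losses = {
    "d1": {"theta1": 0.0, "theta2": 1.0},  # L(C11) = 0, L(C12) = 1
    "d2": {"theta1": 0.5, "theta2": 0.0},  # L(C21) assumed 0.5, L(C22) = 0
}
# Pr(theta_j | I): assumed posterior probabilities for the states of nature.
probs = {"theta1": 0.999, "theta2": 0.001}

def expected_loss(decision):
    """Sum of each consequence's loss weighted by the state's probability."""
    return sum(probs[t] * losses[decision][t] for t in probs)

best = min(losses, key=expected_loss)
print(best)  # -> 'd1': with these numbers, individualizing minimizes expected loss
```

Unlike minimax, this rule is sensitive to the available information: lowering Pr(θ1|I) sufficiently flips the optimal decision back to d2.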
However, as shown in the development in Appendix A, by choosing a suitable loss function and keeping the decision space limited (here, a 2 × 2 decision table), it is possible to confine the considerations to a comparison between the ratio of probabilities for the states of nature (i.e. odds) and the ratio of losses associated with adverse consequences. Specifically, applied to our example of forensic individualization, the criterion given in Appendix A (Equation 3.2) says to choose decision d1 (individualize) if and only if the odds in favour of the POI being the source of the trace are greater than the ratio of the loss of a false identification, L(C12), to the loss of a missed individualization, L(C21). In numerical terms, this criterion thus says that when the odds in favour of the POI, rather than an unknown person, being the source of the trace are, e.g. a thousand, then the decision to individualize (d1) is optimal if and only if the loss associated with a false individualization is less than a thousand times greater than the loss associated with a missed individualization. This way of stating the decision criterion has the advantage of avoiding the technicalities of determining expected values. Instead, thinking is directed towards comparing orders of magnitude of, on the one hand, probabilities and, on the other hand, losses associated with decision consequences.18 The above decision-theoretic account—also called the Bayesian decision-theoretic criterion (see Appendix A)—can be interpreted in the wider sense of capturing the essence of the common saying ‘the more is at stake, the more you ought to be convinced (or, sure)’, identifying ‘stakes’ with the relative losses of adverse decision consequences and ‘being convinced’ with the odds in favour of the proposition of interest (here, the proposition that the POI is the source of the trace).
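The threshold form of the criterion—decide d1 if and only if the odds on θ1 exceed L(C12)/L(C21)—can be sketched as follows (our illustration; the loss ratios in the examples are assumed numbers):

```python
def decide(odds_theta1, loss_false_ident, loss_missed_ident):
    """Bayesian decision-theoretic criterion in threshold form:
    choose d1 (individualize) iff odds(theta1) > L(C12) / L(C21)."""
    return "d1" if odds_theta1 > loss_false_ident / loss_missed_ident else "d2"

# Odds of 1000 in favour of the POI being the source; a false
# individualization judged 100 times worse than a missed one:
print(decide(1000, 100, 1))     # -> 'd1' (individualize)

# Same odds, but a false individualization judged 10,000 times worse:
print(decide(1000, 10_000, 1))  # -> 'd2' (do not individualize)
```

The two calls illustrate the saying quoted above: with the stakes (the loss ratio) raised, the same odds no longer warrant the individualization decision.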
By invoking the Bayesian decision-theoretic criterion, we merely formalize quantities and their relationships that we are already using. It provides us with a framework to clearly articulate, with rigorous and precise terms, the inevitable conceptual issues that underlie forensic individualization. Specifically, this account shows that probabilities regarding the propositions of interest (i.e. the POI versus an unknown person being the source of the trace) are a relevant consideration, though not in isolation, but in an interaction with the decision-maker’s preferences among adverse decision consequences. 3. Discussion and conclusions It has become increasingly popular to refer to forensic conclusions reported by scientists as decisions. For example, the report of the PCAST (2016) uses expressions such as ‘examiners’ decision’ (p. 9), ‘identification decisions’ (p. 100) and ‘experts’ decisions’ (p. 101). In scholarly work, the term ‘forensic decision making’ (Dror, 2017) can be found. Once it is admitted that expert conclusions are decisions, the natural consequence seems to be that forensic scientists—including the methods and techniques they are using—should be subjected to empirical testing, in order to see how ‘good’ scientists are in their decision-making. Specifically, the PCAST (2016) report concludes: … appropriately designed … studies are required, in which many examiners render decisions about many independent tests (typically, involving ‘questioned’ samples and one or more ‘known’ samples) and the error rates are determined. (p. 143) While we unreservedly agree with the view that such empirical information is valuable for characterizing the general scientific validity of a given forensic branch, and of individual examiners, we also emphasize that there is room for a moment of reflection on the following question: Is the above understanding of decision and its empirical investigation really the end of the matter? 
As exemplified below, the forensic community appears to be rather unclear about what ought to be done with data of empirical validation studies. Suppose we had data to determine performance characteristics19 of, e.g. a fingermark examiner, to an extent that we would consider sufficient for that expert to be allowed to give testimony. Can we expect then that people will know what exactly to do with such data? It appears that the answer is negative, as even some academic writers offer only vague replies. For example, Edmond et al. (2014) rightly ask ‘… what could the juror conclude on the basis of the experiment …?’ (p. 10), but then only note that ‘[t]he juror can reason with this information to infer something about the particular case—to deduce the particular from the general’ (p. 10). Strangely, however, these authors leave the term ‘something’ essentially undefined, while there is standard theory—particularly well established in medical literature—on how to use diagnostic performance measures (e.g. sensitivity and specificity) for inference about hypotheses of interest. An example is given by Kaye (1987), and Thompson et al. (2003) provide a discussion in the context of DNA profiles and on how to take into account the potential of error. Yet other authors use empirical data to inform judgement about whether forensic scientists can be regarded as experts (Towler et al., 2018). As mentioned above, this is relevant for general considerations but of quite limited help for decisional questions in the individual case. Turning now to the question of decision, we can ask again: do diagnostic performance measures tell recipients of expert information anything about how to make a ‘good’ decision in a particular case at hand? And similarly, a forensic expert may ask: how can empirically derived performance measures help me make a sound decision (e.g. an identification decision) in the particular case I face? 
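A minimal sketch of that standard theory (with hypothetical numbers; the function name is ours) shows how a reported match can be combined with diagnostic performance measures through a likelihood ratio, here sensitivity divided by the false positive rate, to update prior odds on the proposition that the POI is the source:

```python
def posterior_odds(prior_odds, sensitivity, false_positive_rate):
    """Bayesian updating: posterior odds = prior odds x likelihood ratio,
    where the likelihood ratio of a reported match is
    Pr(match | same source) / Pr(match | different source)."""
    likelihood_ratio = sensitivity / false_positive_rate
    return prior_odds * likelihood_ratio

# E.g. prior odds of 1:100 and an examiner with sensitivity 0.95 and a
# false positive rate of 0.01 yield posterior odds of about 0.95: still
# below evens, so a low general error rate alone does not settle the case.
```

This illustrates the inferential use of performance data; it says nothing yet about which decision to make, which, as argued below, additionally requires the decision-maker's losses.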
The truth is that general diagnostic performance measures are no direct guides to decisions in the individual case. For example, even though the proportion of false positive answers by an expert—or a group of experts—in a series of experiments under controlled conditions was low, there is no way of asserting then that an identification decision in a particular case at hand is ‘probably’ correct (i.e. that the proposition designating the POI as the source of the trace is true), or was the optimal choice to be made. Claiming the contrary would amount to confusing decision empiricism with the logic of decision (see Section 2.1). The reason for this is that standard diagnostic performance measures are defined in terms of the proportion of correct determinations in a series of experiments under controlled conditions (i.e. comparisons with respect to ground truths), whereas the qualification and quantification of the ‘goodness’ of a decision in an individual case—as understood in this article (Section 2.3 and Appendix A)—is measured on the basis of the decision-maker’s case-specific preferences among decision consequences and the assessment of the probability of obtaining the various consequences. We have argued throughout this article that the former, decision empiricism, is an important and necessary, but not sufficient preliminary of the latter—decision logic. The two are connected, though, through inference and statistics, which take on the task of quantifying the probative strength of an expert’s report (e.g. a reported correspondence between a trace and a reference print) with respect to competing propositions of interest (e.g. the POI is the source of the trace versus some unknown person is the source), and taking into account relevant expert performance descriptors (e.g. from controlled empirical studies). 
This seeks to assist decision-makers—which may be scientists themselves—in revising their beliefs about the main propositions in view of the expert’s report. But then, answering the question of how to decide, in the case at hand, is screened off from the previously considered empirical elements: instead, decision is a consideration of both one’s current belief state, given all the information available at the time when the decision needs to be made, and one’s goals, expressed in terms of preferences among the possible decision consequences. The decision-theoretic framework that we have sketched here (Section 2.3 and Appendix A) makes this view formally precise, but remains entirely general. For example, the theory only emphasizes (i) that measuring uncertainties by probabilities, and the undesirability of adverse consequences by means of losses, is crucial, and (ii) how these elements ought to be combined to ensure coherence. But the theory says neither whose probabilities and losses (utilities) are meant, nor how they ought to be assigned. It is clear, thus, that individualization in this account is not confined to only scientific or statistical considerations, but requires a broader perspective, calling for an active role of other participants in the legal process (Cole, 2014). Funding This research was supported by the Swiss National Science Foundation through grant No. BSSGI0_155809 and the University of Lausanne. Some of the writing of this paper was carried out during a visiting research stay of Alex Biedermann at New York University School of Law. Footnotes " 1 Historically, individualization has been used as a term for the forensic source attribution process (Kirk, 1963), though practice developed a preference for the term identification. Strictly speaking, however, an identification means the assignation of an item to a particular class or set (Kingston, 1965). 
We will use the terms identification and individualization interchangeably throughout this article, but with a preference for the term individualization. " 2 Acceptance speech of I. W. Evett for his doctorate honoris causa, received on the occasion of the 100th anniversary of the School of Criminal Justice, University of Lausanne, 24 June 2009. Text in brackets added by the authors. " 3 On the same position, see Champod et al. (2016) for a recent review, and Recommendation 3.7 of the report of the Expert Working Group on Human Factors in Latent Print Analysis (2012): ‘Because empirical evidence and statistical reasoning do not support a source attribution to the exclusion of all other individuals in the world, latent print examiners should not report or testify, directly or by implication, to a source attribution to the exclusion of all others in the world.’ (p. 72) " 4 See, for example, the SWGFAST (Scientific Working Group on Friction Ridge Analysis, Study and Technology) ‘Guideline for the Articulation of the Decision-Making Process for the individualisation in Friction Ridge Examination (Latent/Tenprint)’, https://www.nist.gov/sites/default/files/documents/2016/10/26/swgfast_articulation_1.0_130427_1.pdf (page last visited July 17th 2018): ‘individualisation is the decision by an examiner that there are sufficient features in agreement to conclude that two areas of friction ridge impressions originated from the same source.’ (Paragraph 10.2.2.) The recent AAAS report (Thompson et al., 2017) also systematically refers to examiners’ conclusions as decisions. " 5 There are many problems associated with the traditional practice of reporting individualization conclusions, especially where they amount to categorical conclusions. A discussion of these problems is beyond the scope of this article, but see, for example, Champod (2015) for a recent review. " 6 This may include the whole examination process, extending also to aspects such as the chain of custody. 
" 7 This leads to data that can be summarized in terms of standard diagnostic performance measures, such as rates of false positives and false negatives, and related notions such as sensitivity and specificity—widely used, e.g. in the medical literature. " 8 Note also that studies ‘… in which many examiners render decisions about many independent tests (typically, involving “questioned” samples and one or more “known” samples) and the error rates are determined’ (PCAST, 2016, p. 5–6) are considered to be part of the requirements to establish scientific validity. " 9 On the notion of uncertainty as it is understood in this paper, see also footnote 13. " 10 Term ‘qualifying’ is used here to express a qualitative preference, which is an assertion that is less strong than one based on quantitative terms. For example, one decision may be preferred to another decision qualitatively, i.e. without quantifying how much one decision is preferable compared to another. The term ‘qualifying’ can also be understood as the attribution of a quality to a decision, e.g. specifying the conditions under which one decision can be qualified as being preferable to (or, in more general language, ‘qualified as better than’) another decision. " 11 We leave aside here categorical conclusion schemes of the kind ‘if A is observed, then conclude B’ because they often violate logical considerations or are unbalanced (i.e., they do not consider at least one alternative proposition). As an example, see applications in forensic toxicology, such as the analysis of alcohol markers in hair (Kintz, 2015), where results of analyses are compared against cut-off values in order to reach a predefined conclusion. 
" 12 On the importance of general performance measures, see also Lander’s reply to the ANZFSS: ‘The reliability of evidence presented in courts primarily depends on whether the underlying forensic feature-comparison method has been empirically shown to be scientically reliable – and only then on the details of the case.’ (Lander, 2017, p. 2) " 13 Note that in some branches, such as operational research, decision analysis with quantification of uncertainty about states of nature, using probability, is known as decision analysis ‘under risk’, and decision analysis without taking into account probability is referred to as decision ‘under uncertainty’. In this paper, we use the expression ‘decision analysis under uncertainty’ for situations in which probability is taken into account (e.g. Lindley, 2014). The reason for this is that uncertainty is quantified by probability and, in its personalist interpretation, is—by definition—always available to the decision-maker. " 14 In a more general development, decision d2 can be broken down to decisions ‘inconclusive’ and ‘exclusion’ (e.g. Biedermann et al., 2008). " 15 Note that a different scale may be chosen for the loss function. " 16 See, for example, Biedermann et al. (2008, 2016) and Taroni et al. (2010) for further discussion on the elicitation of loss values for a missed individualization. " 17 Especially forensic inference and statistics. " 18 Note, however, that if the decision problem is more complex, in particular, when the number of available courses of action is >2 (e.g., one may consider a third decision d3, such as reporting ‘inconclusive’) and/or a loss function not necessarily bounded between 0 and 1, the decision-theoretic criterion does not simplify to the above-mentioned comparison in terms of odds and loss ratio (see also Appendix A), and some additional computational effort is required. " 19 See, for example, Tangen et al. 
(2011) on data regarding the proportion of correct and erroneous conclusions of fingermark examiners in a series of experiments conducted under controlled conditions. " 20 See Biedermann et al. (2008) for a development with an additional decision called ‘inconclusive’, alongside with decisions ‘individualization’ and ‘exclusion’. " 21 Note that reporting in this example only means to inform relevant authorities about the observed correspondence between the DNA profile of the trace and the profile of the person of interest. Such a report does not individualize the person of interest or otherwise convey a conclusion regarding inference of source. " 22 This assignment of losses is different from the one considered in Appendix A (and Table 3). References Berger J O. Statistical Decision Theory and Bayesian Analysis. Springer, New York, second edition, 1985. Bernardo J M, Smith A F M. Bayesian Theory. John Wiley & Sons, Chichester, second edition, 2000. Biedermann A, Bozza S, Taroni F. Decision theoretic properties of forensic identification: underlying logic and argumentative implications. Forensic Science International, 177: 120–132, 2008. Biedermann A, Bozza S, Taroni F. The decisionalization of individualization. Forensic Science International, 266: 29–38, 2016. Buckleton J S, Bright J-A, Taylor D. Forensic DNA Evidence Interpretation. CRC Press, Boca Raton, FL, second edition, 2016. Champod C. Identification/individualisation, overview and meaning of ID. In Siegel J H, Saukko P J, Knupfer G C, editors, Encyclopedia of Forensic Science, pages 1077–1084. Academic Press, San Diego, 2000. 
Champod C. Fingerprint identification: Advances since the 2009 NAS report. Philosophical Transactions of the Royal Society B: Biological Sciences, 370: 1–10, 2015. Champod C, Lennard C, Margot P, Stoilovic M. Fingerprints and Other Ridge Skin Impressions. CRC Press, Boca Raton, second edition, 2016. Cheng E K. Reconceptualizing the burden of proof. The Yale Law Journal, 122: 1254–1279, 2013. Chernoff H, Moses L E. Elementary Decision Theory. John Wiley & Sons, New York, 1959. Cole S A. Individualization is dead, long live individualization! Reforms of reporting practices for fingerprint analysis in the United States. Law, Probability and Risk, 13: 117–150, 2014. Dror I. Cognitive neuroscience in forensic science: understanding and utilizing the human element. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 370: 1–8, 2015. Dror I. Human expert performance in forensic decision making: Seven different sources of bias. Australian Journal of Forensic Sciences, 49: 541–547, 2017. Edmond G, Thompson M B, Tangen J M. A guide to interpreting forensic testimony: scientific approaches to fingerprint evidence. Law, Probability and Risk, 13: 1–25, 2014. Expert Working Group on Human Factors in Latent Print Analysis. Latent print examination and human factors: Improving the practice through a systems approach. Technical report, National Institute of Standards and Technology, U.S. Department of Commerce, 2012. 
Friedman R D. The Elements of Evidence. West Academic Publishing, Saint Paul, Minnesota, fourth edition, 2017. Kaplan J. Decision theory and the factfinding process. Stanford Law Review, 20: 1065–1092, 1968. Kaye D H. The validity of tests: Caveant omnes. Jurimetrics Journal, 27: 349–361, 1987. Kingston C R. Application of probability theory in criminalistics. Journal of the American Statistical Association, 60: 70–80, 1965. Kintz P. 2014 consensus for the use of alcohol markers in hair for assessment of both abstinence and chronic excessive alcohol consumption. Forensic Science International, 249: A1–A2, 2015. Kirk P L. The ontogeny of criminalistics. Journal of Criminal Law, Criminology and Police Science, 54: 235–238, 1963. Lander E S. Response to the ANZFSS council statement on the President’s Council Of Advisors On Science And Technology Report. Australian Journal of Forensic Sciences, 49: 366–368, 2017. Lindley D V. Making Decisions. John Wiley & Sons, Chichester, second edition, 1985. Lindley D V. Subjective probability, decision analysis and their legal consequences. Journal of the Royal Statistical Society, Series A, 54: 83–92, 1991. Lindley D V. The philosophy of statistics. The Statistician, 49: 293–337, 2000. Lindley D V. Understanding Uncertainty. John Wiley & Sons, Hoboken, revised edition, 2014. 
Nance D A. The Burdens of Proof. Discriminatory Power, Weight of Evidence, and Tenacity of Belief. Cambridge University Press, Cambridge, 2016. National Research Council. Strengthening Forensic Science in the United States: A Path Forward. The National Academies Press, Washington, D.C., 2009. Parmigiani G. Modeling in Medical Decision Making, A Bayesian Approach. John Wiley & Sons, Chichester, 2002. Parmigiani G, Inoue L. Decision Theory: Principles and Approaches. John Wiley & Sons, Chichester, 2009. PCAST. President’s Council of Advisors on Science and Technology, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods. Washington, D.C., 2016. Stoney D A. What made us ever think we could individualize using statistics? Journal of the Forensic Science Society, 31: 197–199, 1991. Tangen J M, Thompson M B, McCarthy D J. Identifying fingerprint expertise. Psychological Science, 22: 995–997, 2011. Taroni F, Bozza S, Biedermann A, Garbolino G, Aitken C G G. Data Analysis in Forensic Science: a Bayesian Decision Perspective. Statistics in Practice. John Wiley & Sons, Chichester, 2010. Thompson W C, Taroni F, Aitken C G G. How the probability of a false positive affects the value of DNA evidence. Journal of Forensic Sciences, 48: 47–54, 2003. 
Thompson W C, Black J, Jain A, Kadane J. Latent Fingerprint Examination. Forensic Science Assessments: A Quality and Gap Analysis. American Association for the Advancement of Science, Washington, D.C., 2017. Towler A, White D, Ballantyne K, Searston R A, Martire K A, Kemp R I. Are forensic scientists experts? Journal of Applied Research in Memory and Cognition, 7: 199–208, 2018. Ulery B T, Hicklin A, Buscaglia J A, Roberts M A. Accuracy and reliability of forensic latent fingerprint decisions. Proceedings of the National Academy of Science of the United States of America, 108: 7733–7738, 2011. Appendix A: Decision theoretic criterion Denote by Pr(θ1|I) and Pr(θ2|I) the decision-maker’s probability for two states of nature θ1 and θ2, called here propositions, or hypotheses, given the information I. Note that the development can also be given with respect to more than two states of nature without much additional effort. As an example in a forensic science context, θ1 and θ2 could refer to the propositions that a POI is, or is not, respectively, the source of a mark or trace (e.g. a fingermark, blood stain) found on a crime scene, or the person whose handwriting is present on a questioned document, and so on. Denote by d1 and d2 two decisions, one of which must be taken in the light of uncertainty about whether θ1 or θ2 is true. There may also be more than two decisions, as is the case for the states of nature θ.20 In a situation involving forensic individualization, d1 and d2 may refer to, e.g. the decisions of considering or not the POI as the source of a particular trace. Let Cij be the consequence of deciding di when the state of nature θj holds, and define the undesirability of a decision consequence Cij in terms of the loss L(Cij). 
See Table 2 for a summary. Combining probabilities for states of nature and losses for decision consequences allows one to obtain the expected value associated with a particular course of action (i.e. decision). Specifically, when qualifying decision consequences in terms of losses, this leads to the notion of expected loss EL of decision di, defined as follows: EL(di) = ∑j L(Cij) Pr(θj|I). The principle of minimizing expected loss then says to choose the decision that has the minimum expected loss. The decision criterion for deciding d1 (individualization) over d2 (not individualize) can thus be written as follows: decide d1 if L(C11)Pr(θ1|I) + L(C12)Pr(θ2|I) [= EL(d1)] < L(C21)Pr(θ1|I) + L(C22)Pr(θ2|I) [= EL(d2)]. For a loss function that assigns zero loss to the correct decision consequences, i.e. L(C11) = L(C22) = 0, this criterion reduces to: decide d1 if and only if Pr(θ1|I)/Pr(θ2|I) > L(C12)/L(C21). (3.2) In other words, the decision-theoretic criterion amounts to a comparison between the odds in favour of θ1 against θ2, and the relative losses associated with the two possible ways of deciding wrongly, i.e. the ratio of the loss of a false individualization to the loss of a missed individualization. This is valid for any loss function of this type. Note that the above criterion is also known as the Bayesian decision-theoretic criterion in situations where the probabilities for the states of nature are conditioned on, e.g. additional information (i.e. evidence) E, written Pr(θ|I,E), obtained through Bayesian updating of Pr(θ|I). Equation (3.2) is also discussed in the legal literature (e.g. Cheng, 2013). A more elaborate version of the formula for general loss/utility functions is discussed in Friedman (2017) and Nance (2016). Presentations in the statistical literature can be found, e.g., in Berger (1985), Bernardo and Smith (2000), Parmigiani (2002) and Parmigiani and Inoue (2009). Suppose further that a (0, 1) scale is chosen, i.e. k = 1. The advantage of this is that the assignment of numerical values to intermediate consequences can be made through a coherent comparison with a standard that is given by two reference consequences (Lindley, 1985). 
The expected loss of decision d1 then reduces further to EL(d1) = Pr(θ2|I), since a false individualization, C12, is the worst consequence, with the maximum loss assigned, i.e. L(C12) = 1. The (Bayesian) decision criterion then further simplifies to: decide d1 if Pr(θ2|I) [= EL(d1)] < L(C21)Pr(θ1|I) [= EL(d2)], that is, if and only if the odds Pr(θ1|I)/Pr(θ2|I) exceed 1/L(C21).
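On the (0, 1) loss scale, the simplified criterion — decide d1 iff Pr(θ2|I) < L(C21)Pr(θ1|I) — can be checked numerically; the following sketch uses illustrative values only:

```python
def decide_unit_loss(p_source, loss_missed_id):
    """(0,1)-scaled losses: L(C12) = 1, zero loss for correct consequences.
    Decide d1 iff EL(d1) = Pr(theta2|I) < loss_missed_id * Pr(theta1|I) = EL(d2),
    i.e. iff the odds Pr(theta1|I)/Pr(theta2|I) exceed 1 / loss_missed_id."""
    return "d1" if (1 - p_source) < loss_missed_id * p_source else "d2"

# With p_source = 0.999 (odds of 999), individualization is optimal
# whenever 1/L(C21) < 999, i.e. whenever L(C21) > 1/999.
```

For example, with Pr(θ1|I) = 0.999 the decision to individualize is optimal for L(C21) = 0.002 (since 1/0.002 = 500 < 999) but not for L(C21) = 0.0005 (since 1/0.0005 = 2000 > 999).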