The Analysis of Data and the Evidential Scope of Neuroimaging Results

Abstract

The sceptical positions philosophers have adopted with respect to neuroimaging data are based on detailed evaluations of subtraction, which is one of many data analysis techniques used with neuroimaging data. These positions are undermined when the epistemic implications of the use of a diversity of data analysis techniques are taken into account. I argue that different data analysis techniques reveal different patterns in the data. Through the use of multiple data analysis techniques, researchers can produce results that are locally robust. Thus, the epistemology of neuroimaging must take into consideration the details of the different data analysis techniques that are used to evaluate neuroimaging data, and the specific theoretical aims those techniques are deployed towards.

1 Introduction
2 Scepticism about Neuroimaging
3 Data Analysis and Evidence
4 Deconvolution and Pattern Classification Analysis
4.1 Deconvolution analysis
4.2 Region of interest selection
4.3 Pattern classification analysis
5 The Strength of Multiple Analyses
6 Conclusion

1 Introduction

The debate amongst philosophers about the epistemic status of neuroimaging begins with van Orden and Paap’s ([1997]) criticism of the logic of subtraction, the primary technique used to analyse neuroimaging data at the time their paper was published. Philosophers have continued to debate the strengths and weaknesses of neuroimaging as a tool for investigating the relationship between cognitive functions and the brain (Uttal [2001], [2011]; Hardcastle and Stewart [2002]; Roskies [2010a]; Klein [2010a]; Aktunç [2014]). I argue that since most critics have not taken into account the significance of the diversity of data analysis techniques used to analyse neuroimaging data, the scepticism towards neuroimaging technology is misplaced.

Many of the sceptical positions are grounded on careful analyses of subtraction and subtraction logic (Uttal [2001]; Hardcastle and Stewart [2002]; Klein [2010a]). While philosophers are rightly critical of the ability of subtraction analyses, on their own, to support claims about the relationship between cognitive functions and the brain, subtraction is only one kind of data analysis technique used to analyse neuroimaging data. Given that the development of new data analysis techniques has been a significant driver of progress in neuroimaging over the last decade and a half, a narrow focus on subtraction is a problem for any argument that aims to shed light on the range of hypotheses that neuroimaging technology can discriminate between.1 Indeed, some recent contributors have noted that the role and impact of multivariate analyses have not been fully appreciated in this debate (Klein [2010b]; Roskies [2010a]). However, while they acknowledge that techniques other than subtraction are important to consider, they do not themselves take up the task of exploring how the use of other analysis techniques changes the evidence available in neuroimaging research. My aim here is to begin to fill this gap by demonstrating that when evaluating the hypotheses and claims that neuroimaging technology can and cannot support, it is important to take into account the contribution of new analysis techniques, such as pattern classification analysis, and to consider how multiple analysis techniques can be brought together to strengthen the evidence provided by neuroimaging technologies.
I proceed as follows: In Section 2, I review the debate about the epistemic status of neuroimaging and specify the categories of hypotheses that philosophers claim neuroimaging data can and cannot support. In Section 3, I present a conceptual framework for evaluating the strength and content of evidence produced via a data analysis technique. In Section 4, I apply this conceptual framework to a study that uses multiple analysis techniques to generate evidence in support of a hypothesis; where critics of neuroimaging would argue that the data cannot support this hypothesis, I show how they can. The evidence is stronger than it appears because one analysis technique is used to validate a crucial assumption required by the other. In Section 5, I argue that different analysis techniques provide different evidence, and that the use of multiple analysis techniques to examine the same data provides experimental results with a kind of local robustness.

2 Scepticism about Neuroimaging

Functional magnetic resonance imaging (fMRI) allows neuroscientists to study the human brain through non-invasive measurements of metabolic activity (see Ashby [2011] for a technical introduction). Experiments using fMRI typically require a participant to perform a cognitive task—such as identifying faces as familiar or unfamiliar (as in Martin et al. [2013])—while the scanner measures changes in the blood oxygenation level dependent (BOLD) signal throughout their brain.2 The scanner does this by dividing the brain into voxels (volumetric pixels), which are 1–3 mm cubes of brain matter, and measuring the BOLD signal in each voxel over time. The value of the BOLD signal is the ratio of oxygenated to deoxygenated haemoglobin in a voxel at the time of scanning. Since it tracks properties of blood flow, the BOLD signal is often referred to as the haemodynamic signal. After a scanning session the investigators will have a data set that consists of BOLD signal values for each voxel, labelled with the task condition that the participant was performing when those data were collected.

Neuroimaging data have historically been analysed using subtractive analyses. In the simplest case of subtraction, two sets of neuroimaging data are required, each obtained while the participant performs a different task. The goal of subtractive analysis is to identify the difference in BOLD signal that corresponds with the cognitive difference between the tasks. Roughly, the BOLD signal values in each voxel associated with one task are subtracted from the values in the same voxels associated with the other task. This analysis is classified as univariate because each voxel is treated independently of every other voxel, and so the data need to be corrected for multiple comparisons. The result is a difference map that identifies the voxels (or regions) where brain activity differs significantly between the task conditions. To evaluate whether the difference effects can be attributed to the population and not just a given subject in the study, a second-level analysis is carried out (typically random effects analysis; see Friston et al. [1999]). If this analysis shows the difference to be consistent across subjects, then the cognitive difference between the tasks is attributed to the regions of the brain shown to be differentially active.
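To make the arithmetic of this procedure concrete, the following is a minimal sketch of a voxel-wise subtraction in Python. It is purely illustrative: the arrays, dimensions, effect size, and threshold are hypothetical, and a real pipeline would also involve haemodynamic modelling, spatial preprocessing, and the second-level analysis across subjects described above.

```python
import numpy as np
from scipy import stats

# Hypothetical data: BOLD values for 40 trials of task A and 40 trials of
# task B in each of 10,000 voxels (trials x voxels).
rng = np.random.default_rng(0)
task_a = rng.normal(0.0, 1.0, size=(40, 10_000))
task_b = rng.normal(0.0, 1.0, size=(40, 10_000))
task_b[:, :50] += 0.8          # pretend 50 voxels respond more strongly in task B

# Voxel-wise subtraction: mean difference in BOLD signal between conditions.
difference_map = task_b.mean(axis=0) - task_a.mean(axis=0)

# Each voxel is tested independently (a univariate analysis), so the
# threshold must be corrected for multiple comparisons (Bonferroni here).
t_vals, p_vals = stats.ttest_ind(task_b, task_a, axis=0)
active_voxels = p_vals < (0.05 / p_vals.size)

print(f"{active_voxels.sum()} voxels survive the corrected threshold")
```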
To illustrate the conceptual logic of the process, consider van Orden and Paap’s ([1997]) toy example in which task A consists of reading two words, and task B of reading two words and then judging whether they rhyme. The resulting subtraction between the imaging data obtained during task A and the data obtained during task B is taken to indicate the regions of the brain that are involved in the cognitive process that underlies the rhyming judgement.

Van Orden and Paap argue that the subtractive method cannot be used to locate where in the brain cognitive functions ‘reside’ because the reliability of subtractive inferences depends on several assumptions they believe are unlikely to be true. In particular, the reliability of subtraction with respect to localizing cognitive functions to regions of the brain requires that ‘one must begin with a “true” theory of cognition’s components, and assume that the corresponding functional and anatomical modules exist in the brain’ (van Orden and Paap [1997], p. S86). These assumptions, they argue, follow from the fact that a valid subtraction requires that the task-difference precisely isolates a single cognitive component, which can only be the case if the cognitive theory used to design the tasks is accurate (p. S87). Additionally, they argue that functional localization using subtraction further requires that those modules are feed-forward ‘to ensure that the component of interest makes no qualitative changes “upstream” on shared components of experimental and control tasks’ and that the contrasted tasks ‘invoke the minimum set of components for successful task performance’ (p. S86).

William Uttal ([2001]) engages in a similar kind of sceptical attack on neuroimaging. Building on van Orden and Paap’s critique, Uttal compares neuroimaging to phrenology and argues, among other things, that it requires the false assumption that cognitive processes are managed and maintained by isolable modules of the brain. Valerie Hardcastle and Matthew Stewart ([2002], p. S80) express a similar type of scepticism in arguing that the logic of neuroimaging is viciously circular and conclude that ‘neuroscientists cannot use the data they get to support their claims of function’ because ‘they are assuming local and specific functions prior to gathering appropriate data for the claim’. These critiques all point to a vicious circularity in the inference from the results of subtraction analysis to claims about the localization of cognitive function.

Some philosophers have defended cognitive neuroscience from these criticisms. For instance, Landreth and Richardson ([2004]) responded to Uttal’s arguments in part by clarifying the details of how neuroimaging data is processed, analysed, and interpreted. Additionally, Roskies ([2010b]) has rejected van Orden and Paap’s characterization of subtraction. She argues that subtraction results are just one part of a more complex scientific procedure that she calls functional triangulation, whereby ‘information from other task comparisons and other studies is brought to bear on the interpretation of experimental data’ ([2010b], p. 641). She also argues that characterizing neuroimaging as solely aimed at localizing cognitive functions to specific brain regions, as the three critics noted above do, is not representative of all uses of neuroimaging data. After providing examples of the variety of theoretical aims neuroimaging and subtraction methods are put towards, she concludes that ‘without recognizing the diversity of the immediate goals of imaging studies, it is impossible to do justice to the technique’ (p. 639).
Indeed, the recent development of new multivariate analysis techniques, which were introduced to discriminate between modular and distributed accounts of the role that the ventral visual pathway plays in visual perception (Haxby et al. [2001]), has motivated cognitive neuroscientists to investigate hypotheses about the content of brain activity.3 In a review of the theoretical uses of multivariate techniques, the authors predict that ‘the enhanced sensitivity and information content provided by these methods should greatly facilitate the investigation of mind–brain relationships by revealing both local and distributed representations of mental content, functional interactions between brain areas, and the underlying relationships between brain activity and cognitive performance’ (Tong and Pratte [2012], p. 503). The study of mental content, neural representations, and the characterization of these in terms of distributed patterns of brain activity are theoretical goals very different from the localization of cognitive functions to parts of the brain. This is grist for Roskies’s mill. When critics treat neuroimaging research solely in terms of localization, they fail to appreciate the variety of theoretical applications that the technology is put towards. Furthermore, this theoretical shift, which was made possible by the development of data analysis techniques that treat neuroimaging data as a multidimensional pattern, illustrates the importance of evaluating analysis techniques other than subtraction when evaluating the epistemic value of neuroimaging technology.

Despite these defences of neuroimaging, and the theoretical and analytic advances in the field of cognitive neuroscience, the general trend towards scepticism and the focus on subtractive analyses has persisted. While more recent conclusions tend to be on the milder side of scepticism, philosophers continue to challenge the ability of neuroimaging technology to provide evidence that supports the claims neuroscientists use the technology to investigate. Additionally, they continue to do so on the basis of an evaluation of subtraction and subtraction logic. I will take one of the most recent contributions to this debate (Aktunç [2014]) to be a representative example. In line with the sceptical tradition, Aktunç argues that while neuroimaging data are useful, they cannot be used to support the kinds of hypotheses that cognitive neuroscientists use them to support.

Aktunç distinguishes between two types of hypotheses that neuroimaging data might be brought to bear on. There are haemodynamic hypotheses, which relate BOLD signal activity to the performance of cognitive tasks or parameters of the tasks. There are also theoretical hypotheses, which relate cognitive processes to the brain structures that implement them (this distinction is from Huettel et al. [2008]). To illustrate this distinction, consider the following example: The claim that patterns of BOLD signal activity in both PrC and PhC are sensitive to differences between faces, buildings, and chairs (Martin et al. [2013], p. 10921) is a haemodynamic hypothesis. The tasks used in this study require participants to judge images of faces, buildings, and chairs as familiar or novel. Thus, this claim is about the relationship between patterns of BOLD signal activity and features of stimuli used in the cognitive task. After discussing these results, the researchers advance a theoretical hypothesis.
They claim that the ‘findings indicate that both PrC and PhC contribute to the assessment of item familiarity’ (Martin et al. [2013], p. 10922). This is a theoretical hypothesis because it identifies two brain structures, PrC and PhC, and specifies a cognitive process that they implement, the assessment of item familiarity. It is worth noticing the inferential relationship between these two types of hypotheses: the theoretical hypothesis is inferred from the haemodynamic hypothesis. Where a haemodynamic hypothesis specifies BOLD signal activity, a theoretical hypothesis specifies a structure of the brain. Likewise, where a haemodynamic hypothesis specifies a cognitive task, a theoretical hypothesis specifies a cognitive process.

Given this distinction between haemodynamic and theoretical hypotheses, Aktunç uses Deborah Mayo’s error statistical framework to argue that neuroimaging data can only provide a severe test of haemodynamic hypotheses. On the simplest interpretation of Mayo’s severity criterion, a hypothesis passes a severe test just in case (i) the data agree with the hypothesis and (ii) there is a sufficiently high probability that if the hypothesis were false, then the data would not agree with the hypothesis (Mayo [2005], p. 99). Aktunç ([2014], p. 969) argues that while neuroscientists may be interested in providing evidence that supports theoretical hypotheses, neuroimaging only has evidential import with respect to haemodynamic hypotheses. This is because a difference in mean BOLD signal, which is the pattern identified by subtractive analyses, can be embedded in a statistical significance test. From this, Aktunç ([2014], p. 969) argues that ‘using error probabilities, we can find out whether specific fMRI experiments constitute a severe test of specific hemodynamic hypotheses. Thus, fMRI data do have evidential import for hemodynamic hypotheses’.

His argument that theoretical hypotheses cannot be subjected to severe testing relies on two premises. First, there is the ‘fact’ that ‘fMRI obviously does not test for the existence of cognitive modules or functions as defined by theories of cognitive science’ (p. 969) because ‘fMRI gives us data only on hemodynamic activity’ (p. 968). The second premise consists in the arguments made in the existing sceptical literature (specifically, Uttal [2001]; Hardcastle and Stewart [2002]; Klein [2010a]). Thus, according to Aktunç, neuroimaging data cannot support theoretical hypotheses because (i) the data are indirectly related to the content of those hypotheses and (ii) critiques of subtraction analysis show that such inferences are viciously circular, unstable, or otherwise unreliable.

Neither of these premises can support the derived conclusion. Inferences from neuroimaging results to theoretical hypotheses, like most inferences from measurement results to theoretical claims, are ampliative; haemodynamic activity is at best an indirect measure of neural activity (Logothetis [2008]), and task performance is at best an indirect indicator of cognitive functions (Poldrack [2010a]). However, the indirect relationship between the data and the content of the theoretical hypothesis is not sufficient to support the claim that neuroimaging cannot provide evidence for hypotheses that relate cognitive functions to brain activity. Whether these inferences are warranted depends on the particular theoretical hypotheses that are advanced, and whether the assumptions required by the inferences are justified.
Indeed, this is how van Orden and Paap originally argued against the logic of subtraction; it was not on the basis of the indirectness of the data itself, but on the basis of the specific assumptions required to infer from the data to a theoretical hypothesis of a certain kind. However, no matter where you stand on the reliability of inferences from subtraction analysis to claims about the localization of cognitive functions, these arguments cannot be grounds for a sweeping claim about the evidential scope of neuroimaging data. Just because one data analysis technique has certain limitations does not mean that the data themselves are similarly limited. Indeed, neuroimaging data can be and are analysed with other analysis techniques that reveal different patterns and correlations in the data. Whether or not neuroimaging data provide evidence in support of theoretical hypotheses depends on how the other analysis techniques help neuroscientists to mediate the inferential gap between haemodynamic and theoretical hypotheses.

Inferences to theoretical hypotheses from neuroimaging data can be, and in practice are, strengthened by the use of multiple analysis techniques. The specific case I consider concerns analysis techniques used in sequence as a way to validate assumptions required by the primary analysis procedure. In the final section, I distinguish this use of multiple analyses from functional triangulation as discussed by Roskies, in which multiple independent analyses provide convergent evidence for a hypothesis. In the next section I provide a framework for evaluating the kinds of information about theoretical hypotheses that data analysis techniques provide.

3 Data Analysis and Evidence

The sceptical position reviewed in the previous section is a claim about the kinds of hypotheses neuroimaging data can and cannot support. According to sceptics, they can support haemodynamic hypotheses, which specify a relation between features of the data. They cannot support theoretical hypotheses, which specify a relation between the phenomena that those features are taken to indicate. Whether they are used to investigate a haemodynamic or theoretical hypothesis, neuroimaging data need to be manipulated to reveal correlations between features of the data that are relevant to the hypothesis under investigation. This is the function of data analysis techniques such as subtraction and pattern classification analysis. Data analysis techniques transform the data produced by experimentation into evidence suitable for statistical analysis. These transformations reveal patterns and correlations between features of the data, which are then taken to be evidence in support of a hypothesis.

Bogen and Woodward’s distinction between data and phenomena is a useful place to begin thinking about this process. Broadly speaking, they characterize data, which are the result of the interaction between experimental design, implementation, and measurement, as ‘idiosyncratic to particular experimental contexts, and typically cannot occur outside of those contexts’ (Bogen and Woodward [1988], p. 317). Phenomena, on the other hand, ‘have stable, repeatable characteristics which will be detectable by means of a variety of different procedures, which may yield quite different kinds of data’ (p. 317). On this view, data provide evidence for claims about phenomena, while claims about phenomena provide evidence for theories.
Bogen and Woodward ([1988], pp. 309–10) illustrate this by considering how one might determine the melting point of lead. To do so, a researcher might take several measurements of a sample of lead just after it melts. The data in this case are a collection of temperature measurements. These temperature measurements provide evidence about the melting point of lead, which is a claim about a phenomenon. The data are idiosyncratic because the result of each temperature measurement depends on a complex network of causal interactions, many of which are not related to the phenomenon of interest. The value of each temperature measurement will be influenced by features of the thermometer used, the heating apparatus, the sample of lead, the time of day, the ambient temperature, and more causal factors than can be named. After collecting sufficiently many measurements, the researcher averages them and, on the basis of the value of that average, makes a claim about the melting point of lead.

Notice that it is not the individual temperature measurements but the average value of the temperature measurements that provides evidence in support of a claim about the melting point of lead. This calls attention to a general feature of scientific practice: the individual data points, which are the products of specific runs of an experiment, need to be transformed to reveal their evidential value. Typically, this involves eliminating the effects of factors that contribute to the value of specific data points but are not relevant to the theoretical question or hypothesis under investigation. With the influence of these factors still in place, the data speak only to the melting point of this sample of lead, at this time, as measured with this thermometer. Factors such as those arising from the peculiar features of the thermometer are irrelevant to the melting point of lead insofar as they distort or conceal patterns in the data that reflect the ‘true’ melting point of lead.

After data are produced, they are manipulated so that the patterns relevant to the phenomenon of interest are revealed and the irrelevant patterns are suppressed. Averaging the temperature measurements of melted lead is intended to suppress the patterns in the data caused by the irrelevant causal factors that contribute to the value of each specific data point. Other examples of manipulations that suppress irrelevant patterns are noise reduction procedures and manipulations that remove the effect of measurement artefacts. Averaging, as well as more complex analytic techniques such as those discussed in detail below, transforms data so that patterns relevant to the phenomenon in question are revealed. The result of these manipulations is taken to be evidence for one or more claims about the phenomenon.

A data analysis technique, then, is a series of data manipulations or transformations that clarify the evidential import of the data.4 Different data analysis techniques can be distinguished by the data points that they operate on and by the specific transformations of the data they involve. For example, univariate and multivariate techniques can be distinguished by the data points that they manipulate. Univariate techniques, such as subtraction, treat voxels as independent variables, while multivariate techniques, like pattern classification analysis (discussed in detail below) and representational similarity analysis (Kriegeskorte and Kievit [2013]), treat the data as having many dependent variables.
Data analysis techniques that operate on the same class of data points, such as these two multivariate techniques, can be distinguished by the particular manipulations they apply to the data. For example, pattern classification analysis uses a machine learning decision procedure to classify the data, whereas representational similarity analysis uses a measure of similarity to compare brain activity between task conditions. Data manipulations are important because they transform otherwise complex data into a form that investigators can interpret and statistically analyse (Good [1983], pp. 285–6).5 Each manipulation, by virtue of the transformation that it makes, imposes assumptions on the result. These assumptions limit what the result can be taken as evidence about. Just as van Orden and Paap identified several assumptions required by the use of subtractive analyses, most data manipulations require researchers to make assumptions about the data. For example, a standard manipulation performed on neuroimaging data is the removal of patterns caused by magnetic field drift. Magnetic resonance scanners use the variations in a magnetic field to detect the BOLD signal, and the magnetic field in some scanners slowly changes during the course of scanning. Manipulating data such that the effects of field drift are removed requires the assumption that the data are corrupted by magnetic field drift. If the procedure is used on data produced by a scanner that does not have a field drift, then the procedure would introduce artificial patterns into the data. It would do so because the required assumption, that the scanner has a field drift with specific parameters, is not true of the data. In the case of field drift correction, the assumption can be validated by measuring the field drift of a scanner. This simple example illustrates how data manipulations entail or require assumptions to be made of the data, and shows that treating a specific data manipulation in isolation from the rest of the experimental process can make the evidential status of the data appear weaker than it in fact is. Different analysis techniques operate on different data points, implement different manipulations, and require making different assumptions of the data. This is how they reveal (and suppress) different data patterns. For example, subtraction reveals correlations between average amplitudes of the BOLD signal and task performance. Techniques like subtraction, when they include processes for smoothing and averaging the signal, suppress information about differences in activity between voxels within a region. Thus, some subtraction analyses are unable to reveal correlations between the co-ordinated activity of groups of voxels that preserve the same level of average activity between tasks. On the other hand, multivariate techniques, such as pattern classification analysis, correlate distributed patterns of BOLD signal activity with task performance. Pattern classification analysis is sensitive to distributed activity patterns that univariate techniques, like subtraction, cannot detect. However, multivariate techniques are less sensitive to one-dimensional effects that covary with stimulus features, to which univariate techniques are very sensitive (for a detailed discussion of the uses of these techniques, see Davis and Poldrack [2013]). By leveraging their differences, investigators can use several data analysis techniques together to overcome the inferential limitations of a particular technique. 
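The contrast in sensitivity just described can be made concrete with a small synthetic illustration. The sketch below is hypothetical and deliberately extreme: two voxels respond in opposite directions across conditions, so the regional mean is identical between conditions and a subtraction-style contrast on that mean finds nothing, while a classifier trained on the two-voxel pattern discriminates the conditions almost perfectly. The numbers and classifier choice are assumptions of the illustration, not taken from any study.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical two-voxel region measured over 100 trials of each condition.
rng = np.random.default_rng(1)
n = 100
# Condition A: voxel 1 high, voxel 2 low; condition B: the reverse.
cond_a = np.column_stack([rng.normal(1.0, 0.5, n), rng.normal(-1.0, 0.5, n)])
cond_b = np.column_stack([rng.normal(-1.0, 0.5, n), rng.normal(1.0, 0.5, n)])

# A univariate contrast on the regional mean finds no difference, because
# the voxel-level effects cancel when averaged over the region.
t, p = stats.ttest_ind(cond_a.mean(axis=1), cond_b.mean(axis=1))
print(f"univariate contrast on the regional mean: p = {p:.2f}")

# A classifier trained on the two-voxel pattern separates the conditions.
X = np.vstack([cond_a, cond_b])
y = np.array([0] * n + [1] * n)
acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"pattern classification accuracy: {acc:.2f}")
```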
The limitations of a technique tend to derive from the assumptions that the technique requires. If assumptions can be identified, depending on the nature of those assumptions, other data analysis techniques can be used to validate them. In this way, the use of multiple analysis techniques on the same data can strengthen an inference from the result of one analysis to the target hypothesis by providing a clearer picture of the evidential import of the data. Specifically, where a given analysis technique provides evidence that can support a haemodynamic hypothesis, the inference from that hypothesis to a theoretical hypothesis will require investigators to make further assumptions about the data. Since different data analysis techniques reveal different patterns, it is often possible to validate some of those assumptions by analysing the data in another way. This is how multiple analysis techniques can come together to strengthen the inference from a haemodynamic to a theoretical hypothesis. Typically, this is done through functional triangulation (Roskies [2010a]), where multiple techniques are used separately on the data, and the hypotheses inferred are further supported by independent analysis of different data sets. The case I will discuss below is different, as the evidence is strengthened not through the independent application of multiple analyses, but through the sequential application of analysis techniques.

4 Deconvolution and Pattern Classification Analysis

Liu and colleagues’ ([2011]) study aims to determine the role that certain regions of the brain play in directing attention. The primary analysis technique used is pattern classification analysis, a multivariate technique derived from research on machine learning. Pattern classification analysis is used to determine whether cognitive tasks can be differentiated based only on patterns in the BOLD signal that correlate with task performance. As I argue below, this technique alone cannot support a theoretical hypothesis attributing a cognitive role to activity within a region or part of the brain. However, Liu and colleagues do not deploy the technique in isolation. Their analysis includes a region of interest (ROI) selection procedure that partially validates one of the crucial assumptions required by pattern classification analysis. While this does not provide definitive evidence in support of the theoretical hypothesis they advance, it demonstrates how multiple techniques can be used together to bring neuroimaging data to bear on hypotheses beyond those that merely relate haemodynamic activity to task performance.

Two behavioural tasks were used to generate Liu and colleagues’ data set. In both tasks, subjects were presented with two overlaid patterns of dots and were instructed to attend to one pattern or the other. In the first task, both patterns were composed of white dots, but one was rotating clockwise and the other counter-clockwise. In the second task, both patterns were moving in a random walk, but one was composed of red dots and the other of green dots (Liu et al. [2011], pp. 4485–6). The resulting data set contained BOLD signal measurements for each of the six task conditions: attending to clockwise rotating dots, attending to counter-clockwise rotating dots, attending to red dots, attending to green dots, and the null condition for each task (attending to a fixation cross). The data were pre-processed before they were analysed.
This involved head motion correction (to remove artefacts caused by subjects moving while being scanned), removal of low-frequency drift (this corrects for a scanning artefact due to a drift in the magnetic field of the scanner), and conversion of the BOLD signal measurements from raw values into a percentage of signal change (Liu et al. [2011], p. 4486). The result of these transformations is a data set suitable for the analysis procedures, with patterns due to known artefacts from head motion and scanner drift suppressed.

The pre-processed data were then analysed using a series of analysis techniques. Before discussing the techniques in detail, I will provide a brief overview of the whole procedure. The analysis began with deconvolution, a technique used to isolate the task-relevant portion of the BOLD signal data. The result of the deconvolution analysis was used as the input for a ROI selection procedure. The combination of the deconvolution and ROI selection was then used as the input for pattern classification analysis. The result of the pattern classification analysis was then taken to support a claim about the regions of the brain involved in the modulation of attentional control. Notice that this is not a claim about the relationship between task performance and haemodynamic activity. It is a claim about which parts of the brain implement a particular cognitive process (modulation of attentional control). It is about the relationship between a cognitive function and regional brain activity. This is a theoretical hypothesis.

There are multiple inferences involved in moving from a haemodynamic hypothesis to a theoretical hypothesis. Recall that a haemodynamic hypothesis relates BOLD signal data to the performance of a task, whereas a theoretical hypothesis relates brain structure (or the activity in a brain structure) to a cognitive process. Inferring from one to the other requires treating the BOLD signal measurements as an indicator of cognitively relevant brain activity within a brain structure, and task performance as an indicator of one or more cognitive processes. Whether or not the task can be taken as an indicator of the cognitive function that the researchers are interested in depends on an underlying theory of psychological processing, and the robustness of the accompanying task analysis. As the focus of this article is on the interpretation of the neuroimaging data, I’m going to assume that the behavioural tasks used are reliable indicators of the modulation of attentional control. It is worth noting, however, that this assumption does not generally hold, especially given the relative lack of critical task analyses in neuroimaging research (for discussion, see Poldrack [2010b]).

4.1 Deconvolution analysis

Not all of the measured changes in the BOLD signal are relevant to the subject’s performance of the cognitive task. The first substantive step in analysing neuroimaging data is to extract the portion of the BOLD signal that corresponds with the task manipulation. This process is called deconvolution. Deconvolution is an algorithmic solution to a particular type of signal processing problem in which a signal of interest is convolved, or mixed, with another signal. In general, deconvolving the signal of interest requires solving an equation of the form (f ⊗ g) = h, where h is the recorded signal, f is the signal of interest, and g is the signal that f needs to be separated from.
In the case of fMRI data, h is the measured BOLD signal, g is the design matrix (a mathematical representation of the task), and f is the haemodynamic response function (hrf). Here, the hrf represents the change in blood oxygenation levels that corresponds with the demands of the cognitive task that the subject performed. The aim of deconvolution analysis is to identify the portion of measured brain activity that is modulated by the task. Solving for the hrf requires pseudo-inverting the design matrix and multiplying it by the measured BOLD signal (this is the matrix-algebra equivalent of dividing both sides of the above equation by g in order to calculate f).

It is important to note that this procedure only works when the trials are mathematically separable, which can be achieved using an event-related design. An event-related design is one in which the stimuli or tasks are separated by an inter-trial interval (usually there are about twenty seconds between tasks). Investigators can then assume that task-relevant BOLD activity occurs in short, discrete intervals corresponding to the onset of the task. The inter-trial interval supports this assumption by ensuring that the trial-relevant signal is temporally localized, and does not uniformly influence subsequent trials.6 Mathematically, this amounts to assuming that task-relevant variation in the BOLD signal is linearly summed with the task-irrelevant BOLD signal, and so the two can be separated by the deconvolution procedure described above. It is worth noting that these (and the following) assumptions are supported by supplementary empirical research, and are not arbitrarily made or taken for granted (for a technical introduction to linear regression, see Kass et al. [2014], Chapter 12).

Typically, researchers assume that the haemodynamic response has a canonical shape and use that assumption to determine the form of the hrf. In this case, however, the investigators did not want to assume that the hrf takes the canonical form, and so they used a linear regression formula to model it. This decision eliminates confounds that might arise as a result of deviations from the canonical model in the hrf. The regression approach also allows the form of the hrf to vary from voxel to voxel, instead of assuming that the BOLD signal follows the same pattern in every voxel.

Regression is a curve-fitting procedure. The investigators specify an equation, a linear one in this case, with unknown coefficients, that is fit to the data. In this case, the ‘data’ that the curve is fit to are the result of multiplying the BOLD signal measurements with the inverted design matrix. The regression formula is expressed by the following equation: x = βy + ε. Regression requires assuming that errors are independent (which is ensured by the event-related design) and that the noise term, ε, is linearly additive. For each regressor, there will be an additional βy term. Liu and colleagues treated each experimental condition as a separate regressor, which resulted in a total of six regression terms (one for each of the clockwise, counter-clockwise, red, and green conditions, and one for each task’s null condition). Once the regression formula and design matrix are determined, the design matrix is pseudo-inverted and multiplied by the measured BOLD signal. Then, the result of that is used to determine the unknown β values in the regression equation. Note that this procedure is implemented for each voxel, and so each voxel will have its own set of β values.
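In matrix terms, this step is an ordinary least-squares estimate. The sketch below is a simplified illustration of the idea rather than a reconstruction of the authors’ pipeline: the dimensions and onset times are hypothetical, and the design matrix is built in a finite-impulse-response style so that, as in the study, the shape of the response is estimated from the data rather than assumed to be canonical.

```python
import numpy as np

n_scans, n_voxels = 400, 5_000
n_conditions, hrf_len = 6, 10      # six task conditions; model 10 time points of response

rng = np.random.default_rng(2)
bold = rng.normal(size=(n_scans, n_voxels))    # measured signal (h), scans x voxels
onsets = {c: rng.choice(n_scans - hrf_len, 12, replace=False)
          for c in range(n_conditions)}        # hypothetical event onsets per condition

# Design matrix (g): one column per condition per post-onset time point, so the
# shape of the response is estimated rather than fixed to a canonical form.
design = np.zeros((n_scans, n_conditions * hrf_len))
for c, times in onsets.items():
    for t in times:
        for lag in range(hrf_len):
            design[t + lag, c * hrf_len + lag] = 1.0

# Least-squares solution: pseudo-invert the design matrix and multiply by the
# measured signal. Each column of beta holds one voxel's estimated responses (f).
beta = np.linalg.pinv(design) @ bold           # (regressors x voxels)

# The estimated response to condition 0 in voxel 0:
hrf_voxel0_cond0 = beta[0:hrf_len, 0]
```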
The β values are then filled into the linear regression formula and the result is the hrf. The hrf, as represented by the β values, indicates the portion of the measured BOLD activity that varies with task onset. This could be understood as capturing the portion of the data that is relevant to the manipulation of the experiment. The β values are used in both the ROI selection procedure and the pattern classification analysis.

4.2 Region of interest selection

Once the hrf was calculated, the investigators used a goodness-of-fit measure to determine the amount of variance in the measured signal that was accounted for by the hrf. This provides an indicator of the portion of the signal that the hrf models accurately. To do this, they first averaged the modelled activity (the β values) over contiguous groups of voxels (which they took to indicate specific regions of the brain). Then, they calculated the goodness-of-fit of the hrf, which is a measure of the amount of variance in the signal that is accounted for by the hrf. To evaluate the statistical significance of the estimate they used a permutation test (for details on these procedures, see Nichols and Holmes [2002]; Gardner et al. [2005]).

Where the hrf identifies the portion of the signal modulated by the experimental tasks, the goodness-of-fit measure specifies the regions of the brain (understood as collections of nearby voxels) where the hrf accounts for a significant portion of the variance in the BOLD signal data. The result of the procedure identifies regions of the brain where the variation in activity is correlated with the task demands of the experiment. When the variance of activity in a region accounted for by the hrf was sufficiently high, the investigators concluded that activity in that region ‘is modulated by feature-based attention’ (Liu et al. [2011], p. 4488). This interpretation of the analysis result is a haemodynamic hypothesis since it relates variation in BOLD signal activity to specific task conditions. The particular haemodynamic hypothesis advanced attributes the portion of the measured BOLD signal captured by the β values that satisfy the goodness-of-fit criteria to the behavioural tasks.

Calculating the hrf identifies the portion of the signal that corresponds with the onset of each task condition, eliminating the task-irrelevant portion of the signal. The goodness-of-fit procedure identifies the areas of the brain for which the hrf accounts for a significant portion of the variance in the activity. In other words, this ROI selection procedure identifies the regions in which the measured variation of the BOLD signal can be explained in the context of the experiment. The result is used as a processing step to select regions of interest for pattern classification analysis. As I will show, this step improves the strength of the experimental evidence for the theoretical hypothesis the investigators infer by providing partial validation for a crucial assumption implicit in the use of pattern classification analysis.
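The following sketch illustrates the general shape of such a procedure; it is not the specific method of Liu et al. or of Gardner et al. The variance explained by the fitted model is computed for a candidate region and compared against a null distribution obtained by scrambling the temporal alignment of the design matrix. The region boundaries and iteration count are hypothetical, and the design, beta, and bold arrays are assumed to carry over from the previous sketch.

```python
import numpy as np

def variance_explained(design, beta, signal):
    """Fraction of variance in `signal` accounted for by the fitted model."""
    fitted = design @ beta
    resid = signal - fitted
    return 1.0 - resid.var(axis=0) / signal.var(axis=0)

# Goodness of fit for the voxels in one candidate region (columns of `bold`).
roi = slice(0, 200)                 # hypothetical contiguous group of voxels
r2_roi = variance_explained(design, beta[:, roi], bold[:, roi]).mean()

# Permutation test: refit after scrambling the temporal alignment of the design
# matrix, and compare the observed fit with the resulting null distribution.
rng = np.random.default_rng(3)
null_r2 = []
for _ in range(1000):
    shuffled = design[rng.permutation(design.shape[0]), :]
    beta_null = np.linalg.pinv(shuffled) @ bold[:, roi]
    null_r2.append(variance_explained(shuffled, beta_null, bold[:, roi]).mean())

p_value = (np.sum(np.array(null_r2) >= r2_roi) + 1) / (len(null_r2) + 1)
print(f"ROI retained for classification: {p_value < 0.05}")
```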
4.3 Pattern classification analysis

The primary aim of the study was to use pattern classification analysis to test ‘whether the pattern of fMRI response across voxels in an area could distinguish which feature was attended, although the average amplitude did not’ (Liu et al. [2011], p. 4490).7 Pattern classification analysis is a type of multivariate analysis technique that treats each voxel as a dependent variable. The procedure involves four distinct stages: feature selection, classifier selection, training, and testing.

Feature selection involves choosing the voxels that will be included in the analysis. Typically, the chosen voxels are those within a particular ROI, although how that ROI is defined varies from study to study. Regions of interest can be defined anatomically, either using software to select the voxels that fall within the anatomical ROI, or by manually tracing the ROI. They can also be defined functionally, using a functional localization task. The BOLD signal data collected while a participant performs such a task can be used to identify voxels that are strongly activated during the performance of that task, which are then defined as the ROI. In this case, the investigators selected the voxels indicated by the procedure discussed in the previous section.8

Classifier selection involves choosing the classifier, which is the machine learning algorithm that will be used to implement the analysis. The classifier represents brain activity in a multidimensional space where each dimension corresponds to the BOLD signal value in a single voxel. If 300 voxels are selected, then the space has 300 dimensions. Each point in this space specifies a particular BOLD signal value for each selected voxel and so corresponds to a particular state of brain activity. For the purposes of this article, the particular classifier used does not matter, but it is worth noting that different classifiers have different strengths and weaknesses (Misaki et al. [2010]).

Once the classifier is selected it is trained and tested. During the training phase, the classifier is presented with labelled data (the labels indicate the task condition, such as ‘attending to clockwise rotating dots’). The classifier identifies correlations between patterns in the BOLD signal and the provided labels, and on the basis of those correlations it divides the multidimensional space into subspaces. Different classifiers use different procedures for subdividing the multidimensional space. Once it is subdivided, the classifier identifies each subspace with the task condition that is most frequently associated with it. During testing, the classifier is presented with unlabelled data that it has not seen before. It locates the novel data in the multidimensional space and, based on the subspace that they fall into, predicts the task label that corresponds with the data. A data point that is located in the ‘attending to red’ subspace is labelled as ‘attending to red’. The predicted labels are compared with the true labels, and the classifier’s accuracy at predicting the task condition on the basis of the BOLD signal data is calculated.
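The training and testing logic can be summarized in a short sketch. It is schematic only: the arrays and labels are hypothetical, a linear support vector classifier stands in arbitrarily for the options surveyed by Misaki et al. ([2010]), and a single train/test split stands in for the cross-validation folds used in practice.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# Hypothetical feature-selected data: one row per trial, one column per voxel
# in the ROI, with labels naming the attended feature on that trial.
rng = np.random.default_rng(4)
patterns = rng.normal(size=(120, 300))            # 120 trials x 300 selected voxels
labels = np.repeat(["clockwise", "counterclockwise", "red", "green"], 30)

# Training: the classifier learns a partition of the 300-dimensional space from
# labelled examples. Testing: it predicts labels for data it has not seen.
train_X, test_X, train_y, test_y = train_test_split(
    patterns, labels, test_size=0.25, stratify=labels, random_state=0)
classifier = LinearSVC().fit(train_X, train_y)
accuracy = (classifier.predict(test_X) == test_y).mean()

# With purely random data, accuracy will hover around chance (0.25 here);
# accuracy reliably above chance is what licenses the haemodynamic claim.
print(f"classification accuracy on held-out trials: {accuracy:.2f}")
```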
The regions of the brain (as defined by the ROI selection procedure) where the classifier performed with sufficient accuracy are said to ‘contain the control signals for maintaining attention to visual features’ (Liu et al. [2011], p. 4493). That is to say, the investigators took the classification results to indicate the regions of the brain that contain signals used for the maintenance of attention. They are attributing a cognitive function to a particular region of the brain (in fact, several regions of the brain). This is an inference to a theoretical hypothesis. In this case, the hypothesis specifies the particular role that the identified regions perform: control of attentional processes. The attribution of functional role is made on the basis of the information carried in the signal that is necessary to support the cognitive function. It is not just a claim that the indicated regions play such-and-such a role; by basing this inference on pattern classification analysis, it is a specification of that role in terms of the signal content.

Given this, the inference from the successful predictions of a pattern classifier to the content of the brain activity, and the subsequent attribution of functional role, requires additional assumptions. One particular assumption is that the patterns leveraged by the classifier contain information that is accessible to the brain. One way to understand why this assumption is required is to distinguish between the informational and representational content of a signal. The informational content of a signal is whatever facts you can learn from the signal. The representational content is the message actually carried by the signal. Informational content and representational content are not necessarily the same (Dretske [1981]).

Consider the following simple case: You are in a closed room and someone in an adjoining room is communicating a message by banging objects together. Perhaps they are using Morse code to express a fact about the weather. With sufficient equipment and expertise, you could determine whether the person in the other room is moving around, or features of the materials that they are banging together. These facts are part of the informational content of the signal, as they are facts you can learn by analysing the signal. The actual message being communicated, however, may have nothing to do with these facts. Indeed, in this case the message is about the weather. It may even be the case that the individual who is communicating does not have access to the facts you are able to infer from the signal. They may not know what material the objects are made of, and so could not possibly be communicating these facts. Without some knowledge of Morse code, or additional constraints beyond the signal itself, it is difficult to verify that facts learned from analyses of the signal correspond to the representational content of the signal. Thus, showing that regularities in a signal can be used to reliably make inferences or predictions about the world, as pattern classification analysis does, is not sufficient to support the claim that the signal is transmitting those facts.

In these terms, pattern classification analysis characterizes some of the informational content of the BOLD signal. It identifies which tasks can be discriminated between on the basis of patterns in the signal. The inference from the informational content of the BOLD signal to an attribution of functional role requires the assumption that the informational content extracted by the analysis reflects the representational content of the signal. Thus, successfully making an inference to the role a region plays on the basis of pattern classification requires, at least, that the information leveraged by the classifier is accessible to the brain or, more broadly, the organism. Neuroscientists are well aware of this limitation. Classifiers are known to be very powerful, and researchers caution against drawing inferences from the particular decision metric that a classifier implements.
This is because a classifier will leverage anything that permits it to make reliable predictions, including patterns in the data irrelevant to understanding the functioning of the brain (Anderson and Oates [2010]). Tong and Pratte ([2012]) relate an illuminating case of a classifier achieving near perfect accuracy at predicting the experience of humour when a subject was watching a sitcom while in an MRI scanner. A close inspection of the classification process revealed that several voxels in the data were located along the edge of a ventricle (ventricles are hollow spaces in the brain filled with cerebrospinal fluid). Since the ventricles contain no blood, the BOLD signal there is zero. Thus, a voxel along the edge of a ventricle will display a significant change in BOLD signal value should the subject’s head move (even slightly), such as when stifling laughter. The classifier’s performance was due to a correlation between slight head motion, humorous stimuli, and voxels that overlap with ventricles. This is why researchers use secondary analyses, such as the ROI selection procedure described above and the searchlight procedure described in footnote eight. These procedures help limit the possibility of the classifier ‘cheating’, which in turn provides validation for the assumption that the information in the signal leveraged by the classifier is accessible to the brain.

5 The Strength of Multiple Analyses

The analysis techniques discussed above support different types of hypotheses. The ROI selection procedure supports a haemodynamic hypothesis about the relationship between variation in the BOLD signal and variation in the task conditions. Pattern classification analysis is taken to support a theoretical hypothesis about the functional role played by parts of the brain in attentional processes. The difference in use reflects a difference in evidence. ROI selection identifies the portion of the data that can be explained in the context of the experiment. Pattern classification analysis identifies the task conditions that can be discriminated between on the basis of the fMRI data. The goodness-of-fit measure does not provide evidence that could support a claim about what task conditions can be discriminated between on the basis of the neuroimaging data. Likewise, the result of pattern classification analysis cannot support a claim about the quality of the data, or characterize which portion of the signal is modulated by the experimental manipulation. Indeed, that the classifier will leverage any correlation between task label and fMRI data suggests that it is poorly suited to providing evidence in support of such a claim.

The difference in evidence can be traced to a difference in the manipulations of the data. Through their different manipulations, the different techniques reveal different patterns. Using these analyses together strengthens the evidence provided by classification analysis with respect to the target theoretical hypothesis. The permutation test indicates the portion of the signal that can be explained in the context of the experiment. By using the results of that procedure to select features for the classifier, the investigators ensured that the patterns available to the classifier are only those contained in the portion of the signal that is modulated by the experimental task.
While this does not guarantee that the leveraged signal carries information that is accessible to the system, it ensures that the leveraged variations are at least relevant to the experimental manipulation. In this way, some of the confounds that might prohibit inferring from the result of classification analysis to the target theoretical hypothesis are controlled for by using multiple analyses in series.9 The permutation test, when used to select a portion of the data for classification, provides validation for one of the problematic assumptions invoked by pattern classification analysis. Not only do these analysis techniques have different evidential targets, but brought together they provide stronger evidence for a theoretical hypothesis than either could alone. In this way, multiple analysis techniques that provide different perspectives on the same data can strengthen the evidence produced in a single neuroimaging experiment. This is a kind of local robustness.

Robustness has been used to defend experimental practice from critiques similar to those discussed here. Specifically, Collins’s ([1985]) experimenter’s regress proposes a vicious circle between experimental results and the techniques that produce those results. He argues that a technique is verified only when it produces correct data, but a technique is only known to produce correct data when it is verified. The critiques raised against neuroimaging by van Orden and Paap, which form the foundation of scepticism towards the technology, are of a similar form. The main issue they identify is that subtraction analysis requires assuming that the brain can be subdivided into functional parts, which is the very claim the analysis result is taken to support. This is a localized case of the experimenter’s regress where the feature of scientific practice under scrutiny is not an instrument, but a data analysis technique.

Philosophers have argued that with respect to the experimenter’s regress, the epistemic situation is not as dire as Collins makes it out to be. Cartwright ([1991]), for example, argues that the regress is broken by the robust reproducibility of instrument results. Confidence in the report of an instrument is justified when the measurement result aligns with results produced by a variety of instruments, each of which relies on independent assumptions (pp. 451–2). Culp ([1995]) offers a more careful defence along the same lines. She argues, via a detailed case study analysis of approaches to DNA sequencing, that experimentalists are convinced that measurements are getting at the same phenomenon when multiple measurement techniques, each with different theoretical presuppositions, produce a robust body of evidence (p. 441).

Robustness is achieved when the same result is obtained by multiple, independent (or mostly independent) techniques (Wimsatt [1981]). Robustness analysis involves determining the features of measurement or analysis techniques that are invariant under changes in the technique that might influence the result (Calcott [2011]). Robustness is derived from the use of multiple independent approaches to detecting, isolating, or measuring the same target. The independence of measurement results is characterized in terms of theoretical presuppositions required by the use of the instrument. These can also be understood as assumptions researchers must make about the production of the resulting data. Different instruments are independent insofar as they require different assumptions.
The same can be said of different data analysis techniques. Data analysis techniques, because of the manipulations they impose on data, require investigators to make assumptions about the result. These assumptions, if true, justify interpretations of the result of the data manipulation or analysis procedure. Different techniques, as used to support different hypotheses, require different assumptions.

However, there is a relevant difference between using multiple data analysis techniques as I have described, and the use of multiple measuring instruments to detect the same phenomenon. The robustness of a measurement outcome is improved when independent techniques produce the same result. A defence of neuroimaging against van Orden and Paap’s criticisms along these lines is offered by Roskies ([2010b]), in her account of functional triangulation. Functional triangulation occurs when different analysis techniques produce the same result, and so generate a robust body of evidence. The situation I have described is different. The techniques discussed above do not, and indeed cannot, provide the very same result. While the results of the analyses are not precisely the same, they are similarly aimed. The permutation test indicates the regions of the brain that may play a role in attentional processing, and the pattern classification analysis further clarifies that role. Thus, while they do not provide evidence in support of the very same hypothesis, the hypotheses they individually support are mutually supportive. The permutation test provides support for a haemodynamic hypothesis, and the subsequent analysis of the evidence revealed by that test using pattern classification analysis is brought to bear on a theoretical hypothesis. Insofar as this is a robust result, then, it might be regarded as a weakly robust result: weak because the techniques do not have the same outcome.10

In general, different data analysis techniques provide different perspectives on the same data, and the use of multiple analysis techniques together can strengthen the quality of evidence produced by a particular method or instrument. This can result in evidence that can support inferences that may not be warranted by the result of a single analysis technique or data manipulation. In this way, multiple analysis techniques used in series can provide experimental results with a kind of local robustness. It is ‘local’ because the techniques ultimately depend upon one another. While the different perspectives are not fully independent, because one analysis technique is used as a processing step for a subsequently applied technique, they still contribute to the robustness of the inference because different techniques reveal (and suppress) different patterns and rely on different assumptions. Their differences are what contribute to the strengthening of the evidence.

The general lesson of the experimenter’s regress is that problematic assumptions can arise in the context of experimentation. The general lesson of the appeals to robustness is that those assumptions can (sometimes) be validated by comparing different perspectives on the same subject. With respect to scepticism towards the use of neuroimaging data, I have argued that problematic assumptions, which arise from the use of particular analysis techniques, can be validated by using different data analysis techniques that require different assumptions. This provides the inference with a (weak) local robustness.
6 Conclusion
I have demonstrated that different data analysis techniques provide evidence for different phenomena and that multiple analysis techniques can be used together to improve the epistemic situation in neuroimaging research. Thus, the debate about the epistemic status of neuroimaging, which is framed in terms of the logic of subtraction, is at best an evaluation of the limitations of analysis techniques that depend upon that logic. Sweeping conclusions about the range of hypotheses that neuroimaging technology can and cannot be used to investigate are not supported by this literature. The argument presented above provides grounds for a mild optimism with respect to neuroimaging technology: it can be used to do more than provide evidence about hypotheses specifying the relationships between BOLD activity and task performance. I leave for future work the task of identifying which specific hypotheses and phenomena neuroimaging technology can be used to investigate, as completing that task will require a careful evaluation of a representative collection of the data analysis techniques and experimental strategies used in neuroimaging research. Given that different analysis techniques provide different evidence, the diversity of techniques used in neuroimaging research suggests that philosophers concerned with the epistemology of neuroimaging should focus their attention on evaluating the evidential quality and scope of particular analysis techniques (such as subtraction) and classes of analysis techniques (such as multivariate analyses). Such evaluations should take into account the specific theoretical goals those techniques are put towards (functional localization or tracking the content of neural representations, to name two).
The general lesson here is that data analysis techniques play an important role in the generation of scientific evidence. Different data analysis procedures, and differences in how those procedures are implemented, can make a difference to the range of phenomena about which the result of the analysis is informative. This is a feature of scientific practice in need of more careful philosophical attention.
Acknowledgements
I'd like to thank Jacqueline Sullivan, Chris Viger, Joseph McCaffrey, Robert Foley, Frédéric Banville, Daniel Booth, Chris Martin, Anna Blumenthal, Jordan Dekraker, and two anonymous reviewers for constructive feedback on drafts. Additional thanks to participants in the annual Canadian Society for History and Philosophy of Science conference, where early versions of this project were presented, and to members of the Köhler Memory Lab at the Brain and Mind Institute for productive and insightful discussions on this topic. Funding for this research was provided by the Social Sciences and Humanities Research Council of Canada and the Rotman Institute of Philosophy.
Footnotes
1. There has been a steady shift from using univariate analysis techniques that treat the neuroimaging data as a scalar value, usually an average, towards the use of multivariate analysis techniques that treat the neuroimaging data as a vector. These new techniques have allowed neuroimaging researchers to pursue new theoretical goals and study new hypotheses, such as the investigation of the content of neural representations (for an introductory review of multivariate techniques, see Tong and Pratte [2012]).
2. The fMRI scanning protocol does not directly measure metabolic activity. During an fMRI scan, hydrogen nuclei align with the scanner's uniform magnetic field, and radio pulses briefly perturb that alignment. As they relax back to equilibrium they release energy, which the scanner measures. Deoxygenated haemoglobin, unlike oxygenated haemoglobin, causes the nearby magnetic field strength to vary, resulting in a difference in the measured energy and forming the basis of the BOLD signal.
3. It is important to note that the techniques discussed here, collectively referred to as multivariate pattern analyses, are neither the only nor the first multivariate techniques to be used in neuroimaging. For example, spatiotemporal partial least squares is a multivariate technique that has been in use since the late 1990s (McIntosh et al. [1996], [1998]). I owe this clarification to an anonymous reviewer.
4. Thanks to an anonymous reviewer for this phrasing.
5. This process is often referred to as data reduction.
6. The inter-trial interval does not need to be the same for every pair of trials. Indeed, it is typically jittered, or randomly varied, so that the interval between successive trials varies. The variation in inter-trial interval is important for blocking certain confounds and artefacts that can arise when event onset is uniformly spaced. Since jittered events are still mathematically separable, I have omitted a detailed discussion of jitter for the sake of simplicity.
7. The investigators reported on a third analysis in the paper that I do not discuss in detail. That analysis, which adheres closely to the logic of subtraction, was intended to investigate whether average BOLD signal amplitude discriminated between the specific features attended (red dots versus green dots). It did not.
8. In addition to the analyses I discuss in detail, they also completed a whole-brain searchlight analysis. A searchlight is a specific kind of feature selection and analysis process. In a searchlight, investigators define a volume (the 'searchlight') and run the pattern classification analysis procedure over the voxels within that volume. They then move the volume and run the analysis again. This procedure is typically used to identify arbitrary subdivisions of the brain that result in reliable classification, or to examine how classification accuracy changes as the classifier is given data from different parts of the same network or region of the brain.
9. Although not all of them. I leave discussion of those details for future work, as it is beyond the scope of this article.
10. This should not be cause for scepticism, at least not scepticism localized to the particular case of neuroimaging. There is reason to believe that any difference between measurement techniques can contribute to a difference in the phenomena probed by those techniques (for a discussion of this with respect to neurobiology, see Sullivan [2009]). If this is true, then weak robustness is the norm for scientific knowledge.
References
Aktunç, E. M. [2014]: 'Severe Tests in Neuroimaging: What We Can Learn and How We Can Learn It', Philosophy of Science, 81, pp. 961–73.
Anderson, M. L. and Oates, T. [2010]: 'A Critique of Multi-voxel Pattern Analysis', Proceedings of the 32nd Annual Conference of the Cognitive Science Society, pp. 1511–16.
Ashby, F. G. [2011]: Statistical Analysis of fMRI Data, Cambridge, MA: MIT Press.
Bogen, J. and Woodward, J. [1988]: 'Saving the Phenomena', Philosophical Review, 97, pp. 303–52.
Calcott, B. [2011]: 'Wimsatt and the Robustness Family: Review of Wimsatt's Re-engineering Philosophy for Limited Beings', Biology and Philosophy, 26, pp. 281–93.
Cartwright, N. [1991]: 'Replicability, Reproducibility, and Robustness: Comments on Harry Collins', History of Political Economy, 23, pp. 143–55.
Collins, H. [1985]: Changing Order, London: SAGE Publications.
Culp, S. [1995]: 'Objectivity in Experimental Inquiry: Breaking Data-Technique Circles', Philosophy of Science, 62, pp. 430–50.
Davis, T. and Poldrack, R. A. [2013]: 'Measuring Neural Representations with fMRI: Practices and Pitfalls', Annals of the New York Academy of Sciences, 1296, pp. 108–34.
Dretske, F. [1981]: Knowledge and the Flow of Information, Cambridge, MA: MIT Press.
Friston, K. J., Holmes, A. P., Price, C. J., Büchel, C. and Worsley, K. J. [1999]: 'Multisubject fMRI Studies with Conjunction Analyses', NeuroImage, 10, pp. 385–96.
Gardner, J. L., Sun, P., Waggoner, R. A., Ueno, K., Tanaka, K. and Cheng, K. [2005]: 'Contrast Adaptation and Representation in Human Early Visual Cortex', Neuron, 47, pp. 607–20.
Good, I. J. [1983]: 'The Philosophy of Exploratory Data Analysis', Philosophy of Science, 50, pp. 283–95.
Hardcastle, V. G. and Stewart, M. C. [2002]: 'What Do Brain Data Really Show?', Philosophy of Science, 69, pp. S72–82.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L. and Pietrini, P. [2001]: 'Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex', Science, 293, pp. 2425–30.
Huettel, S. A., Song, A. W. and McCarthy, G. [2008]: Functional Magnetic Resonance Imaging, Sunderland, MA: Sinauer.
Kass, R. E., Eden, U. and Brown, E. [2014]: Analysis of Neural Data, New York: Springer.
Klein, C. [2010a]: 'Images Are Not the Evidence in Neuroimaging', British Journal for the Philosophy of Science, 61, pp. 265–78.
Klein, C. [2010b]: 'Philosophical Issues in Neuroimaging', Philosophy Compass, 5, pp. 186–98.
Kriegeskorte, N. and Kievit, R. A. [2013]: 'Representational Geometry: Integrating Cognition, Computation, and the Brain', Trends in Cognitive Sciences, 17, pp. 401–12.
Landreth, A. and Richardson, R. C. [2004]: 'Localization and the New Phrenology: A Review Essay on William Uttal's The New Phrenology', Philosophical Psychology, 17, pp. 107–23.
Liu, T., Hospadaruk, L., Zhu, D. C. and Gardner, J. L. [2011]: 'Feature-Specific Attentional Priority Signals in Human Cortex', The Journal of Neuroscience, 31, pp. 4484–95.
Logothetis, N. K. [2008]: 'What We Can Do and What We Cannot Do with fMRI', Nature, 453, pp. 869–78.
Martin, C. B., McLean, D. A., O'Neil, E. B. and Köhler, S. [2013]: 'Distinct Familiarity-Based Response Patterns for Faces and Buildings in Perirhinal and Parahippocampal Cortex', The Journal of Neuroscience, 33, pp. 10915–23.
Mayo, D. [2005]: 'Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses', in Achinstein, P. (ed.), Scientific Evidence: Philosophical Theories and Applications, Baltimore, MD: Johns Hopkins University Press, pp. 95–127.
McIntosh, A., Lobaugh, N., Cabeza, R., Bookstein, F. and Houle, S. [1998]: 'Convergence of Neural Systems Processing Stimulus Associations and Coordinating Motor Responses', Cerebral Cortex, 8, pp. 648–59.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V. and Grady, C. L. [1996]: 'Spatial Pattern Analysis of Functional Brain Images Using Partial Least Squares', NeuroImage, 3, pp. 143–57.
Misaki, M., Kim, Y., Bandettini, P. A. and Kriegeskorte, N. [2010]: 'Comparison of Multivariate Classifiers and Response Normalizations for Pattern-Information fMRI', NeuroImage, 53, pp. 103–18.
Nichols, T. E. and Holmes, A. P. [2002]: 'Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples', Human Brain Mapping, 15, pp. 1–25.
Poldrack, R. [2010a]: 'Subtraction and Beyond: The Logic of Experimental Designs for Neuroimaging', in Bunzl, M. and Hanson, S. J. (eds), Foundational Issues in Human Brain Mapping, Cambridge, MA: MIT Press, pp. 147–60.
Poldrack, R. [2010b]: 'Mapping Mental Function to Brain Structure: How Can Cognitive Neuroimaging Succeed?', Perspectives on Psychological Science, 5, pp. 753–61.
Roskies, A. [2010a]: 'Neuroimaging and Inferential Distance: The Perils of Pictures', in Bunzl, M. and Hanson, S. J. (eds), Foundational Issues in Human Brain Mapping, Cambridge, MA: MIT Press, pp. 195–216.
Roskies, A. [2010b]: 'Saving Subtraction: A Reply to Van Orden and Paap', British Journal for the Philosophy of Science, 61, pp. 635–65.
Sullivan, J. [2009]: 'The Multiplicity of Experimental Protocols: A Challenge to Reductionist and Non-reductionist Models of the Unity of Neuroscience', Synthese, 167, pp. 511–39.
Tong, F. and Pratte, M. S. [2012]: 'Decoding Patterns of Human Brain Activity', Annual Review of Psychology, 63, pp. 483–509.
Uttal, W. [2001]: The New Phrenology, Cambridge, MA: MIT Press.
Uttal, W. [2011]: Mind and Brain: A Critical Appraisal of Cognitive Neuroscience, Cambridge, MA: MIT Press.
van Orden, G. C. and Paap, K. R. [1997]: 'Functional Neuroimages Fail to Discover Pieces of Mind in Parts of the Brain', Philosophy of Science, 64, pp. S85–94.
Wimsatt, W. [1981]: 'Robustness, Reliability, and Overdetermination', in Re-engineering Philosophy for Limited Beings, pp. 43–74.
© The Author 2017. Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. For Permissions, please email: journals.permissions@oup.com

I proceed as follows: In Section 2, I review the debate about the epistemic status of neuroimaging and specify the categories of hypotheses that philosophers claim neuroimaging data can and cannot support. In Section 3, I present a conceptual framework for evaluating the strength and content of evidence produced via a data analysis technique. In Section 4, I apply this conceptual framework to a study that uses multiple analysis techniques to generate evidence in support of a hypothesis and where critics of neuroimaging would argue that the data do not support this hypothesis, I show how it can. The evidence is stronger than it appears because one analysis technique is used to validate a crucial assumption required by the other. In Section 5, I argue that different analysis techniques provide different evidence, and that the use of multiple analysis techniques to examine the same data provides experimental results with a kind of local robustness. 2 Scepticism about Neuroimaging Functional magnetic resonance imaging (fMRI) allows neuroscientists to study the human brain through non-invasive measurements of metabolic activity (see Ashby [2011] for a technical introduction). Experiments using fMRI typically require a participant to perform a cognitive task—such as identifying faces as familiar or unfamiliar (as in Martin et al. [2013])—while the scanner measures changes in the blood oxygenation level dependent (BOLD) signal throughout their brain.2 The scanner does this by dividing the brain into voxels (volumetric pixels), which are 1–3 mm cubes of brain matter, and measuring the BOLD signal in each voxel over time. The value of the BOLD signal is the ratio of oxygenated to deoxygenated haemoglobin in a voxel at the time of scanning. Since it tracks properties of blood flow, the BOLD signal is often referred to as the haemodynamic signal. After a scanning session the investigators will have a data set that consists of BOLD signal values for each voxel, labelled with the task condition that the participant was performing when those data were collected. Neuroimaging data have historically been analysed using subtractive analyses. In the simplest case of subtraction, two sets of neuroimaging data are required, each obtained while the participant performs a different task. The goal of subtractive analysis is to identify the difference in BOLD signal that corresponds with the cognitive difference between the tasks. Roughly, the BOLD signal values in each voxel associated with one task are subtracted from the values in the same voxels associated with the other task. This analysis is classified as univariate because each voxel is treated independently of every other voxel and so the data needs to be corrected for multiple comparisons. The result is a difference map that identifies the voxels (or regions) where brain activity differs significantly between the task conditions. To evaluate if the difference effects can be attributed to the population and not just a given subject in the study, a second-level analysis is carried out (typically random effects analysis; see Friston et al. [1999]). If this analysis shows the difference to be consistent across subjects, then the cognitive difference between the tasks is attributed to the regions of the brain shown to be differentially active. To illustrate the conceptual logic of the process, consider van Orden and Paap’s ([1997]) toy example in which task A consists of reading two words, and task B of reading two words and then judging whether they rhyme. 
The resulting subtraction between the imaging data obtained during task A and the data obtained during task B is taken to indicate the regions of the brain that are involved in the cognitive process that underlies the rhyming judgement. Van Orden and Paap argue that the subtractive method cannot be used to locate where in the brain cognitive functions ‘reside’ because the reliability of subtractive inferences depends on several assumptions they believe are unlikely to be true. In particular, the reliability of subtraction with respect to localizing cognitive functions to regions of the brain requires that ‘one must begin with a “true” theory of cognition’s components, and assume that the corresponding functional and anatomical modules exist in the brain’ (van Orden and Paap [1997], p. S86). These assumptions, they argue, follow from the fact that a valid subtraction requires that the task-difference precisely isolates a single cognitive component, which can only be the case if the cognitive theory used to design the tasks is accurate (p. S87). Additionally, they argue that functional localization using subtraction further requires that those modules are feed-forward ‘to ensure that the component of interest makes no qualitative changes “upstream” on shared components of experimental and control tasks' and that the contrasted tasks ‘invoke the minimum set of components for successful task performance’ (p. S86). William Uttal ([2001]) engages in a similar kind of sceptical attack on neuroimaging. Building on van Orden and Paap’s critique, Uttal compares neuroimaging to phrenology and argues, among other things, that it requires the false assumption that cognitive processes are managed and maintained by isolable modules of the brain. Valerie Hardcastle and Matthew Stewart ([2002], p. S80) express a similar type of scepticism in arguing that the logic of neuroimaging is viciously circular and conclude that ‘neuroscientists cannot use the data they get to support their claims of function’ because ‘they are assuming local and specific functions prior to gathering appropriate data for the claim’. These critiques all point to a vicious circularity in the inference from the results of subtraction analysis to claims about the localization of cognitive function. Some philosophers have defended cognitive neuroscience from these criticisms. For instance, Landreth and Richardson ([2004]) responded to Uttal’s arguments in part by clarifying the details of how neuroimaging data is processed, analysed, and interpreted. Additionally, Roskies ([2010b]) has rejected van Orden and Paap’s characterization of subtraction. She argues that subtraction results are just one part of a more complex scientific procedure that she calls functional triangulation, whereby ‘information from other task comparisons and other studies is brought to bear on the interpretation of experimental data’ ([2010b], p. 641). She also argues that characterizing neuroimaging as solely aimed at localizing cognitive functions to specific brain regions, as the three critics noted above do, is not representative of all uses of neuroimaging data. After providing examples of the variety of theoretical aims neuroimaging and subtraction methods are put towards, she concludes that ‘without recognizing the diversity of the immediate goals of imaging studies, it is impossible to do justice to the technique’ (p. 639). 
Indeed, the recent development of new multivariate analysis techniques, which were introduced to discriminate between modular and distributed accounts of the role that the ventral visual pathway plays in visual perception (Haxby et al. [2001]), has motivated cognitive neuroscientists to investigate hypotheses about the content of brain activity.3 In a review of the theoretical uses of multivariate techniques, the authors predict that ‘the enhanced sensitivity and information content provided by these methods should greatly facilitate the investigation of mind–brain relationships by revealing both local and distributed representations of mental content, functional interactions between brain areas, and the underlying relationships between brain activity and cognitive performance’ (Tong and Pratte [2012], p. 503). The study of mental content, neural representations, and the characterization of these in terms of distributed patterns of brain activity are very different theoretical goals than that of the localization of cognitive functions to parts of the brain. This is grist for Roskies’s mill. Whenever critics of neuroimaging research treat it solely in terms of localization, the critics have failed to appreciate the variety of theoretical applications that the technology is put towards. Furthermore, this theoretical shift, which was made possible by the development of data analysis techniques that treat neuroimaging data as a multidimensional pattern, illustrates the importance of evaluating analysis techniques other than subtraction when evaluating the epistemic value of neuroimaging technology. Despite these defences of neuroimaging, and the theoretical and analytic advances in the field of cognitive neuroscience, the general trend towards scepticism and the focus on subtractive analyses has persisted. While more recent conclusions tend to be on the milder side of scepticism, philosophers continue to challenge the ability of neuroimaging technology to provide evidence that supports the claims neuroscientists use the technology to investigate. Additionally, they continue to do so on the basis of an evaluation of subtraction and subtraction logic. I will take one of the most recent contributions to this debate (Aktunç [2014]) to be a representative example. In line with the sceptical tradition, Aktunç argues that while neuroimaging data are useful, they cannot be used to support the kinds of hypotheses that cognitive neuroscientists use it to support. Aktunç distinguishes between two types of hypotheses that neuroimaging data might be brought to bear on. There are haemodynamic hypotheses, which relate BOLD signal activity to the performance of cognitive tasks or parameters of the tasks. There are also theoretical hypotheses, which relate cognitive processes to the brain structures that implement them (this distinction is from Huettel et al. [2008]). To illustrate this distinction, consider the following example: The claim that patterns of BOLD signal activity in both PrC and PhC are sensitive to differences between faces, buildings, and chairs (Martin et al. [2013], p. 10921) is a haemodynamic hypothesis. The tasks used in this study require participants to judge images of faces, buildings, and chairs as familiar or novel. Thus, this claim is about the relationship between patterns of BOLD signal activity and features of stimuli used in the cognitive task. After discussing these results, the researchers advance a theoretical hypothesis. 
They claim that the ‘findings indicate that both PrC and PhC contribute to the assessment of item familiarity’ (Martin et al. [2013], p. 10922). This is a theoretical hypothesis because it identifies two brain structures, PrC and PhC, and specifies a cognitive process that they implement, the assessment of item familiarity. It is worth noticing the inferential relationship between these two types of hypotheses: the theoretical hypothesis is inferred from the haemodynamic hypothesis. Where a haemodynamic hypothesis specifies BOLD signal activity, a theoretical hypothesis specifies a structure of the brain. Likewise, where a haemodynamic hypothesis specifies a cognitive task, a theoretical hypothesis specifies a cognitive process. Given this distinction between haemodynamic and theoretical hypotheses, Aktunç uses Deborah Mayo’s error statistical framework to argue that neuroimaging data can only provide a severe test of haemodynamic hypotheses. On the simplest interpretation of Mayo’s severity criterion, a hypothesis passes a severe test just in case (i) the data agrees with the hypothesis and (ii) there is a sufficiently high probability that if the hypothesis were false, then the data would not agree with the hypothesis (Mayo [2005], p. 99). Aktunç ([2014], p. 969) argues that while neuroscientists may be interested in providing evidence that supports theoretical hypotheses, neuroimaging only has evidential import with respect to haemodynamic hypotheses. This is because a difference in mean BOLD signal, which is the pattern identified by subtractive analyses, can be embedded in a statistical significance test. From this, Aktunç ([2014], p. 969) argues that ‘using error probabilities, we can find out whether specific fMRI experiments constitute a severe test of specific hemodynamic hypotheses. Thus, fMRI data do have evidential import for hemodynamic hypotheses’. His argument that theoretical hypotheses cannot be subjected to severe testing relies on two premises. First, there is the ‘fact’ that ‘fMRI obviously does not test for the existence of cognitive modules or functions as defined by theories of cognitive science’ (p. 969) because ‘fMRI gives us data only on hemodynamic activity’ (p. 968). The second premise consists in the arguments made in the existing sceptical literature (specifically, Uttal [2001]; Hardcastle and Stewart [2002]; Klein [2010a]). Thus, according to Aktunç, neuroimaging data cannot support theoretical hypotheses because (i) the data is indirectly related to the content of those hypotheses and (ii) critiques of subtraction analysis show that such inferences are viciously circular, unstable, or otherwise unreliable. Neither of these premises can support the derived conclusion. Inferences from neuroimaging results to theoretical hypotheses, like most inferences from measurement results to theoretical claims, are ampliative; haemodynamic activity is at best an indirect measure of neural activity (Logothetis [2008]), and task performance is at best an indirect indicator of cognitive functions (Poldrack [2010a]). However, the indirect relationship between the data and content of the theoretical hypothesis is not sufficient to support the claim that neuroimaging cannot provide evidence for hypotheses that relate cognitive functions to brain activity. Whether these inferences are warranted depends on the particular theoretical hypotheses that are advanced, and whether the assumptions required by the inferences are justified. 
Indeed, this is how van Orden and Paap originally argued against the logic of subtraction; it was not on the basis of the indirectness of the data itself, but on the basis of the specific assumptions required to infer from the data to a theoretical hypothesis of a certain kind. However, no matter where you stand on the reliability of inferences from subtraction analysis to claims about the localization of cognitive functions, these arguments cannot be grounds for a sweeping claim about the evidential scope of neuroimaging data. Just because one data analysis technique has certain limitations does not mean that the data themselves are similarly limited. Indeed, neuroimaging data can and are analysed with other analysis techniques that reveal different patterns and correlations in the data. Whether or not neuroimaging data provide evidence in support of theoretical hypotheses depends on how the other analysis techniques help neuroscientists to mediate the inferential gap between haemodynamic and theoretical hypotheses. Inferences to theoretical hypotheses from neuroimaging data can be, and in practice are, strengthened by the use of multiple analysis techniques. The specific case I consider concerns analysis techniques used in sequence as a way to validate assumptions required by the primary analysis procedure. In the final section, I distinguish this use of multiple analyses from functional triangulation as discussed by Roskies, in which multiple independent analyses provide convergent evidence for a hypothesis. In the next section I provide a framework for evaluating the kinds of information about theoretical hypotheses that data analysis techniques provide. 3 Data Analysis and Evidence The sceptical position reviewed in the previous section is a claim about the kinds of hypotheses neuroimaging data can and cannot support. According to sceptics, it can support haemodynamic hypotheses, which specify a relation between features of the data. It cannot support theoretical hypotheses, which specify a relation between the phenomena that those features are taken to indicate. Whether it is used to investigate a haemodynamic or theoretical hypothesis, neuroimaging data needs to be manipulated to reveal correlations between features of the data that are relevant to the hypothesis under investigation. This is the function of data analysis techniques such as subtraction and pattern classification analysis. Data analysis techniques transform the data produced by experimentation into evidence suitable for statistical analysis. These transformations reveal patterns and correlations between features of the data, which are then taken to be evidence in support of a hypothesis. Bogen and Woodward’s distinction between data and phenomena is a useful place to begin thinking about this process. Broadly speaking, they characterize data, which are the result of the interaction between experimental design, implementation, and measurement, as ‘idiosyncratic to particular experimental contexts, and typically cannot occur outside of those contexts’ (Bogen and Woodward [1988], p. 317). Phenomena, on the other hand, ‘have stable, repeatable characteristics which will be detectable by means of a variety of different procedures, which may yield quite different kinds of data’ (p. 317). On this view, data provide evidence for claims about phenomena, while claims about phenomena provide evidence for theories. Bogen and Woodward ([1988], pp. 
309–10) illustrate this by considering how one might determine the melting point of lead. To do so, a researcher might take several measurements of a sample of lead just after it melts. The data in this case are a collection of temperature measurements. These temperature measurements provide evidence about the melting point of lead, which is a claim about a phenomenon. The data are idiosyncratic because the result of each temperature measurement depends on a complex network of causal interactions, many of which are not related to the phenomenon of interest. The value of each temperature measurement will be influenced by features of the thermometer used, the heating apparatus, the sample of lead, the time of day, the ambient temperature, and more additional causal factors than can be named. After collecting sufficiently many measurements, the researcher averages them and on the basis of the value of that average, makes a claim about the melting point of lead. Notice that it is not the individual temperature measurements but the average value of the temperature measurements that provides evidence in support of a claim about the melting point of lead. This calls attention to a general feature of scientific practice: the individual data points, which are the products of specific runs of an experiment, need to be transformed to reveal their evidential value. Typically, this involves eliminating the effects of factors that contribute to the value of specific data points that are not relevant to the theoretical question or hypothesis under investigation. With the influence of these factors still in place, the data speaks only to the melting point of this sample of lead, at this time, as measured with this thermometer. Factors such as those arising from the peculiar features of the thermometer are irrelevant to the melting point of lead insofar as they distort or conceal patterns in the data that reflect the ‘true’ melting point of lead. After data are produced, they are manipulated so that the patterns relevant to the phenomenon of interest are revealed and the irrelevant patterns are suppressed. Averaging the temperature measurements of melted lead is intended to suppress the patterns in the data caused by the irrelevant causal factors that contribute to the value of each specific data point. Other examples of manipulations that suppress irrelevant patterns are noise reduction procedures and manipulations that remove the effect of measurement artefacts. Averaging, as well as more complex analytic techniques such as those discussed in detail below, transform data so that patterns relevant to the phenomenon in question are revealed. The result of these manipulations is taken to be evidence for one or more claims about the phenomenon. A data analysis technique, then, is a series of data manipulations or transformations that clarify the evidential import of the data.4 Different data analysis techniques can be distinguished by the data points that they operate on and by the specific transformations of the data they involve. For example, univariate and multivariate techniques can be distinguished by the data points that they manipulate. Univariate techniques, such as subtraction, treat voxels as independent variables, while multivariate techniques, like pattern classification analysis (discussed in detail below) and representational similarity analysis (Kriegeskorte and Kievit [2013]), treat the data as having many dependent variables. 
Data analysis techniques that operate on the same class of data points, such as these two multivariate techniques, can be distinguished by the particular manipulations they apply to the data. For example, pattern classification analysis uses a machine learning decision procedure to classify the data, whereas representational similarity analysis uses a measure of similarity to compare brain activity between task conditions. Data manipulations are important because they transform otherwise complex data into a form that investigators can interpret and statistically analyse (Good [1983], pp. 285–6).5 Each manipulation, by virtue of the transformation that it makes, imposes assumptions on the result. These assumptions limit what the result can be taken as evidence about. Just as van Orden and Paap identified several assumptions required by the use of subtractive analyses, most data manipulations require researchers to make assumptions about the data. For example, a standard manipulation performed on neuroimaging data is the removal of patterns caused by magnetic field drift. Magnetic resonance scanners use the variations in a magnetic field to detect the BOLD signal, and the magnetic field in some scanners slowly changes during the course of scanning. Manipulating data such that the effects of field drift are removed requires the assumption that the data are corrupted by magnetic field drift. If the procedure is used on data produced by a scanner that does not have a field drift, then the procedure would introduce artificial patterns into the data. It would do so because the required assumption, that the scanner has a field drift with specific parameters, is not true of the data. In the case of field drift correction, the assumption can be validated by measuring the field drift of a scanner. This simple example illustrates how data manipulations entail or require assumptions to be made of the data, and shows that treating a specific data manipulation in isolation from the rest of the experimental process can make the evidential status of the data appear weaker than it in fact is. Different analysis techniques operate on different data points, implement different manipulations, and require making different assumptions of the data. This is how they reveal (and suppress) different data patterns. For example, subtraction reveals correlations between average amplitudes of the BOLD signal and task performance. Techniques like subtraction, when they include processes for smoothing and averaging the signal, suppress information about differences in activity between voxels within a region. Thus, some subtraction analyses are unable to reveal correlations between the co-ordinated activity of groups of voxels that preserve the same level of average activity between tasks. On the other hand, multivariate techniques, such as pattern classification analysis, correlate distributed patterns of BOLD signal activity with task performance. Pattern classification analysis is sensitive to distributed activity patterns that univariate techniques, like subtraction, cannot detect. However, multivariate techniques are less sensitive to one-dimensional effects that covary with stimulus features, to which univariate techniques are very sensitive (for a detailed discussion of the uses of these techniques, see Davis and Poldrack [2013]). By leveraging their differences, investigators can use several data analysis techniques together to overcome the inferential limitations of a particular technique. 
The limitations of a technique tend to derive from the assumptions that the technique requires. If assumptions can be identified, depending on the nature of those assumptions, other data analysis techniques can be used to validate them. In this way, the use of multiple analysis techniques on the same data can strengthen an inference from the result of one analysis to the target hypothesis by providing a clearer picture of the evidential import of the data. Specifically, where a given analysis technique provides evidence that can support a haemodynamic hypothesis, the inference from that hypothesis to a theoretical hypothesis will require investigators to make further assumptions about the data. Since different data analysis techniques reveal different patterns, it is often possible to validate some of those assumptions by analysing the data in another way. This is how multiple analysis techniques can come together to strengthen the inference from a haemodynamic to a theoretical hypothesis. Typically, this is done through functional triangulation (Roskies [2010a]), where multiple techniques are used separately on the data, and the hypotheses inferred are further supported by independent analysis of different data sets. The case I will discuss below is different, as the evidence is strengthened not through the independent application of multiple analyses, but the sequential application of analysis techniques. 4 Deconvolution and Pattern Classification Analysis Liu and colleagues’ ([2011]) study aims to determine the role that certain regions of the brain play in directing attention. The primary analysis technique used is pattern classification analysis, a multivariate technique derived from research on machine learning. Pattern classification analysis is used to determine whether cognitive tasks can be differentiated based only on patterns in the BOLD signal that correlate with task performance. As I argue below, this technique alone cannot support a theoretical hypothesis attributing a cognitive role to activity within a region or part of the brain. However, Liu and colleagues do not deploy the technique in isolation. Their analysis includes a region of interest (ROI) selection procedure that partially validates one of the crucial assumptions required by pattern classification analysis. While this does not provide definitive evidence in support of the theoretical hypothesis they advance, it demonstrates how multiple techniques can be used together to bring neuroimaging data to bear on hypotheses beyond those that merely relate haemodynamic activity to task performance. Two behavioural tasks were used to generate Liu et al.'s data set. In both tasks, subjects were presented with two overlaid patterns of dots and were instructed to attend to one pattern or the other. In the first task, both patterns were composed of white dots, but one was rotating clockwise and the other counter-clockwise. In the second task, both patterns were moving in a random-walk, but one was composed of red dots and the other green dots (Liu et al. [2011], pp. 4485–6). The resulting data set contained BOLD signal measurements for each of the six task conditions: attending to clockwise rotating dots, attending to counter-clockwise rotating dots, attending to red dots, attending to green dots, and the null-condition for each task (attending to a fixation cross). The data were pre-processed before they were analysed. 
This involved head motion correction (to remove artefacts caused by subjects moving while being scanned), removal of low-frequency drift (this corrects for a scanning artefact due to a drift in the magnetic field of the scanner), and conversion of the BOLD signal measurements from raw values into a percentage of signal change (Liu et al. [2011], p. 4486). The result of these transformations is a data set suitable for the analysis procedures with patterns due to known artefacts from head motion and scanner drift suppressed. The pre-processed data were then analysed using a series of analysis techniques. Before discussing the techniques in detail, I will provide a brief overview of the whole procedure. The analysis began with deconvolution, a technique used to isolate the task-relevant portion of the BOLD signal data. The result of the deconvolution analysis was used as the input for a ROI selection procedure. The combination of the deconvolution and ROI selection was then used as the input for pattern classification analysis. The result of the pattern classification analysis was then taken to support a claim about the regions of the brain involved in the modulation of attentional control. Notice that this is not a claim about the relationship between task performance and haemodynamic activity. It is a claim about which parts of the brain implement a particular cognitive process (modulation of attentional control). It is about the relationship between a cognitive function and regional brain activity. This is a theoretical hypothesis. There are multiple inferences involved in moving from a haemodynamic hypothesis to a theoretical hypothesis. Recall that a haemodynamic hypothesis relates BOLD signal data to the performance of a task, whereas a theoretical hypothesis relates brain structure (or the activity in brain structure) to a cognitive process. Inferring from one to the other requires treating the BOLD signal measurements as an indicator of cognitively relevant brain activity within a brain structure, and task performance as an indicator of one or more cognitive processes. Whether or not the task can be taken as an indicator of the cognitive function that the researchers are interested in depends on an underlying theory of psychological processing, and the robustness of the accompanying task analysis. As the focus of this article is on the interpretation of the neuroimaging data, I’m going to assume that the behavioural tasks used are reliable indicators of the modulation of attentional control. It is worth noting, however, that this assumption does not generally hold, especially given the relative lack of critical task analyses in neuroimaging research (for discussion, see Poldrack [2010b]). 4.1 Deconvolution analysis Not all of the measured changes in the BOLD signal are relevant to the subject’s performance of the cognitive task. The first substantive step in analysing neuroimaging data is to extract the portion of the BOLD signal that corresponds with the task manipulation. This process is called deconvolution. Deconvolution is an algorithmic solution to a particular type of signal processing problem in which a signal of interest is convolved, or mixed with, another signal. In general, deconvolving the signal of interest requires solving an equation of this form: (f⊗g)=h, where h is the recorded signal, f is the signal of interest, and g is the signal that f needs to be separated from. 
In the case of fMRI data, h is the measured BOLD signal, g is the design matrix (a mathematical representation of the task), and f is the haemodynamic response function (hrf). Here, the hrf represents the change in blood oxygenation levels that corresponds with the demands of the cognitive task that the subject performed. The aim of deconvolution analysis is to identify the portion of measured brain activity that is modulated by the task. Solving for the hrf requires pseudo-inverting the design matrix and multiplying it by the measured BOLD signal (this is the matrix-algebra equivalent of dividing both sides in the above equation by in order to calculate f). It is important to note that this procedure only works when the trials are mathematically separable, which can be achieved using an event-related design. An event-related design is such that the stimuli or tasks are separated by an inter-trial interval (usually there are about twenty seconds between tasks). Investigators can then assume that task-relevant BOLD activity occurs for short, discrete intervals corresponding to the onset of the task. The inter-trial interval supports this assumption by ensuring that the trial-relevant signal is temporally localized, and does not uniformly influence subsequent trials.6 Mathematically, this amounts to assuming that task-relevant variation in the BOLD signal is linearly summed with the task-irrelevant BOLD signal, and so the two can be separated by the deconvolution procedure described above. It is worth noting that these (and the following) assumptions are supported by supplementary empirical research, and are not arbitrarily made or taken for granted (for a technical introduction to linear regression, see Kass et al. [2014], Chapter 12). Typically, researchers assume that the haemodynamic response has a canonical shape and use that assumption to determine the form of hrf. In this case, however, the investigators did not want to assume that the hrf takes the canonical form and so they used a linear regression formula to model it. This decision eliminates confounds that might arise as a result of deviations from the canonical model in the hrf. The regression approach also allows the form of the hrf to vary from voxel to voxel, instead of assuming that the BOLD signal follows the same pattern in every voxel. Regression is a curve fitting procedure. The investigators specify an equation, a linear one in this case, with unknown coefficients, that is fit to the data. In this case, the ‘data’ that the curve is fit to are the result of multiplying the BOLD signal measurements with the inverted design matrix. The regression formula is expressed by the following equation: x=βy+ε. Regression requires assuming that errors are independent (which is ensured by the event-related design) and that the noise term, ε, is linearly additive. For each regressor, there will be an additional βy term. Liu and colleagues treated each experimental condition as a separate regressor, which resulted in a total of six regression terms (one for each of the clockwise, counter-clockwise, red, green, and null task conditions). Once the regression formula and design matrix are determined, the design matrix is pseudo-inverted and multiplied by the measured BOLD signal. Then, the result of that is used to determine the unknown β values in the regression equation. Note that this procedure is implemented for each voxel, and so each voxel will have its own set of β values. 
The β values are then filled into the linear regression formula and the result is the hrf. The hrf as represented by the β values, indicates the portion of the measured BOLD activity that varies with task onset. This could be understood as capturing the portion of the data that is relevant to the manipulation of the experiment. The β values are used in both the ROI selection procedure and the pattern classification analysis. 4.2 Region of interest selection Once the hrf was calculated, the investigators used a goodness-of-fit measure to determine the amount of variance in the measured signal that was accounted for by the hrf. This provides an indicator of the portion of the signal that the hrf models accurately. To do this, they first averaged the modelled activity (the β values) over continuous groups of voxels (which they took to indicate specific regions of the brain). Then, they calculated the goodness-of-fit of the hrf, which is a measure of the amount of variance in the signal that is accounted for by the hrf. To evaluate the statistical significance of the estimate they used a permutation test (for details on these procedures, see Nichols and Holmes [2002]; Gardner et al. [2005]). Where the hrf identifies the portion of the signal modulated by the experimental tasks, the goodness-of-fit measure specifies the regions of the brain (understood as a collection of nearby voxels) where the hrf accounts for a significant portion of the variance in the BOLD signal data. The result of the procedure identifies regions of the brain where the variation in activity is correlated with the task demands of the experiment. When the variance of activity in a region accounted for by the hrf was sufficiently high, the investigators concluded that activity in that region ‘is modulated by feature-based attention’ (Liu et al. [2011], p. 4488). This interpretation of the analysis result is a haemodynamic hypothesis since it relates variation in BOLD signal activity to specific task conditions. The particular haemodynamic hypothesis advanced attributes the portion of the measured BOLD signal captured by the β values that satisfy the goodness-of-fit criteria to the behavioural tasks. Calculating the hrf identifies the portion of the signal that corresponds with the onset of each task condition, eliminating the task-irrelevant portion of the signal. The goodness-of-fit procedure identifies the areas of the brain for which the hrf accounts for a significant portion of the variance in the activity. In other words, this ROI selection procedure identifies the regions in which the measured variation of the BOLD signal can be explained in the context of the experiment. The result is used as a processing step to select regions of interest for pattern classification analysis. As I will show, this step improves the strength of the experimental evidence for the theoretical hypothesis the investigators infer by providing partial validation for a crucial assumption implicit in the use of pattern classification analysis. 4.3 Pattern classification analysis The primary aim of the study was to use pattern classification analysis to test ‘whether the pattern of fMRI response across voxels in an area could distinguish which feature was attended, although the average amplitude did not’ (Liu et al. [2011], p. 4490).7 Pattern classification analysis is a type of multivariate analysis technique that treats each voxel as a dependent variable. 
The procedure involves four distinct stages: feature selection, classifier selection, training, and testing. Feature selection involves choosing the voxels that will be included in the analysis. Typically, the chosen voxels are those within a particular ROI, although how that ROI is defined varies from study to study. Regions of interest can be defined anatomically, either using software to select the voxels that fall within the anatomical ROI, or by manually tracing the ROI. They can also be defined functionally, using a functional localization task. The BOLD signal data collected while a participant performs such a task can be used to identify voxels that are strongly activated during the performance of that task, which are then defined as the ROI. In this case, the investigators selected the voxels indicated by the procedure discussed in the previous section.8 Classifier selection involves choosing the classifier, which is a machine learning algorithm that will be used to implement the analysis. The classifier represents brain activity in a multidimensional space where each dimension corresponds to the BOLD signal value in each voxel. If 300 voxels are selected, then the space has 300 dimensions. Each point in this space specifies a particular BOLD signal value for each selected voxel and so corresponds to a particular state of brain activity. For the purposes of this article, the particular classifier used does not matter, but it is worth noting that different classifiers have different strengths and weaknesses (Misaki et al. [2010]). Once the classifier is selected it is trained and tested. During the training phase, the classifier is presented with labelled data (the labels indicate the task condition, such as ‘attending to clockwise rotating dots’). The classifier identifies correlations between patterns in the BOLD signal and the provided labels, and based on those correlations it divides the multidimensional space into subspaces. Different classifiers use different procedures for subdividing the multidimensional space. Once subdivided, the classifier identifies each subspace with the task condition that is most frequently associated with it. During testing, the classifier is presented with unlabelled data that it has not seen. It locates the novel data in the multidimensional space and, based on the subspace that they fall into, predicts the task label that corresponds with the data. A data point that is located in the ‘attending to red’ subspace is labelled as ‘attending to red’. The predicted labels are compared with the true labels and the classifier’s accuracy at predicting the task condition on the basis of the BOLD signal data is calculated. The regions of the brain (as defined by the ROI selection procedure) where the classifier performed with sufficient accuracy are said to ‘contain the control signals for maintaining attention to visual features’ (Liu et al. [2011], p. 4493). That is to say, the investigators took the classification results to indicate the regions of the brain that contain signals used for the maintenance of attention. They are attributing a cognitive function to a particular region of the brain (in fact, several regions of the brain). This is an inference to a theoretical hypothesis. In this case, the hypothesis specifies the particular role that the identified regions perform: control of attentional processes. 
The regions of the brain (as defined by the ROI selection procedure) where the classifier performed with sufficient accuracy are said to ‘contain the control signals for maintaining attention to visual features’ (Liu et al. [2011], p. 4493). That is to say, the investigators took the classification results to indicate the regions of the brain that contain signals used for the maintenance of attention. They are attributing a cognitive function to a particular region of the brain (in fact, several regions of the brain). This is an inference to a theoretical hypothesis. In this case, the hypothesis specifies the particular role that the identified regions perform: control of attentional processes.

The attribution of functional role is made on the basis of the information carried in the signal that is necessary to support the cognitive function. It is not just a claim that the indicated regions play such-and-such a role; by basing the inference on pattern classification analysis, it is a specification of that role in terms of the signal content. Given this, the inference from the successful predictions of a pattern classifier to the content of the brain activity, and the subsequent attribution of functional role, requires additional assumptions. One particular assumption is that the patterns leveraged by the classifier contain information that is accessible to the brain.

One way to understand why this assumption is required is to distinguish between the informational and representational content of a signal. The informational content of a signal is whatever facts you can learn from the signal. The representational content is the message actually carried by the signal. Informational content and representational content are not necessarily the same (Dretske [1981]). Consider the following simple case: you are in a closed room and someone in an adjoining room is communicating a message by banging objects together. Perhaps they are using Morse code to express a fact about the weather. With sufficient equipment and expertise, you could determine whether the person in the other room is moving around, or features of the materials that they are banging together. These facts are part of the informational content of the signal, as they are facts you can learn by analysing the signal. The actual message being communicated, however, may have nothing to do with these facts. Indeed, in this case the message is about the weather. It may even be the case that the individual who is communicating does not have access to the facts you are able to infer from the signal. They may not know what material the objects are made of, and so could not possibly be communicating these facts. Without some knowledge of Morse code, or additional constraints beyond the signal itself, it is difficult to verify that facts learned from analyses of the signal correspond to the representational content of the signal. Thus, showing that regularities in a signal can be used to reliably make inferences or predictions about the world, as pattern classification analysis does, is not sufficient to support the claim that the signal is transmitting those facts.

In these terms, pattern classification analysis characterizes some of the informational content of the BOLD signal. It identifies which tasks can be discriminated between on the basis of patterns in the signal. The inference from the informational content of the BOLD signal to an attribution of functional role requires the assumption that the informational content extracted by the analysis reflects the representational content of the signal. Thus, successfully making an inference to the role a region plays on the basis of pattern classification requires, at least, that the information leveraged by the classifier is accessible to the brain or, more broadly, the organism. Neuroscientists are well aware of this limitation. Classifiers are known to be very powerful, and researchers caution against drawing inferences from the particular decision metric that a classifier implements.
This is because a classifier will leverage anything that permits it to make reliable predictions, including patterns in the data irrelevant to understanding the functioning of the brain (Anderson and Oates [2010]). Tong and Pratte ([2012]) relate an illuminating case of a classifier achieving near-perfect accuracy at predicting the experience of humour while a subject watched a sitcom in an MRI scanner. A close inspection of the classification process revealed that several voxels in the data were located along the edge of a ventricle (ventricles are hollow spaces in the brain filled with cerebrospinal fluid). Since the ventricles contain no blood, the BOLD signal there is zero. Thus, a voxel along the edge of a ventricle will display a significant change in BOLD signal value should the subject’s head move, even slightly, such as when stifling laughter. The classifier’s performance was due to a correlation between slight head motion, humorous stimuli, and voxels that overlap with ventricles. This is why researchers use secondary analyses, such as the ROI selection procedure described above and the searchlight procedure described in footnote eight. These procedures help limit the possibility of the classifier ‘cheating’, which in turn provides validation for the assumption that the information in the signal leveraged by the classifier is accessible to the brain.

5 The Strength of Multiple Analyses

The analysis techniques discussed above support different types of hypotheses. The ROI selection procedure supports a haemodynamic hypothesis about the relationship between variation in the BOLD signal and variation in the task conditions. Pattern classification analysis is taken to support a theoretical hypothesis about the functional role played by parts of the brain in attentional processes. The difference in use reflects a difference in evidence. ROI selection identifies the portion of the data that can be explained in the context of the experiment. Pattern classification analysis identifies the task conditions that can be discriminated between on the basis of the fMRI data. The goodness-of-fit measure does not provide evidence that could support a claim about what task conditions can be discriminated between on the basis of the neuroimaging data. Likewise, the result of pattern classification analysis cannot support a claim about the quality of the data, or characterize which portion of the signal is modulated by the experimental manipulation. Indeed, that the classifier will leverage any correlation between task label and fMRI data suggests that it is poorly suited to providing evidence in support of such a claim. The difference in evidence can be traced to a difference in the manipulations of the data. Through their different manipulations, the different techniques reveal different patterns.

Using these analyses together strengthens the evidence provided by classification analysis with respect to the target theoretical hypothesis. The permutation test indicates the portion of the signal that can be explained in the context of the experiment. By using the results of that procedure to select features for the classifier, the investigators ensured that the patterns available to the classifier are only those contained in the portion of the signal that is modulated by the experimental task. While this does not guarantee that the leveraged signal carries information that is accessible to the system, it ensures that the leveraged variations are at least relevant to the experimental manipulation. In this way, some of the confounds that might prohibit inferring from the result of classification analysis to the target theoretical hypothesis are controlled for by using multiple analyses in series.9 The permutation test, when used to select a portion of the data for classification, provides validation for one of the problematic assumptions invoked by pattern classification analysis.
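A minimal sketch of such a serial pipeline follows, again in Python with scikit-learn. The candidate region names, the p < 0.05 threshold, the row-shuffling null, and the synthetic data are illustrative assumptions of mine; the point is only the ordering, in which classification is run solely within regions that pass the goodness-of-fit test:

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Synthetic stand-ins: continuous time courses per candidate region (for the
# goodness-of-fit step), a task design, and trial-wise patterns (for classification).
n_time, n_trials = 200, 80
design = rng.standard_normal((n_time, 2))
labels = np.array(['attend_red', 'attend_green'] * (n_trials // 2))
regions = {name: rng.standard_normal((n_time, 40)) for name in ['V1', 'MT', 'IPS']}
trial_patterns = {name: rng.standard_normal((n_trials, 40)) for name in regions}

def roi_fit_pvalue(data, design, n_perm=500):
    # ROI-averaged GLM goodness of fit with a row-permutation null distribution.
    def mean_r2(X):
        betas, *_ = np.linalg.lstsq(X, data, rcond=None)
        resid = data - X @ betas
        return np.mean(1.0 - np.var(resid, axis=0) / np.var(data, axis=0))
    observed = mean_r2(design)
    null = np.array([mean_r2(design[rng.permutation(n_time)]) for _ in range(n_perm)])
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Stage 1 (ROI selection): keep only regions where the task regressors account
# for a significant portion of the variance in the BOLD signal.
selected = [name for name in regions if roi_fit_pvalue(regions[name], design) < 0.05]
print('regions passing ROI selection:', selected)

# Stage 2 (classification): the classifier sees only task-modulated regions,
# so the patterns it can leverage are constrained by the first analysis.
for name in selected:
    accuracy = cross_val_score(LinearSVC(), trial_patterns[name], labels, cv=5).mean()
    print(f'{name}: cross-validated accuracy = {accuracy:.2f}')

With pure noise, few or no regions should pass the first stage, which is the point: the second analysis only ever operates on data that the first analysis has already tied to the experimental manipulation.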
Not only do these analysis techniques have different evidential targets, but brought together they provide stronger evidence for a theoretical hypothesis than either could alone. In this way, multiple analysis techniques, each providing a different perspective on the same data, can strengthen the evidence produced in a single neuroimaging experiment. This is a kind of local robustness.

Robustness has been used to defend experimental practice from critiques similar to those discussed here. Specifically, Collins’s ([1985]) experimenter’s regress proposes a vicious circle between experimental results and the techniques that produce those results. He argues that a technique is verified only when it produces correct data, but a technique is only known to produce correct data when it is verified. The critiques raised against neuroimaging by van Orden and Paap, which form the foundation of scepticism towards the technology, are of a similar form. The main issue they identify is that subtraction analysis requires assuming that the brain can be subdivided into functional parts, which is the very claim the analysis result is taken to support. This is a localized case of the experimenter’s regress where the feature of scientific practice under scrutiny is not an instrument, but a data analysis technique.

Philosophers have argued that, with respect to the experimenter’s regress, the epistemic situation is not as dire as Collins makes it out to be. Cartwright ([1991]), for example, argues that the regress is broken by the robust reproducibility of instrument results. Confidence in the report of an instrument is justified when the measurement result aligns with results produced by a variety of instruments, each of which relies on independent assumptions (pp. 451–2). Culp ([1995]) offers a more careful defence along the same lines. She argues, via a detailed case study of approaches to DNA sequencing, that experimentalists are convinced that measurements are getting at the same phenomenon when multiple measurement techniques, each with different theoretical presuppositions, produce a robust body of evidence (p. 441). Robustness is achieved when the same result is obtained by multiple, independent (or mostly independent) techniques (Wimsatt [1981]). Robustness analysis involves determining the features of measurement or analysis techniques that are invariant under changes in the technique that might influence the result (Calcott [2011]). Robustness is derived from the use of multiple independent approaches to detecting, isolating, or measuring the same target. The independence of measurement results is characterized in terms of the theoretical presuppositions required by the use of the instrument. These can also be understood as assumptions researchers must make about the production of the resulting data. Different instruments are independent insofar as they require different assumptions.
The same can be said of different data analysis techniques. Data analysis techniques, because of the manipulations they impose on data, require investigators to make assumptions about the result. These assumptions, if true, justify interpretations of the result of the data manipulation or analysis procedure. Different techniques, as used to support different hypotheses, require different assumptions.

However, there is a relevant difference between the use of multiple data analysis techniques as I have described it and the use of multiple measuring instruments to detect the same phenomenon. The robustness of a measurement outcome is improved when independent techniques produce the same result. A defence of neuroimaging against van Orden and Paap’s criticisms along these lines is offered by Roskies ([2010b]), in her account of functional triangulation. Functional triangulation occurs when different analysis techniques produce the same result, and so generate a robust body of evidence. The situation I have described is different. The techniques discussed above do not, and indeed cannot, provide the very same result. While the results of the analyses are not precisely the same, they are similarly aimed. The permutation test indicates the regions of the brain that may play a role in attentional processing, and the pattern classification analysis further clarifies that role. Thus, while they do not provide evidence in support of the very same hypothesis, the hypotheses they individually support are mutually supportive. The permutation test provides support for a haemodynamic hypothesis, and the subsequent analysis, using pattern classification, of the evidence revealed by that test is brought to bear on a theoretical hypothesis. Insofar as this is a robust result, then, it might be regarded as a weakly robust result. It is weak because the techniques do not have the same outcome.10

In general, different data analysis techniques provide different perspectives on the same data, and the use of multiple analysis techniques together can strengthen the quality of evidence produced by a particular method or instrument. This can result in evidence that supports inferences that may not be warranted by the result of a single analysis technique or data manipulation. In this way, multiple analysis techniques used in series can provide experimental results with a kind of local robustness. It is ‘local’ because the techniques ultimately depend upon one another. While the different perspectives are not fully independent, because one analysis technique is used as a processing step for a subsequently applied technique, they still contribute to the robustness of the inference because different techniques reveal (and suppress) different patterns and rely on different assumptions. Their differences are what contribute to the strengthening of the evidence.

The general lesson of the experimenter’s regress is that problematic assumptions can arise in the context of experimentation. The general lesson of the appeals to robustness is that those assumptions can (sometimes) be validated by comparing different perspectives on the same subject. With respect to scepticism towards the use of neuroimaging data, I have argued that problematic assumptions, which arise from the use of particular analysis techniques, can be validated by using different data analysis techniques that require different assumptions. This provides the inference with a (weak) local robustness.
6 Conclusion

I have demonstrated that different data analysis techniques provide evidence for different phenomena and that multiple analysis techniques can be used together to improve the epistemic situation in neuroimaging research. Thus, the debate about the epistemic status of neuroimaging, which is framed in terms of the logic of subtraction, is at best an evaluation of the limitations of analysis techniques that depend upon that logic. Sweeping conclusions about the range of hypotheses that neuroimaging technology can and cannot be used to investigate are not supported by this literature. The argument presented above provides grounds for a mild optimism with respect to neuroimaging technology: it can be used to do more than provide evidence about hypotheses specifying the relationships between BOLD activity and task performance. I leave identifying the specific hypotheses and phenomena that neuroimaging technology can be used to investigate for future work, as completing this task will require a careful evaluation of a representative collection of the data analysis techniques and experimental strategies used in neuroimaging research. Given that different analysis techniques provide different evidence, the diversity of techniques used in neuroimaging research suggests that philosophers concerned with the epistemology of neuroimaging should focus their attention on evaluating the evidential quality and scope of particular analysis techniques (such as subtraction) and classes of analysis techniques (such as multivariate analyses). Such evaluations should take into account the specific theoretical goals those techniques are put towards (functional localization or tracking the content of neural representations, to name two). The general lesson here is that data analysis techniques play an important role in the generation of scientific evidence. Different data analysis procedures, and differences in how those procedures are implemented, can make a difference to the range of phenomena about which the result of the analysis is informative. This is a feature of scientific practice in need of more careful philosophical attention.

Acknowledgements

I’d like to thank Jacqueline Sullivan, Chris Viger, Joseph McCaffrey, Robert Foley, Frédéric Banville, Daniel Booth, Chris Martin, Anna Blumenthal, Jordan Dekraker, and two anonymous reviewers for constructive feedback on drafts. Additional thanks to participants in the annual Canadian Society for History and Philosophy of Science conference, where early versions of this project were presented, and to members of the Köhler Memory Lab at the Brain and Mind Institute for productive and insightful discussions on this topic. Funding for this research was provided by the Social Sciences and Humanities Research Council of Canada and the Rotman Institute of Philosophy.

Footnotes

1 There has been a steady shift from using univariate analysis techniques that treat the neuroimaging data as a scalar value, usually an average, towards the use of multivariate analysis techniques that treat the neuroimaging data as a vector. These new techniques have allowed neuroimaging researchers to pursue new theoretical goals and study new hypotheses, such as the investigation of the content of neural representations (for an introductory review of multivariate techniques, see Tong and Pratte [2012]).

2 The fMRI scanning protocol does not directly measure metabolic activity. During an fMRI scan, hydrogen nuclei are aligned by the scanner’s uniform magnetic field and perturbed by radio pulses.
As they relax back to equilibrium they release energy, which the scanner measures. Deoxygenated haemoglobin, unlike oxygenated haemoglobin, causes the nearby magnetic field strength to vary, resulting in a difference in the measured energy and forming the basis of the BOLD signal.

3 It is important to note that the techniques discussed here, collectively referred to as multivariate pattern analyses, are neither the only nor the first multivariate techniques to be used in neuroimaging. For example, spatiotemporal partial least squares is a multivariate technique that has been in use since the late 1990s (McIntosh et al. [1996], [1998]). I owe this clarification to an anonymous reviewer.

4 Thanks to an anonymous reviewer for this phrasing.

5 This process is often referred to as data reduction.

6 The inter-trial interval does not need to be the same between every trial. Indeed, it is typically jittered, or randomly varied, so that the interval between successive trials varies. The variation in inter-trial interval is important for blocking certain confounds and artefacts that can arise when event onsets are uniformly spaced. Since jittered events are still mathematically separable, I have omitted a detailed discussion of jitter for the sake of simplicity.

7 The investigators reported on a third analysis in the paper that I do not discuss in detail. That analysis, which adheres closely to the logic of subtraction, was intended to investigate whether average BOLD signal amplitude discriminated between the specific features attended (red dots versus green dots). It did not.

8 In addition to the analyses I discuss in detail, they also completed a whole-brain searchlight analysis. A searchlight is a specific kind of feature selection and analysis process. In a searchlight, investigators define a volume (the ‘searchlight’), and then run the pattern classification analysis procedure over the voxels within that volume. Then they move the volume and run the analysis again. This procedure is typically used to identify arbitrary subdivisions of the brain that result in reliable classification, or to examine how classification accuracy changes as the classifier is given data from different parts of the same network or region of the brain.

9 Although not all. I leave discussion of those details for future work, as it is beyond the scope of this article.

10 This should not be cause for scepticism, at least not scepticism that is localized to the particular case of neuroimaging. There is reason to believe that any difference between measurement techniques can contribute to a difference in the phenomena probed by those techniques (for a discussion of this with respect to neurobiology, see Sullivan [2009]). If this is true, then weak robustness is the norm for scientific knowledge.

References

Aktunç, E. M. [2014]: ‘Severe Tests in Neuroimaging: What We Can Learn and How We Can Learn It’, Philosophy of Science, 81, pp. 961–73.
Anderson, M. L. and Oates, T. [2010]: ‘A Critique of Multi-voxel Pattern Analysis’, Proceedings of the 32nd Annual Conference of the Cognitive Science Society, pp. 1511–16.
Ashby, F. G. [2011]: Statistical Analysis of fMRI Data, Cambridge, MA: MIT Press.
Bogen, J. and Woodward, J. [1988]: ‘Saving the Phenomena’, Philosophical Review, 97, pp. 303–52.
Calcott, B. [2011]: ‘Wimsatt and the Robustness Family: Review of Wimsatt’s Re-engineering Philosophy for Limited Beings’, Biology and Philosophy, 26, pp. 281–93.
Cartwright, N. [1991]: ‘Replicability, Reproducibility, and Robustness: Comments on Harry Collins’, History of Political Economy, 23, pp. 143–55.
Collins, H. [1985]: Changing Order, London: SAGE Publications.
Culp, S. [1995]: ‘Objectivity in Experimental Inquiry: Breaking Data-Technique Circles’, Philosophy of Science, 62, pp. 430–50.
Davis, T. and Poldrack, R. A. [2013]: ‘Measuring Neural Representations with fMRI: Practices and Pitfalls’, Annals of the New York Academy of Sciences, 1296, pp. 108–34.
Dretske, F. [1981]: Knowledge and the Flow of Information, Cambridge, MA: MIT Press.
Friston, K. J., Holmes, A. P., Price, C. J., Büchel, C. and Worsley, K. J. [1999]: ‘Multisubject fMRI Studies with Conjunction Analyses’, NeuroImage, 10, pp. 385–96.
Gardner, J. L., Sun, P., Waggoner, R. A., Ueno, K., Tanaka, K. and Cheng, K. [2005]: ‘Contrast Adaptation and Representation in Human Early Visual Cortex’, Neuron, 47, pp. 607–20.
Good, I. J. [1983]: ‘The Philosophy of Exploratory Data Analysis’, Philosophy of Science, 50, pp. 283–95.
Hardcastle, V. G. and Stewart, M. C. [2002]: ‘What Do Brain Data Really Show?’, Philosophy of Science, 69, pp. S72–82.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L. and Pietrini, P. [2001]: ‘Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex’, Science, 293, pp. 2425–30.
Huettel, S. A., Song, A. W. and McCarthy, G. [2008]: Functional Magnetic Resonance Imaging, Sunderland, MA: Sinauer.
Kass, R. E., Eden, U. and Brown, E. [2014]: Analysis of Neural Data, New York: Springer.
Klein, C. [2010a]: ‘Images Are Not the Evidence in Neuroimaging’, British Journal for the Philosophy of Science, 61, pp. 265–78.
Klein, C. [2010b]: ‘Philosophical Issues in Neuroimaging’, Philosophy Compass, 5, pp. 186–98.
Kriegeskorte, N. and Kievit, R. A. [2013]: ‘Representational Geometry: Integrating Cognition, Computation, and the Brain’, Trends in Cognitive Sciences, 17, pp. 401–12.
Landreth, A. and Richardson, R. C. [2004]: ‘Localization and the New Phrenology: A Review Essay on William Uttal’s The New Phrenology’, Philosophical Psychology, 17, pp. 107–23.
Liu, T., Hospadaruk, L., Zhu, D. C. and Gardner, J. L. [2011]: ‘Feature-Specific Attentional Priority Signals in Human Cortex’, The Journal of Neuroscience, 31, pp. 4484–95.
Logothetis, N. K. [2008]: ‘What We Can Do and What We Cannot Do with fMRI’, Nature, 453, pp. 869–78.
Martin, C. B., McLean, D. A., O’Neil, E. B. and Köhler, S. [2013]: ‘Distinct Familiarity-Based Response Patterns for Faces and Buildings in Perirhinal and Parahippocampal Cortex’, The Journal of Neuroscience, 33, pp. 10915–23.
Mayo, D. [2005]: ‘Evidence as Passing Severe Tests: Highly Probable versus Highly Probed Hypotheses’, in Achinstein, P. (ed.), Scientific Evidence: Philosophical Theories and Applications, Baltimore, MD: Johns Hopkins University Press, pp. 95–127.
McIntosh, A., Lobaugh, N., Cabeza, R., Bookstein, F. and Houle, S. [1998]: ‘Convergence of Neural Systems Processing Stimulus Associations and Coordinating Motor Responses’, Cerebral Cortex, 8, pp. 648–59.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V. and Grady, C. L. [1996]: ‘Spatial Pattern Analysis of Functional Brain Images Using Partial Least Squares’, NeuroImage, 3, pp. 143–57.
Misaki, M., Kim, Y., Bandettini, P. A. and Kriegeskorte, N. [2010]: ‘Comparison of Multivariate Classifiers and Response Normalizations for Pattern-Information fMRI’, NeuroImage, 53, pp. 103–18.
Nichols, T. E. and Holmes, A. P. [2002]: ‘Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples’, Human Brain Mapping, 15, pp. 1–25.
Poldrack, R. [2010a]: ‘Subtraction and Beyond: The Logic of Experimental Designs for Neuroimaging’, in Bunzl, M. and Hanson, S. J. (eds), Foundational Issues in Human Brain Mapping, Cambridge, MA: MIT Press, pp. 147–60.
Poldrack, R. [2010b]: ‘Mapping Mental Function to Brain Structure: How Can Cognitive Neuroimaging Succeed?’, Perspectives on Psychological Science, 5, pp. 753–61.
Roskies, A. [2010a]: ‘Neuroimaging and Inferential Distance: The Perils of Pictures’, in Bunzl, M. and Hanson, S. J. (eds), Foundational Issues in Human Brain Mapping, Cambridge, MA: MIT Press, pp. 195–216.
Roskies, A. [2010b]: ‘Saving Subtraction: A Reply to Van Orden and Paap’, British Journal for the Philosophy of Science, 61, pp. 635–65.
Sullivan, J. [2009]: ‘The Multiplicity of Experimental Protocols: A Challenge to Reductionist and Non-reductionist Models of the Unity of Neuroscience’, Synthese, 167, pp. 511–39.
Tong, F. and Pratte, M. S. [2012]: ‘Decoding Patterns of Human Brain Activity’, Annual Review of Psychology, 63, pp. 483–509.
Uttal, W. [2001]: The New Phrenology, Cambridge, MA: MIT Press.
Uttal, W. [2011]: Mind and Brain: A Critical Appraisal of Cognitive Neuroscience, Cambridge, MA: MIT Press.
van Orden, G. C. and Paap, K. R. [1997]: ‘Functional Neuroimages Fail to Discover Pieces of Mind in Parts of the Brain’, Philosophy of Science, 64, pp. S85–94.
Wimsatt, W. [1981]: ‘Robustness, Reliability, and Overdetermination’, in Re-engineering Philosophy for Limited Beings, pp. 43–74.
