Вестник Томского государственного университета. Философия. Социология. Политология. 2023.
№ 75. С. 76-93.
Tomsk State University Journal of Philosophy, Sociology and Political Science. 2023. 75. pp. 76-93.
Original article UDC 167.7
doi: 10.17223/1998863X/75/7
FALLIBILISM AS THE BASIS OF RATIONALITY: PHILOSOPHICAL IMPLICATIONS FOR NATURAL AND ARTIFICIAL INTELLIGENCE
Igor F. Mikhailov
Institute of Philosophy, Russian Academy of Sciences, Moscow, Russian Federation, ifmikhailov@iph.ras.ru
Abstract. There are two principal conceptions of rationality: one that binds it to logic, and the other that associates it with efficiency. The first takes language, with its systematicity and regularity, as its model. Within this conception, to be rational is to follow rules of inference. The other conception is rather modeled on mathematics, as it presupposes that to be rational is, in general, to get more at a lesser cost, to which end all the data should be represented quantitatively. The history of cognitive science may well be seen as a gradual transition from logic and language as the science's basis to parallel processing and statistical computation. The basic concepts of cognitive science in its classical era were "representation" and "computation (processing)", i.e., the mind was attributed the architecture of the von Neumann machine: processor, memory, input/output, etc. In connectionism, on the contrary, computation is focused on the most effective adaptation of the network to changing conditions. At the same time, even if individual chains of computations are slower here than in a serial-architecture machine, the network as a whole benefits from the ability to carry them out not only simultaneously and in parallel, but also interdependently, when the sequence takes into account not only the result of the previous step but also the results of parallel processes. What the predictive processing theory has in common with cognitive symbolism is that it also relies on computations and representations, although not in all its variations. However, as in connectionism and the neural network vision in general, representations are considered not as symbolic but as sub-symbolic, expressed by certain probability distributions, and, accordingly, computations are understood as probabilistic (Bayesian) inference. But where predictive processing differs from both competing computationalist paradigms is in its understanding of representation as prediction: a cognitive system driven by multilevel attractors generates hypotheses about the causal structure of the environment, allowing for the prediction of incoming perceptual data. These considerations are enough for us to rationally conclude that rationality is a capability to surf the intractable world with just an updatable engine for predictions and a feedback circuit, rather than a commitment to any imposed rules. Philosophers, and not only they, should consider the question of whether some strict and reliable logic of abduction is possible or whether, as seems more probable, probabilistic mathematics will remain the only assistant in explaining and creating intelligent systems.
Keywords: rationality, cognitive science, computation, inference
For citation: Mikhailov, I.F. (2023) Fallibilism as the basis of rationality: philosophical implications for natural and artificial intelligence. Vestnik Tomskogo gosudarstvennogo universiteta. Filosofiya. Sotsiologiya. Politologiya - Tomsk State University Journal of Philosophy, Sociology and Political Science. 75. pp. 76-93. (In Russian). doi: 10.17223/1998863X/75/7
© I.F. Mikhailov, 2023
Research article
FALLIBILISM AS THE BASIS OF RATIONALITY: PHILOSOPHICAL IMPLICATIONS FOR NATURAL AND ARTIFICIAL INTELLIGENCE
Igor F. Mikhailov
Institute of Philosophy, Russian Academy of Sciences, Moscow, Russia, ifmikhailov@iph.ras.ru
Abstract. Rationality can be conceived either in terms of logic, genetically tied to language and grammar, or in terms of efficiency, which calls for a quantitative approach. The first option can be justified by a metaphysical belief that the world is logically arranged, and it requires a generally realist position. The second needs no metaphysical assumptions and is compatible with both nominalism and instrumentalism.
Keywords: rationality, cognitive science, computation, inference
For citation: Mikhailov, I.F. (2023) Fallibilism as the basis of rationality: philosophical implications for natural and artificial intelligence. Vestnik Tomskogo gosudarstvennogo universiteta. Filosofiya. Sotsiologiya. Politologiya - Tomsk State University Journal of Philosophy, Sociology and Political Science. 75. pp. 76-93. doi: 10.17223/1998863X/75/7
There are two principal conceptions of rationality: one that binds it to logic, and the other that associates it with efficiency. The first takes language, with its systematicity and regularity, as its model. Within this conception, to be rational is to follow rules of inference. But if we suppose that, eventually, it is all about dealing successfully with the world, and at the same time we hold the first stance, then, metaphysically, we are committed to the belief that the world is logical in its composition and arrangement. In other words, if you feel that you need to seek logical ties between factual statements to obtain pragmatically valid results, this is because, in your view, facts are logically bound in reality.
The other conception is rather modeled on mathematics, as it presupposes that to be rational is, in general, to get more at a lesser cost, whatever that means in specific cases. Metaphysically, this position is more cautious, as it refrains from judging the internal essence of what is out there, focusing instead on interactional experience enhanced with memory and feedback.
Historically, the first position dates back to realism, the second one to nominalism. In the modern-day philosophy of science, we may speak of realism vs. instrumentalism as applied to this case.
Do people think logically?
Rationality studies have a respectable history. In Wason's 1968 experiment, subjects were given a set of cards with letters printed on one side and numbers on the other. According to the rule they were provided with, if there is a vowel on one side of a card, then there is certainly an even number on the other. The subjects were shown four cards lying on the table, whose visible sides bore the symbols A, K, 2, and 7, respectively, and were asked which cards need to be turned over to make sure that the rule holds. According to the logical rules of modus ponens and modus tollens, one needs to turn over the cards labelled "A" and "7". However, the distribution (in %) of the actual answers of real subjects was as follows: "A" - 89, "K" - 16, "2" - 62, "7" - 25 [1. P. 12771]. It is obvious that for the most part people were guided not by the rules of logic declared in their culture, but by some other considerations.
In an experiment by Gigerenzer (1998), subjects were given the following condition: suppose that 0.3% of people have colon cancer. There is a 50 percent chance that a colon cancer test will detect the disease, and a 3 percent chance that it will show that cancer is present when it is not. Question: what is the likelihood that a person who tests positive actually has cancer? According to Bayes' rule, the correct answer is 4.8%. Gigerenzer interviewed medical professionals (!), who can hardly be suspected of a lack of relevant experience and training. And yet, the median answer of the respondents was 47%, almost 10 times more than the mathematically correct one [1. P. 12771-12772].
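For reference, the Bayesian computation behind the 4.8% figure can be spelled out directly from the numbers stated in the task:

$$P(\mathrm{cancer}\mid +) = \frac{P(+\mid \mathrm{cancer})\,P(\mathrm{cancer})}{P(+\mid \mathrm{cancer})\,P(\mathrm{cancer}) + P(+\mid \mathrm{no\ cancer})\,P(\mathrm{no\ cancer})} = \frac{0.5 \cdot 0.003}{0.5 \cdot 0.003 + 0.03 \cdot 0.997} \approx 0.048.$$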
These experiments have provoked intense discussion in the literature and have generated several explanatory hypotheses. Thus, the author of the review article himself believes that performance of "Wason (1968) card tasks, Gigerenzer's (1998) statistical tasks, and other experiments demonstrate that people are, in fact, irrational, when measured against accepted (in their cultures. - I.M.) standard inferential rules." [1. P. 12772]. Tversky and Kahneman [2] believe that, in these experiments, people transfer heuristics and rules of thumb that are successful in everyday life to laboratory settings where they do not apply. In these interpretations, there is a conviction that any deviation from the only possible rationality is irrational.
On the contrary, John Anderson, a classic of cognitive science in its symbolic incarnation, believes that the critical point of what could be called "rational analysis" is to figure out what parameter the cognitive system is intended to optimize and, knowing this, to make predictions about how people will behave in specific experimental tasks [3. P. 28]. To elaborate on this point, Oaksford and Chater emphasize that people do not view the task as a logical test, but rather try to establish a causal relationship between two events [4]. Lacking relevant experience with cards, people interpret Wason's announced rule as a causal relationship similar to those they have seen in the past. In Wason's problem, as the authors see it, we have a situation in which behavior is non-rational with respect to the laws of deduction but can be understood as rational in the context of how people usually seek information.
I should point out a consideration important for understanding the nature of rationality: the causal relationship is associative, not logical. According to Bender and Beller, a "modified version (of the Lévy-Bruhl concept. - I.M.), according to which two modes of thinking are still distinguished - sometimes termed as rule-based vs. associative, reflective vs. intuitive, or abstract vs. content-specific - yet assumed to co-exist in all cultures, is still discussed in research on thinking and reasoning, albeit controversially" [5. P. 2].
A rather important clarification was made by Gigerenzer himself. When the same task was offered to medical examinees not in percentage-statistical but in simple arithmetic terms - 30 out of every 10,000 people have colon cancer; of those 30, 15 will test positive; of the remaining 9,970 cancer-free people, 300 will still test positive - 67 percent of the subjects answered correctly, compared with 4 percent in the experiment where the data were presented according to the rules of probability theory [1. P. 12774]. From this fact, we can conclude that the success of solving a problem depends not only on the content and complexity of the data, but also on how well the form of their representation conforms to people's everyday practice of searching for information.
I believe that this discussion makes the following generalization quite reasonable. Initially, we humans, like other animals, are not logical machines but associative-statistical ones, since we have a parallel (neural-network), not serial, computer on board. We are compelled to follow deductive rules by science, law, and other social institutions that carry out and regulate social computation. Logic, being an important product of these institutions, nevertheless knows very little about real human thinking.
What does the history of cognitive science teach us?
This history may well be seen as a gradual transition from logic and language as the science's basis to parallel processing and statistical computation. The basic concepts of cognitive science in its classical era were "representation" and "computation (processing)", i.e., the mind was attributed the architecture of the von Neumann machine: processor, memory, input/output, etc. In the mind, according to this view, there are structures representing data about external objects, and there are processors that perform computational procedures upon these data. Cognitivists of various schools still debate what representations and computations are, but these concepts are signature ones for cognitive science as it has been conceived, including some varieties of connectionism, a trend in cognitive research that models mental processes using neural network models. At the same time, current publications also present anti-representationalist currents, such as radical enactivism [6, 7] or "dynamic systems" [8, 9], sometimes labelled as post-cognitivist.
Symbolism
The development of the cognitive sciences began in the middle of the 20th century as a response to the dominance of behaviorism in psychology. Behaviorism was formed within the framework of a general positivist trend, at a time when there were no scientific means or opportunities to study the inner workings of the brain and consciousness. Under these conditions, a rigorous science could focus only on the study of observed behavior. In the 1950s-1960s, the first computer technologies appeared. However, long before that, back in the 17th century, Thomas Hobbes famously defined reasoning as computation [10]. In the 1830s, Charles Babbage came up with the idea of a universal automatic computer. In 1936, Alan Turing formulated the concept of a universal machine that can compute any computable function [11].
Cognitive science, which emerged in the mid-1960s, is based on the "computer metaphor". Both Noam Chomsky and his school, and later Jerry Fodor and his followers, believed that a computing device is built on top of the neural network of the brain, one that breaks up into blocks, modules, and other components of the serial computer architecture. Within the framework of this theory, the activity of the mind is considered compatible with the Turing machine model.
In 1976, Newell and Simon hypothesized a "physical symbol system" as a machine that over time develops a certain set of symbol structures. Such a machine (be it a human or a digital computer) has the necessary and sufficient conditions for general intelligent activity [12. P. 116]. One of the classics of the symbolic paradigm in cognitive science, Zenon Pylyshyn, wrote in 1984 that people are able to act on the basis of representations because they physically realize them as cognitive codes. Since this is exactly what computers do, it follows that cognition is a kind of computation [13. P. xiii]. Classicist approaches postulate the presence of fundamentally identifiable physical states of the cognitive system, corresponding one-to-one to identifiable mental states. These representations are the internal "symbols" that the mind operates with. So, symbolism, as historically the first form of cognitive science, proceeded from an understanding of computation that dates back to Turing's seminal 1936 article: a computation is a transformation of a sequence of symbols in accordance with some algorithm. To transfer this vision to the human cognitive apparatus, it was necessary to assume that inside this apparatus there is something that functionally plays the role of symbols and something that functionally plays the role of algorithms. As John Anderson wrote,
However, the unitary theory found an important metaphor in the modern general-purpose computer and, perhaps more significantly, in symbolic programming languages, which showed how a single set of principles could span a broad range of computational tasks. It also became clear that the set of computational functions was unlimited, meaning that general processing principles were essential to span broad ranges of tasks. It made no sense to create a special system for each conceivable function [14. P. 2].
In the literature, the terms "classicism", "symbolism", and "computationalism" are often used interchangeably, tacitly assuming that Turing's view of computation as a rule-like manipulation of symbols is the only legitimate one. However, as Nir Fresco shows, in addition to symbolic computations, which are preferred within the framework of classicism, one can single out sub-symbolic computations, which formed the basis of the connectionist paradigm and which, in turn, can be understood as digital or analog, depending on the architecture of the constructed neural networks [15]. One can also speak of a special kind of computation on which computational neuroscience is based and which, according to Fresco, is neither digital nor analog but is computation sui generis.
Connectionism
In the mid-1980s, connectionism appeared as an alternative direction in cognitive science. It emerged as an interdisciplinary approach to the study of cognition that integrates elements from the fields of artificial intelligence, neuroscience, cognitive psychology, and the philosophy of mind. This approach assumes that cognitive phenomena can be explained using a set of general information processing principles known as parallel distributed processing (PDP).
Connectionism is a framework for studying cognitive phenomena using the architecture of simple processors interconnected by weighted connections. According to this model, each neuron receives many inputs from other neurons. The neuron integrates signals by calculating a weighted activation sum. Based on the amount of total input, the activation function (e.g., the threshold function) determines the level of outgoing activation of the neuron. Outgoing activation propagates to subsequent neurons.
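As a minimal sketch of the unit just described (the weights, inputs, and threshold below are arbitrary illustrative values, not taken from any particular model):

```python
import numpy as np

def neuron(inputs, weights, threshold=0.5):
    """A single connectionist unit: it integrates incoming signals as a
    weighted activation sum and passes the result through a threshold
    activation function to produce its outgoing activation."""
    total = np.dot(weights, inputs)            # weighted activation sum
    return 1.0 if total >= threshold else 0.0  # outgoing activation

# Illustrative values only: three incoming signals and their connection weights.
inputs = np.array([0.9, 0.2, 0.7])
weights = np.array([0.4, -0.3, 0.6])
print(neuron(inputs, weights))  # 1.0, since 0.9*0.4 - 0.2*0.3 + 0.7*0.6 = 0.72 >= 0.5
```

The outgoing activation of such units then serves as input to subsequent ones, which is all that "propagation" amounts to at this level of description.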
The group of researchers who laid the foundations of PDP set up an experiment to train a connectionist network to use irregular English verbs in the past tense. Linguists know the so-called U-shaped curve in mastering the past-tense forms of irregular verbs. During the learning process, the connectionist network reproduced the same effect: at the beginning of the experiment, the network revealed a good command of the forms that irregular English verbs take in the past tense. Then the quality of this performance deteriorated. And only at the last stage of the training did it return to its previous level. According to the proponents of connectionism, this experiment shows that their approach to cognitive science points the right way, realistically reproducing the mental habits of real people.
The statistical training of networks makes cognition possible in the face of fuzzy or insufficient data, context-sensitive concepts, and dynamic representations. By simple computing units, we mean neurons that can only take on quantitatively measured states of activation and connection weights, creating complex network configurations described by an equally complex mathematical apparatus. Each such configuration, described by a mathematical vector, can serve as a representation of a mental state. But such a neural network, unlike serial-architecture computers, needs practically no pre-programming; on the contrary, some networks are capable of self-training, as a result of which they arrive at generalizations, classifications, and predictions. Connectionist models have proven to be effective in speech and pattern recognition, as well as in the study of memory and learning processes.
In contrast to the programming principles adopted for serial computational architecture, where all algorithms are executed sequentially and the programmer knows the final goal of the program (the condition upon which it terminates), a network program is not contained in anyone's head in the form of blocks, algorithms, and goals. It is focused on the most effective adaptation of the network to changing conditions. At the same time, even if individual chains of computations are slower here than in a serial architecture machine, the network as a whole benefits due to the ability to carry them out not only simultaneously and in parallel, but also interdependently - when the sequence takes into account not only the result of the previous step, but also the results of parallel processes.
Neuroscience
Recently, the complex of cognitive sciences has begun to integrate with neuroscience, which gives rise to what can be called a complex of neurocognitive sciences. This integration is taking place partly in the experimental realm, driven by the advent of new tools for studying the brain, such as functional magnetic resonance imaging, transcranial magnetic stimulation, and optogenetics. Part of this integration is taking place in the theoretical realm, due to advances in understanding how large populations of neurons can perform tasks that have so far been explained in terms of rules and concepts.
Theoretical neuroscience is an attempt to develop mathematical and computational theories, as well as models of structures and processes in the brains of humans and other animals. It differs from connectionism in greater biological accuracy and realism in modeling the behavior of many neurons organized into functionally significant cerebral regions. As the science advances, computational models of the brain are moving both toward more realistic imitations of neurons, reflecting the electrical and chemical aspects of their activity, and toward modeling the interaction between different areas of the brain, such as the hippocampus and the cerebral cortex. These models are not strict alternatives to computational models attempting to imitate inference, rule following, conceptual structures, analogies, and images, but must ultimately conform to them and demonstrate the neurofunctional basis of cognitive processes.
Embodied mind
The "embodied" (situated) approach rejects the classicist understanding of intellect as abstract, individual, rational, and distinguished from perception and action, opposing this understanding of it as embodied, embedded, and distributed [16]. In other words, cognitive processes do not take place in the brain, but between the brain, the rest of the body, and the environment. Classical approaches in AI try to create an artificial expert or an artificial physicist when one should start from the level of insect "intelligence" to proceed to the recreation of language and abstract thinking based on these working models. Learning how to tie shoelaces is more costly than learning how to solve math problems or play chess, and it took billions of years for evolution to develop sensitivity and mobility, and only millions of years to develop proper human abilities.
I would also unite the dynamic approach [8] and the embodied mind into one family. The first involves describing cognitive activity in terms of a "space" of system states defined by time-dependent variables. The consequence of this approach is the replacement of the theoretical language based on computations and representations with the language associated with geometry and dynamic states. Summarizing, one could say that in dynamic explanations the role of mathematics increases at the expense of the role of logic.
The reasons for this unification are the following common features:
• a fundamentally anti-symbolic position - the rejection of the metaphor of computations, a return to the good old causality;
• a turn from logic to mathematics;
• an acknowledgement of the essential role of human biological nature in the formation of intellectual capabilities.
Predictive processing
According to some newer concepts, our internal representations are in no way casts of reality. In fact, they are models created by our brains. These models are updated again and again in accordance with newly incoming perceptual data. That is, a representation that we have and that we are aware of is the result of multiple iterations of the brain rebuilding its a priori models on the basis of external feedback.
"Predictive processing" (PP) is considered by many to be one of the most influential and most explanatory cognitive approaches currently available [17-22].
What PP has in common with cognitive symbolism is that it also relies on computations and representations, although not in all its variations [23]. However, as in connectionism and the neural network vision in general, representations are considered not as symbolic but as sub-symbolic, expressed by certain probability distributions, and, accordingly, computations are understood as probabilistic (Bayesian) inference. But where PP differs from both competing computationalist paradigms is in its understanding of representation as prediction: a cognitive system driven by multilevel attractors generates hypotheses about the causal structure of the environment, allowing for the prediction of incoming perceptual data. The quantitative discrepancy between the predicted and the actually obtained external data is the "free energy" that any non-equilibrium system tends to minimize. This minimization follows two alternative paths: updating the generative models (cognitive acts) or active inference that changes the external data (behavior).
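A deliberately simplified sketch may help fix the idea of these two minimization paths; every quantity below (the scalar "world", the noise level, the threshold for switching between paths) is invented for illustration and is not part of the PP formalism itself:

```python
import numpy as np

# Toy illustration: an agent holds a belief mu about a hidden cause, predicts the
# sensory signal it should receive, and reduces the prediction error either by
# updating the belief (the "cognitive" path) or by acting on the world so that
# the signal comes to match the prediction (the "active inference" path).
rng = np.random.default_rng(0)
world = 2.0   # hidden state of the environment (unknown to the agent)
mu = 0.0      # the agent's current estimate (parameter of its generative model)
lr = 0.2      # step size for belief updates

for step in range(50):
    sensation = world + rng.normal(0.0, 0.1)  # noisy sensory input
    error = sensation - mu                    # prediction error (proxy for "free energy")
    if abs(error) > 0.5:
        mu += lr * error                      # path 1: revise the model (perception)
    else:
        world -= 0.05 * error                 # path 2: act to pull the input toward the prediction

print(round(mu, 2), round(world, 2))          # the belief and the hidden state have been pulled toward each other
```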
Karl Friston offers a biologically plausible explanation based on generative models that produce top-down predictions. The latter are compared with lower-level, upstream representations to calculate the prediction error [24. P. 392]. The natural tendency to minimize the difference between the predicted representation and the incoming data is at the core of the so-called "free energy principle" (FEP). Proponents of PP trace the main principles of their approach to philosophical and psychological doctrines known from the past, such as those of Alhazen, Kant, and Helmholtz [25. P. 210]. As regards the latter, PP goes back to his idea of "unconscious inferences". These are formed at an early age and, according to Helmholtz, form the basis of many phenomena of perception. He claimed that we tune our senses to distinguish the things that affect them with maximum accuracy. Perception, therefore, is the result of a meeting of external influences with what a person already knows [26]. The physicist also introduced the very concept of "free energy" [27]. At the end of the 20th century, the Helmholtz machine, a hierarchical unsupervised learning algorithm capable of discovering structures underlying various data patterns, was named after him [28].
PP complements the idea of predictive coding, firstly, with the concept of precision optimization, which modulates the computation of prediction errors at different levels of the system. The precision of the samples is optimized by previous experience, so the computation requires a statistical basis - the "empirical Bayesian approach".
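One common textbook-style simplification (not the full PP formalism) makes the role of precision concrete: the prediction error is weighted by its expected precision $\pi$ (the inverse variance of the signal) when the belief $\mu$ is updated,

$$\Delta\mu \propto \pi\,\bigl(x - g(\mu)\bigr),$$

where $x$ is the sensory input and $g(\mu)$ is the prediction generated from the current belief, so that errors carried by reliable, high-precision signals move the model more than errors carried by noisy ones.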
All this means that processing is layered and context-dependent, and thus a processing system, be it a cell, a human, or a robot, is capable of approximating hierarchical empirical Bayesian inference, thereby adapting to an uncertain and ever-changing environment better than a system performing only exact Bayesian inference. On the other hand, PP is largely based on the concept of active inference. To minimize prediction errors, the system can, for one, update its predictive models to match sensory inputs. Alternatively, it can actively sample the environment in search of data that better match the prediction. This means that it actively engages with its environment, which, in fact, is where the term itself comes from. According to Friston, a living organism is better explained as an acting system if we assume that the triggers for active output are proprioceptive inputs, because these can be directly functionally related to reflex arcs [29, 30]. Thus, PP turns out to be a unified theoretical framework capable of explaining both perception and action. The above concepts are partial theories that appeal to real or simulated mechanisms and can therefore be directly falsified. But PP is based on the FEP, which operates in the same way as the well-known principles of natural science: it shapes the explanations of particular theories but is itself not amenable to direct refutation. Equipped with a full set of tools, PP leads not only to interesting empirical explanations, for instance of mood changes or schizophrenia, but also to some philosophical conclusions. Thus, the idea of perception as an ongoing inference from sensory data to their probable causes is an interesting reformulation of such favorite subjects of philosophers as body image, body feeling, and body self-awareness. These phenomena are naturally explained by the putative causes of interoceptive and proprioceptive sensations. According to Jakob Hohwy, the Self in this context can be represented "as a subset of putative causes of sensory input related to one's own actions, and hence the possibility of discussing whether such a set of causes deserves the designation 'I' " [25. P. 217]. Such an extended version of PP obviously includes the familiar plots of Merleau-Ponty-style phenomenology, which, as is well known, was the methodological basis for enactivism and "embodied cognition" [16].
As for the mechanistic implementation of these computational models, they are "created and parameterized by physical variables in the brain of the organism, such as neural activity and synaptic strength, respectively" [27. P. 57]. Thus, we can conclude that PP develops, corrects, and enriches the original doctrine of connectionism, using, in particular, different formal tools while relying on the same ontology.
Friston demonstrates an interesting application of this theoretical framework to the modeling of language communication, which was previously the realm of classical symbolic cognitive science. According to him, the criteria for evaluating and fine-tuning the interpretation of someone else's behavior are the same ones that underlie action and perception in general, namely, the minimization of prediction errors. The concept of communication in PP is based on a generative model, or narrative, shared by agents who exchange sensory signals. According to Friston, models based on hierarchical attractors that generate different categories of sequences allow the hermeneutic circle to be closed simply by updating the generative models and their predictions so as to minimize prediction errors. It is important to note that these errors can be calculated without even knowing the true state of the other interlocutor, which thus solves the problem of hermeneutics [31. P. 129-130]. Friston and his colleagues created a computer emulation of two songbirds, using software agents whose trills were generated by attractor-based models and recursively refined by listening to each other. The model showed that the birds follow the narrative generated by the dynamic attractors in their generative models, which are synchronized through sensory exchange. This means that both birds can sing "from the same music sheet" while maintaining a coherent and hierarchical structure of their narrative. It is this phenomenon that Friston associates with communication [24. P. 400]. Generative models used to determine one's own behavior can be used to determine the beliefs and intentions of the other, provided both parties have similar generative models. This perspective creates representations of a set of intentional actions and narratives, offering a collective narrative shared by the communicating agents [24. P. 401].
It can be assumed that the explanatory possibilities of PP cover not only issues of psychology and the traditional philosophy of mind, but also the newer topics of "social intelligence" and social cognition. Of course, such universality may raise concerns about the falsifiability of the theory. A brief discussion of this subject is found in [25. P. 221]. As a rule, such concern relates to the question of whether any behavior whatsoever can be represented as being guided by the subject's statistical predictions. It seems that in this case it is difficult to imagine a direct refutation, but there is always room for a better theory that will demonstrate more explanatory potential. Moreover, the PP theory is still too young: although it shows a certain success in real experiments, it will inevitably face the need to give detailed mechanistic explanations of all the statistical models it implements. These explanations will certainly be falsifiable.
Evolution in understanding computations
As shown above, the "cognitive revolution" in cognitive sciences took place as a reaction to neobehaviorism at a time when the real computer revolution gave psychologists and linguists the conceptual tools for the scientific study of consciousness. Until then, a desire to avoid psychological terminology was considered a sign of being scientific. The original version of this neo-mentalist picture was perhaps too bluntly copied from computer science and included two main elements: computations that are performed on symbolic representations. That is, computation and representation have become the two conceptual pillars on which the cognitivist paradigm is based. However, to correctly determine the place and role of each of these concepts, we need to understand the conceptual subtleties, which are connected with broader shifts in modern natural sciences. This concerns the concept of computation more, as it plays an increasing role in physics, astrophysics, neuroscience, biology, and even social sciences. This does not relate to their instrumental or methodological role, but mostly the ontological one - when not the researcher, but the subject of research is busy with computations.
Computations and the problem of their definition
The first abstract model of what constitutes a computation came about as an attempt to solve the problem of the computability of functions posed by David Hilbert (the Entscheidungsproblem). Alan Turing [32] tried to schematize the activity of a human calculator (a "computer" in his terminology) in the most abstract form: as a sequential recording and transformation of symbols of some finite alphabet (for example, digits) in cells ruled on a sheet of paper. Since, according to our intuitions, in order to be a computation this process must obey certain rules - that is, be rigidly determined - then, according to Turing, the person can be replaced by an abstract printing head, and the paper by an endless tape divided into cells, each of which can contain only one symbol. And here we have a machine whose design is determined by several finite sets: possible actions (print, erase, move left or right), possible states, an alphabet, and rules. If each computational step is uniquely determined by the symbol already present in the tape cell being read and by the current state of the machine, then such a machine is considered deterministic. If a machine is capable of reading from the tape not only symbols (data) but also the rules for their processing (a program), then such a machine is considered universal. Turing showed that his machine could compute any computable function. Based on the concept of the universal Turing machine, John von Neumann described the fundamental architecture of the serial digital computer.
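A toy rendering of these ingredients (the rule table below is an arbitrary example of mine that merely inverts a binary string; it is not meant to reproduce any machine discussed by Turing):

```python
# A minimal deterministic Turing machine: a finite alphabet, a set of states,
# and a rule table mapping (state, symbol) to (new symbol, head move, new state).

def run_turing_machine(tape, rules, state="start", blank="_"):
    tape = list(tape)
    head = 0
    while state != "halt":
        symbol = tape[head] if head < len(tape) else blank
        new_symbol, move, state = rules[(state, symbol)]   # deterministic lookup
        if head < len(tape):
            tape[head] = new_symbol
        else:
            tape.append(new_symbol)
        head += 1 if move == "R" else -1
    return "".join(tape)

rules = {
    ("start", "0"): ("1", "R", "start"),   # flip 0 -> 1, move right
    ("start", "1"): ("0", "R", "start"),   # flip 1 -> 0, move right
    ("start", "_"): ("_", "R", "halt"),    # blank cell: stop
}

print(run_turing_machine("0110", rules))   # -> "1001_"
```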
Turing's concept of computation and computability emerged in the context of a mathematical discussion. Hence the (I would say, purely psychological) belief that computations are carried out by conscious subjects (say, mathematicians) for a specific purpose. In fact, however, it seems to me that Turing's general idea (perhaps not even fully realized by himself) was this: if a person (a "computer" in his terms) performs some sequence of symbolic operations that can also be performed by a machine, then this is computation. By default, computation assumes an algorithm. But if you have an algorithm, you can basically build a machine to run it.
The characteristic features of the Turing model of computation are its symbolism, seriality, and algorithmic nature: a computer, whoever or whatever it may be, performs operations with symbols, processing one of them at each step of the computation in accordance with some algorithm. However, one could climb higher up the ladder of abstraction: instead of "some device or person processes a series of symbols in accordance with an algorithm," this process can be described as "X does something to Y in accordance with Z." At the generic level, computation appears as a kind of dynamic, procedural, tripartite relation. This relation can be described by two pairs of related categories: serial - parallel (distributed) and discrete - continuous (the latter pair is sometimes, not quite correctly, identified with the pair digital - analog).
Real complex computational devices are often of mixed type. Thus, some artificial neural networks allow real numbers as the values of their variables, i.e., they are in fact (partly) continuous in their computational architecture. Neurons in biological brains exchange series of spikes, each of which is described by a continuous amplitude, but the sequences themselves are discrete [33. P. 467]. And the participation of chemical neurotransmitters in enhancing or suppressing the electrical activity of neurons adds a significant "analog" element to the whole process. Owing to the recognition of the limited applicability of the Turing concept for describing natural computational processes, theoretical alternatives have appeared. Thus, several publications formulate a concept of computation based on the abstraction of a "mechanism" [34]. A mechanism as a computational system is understood as a spatio-temporal unity of functionally defined components or elements with a sufficiently large number of possible states. The initial state of the mechanism is considered the "input" of the computational process, and the final state its "output".
Due to its abstract formulation, this model at first glance looks like a concept that can demonstrate the computational nature of a wide range of natural processes. However, it overlooks a peculiarity of computational processes that can be called their multilevel character. This distinctive feature of computational processes, unlike other systematic ones, was first noticed by David Marr, who listed the mutually irreducible levels of any computational process [35. P. 24-25]. Approximately the same was meant by B.J. Copeland in his discussion of the analog system of a physical device as an "honest" (i.e., not non-standard) model of the architecture and algorithm of some computation [36]. Both concepts suggest that the properly computational level of the process, which appears to a reasonable observer as symbolic and purposive, is built on top of the algorithmic level, consisting of an arbitrarily complex combination of primitive steps, and of the level of physical implementation, where both the goal and the algorithm are realized in the form of electrical, chemical, quantum, and other natural interactions. At the same time, a certain computation, with its symbolism and purpose, can be implemented by more than one algorithm, and a certain algorithm, in turn, can be implemented by more than one physical system. The concept of mechanism, in my opinion, does not have sufficient conceptual means to distinguish between these levels.
I would also like to note that the question of whether "computation" is a kind of metaphor for describing multilevel processes in self-organizing systems, or whether such systems are literally computational devices, belongs to the discussion between scientific realism and anti-realism, which is not specific to the computational approach.
For natural computations, it is not so much their discrete/continuous nature that matters as their parallel/distributed architecture. The latter is clear, albeit indirect, evidence of the evolutionary origin of natural computational systems. In the absence of an engineer acting consciously and purposefully, computational power can only be increased by summing the capacities of already existing processors. Random combinations of cells or individuals, if they lead to an increase in computational power and/or save energy costs of computing, are naturally selected and fixed. Therefore, species with a more developed brain displace those that lag behind in this respect from certain ecological niches. In the same way, species with a more efficient social organization gain evolutionary advantages. In the latter case, brain size may no longer be of decisive importance, since computational tasks are distributed over a well-organized network of cognitively loaded individuals (cf. the history of the Neanderthals and the Cro-Magnons).
Human modeling of computational systems initially went in the direction opposite to natural evolution. It turned out to be easier for us to teach the computer symbolic, logical, and mathematical operations than to teach it to recognize images and sounds, to walk, or to control fine motor skills. Similarly, whereas artificial computers need to be specially prepared for networking, natural computing devices (cells or organisms) at a certain stage of their evolution readily combine into tissues, complexes, or networks. Natural causality, as it is understood in traditional natural science, is not capable of explaining this development. But if computational systems are ontologically implemented as a superstructure of control levels and a chain of descending causality, then the search for greater computational efficiency leads to their parallelization. If this is so, then we need a theory that combines physics and the theory of computation (possibly linking the increase in the "computability" of natural processes with non-equilibrium thermodynamics).
Given the considerations outlined above, one could propose the following definitions:
(Df 1) Computation is a process carried out by a computational system, which may be one of many possible mechanisms for some representation, and
(Df 2) Representation is a mapping of the formal properties of one process to the formal properties of another.
These definitions retain the classicist cognitive stance in that computation continues to be understood as operations on representations. But representations themselves are freed of any kind of psychologism and receive a purely formal definition. This approach corresponds to our intuitions, according to which any computation is always a computation of something, and this something is understood more as a goal than as a referent. Finally, the definitions presented clearly show the inclusion relation between the sets of algorithmic and computational processes, since processes that can be represented as obeying rules but not as aiming at the formal properties of other processes can be considered algorithmic but not computational. Thus, any algorithm can be represented as a set of implications of the form "if A, then B". If A and B are formal properties of distinct but interrelated processes, their algorithmic interdependence forms a natural or artificial computing device. Otherwise, if, for example, A is a property of a thing and B an action upon it, then a process that follows such rules can be considered algorithmic, but not computational.
Computation levels
As said above, according to Marr, any computational system can be described at three levels:
(1) a theory of computation describing its goal, appropriateness, and strategy,
(2) representation of data and the algorithm of their processing, and
(3) the physical implementation of the algorithm and representations.
The heuristic value of the three-level concept, according to its author, is that "since the three levels are only rather loosely related, some phenomena may be explained at only one or two of them" [35. P. 25]. For example, according to Marr, the phenomenon of the afterimage (after looking at a bright lamp) can only be explained at the physical (physiological) level (3), while the illusion known as the Necker cube suggests a non-equilibrium neural network with two possible stable states (level 2) and the fundamental possibility of two alternative three-dimensional interpretations of a two-dimensional image (level 1).
Marr uses his scheme against the thesis, popular among connectionists and others, about the fundamental difference between the (von Neumannian) computer and the brain - the thesis that the former performs serial computations and the latter parallel ones. For Marr, "the distinction between serial and parallel is a distinction at the level of algorithm; it is not fundamental at all - anything programmed in parallel can be rewritten serially (though not necessarily vice versa)" [35. P. 27]. Even more interestingly, one of the most significant objections (which, however, can be considered an addition) to Marr's classical three-level scheme was made by his friend and co-author Tomaso Poggio almost thirty years after the publication of the first edition of Vision. In the afterword to the 2010 reprint of the book and later in a separate article [37], Poggio argues that, if in the 1970s it seemed to him and Marr that computer science could teach neuroscience a lot, now "the table has been turned", and many of the discoveries of computational neuroscience, whose progress was largely ensured by Marr's work, can make a significant contribution to the general theory of computation.
Thus, the most important omission of Marr's theory, from Poggio's point of view, was the lack of an answer to the question of how a living organism can learn the necessary computational algorithms. Some current theories of statistical machine learning offer plausible answers to this question. Poggio believes that the empirically established hierarchical organization of the brain regions responsible for processing visual information is an evolutionarily developed structure whose goal is to reduce the complexity of visual samples (the sample complexity) in order to develop a representation suitable for efficient computation.
In our time, because of the aforementioned "turning of the table", theories of computation are emerging that can shed light on the algorithms underlying learning and even evolution. Therefore, Poggio proposes the following update of the Marrian scheme:
(1) evolution,
(2) learning and development,
(3) computation,
(4) algorithms,
(5) wetware, hardware, circuits, and components.
The main idea of the article seems to be the following: we will not be able to create thinking (intelligent) machines as long as we are forced to program their intelligent actions. We must not load them with ready-made algorithms but make them capable of developing such algorithms in interaction with the external environment, that is, of learning. We must also reproduce in machines the evolutionary mechanisms that ensure the progress of learning. And today this is becoming possible.
Computation types
According to Bechtel, computations are understood as a sequence of states of a mechanistic system, in which the input is the initial state of the system, and the outputs at different points in time are its subsequent states [34. P. 469-470]. Consequently, according to Fresco, the concepts of digital computation that are weaker than symbolism are no longer based on the Turing concept but on algorithmic descriptions of the operation of mechanistic systems. Alternative concepts of digital computation include finite automata, the Gandy machine, discrete neural networks, and supercomputers, among others. The possibility of these and similar implementations allows us to argue that cognition is algorithmic computation, which does not have to be symbolic [15. P. 361].
So, the gradual generalization of the varieties of computationalism is as follows: (a) symbolism, (b) weak digital computationalism, and (c) generic computationalism, embracing digital, analog, quantum, and neurophysiological computations.
There is a problem with the precise localization of connectionist computations on this scale of generality. Neural networks have different architectures and process different types of data. Some of their varieties can be considered a subset of digital (non-symbolic) computations, some a subset of analog ones. But in any case, connectionism - if, according to William Ramsey's important caveat, we are talking about trainable rather than programmable networks - has no points of intersection with the symbolic approach: "When connectionists examine the role of the weights in network computations, their primary question is not 'What rule does this weight encode?' but rather, 'How does this connection causally contribute to the processing?' " [38. P. 49]. And this, of course, shows that, even in discrete connectionist networks, computation can be considered digital only in a weak sense, on the basis of, for example, the abstraction of a mechanistic system or some other alternative model.
The most interesting problem is the exact identification and localization of neurophysiological computations within the proposed taxonomy. At first glance, this variety should be related to connectionist (neural network) computation and, accordingly, obey the logic and mathematics used in connectionist networks. However, this is not the case, and there are several reasons for this. Firstly, artificial neurons are very simple processors, mostly virtual ones, while brain neurons are much more complex physiologically, chemically, and genetically. Secondly, an important part of training an artificial neural network is the backpropagation algorithm, which implies the possibility of transmitting data between layers in the opposite direction. In contrast, the input and output devices of a biological neuron - the dendrites and axons, respectively - assume only unidirectional signal transmission. Thirdly, in connectionist networks, data transmission proceeds sequentially from layer to layer, while the architecture of the
brain is much more complicated and, in many ways, not completely clear to neurophysiologists themselves.
But the most interesting difference is that, while connectionist networks in principle obey Marr's main principle, which asserts the relative independence of the levels of a computational system, in neurophysiological computations the material architecture of the biological neural network plays a decisive role in determining the possible computational algorithms. Thus, if the Marrian classicist approach is characterized as top-down, then neurophysiological computations demonstrate the opposite, bottom-up, approach, where the understanding of the essence of the computations is predetermined by knowledge of the facts at the level of physical implementation [15. P. 367].
While computational neuroscience remains a highly debatable area, the bottom-up approach practiced here suggests that neurophysiological computation is a special type that resists localization not only within the "strong (symbolic) vs. weak computationalism" dichotomy but also within the digital vs. analog one. Series of neural excitation peaks (spike trains) demonstrate both analog and digital properties: the curve of electrical potentials at the external "ports" of cells is continuous, but the sequence of peaks itself forms a discrete structure. It can be assumed that computational neuroscience as a scientific discipline is still too young, and it has yet to decide what kind of computations it must deal with.
What does fallibilism have to do with mind?
The historical trajectory of both cognitive science and the conception of computation outlined above invites an unexpected analogy: Charles Peirce's doctrine of fallibilism [39. Vol. 1. P. 141-175]. All its specific formulations aside, it comes down to the simple thought that our cognitive and epistemic means are never sufficient to ascribe ultimate truth to any proposition. Recursively, this very statement may be considered totally or partially true depending on one's philosophical stance. Thus, Peirce himself appears to be a restricted fallibilist, as he stated that it is "a matter of fact to say that each person has two eyes. It is a matter of fact to say that there are four eyes in the room. But to say that if there are two persons and each person has two eyes there will be four eyes is not a statement of fact, but a statement about the system of numbers which is our own creation" [39. Vol. 1. P. 149].
As we see, this statement continues what some call the "transcendental" line that passes through the later Kant and the early Wittgenstein and that consists in clearly distinguishing between propositions that say something about the world and those that show1 the arrangement of our representational tools. If the latter are supposed to be justified by default - and it is hard to imagine how they could not be - then Peirce and the like-minded exemplify so-called restricted fallibilism. Although the history of mathematics and logic testifies rather against this restriction, what matters more for this discussion is that, whatever kind a particular fallibilism exemplifies, it generally does not imply skepticism. While the former claims that we never have enough reasons to consider any of our beliefs ultimately true, the latter goes further in stating that we never have reasons to ascribe truth to them at all. This implies that the two philosophical standpoints have different models of truth: for skepticism, truth can only be total, inseparable, and uncountable, while fallibilism allows for this or that quantification or conditioning of truth. This, in turn, makes fallibilism an effective pattern for presenting the cognitive grounds of rationality defended herewith.
1 For more on the distinction between "saying" and "showing", see [41].
Karl Popper's idea of falsifiability [40. P. 57-73] is logically related to fallibilism. To sum it up briefly, a theory may be considered sufficiently confirmed as long as its factual predictions have not been refuted. This easily entails that the theory currently held by the scientific community is never perfect or ultimate, but it is the best one we have now.
But why invoke this epistemological principle in the course of a discussion of cognition? From all that has been said above, one may conclude that this principle of the philosophy of science reflects some general structure of natural and - nowadays - some artificial cognitive agents. Thus, if a theory can be likened to a generative model, and its empirical consequences to the probabilistic inferences of the latter, we may then see living things endowed with cognitive tools and capabilities as performing the same Popperian algorithm: generate a hypothesis, draw all available inferences from it, test them against perceptions, and, as long as they stand, keep the hypothesis. The difference is that the Boolean two-valued logic of Popper's system is replaced in our brains, as in any living cells, by a probabilistic computation of some acceptable discrepancy limit.
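A toy rendering of this loop (my own illustration rather than anything proposed by Popper or by PP theorists; all numbers are arbitrary) might look as follows:

```python
import random

# Keep the current conjecture while the discrepancy between its predictions and
# the observations stays within an acceptable limit; revise it when that limit
# is exceeded. The tolerance plays the role of the probabilistic "discrepancy
# limit" that replaces a strict true/false verdict.
random.seed(1)
true_value = 7.0      # the unknown regularity in the world
hypothesis = 0.0      # the current conjecture
tolerance = 0.5       # acceptable discrepancy

for trial in range(50):
    observation = true_value + random.gauss(0.0, 0.3)  # noisy perception
    discrepancy = abs(observation - hypothesis)
    if discrepancy > tolerance:                         # "refutation": revise the conjecture
        hypothesis += 0.5 * (observation - hypothesis)
    # otherwise the conjecture stands and keeps being used

print(round(hypothesis, 1))  # the surviving conjecture has been revised toward 7.0
```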
Logically, both approaches rest on the principle of abduction, formulated by the same Charles Peirce [39. Vol. 2. P. 100-194], which is itself a kind of probabilistic inference. In his own words, abduction "is a method of forming a general prediction without any positive assurance that it will succeed either in the special case or usually, its justification being that it is the only possible hope of regulating our future conduct rationally, and that Induction from past experience gives us strong encouragement to hope that it will be successful in the future ..." [39. Vol. 2. P. 270]. Note that both keywords of our discussion appear here: "prediction" and "rationally". From A→B and B, we tend to conclude that "A is likely", though logic hardly provides any justification for this.
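A Bayesian gloss shows why this inference, though deductively invalid, is not groundless (a standard observation rather than Peirce's own formulation): if the hypothesis A entails B, so that $P(B\mid A) = 1$, then by Bayes' rule

$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} = \frac{P(A)}{P(B)} \ge P(A),$$

and observing B can only raise (or leave unchanged) the probability of A. It is precisely this weak, defeasible support that abduction trades on.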
These considerations are enough for us to rationally conclude that rationality is a capability to surf the intractable world with just an updatable engine for predictions and a feedback circuit, rather than a commitment to any imposed rules.
Philosophers - and not only they - should consider the question of whether some strict and reliable logic of abduction is possible or whether - the more probable option, as it seems - probabilistic mathematics will remain the only assistant in explaining and creating intelligent systems.
References
1. Schooler, L. & Schooler, L. (2001) Rational Theory of Cognition in Psychology. In: Smelser, N.J. & Baltes, P.B. (eds) International Encyclopedia of Social & Behavioral Sciences. Pergamon. pp. 12771-12775.
2. Tversky, A. & Kahneman, D. (1974) Judgment under uncertainty: Heuristics and biases. Science. 185(4157). pp. 1124-1131. DOI: 10.1126/science.185.4157.1124
3. Anderson, J.R. (1990) The adaptive character of thought. New Jersey: Erlbaum Associates.
4. Oaksford, M. & Chater, N. (1996) Rational Explanation of the Selection Task. Psychological Review. 103(2). pp. 381-391.
5. Bender, A. & Beller, S. (2011) The cultural constitution of cognition: Taking the anthropological perspective. Frontiers in Psychology. 2(67). pp. 1-6. DOI: 10.3389/fpsyg.2011.00067
6. Hutto, D.D. (2011) Representation Reconsidered. Philosophical Psychology. 24(1). pp. 135-139.
7. Hutto, D.D. & Myin, E. (2013) Radicalizing Enactivism: Basic Minds Without Content. MIT Press.
8. Smith, L.B. & Thelen, E. (2003) Development as a dynamic system. Trends in Cognitive Sciences. 7(8). pp. 343-348. DOI: 10.1016/s1364-6613(03)00156-6
9. Thelen, E. & Bates, E. (2003) Connectionism and dynamic systems: are they really different? Developmental Science. 6(4). pp. 378-391. DOI: 10.1111/1467-7687.00294
10. Barnouw, J. (2008) Reason as Reckoning: Hobbes's Natural Law as Right Reason. Hobbes Studies. 21(1). pp. 38-62.
11. Turing, A.M. (1938) On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction. Proceedings of the London Mathematical Society. s2-43(1). pp. 544-546.
12. Newell, A. & Simon, H.A. (1976) Computer Science as Empirical Inquiry: Symbols and Search. Communications of the ACM. 19(3). pp. 113-126. DOI: 10.1145/360018.360022
13. Pylyshyn, Z.W. (1986) Computation and Cognition: Toward a Foundation for Cognitive Science. Cambridge, MA, US: The MIT Press.
14. Anderson, J.R. (1983) The Architecture of Cognition. Cambridge, MA: Harvard University Press.
15. Fresco, N. (2012) The Explanatory Role of Computation in Cognitive Science. Minds Mach (Dordr). 22(4). pp. 353-380. DOI: 10.1007/s11023-012-9286-y
16. Varela, F.J., Rosch, E. & Thompson, E. (1992) The Embodied Mind: Cognitive Science and Human Experience. MIT Press.
17. Keller, G.B. & Mrsic-Flogel, T.D. (2018) Predictive Processing: A Canonical Cortical Computation. Neuron. 100(2). pp. 424-435.
18. Wiese, W. (2017) What are the contents of representations in predictive processing? Phenomenology and the Cognitive Sciences. 16(4). pp. 715-736. DOI: 10.1007/s11097-016-9472-0
19. Clark, A. (2015) Radical predictive processing. Southern Journal of Philosophy. 53(S1). pp. 3-27. DOI: 10.1111/sjp.12120
20. Hohwy, J. (2018) The Predictive Processing Hypothesis. In: Newen, A., De Bruin, L. & Gallagher, S. (eds) The Oxford Handbook of 4E Cognition. Oxford University Press. pp. 128-146.
21. Mendonza, D., Curado, M. & Gouveia, S.S. (eds) (2020) The Philosophy and Science of Predictive Processing. Bloomsbury Publishing Plc.
22. Wiese, W. & Metzinger, T. (2017) Vanilla PP for philosophers: A primer on predictive processing. Philosophy and Predictive Processing. pp. 1-18.
23. Kirchhoff, M.D. & Robertson, I. (2018) Enactivism and predictive processing: a non-representational view. Philosophical Explorations. 21(2). pp. 264-281. DOI: 10.1080/13869795.2018.1477983
24. Friston, K. & Frith, C. (2015) A Duet for one. Consciousness and Cognition. 36. pp. 390405.
25. Hohwy, J. (2020) New directions in predictive processing. Mind and Language. 35(2). pp. 209-223. DOI: 10.1111/mila.12281
26. von Helmholtz, H. (2013) Treatise on Physiological Optics. Dover Publications.
27. Buckley, C.L. et al. (2017) The free energy principle for action and perception: A mathematical review. Journal of Mathematical Psychology. 81. pp. 55-79. DOI: 10.1016/j.jmp.2017.09.004
28. Dayan, P. et al. (1995) The Helmholtz Machine. Neural Computation. 7(5). pp. 889-904.
29. Friston, K. et al. (2017) Active inference: A process theory. Neural Computation. 29(1). pp. 1-49. DOI: 10.1162/NECO_a_00912
30. Sajid, N. et al. (2021) Active inference: demystified and compared. Neural Computation. 33(3). pp. 674-712. DOI: 10.1162/neco_a_01357
31. Friston, K.J. & Frith, C.D. (2012) Active inference, Communication and hermeneutics. Cortex. 68(Kelso). pp. 129-143. DOI: 10.1016/j.cortex.2015.03.025
32. Turing, A.M. (1937) On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society. s2-42(1). pp. 230-265.
33. Piccinini, G. & Bahar, S. (2013) Neural Computation and the Computational Theory of Cognition. Cognitive Science. 37(3). pp. 453-488. DOI: 10.1111/cogs.12012
34. Craver, C. & Bechtel, W. (2006) Mechanism. In: Pfeifer, J. & Sarkar, S. (eds) The Philosophy of Science: An Encyclopedia. Psychology Press. pp. 469-478.
35. Marr, D. (2010) Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. The MIT Press.
36. Copeland, J.B. (1996) What is computation? Synthese. 108(3). pp. 335-359.
37. Poggio, T. (2012) The Levels of Understanding Framework, Revised. Perception. 41(9). pp. 1017-1023.
38. Ramsey, W. (1997) Do Connectionist Representations Earn Their Explanatory Keep? Mind and Language. 12(1). pp. 34-66. DOI: 10.1111/j.1468-0017.1997.tb00061.x
39. Peirce, C.S. (1958) Collected Papers of Charles Sanders Peirce. Cambridge, MA: Harvard University Press.
40. Popper, K. (2005) The Logic of Scientific Discovery. Routledge.
41. Mikhailov, I.F. (2018) On Whistling What We Cannot Say. The Semantics of Music - an attempt at a conceptual account. [Online] Available from: https://www.researchgate.net/publication/360727199_On_Whistling_What_We_Cannot_Say_The_Semantics_of_Music-an_attempt_at_a_conceptual_account?channel=doi&linkId=628784f4cd5c1b0b34e9586e&showFulltext=true (Accessed: 7th August 2023).
Information about the author:
Mikhailov I.F. - Dr. Sci. (Philosophy), leading research fellow, Institute of Philosophy, Russian Academy of Sciences (Moscow, Russian Federation). E-mail: ifmikhailov@gmail.com
The author declares no conflicts of interests.
The article was submitted 15.07.2023; approved after reviewing 15.09.2023; accepted for publication 07.10.2023