https://doi.org/10.48417/technolang.2024.01.07 Research article
Toward Practical Hermeneutics of Fourth Paradigm AI for Science
Tiantian Liu1 and Carl Mitcham2
1 Fudan University, 220 Handan Road, Shanghai, 200433, China
2 Colorado School of Mines, 1500 Illinois St., Golden, CO 80401, USA
liutt20@fudan.edu.cn
Abstract
The combination of artificial intelligence and science creates a new method for scientific research, which has achieved magnificent success, but also raises questions of how to understand the knowledge produced by this method. Hermeneutics is a method of interpreting scripture that is widely used in the humanities such as history. Based on the history of science, Thomas Kuhn suggests that science can also be understood hermeneutically. Building on Kuhn's work, Joseph Rouse argues that there are two hermeneutics for understanding scientific knowledge, a theoretical hermeneutics and a practical hermeneutics. The knowledge generated by AI-enabled science can also be examined from the perspective of these two hermeneutics. Theoretical hermeneutics argues that scientific knowledge has not been revolutionized at the theoretical level and that AI is only another tool to improve the efficiency of scientific research. However, this approach fails to acknowledge problems of AI-enabled knowledge generation such as data as a new form of publication and AI-assisted writing, automated laboratories, the role of AI in knowledge generation, and the opaqueness, unexplainability and bias of machine learning-generated knowledge. This article suggests the need for practical hermeneutics to address the above issues and to understand the knowledge produced by new research methods in the context of scientific practice.
Keywords: AI for science; Theoretical hermeneutics; Practical hermeneutics; Joseph Rouse
Acknowledgment LIU Tiantian thanks the "Second International Workshop on Hermeneutics of Science and technology" held in June 2023 at South China University of Technology. She also acknowledges and thanks WANG Guoyu and WANG Yingchun for significant comments on the concepts of the "Fourth Paradigm" and "AI for Science". Additionally, Carl Mitcham thanks WANG Guoyu for hosting him as a visiting scholar in the Center for the Ethics of Science and Technology for the Human Future.
Citation: Liu, T., & Mitcham, C. (2024). Toward Practical Hermeneutics of Fourth Paradigm AI for Science. Technology and Language, 5(1), 89-105. https://doi.org/10.48417/technolang.2024.01.07
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
UDC 1:004.8
INTRODUCTION
At a January 2007 meeting of the U.S. National Research Council, Turing Award computer scientist Jim Gray gave a talk suggesting that, with the development of new methods for data collection and analysis, a new paradigm was emerging in the practice of what he called "e-science." In his words,
Originally there was just experimental science, and then there was theoretical science, with Kepler's Laws, Newton's Laws of Motion, Maxwell's equations, and so on. Then, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are "looking" through large-scale, complex instruments which relay data to datacenters, and only then do they look at the information on their computers.
The world of science has changed.... The new model is for the data to be captured by instruments or generated by simulations before being processed by software and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration. (in Hey et al., 2009, pp. xvii-xix)
This idea was more formally iterated in a 2009 "Perspectives" piece in Science (Bell et al., 2009) and became the theme of an oft-cited book (Hey et al., 2009). In 2020, the argument was expanded in a U.S. Department of Energy (DOE) report, AI for Science. Using the term "data-intensive science," it surveyed a "new generation of methods and scientific opportunities in computing, including the development and application of AI methods (e.g., machine learning, deep learning, statistical methods, data analytics, automated control, and related areas) to build models from data and to use these models alone or in conjunction with simulation and scalable computing to advance scientific research" (Stevens et al., 2020, p. 1).
Jim Gray and the DOE report are concerned with how to interpret the knowledge produced by the new methods of data-intensive science: how will it fit with or advance existing scientific knowledge? But to examine AI for science solely in terms of its knowledge-producing potential elides its practical or power-altering aspects. New methods of knowledge production invite practical as well as theoretical hermeneutic reflection. Drawing particularly on the work of philosopher of science Joseph Rouse, we seek to introduce practical hermeneutic reflection on this variously named "fourth paradigm" that is alleged to form a historically emergent complement to scientific traditions of empirical description, mathematical modeling, and computational simulation.
SIGNATURE ACHIEVEMENTS OF FOURTH PARADIGM SCIENCE
To appreciate the character of fourth paradigm science, consider some signature achievements. One highly representative example is protein 3D structure prediction. Machine learning from protein structure databases has enabled AlphaFold to predict protein structure (Jumper et al., 2021). This development dramatically reduces the time required for protein structure prediction and supersedes previous experimental methods (such as cryo-EM) to provide a more rapid method for designing new proteins.
Another example is the recent Chinese development of an "all-around AI-Chemist with a scientific mind" that can read literature, design experiments, complete experimental processes, analyze data, and finally produce predictive models to obtain material samples with desirable composition ratios (Zhu et al., 2022). Such instruments radically reduce the amount of time human chemists spend on experiments and alter the way new materials can be discovered or engineered with potential to transform the chemical laboratory of the future. Generative AI is another tool for speeding things up by quickly surveying the literature and providing first drafts for reports (Noy and Zhang, 2023).
AI for Science surveys related changes in computational materials science, digital earth systems science, computational biology, and high energy and nuclear physics. Similar transformations are occurring in the social sciences (Hill, 2020). AI's introduction into multiple fields produces efficiencies and results that could not have been imagined with previous methods, thus exemplifying the potential of the new paradigm in scientific research (Xu et al., 2021) and in many engineering fields (Montans et al., 2019). On the basis of such achievements, data-driven and AI-enabled research is being interpreted as a historically new, fourth paradigm of science.
THE PERSPECTIVE OF HERMENEUTICS
The philosophical name for the conscious attempt to make interpretations is "hermeneutics." Hermeneutics was originally concerned with methods for the theoretical interpretation of sacred texts such as the Bible that were considered culturally authoritative. As the Bible was supplemented or replaced by secular texts such as legal codes or culture-defining works of art, hermeneutics became the basic method of the social and human sciences. Insofar as natural science was presumed to produce positive or causal knowledge that was self-confirming, hermeneutics was a method distinct from that which is operative in the modern natural sciences. In the philosophies of Martin Heidegger and Hans-Georg Gadamer, interpretation or hermeneutics even became the definitive difference between the human and the scientist.
The fundamental insight of hermeneutic philosophy is that there is no privileged, unquestionable, or certain beginning to thinking or living. Human beings are born into and become conscious of themselves within a context that encompasses them; they learn to understand it and themselves in a repetitive, piecemeal process that moves back and forth from part to whole and whole to part. In the hermeneutics of texts such as the Bible, for instance, early Christian theologians such as St. Augustine argued against any quick and easy interpretation of the meaning of particular words or passages in the Bible. The parts must be understood in light of the whole and the whole from the parts. It was a circular or, better, a spiral process of developing a progressively more comprehensive and adequate understanding of the text.
The 19th-century German philosopher Wilhelm Dilthey argued that the same process is foundational for the development of historical understanding. Historians work back and forth from the reading of historical documents and descriptions of previous events to the development of an understanding of what life was like at some time in the past - or perhaps in another, foreign culture in the present. To this kind of humanistic understanding, Dilthey contrasted the causal or explanatory knowledge produced by the natural sciences: knowledge of how A causes B, as a result of the peculiarly productive combination of experiment and mathematical model creation found in modern natural science.
Yet insofar as hermeneutics defines the human, not just the humanities - that is, insofar as to be human is to seek understanding of oneself through a hermeneutic engagement with the world - it must also be present in the natural sciences; it ceases to be a method peculiar to the human sciences alone. Since scientists are also human beings, and to be a scientist is just one way of being human, hermeneutics will be present in the sciences. Hermeneutics is universalized; it applies across all disciplines.
During the mid-20th century, philosophers of science began to recognize two senses in which the methods of hermeneutics are relevant to understanding the natural sciences. In one sense, the history of science requires interpretation. As Thomas Kuhn observed in an autobiographical reflection,
What I discovered in studying Aristotle was that a text required interpretation. And by interpretation I mean something similar to what was then quite well known in Europe ... as hermeneutics.... It was a way of reading texts, of looking for things that don't quite fit, puzzling over them, and then suddenly finding a way of sorting out the pieces. (Sigurdsson, 2016, p. 21)
In a second sense, as Kuhn also recognized, scientists within science itself use principles of hermeneutics to sort out pieces of experimental data and unite them into theories. Experiments cannot produce knowledge of causal relations that does not depend on interpretations of what counts as a cause or a relationship. An interpretation may be latent and un-thematized in a scientific paradigm of knowledge production; nevertheless, it is there and calls for philosophical articulation.
In the case of Kuhn and science generally, hermeneutics in both senses remains largely concerned with concepts and theories. Late in the 20th century, a new kind of philosopher of science, a science studies philosopher, began to argue that there was also a hermeneutic circle at work in scientific practices. The hermeneutic circle is present in the natural sciences when particular experimental results are interpreted in the light of theories or models and vice versa. But as experimental processes become more and more dependent on increasingly complex instrumentation, the hermeneutics of ideas demands complementation by a hermeneutics of practice. To understand science more fully, we need to interpret not only relationships between concepts and theories but also relationships between scientific practices and society. One philosopher of science who has focused especially on developing a hermeneutics of practice is Joseph Rouse.
HERMENEUTICS OF PRACTICE
In Knowledge and Power Rouse (1987) charts a transformation in philosophy of science that emerged in the wake of Kuhn and the rejection of logical empiricist accounts that held sway in Anglo-American philosophy until the 1960s. Rouse's account is concerned in the first instance with how the opening up of the laboratory to ethnographic inspection revealed how material practices contributed as much as logical methods to the production of scientific knowledge (e.g., Latour and Woolgar, 1986). The key feature of post-empiricist philosophy is the questioning of any naïve representational theory of knowledge. Rejecting the naïve empiricist belief that scientific methods, when successful, provide direct observational access to and representations of reality, post-empiricism argues that
scientists compare their theoretical representations with other theoretical representations rather than with the observed, uninterpreted world. The history of science is not a story of the gradual accumulation of a storehouse of knowledge about the given world. It tells instead of discontinuous changes in the overall structure of our representations and, with them, of changes in how the world appears to us. This revised picture of science has had some remarkable successes, both in resolving the many embarrassing conceptual difficulties in empiricist philosophy of science and in developing a fruitful dialogue between historians and philosophers of science. (Rouse, 1987, p. 4)
What post-empiricist philosophy has not developed so well, however, is an understanding of the technological power of science. As Rouse remarks, quoting Hilary Putnam: "non-realist accounts of science (such as the post-empiricist model...) seem at first glance to make the technical success of science a miracle" (Rouse, 1987, p. 6). Post-empiricist philosophy further tends to undercut the ability of science to, quoting a shibboleth, "speak truth to power" (Marmot, 2017). If scientific knowledge production is influenced by irrational power conditions, then on what basis does it claim to correct or oppose power?
According to Rouse, classical empiricism provides three views of the possible relationship between knowledge and power. First, knowledge can be applied in order to make power more effective. Second, power can be used to inhibit or distort scientific research. (Only later does Rouse note that power can also fund or support scientific research; presumably, if knowledge is being used by power, power will also be interested in supporting its production.) Third, knowledge can be liberating from the repressions of power. In all three cases, however, knowledge and power are conceived of as separate or independent, and power is located primarily in individual agents.
The received view of science-power relations is mistaken, according to Rouse. "It leads us to overlook important ways power is exercised today and to misunderstand both scientific practices and their political effects" (Rouse, 1987, p. 17). There are, for Rouse, two philosophies of science that open up possibilities for better understanding of power-knowledge relationships: pragmatism and what he calls the "new empiricism." Yet insofar as pragmatism and the new empiricism highlight solely the constructive (or co-constructive and contingent) character of scientific knowledge and the ways power relationships influence epistemic production, they fail to adequately analyze the nature of power. Rouse aims to remedy this deficiency by reintroducing practical hermeneutics.
According to Rouse, the universalization of hermeneutics - that is, the idea that both the natural and the human sciences are hermeneutical - does not do away with a distinction between theoretical and practical hermeneutics.
Theoretical hermeneutics is a theory-dominant philosophy of science. ...[I]t assigns a preeminent role to theories (i.e., a particular sort of semantic structure) within the practice of scientific research. Experiments and observations are significant only within a theoretical context. Theory guides the construction and performance of experiments, supplies the categories within which observations are to be interpreted, and mediates the transmission and application of results of research. Ultimately, theories are the end product of research: the aim of science is to produce better theories.... "Theory" has commonly signified a kind of understanding that is not tied to our practical involvements with the world. (Rouse, 1987, p. 69)
Science is not only the production of propositions interpreted within a theoretical framework; it exists in the patterns that emerge from the interdisciplinary interaction between actors, the instruments, and the objects of scientific research, constructing both the actors and the environment. "Scientific practices, and the extension of their models, practices, and constituents beyond the laboratory, reconfigure the possibilities in terms of which people can intelligibly understand and enact their lives" (Rouse, 1996, pp. 132-133). Science today can no longer be interpreted simply as knowledge production but needs to include critical reflection on the practical dimensions of research. Rouse argues for developing accounts of scientific practice as an activity within historical, social, technological, and psychological constraints.
Scientific practices rearrange our surroundings so that novel aspects of the world show themselves and familiar features are manifest in new ways and new guises. They develop and pass on new behaviors and skills (including new patterns of talk), which also require changes in prior patterns of talk, perception, and action to accommodate these novel possibilities. (Rouse, 2015, p. 216)
Practical hermeneutics emphasizes that propositions are not abstracted from practice into separate conceptual worlds but are interwoven with actual doing, producing local knowledge in a context or what Rouse calls "microworlds." Local scientific knowledge may lack a unified overarching theory, but it exists in the deployment of concrete exemplars. The expansion of technical control in science does not depend on the development of theoretical explanations of that control, and skills and practices in local, material, and social contexts are important to all explanation.
For Rouse, practical hermeneutics reveals more about the processes by which scientific knowledge is produced and contributes to a more complete understanding of science than theoretical hermeneutics. Work in the history and anthropology of science has shown that theoretical hermeneutics alone inadequately appreciates the extent to which scientific theories are dependent on the practical activities of science.
In a similar manner, Latour and other sociological examinations of laboratory life call attention to the many material and social factors behind and intertwined with scientific propositions. If one assumes that the laboratory, the equipment, and the network of social relations in which research is embedded are all external elements of scientific knowledge production, one will likely misapprehend the richness and complexity of science, a blindness that will extend to the emergence of an alleged fourth paradigm of science.
THEORETICAL VS PRACTICAL HERMENEUTICS IN FOURTH PARADIGM SCIENCE
Despite significant changes in the methods of scientific research introduced by AI, the hermeneutics of theory will continue to view science as a knowledge system characterized by the relationship between theory, concept, model, and background knowledge, a system that is advanced by new methods and instrumentations. New machines are constructed, and new skills are learned to produce evidence that supports hypotheses. Eventually, this process leads to the construction of new theories (Cornelio et al., 2023). Theory-centric advocates will argue that "hypothesis testing" remains the fundamental method of scientific research under the fourth paradigm. Functionally, machine learning is no different than Galileo's telescope or Leeuwenhoek's microscope; it simply adds another tool to fuel concept formation and theory construction.
However, this view obscures the conditions of AI-generated scientific knowledge and fails to appreciate the extent to which the fourth paradigm cannot be judged by the same criteria as the previous modes. In an extended examination of what she calls "data-centric" biology, Sabina Leonelli (2016) questions the adequacy of this view, confirming the need for practical hermeneutics in this area. Data is not fixed in the logical frame of propositions; data changes with material, social, technological, and institutional attributes. According to Leonelli, scientific knowledge is produced in and through these changes. On the one hand, data-driven knowledge is material and technological. The classification of data is the production of knowledge, and databases integrate standardized data, infrastructure, and processes in practice. Furthermore, data is not simply given but must be selected, tagged, and disseminated. It can also be obstructed or lost. On the other hand, data-driven knowledge is social and institutional. Social institutions are built up and surround material databases. Questions such as "data from where?", "for whose use?", and "to what benefit?" are social questions that correspond with epistemic norms. Scientific data is produced in settings of scientific power. These constitutive elements contribute to Leonelli's insistence that we understand AI-enabled knowledge as produced by and embedded in material practices.
Mathematician Weinan E (2022) proposes that AI-enabled science will go through three phases: a scientist-led conceptualization period, a large-scale infrastructure construction period marked by collaboration between scientists and engineers, and an engineer-led application period. In the course of this development, there will be significant changes in the flow of experimentation and a gradual transformation of "scientific problems" into "computational and engineering problems." The primacy of theory will be gradually discarded. Even so, the scientific community still frames its long-term vision as advancing theory and eventually discovering scientific principles. This mismatch shows the scientific community's ambivalence toward a practical hermeneutics of the AI-fueled fourth paradigm for science.
FIVE PRACTICAL HERMENEUTIC ISSUES WITH AI FOR SCIENCE
Artificial intelligence is transforming scientific practices in terms of scientists' skills and the material conditions within which they work. New skills and material conditions influence the development of policies and standards in turn. For general purposes, the practice of data-intensive, fourth paradigm science can be interpreted broadly in terms of five overlapping themes: (1) the development of novel forms of scientific writing and publication, (2) new infrastructures, (3) automated research processes, (4) human-machine hybrid actors, and (5) new policy norms and ethics.
First, the classic process of reporting and disseminating research results - writing a paper, submitting it to a journal, where it undergoes peer review, leading to rejection or author revision before hard copy journal publication circulated by post - has been disappearing for some time. Scientific papers are increasingly multi-authored, with ever-longer lists of co-authors. As publications grow in number and specialization, peer review has become less rigorous and is often bypassed with digital pre-prints. Digital publication speeds dissemination, while internet search engines intensify the information overload rather than manage it. Conference presentations and now Zoom conferencing, webinars, press releases, and podcasts contribute to the dissemination flood. AI promises only to continue such procedural trends.
Other changes are at work in the content of scientific reports. Traditional publication shared propositional results that were, in principle, justifiable or falsifiable, either by empirical or analytic repetition. Claims to empirical justification took the form of empirical data sets created by the researcher and included in or referenced by a paper. This type of publication is now being supplemented by referencing increasingly large and often independently produced data sets that have been mined by researchers using AIs that sometimes even create their own algorithms. Scientific data can even be published directly as a form of knowledge. Scientific conferences and journals increasingly request the submission of relevant datasets, including databases created by others, institutions, or instrumentation independent of human curation. Scientific data dissemination is becoming an independent form of publication.
The direct dissemination of scientific datasets that may or may not have been humanly curated and the use of that data by someone who did not produce it introduce an additional trust gap into a scientific publication. Referencing independently produced and available datasets is quite different from referencing previous scientific literature or one's own research data. In Latour's (1987) analysis, a scientific text is supported by citations from previous literature, and the more it is cited by later literature, the more reliable it becomes. Constrained by the space requirements of scientific publishing and traditional norms of reporting, data (including graphs, tables, and photographs) - as evidence in support of propositional conclusions - remains at a distance.
Citing others' datasets implies that the AI trains models using others' data. According to Latour's analysis, citation is crucial to scientific arguments, meaning that what is included in a paper needs to support one's point of view as much as possible. But citing other people's data increases the risk that trust in the dataset is far from established, and, for this reason, scientists prefer to use their own data. The publication of datasets strains this trust even more because, unlike papers, datasets lack established criteria by which their merits can be evaluated, and it is even less clear what knowledge can be found in other people's datasets. These changes call for a new way to create trust based on submission to uniform regulations on the sources, methods, and formats of data.
Additionally, artificial intelligence can now generate its own scientific text. Generative AI based on large language models can already produce text that imitates human writing, but scientific propositions generated in this way are not supported by evidence. This aporia has led several universities and journals to explicitly request that the GPT series not be used for scientific writing. Latour's analysis of scientific texts shows that behind the debate over such texts is a contest between scientific workers, whom Latour calls authors and dissenters. Both are identified as individual scientists; that is to say, human beings are the subjects of scientific practice. The addition of artificial intelligence complicates the social relations behind scientific texts. When asked about AI's role in paper writing, the scientists interviewed said that AI can be a writing partner but not a surrogate. In other words, AI becomes a stand-in for a writing partner, like someone who can make suggestions and bring new ideas but who doesn't actually write the final story (Hutson, 2022). Technical work on scientific texts includes considering external opinions, and AI may be a quick and low-risk way to get such opinions. Artificial intelligence can provide a quick new perspective on the writing process and may help authors overcome the immediate compositional obstacles they face. Some also say that AI-assisted writing is like car-assisted driving. While AI will not automatically write the paper, it will greatly reduce the cognitive burden on the writer. Other scientists believe that by writing with AI, the creation of text becomes a collaboration, with the human guiding the AI and the program following directions to write the actual text. The scientist's role is no longer to type but to organize, plan, check, and evaluate.
Second, materiality shapes the way knowledge is produced. From the perspective of theoretical hermeneutics, material factors are external to knowledge production. They do not shake the fundamentals of knowledge generation. However, scientific research is significantly changed by the availability of AI to augment existing practice, especially with infrastructures.
New hardware and new software are the basic norms of new knowledge. A typical example is the field of materials science and engineering, where a 2016 study used machine learning to design new material structures using data from previously "failed" experiments (also known as "dark reaction data") (Raccuglia et al., 2016). The materials science community is beginning to actively advocate for a data-driven approach to research, believing that this will change the way materials are discovered and that synergy and intersection around data is the way forward for the field (Pollice et al., 2021). The focus of materials science efforts is beginning to shift toward developing databases that scientists can search, mine, and query, which means that infrastructure becomes a platform for materials discovery. The services that current infrastructures provide to materials discovery platforms are maturing and expanding. The infrastructure for materials data construction indexes over a hundred data sources and runs automated data queries and metadata extraction channels to facilitate automated analysis (Himanen et al., 2019).
In addition to materials science, distributed computing infrastructure in high-energy physics (Klimentov, 2020), diverse databases in biology (Arkin et al., 2018), and the path from raw data capture to complex Earth system applications (Yue et al., 2016) all benefit from this new mode. New infrastructures mean that new space is built, new skills are learned, new processes are formed, new social relationships are built, and new knowledge is generated. Generally, equipment in a laboratory is limited; AI-enabled science infrastructures extend the power of instruments to much broader boundaries. In another sense, this changes the laboratory as well. Next, we turn to the differences introduced by the automated laboratory.
Third, changes in experimental processes imply changes in knowledge. A traditional pillar of practical hermeneutics was the laboratory. Scientists used laboratories to create specific environments to study particular phenomena and produce scientific knowledge. Today, automated laboratories are becoming possible. Materials science, chemistry, and nanoscience are pioneering the application of automated smart labs. Self-driving laboratories are being designed (as is also true in engineering design). Artificial intelligence learns relevant scientific concepts and learns how to design experiments. Intelligent experimental equipment can integrate experimental and simulation data, handle large, heterogeneous data sets, and provide precise control throughout the experiment. The new automated intelligent laboratory synthesizes different fields and consists of two main components: robotics (hardware that automatically pre-processes, conducts experiments, and measures results) and artificial intelligence (data-driven modeling and analysis of processed data). Automated intelligent laboratories can autonomously select the experiments to be performed based on the predefined goals of human researchers. The all-round AI-Chemist developed at the University of Science and Technology of China combines the automation of mechanical operations with machine learning and computer simulation, and is able to perform high-level chemical research (Zhu et al., 2022).
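The division of labor just described, in which a human sets the goal, an AI component selects experiments, and a robotic component executes them, can be made concrete with a toy sketch. Everything in it is an assumption introduced for illustration: `run_experiment` merely simulates a yield measurement, the class `SelfDrivingLab` and its fields are hypothetical names, and the "model" is a crude nearest-neighbour guess plus a bonus for unexplored conditions rather than the method of any system cited here.

```python
from dataclasses import dataclass, field


def run_experiment(temperature_c: float) -> float:
    """Hypothetical stand-in for the robotic platform: returns a simulated yield in [0, 1]."""
    # Toy response surface peaking at 70 degrees C; a real laboratory would measure this.
    return max(0.0, 1.0 - ((temperature_c - 70.0) / 50.0) ** 2)


@dataclass
class SelfDrivingLab:
    candidates: list[float]   # conditions the robot is able to test
    target_yield: float       # goal predefined by the human researcher
    budget: int               # maximum number of experiments allowed
    results: dict[float, float] = field(default_factory=dict)

    def predict(self, x: float) -> float:
        """Crude surrogate model: the yield of the nearest condition tested so far."""
        nearest = min(self.results, key=lambda t: abs(t - x))
        return self.results[nearest]

    def choose_next(self) -> float:
        """Pick the untested condition with the best predicted yield plus a distance bonus."""
        untested = [c for c in self.candidates if c not in self.results]
        if not self.results:
            return untested[0]  # nothing known yet: start with the first candidate
        span = max(self.candidates) - min(self.candidates)

        def score(x: float) -> float:
            # Distance to the closest tested point serves as a rough proxy for uncertainty.
            distance = min(abs(t - x) for t in self.results)
            return self.predict(x) + distance / span

        return max(untested, key=score)

    def run(self) -> tuple[float, float]:
        """Choose, execute, record; stop at the goal or when the budget is exhausted."""
        while len(self.results) < min(self.budget, len(self.candidates)):
            x = self.choose_next()
            self.results[x] = run_experiment(x)
            if self.results[x] >= self.target_yield:
                break
        best = max(self.results, key=self.results.get)
        return best, self.results[best]


if __name__ == "__main__":
    lab = SelfDrivingLab(
        candidates=[float(t) for t in range(20, 121, 10)],
        target_yield=0.95,
        budget=8,
    )
    print(lab.run(), "after", len(lab.results), "experiments")
```

Even in this toy loop, the target, the budget, and the list of admissible conditions are all fixed by a human researcher before the loop starts, which is the sense in which such a laboratory remains embedded in human practice.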
But Leonelli criticizes the automated lab as not belonging to practical hermeneutics. She thinks that laboratories should be places where tacit knowledge grows, which means that researchers have to physically engage with the materials, processes, and agents in order to gain know-how. If labs were automated, then no tacit knowledge would be gained through physical engagement. From a practical hermeneutics point of view, automation could mean that people are no longer involved. This concern is not unreasonable. However, automated laboratories are still practical in a broader sense.
In fact, the design of experiments by artificial intelligence, the manipulation of experiments by robots, and the control of experimental data all grow on top of the practice of human researchers. The explicit design of the experiment is an important part of the experimental process because it enables collaborators and other scientists to monitor progress throughout. Experimental manipulation and tracking refer to the ways the process is monitored from the beginning. Tracking can easily incorporate artificial intelligence because the process involves classifying, coding, filing, recording identity, locating, and processing. Lastly, in automated laboratories and intelligent experimental processes, AI can control the data in order to control the phenomena. Therefore, the benefits of AI involvement are apparent: automated platforms free scientific workers from repetitive tasks and reinforce isolation, intervention, and control simultaneously. Basically, the automated laboratory does not oppose the hermeneutics of practice but rather supports it. Nevertheless, the recent involvement of large language models (LLMs) in autonomous laboratories has raised concerns about the potential risks to science (Tang et al., 2024). If LLMs are seen as new agents in scientific practice, the nature of practice and related issues such as norms of knowledge, norms of action, the scientific community, and the relation between science and society should be reconsidered.
Fourth, the heterogeneous composition of practitioner networks creates human-machine hybrid actors. Rouse argues that, from the perspective of practical hermeneutics, knowledge is constituted not as a web of beliefs but as a web of practitioners. Practice is not only the actions performed by actors but also the complex interrelationships in which actors are understood. Rouse thinks actors belong to a practice in a strong sense; this means that to understand agents (and their motivations) requires an account of the practice in which they are involved. Furthermore, rooting actors in practice enables practical hermeneutics to distinguish between actors and non-actors. Actors and non-actors, from this perspective, are established in practice and in constant interaction with the world. The involvement of AI in the practice of science is different from the involvement of people or objects, so there needs to be more thought devoted to the nature of their agency. Some scientists are already confused about the place of AI in their research teams and wonder if it should be seen as an agent in automated laboratories and scientific publication and communication, reflecting the heterogeneous composition of actors in scientific practice, i.e., mixed human-computer actors.
Latour emphasizes the importance of relationships in practice, in which the object participates as an actor, a tack that can begin to explain AI's role in scientific knowledge production. Artificial intelligence cannot, for the moment, be an actor in the same reciprocal scientific practice as humans, nor can it manipulate and control humans in order to gain scientific knowledge. However, what Latour points out is that the object or technology plays a mediating or intermediary role in the practical activity. Similar arguments can be found in postphenomenological mediation theory (Rosenberger and Verbeek, 2015). Inevitably, scientists must deal with the infrastructure that generates the data, the algorithmic platforms that process it, the laboratories that run experiments automatically, and the large models that generate papers, and in doing so they construct multiple and complex social relationships.
Finally, the fifth theme involves the discussion of AI as agent in ethical and legal spheres. One touchpoint in this conversation is that AI's ability to mimic some human functions indicates that it has a different role and status from other technological objects. But the issue extends beyond imitation to interdependence. In scientific practice, AI is not only able to imitate functions, but, more importantly, to realize data processing and other "cognitive" tasks beyond human comprehension. In other words, AI can replace some of the functions of scientists, such as designing experiments or reading literature. Still, scientists cannot replace some of the functions of AI, such as the processing of petabytes of data. For instance, AlphaFold2's prediction of the three-dimensional structure of proteins is based on 350,000 known protein structures and more than 200 million unknown protein structures. Thus, we could go so far as to say that human scientists and AI are linked as hybrid (heterogeneous) actors (or relational complexes, as Rouse calls them), working together on new scientific practices.
Here, new ethical issues and challenges emerge because scientific practices are always interconnected and fundamentally influence the development of social practices. Rouse argues that norms are naturally formed in practice and that norms are reinforced while practices become comprehensible; this is also true within Latour's network of actors. The involvement of artificial intelligence in scientific research has also generated intellectual and ethical normative issues in the field of practice, the boundaries between which are not entirely clear. For our purposes, we will focus on the ethical dimension of normativity.
Scientific data, like other data, face common privacy and security issues that concern questions of autonomy and responsibility. The paradigmatic examples of these are geospatial data and health data. The ethical checks given by the UK Statistics Authority (2021) for geospatial data include 16 aspects, including do no harm, transparency, confidentiality, and avoidance of bias; it also lists a series of ethical considerations for research and statistics: general ethical principles, potential for bias, interpretability, accountability, and confidentiality. These ethical considerations apply especially to specific geospatial data such as retrospective unique remote sensing data. In contrast, the ethical issues raised by data in the health domain have received more attention, focusing on privacy, confidentiality, informed consent, equity, justice, trust, and data ownership (Viberg et al., 2022), and suggesting various approaches and governance tools (Maseme, 2022).
The ethics of scientific data has generally been discussed within the debate about "open data," and there are additional concerns that AI-driven science brings to the fore. Open data requires breaking down geographical, disciplinary, and institutional barriers, and scientific data and AI-driven scientific research tend to be shared across time, space, disciplines, and organizations. Currently, open scientific data is guided by the FAIR principles that dictate data should be "Findable, Accessible, Interoperable, and Reusable" (Wilkinson et al., 2016). Beyond this, there is consensus that countries have an important responsibility to use policies to facilitate the flow of information at all levels and develop widespread data access. In particular, the European Union and the United States have achieved a certain degree of open access to data and have developed a set of public policies and principles.
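As a rough illustration of how the four FAIR facets might be operationalized in a data infrastructure, the following sketch checks a dataset's metadata record for the minimum fields each facet would need. This is a deliberately simplified reading: the published FAIR principles contain more detailed sub-principles, and the class `DatasetRecord`, its field names, and the mapping in `FAIR_CHECKS` are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative, not a standard."""
    identifier: Optional[str] = None    # persistent identifier such as a DOI
    access_url: Optional[str] = None    # where the data can be retrieved
    data_format: Optional[str] = None   # open, documented file format
    license: Optional[str] = None       # terms under which reuse is allowed
    provenance: Optional[str] = None    # how the data were produced


# Each FAIR facet is mapped to the fields this sketch treats as its minimum evidence.
FAIR_CHECKS = {
    "Findable": ("identifier",),
    "Accessible": ("access_url",),
    "Interoperable": ("data_format",),
    "Reusable": ("license", "provenance"),
}


def fair_report(record: DatasetRecord) -> dict[str, bool]:
    """Return, for each facet, whether every field assigned to it is filled in."""
    return {
        facet: all(getattr(record, name) for name in fields)
        for facet, fields in FAIR_CHECKS.items()
    }


if __name__ == "__main__":
    record = DatasetRecord(
        identifier="10.1038/sdata.2016.18",  # DOI borrowed from the reference list as a placeholder
        access_url="https://example.org/data.csv",
        data_format="text/csv",
    )
    for facet, ok in fair_report(record).items():
        print(f"{facet}: {'ok' if ok else 'missing metadata'}")
```

A check of this kind only tests whether metadata are present. It says nothing about who benefits from the data or who controls them, which are questions of a different order.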
Unfortunately, the FAIR principles cannot resolve the problem of inequality in scientific data practice, and the risks of data openness between countries cannot be ignored. Indigenous data is a typical example. The CARE principles - "Collective Benefit, Authority to Control, Responsibility, and Ethics" (Carroll et al., 2020) - were developed to cover the whole data life cycle and to protect disadvantaged groups, and they focus on the distribution of power and on maximizing the benefits of data-driven science. The CARE principles indicate how deeply knowledge generation is imbricated in the social and ethical values of scientific practice.
Scientific data also faces the conflict between science and business. When it comes to trading personal data between data-analyzing entities, the value of data as a commercial commodity - including the speed and efficiency with which assessing or accessing certain data can help develop new products - often takes precedence over science. As a result, scientific considerations, questionable decisions, the consequences of the assumptions made, and the processes used in an investigation may not be readily appreciated. This focus on business can easily translate into a materialization of discrimination, inequality, and potential errors in the data considered (Srnicek, 2017).
CONCLUSION
Fourth paradigm science involving AI has been promoted as another method for knowledge production, continuing the historical development from observational description of empirical phenomena, to mathematical theory modeling, to computational simulation. AI-propelled science has been celebrated for its potential to both enhance the speed of knowledge production and extend its reach. But in the AI for science vision, machine learning, deep learning, statistical methods, data analytics, automated control, and related areas are imagined primarily if not exclusively in terms of the advancement of scientific research. By contrast, Joseph Rouse and others would argue that science is never adequately understood in terms of theoretical hermeneutics alone: science also consists of material practices that interface with society. This lacuna calls for a hermeneutics of practice to complement that of theory. Consideration of practical hermeneutics points toward the need for a political philosophy of fourth paradigm science that engages the challenges posed by new forms of scientific writing and publication, new infrastructures, automated laboratories, new human-machine hybrid actors, and the need for new policy norms and ethics.
REFERENCES
Arkin, A. P., Cottingham, R. W., Henry, C. S., Harris, N. L., Stevens, R. L., Maslov, S., Dehal, P., Ware, D., Perez, F., Canon, S., Sneddon, M. W., Henderson, M. L., Riehl, W. J., Murphy-Olson, D., Chan, S. Y., Kamimura, R. T., Kumari, S., Drake, M. M., Brettin, T. S., ... Yu, D. (2018). KBase: The United States Department of Energy Systems Biology Knowledgebase. Nature Biotechnology, 36(7), 566-569. https://doi.org/10.1038/nbt.4163
Bell, G., Hey, T., & Szalay, A. (2009). Beyond the Deluge. Science, 323(5919), 1297-1298. https://doi.org/10.1126/science.1170411
Carroll, S. R., Garba, I., Figueroa-Rodriguez, O. L., Holbrook, J., Lovett, R., Materechera, S., Parsons, M., Raseroka, K., Rodriguez-Lonebear, D., Rowe, R., Sara, R., Walker, J. D., Anderson, J., & Hudson, M. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19, 43. https://doi.org/10.5334/dsj-2020-043
Cornelio, C., Dash, S., Austel, V., Josephson, T. R., Goncalves, J., Clarkson, K. L., Megiddo, N., El Khadir, B., & Horesh, L. (2023). Combining Data and Theory for Derivable Scientific Discovery with AI-Descartes. Nature Communications, 14(1), 1777. https://doi.org/10.1038/s41467-023-37236-y
E, W. (2022). Artificial Intelligence for Science: A Global Outlook 2022 edition. E-report.
Hey, T., Tansley, S., & Tolle, K. (Eds.). (2009). The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research.
Hill, C. A. (2020). Moving Social Science into the Fourth Paradigm: The Data Life Cycle. In C. A. Hill, P. P. Biemer, T. D. Buskirk, L. Japec, A. Kirchner, S. Kolenikov, & L. E. Lyberg (Eds.), Big Data Meets Survey Science (pp. 713-731). Wiley. https://doi.org/10.1002/9781118976357.ch24
Himanen, L., Geurts, A., Foster, A. S., & Rinke, P. (2019). Data-Driven Materials Science: Status, Challenges, and Perspectives. Advanced Science, 6(21), 1900808. https://doi.org/10.1002/advs.201900808
Hutson, M. (2022). Could AI Help You to Write Your Next Paper? Nature, 611, 192-193. https://doi.org/10.1038/d41586-022-03479-w
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Zidek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., ... Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2
Klimentov, A. A. (2020). Exascale Data Processing in Heterogeneous Distributed Computing Infrastructure for Applications in High Energy Physics. Physics of Particles and Nuclei, 51, 995-1068. https://doi.org/10.1134/S1063779620060052
Latour, B. (1987). Science in Action: How to Follow Scientists and Engineers through Society. Harvard University Press.
Latour, B., & Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. Princeton University Press.
Leonelli, S. (2016). Data Centric Biology: A Philosophical Study. University of Chicago Press.
Maseme, M. (2022). Ethical Considerations for Health Research Data Governance. In B. S. Kumar (Ed.), Data Integrity and Data Governance. IntechOpen. https://doi.org/10.5772/intechopen.106940
Montâns, F. J., Chinesta, F., Gomez-Bombarelli, R., & Kutz, J. N. (2019). Data-driven modeling and learning in Science and Engineering. Comptes Rendus Mécanique, 347, 845-855. https://doi.org/10.1016/j.crme.2019.11.009
Marmot, M. (2017). Galileo - Speaking Truth to Power. The Lancet, 389(10086), 2277-2278. https://doi.org/10.1016/S0140-6736(17)31497-6
Noy, S., & Zhang, W. (2023). Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence. Science, 381(6654), 187-192. https://doi.org/10.1126/science.adh2586
Pollice, R., Dos Passos Gomes, G., Aldeghi, M., Hickman, R. J., Krenn, M., Lavigne, C., Lindner-D'Addario, M., Nigam, A., Ser, C. T., Yao, Z., & Aspuru-Guzik, A. (2021). Data-Driven Strategies for Accelerated Materials Design. Accounts of Chemical Research, 54(4), 849-860. https://doi.org/10.1021/acs.accounts.0c00785
Raccuglia, P., Elbert, K. C., Adler, P. D. F., Falk, C., Wenny, M. B., Mollo, A., Zeller, M., Friedler, S. A., Schrier, J., & Norquist, A. J. (2016). Machine-learning-assisted Materials Discovery Using Failed Experiments. Nature, 533, 73-76. https://doi.org/10.1038/nature17439
Rosenberger, R., & Verbeek, P.-P. (Eds.). (2015). Postphenomenological Investigations: Essays on Human-Technology Relations. Lexington Books.
Rouse, J. (1987). Knowledge and Power: Toward a Political Philosophy of Science. Cornell University Press.
Rouse, J. (1996). Engaging Science: How to Understand Its Practices Philosophically. Cornell University Press.
Rouse, J. (2015). Articulating the World: Conceptual Understanding and Scientific Image. University of Chicago Press.
Sigurdsson, S. (2016). The Nature of Scientific Knowledge: An Interview with Thomas S. Kuhn. In A. Blum, K. Gavroglu, C. Joas, & J. Renn (Eds.), Shifting Paradigms: Thomas S. Kuhn and the History of Science (pp. 17-30). Max Planck Institute for the History of Science.
Srnicek, N. (2017). Platform Capitalism. Polity Press.
Stevens, R., Nichols, J., & Yelick, K. (Eds.). (2020). AI for Science. Argonne National Laboratory.
Tang, X., Jin, Q., Zhu, K., Yuan, T., Zhang, Y., Zhou, W., Qu, M., Zhao, Y., Tang, J., Zhang, Z., Cohan, A., Lu, Z., & Gerstein, M. (2024). Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science. arXiv:2402.04247. https://doi.org/10.48550/arXiv.2402.04247
UK Statistics Authority. (2021). Ethical Considerations in the Use of Geospatial Data for Research and Statistics. https://uksa.statisticsauthority.gov.uk/publication/ethical-considerations-in-the-use-of-geospatial-data-for-research-and-statistics/
Viberg, J., Bentzen, H. B., & Mascalzoni, D. (2022). What Ethical Approaches Are Used by Scientists When Sharing Health Data? An Interview Study. BMC Medical Ethics, 23, 41. https://doi.org/10.1186/s12910-022-00779-8
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., Da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., ... Mons, B. (2016). The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18
Xu, Y., Liu, X., Cao, X., Huang, C., Liu, E., Qian, S., Liu, X., Wu, Y., Dong, F., Qiu, C-W., Qiu, J., Hua, K., Su, W., Wu, J., Xu, H., Han, Y., Fu, C., Yin, Z., Liu, M., ... Zhang, J. (2021). Artificial Intelligence: A Powerful Paradigm for Scientific Research. The Innovation, 2(4), 100179. https://doi.org/10.1016/j.xinn.2021.100179
Yue, P., Ramachandran, R., Baumann, P., Khalsa, S. J. S., Deng, M., & Jiang, L. (2016). Recent Activities in Earth Data Science [Technical Committees]. IEEE Geoscience and Remote Sensing Magazine, 4(4), 84-89. https://doi.org/10.1109/MGRS.2016.2600528
Zhu, Q., Zhang, F., Huang, Y., Xiao, H., Zhao, L., Zhang, X., Song, T., Tang, X., Li, X., He, G., Chong, B., Zhou, J., Zhang, Y., Zhang, B., Cao, J., Luo, M., Wang, S., Ye, G., Zhang, W., ... Luo, Y. (2022). An all-round AI-Chemist with a Scientific Mind. National Science Review, 9(10), nwac190. https://doi.org/10.1093/nsr/nwac190
THE AUTHORS
Tiantian Liu, liutt20@fudan.edu.cn, Carl Mitcham, cmitcham@mines.edu ORCID 0000-0003-4199-5940
Received: 5 January 2024 Revised: 18 February 2024 Accepted: 28 February 2024