Memento ludi: information retrieval from a game-theoretic perspective

Parfionov George; Zapatrin Romàn

George Parfionov1 and Roman Zapatrin2

1 Friedmann Laboratory for Theoretical Physics, Department of Mathematics, SPb EF University, Griboyedova 30-32, 191023 St. Petersburg, Russia

2 Department of Information Science, The State Russian Museum, Inzenernaya 4, 191186 St. Petersburg, Russia

E-mail: [email protected]

Abstract. We develop a macro-model of the information retrieval process using game theory as a mathematical theory of conflicts. We represent the information retrieval process as a game of two abstract players. The first player is the ‘intellectual crowd’ of users of search engines, the second is a community of information retrieval systems. In order to apply game theory, we treat search log data as Nash equilibrium strategies and solve the inverse problem of finding appropriate payoff functions. For that, we suggest a particular model, which we call the Alpha model. Within this model, we suggest a method, called shifting, which makes it possible to partially control the behavior of the mass of users.

The paper is addressed to researchers in both game theory (providing a new class of real-life problems) and information retrieval, for whom we present new techniques to control the IR environment.

Introduction

The techniques we present are inspired by the success of the macro-approach in both natural and social sciences. In thermodynamics, starting from the chaotic motion of billions of billions of microparticles, we arrive at a simple, transparent, strongly predictive theory with few macro-variables, such as temperature, pressure, and so on. In models of market behavior the chaotic motion is present as well, but there are two definite parties, each consisting of a large number of individuals with common interests, whose behavior is not coordinated.

From a global perspective, information retrieval looks similar: there are many individual seekers of knowledge on one side, and a number of knowledge providers on the other; each side is both chaotic and uncoordinated. There are two definite parties, whose members have similar interests, and every member of each party tends to fulfill his own interests as fully as possible. How could a Mathematician help them? At first sight, each party could be advised to solve a profit maximization problem. But back in 1928 J. von Neumann realized that this approach is inadequate: you cannot maximize a value you do not know (Von Neumann, 1928). In fact, the profit gained by each agent depends not only on its own actions, but also on the activities of its counterpart, which are not known. Game theory was then developed, replacing the notion of optimality by that of acceptability. Similarly, the crucial point of information retrieval, in contrast to data retrieval, is to get some satisfaction (a feeling of relevance) rather than to retrieve something exact. The analogy

    Data Retrieval - matching            Optimization - maximum
    Information Retrieval - relevance    Game Theory - equilibrium

was a starting point for us to explore applications of game theory to the problems of information retrieval.

* This work was supported by the Russian Basic Research Foundation under grant No. 07-06-00119.

The standard problem of game theory is the search for strategies that are reasonable (in various senses). When the rules of the game are given, there is a vast machinery which makes it possible to calculate such strategies. In information retrieval we have two parties whose interaction is exactly of a game nature, but the rules of this game are not explicitly formulated. However, we may observe the consequence of these rules as users' behavior; that is, we deal with the inverse problem of game theory, studied by Dragan (Dragan, 1966) for cooperative games. In this paper we extend it to the noncooperative case. It turns out that the solution of the inverse problem is essentially non-unique: different rules can produce the same behavior. We suggest a particular class of models, called Alpha models, describing an idealized search system similar to the Wolfram Alpha engine.

How can search engine managers benefit from these techniques? Game theory can work out definite recommendations on how to control the interaction between the parties of the information retrieval process. This sounds unrealistic: can one control massive chaotic behavior? Thermodynamics shows us that the answer is yes. We cannot control individual molecules, but in order to alter their collective behavior we are able to change macroparameters: the engine of your car reminds you of this. In our case the payoff functions of the Alpha model are just those parameters.

In Section 1 we introduce the basic notions of game theory (only the necessary ones); in Section 2 we formulate the information retrieval process in terms of game theory and state our method as the inverse problem of game theory. In Section 3 we suggest a particular solution, which we call the Alpha model as it resembles the Wolfram Alpha engine, and in Section 4 we suggest a method to control the behavior of the mass of users.

1. Direct problem: classical game theory

Game theory is a mathematical theory studying conflicts and trade-offs. It involves rational participants who follow formal rules. A game is specified by its players, the players' strategies and the players' payoffs. We begin with a well-known example (a reformulated Prisoner's dilemma (Tucker, 1950)).

There are two players, A and B. Player A can choose a color, Red or Green, while B chooses a direction, Left or Right. The rules of the game are specified by the following pair of payoff matrices (Table 1).

The Mathematician can predict the outcome of this game provided the players are rational, namely, wishing to gain more: the rational player A will necessarily choose Red and B will choose Left. Indeed, Red brings A more than Green whatever B does, and Left brings B more than Right whatever A does.

            The gain of A                  The gain of B
            Left   Right                   Left   Right
    Red      10     25           Red        11      4
    Green     5     20           Green      23     17

Table 1. A game with domination, defined by its pair of payoff matrices, which have the following meaning: if A chooses Green and B chooses Right, A gains 20 and B gains 17, and so on.

However, both players know the payoff matrices, so, being rational, why can't they agree for A to choose Green and for B to choose Right? The point is that they are acting independently, which excludes any agreement. Games of this kind are called non-cooperative, and this is the case for the IR community.
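The prediction for Table 1 can be checked mechanically. The following is a minimal Python sketch added for illustration; the helper names `dominant_row` and `dominant_col` are hypothetical and the payoff values are copied from Table 1. It looks for a strategy that is strictly better than every alternative regardless of the opponent's choice.

```python
# Payoff matrices of Table 1; rows are A's choices, columns are B's choices.
ROWS = ["Red", "Green"]
COLS = ["Left", "Right"]
GAIN_A = [[10, 25], [5, 20]]
GAIN_B = [[11, 4], [23, 17]]


def dominant_row(gain_a):
    """Index of a row that beats every other row in each column, or None."""
    n_rows, n_cols = len(gain_a), len(gain_a[0])
    for i in range(n_rows):
        if all(gain_a[i][k] > gain_a[j][k]
               for j in range(n_rows) if j != i
               for k in range(n_cols)):
            return i
    return None


def dominant_col(gain_b):
    """Index of a column that beats every other column in each row, or None."""
    n_rows, n_cols = len(gain_b), len(gain_b[0])
    for k in range(n_cols):
        if all(gain_b[j][k] > gain_b[j][m]
               for m in range(n_cols) if m != k
               for j in range(n_rows)):
            return k
    return None


i, k = dominant_row(GAIN_A), dominant_col(GAIN_B)
outcome = (ROWS[i], COLS[k])
print("predicted outcome:", outcome, "gains:", GAIN_A[i][k], GAIN_B[i][k])
print("gains at (Green, Right):", GAIN_A[1][1], GAIN_B[1][1])
```

The sketch reports (Red, Left) as the predicted outcome, although both players would gain more at (Green, Right); this is exactly the dilemma described above.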

The peculiarity of the above example is that it has a unique (and therefore straightforward) solution. However, examples of this kind do not describe the generic situation. Now let us consider a more general example (Table 2).

            The gain of A                  The gain of B
            Left   Right                   Left   Right
    Red      10     20           Red        11      4
    Green     5     25           Green      17     23

Table 2. A non-dominating case: two Nash equilibria.

First note that no player has a dominating strategy here, so the outcome of the game is at first glance unpredictable. However, the Mathematician predicts the outcome of this game as well. First, we see that neither (Green, Left) nor (Red, Right) will be realized¹ by rational players. One of the following two pairs (just as in the maritime Rules of the Road) will necessarily occur: (Red, Left) or (Green, Right). Why so? The motivation for a rational player to abide by a certain strategy is that leaving it unilaterally does not increase his gain. For the pair (Red, Left) this reads:

$$
H_A(\text{Red}, \text{Left}) \geq H_A(\alpha, \text{Left}), \qquad
H_B(\text{Red}, \text{Left}) \geq H_B(\text{Red}, \beta)
\tag{1}
$$

where $H_A(\alpha, \beta)$ ($H_B(\alpha, \beta)$, resp.) is the gain of A (B, resp.) when A chooses strategy $\alpha$ and B chooses $\beta$. The relations (1) are the famous Nash inequalities. A pair of strategies is said to form a Nash equilibrium if it satisfies these inequalities. In the above example the pair of strategies (Red, Left) is a Nash equilibrium, but so is the pair (Green, Right)! So, what will be the Mathematician's prediction for the outcome of this game? He will point out what will not occur and what will take place stably.
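The Nash inequalities (1) can be tested for every pair of pure strategies by brute force. The sketch below is illustrative only: the function name `pure_nash_equilibria` is hypothetical, and the matrices are copied from Tables 1 and 2.

```python
def pure_nash_equilibria(gain_a, gain_b):
    """Return all (row, column) index pairs satisfying the Nash inequalities (1)."""
    n_rows, n_cols = len(gain_a), len(gain_a[0])
    found = []
    for j in range(n_rows):
        for k in range(n_cols):
            # A cannot gain by changing the row while B keeps column k.
            a_stays = all(gain_a[j][k] >= gain_a[i][k] for i in range(n_rows))
            # B cannot gain by changing the column while A keeps row j.
            b_stays = all(gain_b[j][k] >= gain_b[j][m] for m in range(n_cols))
            if a_stays and b_stays:
                found.append((j, k))
    return found


ROWS = ["Red", "Green"]
COLS = ["Left", "Right"]

TABLE1_A, TABLE1_B = [[10, 25], [5, 20]], [[11, 4], [23, 17]]
TABLE2_A, TABLE2_B = [[10, 20], [5, 25]], [[11, 4], [17, 23]]

equilibria_table1 = [(ROWS[j], COLS[k])
                     for j, k in pure_nash_equilibria(TABLE1_A, TABLE1_B)]
equilibria_table2 = [(ROWS[j], COLS[k])
                     for j, k in pure_nash_equilibria(TABLE2_A, TABLE2_B)]
print("Table 1:", equilibria_table1)
print("Table 2:", equilibria_table2)
```

For Table 1 the only pair found is (Red, Left); for Table 2 the search returns both (Red, Left) and (Green, Right), in agreement with the text.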

¹ How it works: suppose B plays Left and A chooses Green; A observes that he gains only 5 and then switches to Red, which brings him 10.

Now let us pass to the next example (Table 3), which is generic.

            The gain of A                  The gain of B
            Left   Right                   Left   Right
    Red      10     20           Red         4     11
    Green     5     25           Green      23     17

Table 3. No Nash equilibria.

We see that there are no equilibrium pairs of strategies in this game; that is, if the players are represented by individuals, the outcome of an instance of the game cannot be predicted. What can the Mathematician tell us now? He will suggest considering players represented by communities. A choice of strategy by the collective player A (and likewise by B) is described by the distribution of the individuals with respect to the strategies they choose:

$$
p = (p_{\text{Red}}, p_{\text{Green}}), \qquad q = (q_{\text{Left}}, q_{\text{Right}})
\tag{2}
$$

The gain of the collective players with respect to the chosen pair of strategies is the average:

$$
H_A(p, q) = \sum_{j,k} a_{jk}\, p_j q_k, \qquad
H_B(p, q) = \sum_{j,k} b_{jk}\, p_j q_k
\tag{3}
$$

where $[a_{jk}]$, $[b_{jk}]$ are the payoff matrices for the players A and B, respectively.

The prediction of the outcome of the game is now a pair of distributions $(p^*, q^*)$ obtained from the same Nash inequalities (1), now applied to the averages (3), for all distributions $p$, $q$:

$$
H_A(p^*, q^*) \geq H_A(p, q^*), \qquad
H_B(p^*, q^*) \geq H_B(p^*, q)
\tag{4}
$$

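The fundamental result of game theory is Nash's theorem (Owen, 1995), which states that an equilibrium in the sense of (4) always exists. Moreover, for two players with two strategies each and no equilibrium in pure strategies (as in Table 3), the answer can be written explicitly:

$$
p_1 = \frac{b_{22} - b_{21}}{b_{11} + b_{22} - b_{12} - b_{21}}, \qquad
q_1 = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}}, \qquad
p_2 = 1 - p_1, \qquad q_2 = 1 - q_1
\tag{5}
$$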

Note that the behavior of the player A is completely determined only by the payoff matrix of the player B and vice versa.
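As an illustration, formula (5) can be applied to the game of Table 3. The sketch below uses exact rational arithmetic; the names `mixed_equilibrium_2x2` and `average_gain` are hypothetical, and the code assumes a 2x2 game without pure equilibria, so that the denominators in (5) are nonzero.

```python
from fractions import Fraction


def mixed_equilibrium_2x2(a, b):
    """Equilibrium distributions (p, q) of a 2x2 bimatrix game by formula (5).

    Assumes the game has no equilibrium in pure strategies, so that both
    denominators are nonzero and the results lie strictly between 0 and 1.
    """
    p1 = Fraction(b[1][1] - b[1][0], b[0][0] + b[1][1] - b[0][1] - b[1][0])
    q1 = Fraction(a[1][1] - a[0][1], a[0][0] + a[1][1] - a[0][1] - a[1][0])
    return (p1, 1 - p1), (q1, 1 - q1)


def average_gain(m, p, q):
    """Average gain (3): sum over j, k of m[j][k] * p[j] * q[k]."""
    return sum(m[j][k] * p[j] * q[k] for j in range(2) for k in range(2))


TABLE3_A = [[10, 20], [5, 25]]
TABLE3_B = [[4, 11], [23, 17]]
p, q = mixed_equilibrium_2x2(TABLE3_A, TABLE3_B)
print("p =", p, " q =", q)

# At equilibrium each player is indifferent between its pure strategies.
red, green = (1, 0), (0, 1)
left, right = (1, 0), (0, 1)
gain_a_red = average_gain(TABLE3_A, red, q)
gain_a_green = average_gain(TABLE3_A, green, q)
gain_b_left = average_gain(TABLE3_B, p, left)
gain_b_right = average_gain(TABLE3_B, p, right)
print(gain_a_red, gain_a_green, gain_b_left, gain_b_right)
```

For Table 3 this gives p = (6/13, 7/13) and q = (1/2, 1/2). Note that `p` is computed from the matrix of B alone and `q` from the matrix of A alone, which is the observation made above.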

2. Crowd Meets Crowd — Inverse Problem

In this section we describe our IR macromodel as a non-antagonistic conflict of two parties, or, in other words, a non-cooperative game of two players. The first player, call it A, asks questions; the second, call it B, provides answers. Player A stands for the community of users (the intellectual crowd) of IR systems; player B stands for the community of providers of search results (which is symmetrically treated as an intellectual crowd).

Each particular strategy $\alpha_j$ of the player A is just typing something in a search box. Each particular strategy $\beta_k$ of the player B is to return a page with an answer, which, viewed as, say, HTML code, is a string of symbols as well. An instance of the game is a pair

$$
(\alpha_j, \beta_k) = (\text{input-string}, \text{returned-string})
$$

which is somehow evaluated by each participant. For example, the payoff value $H_A(\alpha_j, \beta_k)$ for the player A for the pair

$$
(\alpha_j, \beta_k) = (\text{'accommodation'}, \text{'No results found'})
$$

is evidently low. Meanwhile, we do not dare to ascribe any payoff value $H_B(\alpha_j, \beta_k)$ to this instance for the player B (we do not know the providers' priorities). In more general situations even the evaluations of the player A are not known.

However, numerical payoff values are needed in order to apply game theory: its basic concept, the Nash equilibrium, is based on the comparison of instances (4). As a matter of fact, the participants of the IR process do compare instances, but they do it qualitatively. But the Mathematician needs numbers! What data should he process in order to get them?

Stability and equilibrium. In a sense, this week's World Wide Web is the same as it was a week ago, whatever the variety of different queries and answers. What is stable in time is the statistics of instances $(\alpha_j, \beta_k)$: things frequently asked yesterday are asked again today. The Mathematician tells us that from a game-theoretic perspective this stability is not surprising: it is the Nash equilibria that are stable, because leaving them is unfavorable.

If we knew the payoff functions, we could find the Nash equilibrium. But in our situation we know the equilibrium (the statistics of instances) and we have to find the appropriate payoff functions $H_A(\alpha_j, \beta_k)$, $H_B(\alpha_j, \beta_k)$ in (3). This is the inverse problem in game theory (Dragan, 1966). The inverse problem has multiple solutions: for given frequencies there are many different payoff matrices yielding the same equilibrium². Below, we introduce a specific model, called the Alpha model, with the smallest number of free parameters.

3. Alpha model

The raw material for us will be a collection of search strings with their frequencies and a collection of returned results with their frequencies as well. According to our model, we interpret it as a realized equilibrium. Now we are about to reconstruct the payoff functions. First, we assume that the number of different strategies is the same for both players. If it is not, we may achieve this by appropriate preprocessing of the data, identifying some data strings.

Note that, given a pair of strategies $(p, q)$, there are (infinitely) many different payoff functions for which this pair of strategies is an equilibrium. Among all such models, we consider the simplest one, closest to data retrieval. For this model, the payoff matrices are diagonal:

$$
A = \begin{pmatrix}
a_1 & 0 & \cdots & 0 \\
0 & a_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_n
\end{pmatrix},
\qquad
B = \begin{pmatrix}
b_1 & 0 & \cdots & 0 \\
0 & b_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & b_n
\end{pmatrix}
\tag{6}
$$

where $a_1, \ldots, a_n$; $b_1, \ldots, b_n$ are positive numbers.

² A trivial example of such non-uniqueness is multiplying the payoff matrix by a positive number.

The feature of this model is that the only valuable answer to the question $\alpha_j$ is $\beta_j$ with the same index $j$; other answers $\beta_k$ with $k \neq j$ are of zero value. This resembles the Wolfram Alpha search engine, which provides a single answer to a query; that is why we call our model Alpha.

The Nash equilibrium for the game is given by:

$$
p_j = \frac{b_j^{-1}}{b_1^{-1} + \cdots + b_n^{-1}}, \qquad
q_k = \frac{a_k^{-1}}{a_1^{-1} + \cdots + a_n^{-1}}
\tag{7}
$$
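To see formula (7) at work, one may compute the equilibrium for a small diagonal game. The sketch below is illustrative: the function name `alpha_equilibrium` is hypothetical and the payoff values are invented for the example, not taken from any search log.

```python
from fractions import Fraction


def alpha_equilibrium(a, b):
    """Equilibrium (p, q) of the diagonal game (6) by formula (7).

    a, b are lists of positive integers: the diagonals of the payoff matrices.
    """
    inv_a = [Fraction(1, x) for x in a]
    inv_b = [Fraction(1, x) for x in b]
    p = [x / sum(inv_b) for x in inv_b]   # users' frequencies depend on b only
    q = [x / sum(inv_a) for x in inv_a]   # providers' frequencies depend on a only
    return p, q


# Hypothetical diagonal payoffs for three question/answer pairs.
a = [2, 3, 6]
b = [1, 2, 2]
p, q = alpha_equilibrium(a, b)
print("p =", p)
print("q =", q)

# Every pure strategy brings the same gain: a_j * q_j for A and b_k * p_k for B.
gains_a = [a[j] * q[j] for j in range(3)]
gains_b = [b[k] * p[k] for k in range(3)]
print(gains_a, gains_b)
```

In this example all questions bring A the same gain and all answers bring B the same gain, so no player profits from shifting weight between pure strategies.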

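We can verify this directly by checking the Nash inequalities (4). It is sufficient (Owen, 1995) to check them only for pure strategies, that is, for all $j$, $k$:

$$
H_A(\alpha_j, q^*) \leq H_A(p^*, q^*), \qquad
H_B(p^*, \beta_k) \leq H_B(p^*, q^*)
\tag{8}
$$

Recall that we have the inverse problem, that is, we know $(p, q)$. Its solution is

$$
a_j = \frac{a}{q_j}, \qquad b_k = \frac{b}{p_k}
\tag{9}
$$

for any fixed positive numbers $a$, $b$. The obtained result shows us that: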

- The less frequent an instance is, the higher is its value.
- The value of a question is determined by the frequency of the reply and, vice versa, the value of a reply is determined by the frequency of the question.

The first statement means that within this model frequently asked questions have low value for the provider B, and, vice versa, rarely delivered answers are of high value for the user A.

The magic of Nash theory is captured in the second statement. It means that the behavior of player A is completely determined only by the payoff matrix of player B. In other words, the popularity (=frequency) of users’ questions depends on priorities of the answering side rather than on their own priorities.
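The inverse problem within the Alpha model can be sketched in a few lines. In the example below, the counts are invented for illustration, the function names are hypothetical, and all frequencies are assumed to be nonzero, since (9) divides by them. The code reconstructs the payoffs by (9) and then checks, by (7), that they reproduce the observed frequencies.

```python
from fractions import Fraction


def alpha_payoffs(p, q, scale_a=1, scale_b=1):
    """Diagonal payoffs by formula (9): a_j = a / q_j, b_k = b / p_k."""
    a = [Fraction(scale_a) / qj for qj in q]
    b = [Fraction(scale_b) / pk for pk in p]
    return a, b


def equilibrium_from_payoffs(a, b):
    """Formula (7), used here only to check the reconstruction."""
    sum_inv_a = sum(1 / x for x in a)
    sum_inv_b = sum(1 / x for x in b)
    p = [(1 / x) / sum_inv_b for x in b]
    q = [(1 / x) / sum_inv_a for x in a]
    return p, q


# Hypothetical log statistics: how often each query was typed and
# how often each answer was returned.
query_counts = [50, 30, 20]
answer_counts = [40, 40, 20]
p = [Fraction(c, sum(query_counts)) for c in query_counts]
q = [Fraction(c, sum(answer_counts)) for c in answer_counts]

a, b = alpha_payoffs(p, q)
print("a =", a)   # value of each question for the user A
print("b =", b)   # value of each answer for the provider B
p_check, q_check = equilibrium_from_payoffs(a, b)
```

The most frequent query receives the smallest bonus `b[0]`, and the least frequently returned answer receives the largest value `a[2]`, as the two statements above say. Changing `scale_a` or `scale_b` rescales the payoffs without changing the recovered frequencies.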

4. Shifting of users’ behavior

So far, we have suggested a quantitative model of the IR process. The aim of this model is not just to describe, but also to give some means of control over the overall process. There are two parties involved, each having its own interests. Let us consider what the provider B could do in order to increase its gain.

At first sight, the strategy q should be changed, but the power of Nash theory is that the answer is immediate: this does not make sense, since any unilateral deviation from the equilibrium is unfavorable for B. The player B cannot directly, by ordering, control the strategy p of player A, nor its payoff matrix. So, the only thing B can do is to change its own interests: what remains under the control of B is its own payoff matrix. How does it work?

A simple suggestion is to multiply all the elements of B by, say, 1957. As follows from (7), this does not affect the strategy of player A: it is similar to recalculating your wealth from euros to Italian liras: you may feel happy, but your wealth will not grow. Therefore we have to impose a normalization condition on the bonuses $b_k$ of B in order to fix their scale. Let us suppose their total amount $B$ to be fixed:

$$
\sum_k b_k = B = \text{const}
\tag{10}
$$

As was shown in the previous section, the strategy of A depends only on the payoffs of B. Hence, changing the matrix B will affect the behavior of its counterpart A. Furthermore, the statistics of instances will change and, therefore, the average gain of B will change. Let us first calculate how the average gain $H_B$ of B depends on the parameters of its payoff matrix (6):

$$
H_B(p, q) = \sum_j b_j\, p_j q_j
\tag{11}
$$

for any strategies $p_j$, $q_j$. Within our model we know, however, that in equilibrium $p_j = b / b_j$ by (9); therefore the optimal average gain is:

$$
H_B(p, q) = \sum_j b\, q_j = b
\tag{12}
$$

The value of the factor $b$ can now be derived from (9) and the condition $\sum_k p_k = 1$; therefore the optimal gain of the player B reads:

$$
H_B = \Big( \sum_k b_k^{-1} \Big)^{-1}
\tag{13}
$$
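Formulas (11)-(13) are easy to verify numerically. The following sketch is an illustration with invented bonuses; the function names are hypothetical. It checks that, once the users follow the equilibrium frequencies $p_k = b / b_k$, the average gain (11) equals the value (13) whatever distribution q the provider uses.

```python
from fractions import Fraction


def equilibrium_gain(b):
    """Formula (13): H_B = (sum over k of 1/b_k) ** (-1)."""
    return 1 / sum(Fraction(1, bk) for bk in b)


def average_gain(b, p, q):
    """Formula (11): H_B(p, q) = sum over j of b_j * p_j * q_j."""
    return sum(bj * pj * qj for bj, pj, qj in zip(b, p, q))


b = [1, 2, 2]                      # hypothetical diagonal bonuses of B
h = equilibrium_gain(b)
p = [h / bk for bk in b]           # equilibrium query frequencies, p_k = b / b_k

# Two arbitrary answer distributions q; the gain of B is the same for both.
q_uniform = [Fraction(1, 3)] * 3
q_skewed = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
gain_uniform = average_gain(b, p, q_uniform)
gain_skewed = average_gain(b, p, q_skewed)
print(h, gain_uniform, gain_skewed)
```

This independence of q is the numerical counterpart of the statement that no unilateral change of strategy helps B.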

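Now let us explore how the optimal gain $H_B$ changes under small variations $\delta b_k$ of the parameters of the Alpha model. It follows from the normalization condition (10) that

$$
\sum_k \delta b_k = 0
\tag{14}
$$

Next we calculate the gradient of the optimal gain $H_B$ given by (13):

$$
\frac{\partial H_B}{\partial b_k}
= -\Big( \sum_j b_j^{-1} \Big)^{-2} \cdot \big( -b_k^{-2} \big)
= \Big( \frac{H_B}{b_k} \Big)^{2} = p_k^2
\tag{15}
$$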

The variations $\delta b_k$ are obtained from the gradient $\nabla_k H_B$ by requiring the condition (14) to be satisfied:

$$
\delta b_k = p_k^2 - \frac{1}{n} \sum_j p_j^2
\tag{16}
$$

where the sum $\sum_j p_j^2$ is the unnormalized Yule's characteristic (Yule, 1944), reflecting the diversity of the queries.

The shifting. Now suppose we are in a position to make small changes, of magnitude $\varepsilon$, to the payoff function of the Alpha Provider. How should we apply them in order to increase the gain of B as much as possible? The answer is given by formula (16), according to which the Alpha Provider has to do the following:

- Find the relative frequencies $p_k$ of the users' queries $\alpha_k$.
- Calculate the average of their squares, $w = \frac{1}{n} \sum_k p_k^2$.
- Slightly re-evaluate the instances, placing more bonuses on the queries whose squared frequencies $p_k^2$ are above the threshold value $w$, and taking them from the rarely asked questions, whose squared frequencies are below $w$.

As a result, the equilibrium will shift, the frequencies of the users' requests will adjust accordingly, and the Alpha Provider will increase his gain, as follows from (11), by

$$
\delta b = \varepsilon \sum_j \delta b_j\, q_j
\tag{17}
$$
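The shifting procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the bonuses and the step size `eps` are invented, the function names are hypothetical, and the gain after the shift is evaluated directly from formula (13) rather than from a first-order estimate. The step must be small enough for all bonuses to stay positive.

```python
from fractions import Fraction


def equilibrium_gain(b):
    """Formula (13): the gain of B at equilibrium."""
    return 1 / sum(1 / Fraction(x) for x in b)


def shift_direction(b):
    """Formula (16): delta_k = p_k^2 - w, where w is the mean of the p_j^2."""
    h = equilibrium_gain(b)
    p = [h / Fraction(x) for x in b]          # equilibrium frequencies of queries
    w = sum(x * x for x in p) / len(p)        # threshold value w
    return [x * x - w for x in p]


def shift(b, eps):
    """Move the bonuses by eps along the direction (16); the total is preserved."""
    return [Fraction(x) + eps * d for x, d in zip(b, shift_direction(b))]


b = [Fraction(1), Fraction(2), Fraction(2)]   # hypothetical bonuses of B
eps = Fraction(1, 10)                         # hypothetical small step
direction = shift_direction(b)
new_b = shift(b, eps)

old_gain = equilibrium_gain(b)
new_gain = equilibrium_gain(new_b)
print("direction:", direction)
print("new bonuses:", new_b)
print("gain:", old_gain, "->", new_gain)
```

In this example the direction sums to zero, so the normalization (10) is kept, and the equilibrium gain of B grows slightly. Repeating the step with the updated frequencies gives a simple iterative version of the procedure.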

Conclusions

We have described the process of information retrieval as a non-antagonistic conflict between two parties: users and providers. The mathematical model of such a conflict is a bimatrix non-cooperative game. Starting from the assumption that de facto search log statistics is the Nash equilibrium of a certain game, we provide a method of calculating the parameters (9) of this game, thus solving the appropriate inverse problem.

A significant, somewhat counter-intuitive consequence of Nash theory is that in this class of games the equilibrium, i.e. stable, behavior of the User is completely determined by the distribution of priorities of the Provider alone. From this, we infer suggestions for the provider on how to affect the behavior of the mass of users.

References

von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele (Theory of parlor games). Math. Ann., 100, 295.

Dragan, I. (1966). Values, potentials and inverse problems in cooperative Game Theory. European Journal of Operational Research, 95, 451-463.

Tucker, A. (1950). A two-person dilemma. Stanford University Press.

Owen, G. (1995). Game Theory. Academic Press, UK.

Yule, G. (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press.
