Scientific article on the topic 'Why under stress positive reinforcement is more effective? Why optimists study better? Why people become restless? Simple utility-based explanations' (field: Computer and Information Sciences)

CC BY
Keywords
UTILITY THEORY / POSITIVE VS. NEGATIVE REINFORCEMENT / OPTIMISTS VS. PESSIMISTS

Abstract of a scientific article in computer and information sciences; authors: Zapata F., Kosheleva O., Kreinovich V.

In this paper, we use the utility-based approach to decision making to provide simple answers to the following three questions: Why is positive reinforcement more effective under stress? Why do optimists study better? Why do people become restless?


Mathematical Structures and Modeling 2018. N. 2(46). PP. 66-72

UDC 519.81:159.923 DOI: 10.25513/2222-8772.2018.2.66-72

WHY UNDER STRESS POSITIVE REINFORCEMENT IS MORE EFFECTIVE? WHY OPTIMISTS STUDY BETTER? WHY PEOPLE BECOME RESTLESS? SIMPLE UTILITY-BASED EXPLANATIONS

Francisco Zapata

Ph.D. (Phys.-Math.), Instructor, e-mail: [email protected]

Olga Kosheleva

Ph.D. (Phys.-Math.), Associate Professor, e-mail: [email protected]

Vladik Kreinovich

Ph.D. (Phys.-Math.), Professor, e-mail: [email protected]

University of Texas at El Paso, El Paso, Texas 79968, USA

Abstract. In this paper, we use the utility-based approach to decision making to provide simple answers to the following three questions: Why is positive reinforcement more effective under stress? Why do optimists study better? Why do people become restless?

Keywords: utility theory, positive vs. negative reinforcement, optimists vs. pessimists.

1. Why Under Stress Positive Reinforcement Is More Effective?

Phenomenon. To encourage a person to do something, we can use positive reinforcement, when we reward the person for doing the task, or negative reinforcement, when we penalize the person for not doing it. Both approaches have their strengths and limitations.

It has been observed that in stressful situations, when a person's mood is negative, the relative strength of positive reinforcement increases; see, e.g., [11,21]. Why?

Let us formulate this problem in precise terms. In traditional decision theory, human preferences are described by utilities; see, e.g., [2,3,14,17,19]. A utility is defined as follows: we select a very good situation $A_1$ and a very bad situation $A_0$, and then compare each situation $A$ with the lottery $L(p)$ in which:

• we get $A_1$ with probability $p$, and

• we get $A_0$ with the remaining probability $1 - p$.

For small $p$, the lottery $L(p)$ is close to $A_0$ and is, thus, much worse than $A$: $L(p) < A$. For $p$ close to 1, $L(p)$ is close to $A_1$ and is, thus, much better than $A$: $A < L(p)$. There is therefore a threshold value $p_0$ such that:

• for $p > p_0$, we have $A < L(p)$, while

• for $p < p_0$, we have $L(p) < A$.

This threshold value $p_0$, for which $A$ is (in this sense) equivalent to the lottery $L(p_0)$, is called the utility $u(A)$ of the alternative $A$.

If $p < p'$, then, of course, $L(p')$ is better than $L(p)$. Thus, among several alternatives, we should select the one for which the utility $u(A)$ is the largest.
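Because the preference between $A$ and $L(p)$ switches only once as $p$ grows, the threshold $p_0$ can in principle be located by bisection on $p$. The Python sketch below is an illustration under stated assumptions, not a procedure from the paper: it assumes a hypothetical oracle `prefers_lottery(p)` that reports whether the decision maker prefers $L(p)$ to $A$, and the helper names `elicit_utility` and `best_alternative`, as well as the simulated thresholds 0.3 and 0.7, are made up for the example.

```python
from typing import Callable, Dict

# Hypothetical oracle: returns True if the decision maker prefers the
# lottery L(p) (A_1 with probability p, A_0 otherwise) to the alternative A.
Oracle = Callable[[float], bool]


def elicit_utility(prefers_lottery: Oracle, tol: float = 1e-6) -> float:
    """Approximate the threshold p0 = u(A) by bisection on p."""
    lo, hi = 0.0, 1.0  # L(0) = A_0 is worse than A; L(1) = A_1 is better
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prefers_lottery(mid):
            hi = mid  # A < L(mid), so p0 <= mid
        else:
            lo = mid  # L(mid) < A, so p0 >= mid
    return (lo + hi) / 2


def best_alternative(oracles: Dict[str, Oracle]) -> str:
    """Select the alternative whose elicited utility is the largest."""
    return max(oracles, key=lambda name: elicit_utility(oracles[name]))


# Simulated decision maker with (hidden) utilities 0.3 for A and 0.7 for B.
simulated = {"A": lambda p: p > 0.3, "B": lambda p: p > 0.7}
print(round(elicit_utility(simulated["A"]), 3))  # 0.3
print(best_alternative(simulated))  # B
```

In practice each oracle call would be a question put to a person, so the logarithmic number of comparisons that bisection needs is the main reason to prefer it over a linear scan of $p$.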

It is known that the utility of monetary rewards or losses is approximately proportional to the square root of the amount $m$ of money:

• $u(m) = a_+ \cdot \sqrt{m}$ for $m \ge 0$, and

• $u(m) = -a_- \cdot \sqrt{|m|}$ for $m < 0$,

for some values $a_+$ and $a_-$; see, e.g., [6,12,13].
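A direct transcription of this piecewise square-root utility, as a small Python sketch. The function name `utility` is hypothetical, and the constants used in the example ($a_+ = 1$, $a_- = 2$) are illustrative assumptions; the text does not fix their values.

```python
import math


def utility(m: float, a_plus: float, a_minus: float) -> float:
    """Square-root utility of a monetary gain (m >= 0) or loss (m < 0)."""
    if m >= 0:
        return a_plus * math.sqrt(m)
    return -a_minus * math.sqrt(abs(m))


# Illustrative constants only (a_plus = 1, a_minus = 2); not taken from the paper.
print(utility(100, 1.0, 2.0))   # 10.0
print(utility(-100, 1.0, 2.0))  # -20.0
```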

We can measure the relative strength of positive and negative reinforcement by comparing the changes in utility if we add or subtract a certain amount of money m.

If we start with a neutral situation, in which we have no money, then the original utility value is 0. After adding the amount $m$, we get the utility $a_+ \cdot \sqrt{m}$, while after subtracting the amount $m$, we lose the utility amount $a_- \cdot \sqrt{m}$. In this case, the ratio of positive-to-negative reinforcement effects is

$$\frac{a_+ \cdot \sqrt{m}}{a_- \cdot \sqrt{m}} = \frac{a_+}{a_-}. \qquad (1)$$

What if we start with a stressful situation, in which the initial amount of money is small but negative: $-m_0 < 0$? In this case, the initial value of the utility is $-a_- \cdot \sqrt{m_0}$. After adding $m$ (with $m > m_0$), we get $m - m_0$, with the utility $a_+ \cdot \sqrt{m - m_0}$. Thus, the utility gain is $a_+ \cdot \sqrt{m - m_0} + a_- \cdot \sqrt{m_0}$.

If we subtract the money amount $m$, then we end up with the negative amount $-(m + m_0)$, whose utility is $-a_- \cdot \sqrt{m + m_0}$. Thus, the loss of utility is the difference $a_- \cdot \sqrt{m + m_0} - a_- \cdot \sqrt{m_0}$.

Thus, the ratio describing the relative strength of positive and negative reinforcement takes the form

$$\frac{a_+ \cdot \sqrt{m - m_0} + a_- \cdot \sqrt{m_0}}{a_- \cdot \sqrt{m + m_0} - a_- \cdot \sqrt{m_0}}. \qquad (2)$$
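To see what the neutral ratio (1) and the stress ratio (2) look like numerically, here is a Python sketch that evaluates both formulas directly. It assumes $m > m_0 > 0$, as in the derivation (adding $m$ brings the person back to a positive amount); the function names and the sample values $a_+ = 1$, $a_- = 2$, $m = 100$, $m_0 = 1$ are hypothetical illustrations, not data from the paper.

```python
import math


def ratio_neutral(a_plus: float, a_minus: float) -> float:
    """Ratio (1): starting from zero money; it does not depend on m."""
    return a_plus / a_minus


def ratio_stress(m: float, m0: float, a_plus: float, a_minus: float) -> float:
    """Ratio (2): starting from the negative amount -m0, then adding or
    subtracting m. Assumes m > m0 > 0 so that m - m0 is positive."""
    if not m > m0 > 0:
        raise ValueError("expected m > m0 > 0")
    gain = a_plus * math.sqrt(m - m0) + a_minus * math.sqrt(m0)
    loss = a_minus * math.sqrt(m + m0) - a_minus * math.sqrt(m0)
    return gain / loss


print(ratio_neutral(1.0, 2.0))             # 0.5
print(ratio_stress(100.0, 1.0, 1.0, 2.0))  # roughly 0.66, larger than ratio (1)
```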

Our explanation. We will show that, at least for small $m_0$, the ratio (2) is larger than the ratio (1). This explains the empirical fact that under stress, positive reinforcement is more effective.

Indeed, for small $m_0$, by taking the first two terms of the corresponding Taylor series, we get

$$\sqrt{m - m_0} = \sqrt{m} - \frac{1}{2\sqrt{m}} \cdot m_0 + o(m_0).$$

For small $m_0$, we have $\sqrt{m_0} \gg m_0$; thus, in the first approximation, we can ignore the terms proportional to $m_0$ and only keep the terms proportional to $\sqrt{m_0}$. So, the numerator of the ratio (2) takes the form

$$a_+ \cdot \sqrt{m - m_0} + a_- \cdot \sqrt{m_0} \approx a_+ \cdot \sqrt{m} + a_- \cdot \sqrt{m_0}.$$

Similarly, we have

$$\sqrt{m + m_0} = \sqrt{m} + \frac{1}{2\sqrt{m}} \cdot m_0 + o(m_0),$$

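and thus, in the first approximation, the denominator of the formula (2) takes the form

$$a_- \cdot \sqrt{m + m_0} - a_- \cdot \sqrt{m_0} \approx a_- \cdot \sqrt{m} - a_- \cdot \sqrt{m_0}.$$

Thus, in the first approximation, the ratio (2) has the form

$$\frac{a_+ \cdot \sqrt{m} + a_- \cdot \sqrt{m_0}}{a_- \cdot \sqrt{m} - a_- \cdot \sqrt{m_0}}. \qquad (3)$$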

We can see that, in comparison with the ratio (1), which can be written as $\frac{a_+ \cdot \sqrt{m}}{a_- \cdot \sqrt{m}}$, we increased the numerator and decreased the denominator; as a result, the ratio increases. This is exactly what we wanted to explain.
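The first-order argument can be checked numerically. This Python sketch compares the exact ratio (2) with its first-order approximation (the version in which the terms proportional to $m_0$ are dropped) for shrinking $m_0$. The constants $a_+ = 1$, $a_- = 2$, $m = 100$ and the function names are illustrative assumptions. The gap between the two should shrink as $m_0 \to 0$, while both stay above the neutral value $a_+/a_- = 0.5$.

```python
import math

A_PLUS, A_MINUS, M = 1.0, 2.0, 100.0  # illustrative values only


def exact_ratio(m0: float) -> float:
    """Ratio (2) evaluated exactly (assumes 0 < m0 < M)."""
    gain = A_PLUS * math.sqrt(M - m0) + A_MINUS * math.sqrt(m0)
    loss = A_MINUS * math.sqrt(M + m0) - A_MINUS * math.sqrt(m0)
    return gain / loss


def approx_ratio(m0: float) -> float:
    """First-order approximation: keep sqrt(m0) terms, drop terms proportional to m0."""
    s = math.sqrt(m0)
    return (A_PLUS * math.sqrt(M) + A_MINUS * s) / (A_MINUS * math.sqrt(M) - A_MINUS * s)


for m0 in (1.0, 1e-2, 1e-4):
    e, a = exact_ratio(m0), approx_ratio(m0)
    print(f"m0={m0:g}: exact={e:.6f} approx={a:.6f} gap={abs(e - a):.1e}")
```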

Auxiliary analysis: beyond explanation. A natural question is: what if, instead of stress, we consider euphoria, i.e., situations in which we have a positive initial amount of money $m_0$? How will this affect the relative strength of positive and negative reinforcement?

In this case, we start with the utility $a_+ \cdot \sqrt{m_0}$. When we add the amount $m$, we get the utility $a_+ \cdot \sqrt{m + m_0}$, so the increase in utility is equal to

$$a_+ \cdot \sqrt{m + m_0} - a_+ \cdot \sqrt{m_0}.$$

Vice versa, if we take away the amount $m$ (with $m > m_0$), we get the new utility $-a_- \cdot \sqrt{m - m_0}$, so the loss in utility is equal to

$$a_- \cdot \sqrt{m - m_0} + a_+ \cdot \sqrt{m_0}.$$

In this situation, the ratio describing the relative strength of positive and negative reinforcement takes the form

$$\frac{a_+ \cdot \sqrt{m + m_0} - a_+ \cdot \sqrt{m_0}}{a_- \cdot \sqrt{m - m_0} + a_+ \cdot \sqrt{m_0}}. \qquad (4)$$

Similarly to the stress case, in the first approximation, the numerator is approximately equal to $a_+ \cdot \sqrt{m} - a_+ \cdot \sqrt{m_0}$, while the denominator is approximately equal to $a_- \cdot \sqrt{m} + a_+ \cdot \sqrt{m_0}$. Thus, in the first approximation, the ratio (4) takes the form

$$\frac{a_+ \cdot \sqrt{m} - a_+ \cdot \sqrt{m_0}}{a_- \cdot \sqrt{m} + a_+ \cdot \sqrt{m_0}}. \qquad (5)$$

We can see that, in comparison with the ratio (1), we decreased the numerator and increased the denominator; as a result, the ratio decreases.

Thus, we conclude that in happy situations, the relative effectiveness of negative reinforcement increases in comparison with positive reinforcement (but do not tell that to your bosses :-).
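A Python sketch that evaluates the euphoria ratio (4) directly. It assumes $m > m_0 > 0$ (taking away $m$ leaves a negative amount), and the function name and the constants $a_+ = 1$, $a_- = 2$, $m = 100$, $m_0 = 1$ are hypothetical, chosen only for demonstration. With these values the ratio drops below the neutral value $a_+/a_- = 0.5$.

```python
import math


def ratio_euphoria(m: float, m0: float, a_plus: float, a_minus: float) -> float:
    """Ratio (4): start with m0 > 0, then add or take away m (assumes m > m0)."""
    if not m > m0 > 0:
        raise ValueError("expected m > m0 > 0")
    gain = a_plus * math.sqrt(m + m0) - a_plus * math.sqrt(m0)
    loss = a_minus * math.sqrt(m - m0) + a_plus * math.sqrt(m0)
    return gain / loss


print(ratio_euphoria(100.0, 1.0, 1.0, 2.0))  # roughly 0.43, below a_plus / a_minus = 0.5
```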

2. Why Optimists Study Better?

Empirical fact. It is known that optimists study better; see, e.g., [22] and references therein.

Let us describe this situation in precise terms. What does optimism mean?

According to the traditional decision theory, if, for each possible alternative $a$, we know the probabilities $p_i(a)$ and utilities $u_i(a)$ of different outcomes $i$, then a rational person should select an alternative $a$ for which the value

$$u(a) \stackrel{\text{def}}{=} \sum_i p_i(a) \cdot u_i(a)$$

is the largest [2,3,14,17,19]. In such situations, there is only one rational choice, so there is no room to show optimism or pessimism.
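In code, this traditional rule is a weighted sum followed by an arg-max. A minimal Python sketch; the helper names `expected_utility` and `rational_choice`, the two alternatives, and their probability/utility numbers are hypothetical.

```python
from typing import Dict, List, Tuple

# Each alternative maps to a list of (probability, utility) pairs, one per outcome i.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """u(a) = sum over outcomes i of p_i(a) * u_i(a)."""
    return sum(p * u for p, u in outcomes)


def rational_choice(alternatives: Dict[str, Outcomes]) -> str:
    """The single rational choice: the alternative with the largest u(a)."""
    return max(alternatives, key=lambda a: expected_utility(alternatives[a]))


# Hypothetical numbers for illustration.
alternatives = {
    "safe": [(1.0, 0.6)],
    "risky": [(0.5, 1.0), (0.5, 0.1)],
}
print(rational_choice(alternatives))  # safe (0.6 > 0.55)
```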

In practice, however, we rarely know the exact probability and the exact utility of different outcomes. Usually, we only know the bounds $\underline{p}_i(a)$, $\overline{p}_i(a)$, $\underline{u}_i(a)$, and $\overline{u}_i(a)$ on possible values of $p_i(a)$ and $u_i(a)$: $\underline{p}_i(a) \le p_i(a) \le \overline{p}_i(a)$ and $\underline{u}_i(a) \le u_i(a) \le \overline{u}_i(a)$. For different values of $p_i(a)$ and $u_i(a)$ from the corresponding intervals, we get different values of the overall utility $u(a)$. Thus, instead of a single value $u(a)$, we have an interval $[\underline{u}(a), \overline{u}(a)]$ of possible values. How should we make decisions if, for each alternative $a$, we know such an interval $[\underline{u}(a), \overline{u}(a)]$?

Reasonable requirements on the rationality of a decision maker lead to the following solution (first proposed by the future Nobel Prize winner Leo Hurwicz): we should select a number $\alpha \in [0,1]$ and select an alternative for which the combination $\alpha \cdot \overline{u}(a) + (1 - \alpha) \cdot \underline{u}(a)$ is the largest possible; see, e.g., [5,10,14].

When $\alpha = 1$, this means that when making a decision, we only take into account the most favorable situation, when the utility $u(a)$ attains its largest possible value $\overline{u}(a)$. This is clearly the case of extreme optimism.

When $\alpha = 0$, this means that when making a decision, we only take into account the least favorable situation, when the utility $u(a)$ attains its smallest possible value $\underline{u}(a)$. This is clearly the case of extreme pessimism.

Values $\alpha$ intermediate between 0 and 1 describe realistic decision makers. The larger $\alpha$, the more the decision maker takes into account the most optimistic scenario and the less he/she takes into account the most pessimistic one. Thus, the value $\alpha$ can serve as a quantitative measure of the decision maker's optimism: the larger $\alpha$, the more optimistic the decision maker.
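The Hurwicz rule is easy to state in code. In the Python sketch below, each alternative is described only by its interval of possible utilities; the helper names, the two alternatives, and their intervals are hypothetical, chosen so that an extreme pessimist ($\alpha = 0$) and an extreme optimist ($\alpha = 1$) make different choices.

```python
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of the utility u(a)


def hurwicz_value(interval: Interval, alpha: float) -> float:
    """alpha * upper + (1 - alpha) * lower, with optimism level alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    lower, upper = interval
    return alpha * upper + (1 - alpha) * lower


def hurwicz_choice(alternatives: Dict[str, Interval], alpha: float) -> str:
    """Select the alternative with the largest Hurwicz combination."""
    return max(alternatives, key=lambda a: hurwicz_value(alternatives[a], alpha))


options = {"cautious": (0.4, 0.5), "bold": (0.1, 0.9)}  # hypothetical intervals
print(hurwicz_choice(options, 0.0))  # cautious (extreme pessimist)
print(hurwicz_choice(options, 1.0))  # bold (extreme optimist)
```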

This explains why optimists study better. In education, we invest some effort now and get rewards in the future. Time for studying is taken from time for having fun: we have fewer chances to go to a movie, to watch TV, etc. So, in comparison with not studying, this part of the learning process brings less utility.

We do study, because we know that there will be a future reward: better knowledge, better job, etc. So, when deciding how much time we dedicate to studying (or whether to study at all), we take into account both the utility decrease now and the potential utility increase in the future.

The decrease now is clear; we thus know the value $u_d < 0$. About the future rewards, we are not 100% certain: there is demand for our major now, but who knows what will happen four years from now, when we graduate? Thus, for future rewards, instead of the exact value $u_r$, we only know the interval of possible values $u_r \in [\underline{u}_r, \overline{u}_r]$. The overall utility therefore takes all possible values from $\underline{u} = u_d + \underline{u}_r$ to $\overline{u} = u_d + \overline{u}_r$. A person with an optimism level $\alpha$ chooses to study if the Hurwicz combination $\alpha \cdot \overline{u} + (1 - \alpha) \cdot \underline{u}$ is larger than the value 0 corresponding to not studying.

Here, as one can easily check, the Hurwicz combination is equal to

$$u_d + \alpha \cdot \overline{u}_r + (1 - \alpha) \cdot \underline{u}_r = u_d + \underline{u}_r + \alpha \cdot (\overline{u}_r - \underline{u}_r).$$

This value increases with $\alpha$ (since $\overline{u}_r > \underline{u}_r$). If this value is larger than 0 for some $\alpha$, it is still larger than 0 for every $\alpha' > \alpha$; moreover, in some situations in which the Hurwicz value is negative for $\alpha$, it becomes positive for a larger $\alpha'$.

Thus, the larger the level $\alpha$ of a person's optimism, the more situations there are in which this person will decide to study. This explains why optimists are better students.
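Because the Hurwicz value is linear and increasing in $\alpha$, the decision to study has a threshold form: solving $u_d + \underline{u}_r + \alpha \cdot (\overline{u}_r - \underline{u}_r) > 0$ for $\alpha$, a person studies exactly when $\alpha > -(u_d + \underline{u}_r)/(\overline{u}_r - \underline{u}_r)$. The Python sketch below implements this; the function names and the sample numbers ($u_d = -0.5$, future reward between 0.2 and 1.0) are made up for illustration.

```python
def studies(u_d: float, r_low: float, r_high: float, alpha: float) -> bool:
    """True if the Hurwicz value u_d + r_low + alpha * (r_high - r_low) exceeds 0,
    i.e., if studying beats the utility 0 of not studying."""
    return u_d + r_low + alpha * (r_high - r_low) > 0


def min_optimism(u_d: float, r_low: float, r_high: float) -> float:
    """Optimism level above which the person studies (assumes r_high > r_low).
    A value below 0 means even an extreme pessimist studies; a value of 1 or
    more means even an extreme optimist does not."""
    return -(u_d + r_low) / (r_high - r_low)


# Hypothetical numbers: a sure utility loss now, an uncertain reward later.
u_d, r_low, r_high = -0.5, 0.2, 1.0
print(min_optimism(u_d, r_low, r_high))  # about 0.375
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, studies(u_d, r_low, r_high, alpha))
```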

3. Why People Become Restless?

Phenomenon. When a person's salary is increased, this person becomes happy. If a few years pass and the salary remains the same, then, while objectively, the person has the same good life as before, he or she becomes restless, unhappy. Why?

The situation is the same as in the past years, so why is the level of happiness not the same?

Towards an explanation. It is known that our utility depends not only on what we have now, but also on what we expect in the future: otherwise, we would act without thinking of possible consequences. The expected future values of utility come with some discounting, usually exponential discounting, in which (just like when you invest money in a bank) the utility $T$ moments in the future is multiplied by $\beta^T$ for some $\beta < 1$; see, e.g., [1-5,8-11]. (For the bank, $\beta$ is approximately 1 minus the interest rate; e.g., if the interest rate is 3%, then $\beta \approx 0.97$.) As a result, if $o_t$ is the utility caused by the current situation at moment $t$, the actual utility $u_t$ at moment $t$ is equal to

$$u_t = o_t + \beta \cdot o_{t+1} + \beta^2 \cdot o_{t+2} + \ldots$$
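A Python sketch of this discounted sum. Since the series is infinite, the sketch truncates it at a finite horizon (an assumption of the sketch; with $\beta < 1$ the neglected tail is negligible for a long enough horizon). The function name is hypothetical, $\beta = 0.97$ follows the bank example in the text, and the constant stream is an illustration.

```python
from typing import Sequence


def discounted_utility(future: Sequence[float], beta: float) -> float:
    """u_t = o_t + beta * o_{t+1} + beta**2 * o_{t+2} + ..., truncated,
    where future[0] = o_t, future[1] = o_{t+1}, and so on."""
    if not 0 < beta < 1:
        raise ValueError("beta must lie in (0, 1)")
    return sum(beta ** j * o for j, o in enumerate(future))


# A constant stream o_t = 1 sums to about 1 / (1 - beta).
print(discounted_utility([1.0] * 2000, 0.97))  # close to 33.33
```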

We do not know the future values; we estimate them by extrapolation, based on the previous several values $o_t, o_{t-1}, \ldots$

The simplest possible extrapolation is linear extrapolation, which is based on the last two values $o_t$ and $o_{t-1}$. Here, $o_{t+j} = o_t + j \cdot (o_t - o_{t-1})$. In the year $t$ in which the salary was increased, the difference $o_t - o_{t-1}$ is positive, so $o_{t+1} > o_t$, $o_{t+2} > o_t$, etc.; hence

$$u_t = o_t + \beta \cdot o_{t+1} + \beta^2 \cdot o_{t+2} + \ldots > o_t + \beta \cdot o_t + \beta^2 \cdot o_t + \ldots = o_t \cdot (1 + \beta + \beta^2 + \ldots).$$

A few years later, when $o_t = o_{t-1}$, all extrapolated values are the same: $o_{t+1} = o_{t+2} = \ldots = o_t$; thus

$$u_t = o_t + \beta \cdot o_t + \beta^2 \cdot o_t + \ldots = o_t \cdot (1 + \beta + \beta^2 + \ldots).$$

Since the salary, and hence the current-situation utility $o_t$, is the same in both cases, the overall utility in the year of the raise is indeed larger than the utility a few years later; this is exactly what we observe.
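The comparison can be reproduced numerically with linear extrapolation and a truncated discounted sum. In the Python sketch below, the function names, the current-situation utilities (1 before the raise, 2 after), $\beta = 0.9$, and the horizon are hypothetical values chosen for illustration. With them, the closed forms give $2/(1-\beta) + \beta/(1-\beta)^2 = 110$ in the year of the raise and $2/(1-\beta) = 20$ a few years later.

```python
def extrapolate(o_prev: float, o_now: float, horizon: int) -> list:
    """Linear extrapolation o_{t+j} = o_t + j * (o_t - o_{t-1}) for j = 0..horizon-1."""
    step = o_now - o_prev
    return [o_now + j * step for j in range(horizon)]


def perceived_utility(o_prev: float, o_now: float, beta: float, horizon: int = 2000) -> float:
    """Truncated version of u_t = o_t + beta * o_{t+1} + beta**2 * o_{t+2} + ..."""
    future = extrapolate(o_prev, o_now, horizon)
    return sum(beta ** j * o for j, o in enumerate(future))


beta = 0.9
raise_year = perceived_utility(1.0, 2.0, beta)  # salary just increased
later_year = perceived_utility(2.0, 2.0, beta)  # same salary for a while
print(round(raise_year, 6), round(later_year, 6))  # 110.0 20.0
```

The current-situation utility is 2 in both calls; only the extrapolated trend differs, which is the whole point of the explanation.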

Acknowledgments

This work was supported in part by the National Science Foundation grant HRD-1242122 (Cyber-ShARE Center of Excellence).

References

1. Critchfield T.S., Kollins S.H. Temporal discounting: basic research and the analysis of socially important behavior // Journal of Applied Behavior Analysis. 2001. V. 34. P. 101-122.

2. Fishburn P.C. Utility Theory for Decision Making. John Wiley & Sons Inc., New York, 1969.

3. Fishburn P.C. Nonlinear Preference and Utility Theory. Baltimore, Maryland : The Johns Hopkins University Press, 1988.

4. Frederick S., Loewenstein G., O'Donoghue T. Time discounting: a critical review // Journal of Economic Literature. 2002. V. 40. P. 351-401.

5. Hurwicz L. Optimality Criteria for Decision Making Under Ignorance. Cowles Commission Discussion Paper, Statistics, No. 370, 1951.

6. Kahneman D. Thinking, Fast and Slow. Farrar, Straus and Giroux, New York, 2011.

7. King G.R., Logue A.W., Gleiser D. Probability and delay in reinforcement: an examination of Mazur's equivalence rule // Behavioural Processes. 1992. V. 27. P. 125-138.


8. Kirby K.N. Bidding on the future: evidence against normative discounting of delayed rewards // Journal of Experimental Psychology (General). 1997. V. 126. P. 54-70.

9. Konig C.J., Kleinmann M. Deadline rush: a time management phenomenon and its mathematical description // The Journal of Psychology. 2005. V. 139, No. 1. P. 33-45.

10. Kreinovich V. Decision making under interval uncertainty (and beyond) // Human-Centric Decision-Making Models for Social Sciences / P. Guo, W. Pedrycz (eds.). Springer Verlag, 2014. P. 163-193.

11. Lighthall N.R., Gorlick M.A., Schoeke A., Frank M.J., Mather M. Stress modulates reinforcement learning in younger and older adults // Psychology and Aging. 2013. V. 28, No. 1. P. 35-46.

12. Lorkowski J., Kreinovich V. Granularity helps explain seemingly irrational features of human decision making // Granular Computing and Decision-Making: Interactive and Iterative Approaches / W. Pedrycz, S.-M. Chen (eds.). Springer Verlag, Cham, Switzerland, 2015. P. 1-31.

13. Lorkowski J., Kreinovich V. Bounded Rationality in Decision Making Under Uncertainty: Towards Optimal Granularity. Springer Verlag, Cham, Switzerland, 2018.

14. Luce R.D., Raiffa H. Games and Decisions: Introduction and Critical Survey. Dover, New York, 1989.

15. Mazur J.E. An adjustment procedure for studying delayed reinforcement // Quantitative Analyses of Behavior. Vol. 5, The Effect of Delay and Intervening Events / M.L. Commons, J.E. Mazur, J.A. Nevin, H. Rachlin (eds.). Erlbaum, Hillsdale, 1987.

16. Mazur J.E. Choice, delay, probability, and conditional reinforcement // Animal Learning & Behavior. 1997. V. 25. P. 131-147.

17. Nguyen H.T., Kosheleva O., Kreinovich V. Decision making beyond Arrow's 'impossibility theorem', with the analysis of effects of collusion and mutual attraction // International Journal of Intelligent Systems. 2009. V. 24, No. 1. P. 27-47.

18. Rachlin H., Raineri A., Cross D. Subjective probability and delay // Journal of the Experimental Analysis of Behavior. 1991. V. 55. P. 233-244.

19. Raiffa H. Decision Analysis. Addison-Wesley, Reading, Massachusetts, 1970.

20. Zapata F., Kosheleva O., Kreinovich V., Dumrongpokaphan T. Do it today or do it tomorrow: empirical non-exponential discounting explained by symmetry ideas // Proceedings of the International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making IUKM'2018 / V.-N. Huynh, M. Inuiguchi, D.-H. Tran, T. Denoeux (eds.). Hanoi, Vietnam, March 13-15, 2018.

21. Zimmer K. Tough choices // The Scientist. 2018. V. 32, No. 3. P. 20-21.

22. Seligman M. The Optimistic Child: A Proven Program to Safeguard Children Against Depression and Build Lifelong Resilience. Mariner Books, New York, 2007.

WHY IS POSITIVE REINFORCEMENT MORE EFFECTIVE UNDER STRESS? WHY DO OPTIMISTS STUDY BETTER? WHY DO PEOPLE BECOME RESTLESS? SIMPLE UTILITY-BASED EXPLANATIONS

F. Zapata

Cand. Sc. (Phys.-Math.), Instructor, e-mail: [email protected]

O. Kosheleva

Cand. Sc. (Phys.-Math.), Associate Professor, e-mail: [email protected]

V. Kreinovich

Cand. Sc. (Phys.-Math.), Professor, e-mail: [email protected]

University of Texas at El Paso, USA

Abstract. In this paper, we use a utility-based approach to decision making to give simple answers to the following three questions: Why is positive reinforcement more effective under stress? Why do optimists study better? Why do people become restless?

Keywords: utility theory, positive and negative reinforcement, optimists and pessimists.

Received by the editors: 05.04.2018
