simulation_study.Rmd 16.1 KB
Newer Older
1
---
linushof's avatar
linushof committed
2
title: 'Sampling in DfE: Simulation Study'
linushof's avatar
linushof committed
3
author: "Linus Hof, Veronika Zilker & Thorsten Pachur"
4
5
6
7
8
9
bibliography: sampling-strategies-in-dfe.bib
output:
  html_document:
    code_folding: hide
    toc: yes
    toc_float: yes
linushof's avatar
linushof committed
10
    number_sections: no
linushof's avatar
linushof committed
11
12
13
  pdf_document:
    toc: yes
csl: apa.csl
14
15
---

16
17
```{r}
# load packages
linushof's avatar
linushof committed
18
19
pacman::p_load(repro,
               tidyverse,
linushof's avatar
linushof committed
20
21
               knitr, 
               viridis)
22
23
```

linushof's avatar
linushof committed
24
# Note
25

linushof's avatar
linushof committed
26
27
-   Some of the R code is folded but can be unfolded by clicking the `Code` buttons.
-   This document was created from the commit with the hash `r repro::current_hash()`.
28

linushof's avatar
linushof committed
29
30
# Abstract

linushof's avatar
linushof committed
31
32
Synthetic choice data from decisions from experience (DfE) is generated by applying different strategies of sample integration to 2-prospect gambles. 
The synthetic data is explored for characteristic choice patterns produced by comprehensive and piecewise forms of sample integration under varying structures of the environment (gamble features) and aspects of the sampling- and decision behavior (model parameters).
linushof's avatar
linushof committed
33
34
35

# Summary

linushof's avatar
linushof committed
36
Provide short summary of simulation study results.
37

linushof's avatar
linushof committed
38
# Introduction
39

linushof's avatar
linushof committed
40
A formal introduction to sampling in DfE and the data generating models of this study can be found [here](https://arc-git.mpib-berlin.mpg.de/sampling-strategies-in-dfe/sampling-strategies-in-dfe/-/blob/main/modeling-sampling.Rmd)
41

linushof's avatar
linushof committed
42
43
# Method

linushof's avatar
linushof committed
44
## Test set
45

linushof's avatar
linushof committed
46
47
48
49
50
51
52
Under each condition, i.e., strategy-parameter combinations, all gambles are played by 100 synthetic agents. We test a set of gambles, in which one of the prospects contains a safe outcome and the other two risky outcomes (*safe-risky gambles*). 
Therefore, 60 gambles from an initial set of 10,000 are sampled. 
Both outcomes and probabilities are drawn from uniform distributions, ranging from 0 to 20 for outcomes and from .01 to .99 for probabilities of the lower risky outcomes $p_{low}$. 
The probabilities of the higher risky outcomes are $1-p_{low}$, respectively. 
To omit dominant prospects, safe outcomes fall between both risky outcomes. 
The table below contains the test set of 60 gambles. 
Sampling of gambles was stratified, randomly drawing an equal number of 20 gambles with no, an attractive, and an unattractive rare outcome. Risky outcomes are considered *"rare"* if their probability is $p < .2$ and *"attractive"* (*"unattractive"*) if they are higher (lower) than the safe outcome.
53

linushof's avatar
linushof committed
54
55
56
```{r message=FALSE}
gambles <- read_csv("data/gambles/sr_subset.csv")
gambles %>% kable()
57
58
```

linushof's avatar
linushof committed
59
## Model Parameters
60

linushof's avatar
linushof committed
61
62
**Switching probability** $s$ is the probability with which agents draw the following single sample from the prospect they did not get their most recent single sample from. 
$s$ is varied between .1 to 1 in increments of .1.
63

linushof's avatar
linushof committed
64
65
The **boundary type** is either the minimum value any prospect's sample statistic must reach (absolute) or the minimum value for the difference of these statistics (relative). 
Sample statistics are sums over outcomes (comprehensive strategy) and sums over wins (piecewise strategy), respectively.
66

linushof's avatar
linushof committed
67
68
For comprehensive integration, the **boundary value** $a$ is varied between 15 to 75 in increments of 15.
For piecewise integration $a$ is varied between 1 to 5 in increments of 1.
69

linushof's avatar
linushof committed
70
```{r message=FALSE}
71
72
73
74
75
76
77
78
# read choice data 
cols <- list(.default = col_double(),
             strategy = col_factor(),
             boundary = col_factor(),
             gamble = col_factor(),
             rare = col_factor(),
             agent = col_factor(),
             choice = col_factor())
linushof's avatar
linushof committed
79
choices <- read_csv("data/choices/choices.csv", col_types = cols)
80
81
```

linushof's avatar
linushof committed
82
In sum, 2 (strategies) x 60 (gambles) x 100 (agents) x 100 (parameter combinations) = `r nrow(choices)` choices are simulated.
linushof's avatar
linushof committed
83

linushof's avatar
linushof committed
84
# Results
85

linushof's avatar
linushof committed
86
87
Because we are not interested in deviations from normative choice due to sampling artifacts (e.g., ceiling effects produced by low boundaries), we remove trials in which only one prospect was attended. 
In addition, we use relative frequencies of sampled outcomes rather than 'a priori' probabilities to compare actual against normative choice behavior. 
88
89

```{r}
linushof's avatar
linushof committed
90
91
92
# remove choices where prospects were not attended
choices <- choices %>%
  filter(!(is.na(a_ev_exp) | is.na(b_ev_exp)))
93
94
```

linushof's avatar
linushof committed
95
96
97
98
99
```{r eval = FALSE}
# remove choices where not all outcomes were sampled
choices <- choices %>% 
  filter(!(is.na(a_ev_exp) | is.na(b_ev_exp) | a_p1_exp == 0 | a_p2_exp == 0))
```
linushof's avatar
linushof committed
100

linushof's avatar
linushof committed
101
Removing the respective trials, we are left with `r nrow(choices)` choices. 
linushof's avatar
linushof committed
102

linushof's avatar
linushof committed
103
## Sample Size
linushof's avatar
linushof committed
104

linushof's avatar
linushof committed
105
106
107
108
109
110
```{r message=FALSE}
samples <- choices %>% 
  group_by(strategy, s, boundary, a) %>% 
  summarise(n_med = median(n_sample))
samples_piecewise <- samples %>% filter(strategy == "piecewise")
samples_comprehensive <- samples %>% filter(strategy == "comprehensive")
111
112
```

linushof's avatar
linushof committed
113
The median sample sizes generated by different parameter combinations ranged from `r min(samples_piecewise$n_med)` to `r max(samples_piecewise$n_med)` for piecewise integration and `r min(samples_comprehensive$n_med)` to `r max(samples_comprehensive$n_med)` for comprehensive integration.
114

linushof's avatar
linushof committed
115
### Boundary type and boundary value (a)
116

linushof's avatar
linushof committed
117
As evidence is accumulated sequentially, relative boundaries and large boundary values naturally lead to larger sample sizes, irrespective of the integration strategy. 
linushof's avatar
linushof committed
118

linushof's avatar
linushof committed
119
120
```{r message=FALSE}
group_med <- samples_piecewise %>%
linushof's avatar
linushof committed
121
  group_by(boundary, a) %>% 
linushof's avatar
linushof committed
122
  summarise(group_med = median(n_med)) # to get the median across all s values
linushof's avatar
linushof committed
123

linushof's avatar
linushof committed
124
125
samples_piecewise %>%
  ggplot(aes(a, n_med, color = a)) + 
linushof's avatar
linushof committed
126
  geom_jitter(alpha = .5, size = 2) +
linushof's avatar
linushof committed
127
128
129
  geom_point(data = group_med, aes(y = group_med), size = 3) +
  facet_wrap(~boundary) + 
  scale_color_viridis() + 
130
  labs(title = "Piecewise Integration",
linushof's avatar
linushof committed
131
       x ="a", 
linushof's avatar
linushof committed
132
       y="Sample Size", 
linushof's avatar
linushof committed
133
       col="a") + 
linushof's avatar
linushof committed
134
  theme_minimal()
linushof's avatar
linushof committed
135
```
linushof's avatar
linushof committed
136

linushof's avatar
linushof committed
137
138
```{r message=FALSE}
group_med <- samples_comprehensive %>%
linushof's avatar
linushof committed
139
  group_by(boundary, a) %>% 
linushof's avatar
linushof committed
140
  summarise(group_med = median(n_med)) 
linushof's avatar
linushof committed
141

linushof's avatar
linushof committed
142
143
samples_comprehensive %>%
  ggplot(aes(a, n_med, color = a)) + 
linushof's avatar
linushof committed
144
  geom_jitter(alpha = .5, size = 2) +
linushof's avatar
linushof committed
145
146
147
  geom_point(data = group_med, aes(y = group_med), size = 3) +
  facet_wrap(~boundary) + 
  scale_color_viridis() + 
148
  labs(title = "Comprehensive Integration",
linushof's avatar
linushof committed
149
       x ="a", 
linushof's avatar
linushof committed
150
       y="Sample Size", 
linushof's avatar
linushof committed
151
       col="a") + 
linushof's avatar
linushof committed
152
  theme_minimal()
153
154
```

linushof's avatar
linushof committed
155
### Switching probability (s)
156

linushof's avatar
linushof committed
157
158
For piecewise integration, there is an inverse relationship between switching probability and sample size. 
I.e., the lower s, the less frequent prospects are compared and thus, boundaries are only approached with larger sample sizes. 
linushof's avatar
linushof committed
159
This effect is particularly pronounced for low probabilities such that the increase in sample size accelerates as switching probability decreases. 
linushof's avatar
linushof committed
160

linushof's avatar
linushof committed
161
162
```{r message=FALSE}
group_med <- samples_piecewise %>%
linushof's avatar
linushof committed
163
  group_by(boundary, s) %>% 
linushof's avatar
linushof committed
164
  summarise(group_med = median(n_med)) # to get the median across all a values
linushof's avatar
linushof committed
165

linushof's avatar
linushof committed
166
167
168
169
170
171
samples_piecewise %>%
  ggplot(aes(s, n_med, color = s)) + 
  geom_jitter(alpha = .5, size = 2) +
  geom_point(data = group_med, aes(y = group_med), size = 3) +
  facet_wrap(~boundary) + 
  scale_color_viridis() + 
172
  labs(title = "Piecewise Integration",
linushof's avatar
linushof committed
173
       x ="s", 
linushof's avatar
linushof committed
174
       y="Sample Size", 
linushof's avatar
linushof committed
175
       col="s") + 
linushof's avatar
linushof committed
176
177
178
  theme_minimal()
```

linushof's avatar
linushof committed
179
For comprehensive integration, boundary types differ in the effects of switching probability. 
linushof's avatar
linushof committed
180
For absolute boundaries, switching probability has no apparent effect on sample size as the distance of a given prospect to its absolute boundary is not changed by switching to (and sampling from) the other prospect. 
linushof's avatar
linushof committed
181
For relative boundaries, however, samples sizes increase with switching probability. 
linushof's avatar
linushof committed
182

linushof's avatar
linushof committed
183
184
```{r message=FALSE}
group_med <- samples_comprehensive %>%
linushof's avatar
linushof committed
185
  group_by(boundary, s) %>% 
linushof's avatar
linushof committed
186
  summarise(group_med = median(n_med)) # to get the median across all a values
linushof's avatar
linushof committed
187

linushof's avatar
linushof committed
188
189
190
191
192
193
samples_comprehensive %>%
  ggplot(aes(s, n_med, color = s)) + 
  geom_jitter(alpha = .5, size = 2) +
  geom_point(data = group_med, aes(y = group_med), size = 3) +
  facet_wrap(~boundary) + 
  scale_color_viridis() + 
194
  labs(title = "Comprehensive Integration",
linushof's avatar
linushof committed
195
196
197
       x ="s",
       y = "Sample Size", 
       col="s") + 
linushof's avatar
linushof committed
198
199
200
  theme_minimal()
```

linushof's avatar
linushof committed
201
## Choice Behavior
linushof's avatar
linushof committed
202

linushof's avatar
linushof committed
203
204
205
206
207
208
209
210
211
212
213
214
215
Below, in extension to Hills and Hertwig [-@hillsInformationSearchDecisions2010], the interplay of integration strategies, gamble features, and model parameters in their effects on choice behavior in general and their contribution to underweighting of rare events in particular is investigated. 
We apply two definitions of underweighting of rare events: Considering false response rates, we define underweighting such that the rarity of an attractive (unattractive) outcome leads to choose the safe (risky) prospect although the risky (safe) prospect has a higher expected value.

```{r message=FALSE}
fr_rates <- choices %>% 
  mutate(ev_ratio_exp = round(a_ev_exp/b_ev_exp, 2), 
         norm = case_when(ev_ratio_exp > 1 ~ "A", ev_ratio_exp < 1 ~ "B")) %>% 
  filter(!is.na(norm)) %>% # exclude trials with normative indifferent options
  group_by(strategy, s, boundary, a, rare, norm, choice) %>% # group correct and incorrect responses
  summarise(n = n()) %>% # absolute numbers 
  mutate(rate = round(n/sum(n), 2), # response rates 
         type = case_when(norm == "A" & choice == "B" ~ "false safe", norm == "B" & choice == "A" ~ "false risky")) %>% 
  filter(!is.na(type)) # remove correct responses
linushof's avatar
linushof committed
216
217
```

linushof's avatar
linushof committed
218
Considering the parameters of Prelec's [-@prelecProbabilityWeightingFunction1998] implementation of the weighting function [CPT; cf. @tverskyAdvancesProspectTheory1992], underweighting is reflected by decisions weights estimated to be smaller than the corresponding objective probabilities.  
linushof's avatar
linushof committed
219

linushof's avatar
linushof committed
220
### False Response Rates
linushof's avatar
linushof committed
221

linushof's avatar
linushof committed
222
223
224
```{r message=FALSE}
fr_rates_piecewise <- fr_rates %>% filter(strategy == "piecewise")
fr_rates_comprehensive <- fr_rates %>% filter(strategy == "comprehensive")
linushof's avatar
linushof committed
225
```
226

linushof's avatar
linushof committed
227
228
The false response rates generated by different parameter combinations ranged from `r min(fr_rates_piecewise$rate)` to `r max(fr_rates_piecewise$rate)` for piecewise integration and from `r min(fr_rates_comprehensive$rate)` to `r max(fr_rates_comprehensive$rate)` for comprehensive integration. 
However, false response rates vary considerably as a function of rare events, indicating that their presence and attractiveness are large determinants of false response rates.
linushof's avatar
linushof committed
229

linushof's avatar
linushof committed
230
231
232
233
234
235
```{r message=FALSE}
fr_rates %>% 
  group_by(strategy, boundary, rare) %>% 
  summarise(min = min(rate),
            max = max(rate)) %>% 
  kable()
linushof's avatar
linushof committed
236
237
```

linushof's avatar
linushof committed
238
239
240
241
The heatmaps below show the false response rates for all strategy-parameter combinations. 
Consistent with our - somewhat rough - definition of underweighting, the rate of false risky responses is generally higher, if the unattractive outcome of the risky prospect is rare (top panel).
Conversely, if the attractive outcome of the risky prospect is rare, the rate of false safe responses is generally higher (bottom panel).
As indicated by the larger range of false response rates, the effects of rare events are considerably larger for piecewise integration. 
242

linushof's avatar
linushof committed
243
244
245
246
247
248
249
250
251
252
253
254
255
256
```{r message=FALSE}
fr_rates %>% 
  filter(strategy == "piecewise", boundary == "absolute") %>% 
  ggplot(aes(a, s, fill = rate)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_tile(colour="white", size=0.25) + 
  scale_x_continuous(expand=c(0,0), breaks = seq(1, 5, 1)) +
  scale_y_continuous(expand=c(0,0), breaks = seq(.1, 1, .1)) +
  scale_fill_viridis() + 
  labs(title = "Piecewise Integration | Absolute Boundary",
       x = "a", 
       y= "s", 
       fill = "% False Responses") + 
  theme_minimal() 
257
258
```

linushof's avatar
linushof committed
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
```{r message=FALSE}
fr_rates %>% 
  filter(strategy == "piecewise", boundary == "relative") %>% 
  ggplot(aes(a, s, fill = rate)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_tile(colour="white", size=0.25) + 
  scale_x_continuous(expand=c(0,0), breaks = seq(1, 5, 1)) +
  scale_y_continuous(expand=c(0,0), breaks = seq(.1, 1, .1)) +
  scale_fill_viridis() + 
  labs(title = "Piecewise Integration | Relative Boundary",
       x = "a", 
       y= "s", 
       fill = "% False Responses") + 
  theme_minimal() 
```
linushof's avatar
linushof committed
274

linushof's avatar
linushof committed
275
276
```{r message=FALSE}
fr_rates %>% 
linushof's avatar
linushof committed
277
  filter(strategy == "comprehensive", boundary == "absolute") %>% 
linushof's avatar
linushof committed
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
  ggplot(aes(a, s, fill = rate)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_tile(colour="white", size=0.25) + 
  scale_x_continuous(expand=c(0,0), breaks = seq(15, 75, 15)) +
  scale_y_continuous(expand=c(0,0), breaks = seq(.1, 1, .1)) +
  scale_fill_viridis() + 
  labs(title = "Comprehensive Integration | Absolute Boundary",
       x = "a", 
       y= "s", 
       fill = "% False Responses") + 
  theme_minimal() 
```

```{r message=FALSE}
fr_rates %>% 
linushof's avatar
linushof committed
293
  filter(strategy == "comprehensive", boundary == "relative") %>% 
linushof's avatar
linushof committed
294
295
296
297
298
299
300
301
302
303
304
  ggplot(aes(a, s, fill = rate)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_tile(colour="white", size=0.25) + 
  scale_x_continuous(expand=c(0,0), breaks = seq(15, 75, 15)) +
  scale_y_continuous(expand=c(0,0), breaks = seq(.1, 1, .1)) +
  scale_fill_viridis() + 
  labs(title = "Comprehensive Integration | Relative Boundary",
       x = "a", 
       y= "s", 
       fill = "% False Responses") + 
  theme_minimal() 
305
306
```

linushof's avatar
linushof committed
307
#### Switching Probability (s) and Boundary Value (a)
linushof's avatar
linushof committed
308

linushof's avatar
linushof committed
309
As for both piecewise and comprehensive integration the differences between boundary types are rather minor and of magnitude than of qualitative pattern, the remaining analyses of false response rates are summarized across absolute and relative boundaries. 
linushof's avatar
linushof committed
310

linushof's avatar
linushof committed
311
Below, the $s$ and $a$ parameter are considered as additional sources of variation in the false response pattern above and beyond the interplay of integration strategies and the rarity and attractiveness of outcomes.
linushof's avatar
linushof committed
312

linushof's avatar
linushof committed
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
```{r message=FALSE}
fr_rates %>% 
  filter(strategy == "piecewise") %>% 
  ggplot(aes(s, rate, color = a)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_jitter(size = 2) + 
  scale_x_continuous(breaks = seq(0, 1, .1)) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  scale_color_viridis() + 
  labs(title = "Piecewise Integration",
       x = "s", 
       y= "% False Responses", 
       color = "a") + 
  theme_minimal() 
```
328

linushof's avatar
linushof committed
329
330
```{r message=FALSE}
fr_rates %>% 
linushof's avatar
linushof committed
331
  filter(strategy == "comprehensive") %>% 
linushof's avatar
linushof committed
332
333
334
335
336
337
  ggplot(aes(s, rate, color = a)) + 
  facet_grid(type ~ fct_relevel(rare, "attractive", "none", "unattractive"), switch = "y") +
  geom_jitter(size = 2) + 
  scale_x_continuous(breaks = seq(0, 1, .1)) +
  scale_y_continuous(breaks = seq(0, 1, .1)) +
  scale_color_viridis() + 
338
  labs(title = "Comprehensive Integration",
linushof's avatar
linushof committed
339
340
341
342
       x = "s", 
       y= "% False Responses", 
       color = "a") + 
  theme_minimal() 
343
344
```

linushof's avatar
linushof committed
345
346
347
348
For piecewise integration, switching probability is naturally related to the size of the samples on which the round-wise comparisons of prospects are based on, with low values of $s$ indicating large samples and vice versa. 
Accordingly, switching probability is positively related to false response rates.
I.e., the larger the switching probability, the smaller the round-wise sample size and the probability of experiencing a rare event within a given round. 
Because round-wise comparisons are independent of each other and binomial distributions within a given round are skewed for small samples and outcome probabilities [@kolmogorovFoundationsTheoryProbability1950], increasing boundary values do not reverse but rather amplify this relation.  
349

linushof's avatar
linushof committed
350
351
352
353
For comprehensive integration, switching probability is negatively related to false response rates, i.e., an increase in $s$ is associated with decreasing false response rates. 
This relation, however, may be the result of an artificial interaction between the $s$ and $a$ parameter. 
Precisely, in the current algorithmic implementation of sampling with a comprehensive integration mechanism, decreasing switching probabilities cause comparisons of prospects based on increasingly unequal sample sizes immediately after switching prospects. 
Consequentially, reaching (low) boundaries is rather a function of switching probability and associated sample sizes than of actual evidence for a given prospect over the other.
354

linushof's avatar
linushof committed
355
### Cumulative Prospect Theory 
linushof's avatar
linushof committed
356

linushof's avatar
linushof committed
357
In the following, we examine the possible relations between the parameters of the *choice-generating* sampling models and the *choice-describing* cumulative prospect theory. 
linushof's avatar
linushof committed
358

linushof's avatar
linushof committed
359
# References