pilot-study.Rmd 18.1 KB
Newer Older
1
2
3
4
---
title: "Sampling Strategies in DfE - Pilot study"
author: "Linus Hof"
date: "2021"
5
6
7
8
9
10
11
12
bibliography: sampling-strategies-in-dfe.bib
csl: apa.csl
output:
  html_document:
    code_folding: hide
    toc: yes
    toc_float: yes
    number_sections: yes
13
14
---

15
16
17
18
19
20
```{r}
# load packages
pacman::p_load(tidyverse, 
               knitr)
```

21
22
# Study Description

23
In this pilot study, choice data will be generated by applying the *comprehensive* and *piecewise sampling strategy* to a series of two-prospect gambles. 
24

25
The simulated data will be explored for characteristic patterns of (or differences between) sampling strategies under varying structures of the choice environment, i.e., the features of a gamble's prospects, and aspects of the sampling and decision behavior (model parameters).
26

27
## Method
28

29
### Agents
30

31
Under each condition (i.e., sampling strategy combined with all possible parameter settings), all gambles are played by 100 synthetic agents. 
32
33
34
35

```{r}
n_agents <- 100
```
36

37
### Gambles 
38

39
Two different types of two-prospect gambles will be tested: (a) Gambles, in which one of the prospects contains a safe outcome and the other two risky outcomes (*safe-risky gambles*). (b) Gambles, in which both prospects contain two risky outcomes (*risky-risky gambles*). 
40

41
Large parts of the procedure for generating and selecting gambles is similar to those in Rieskamp [-@rieskampProbabilisticNaturePreferential2008] and Erev et al. [-@erevChoicePredictionCompetition2010]. For both gamble types, a set of 10,000 gambles is generated. Outcomes are drawn from a uniform distribution ranging from 0 to 20, rounded to two digits after the decimal point. To omit dominant prospects, the safe outcome in safe-risky gambles must fall between the two outcomes of the risky prospect (medium outcome); in risky-risky gambles, at least one outcome of one of the prospects must fall between the two outcomes of the other prospect. For all risky prospects, the probability of the lower outcome $p_L$ is drawn from a uniform distribution ranging from .01 to .99. The probability of the higher outcome $p_H$ is $1-p_L$, respectively.
42

43
#### Safe-risky gambles
44
45
46
47
48

For the safe-risky set, an equal number of 20 gambles with no, an attractive, and an unattractive rare outcome is randomly selected from the set of 10,000, amounting to a total of 60 gambles that are played under each condition (see above) by all agents. Here, risky outcomes are considered "rare" if their probability is $p < .2$ and "attractive" if they are higher than the safe outcome. Vice versa, risky outcomes are considered "unattractive" if they are lower than the safe outcome.

The set of 60 safe-risky gambles is given in the table below.

49
```{r eval = FALSE}
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
generate_gambles <- function(n, safe = TRUE, lower, upper) {
  
  # n: number of gambles 
  # safe: gamble typ; TRUE (default) = safe vs. risky option; FALSE = risky options only 
  # lower, upper: lower and upper boundary of outcome range
  
  ev <- function(p1, o1, o2 = 0) {
    round(p1 * o1 + (1-p1) * o2, digits = 2) # expected value for n <= 2 outcomes
    }
  
  output <- vector("list", n)   
  
  if(safe == TRUE) {
    
    # safe vs. risky gambles  
    
    output %>% 
      map(tibble, # create tibble for each gamble/list entry 
          "names" = c("a_o1", "b", "a_o2", "a_p1", "b_p"),
          "values" = c(runif(3, min = lower, max = upper) %>% round(2) %>% # generate outcomes
                         sort(), # prevent dominance: a_o1 < b < a_o2
                       runif(1, min = .01, max = .99) %>% round(2), # probabilities 
                       1) 
          ) %>%
      map(pivot_wider, names_from = "names", values_from = "values") %>% # tidy: gambles as obs (rows) 
      map_dfr(as.list) %>% # return tibble including all gambles (row-binding) 
      mutate(a_ev = ev(a_p1, a_o1, a_o2), 
             b_ev = ev(b_p, b), 
             ev_diff = round(a_ev - b_ev, 2),
             ev_ratio = round(a_ev/b_ev, 2) 
             ) %>%
      select(a_p1, a_o1, a_o2, a_ev, b_p, b, b_ev, ev_diff, ev_ratio)
    
    } else {
      
      # risky vs. risky gambles
      
       output %>% 
        map(tibble,
            "names" = c("a_o1", "b_o1", "a_o2", "b_o2", "a_p1", "b_p1"),
            "values" = c(runif(3, min = lower, max = upper) %>% round(2) %>%
                           sort(), # prevent dominance: a_o1 < b_o1 and/or b_o2 < a_o2
                         runif(1, min = lower, max = upper) %>% round(2), 
                         runif(2, min = .01, max = .99)
                         )
            ) %>%
        map(pivot_wider, names_from = "names", values_from = "values") %>% 
        map_dfr(as.list) %>% 
        mutate(a_ev = ev(a_p1, a_o1, a_o2), 
               b_ev = ev(b_p1, b_o1, b_o2),
               ev_diff = round(a_ev - b_ev, 2),
               ev_ratio = round(a_ev/b_ev, 2) 
               ) %>%
              select(a_p1, a_o1, a_o2, a_ev, b_p1, b_o1, b_o2, b_ev, ev_diff, ev_ratio)
      }
}
106
107
```

108
```{r eval=FALSE}
109
110
111
112
113
114
# generate and select subset of safe-risky gambles
set.seed(3211)
sr_gambles <- generate_gambles(n = 10000, safe = TRUE, lower = 0, upper = 20)
sr_gambles <- sr_gambles %>% mutate(rare = case_when(a_p1 >= .2 & a_p1 <= .8 ~ "None",
                                                     a_p1 < .2 ~ "Unattractive", # a_o1  < a_o2
                                                     a_p1 > .8 ~ "Attractive"))
115
write_rds(sr_gambles, "./R/data/sr_gambles.rds")
116
117
118
119
120
121
122

sr_subset <- tibble()
for(i in unique(sr_gambles$rare)) {
  type <- sr_gambles %>% filter(rare == i)
  smpl <- sample(seq_len(nrow(type)), size = 20)
  sr_subset <- bind_rows(sr_subset, type[smpl, ])
}
123
write_rds(sr_subset, "./R/data/sr_subset.rds")
124
125
126
```

```{r}
127
sr_subset <- read_rds("./R/data/sr_subset.rds")
128
129
130
kable(sr_subset)
```

131
#### Risky-risky gambles
132
133


134

135
136
137
# Choice Data

## Model Parameters 
138
139

**Switching probability:** $s$ is the probability increment added to the unbiased attendance probability $p = .5$ with which agents draw the succesive single sample from the same prospect they get their most recent single sample from. We vary $s$ between -.5 to .4 in increments of .1.    
140
141
142

**Boundary type**: Can either be the minimum value *any* prospect's sum of random variable realizations must reach (absolute boundary) or the minimum value for the difference of these sums (relative boundary).

143
**Boundary value:** To omit any strict assumptions about the internal boundaries people might apply, we start by varying the parameter value $a$ between integers 15 to 35 in increments of 5, for comprehensive sampling respectively. For piecewise sampling, we vary $a$ between 1 to 7 in increments of 2. We start with a relatively large parameter range to later explore which parameter values in combination with which other parameter settings produces plausible sample sizes - the accumulated evidence for decisions from experience indicates that people use relatively small samples [e.g. @wulffMetaanalyticReviewTwo2018].  
144

145
146
**Noise parameter:** The representation of the outcomes sampled from the probability spaces is assumed to be stochastical. Therefore, we add Gaussian noise $\epsilon \sim N(0, \sigma)$ in units of the outcomes. We start by fixing $\sigma$ to .5. 

147
## Safe-Risky Gambles
148

149
```{r}
150
151
# dataset
gambles <- sr_subset
152
```
153

154
155
156
### Comprehensive sampling

```{r}
157
158
159
160
161
# parameters
parameters <- expand_grid(s = seq(-.5, .4, .1), # switching probability
                          sigma = .5, # noise 
                          boundary = c("absolute", "relative")) # boundary type
theta_c <- expand_grid(parameters, a = c(15, 20, 25, 30, 35)) # boundaries comprehensive 
162
theta_p <- expand_grid(parameters, a = c(1, 3, 5, 7)) # boundaries piecewise
163
164
```

165
```{r} 
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
# Moving cumulative sum and mean: Extensions of 'cumsum' and 'cummean'. Other than the base functions, the extensions have an 'na.rm' argument that removes missing values and allows to continue computing the cumulative array even after a missing value occured. For 'na.rm = TRUE', otherwise missing values are replaced by the cumulative sum/mean of all available values up to the respective vector element. If all values in a cumulative array are missing, NA is returned. 

## cumsum2()
cumsum2 <- function(x, na.rm = FALSE) {
  output <- vector("double", length(x))
  for (i in seq_along(x)) {
    if(sum(is.na(x[1:i])) == length(x[1:i])) {
      output[[i]] <- NA
    } else {
       output[[i]] <- sum(x[1:i], na.rm = na.rm)
       }
  }
  output
}

# cummean2() 
cummean2 <- function(x, na.rm = FALSE) {
  output <- vector("double", length(x))
  for (i in seq_along(x)) {
    if(sum(is.na(x[1:i])) == length(x[1:i])) {
      output[[i]] <- NA
    } else {
       output[[i]] <- mean(x[1:i], na.rm = na.rm)
       }
  }
  output
}
```

195
```{r eval = FALSE}
196
197
198

# simulation

199
theta <- theta_c
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
set.seed(765)
param_list <- vector("list", length(nrow(theta)))
for (set in seq_len(nrow(theta))) {
  gamble_list <- vector("list", length(nrow(gambles)))
  for (gamble in seq_len(nrow(gambles))) {
    agents_list <- vector("list", n_agents)
    for (agent in seq_along(1:n_agents)){ 
      
      ## initial values of an agent's sampling process
      
      fd <- tibble() # state of ignorance
      p <- .5  # no attention bias  
      s <- 0  # no switching at process initiation
      init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
      attend <- init
      boundary_reached <- FALSE
      
      ## agent's sampling process 
      
      while(boundary_reached == FALSE) {
        
        #### draw single sample
        
        if(attend == "a") {
          single_smpl <- gambles[gamble, ] %>% 
            mutate(attended = attend,
                   A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) + 
                     round(rnorm(n = 1, mean = 0, sd = theta[[set, "sigma"]]), 2), # gaussian noise 
                   B = NA)
          s <- theta[[set, "s"]] # get switching probability
          } else {
            single_smpl <- gambles[gamble, ] %>%
              mutate(attended = attend,
                     A = NA,
                     B = b +
                       round(rnorm(n = 1, mean = 0, theta[[set, "sigma"]]), 2))
            s <- -1*theta[[set, "s"]]
          }
        
        #### integrate single sample into frequency distribution
        
        fd <- bind_rows(fd, single_smpl) %>%
          mutate(A_sum = cumsum2(A, na.rm = TRUE),
                 B_sum = cumsum2(B, na.rm = TRUE))
        
        #### evaluate accumulated evidence
        
        if(theta[[set, "boundary"]] == "absolute") {
          fd <- fd %>%
            mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
                                      B_sum >= theta[[set, "a"]] ~ "B"))
          } else {
            fd <- fd %>%
              mutate(diff = round(A_sum - B_sum, 2),
                     choice = case_when(diff >= theta[[set, "a"]] ~ "A",
                                        diff <= -1*theta[[set, "a"]] ~ "B"))
          }
        if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) {
          boundary_reached <- TRUE
          } else {
            attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
          }
      }
      agents_list[[agent]] <- expand_grid(agent, fd)
    }
    all_agents <- agents_list %>% map_dfr(as.list)
    gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
  }
  all_gambles <- gamble_list %>% map_dfr(as.list)
  param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
271
sim_comprehensive <- param_list %>% map_dfr(as.list)
272

273
# store simulation 
274

275
write_rds(sim_comprehensive, "./R/data/sim_comprehensive.rds")
276

277
278
279
280
281
282
283
284
285
286
287
288
289
290
# summarize unique sampling processes

summary_comprehensive <- sim_comprehensive %>%
  group_by(s, sigma, boundary, a, gamble, agent) %>% # group by unique sampling process
  mutate(n_sample = n(), # number of single samples 
         switch = case_when(attended != lag(attended) ~ 1, 
                            attended == lag(attended) ~ 0),
         n_switch = sum(switch, na.rm = TRUE)) %>% # number of switches 
  filter(!is.na(choice)) %>% # only return choice data (last obs of unique sampling process)
  select(!c(attended, A, B, switch)) %>% 
  ungroup() %>% 
  mutate(strategy = "comprehensive") %>% 
  select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch, A_sum, B_sum, diff, choice)
write_rds(summary_comprehensive, "./R/data/summary_comprehensive.rds")
291
292
```

293

294
295
296
### Piecewise sampling

```{r class.source = "fold-show", eval=FALSE}
297
298
299

# simulation

300
theta <- theta_p
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
set.seed(8739)
param_list <- vector("list", length(nrow(theta)))
for (set in seq_len(nrow(theta))) {
  gamble_list <- vector("list", length(nrow(gambles)))
  for (gamble in seq_len(nrow(gambles))) {
    agents_list <- vector("list", n_agents)
    for (agent in seq_along(1:n_agents)){ 
      
      ## initial values of an agent's sampling process
      
      fd <- tibble() # state of ignorance
      p <- .5  # no attention bias
      s <- 0  # no switching at process initiation
      init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
      attend <- init
      round <- 1
      boundary_reached <- FALSE
      
      ## agent's sampling process 
      
      while(boundary_reached == FALSE) {
        
        #### sampling round
        
        smpl_round <- tibble()
        while(attend == init) {
          
          ##### draw single sample from prospect attended first
          
          if(attend == "a") {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend,
                     A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) +
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
                     B = NA)
            s <- theta[[set, "s"]]
            } else {
              single_smpl <- gambles[gamble, ] %>%
                mutate(round = round,
                       attended = attend,
                       A = NA,
                       B = b + 
                         round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
              s <- -1*theta[[set, "s"]]
            }
          smpl_round <- bind_rows(smpl_round, single_smpl)
          attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
        }
        
        while(attend != init) {
          
          ##### draw single sample from prospect attended second
          
          if(attend == "a") {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend, 
                     A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) + 
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
                     B = NA)
            s <- theta[[set, "s"]]
            } else {
              single_smpl <- gambles[gamble, ] %>%
                mutate(round = round,
                       attended = attend,
                       A = NA,
                       B = b + 
                         round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
              s <- -1*theta[[set, "s"]]
            }
          smpl_round <- bind_rows(smpl_round, single_smpl)
          attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
        }
        
        ##### compare mean outcomes 
        
        smpl_round <- smpl_round %>%
          mutate(A_rmean = cummean2(A, na.rm = TRUE),
                 B_rmean = cummean2(B, na.rm = TRUE),
                 rdiff = A_rmean - B_rmean)
        smpl_round[[nrow(smpl_round), "A_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] > 0 ~ 1,
                                                             smpl_round[[nrow(smpl_round), "rdiff"]] <= 0 ~ 0)
        smpl_round[[nrow(smpl_round), "B_win"]] <-  case_when(smpl_round[[nrow(smpl_round), "rdiff"]] >= 0 ~ 0,
                                                              smpl_round[[nrow(smpl_round), "rdiff"]] < 0 ~ 1)
        
        ##### integrate sampling round into frequency distribution
        
        fd <- bind_rows(fd, smpl_round)
        fd[[nrow(fd), "A_sum"]] <- sum(fd[["A_win"]], na.rm = TRUE)
        fd[[nrow(fd), "B_sum"]] <- sum(fd[["B_win"]], na.rm = TRUE)
        
        #### evaluate accumulated evidence
        
        if(theta[[set, "boundary"]] == "absolute") {
          fd <- fd %>%
            mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
                                      B_sum >= theta[[set, "a"]] ~ "B"))
          } else {
            fd[[nrow(fd), "wdiff"]] <- fd[[nrow(fd), "A_sum"]] - fd[[nrow(fd), "B_sum"]]
            fd <- fd %>%
              mutate(choice = case_when(wdiff >= theta[[set, "a"]] ~ "A",
                                        wdiff <= -1*theta[[set, "a"]] ~ "B"))
          }
        if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) {
          boundary_reached <- TRUE
          } else {
            round <- round + 1
          }
      }
      agents_list[[agent]] <- expand_grid(agent, fd)
    }
    all_agents <- agents_list %>% map_dfr(as.list)
    gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
  }
  all_gambles <- gamble_list %>% map_dfr(as.list)
  param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
sim_piecewise <- param_list %>% map_dfr(as.list)
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443

# store simulation 

write_rds(sim_piecewise, "./R/data/sim_piecewise.rds")

# summarize unique sampling processes

summary_piecewise <- sim_piecewise %>%
  group_by(s, sigma, boundary, a, gamble, agent) %>% 
  mutate(n_sample = n(),  
         A_rmean = case_when(is.na(A_win) == FALSE ~ A_rmean),
         B_rmean = case_when(is.na(A_win) == FALSE ~ B_rmean),
         A_rmean_sum = sum(A_rmean, na.rm = TRUE),
         B_rmean_sum = sum(B_rmean, na.rm = TRUE)) %>%
  ungroup() %>%
  filter(!is.na(choice)) %>% 
  mutate(strategy = "piecewise",
         n_switch = (round*2)-1,
         mean_n_round = round(n_sample/round, 2),
         mean_A_round = round(A_rmean_sum/round, 2),
         mean_B_round = round(B_rmean_sum/round, 2)) %>% 
  select(!c(attended, A, B, A_rmean, B_rmean, rdiff, A_rmean_sum, B_rmean_sum, A_win, B_win)) %>% 
  select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch, mean_n_round, mean_A_round, mean_B_round, A_sum, B_sum, wdiff, choice)
write_rds(summary_piecewise, "./R/data/summary_piecewise.rds")
444
445
446
447
```

## Risky-risky gambles

448
449
450
# Descriptive Analysis


451
# References
452