pilot-study.Rmd 23.8 KB
 linushof committed Jul 01, 2021 1 2 3 4 --- title: "Sampling Strategies in DfE - Pilot study" author: "Linus Hof" date: "2021"  linushof committed Jul 02, 2021 5 6 7 8 9 10 11 12 bibliography: sampling-strategies-in-dfe.bib csl: apa.csl output: html_document: code_folding: hide toc: yes toc_float: yes number_sections: yes  linushof committed Jul 01, 2021 13 14 ---  linushof committed Jul 02, 2021 15 16 {r} # load packages  17 pacman::p_load(tidyverse,  linushof committed Jul 02, 2021 18 19 20  knitr)   linushof committed Jul 01, 2021 21 22 # Study Description  23 Choice data will be generated by applying the *comprehensive-* and *piecewise sampling strategy* to a series of 2-prospect gambles.  linushof committed Jul 01, 2021 24   25 26 27 The simulated data will be explored for characteristic patterns of sampling strategies under varying structures of the choice environment, i.e., features of a gamble's prospects, and aspects of the sampling and decision behavior (model parameters). # Choice Data  linushof committed Jul 01, 2021 28   linushof committed Jul 12, 2021 29 ## Method  linushof committed Jul 01, 2021 30   31 ### Agents  linushof committed Jul 01, 2021 32   33 Under each condition, i.e., strategy-parameter combinations, all gambles are played by 100 synthetic agents.  linushof committed Jul 02, 2021 34 35 36 37  {r} n_agents <- 100   linushof committed Jul 01, 2021 38   39 ### Gambles  linushof committed Jul 02, 2021 40   41 A set of gambles, in which one of the prospects contains a safe outcome and the other two risky outcomes (*safe-risky gambles*) will be tested. Therefore, 60 gambles from a initial set of 10,000 are sampled. Both, outcomes and probabilities are drawn from uniform distributions, ranging from 0 to 20 for outcomes and from .01 to .99 for probabilities of the lower risky outcomes $p_L$. The probabilities of the higher risky outcomes are $1-p_L$, respectively. To omit dominant prospects, safe outcome fall between both risky outcomes. The table below contains the test set of 60 gambles. Sampling of gambles was stratified, randomly drawing an equal number of 20 gambles with no, an attractive, and an unattractive rare outcome. Risky outcomes are considered *"rare"* if their probability is $p < .2$ and *"attractive"* (*"unattractive"*) if they are higher (lower) than the safe outcome.  linushof committed Jul 02, 2021 42   43 {r eval = FALSE}  linushof committed Jul 02, 2021 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 generate_gambles <- function(n, safe = TRUE, lower, upper) { # n: number of gambles # safe: gamble typ; TRUE (default) = safe vs. risky option; FALSE = risky options only # lower, upper: lower and upper boundary of outcome range ev <- function(p1, o1, o2 = 0) { round(p1 * o1 + (1-p1) * o2, digits = 2) # expected value for n <= 2 outcomes } output <- vector("list", n) if(safe == TRUE) { # safe vs. risky gambles output %>% map(tibble, # create tibble for each gamble/list entry "names" = c("a_o1", "b", "a_o2", "a_p1", "b_p"), "values" = c(runif(3, min = lower, max = upper) %>% round(2) %>% # generate outcomes sort(), # prevent dominance: a_o1 < b < a_o2 runif(1, min = .01, max = .99) %>% round(2), # probabilities 1) ) %>% map(pivot_wider, names_from = "names", values_from = "values") %>% # tidy: gambles as obs (rows) map_dfr(as.list) %>% # return tibble including all gambles (row-binding) mutate(a_ev = ev(a_p1, a_o1, a_o2), b_ev = ev(b_p, b), ev_diff = round(a_ev - b_ev, 2), ev_ratio = round(a_ev/b_ev, 2) ) %>% select(a_p1, a_o1, a_o2, a_ev, b_p, b, b_ev, ev_diff, ev_ratio) } else { # risky vs. risky gambles output %>% map(tibble, "names" = c("a_o1", "b_o1", "a_o2", "b_o2", "a_p1", "b_p1"), "values" = c(runif(3, min = lower, max = upper) %>% round(2) %>% sort(), # prevent dominance: a_o1 < b_o1 and/or b_o2 < a_o2 runif(1, min = lower, max = upper) %>% round(2), runif(2, min = .01, max = .99) ) ) %>% map(pivot_wider, names_from = "names", values_from = "values") %>% map_dfr(as.list) %>% mutate(a_ev = ev(a_p1, a_o1, a_o2), b_ev = ev(b_p1, b_o1, b_o2), ev_diff = round(a_ev - b_ev, 2), ev_ratio = round(a_ev/b_ev, 2) ) %>% select(a_p1, a_o1, a_o2, a_ev, b_p1, b_o1, b_o2, b_ev, ev_diff, ev_ratio) } }  linushof committed Jul 01, 2021 100 101   linushof committed Jul 12, 2021 102 {r eval=FALSE}  linushof committed Jul 02, 2021 103 104 105 106 107 108 # generate and select subset of safe-risky gambles set.seed(3211) sr_gambles <- generate_gambles(n = 10000, safe = TRUE, lower = 0, upper = 20) sr_gambles <- sr_gambles %>% mutate(rare = case_when(a_p1 >= .2 & a_p1 <= .8 ~ "None", a_p1 < .2 ~ "Unattractive", # a_o1 < a_o2 a_p1 > .8 ~ "Attractive"))  linushof committed Jul 12, 2021 109 write_rds(sr_gambles, "./R/data/sr_gambles.rds")  linushof committed Jul 02, 2021 110 111 112 113 114 115 116  sr_subset <- tibble() for(i in unique(sr_gambles$rare)) { type <- sr_gambles %>% filter(rare == i) smpl <- sample(seq_len(nrow(type)), size = 20) sr_subset <- bind_rows(sr_subset, type[smpl, ]) }  linushof committed Jul 12, 2021 117 write_rds(sr_subset, "./R/data/sr_subset.rds")  118 119 120  {r}  linushof committed Jul 12, 2021 121 sr_subset <- read_rds("./R/data/sr_subset.rds")  linushof committed Jul 02, 2021 122 123 124 kable(sr_subset)   linushof committed Jul 12, 2021 125   126 ### Parameters  linushof committed Jul 02, 2021 127   128 **Switching probability:** In the simulation framework below,$s$is the probability increment added to the unbiased probability$p = .5$with which agents draw the succesive single sample from the same prospect they get their most recent single sample from.$s$is varied between -.5 to .4 in increments of .1. To ease interpretation during data analysis,$s$is transformed by$s_{rec}= 1-(p + s)$after the simulation such that it ranges from 0 to .9.  linushof committed Jul 01, 2021 129   130 **Boundary type**: Is either the minimum value *any* prospect's sample sum must reach (absolute) or the minimum value for the difference of these sums (relative).  linushof committed Jul 01, 2021 131   132 **Boundary value:** To omit any strict assumptions about the internal boundaries people might apply, we start by varying the parameter value$a$between integers 15 to 35 in increments of 5, for comprehensive sampling respectively. For piecewise sampling, we vary$a$between 1 to 7 in increments of 2. We start with a relatively large parameter range to later explore which parameter values in combination with which other parameter settings produces plausible sample sizes - the accumulated evidence for decisions from experience indicates that people use relatively small samples [e.g. @wulffMetaanalyticReviewTwo2018].  linushof committed Jul 01, 2021 133   134 **Noise parameter:** Representations of the sampled outcomes are assumed to be stochastical. Therefore, Gaussian noise$\epsilon \sim N(0, \sigma)$in units of outcomes is added. To reduce computational load$\sigma$is fixed to .5.  135   136 ## Simulation  linushof committed Jul 01, 2021 137   linushof committed Jul 12, 2021 138 {r}  linushof committed Jul 02, 2021 139 140 # dataset gambles <- sr_subset  linushof committed Jul 12, 2021 141   linushof committed Jul 01, 2021 142   linushof committed Jul 12, 2021 143 144 145 ### Comprehensive sampling {r}  linushof committed Jul 02, 2021 146 147 148 149 150 # parameters parameters <- expand_grid(s = seq(-.5, .4, .1), # switching probability sigma = .5, # noise boundary = c("absolute", "relative")) # boundary type theta_c <- expand_grid(parameters, a = c(15, 20, 25, 30, 35)) # boundaries comprehensive  linushof committed Jul 12, 2021 151 theta_p <- expand_grid(parameters, a = c(1, 3, 5, 7)) # boundaries piecewise  linushof committed Jul 01, 2021 152 153   linushof committed Jul 12, 2021 154 {r}  linushof committed Jul 02, 2021 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 # Moving cumulative sum and mean: Extensions of 'cumsum' and 'cummean'. Other than the base functions, the extensions have an 'na.rm' argument that removes missing values and allows to continue computing the cumulative array even after a missing value occured. For 'na.rm = TRUE', otherwise missing values are replaced by the cumulative sum/mean of all available values up to the respective vector element. If all values in a cumulative array are missing, NA is returned. ## cumsum2() cumsum2 <- function(x, na.rm = FALSE) { output <- vector("double", length(x)) for (i in seq_along(x)) { if(sum(is.na(x[1:i])) == length(x[1:i])) { output[[i]] <- NA } else { output[[i]] <- sum(x[1:i], na.rm = na.rm) } } output } # cummean2() cummean2 <- function(x, na.rm = FALSE) { output <- vector("double", length(x)) for (i in seq_along(x)) { if(sum(is.na(x[1:i])) == length(x[1:i])) { output[[i]] <- NA } else { output[[i]] <- mean(x[1:i], na.rm = na.rm) } } output }   linushof committed Jul 12, 2021 184 {r eval = FALSE}  linushof committed Jul 02, 2021 185 186 187  # simulation  linushof committed Jul 12, 2021 188 theta <- theta_c  linushof committed Jul 02, 2021 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 set.seed(765) param_list <- vector("list", length(nrow(theta))) for (set in seq_len(nrow(theta))) { gamble_list <- vector("list", length(nrow(gambles))) for (gamble in seq_len(nrow(gambles))) { agents_list <- vector("list", n_agents) for (agent in seq_along(1:n_agents)){ ## initial values of an agent's sampling process fd <- tibble() # state of ignorance p <- .5 # no attention bias s <- 0 # no switching at process initiation init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first attend <- init boundary_reached <- FALSE ## agent's sampling process while(boundary_reached == FALSE) { #### draw single sample if(attend == "a") { single_smpl <- gambles[gamble, ] %>% mutate(attended = attend, A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) + round(rnorm(n = 1, mean = 0, sd = theta[[set, "sigma"]]), 2), # gaussian noise B = NA) s <- theta[[set, "s"]] # get switching probability } else { single_smpl <- gambles[gamble, ] %>% mutate(attended = attend, A = NA, B = b + round(rnorm(n = 1, mean = 0, theta[[set, "sigma"]]), 2)) s <- -1*theta[[set, "s"]] } #### integrate single sample into frequency distribution fd <- bind_rows(fd, single_smpl) %>% mutate(A_sum = cumsum2(A, na.rm = TRUE), B_sum = cumsum2(B, na.rm = TRUE)) #### evaluate accumulated evidence if(theta[[set, "boundary"]] == "absolute") { fd <- fd %>% mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A", B_sum >= theta[[set, "a"]] ~ "B")) } else { fd <- fd %>% mutate(diff = round(A_sum - B_sum, 2), choice = case_when(diff >= theta[[set, "a"]] ~ "A", diff <= -1*theta[[set, "a"]] ~ "B")) } if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) { boundary_reached <- TRUE } else { attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) } } agents_list[[agent]] <- expand_grid(agent, fd) } all_agents <- agents_list %>% map_dfr(as.list) gamble_list[[gamble]] <- expand_grid(gamble, all_agents) } all_gambles <- gamble_list %>% map_dfr(as.list) param_list[[set]] <- expand_grid(theta[set, ], all_gambles) }  260 sim_comprehensive <- param_list %>% map_dfr(as.list)  linushof committed Jul 02, 2021 261   linushof committed Jul 12, 2021 262 # store simulation  linushof committed Jul 02, 2021 263   linushof committed Jul 12, 2021 264 write_rds(sim_comprehensive, "./R/data/sim_comprehensive.rds")  265   linushof committed Jul 12, 2021 266 267 268 269 270 271 272 # summarize unique sampling processes summary_comprehensive <- sim_comprehensive %>% group_by(s, sigma, boundary, a, gamble, agent) %>% # group by unique sampling process mutate(n_sample = n(), # number of single samples switch = case_when(attended != lag(attended) ~ 1, attended == lag(attended) ~ 0),  273 274 275  n_switch = sum(switch, na.rm = TRUE), # number of switches a_ev_exp = round(mean(A, na.rm = TRUE), 2), # experienced expected value b_ev_exp = round(mean(B, na.rm = TRUE), 2)) %>%  linushof committed Jul 12, 2021 276 277 278  filter(!is.na(choice)) %>% # only return choice data (last obs of unique sampling process) select(!c(attended, A, B, switch)) %>% ungroup() %>%  279 280 281  mutate(strategy = "comprehensive", s = 1-(s+.5)) %>% select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch, A_sum, B_sum, diff, a_ev_exp, b_ev_exp, choice)  linushof committed Jul 12, 2021 282 write_rds(summary_comprehensive, "./R/data/summary_comprehensive.rds")  283 284   linushof committed Jul 02, 2021 285   linushof committed Jul 12, 2021 286 287 288 ### Piecewise sampling {r class.source = "fold-show", eval=FALSE}  linushof committed Jul 02, 2021 289 290 291  # simulation  linushof committed Jul 12, 2021 292 theta <- theta_p  linushof committed Jul 02, 2021 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 set.seed(8739) param_list <- vector("list", length(nrow(theta))) for (set in seq_len(nrow(theta))) { gamble_list <- vector("list", length(nrow(gambles))) for (gamble in seq_len(nrow(gambles))) { agents_list <- vector("list", n_agents) for (agent in seq_along(1:n_agents)){ ## initial values of an agent's sampling process fd <- tibble() # state of ignorance p <- .5 # no attention bias s <- 0 # no switching at process initiation init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first attend <- init round <- 1 boundary_reached <- FALSE ## agent's sampling process while(boundary_reached == FALSE) { #### sampling round smpl_round <- tibble() while(attend == init) { ##### draw single sample from prospect attended first if(attend == "a") { single_smpl <- gambles[gamble, ] %>% mutate(round = round, attended = attend, A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) + round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2), B = NA) s <- theta[[set, "s"]] } else { single_smpl <- gambles[gamble, ] %>% mutate(round = round, attended = attend, A = NA, B = b + round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2)) s <- -1*theta[[set, "s"]] } smpl_round <- bind_rows(smpl_round, single_smpl) attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) } while(attend != init) { ##### draw single sample from prospect attended second if(attend == "a") { single_smpl <- gambles[gamble, ] %>% mutate(round = round, attended = attend, A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) + round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2), B = NA) s <- theta[[set, "s"]] } else { single_smpl <- gambles[gamble, ] %>% mutate(round = round, attended = attend, A = NA, B = b + round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2)) s <- -1*theta[[set, "s"]] } smpl_round <- bind_rows(smpl_round, single_smpl) attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) } ##### compare mean outcomes smpl_round <- smpl_round %>% mutate(A_rmean = cummean2(A, na.rm = TRUE), B_rmean = cummean2(B, na.rm = TRUE), rdiff = A_rmean - B_rmean) smpl_round[[nrow(smpl_round), "A_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] > 0 ~ 1, smpl_round[[nrow(smpl_round), "rdiff"]] <= 0 ~ 0) smpl_round[[nrow(smpl_round), "B_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] >= 0 ~ 0, smpl_round[[nrow(smpl_round), "rdiff"]] < 0 ~ 1) ##### integrate sampling round into frequency distribution fd <- bind_rows(fd, smpl_round) fd[[nrow(fd), "A_sum"]] <- sum(fd[["A_win"]], na.rm = TRUE) fd[[nrow(fd), "B_sum"]] <- sum(fd[["B_win"]], na.rm = TRUE) #### evaluate accumulated evidence if(theta[[set, "boundary"]] == "absolute") { fd <- fd %>% mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A", B_sum >= theta[[set, "a"]] ~ "B")) } else { fd[[nrow(fd), "wdiff"]] <- fd[[nrow(fd), "A_sum"]] - fd[[nrow(fd), "B_sum"]] fd <- fd %>% mutate(choice = case_when(wdiff >= theta[[set, "a"]] ~ "A", wdiff <= -1*theta[[set, "a"]] ~ "B")) } if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) { boundary_reached <- TRUE } else { round <- round + 1 } } agents_list[[agent]] <- expand_grid(agent, fd) } all_agents <- agents_list %>% map_dfr(as.list) gamble_list[[gamble]] <- expand_grid(gamble, all_agents) } all_gambles <- gamble_list %>% map_dfr(as.list) param_list[[set]] <- expand_grid(theta[set, ], all_gambles) } sim_piecewise <- param_list %>% map_dfr(as.list)  linushof committed Jul 12, 2021 412 413 414 415 416 417 418 419 420 421  # store simulation write_rds(sim_piecewise, "./R/data/sim_piecewise.rds") # summarize unique sampling processes summary_piecewise <- sim_piecewise %>% group_by(s, sigma, boundary, a, gamble, agent) %>% mutate(n_sample = n(),  422 423  a_ev_exp = mean(A, na.rm = TRUE), b_ev_exp = mean(B, na.rm = TRUE)) %>%  linushof committed Jul 12, 2021 424 425 426  ungroup() %>% filter(!is.na(choice)) %>% mutate(strategy = "piecewise",  427 428 429 430 431  s = 1-(s+.5), diff = wdiff, n_switch = (round*2)-1) %>% select(!c(attended, A, B, A_rmean, B_rmean, rdiff, A_win, B_win, wdiff)) %>% select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch, A_sum, B_sum, diff, a_ev_exp, b_ev_exp, choice)  linushof committed Jul 12, 2021 432 write_rds(summary_piecewise, "./R/data/summary_piecewise.rds")  linushof committed Jul 02, 2021 433 434   435 436 437 438 439 440 441 442 ### Summary {r eval = FALSE} sr_data <- bind_rows(summary_comprehensive, summary_piecewise) %>% mutate(across(c(strategy, boundary, a, gamble, agent, rare, choice), as.factor)) # convert to factor write_rds(sr_data, "./R/data/sr_data.rds")   linushof committed Jul 02, 2021 443 444 ## Risky-risky gambles  linushof committed Jul 12, 2021 445 446 # Descriptive Analysis  447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 {r} data <- read_rds("./R/data/sr_data.rds") gambles <- read_rds("./R/data/sr_subset.rds")  ## Determinants of Sample size {r} data %>% group_by(strategy, boundary, a, s) %>% summarise(median = round(median(n_sample), 0), min = min(n_sample), max = max(n_sample)) %>% arrange(desc(median)) %>% View()  Below, median sample sizes of all strategy-parameter combinations (circles), ranging from$2 \leq \tilde{x} \leq 162\$, are plotted. {r} data %>% group_by(strategy, boundary, a, s) %>% summarise(g = as.factor(cur_group_id()), m = round(median(n_sample), 0)) %>% select(g, everything()) %>% ggplot(.) + geom_point(aes(x = reorder(g, m), y = m, color = a, size = s), alpha = .3) + guides(color = guide_legend(override.aes = list(size = 3) ) ) + facet_grid(boundary~strategy, switch = "y", scales = "free_x") + coord_flip() + scale_x_discrete(breaks = NULL, name = NULL) + scale_y_continuous(breaks = seq(0, 170, 5), name = "Median Sample Size")  ### Boundary value Both sampling strategies show a similiar effect of boundary value (coloring) on sample size. I.e., large boundary values lead to larger sample sizes, reflected by the clustering of colors. The plot below depicts the immediate consequence of the sequential process of evidence accumulation. {r} data %>% group_by(strategy, boundary, a, s) %>% summarise(g = as.factor(cur_group_id()), m = round(median(n_sample), 0)) %>% ggplot(.) + geom_col(aes(x = a, y = m, fill = a)) + facet_grid(boundary~strategy, switch = "y", scales = "free")  ### Boundary type For both sampling strategies, relative (as compared to absolute) boundaries lead to larger sample sizes, which is explained by the fact that sequential sampling can either stabilize or reduce a prospects' distance to absolute boundaries, while the distance to relative boundaries can also be increased. Below this regularity is shown for each gamble. {r} data %>% group_by(strategy, boundary, gamble) %>% summarise(m = median(n_sample)) %>% ggplot(.) + geom_bar(aes(x = strategy, y = m, fill = boundary), stat = "identity", position = "dodge") + facet_wrap(~gamble, nrow = 6) + scale_x_discrete(labels = c("comp", "piece"))  ### Switching probability For piecewise sampling, there is an inverse relationship between switching probability (circle size) and sample size. I.e., the lower the switching probability, the the less frequent prospects are compared and thus, boundaries are only approached with larger sample sizes. The regression plot below shows that this effect is particularly pronounced for low probabilities such that the increase in sample size accelerates as switching probability decreases. As a consequence, the magnitude of the effect of the boundary value increases. {r} data %>% filter(strategy == "piecewise") %>% ggplot(., aes(x = s, y = n_sample, color = a)) + geom_smooth() + scale_x_continuous(name = "Switching Probability") + scale_y_continuous(limits = c(0, 250), name = "Sample Size") + facet_wrap(~boundary, nrow = 2)  For comprehensive sampling, boundary types differ in the effects of switching probability. Regarding absolute boundaries, switching probability has no apparent effect on sample size which can be seen by the clustering of different sized circles, for given boundary values respectively. I.e., the distance of a given prospect to its absolute boundary is not changed by switching to (and sampling from) the other prospect. ... beyond the mechanical effect that with each switch an additional sample must be drawn. For relative boundaries, however, switching probability has a more nuanced effect on sample size. Particularly, regressing sample size on switching probability across all gambles produces the odd behavior of an decelariting inverse relationship for small probabilities and an accelarating positive relationship for larger probabilities. {r} data %>% filter(strategy == "comprehensive") %>% ggplot(., aes(x = s, y = n_sample, color = a)) + geom_smooth() + scale_x_continuous(name = "Switching Probability") + scale_y_continuous(name = "Sample Size") + facet_wrap(~boundary, nrow = 2)  Inspecting gambles separately, one does not observe a U-shaped relation but rather two gamble clusters, one of which shows an inverse relation and the other a positive: {r} data %>% filter(strategy == "comprehensive" & boundary == "relative") %>% ggplot(., aes(x = s, y = n_sample, color = a)) + geom_smooth(method = "lm") + scale_x_continuous(name = "Switching Probability") + scale_y_continuous(name = "Sample Size") + facet_wrap(~gamble, nrow = 5)  Looking at the qualitative difference of the gamble features, one does observe that the positive cluster shows small differences in the expected value (EV) of prospects, whereas the inverse cluster is indicated by larger EV differences. Specifically, for gambles of all kind the distance of a given prospect to its relative boundary is reduced by switching and sampling from the other prospect. If the EV difference is low, then frequent switching will lead to a more oscillating behavior of prospects approaching and moving away from the relative border. Specifically, for small EV differences frequent switching reduces the probability that the relative boundary can be reached with the subsequent sample(s) of a given prospect. {r} data %>% filter(strategy == "comprehensive" & boundary == "relative") %>% filter(ev_ratio >= .5 & ev_ratio <= 1.5) %>% ggplot(., aes(x = s, y = n_sample, color = a)) + geom_smooth(method = "lm") + scale_x_continuous(name = "Switching Probability") + scale_y_continuous(name = "Sample Size") + facet_wrap(~gamble, nrow = 5)  In contrast, a necessary (although not sufficient) condition for large EV differences are large differences between the outcomes of the risky prospect, indicating that one prospect is significantly better than the other. Less frequent switching may thus lead to larger sample sizes than are demanded by the diagnosticity of extreme outcomes. {r} data %>% filter(strategy == "comprehensive" & boundary == "relative") %>% filter(ev_ratio < .5 | ev_ratio > 1.5) %>% ggplot(., aes(x = s, y = n_sample, color = a)) + geom_smooth(method = "lm") + scale_x_continuous(name = "Switching Probability") + scale_y_continuous(name = "Sample Size") + facet_wrap(~gamble, nrow = 5)   linushof committed Jul 12, 2021 578   linushof committed Jul 02, 2021 579 # References  linushof committed Jul 01, 2021 580