Commit 4d0168e6 authored by linushof's avatar linushof
Browse files

Set up of pilot study (comprehensive sampling)

parent 945ae0f3
......@@ -299,7 +299,9 @@ The representation of the outcomes sampled from the probability spaces is assume
## Simulation
```{r class.source = "fold-show", cache = TRUE}
Below, `code` for the computational framework of sampling strategies is displayed, including the parameters discussed above. However, parameter values are chosen arbitrarily.
```{r class.source = "fold-show", eval = FALSE}
# dataset
......
......@@ -2,43 +2,405 @@
title: "Sampling Strategies in DfE - Pilot study"
author: "Linus Hof"
date: "2021"
output: html_document
bibliography: sampling-strategies-in-dfe.bib
csl: apa.csl
output:
html_document:
code_folding: hide
toc: yes
toc_float: yes
number_sections: yes
---
```{r}
# load packages
pacman::p_load(tidyverse,
knitr)
```
# Study Description
In this pilot study, choice data will be generated by applying the *comprehensive* and *piecewise sampling strategy* and hybrids thereof to a series of two-prospect gambles.
The simulated data will be explored for characteristic patterns of (or differences between) sampling strategies under varying structures of the environment, i.e., the features of a gamble's prospects, and other aspects of the sampling and decision behavior (model parameters).
The simulated data will be explored for characteristic patterns of (or differences between) sampling strategies under varying structures of the choice environment, i.e., the features of a gamble's prospects, and other aspects of the sampling and decision behavior (model parameters).
# Dataset
## Agents
Under each condition (sampling strategy x all possible parameter settings), all gambles are played by N = 150 synthetic agents.
Under each condition (sampling strategy combined with all possible parameter settings), all gambles are played by 100 synthetic agents.
```{r}
n_agents <- 100
```
## Gambles
Two different types of two-prospect gambles will be tested: (a) Gambles, in which one of the prospects contains a safe outcome only and the other two risky outcomes (safe/risky gambles). (b) Gambles, in which both prospects contain of two risky outcomes (risky/risky gambles).
Two different types of two-prospect gambles will be tested: (a) Gambles, in which one of the prospects contains a safe outcome and the other two risky outcomes (*safe-risky gambles*). (b) Gambles, in which both prospects contain two risky outcomes (*risky-risky gambles*).
All outcomes are in the gain range $\omega_i \geq 0$.
Large parts of the procedure for generating and selecting gambles is similar to those in Rieskamp [-@rieskampProbabilisticNaturePreferential2008] and Erev et al. [-@erevChoicePredictionCompetition2010]. For both gamble types, a set of 10,000 gambles is generated. Outcomes are drawn from a uniform distribution ranging from 0 to 20, rounded to two digits after the decimal point. To omit dominant prospects, the safe outcome in safe-risky gambles must fall between the two outcomes of the risky prospect; in risky-risky gambles, at least one outcome of one of the prospects must fall between the two outcomes of the other prospect. For all risky prospects, the probability of the lower outcome $p_L$ is drawn from a uniform distribution ranging from .01 to .99. The probability of the higher outcome $p_H$ is $1-p_L$, respectively.
```{r}
n_agents <- 150
### Safe-risky gambles
For the safe-risky set, an equal number of 20 gambles with no, an attractive, and an unattractive rare outcome is randomly selected from the set of 10,000, amounting to a total of 60 gambles that are played under each condition (see above) by all agents. Here, risky outcomes are considered "rare" if their probability is $p < .2$ and "attractive" if they are higher than the safe outcome. Vice versa, risky outcomes are considered "unattractive" if they are lower than the safe outcome.
The set of 60 safe-risky gambles is given in the table below.
```{r eval=FALSE}
generate_gambles <- function(n, safe = TRUE, lower, upper) {
# n: number of gambles
# safe: gamble typ; TRUE (default) = safe vs. risky option; FALSE = risky options only
# lower, upper: lower and upper boundary of outcome range
ev <- function(p1, o1, o2 = 0) {
round(p1 * o1 + (1-p1) * o2, digits = 2) # expected value for n <= 2 outcomes
}
output <- vector("list", n)
if(safe == TRUE) {
# safe vs. risky gambles
output %>%
map(tibble, # create tibble for each gamble/list entry
"names" = c("a_o1", "b", "a_o2", "a_p1", "b_p"),
"values" = c(runif(3, min = lower, max = upper) %>% round(2) %>% # generate outcomes
sort(), # prevent dominance: a_o1 < b < a_o2
runif(1, min = .01, max = .99) %>% round(2), # probabilities
1)
) %>%
map(pivot_wider, names_from = "names", values_from = "values") %>% # tidy: gambles as obs (rows)
map_dfr(as.list) %>% # return tibble including all gambles (row-binding)
mutate(a_ev = ev(a_p1, a_o1, a_o2),
b_ev = ev(b_p, b),
ev_diff = round(a_ev - b_ev, 2),
ev_ratio = round(a_ev/b_ev, 2)
) %>%
select(a_p1, a_o1, a_o2, a_ev, b_p, b, b_ev, ev_diff, ev_ratio)
} else {
# risky vs. risky gambles
output %>%
map(tibble,
"names" = c("a_o1", "b_o1", "a_o2", "b_o2", "a_p1", "b_p1"),
"values" = c(runif(3, min = lower, max = upper) %>% round(2) %>%
sort(), # prevent dominance: a_o1 < b_o1 and/or b_o2 < a_o2
runif(1, min = lower, max = upper) %>% round(2),
runif(2, min = .01, max = .99)
)
) %>%
map(pivot_wider, names_from = "names", values_from = "values") %>%
map_dfr(as.list) %>%
mutate(a_ev = ev(a_p1, a_o1, a_o2),
b_ev = ev(b_p1, b_o1, b_o2),
ev_diff = round(a_ev - b_ev, 2),
ev_ratio = round(a_ev/b_ev, 2)
) %>%
select(a_p1, a_o1, a_o2, a_ev, b_p1, b_o1, b_o2, b_ev, ev_diff, ev_ratio)
}
}
```
# Model parameters
**Switching probability:** $s$ is the positive (negative) probability increment added to (subtracted from) the unbiased attendance probability $p = .5$ with which agents draw the succesive single sample from the prospect they did not get their most recent single sample from. We vary $s$ between 0 to .4 in increments of .1.
```{r class.source = "fold-show", eval=FALSE}
# generate and select subset of safe-risky gambles
set.seed(3211)
sr_gambles <- generate_gambles(n = 10000, safe = TRUE, lower = 0, upper = 20)
sr_gambles <- sr_gambles %>% mutate(rare = case_when(a_p1 >= .2 & a_p1 <= .8 ~ "None",
a_p1 < .2 ~ "Unattractive", # a_o1 < a_o2
a_p1 > .8 ~ "Attractive"))
write_rds(sr_gambles, "sr_gambles.rds")
```
```{r class.source = "fold-show"}
sr_gambles <- read_rds("sr_gambles.rds")
sr_subset <- tibble()
for(i in unique(sr_gambles$rare)) {
type <- sr_gambles %>% filter(rare == i)
smpl <- sample(seq_len(nrow(type)), size = 20)
sr_subset <- bind_rows(sr_subset, type[smpl, ])
}
kable(sr_subset)
```
### Risky-risky gambles
# Choice Data
## Model Parameters
**Switching probability:** $s$ is the probability increment added to the unbiased attendance probability $p = .5$ with which agents draw the succesive single sample from the same prospect they get their most recent single sample from. We vary $s$ between -.5 to .4 in increments of .1.
**Boundary type**: Can either be the minimum value *any* prospect's sum of random variable realizations must reach (absolute boundary) or the minimum value for the difference of these sums (relative boundary).
**Boundary value:**
**Boundary value:** see below
**Noise parameter:**
**Noise parameter:** The representation of the outcomes sampled from the probability spaces is assumed to be stochastical. Therefore, we add Gaussian noise $\epsilon \sim N(0, \sigma)$ in units of the outcomes. We fix $\sigma$ to .5.
```{r}
```{r class.source = "fold-show"}
# dataset
gambles <- sr_subset
# parameters
parameters <- expand_grid(s = seq(-.5, .4, .1), # switching probability
sigma = .5, # noise
boundary = c("absolute", "relative")) # boundary type
theta_c <- expand_grid(parameters, a = c(15, 20, 25, 30, 35)) # boundaries comprehensive
```
## Safe-risky gambles
### Comprehensive sampling
```{r class.source = "fold-show"}
# Moving cumulative sum and mean: Extensions of 'cumsum' and 'cummean'. Other than the base functions, the extensions have an 'na.rm' argument that removes missing values and allows to continue computing the cumulative array even after a missing value occured. For 'na.rm = TRUE', otherwise missing values are replaced by the cumulative sum/mean of all available values up to the respective vector element. If all values in a cumulative array are missing, NA is returned.
## cumsum2()
cumsum2 <- function(x, na.rm = FALSE) {
output <- vector("double", length(x))
for (i in seq_along(x)) {
if(sum(is.na(x[1:i])) == length(x[1:i])) {
output[[i]] <- NA
} else {
output[[i]] <- sum(x[1:i], na.rm = na.rm)
}
}
output
}
# cummean2()
cummean2 <- function(x, na.rm = FALSE) {
output <- vector("double", length(x))
for (i in seq_along(x)) {
if(sum(is.na(x[1:i])) == length(x[1:i])) {
output[[i]] <- NA
} else {
output[[i]] <- mean(x[1:i], na.rm = na.rm)
}
}
output
}
```
```{r class.source = "fold-show", eval = FALSE}
theta <- theta_c
n_agents <- 2
# simulation
set.seed(765)
param_list <- vector("list", length(nrow(theta)))
for (set in seq_len(nrow(theta))) {
gamble_list <- vector("list", length(nrow(gambles)))
for (gamble in seq_len(nrow(gambles))) {
agents_list <- vector("list", n_agents)
for (agent in seq_along(1:n_agents)){
## initial values of an agent's sampling process
fd <- tibble() # state of ignorance
p <- .5 # no attention bias
s <- 0 # no switching at process initiation
init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
attend <- init
boundary_reached <- FALSE
## agent's sampling process
while(boundary_reached == FALSE) {
#### draw single sample
if(attend == "a") {
single_smpl <- gambles[gamble, ] %>%
mutate(attended = attend,
A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) +
round(rnorm(n = 1, mean = 0, sd = theta[[set, "sigma"]]), 2), # gaussian noise
B = NA)
s <- theta[[set, "s"]] # get switching probability
} else {
single_smpl <- gambles[gamble, ] %>%
mutate(attended = attend,
A = NA,
B = b +
round(rnorm(n = 1, mean = 0, theta[[set, "sigma"]]), 2))
s <- -1*theta[[set, "s"]]
}
#### integrate single sample into frequency distribution
fd <- bind_rows(fd, single_smpl) %>%
mutate(A_sum = cumsum2(A, na.rm = TRUE),
B_sum = cumsum2(B, na.rm = TRUE))
#### evaluate accumulated evidence
if(theta[[set, "boundary"]] == "absolute") {
fd <- fd %>%
mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
B_sum >= theta[[set, "a"]] ~ "B"))
} else {
fd <- fd %>%
mutate(diff = round(A_sum - B_sum, 2),
choice = case_when(diff >= theta[[set, "a"]] ~ "A",
diff <= -1*theta[[set, "a"]] ~ "B"))
}
if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) {
boundary_reached <- TRUE
} else {
attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
}
}
agents_list[[agent]] <- expand_grid(agent, fd)
}
all_agents <- agents_list %>% map_dfr(as.list)
gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
}
all_gambles <- gamble_list %>% map_dfr(as.list)
param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
sim_comprehensive_2 <- param_list %>% map_dfr(as.list)
object.size(sim_comprehensive_1)
```
### Piecewise sampling
```{r eval=FALSE, include=FALSE}
theta <- theta_p
# simulation
set.seed(8739)
param_list <- vector("list", length(nrow(theta)))
for (set in seq_len(nrow(theta))) {
gamble_list <- vector("list", length(nrow(gambles)))
for (gamble in seq_len(nrow(gambles))) {
agents_list <- vector("list", n_agents)
for (agent in seq_along(1:n_agents)){
## initial values of an agent's sampling process
fd <- tibble() # state of ignorance
p <- .5 # no attention bias
s <- 0 # no switching at process initiation
init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
attend <- init
round <- 1
boundary_reached <- FALSE
## agent's sampling process
while(boundary_reached == FALSE) {
#### sampling round
smpl_round <- tibble()
while(attend == init) {
##### draw single sample from prospect attended first
if(attend == "a") {
single_smpl <- gambles[gamble, ] %>%
mutate(round = round,
attended = attend,
A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) +
round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
B = NA)
s <- theta[[set, "s"]]
} else {
single_smpl <- gambles[gamble, ] %>%
mutate(round = round,
attended = attend,
A = NA,
B = b +
round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
s <- -1*theta[[set, "s"]]
}
smpl_round <- bind_rows(smpl_round, single_smpl)
attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
}
while(attend != init) {
##### draw single sample from prospect attended second
if(attend == "a") {
single_smpl <- gambles[gamble, ] %>%
mutate(round = round,
attended = attend,
A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1-a_p1)) +
round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
B = NA)
s <- theta[[set, "s"]]
} else {
single_smpl <- gambles[gamble, ] %>%
mutate(round = round,
attended = attend,
A = NA,
B = b +
round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
s <- -1*theta[[set, "s"]]
}
smpl_round <- bind_rows(smpl_round, single_smpl)
attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
}
##### compare mean outcomes
smpl_round <- smpl_round %>%
mutate(A_rmean = cummean2(A, na.rm = TRUE),
B_rmean = cummean2(B, na.rm = TRUE),
rdiff = A_rmean - B_rmean)
smpl_round[[nrow(smpl_round), "A_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] > 0 ~ 1,
smpl_round[[nrow(smpl_round), "rdiff"]] <= 0 ~ 0)
smpl_round[[nrow(smpl_round), "B_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] >= 0 ~ 0,
smpl_round[[nrow(smpl_round), "rdiff"]] < 0 ~ 1)
##### integrate sampling round into frequency distribution
fd <- bind_rows(fd, smpl_round)
fd[[nrow(fd), "A_sum"]] <- sum(fd[["A_win"]], na.rm = TRUE)
fd[[nrow(fd), "B_sum"]] <- sum(fd[["B_win"]], na.rm = TRUE)
#### evaluate accumulated evidence
if(theta[[set, "boundary"]] == "absolute") {
fd <- fd %>%
mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
B_sum >= theta[[set, "a"]] ~ "B"))
} else {
fd[[nrow(fd), "wdiff"]] <- fd[[nrow(fd), "A_sum"]] - fd[[nrow(fd), "B_sum"]]
fd <- fd %>%
mutate(choice = case_when(wdiff >= theta[[set, "a"]] ~ "A",
wdiff <= -1*theta[[set, "a"]] ~ "B"))
}
if(is.na(fd[[nrow(fd), "choice"]]) == FALSE) {
boundary_reached <- TRUE
} else {
round <- round + 1
}
}
agents_list[[agent]] <- expand_grid(agent, fd)
}
all_agents <- agents_list %>% map_dfr(as.list)
gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
}
all_gambles <- gamble_list %>% map_dfr(as.list)
param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
sim_piecewise <- param_list %>% map_dfr(as.list)
```
## Risky-risky gambles
# References
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment