Commit 2327f360 authored by Linus Hof

Delete deprecated .Rmd file

parent 7ad48031
---
title: "Sampling Strategies in DfE"
author: "Linus Hof"
date: "2021"
bibliography: sampling-strategies-in-dfe.bib
csl: apa.csl
output:
  html_document:
    code_folding: hide
    toc: yes
    toc_float: yes
    number_sections: yes
---
Some of the `R code` is folded but can be unfolded by clicking the `Code` buttons.
```{r}
# load packages
pacman::p_load(tidyverse,
               knitr)
```
# Prospects
Let a prospect be a *probability space* $(\Omega, P)$, where $\Omega$ is the *sample space* containing a finite set of possible outcomes $\{\omega_1, ..., \omega_n\}$ [cf. @kolmogorovFoundationsTheoryProbability1950]. $P$ is then a *probability mass function* (PMF) $P: \Omega \to [0,1]$, which assigns each outcome $\omega_i$ a probability $p_i = P(\omega_i)$ with $0 \leq p_i \leq 1$ and $\sum_{i=1}^{n} p_i = 1$.
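As a minimal sketch in base R, a two-outcome prospect can be stored as a vector of outcomes and a vector of probabilities. The names `prospect`, `is_pmf`, and `expected_value`, as well as the outcome and probability values, are illustrative assumptions and not part of the original code.

```r
# hypothetical representation of a prospect: outcomes and their probabilities
prospect <- list(outcomes = c(2, 8), probs = c(0.3, 0.7))

# check the PMF conditions: 0 <= p_i <= 1 and sum(p_i) = 1
is_pmf <- function(p, tol = 1e-8) {
  all(p >= 0 & p <= 1) && abs(sum(p) - 1) < tol
}

# expected value: probability-weighted sum of the outcomes
expected_value <- function(x) sum(x$outcomes * x$probs)

is_pmf(prospect$probs)
expected_value(prospect)
```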
## Example
Below, each row of the table represents a choice problem with a risky prospect `A` and a safe prospect `B` (one outcome only, "sure thing"), where each outcome falls in the gain range of 0 to 10. Variable strings `_p`, `_o`, and `ev` denote probabilities, outcomes, and the expected values.
```{r}
source("./functions/fun_gambles.R") # call generate_gambles() function
n_gambles <- 5 # number of gambles
set.seed(3211)
gambles <- generate_gambles(n = n_gambles, safe = TRUE, lower = 0, upper = 10)
kable(gambles)
```
# Sampling in Decisions from Experience
In *decisions from experience* [DfE; @hertwigDecisionsExperienceEffect2004], where no summary description of the prospects' probability spaces is provided, agents either first explore the prospects before arriving at a final choice (*sampling paradigm*), or exploration and exploitation occur simultaneously (*partial-* or *full-feedback paradigm*) [cf. @hertwigDescriptionExperienceGap2009]. Below, only the sampling paradigm is considered.
## Sampling strategies
In the context of gambles, a *single sample* represents an outcome obtained when randomly drawing from a prospect's sample space $\Omega$. Thus, a single sample is the realization of a discrete random variable $X$ defined on $(\Omega, P)$, which can take the value of any real-valued outcome in $\Omega$ according to $P$:
$$\begin{equation}
X: \Omega \to \mathbb{R}
\end{equation}$$
In general terms, we define a *sampling strategy* as a systematic approach to generate a sequence of single samples from a gamble's prospects as a means of exploring their probability spaces. Single samples that are generated from the same prospect reflect a sequence of realizations of random variables that are independent and identically distributed.
## Comprehensive sampling strategy
In *comprehensive sampling* [@hillsInformationSearchDecisions2010], for each prospect single samples are drawn in direct succession before sampling from another prospect.
### Integration and decision strategy
In comprehensive sampling, single samples of the same prospect are assumed to be integrated into an empirical outcome distribution. This can be considered a special instance of the more general case that, irrespective of the sampling strategy, realizations of the *random variables of interest* are integrated into a prospect's frequency distribution. From the law of large numbers, we know that the relative frequencies of outcomes in these distributions approximate the probabilities of the respective random variable as the sample size increases. For comprehensive sampling, it follows that the mean of the frequency distribution associated with a prospect approximates its expected value (EV). Prospects are consequently assumed to be chosen on the basis of a mean comparison [*"summary strategy"*, @hillsInformationSearchDecisions2010].
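The law-of-large-numbers argument can be illustrated with a short base R sketch. The outcomes, probabilities, sample sizes, and seed are arbitrary assumptions chosen for illustration, and `draw` is a hypothetical helper.

```r
set.seed(1)              # arbitrary seed, for reproducibility only
outcomes <- c(2, 8)      # illustrative outcomes
probs    <- c(0.3, 0.7)  # illustrative probabilities
ev <- sum(outcomes * probs)

# draw n independent single samples from the prospect
draw <- function(n) sample(outcomes, size = n, replace = TRUE, prob = probs)

small <- draw(10)
large <- draw(100000)

# relative frequencies of the outcomes in the large sample
rel_freq <- table(factor(large, levels = outcomes)) / length(large)

abs(mean(small) - ev)  # deviation of the sample mean from the EV, small sample
abs(mean(large) - ev)  # deviation for the large sample, typically much smaller
rel_freq               # typically close to probs
```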
### Example
A synthetic agent applies a comprehensive sampling strategy to explore the probability spaces of the first gamble (see above) and applies the associated integration- and decision strategy. For demonstrative purposes, it is assumed that the agent draws five consecutive single samples from each prospect.
The table below summarizes the simulated process. Each row represents a single sample drawn from one of the prospects. Outcomes associated with a single sample are given in columns `A` and `B`. `A_mean` and `B_mean` represent the cumulative means across outcomes. `diff` is the difference of the means. `choice` indicates which prospect is chosen on the basis of a mean comparison.
```{r}
source("./functions/fun_moving_stats.R") # load functions cumsum2() and cummean2()

fd <- tibble() # frequency distribution (start sampling in a state of ignorance)
n_smpls <- 5 # number of single samples per prospect
set.seed(345)

# draw series of single samples from prospect A
for (i in seq_len(n_smpls)) {
  single_smpl <- gambles[1, ] %>% # get gamble features
    mutate(A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1 - a_p1)),
           B = NA)
  fd <- bind_rows(fd, single_smpl) %>%
    mutate(A_mean = cummean2(A, na.rm = TRUE),
           B_mean = cummean2(B, na.rm = TRUE))
}

# draw series of single samples from prospect B
for (i in seq_len(n_smpls)) {
  single_smpl <- gambles[1, ] %>%
    mutate(A = NA,
           B = b)
  fd <- bind_rows(fd, single_smpl) %>%
    mutate(A_mean = cummean2(A, na.rm = TRUE),
           B_mean = cummean2(B, na.rm = TRUE))
}

# choose option with larger mean
fd[[nrow(fd), "diff"]] <- fd[[nrow(fd), "A_mean"]] - fd[[nrow(fd), "B_mean"]]
fd[[nrow(fd), "choice"]] <- case_when(fd[[nrow(fd), "diff"]] > 0 ~ "A",
                                      fd[[nrow(fd), "diff"]] < 0 ~ "B")
kable(fd)
```
Assuming a noise-free sampling, integration, and decision process, the synthetic agent chooses prospect `A` over prospect `B`.
## Piecewise sampling strategy
In *piecewise sampling* [@hillsInformationSearchDecisions2010], single samples from different prospects are drawn in direct succession.
### Integration and decision strategy
In piecewise sampling, it is assumed that single samples of different prospects are compared against each other [@hillsInformationSearchDecisions2010]. Here, we define a new discrete random variable on a probability space $(\Omega, \Sigma, P)$, where $\Omega$ is the set of all possible combinations of outcomes from the different prospects of a gamble, each written as a fraction. $\Sigma$ is a set of subsets of $\Omega$, i.e., the event space $\{\varsigma_1, ...,\varsigma_n\}$, and $P$ is the joint probability mass function of the gamble's prospects. The random variable maps $\Omega$ to the measurable space $E = \{0, 1\}$ as follows:
$$\begin{equation}
X(\omega_i) = \left\{
\begin{array}{ll}
0, & \text{if } \omega_i \in \varsigma_s \text{ (i.e., } \omega_i \leq 1\text{)},\\
1, & \text{if } \omega_i \in \varsigma_g \text{ (i.e., } \omega_i > 1\text{)},
\end{array}
\right.
\end{equation}$$
where subset $\varsigma_s$ contains all fractions $\omega_i \leq 1$, i.e., the outcome of a given prospect is smaller than or equal to the outcome of the other prospect, and $\varsigma_g$ contains all $\omega_i > 1$. Since the measurable space consists of only two values $\{0, 1\}$, the frequency distribution of the random variable of interest (i.e., win vs. no win) in piecewise sampling is always a Bernoulli distribution, irrespective of the number of different outcomes of a prospect. Prospects are consequently assumed to be chosen on the basis of a comparison of the number of wins [*"round-wise strategy"*, @hillsInformationSearchDecisions2010].
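The construction of this Bernoulli variable can be sketched in base R as follows. The prospects below are illustrative and not taken from the generated gambles. The sketch further assumes that the prospects are independent, so that the joint PMF is the product of the marginal probabilities, and that the outcomes of `B` are nonzero, so that the fraction is defined.

```r
# illustrative prospects (assumed values)
a_outcomes <- c(2, 8); a_probs <- c(0.3, 0.7)  # risky prospect A
b_outcomes <- 5;       b_probs <- 1            # safe prospect B

# sample space: all combinations of outcomes, written as fractions a / b
omega <- expand.grid(a = a_outcomes, b = b_outcomes)
omega$p <- as.vector(outer(a_probs, b_probs))  # joint PMF under independence
omega$ratio <- omega$a / omega$b

# X = 1 if the fraction exceeds 1 (A wins the comparison), else 0
omega$x <- as.integer(omega$ratio > 1)

# success probability of the resulting Bernoulli variable
p_win_a <- sum(omega$p[omega$x == 1])
p_win_a
```

In this sketch, `p_win_a` is the probability that a single round-wise comparison favors `A`. It depends only on how often the outcomes of `A` exceed those of `B`, not on by how much.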
### Example
By alternating back and forth between prospects `A` and `B`, a synthetic agent applies a piecewise sampling strategy (and the associated integration- and decision strategy) while exploring the probability spaces of the same gamble as before when comprehensive sampling was applied. Again, five single samples are drawn from each prospect.
The table below summarizes the simulated sampling process. Each row represents a single sample drawn from one of the prospects. Outcomes associated with a single sample are given in columns `A` and `B`. After every second single sample, i.e., when an outcome from both `A` and `B` has been drawn, the prospects are compared: `diff` is the difference between outcomes; `A_win` and `B_win` denote which of the outcomes was larger, i.e., the realizations of the random variable. `A_sum` and `B_sum` denote the cumulative number of comparisons in favor of a prospect. `choice` indicates which prospect is chosen on the basis of all comparisons.
```{r echo=TRUE}
fd <- tibble() # state of ignorance
n_smpls <- 5 # number of single samples per prospect
set.seed(345)

# sampling round
for (i in seq_len(n_smpls)) {
  smpl_round <- tibble()

  ## draw single sample from prospect A
  single_smpl <- gambles[1, ] %>%
    mutate(A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1 - a_p1)),
           B = NA)
  smpl_round <- bind_rows(smpl_round, single_smpl)

  ## draw single sample from prospect B
  single_smpl <- gambles[1, ] %>%
    mutate(A = NA,
           B = b)
  smpl_round <- bind_rows(smpl_round, single_smpl)

  ## compare outcomes
  smpl_round <- smpl_round %>%
    mutate(diff = cummean2(A, na.rm = TRUE) - cummean2(B, na.rm = TRUE),
           A_win = case_when(diff > 0 ~ 1,
                             diff <= 0 ~ 0),
           B_win = case_when(diff < 0 ~ 1,
                             diff >= 0 ~ 0))

  ## integrate sampling round into frequency distribution
  fd <- bind_rows(fd, smpl_round)
}

# choose prospect that won more comparisons
fd <- fd %>% mutate(A_sum = cumsum2(A_win, na.rm = TRUE),
                    B_sum = cumsum2(B_win, na.rm = TRUE))
fd[[nrow(fd), "choice"]] <- case_when(fd[[nrow(fd), "A_sum"]] > fd[[nrow(fd), "B_sum"]] ~ "A",
                                      fd[[nrow(fd), "A_sum"]] < fd[[nrow(fd), "B_sum"]] ~ "B")
kable(fd)
```
Assuming a noise-free sampling, integration, and decision process, the agent chooses prospect `B` over prospect `A`. Thus, as previously demonstrated by Hills and Hertwig [-@hillsInformationSearchDecisions2010], different sampling strategies can in theory produce different choices on the basis of the same set of single samples.
The starting point of these variations in choice behavior is assumed to lie in differences between the random experiments that are repeatedly performed under the comprehensive and the piecewise sampling approach. These differences in the underlying random process are assumed to interact with the structure of the environment, i.e., the features of a gamble's prospects, and with other aspects of the sampling and decision behavior.
# Computational Framework for Sequential Sampling Strategies
Within the scope of this work, both comprehensive and piecewise sampling are assumed to be *sequential sampling* strategies. That is, by generating sequences of single samples of a gamble's prospects, agents sequentially accumulate information about the probability spaces and thereby form a preference for one prospect over the other.
Below, the framework of comprehensive and piecewise sampling is extended to hybrids thereof, acknowledging that people are likely to apply sampling strategies that deviate from the pure cases [cf. @hillsInformationSearchDecisions2010] and accounting for modeling issues that arise when applying them to situations in which the number of single samples is not fixed *a priori*.
## Autonomous sampling and model parameters
The above proposition of a sequential sampling process does not require sampling to be *autonomous*, i.e., to proceed without an *a priori* fixed number of single samples. However, in many experimental paradigms, and arguably in the majority of real-world situations, sampling is autonomous. For such cases, it is assumed that the termination of the sampling process and the choice are determined by reaching a boundary at which a preference for one of the prospects can be formed.
### Boundaries
Such a boundary can be defined as the minimum value a count or other form of summary statistic over the sequences of realized random variables must arrive at. We will compare different types of boundaries (absolute vs. relative) and introduce the boundary parameter $a$ (denoting the boundary value) into the computational model of sampling strategies.
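A minimal base R sketch of the two boundary types is given below. The helper `check_boundary` and its argument names are hypothetical. Its comparisons mirror those used in the simulation code further below, where an absolute boundary is reached once the evidence for one prospect is at least $a$, and a relative boundary is reached once the difference in evidence is at least $a$ in either direction.

```r
# hypothetical helper: returns "A", "B", or NA if no boundary is reached
check_boundary <- function(a_sum, b_sum, a, type = c("absolute", "relative")) {
  type <- match.arg(type)
  if (type == "absolute") {
    # absolute: evidence for a single prospect must reach a
    if (a_sum >= a) return("A")
    if (b_sum >= a) return("B")
  } else {
    # relative: the difference in evidence must reach a
    d <- a_sum - b_sum
    if (d >= a) return("A")
    if (d <= -a) return("B")
  }
  NA_character_
}

check_boundary(a_sum = 12, b_sum = 9, a = 10, type = "absolute")
check_boundary(a_sum = 12, b_sum = 9, a = 10, type = "relative")
```

With the same accumulated evidence, the absolute boundary is already reached in this example, while the relative boundary is not, so sampling would continue.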
### Switching probability
For the reasons described above (autonomous sampling, deviations from the pure cases, etc.), we introduce a switching probability parameter $s$ to allow for variation in whether agents draw the successive single sample from the same prospect they got their most recent single sample from or switch to the other prospect. As $s \to 0$, perfect comprehensive sampling is approximated; as $s \to 1$, perfect piecewise sampling is approximated.
As values of $s < 1$ allow for drawing consecutive single samples from the same prospect, it was elsewhere [see Notes in @hillsInformationSearchDecisions2010] proposed that for piecewise sampling, round-wise comparisons between prospects can also be made on the basis of the means of multiple single samples. As a downside, this somewhat complicates the definition of the respective random variable. As an upside, however, both sampling strategies can be considered special instances of one another.
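The role of $s$ can be sketched in base R by generating the sequence of attended prospects for a fixed number of single samples. The helper `attend_sequence` is hypothetical and treats $s$ directly as the probability of switching after each single sample; note that the simulation code below instead parameterizes `s` as an increment to a baseline probability of .5.

```r
# hypothetical helper: sequence of attended prospects for n single samples,
# where s is the probability of switching prospects after each single sample
attend_sequence <- function(n, s, start = "a") {
  attended <- character(n)
  attended[1] <- start
  for (i in seq_len(n)[-1]) {
    other <- if (attended[i - 1] == "a") "b" else "a"
    attended[i] <- if (runif(1) < s) other else attended[i - 1]
  }
  attended
}

attend_sequence(6, s = 0)    # never switches: comprehensive sampling of one prospect
attend_sequence(6, s = 1)    # always switches: strict alternation (piecewise)
attend_sequence(6, s = 0.3)  # hybrid: runs of varying length
```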
### Noise
The representation of the outcomes sampled from the probability spaces is assumed to be stochastic. Therefore, we add Gaussian noise $\epsilon \sim N(0, \sigma)$, with standard deviation $\sigma$ in units of the outcomes.
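As a small base R sketch, noise can be added to a sampled outcome as follows. The helper `noisy_outcome` is hypothetical; rounding the noise to two decimals follows the simulation code below, and the seed and values are arbitrary.

```r
set.seed(2)  # arbitrary seed, for reproducibility only

# hypothetical helper: add Gaussian noise with standard deviation sigma
noisy_outcome <- function(outcome, sigma) {
  outcome + round(rnorm(length(outcome), mean = 0, sd = sigma), 2)
}

noisy_outcome(c(2, 8), sigma = 0)    # sigma = 0 leaves the outcomes unchanged
noisy_outcome(c(2, 8), sigma = 0.1)  # outcomes perturbed by small noise
```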
## Simulation
Below, `code` for the computational framework of both sampling strategies is displayed, including the parameters discussed above. However, parameter values are chosen arbitrarily and must be adapted according to the particularities of the investigation.
```{r eval=FALSE, class.source = "fold-show"}
n_agents <- 1 # number of agents
gambles <- gambles # a tibble of gamble features (see above)

# parameters
parameters <- expand_grid(s = 0, # increment added to the baseline sampling probability p = .5
                          sigma = .1, # noise (standard deviation)
                          boundary = c("absolute", "relative")) # boundary type
theta_c <- expand_grid(parameters, a = 10) # boundaries comprehensive (in units of outcomes)
theta_p <- expand_grid(parameters, a = 1) # boundaries piecewise (in units of wins)
```
### Comprehensive Sampling
```{r eval = FALSE, class.source = "fold-show"}
# simulation
theta <- theta_c
set.seed(765)
param_list <- vector("list", nrow(theta)) # one slot per parameter set
for (set in seq_len(nrow(theta))) {
  gamble_list <- vector("list", nrow(gambles)) # one slot per gamble
  for (gamble in seq_len(nrow(gambles))) {
    agents_list <- vector("list", n_agents)
    for (agent in seq_len(n_agents)) {
      ## initial values of an agent's sampling process
      fd <- tibble() # state of ignorance
      p <- .5 # no attention bias
      s <- 0 # no switching at process initiation
      init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
      attend <- init
      boundary_reached <- FALSE
      ## agent's sampling process
      while (!boundary_reached) {
        #### draw single sample
        if (attend == "a") {
          single_smpl <- gambles[gamble, ] %>%
            mutate(attended = attend,
                   A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1 - a_p1)) +
                     round(rnorm(n = 1, mean = 0, sd = theta[[set, "sigma"]]), 2), # gaussian noise
                   B = NA)
          s <- theta[[set, "s"]] # get switching probability
        } else {
          single_smpl <- gambles[gamble, ] %>%
            mutate(attended = attend,
                   A = NA,
                   B = b + # for safe-risky gambles
                     round(rnorm(n = 1, mean = 0, sd = theta[[set, "sigma"]]), 2))
          s <- -1 * theta[[set, "s"]]
        }
        #### integrate single sample into frequency distribution
        fd <- bind_rows(fd, single_smpl) %>%
          mutate(A_sum = cumsum2(A, na.rm = TRUE),
                 B_sum = cumsum2(B, na.rm = TRUE))
        #### evaluate accumulated evidence
        if (theta[[set, "boundary"]] == "absolute") {
          fd <- fd %>%
            mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
                                      B_sum >= theta[[set, "a"]] ~ "B"))
        } else {
          fd <- fd %>%
            mutate(diff = round(A_sum - B_sum, 2),
                   choice = case_when(diff >= theta[[set, "a"]] ~ "A",
                                      diff <= -1 * theta[[set, "a"]] ~ "B"))
        }
        if (!is.na(fd[[nrow(fd), "choice"]])) {
          boundary_reached <- TRUE
        } else {
          attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
        }
      }
      agents_list[[agent]] <- expand_grid(agent, fd)
    }
    all_agents <- agents_list %>% map_dfr(as.list)
    gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
  }
  all_gambles <- gamble_list %>% map_dfr(as.list)
  param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
sim_comprehensive <- param_list %>% map_dfr(as.list)
```
### Piecewise Sampling
```{r eval = FALSE, class.source = "fold-show"}
# simulation
theta <- theta_p
set.seed(8739)
param_list <- vector("list", nrow(theta)) # one slot per parameter set
for (set in seq_len(nrow(theta))) {
  gamble_list <- vector("list", nrow(gambles)) # one slot per gamble
  for (gamble in seq_len(nrow(gambles))) {
    agents_list <- vector("list", n_agents)
    for (agent in seq_len(n_agents)) {
      ## initial values of an agent's sampling process
      fd <- tibble() # state of ignorance
      p <- .5 # no attention bias
      s <- 0 # no switching at process initiation
      init <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s)) # prospect attended first
      attend <- init
      round <- 1
      boundary_reached <- FALSE
      ## agent's sampling process
      while (!boundary_reached) {
        #### sampling round
        smpl_round <- tibble()
        while (attend == init) {
          ##### draw single sample from prospect attended first
          if (attend == "a") {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend,
                     A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1 - a_p1)) +
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
                     B = NA)
            s <- theta[[set, "s"]]
          } else {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend,
                     A = NA,
                     B = b +
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
            s <- -1 * theta[[set, "s"]]
          }
          smpl_round <- bind_rows(smpl_round, single_smpl)
          attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
        }
        while (attend != init) {
          ##### draw single sample from prospect attended second
          if (attend == "a") {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend,
                     A = sample(x = c(a_o1, a_o2), size = 1, prob = c(a_p1, 1 - a_p1)) +
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2),
                     B = NA)
            s <- theta[[set, "s"]]
          } else {
            single_smpl <- gambles[gamble, ] %>%
              mutate(round = round,
                     attended = attend,
                     A = NA,
                     B = b +
                       round(rnorm(1, mean = 0, sd = theta[[set, "sigma"]]), 2))
            s <- -1 * theta[[set, "s"]]
          }
          smpl_round <- bind_rows(smpl_round, single_smpl)
          attend <- sample(c("a", "b"), size = 1, prob = c(p + s, p - s))
        }
        ##### compare mean outcomes
        smpl_round <- smpl_round %>%
          mutate(A_rmean = cummean2(A, na.rm = TRUE),
                 B_rmean = cummean2(B, na.rm = TRUE),
                 rdiff = A_rmean - B_rmean)
        smpl_round[[nrow(smpl_round), "A_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] > 0 ~ 1,
                                                             smpl_round[[nrow(smpl_round), "rdiff"]] <= 0 ~ 0)
        smpl_round[[nrow(smpl_round), "B_win"]] <- case_when(smpl_round[[nrow(smpl_round), "rdiff"]] >= 0 ~ 0,
                                                             smpl_round[[nrow(smpl_round), "rdiff"]] < 0 ~ 1)
        ##### integrate sampling round into frequency distribution
        fd <- bind_rows(fd, smpl_round)
        fd[[nrow(fd), "A_sum"]] <- sum(fd[["A_win"]], na.rm = TRUE)
        fd[[nrow(fd), "B_sum"]] <- sum(fd[["B_win"]], na.rm = TRUE)
        #### evaluate accumulated evidence
        if (theta[[set, "boundary"]] == "absolute") {
          fd <- fd %>%
            mutate(choice = case_when(A_sum >= theta[[set, "a"]] ~ "A",
                                      B_sum >= theta[[set, "a"]] ~ "B"))
        } else {
          fd[[nrow(fd), "wdiff"]] <- fd[[nrow(fd), "A_sum"]] - fd[[nrow(fd), "B_sum"]]
          fd <- fd %>%
            mutate(choice = case_when(wdiff >= theta[[set, "a"]] ~ "A",
                                      wdiff <= -1 * theta[[set, "a"]] ~ "B"))
        }
        if (!is.na(fd[[nrow(fd), "choice"]])) {
          boundary_reached <- TRUE
        } else {
          round <- round + 1
        }
      }
      agents_list[[agent]] <- expand_grid(agent, fd)
    }
    all_agents <- agents_list %>% map_dfr(as.list)
    gamble_list[[gamble]] <- expand_grid(gamble, all_agents)
  }
  all_gambles <- gamble_list %>% map_dfr(as.list)
  param_list[[set]] <- expand_grid(theta[set, ], all_gambles)
}
sim_piecewise <- param_list %>% map_dfr(as.list)
```
# References