Commit d5091b9c authored by linushof

Changes in data transformation (summary data sets), data analysis (draft sample size)

parent acb96eac
```{r}
# load packages
pacman::p_load(tidyverse,
               knitr)
```
# Study Description
Choice data will be generated by applying the *comprehensive* and *piecewise* sampling strategies to a series of two-prospect gambles.
The simulated data will be explored for characteristic patterns of the sampling strategies under varying structures of the choice environment, i.e., the features of a gamble's prospects, and under varying aspects of the sampling and decision behavior (model parameters).
# Choice Data
## Method
### Agents

Under each condition, i.e., each combination of sampling strategy and parameter settings, all gambles are played by 100 synthetic agents.
```{r}
n_agents <- 100
```
### Gambles

A set of gambles in which one prospect contains a safe outcome and the other two risky outcomes (*safe-risky gambles*) will be tested. To this end, 60 gambles are sampled from an initial set of 10,000. Outcomes and probabilities are both drawn from uniform distributions, ranging from 0 to 20 for outcomes and from .01 to .99 for the probabilities of the lower risky outcomes $p_L$. The probabilities of the higher risky outcomes are $1-p_L$, respectively. To omit dominant prospects, the safe outcome must fall between the two risky outcomes. Sampling of gambles was stratified, randomly drawing an equal number of 20 gambles with no, an attractive, and an unattractive rare outcome, where risky outcomes are considered *rare* if their probability is $p < .2$ and *attractive* (*unattractive*) if they are higher (lower) than the safe outcome. The table below contains the test set of 60 gambles.
```{r eval = FALSE}
generate_gambles <- function(n, safe = TRUE, lower, upper) {
  # ... (function body and gamble selection truncated in this commit view)
}

sr_subset <- read_rds("./R/data/sr_subset.rds")
kable(sr_subset)
```
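Since the `generate_gambles` chunk above is truncated in this commit view, here is a minimal sketch of the described procedure (function and column names are assumptions for illustration, not the original implementation):

```{r eval = FALSE}
# Sketch of the gamble generation described above (not the original code):
# outcomes ~ U(0, 20), the probability of the lower risky outcome ~ U(.01, .99),
# and the safe outcome placed between the two risky outcomes to omit dominance.
generate_safe_risky_sketch <- function(n) {
  x <- round(runif(n, 0, 20), 2)
  y <- round(runif(n, 0, 20), 2)
  low   <- pmin(x, y)                     # lower risky outcome
  high  <- pmax(x, y)                     # higher risky outcome
  p_low <- round(runif(n, .01, .99), 2)   # probability of the lower outcome
  safe  <- round(runif(n, low, high), 2)  # safe outcome in between
  tibble(low, p_low, high, p_high = 1 - p_low, safe) %>%
    mutate(rare = case_when(
      p_low  < .2 ~ "unattractive", # rare risky outcome lies below the safe one
      p_high < .2 ~ "attractive",   # rare risky outcome lies above the safe one
      TRUE        ~ "none"
    ))
}

# stratified selection: 20 gambles per rarity class from 10,000 candidates
sr_sketch <- generate_safe_risky_sketch(10000) %>%
  group_by(rare) %>%
  slice_sample(n = 20) %>%
  ungroup()
```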
### Parameters
**Switching probability:** In the simulation framework below, $s$ is the probability increment added to the unbiased probability $p = .5$ with which agents draw the successive single sample from the same prospect from which they drew their most recent sample. $s$ is varied between -.5 and .4 in increments of .1. To ease interpretation during data analysis, $s$ is transformed to $s_{rec} = 1-(p + s)$ after the simulation, i.e., the probability of switching prospects, which ranges from .1 to 1.
**Boundary type:** Either the minimum value *any* prospect's sample sum must reach (absolute) or the minimum value the difference of these sums must reach (relative).
**Boundary value:** To avoid strict assumptions about the internal boundaries people might apply, we start with a relatively large parameter range: for comprehensive sampling, the value $a$ is varied from 15 to 35 in increments of 5; for piecewise sampling, from 1 to 7 in increments of 2. We will later explore which parameter values, in combination with the other parameter settings, produce plausible sample sizes; the accumulated evidence on decisions from experience indicates that people rely on rather small samples [e.g., @wulffMetaanalyticReviewTwo2018].
**Noise parameter:** Representations of the sampled outcomes are assumed to be stochastic. Therefore, Gaussian noise $\epsilon \sim N(0, \sigma)$ in units of the outcomes is added. To reduce computational load, $\sigma$ is fixed to .5. A sketch of the resulting parameter grid follows below.
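To make these settings concrete, the following sketch spells out the implied parameter grids and how each boundary type translates into a stopping rule (all names are assumptions for illustration, not the original implementation):

```{r eval = FALSE}
# parameter grids implied by the definitions above (illustrative names)
grid_comprehensive <- expand_grid(s = seq(-.5, .4, .1),
                                  boundary = c("absolute", "relative"),
                                  a = seq(15, 35, 5),
                                  sigma = .5)
grid_piecewise <- expand_grid(s = seq(-.5, .4, .1),
                              boundary = c("absolute", "relative"),
                              a = seq(1, 7, 2),
                              sigma = .5)

# stopping rule for both boundary types, given the running sample sums
boundary_reached <- function(A_sum, B_sum, a, boundary) {
  if (boundary == "absolute") max(A_sum, B_sum) >= a
  else abs(A_sum - B_sum) >= a
}
```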
## Simulation
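The full simulation code is truncated in this commit view. As a hypothetical sketch of a single comprehensive sampling process under the parameter definitions above (the helper logic and column names such as `a_o1` or `b_o1` are assumptions):

```{r eval = FALSE}
# Sketch of one comprehensive sampling process (assumed logic): draw noisy
# single samples, stay on the attended prospect with probability .5 + s,
# and stop once the boundary criterion is met.
simulate_comprehensive_sketch <- function(gamble, s, sigma, a, boundary) {
  A_sum <- 0; B_sum <- 0; n <- 0
  attended <- sample(c("A", "B"), 1) # random initial prospect
  repeat {
    n <- n + 1
    if (attended == "A") { # risky prospect
      out <- sample(c(gamble$a_o1, gamble$a_o2), 1,
                    prob = c(gamble$a_p1, gamble$a_p2))
      A_sum <- A_sum + out + rnorm(1, 0, sigma) # noisy representation
    } else { # safe prospect
      B_sum <- B_sum + gamble$b_o1 + rnorm(1, 0, sigma)
    }
    reached <- if (boundary == "absolute") max(A_sum, B_sum) >= a
               else abs(A_sum - B_sum) >= a
    if (reached) break
    # stay on the same prospect with probability .5 + s, otherwise switch
    if (runif(1) >= .5 + s) attended <- setdiff(c("A", "B"), attended)
  }
  tibble(n_sample = n, A_sum = A_sum, B_sum = B_sum,
         choice = if (A_sum >= B_sum) "A" else "B")
}
```

Repeating such runs over agents, gambles, and the parameter grid would yield a data set like `sim_comprehensive`, which the chunk below summarizes.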
```{r}
# dataset
# ... (parameter grid and simulation code truncated in this commit view)

summary_comprehensive <- sim_comprehensive %>%
  group_by(s, sigma, boundary, a, gamble, agent) %>% # one sampling process per group
  mutate(n_sample = n(), # number of single samples
         switch = case_when(attended != lag(attended) ~ 1,
                            attended == lag(attended) ~ 0),
         n_switch = sum(switch, na.rm = TRUE), # number of switches
         a_ev_exp = round(mean(A, na.rm = TRUE), 2), # experienced expected value of A
         b_ev_exp = round(mean(B, na.rm = TRUE), 2)) %>% # experienced expected value of B
  filter(!is.na(choice)) %>% # only return choice data (last obs of each sampling process)
  select(!c(attended, A, B, switch)) %>%
  ungroup() %>%
  mutate(strategy = "comprehensive",
         s = 1 - (s + .5)) %>% # recode s as switching probability
  select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch,
         A_sum, B_sum, diff, a_ev_exp, b_ev_exp, choice)

write_rds(summary_comprehensive, "./R/data/summary_comprehensive.rds")
```
```{r}
# ... (piecewise simulation code truncated in this commit view)
write_rds(sim_piecewise, "./R/data/sim_piecewise.rds")

summary_piecewise <- sim_piecewise %>%
  group_by(s, sigma, boundary, a, gamble, agent) %>% # one sampling process per group
  mutate(n_sample = n(), # number of single samples
         a_ev_exp = mean(A, na.rm = TRUE), # experienced expected value of A
         b_ev_exp = mean(B, na.rm = TRUE)) %>% # experienced expected value of B
  ungroup() %>%
  filter(!is.na(choice)) %>% # only return choice data (last obs of each sampling process)
  mutate(strategy = "piecewise",
         s = 1 - (s + .5), # recode s as switching probability
         diff = wdiff,
         n_switch = (round * 2) - 1) %>% # number of switches implied by the rounds
  select(!c(attended, A, B, A_rmean, B_rmean, rdiff, A_win, B_win, wdiff)) %>%
  select(strategy, s:gamble, rare, a_p1:ev_ratio, agent, n_sample, n_switch,
         A_sum, B_sum, diff, a_ev_exp, b_ev_exp, choice)

write_rds(summary_piecewise, "./R/data/summary_piecewise.rds")
```
### Summary
```{r eval = FALSE}
sr_data <- bind_rows(summary_comprehensive, summary_piecewise) %>%
  mutate(across(c(strategy, boundary, a, gamble, agent, rare, choice), as.factor)) # convert to factors

write_rds(sr_data, "./R/data/sr_data.rds")
```
# Descriptive Analysis
```{r}
data <- read_rds("./R/data/sr_data.rds")
gambles <- read_rds("./R/data/sr_subset.rds")
```
## Determinants of Sample Size
```{r}
data %>%
  group_by(strategy, boundary, a, s) %>%
  summarise(median = round(median(n_sample), 0),
            min = min(n_sample),
            max = max(n_sample)) %>%
  arrange(desc(median)) %>%
  View() # interactive inspection; does not render in knitted output
```
Below, the median sample sizes of all strategy-parameter combinations ($2 \leq \tilde{x} \leq 162$) are plotted as circles.
```{r}
data %>%
  group_by(strategy, boundary, a, s) %>%
  summarise(g = as.factor(cur_group_id()),
            m = round(median(n_sample), 0)) %>%
  select(g, everything()) %>%
  ggplot(.) +
  geom_point(aes(x = reorder(g, m), y = m, color = a, size = s), alpha = .3) +
  guides(color = guide_legend(override.aes = list(size = 3))) +
  facet_grid(boundary ~ strategy, switch = "y", scales = "free_x") +
  coord_flip() +
  scale_x_discrete(breaks = NULL, name = NULL) +
  scale_y_continuous(breaks = seq(0, 170, 5), name = "Median Sample Size")
```
### Boundary value
Both sampling strategies show a similar effect of the boundary value (coloring) on sample size: larger boundary values lead to larger sample sizes, reflected in the clustering of colors. This is an immediate consequence of sequential evidence accumulation, as more samples are needed to reach a higher boundary. The plot below depicts this regularity.
```{r}
data %>%
  group_by(strategy, boundary, a, s) %>%
  summarise(g = as.factor(cur_group_id()),
            m = round(median(n_sample), 0)) %>%
  ggplot(.) +
  geom_col(aes(x = a, y = m, fill = a)) +
  facet_grid(boundary ~ strategy, switch = "y", scales = "free")
```
### Boundary type
For both sampling strategies, relative (as compared to absolute) boundaries lead to larger sample sizes. This is explained by the fact that sequential sampling can only stabilize or reduce a prospect's distance to an absolute boundary, whereas the distance to a relative boundary can also increase, e.g., when the trailing prospect catches up. Below, this regularity is shown for each gamble.
```{r}
data %>%
  group_by(strategy, boundary, gamble) %>%
  summarise(m = median(n_sample)) %>%
  ggplot(.) +
  geom_bar(aes(x = strategy, y = m, fill = boundary), stat = "identity", position = "dodge") +
  facet_wrap(~gamble, nrow = 6) +
  scale_x_discrete(labels = c("comp", "piece"))
```
### Switching probability
For piecewise sampling, there is an inverse relationship between switching probability (circle size) and sample size: the lower the switching probability, the less frequently the prospects are compared, and boundaries are therefore approached only with larger sample sizes.

The regression plot below shows that this effect is particularly pronounced for low switching probabilities, such that the increase in sample size accelerates as the switching probability decreases. As a consequence, the magnitude of the boundary-value effect increases as well.
```{r}
data %>%
  filter(strategy == "piecewise") %>%
  ggplot(., aes(x = s, y = n_sample, color = a)) +
  geom_smooth() +
  scale_x_continuous(name = "Switching Probability") +
  scale_y_continuous(limits = c(0, 250), name = "Sample Size") +
  facet_wrap(~boundary, nrow = 2)
```
For comprehensive sampling, the effect of switching probability differs between boundary types. For absolute boundaries, switching probability has no apparent effect on sample size, as indicated by the clustering of differently sized circles for a given boundary value: the distance of a given prospect to its absolute boundary is not changed by switching to (and sampling from) the other prospect, beyond the mechanical effect that each switch requires at least one additional sample.

For relative boundaries, however, switching probability has a more nuanced effect on sample size. In particular, regressing sample size on switching probability across all gambles produces the odd pattern of a decelerating inverse relationship for small probabilities and an accelerating positive relationship for larger probabilities.
```{r}
data %>%
  filter(strategy == "comprehensive") %>%
  ggplot(., aes(x = s, y = n_sample, color = a)) +
  geom_smooth() +
  scale_x_continuous(name = "Switching Probability") +
  scale_y_continuous(name = "Sample Size") +
  facet_wrap(~boundary, nrow = 2)
```
Inspecting the gambles separately, one does not observe a U-shaped relation but rather two clusters of gambles, one showing an inverse relation and the other a positive one:
```{r}
data %>%
  filter(strategy == "comprehensive" & boundary == "relative") %>%
  ggplot(., aes(x = s, y = n_sample, color = a)) +
  geom_smooth(method = "lm") +
  scale_x_continuous(name = "Switching Probability") +
  scale_y_continuous(name = "Sample Size") +
  facet_wrap(~gamble, nrow = 5)
```
Comparing the features of these gambles, the positive cluster shows small differences in the expected values (EVs) of the prospects, whereas the inverse cluster is marked by larger EV differences. For all gambles, the distance of a given prospect to its relative boundary is reduced by switching to and sampling from the other prospect. If the EV difference is small, frequent switching leads to prospects oscillating between approaching and moving away from the relative boundary; that is, for small EV differences, frequent switching reduces the probability that the relative boundary is reached with the subsequent sample(s) of a given prospect.
```{r}
data %>%
  filter(strategy == "comprehensive" & boundary == "relative") %>%
  filter(ev_ratio >= .5 & ev_ratio <= 1.5) %>%
  ggplot(., aes(x = s, y = n_sample, color = a)) +
  geom_smooth(method = "lm") +
  scale_x_continuous(name = "Switching Probability") +
  scale_y_continuous(name = "Sample Size") +
  facet_wrap(~gamble, nrow = 5)
```
In contrast, a necessary (though not sufficient) condition for large EV differences is a large difference between the outcomes of the risky prospect, indicating that one prospect is markedly better than the other. Less frequent switching may thus lead to larger sample sizes than the diagnosticity of the extreme outcomes demands.
```{r}
data %>%
  filter(strategy == "comprehensive" & boundary == "relative") %>%
  filter(ev_ratio < .5 | ev_ratio > 1.5) %>%
  ggplot(., aes(x = s, y = n_sample, color = a)) +
  geom_smooth(method = "lm") +
  scale_x_continuous(name = "Switching Probability") +
  scale_y_continuous(name = "Sample Size") +
  facet_wrap(~gamble, nrow = 5)
```
# References