Synthetic choice data from decisions from experience (DfE) are generated by applying different strategies of sample integration to choice problems between two prospects.

The synthetic data are explored for characteristic choice patterns produced by comprehensive and piecewise forms of sample integration under varying structures of the environment (gamble features) and aspects of the sampling and decision behavior (model parameters).

# Summary

...

...

@@ -37,19 +40,49 @@ Provide short summary of simulation study results.

# Introduction

A formal introduction to sampling in DfE and the data-generating models of this study can be found [here](https://arc-git.mpib-berlin.mpg.de/sampling-strategies-in-dfe/sampling-strategies-in-dfe/-/blob/main/modeling-sampling.Rmd).

## Prospects

Let a single prospect be a *probability space* $(\Omega, \Sigma, P)$ [cf. @kolmogorovFoundationsTheoryProbability1950]. $\Omega$ is the *sample space* containing a finite set of possible outcomes $\{\omega_1, ..., \omega_n\}$.

$\Sigma$ is a set of subsets of $\Omega$, i.e., the *event space*.

$P$ is then a *probability mass function* (PMF) which maps the event space to the set of real numbers in the interval between 0 and 1: $P: \Sigma \to [0,1]$.

I.e., the PMF assigns each elementary event $\{\omega_i\}$ a probability $p_i = P(\{\omega_i\})$ with $0 \leq p_i \leq 1$ and $\sum_{i=1}^{n} p_i = 1$.

The PMF also fulfills the condition $P(\Omega) = 1$.
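
The conditions above can be checked mechanically for any concrete prospect. The following is a minimal sketch, assuming a prospect is stored as a data frame of outcomes and probabilities; the names `toy_prospect`, `is_valid_pmf`, and `event_prob` are hypothetical and not part of the study's code.

```r
# Hypothetical prospect: three outcomes (sample space) with their probabilities.
toy_prospect <- data.frame(
  outcome = c(0, 4, 10),
  prob    = c(0.1, 0.6, 0.3)
)

# Check that each probability lies in [0, 1] and that they sum to 1.
is_valid_pmf <- function(prob, tol = 1e-8) {
  all(prob >= 0 & prob <= 1) && abs(sum(prob) - 1) < tol
}

# Probability of an event, i.e., of a subset of the sample space,
# selected by a predicate on the outcomes.
event_prob <- function(prospect, in_event) {
  sum(prospect$prob[in_event(prospect$outcome)])
}

is_valid_pmf(toy_prospect$prob)
event_prob(toy_prospect, function(x) x >= 4)  # event "outcome is at least 4"
```

Summing over the whole sample space, e.g. `event_prob(toy_prospect, function(x) TRUE)`, corresponds to the condition $P(\Omega) = 1$.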

## Monetary Prospects as Random Variables

We can define a random variable on the probability space of a prospect by defining a function that maps the sample space to a measurable space: $X: \Omega \to E$, where $E = \mathbb{R}$.

Hence, every subset of $E$ has a preimage in $\Sigma$ and can be assigned a probability.

In choice problems, where agents are asked to make a decision between $n$ monetary prospects, the mapping $\Omega \to E$ is often implicit since all elements of $\Omega$ are real numbers (monetary gains or losses) and usually equal to the elements in $\Sigma$.

## Sampling in Decisions from Experience (DfE)

In DfE [@hertwigDecisionsExperienceEffect2004], where no summary description of prospects' probability spaces is provided, agents either first explore the prospects before arriving at a final choice (*sampling paradigm*), or exploration and exploitation occur simultaneously (*partial-* or *full-feedback paradigm*) [cf. @hertwigDescriptionExperienceGap2009].

Below, only the sampling paradigm is considered.

In the context of choice problems between monetary gambles, we define a *single sample* as an outcome obtained when randomly drawing from a prospect's sample space $\Omega$.

Technically, a single sample is thus the realization of a discrete random variable $X$, which fulfills the conditions outlined above.

In general terms, we define a *sampling strategy* as a systematic approach to generate a sequence of single samples from a choice problem's prospects as a means of exploring their probability spaces.

Single samples that are generated from the same prospect reflect a sequence of realizations of random variables that are independent and identically distributed.
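
To illustrate what a sequence of independent and identically distributed single samples looks like, here is a minimal sketch in R. The prospect values and the helper `draw_samples` are hypothetical; the study's own sampling code may differ.

```r
set.seed(1)

# Hypothetical risky prospect with a frequent low and a rare high outcome.
risky_outcomes <- c(2, 15)
risky_probs    <- c(0.8, 0.2)

# Draw n single samples from one prospect. Sampling indices with replacement
# makes the draws independent and identically distributed; indexing via
# sample.int() also works for safe prospects with a single outcome.
draw_samples <- function(n, outcomes, probs) {
  idx <- sample.int(length(outcomes), size = n, replace = TRUE, prob = probs)
  outcomes[idx]
}

experienced <- draw_samples(1000, risky_outcomes, risky_probs)

# Relative frequencies of the sampled outcomes approximate the probabilities.
prop.table(table(experienced))
```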

### Sampling Strategies and Sample Integration

...

# Method

## Test set

Under each condition, i.e., each strategy-parameter combination, all gambles are played by 100 synthetic agents. We test a set of gambles in which one prospect contains a single safe outcome and the other contains two risky outcomes (*safe-risky gambles*).

To this end, 60 gambles are sampled from an initial set of 10,000.

Both outcomes and probabilities are drawn from uniform distributions, ranging from 0 to 20 for outcomes and from .01 to .99 for the probabilities of the lower risky outcomes, $p_{low}$.

The probabilities of the higher risky outcomes are $1-p_{low}$.

To rule out dominance, safe outcomes fall between the two risky outcomes.

The table below contains the test set of 60 gambles.

Sampling of gambles was stratified: 20 gambles each were randomly drawn with no rare outcome, an attractive rare outcome, and an unattractive rare outcome. Risky outcomes are considered *"rare"* if their probability is $p < .2$ and *"attractive"* (*"unattractive"*) if they are higher (lower) than the safe outcome.
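
The construction described above could be sketched as follows. This is an illustration under assumptions, not the script that produced `sr_subset.csv`: in particular, placing the safe outcome between the risky outcomes by sorting three uniform draws is an assumed mechanism, and all object names (`toy_gambles`, `toy_test_set`) are hypothetical.

```r
set.seed(42)
n_initial <- 10000

# Three outcomes per gamble from U(0, 20); sorting each row places the safe
# outcome between the two risky outcomes (assumed mechanism).
draws  <- matrix(runif(3 * n_initial, min = 0, max = 20), ncol = 3)
sorted <- t(apply(draws, 1, sort))

toy_gambles <- data.frame(
  r_low  = sorted[, 1],
  safe   = sorted[, 2],
  r_high = sorted[, 3],
  p_low  = runif(n_initial, min = 0.01, max = 0.99)
)
toy_gambles$p_high <- 1 - toy_gambles$p_low

# A risky outcome is rare if its probability is below .2; the higher outcome
# is attractive and the lower one unattractive relative to the safe outcome.
toy_gambles$rare <- ifelse(toy_gambles$p_high < 0.2, "attractive",
                    ifelse(toy_gambles$p_low < 0.2, "unattractive", "none"))

# Stratified sampling: 20 gambles from each of the three categories.
strata <- split(seq_len(n_initial), toy_gambles$rare)
idx    <- unlist(lapply(strata, function(i) sample(i, 20)))
toy_test_set <- toy_gambles[idx, ]

table(toy_test_set$rare)
```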

```{r message=FALSE}

gambles <- read_csv("data/gambles/sr_subset.csv")

...

...

@@ -58,10 +91,10 @@ gambles %>% kable()

## Model Parameters

**Switching probability** $s$ is the probability with which agents draw the following single sample from the prospect they did not get their most recent single sample from.

$s$ is varied from .1 to 1 in increments of .1.

The **boundary type** is either the minimum value any prospect's sample statistic must reach (absolute) or the minimum value for the difference of these statistics (relative).

Sample statistics are sums over outcomes (comprehensive strategy) and sums over wins (piecewise strategy), respectively.

For comprehensive integration, the **boundary value** $a$ is varied from 15 to 75 in increments of 15.

...

...

@@ -83,8 +116,8 @@ In sum, 2 (strategies) x 60 (gambles) x 100 (agents) x 100 (parameter combinatio

# Results

Because we are not interested in deviations from normative choice due to sampling artifacts (e.g., ceiling effects produced by low boundaries), we remove trials in which only one prospect was attended.

In addition, we use relative frequencies of sampled outcomes rather than 'a priori' probabilities to compare actual against normative choice behavior.

```{r}

# remove choices where prospects were not attended

Removing the respective trials, we are left with `r nrow(choices)` choices.

## Sample Size

...

...

@@ -114,7 +147,7 @@ The median sample sizes generated by different parameter combinations ranged fro

### Boundary type and boundary value (a)

As evidence is accumulated sequentially, relative boundaries and large boundary values naturally lead to larger sample sizes, irrespective of the integration strategy.

```{r message=FALSE}

group_med <- samples_piecewise %>%

...

...

@@ -154,9 +187,9 @@ samples_comprehensive %>%

### Switching probability (s)

For piecewise integration, there is an inverse relationship between switching probability and sample size.

I.e., the lower $s$, the less frequently prospects are compared, and thus boundaries are reached only with larger sample sizes.

This effect is particularly pronounced for low switching probabilities, such that the increase in sample size accelerates as switching probability decreases.

```{r message=FALSE}

group_med <- samples_piecewise %>%

...

...

@@ -176,9 +209,9 @@ samples_piecewise %>%

theme_minimal()

```

For comprehensive integration, boundary types differ in the effects of switching probability.

For absolute boundaries, switching probability has no apparent effect on sample size as the distance of a given prospect to its absolute boundary is not changed by switching to (and sampling from) the other prospect.

For relative boundaries, however, sample sizes increase with switching probability.

```{r message=FALSE}

group_med <- samples_comprehensive %>%

...

...

@@ -200,7 +233,7 @@ samples_comprehensive %>%

## Choice Behavior

Below, extending Hills and Hertwig [-@hillsInformationSearchDecisions2010], we investigate how integration strategies, gamble features, and model parameters interact in their effects on choice behavior in general and in their contribution to the underweighting of rare events in particular.

We apply two definitions of underweighting of rare events: Considering false response rates, we define underweighting such that the rarity of an attractive (unattractive) outcome leads agents to choose the safe (risky) prospect although the risky (safe) prospect has the higher expected value.

```{r message=FALSE}

...

...

@@ -215,7 +248,7 @@ fr_rates <- choices %>%

filter(!is.na(type)) # remove correct responses

```

Considering the parameters of Prelec's [-@prelecProbabilityWeightingFunction1998] implementation of the weighting function [CPT; cf. @tverskyAdvancesProspectTheory1992], underweighting is reflected by decision weights estimated to be smaller than the corresponding objective probabilities.
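
As an illustration of the second definition, the sketch below evaluates Prelec's two-parameter weighting function, $w(p) = \exp(-\delta(-\ln p)^{\gamma})$, and flags probabilities whose decision weight falls below the objective probability. Whether the one- or two-parameter form is fitted in this study is not stated here, so the function `prelec_w` and the parameter values are assumptions chosen only for demonstration.

```r
# Prelec's two-parameter probability weighting function.
# gamma controls curvature, delta controls elevation (delta = 1 gives the
# one-parameter form).
prelec_w <- function(p, gamma, delta = 1) {
  exp(-delta * (-log(p))^gamma)
}

# Probabilities of rare events as defined in this study (p < .2).
p_rare <- c(0.01, 0.05, 0.10, 0.19)

# Illustrative parameter value: gamma > 1 yields an S-shaped curve that
# assigns small probabilities weights below the probabilities themselves.
w_rare <- prelec_w(p_rare, gamma = 1.5)

data.frame(p = p_rare, w = w_rare, underweighted = w_rare < p_rare)
```

With `gamma < 1` the curve takes the inverse S-shape instead, so that small probabilities receive weights above the objective probabilities.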

The false response rates generated by different parameter combinations ranged from `r min(fr_rates_piecewise$rate)` to `r max(fr_rates_piecewise$rate)` for piecewise integration and from `r min(fr_rates_comprehensive$rate)` to `r max(fr_rates_comprehensive$rate)` for comprehensive integration.

However, false response rates vary considerably as a function of rare events, indicating that the presence and attractiveness of rare events are major determinants of false response rates.
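
For readers who want to see how a false response can be classified, here is a minimal sketch on toy data. The columns `ev_safe`, `ev_risky`, and `choice` are hypothetical stand-ins; in the study, normative choices are derived from the relative frequencies of sampled outcomes, which is not reproduced here.

```r
# Toy choice data: expected values per prospect and the observed choice.
toy_choices <- data.frame(
  ev_safe  = c(5, 5, 8, 8),
  ev_risky = c(6, 4, 9, 7),
  choice   = c("safe", "risky", "risky", "safe"),
  stringsAsFactors = FALSE
)

# Normative choice: the prospect with the higher expected value.
toy_choices$norm <- ifelse(toy_choices$ev_risky > toy_choices$ev_safe,
                           "risky", "safe")

# Response type: NA for correct responses, otherwise the kind of false response.
toy_choices$type <- ifelse(toy_choices$choice == toy_choices$norm,
                           NA_character_,
                           paste0("false_", toy_choices$choice))

# False response rates relative to all choices (table() drops NA by default).
toy_fr_rates <- table(toy_choices$type) / nrow(toy_choices)
toy_fr_rates
```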

```{r message=FALSE}

...

...

@@ -235,10 +268,10 @@ fr_rates %>%

kable()

```

The heatmaps below show the false response rates for all strategy-parameter combinations.

Consistent with our (somewhat rough) definition of underweighting, the rate of false risky responses is generally higher if the unattractive outcome of the risky prospect is rare (top panel).

Conversely, if the attractive outcome of the risky prospect is rare, the rate of false safe responses is generally higher (bottom panel).

As indicated by the larger range of false response rates, the effects of rare events are considerably larger for piecewise integration.

```{r message=FALSE}

fr_rates %>%

...

...

@@ -306,7 +339,7 @@ fr_rates %>%

#### Switching Probability (s) and Boundary Value (a)

Because, for both piecewise and comprehensive integration, the differences between boundary types are rather minor and concern magnitude rather than qualitative pattern, the remaining analyses of false response rates are summarized across absolute and relative boundaries.

Below, the $s$ and $a$ parameters are considered as additional sources of variation in the false response pattern above and beyond the interplay of integration strategies and the rarity and attractiveness of outcomes.

...

...

@@ -342,18 +375,346 @@ fr_rates %>%

theme_minimal()

```

For piecewise integration, switching probability is naturally related to the size of the samples on which the round-wise comparisons of prospects are based, with low values of $s$ indicating large samples and vice versa.

Accordingly, switching probability is positively related to false response rates.

I.e., the larger the switching probability, the smaller the round-wise sample size and, consequently, the probability of experiencing a rare event within a given round.

Because round-wise comparisons are independent of each other and binomial distributions within a given round are skewed for small samples and small outcome probabilities [@kolmogorovFoundationsTheoryProbability1950], increasing boundary values do not reverse but rather amplify this relation.
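
The link between switching probability and the chance of missing a rare event can be made concrete with a small calculation. The sketch assumes that the agent switches after each single sample with probability $s$, so that the expected number of consecutive samples from one prospect is $1/s$; the rare-event probability of .1 is an arbitrary illustrative value.

```r
p_rare <- 0.1              # illustrative probability of the rare outcome
s      <- c(0.1, 0.5, 1)   # switching probabilities

# Expected round-wise sample size under the assumption stated above.
round_n <- round(1 / s)

# Probability of not observing the rare outcome at all within a round,
# i.e., the binomial probability of zero successes in round_n draws.
p_miss <- dbinom(0, size = round_n, prob = p_rare)

data.frame(s = s, round_n = round_n, p_miss = p_miss)
```

Under these assumptions, the probability of missing the rare outcome in a round rises with $s$, which is in line with the positive relation between switching probability and false response rates described above.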

For comprehensive integration, switching probability is negatively related to false response rates, i.e., an increase in $s$ is associated with decreasing false response rates.

This relation, however, may be the result of an artificial interaction between the $s$ and $a$ parameters.

Specifically, in the current algorithmic implementation of sampling with a comprehensive integration mechanism, decreasing switching probabilities cause comparisons of prospects to be based on increasingly unequal sample sizes immediately after switching prospects.

Consequently, reaching (low) boundaries is a function of switching probability and the associated sample sizes rather than of actual evidence for one prospect over the other.

### Cumulative Prospect Theory

In the following, we examine the possible relations between the parameters of the *choice-generating* sampling models and the *choice-describing* cumulative prospect theory.

For each distinct strategy-parameter combination, we ran 20 chains of 40,000 iterations each, after a warm-up period of 1,000 samples.

To reduce potential autocorrelation during the sampling process, we only kept every 20th sample (thinning).
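
As a quick check of what these settings imply, the number of retained posterior samples can be computed as follows. The sketch assumes that the 40,000 iterations per chain are counted after warm-up; the variable names are hypothetical.

```r
n_chains <- 20      # chains per strategy-parameter combination
n_iter   <- 40000   # iterations per chain (assumed to exclude warm-up)
n_thin   <- 20      # keep every 20th sample

# Retained samples per chain and in total after thinning.
kept_per_chain <- length(seq(n_thin, n_iter, by = n_thin))
kept_total     <- n_chains * kept_per_chain

c(per_chain = kept_per_chain, total = kept_total)
```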

labs(title = "Piecewise Integration: Value functions",

x = "p",

y= "w(p)") +

scale_color_viridis() +

theme_minimal()

```

#### Comprehensive Integration

##### Weighting function w(p)

We start by plotting the weighting curves for all parameter combinations under piecewise integration.

In the following, we examine the possible relations between the parameters of the *choice-generating* sampling models and the *choice-describing* cumulative prospect theory.