aind_behavior_gym.dynamic_foraging.task package

Submodules

aind_behavior_gym.dynamic_foraging.task.base module

A general gymnasium environment for dynamic foraging tasks in AIND.

Adapted from Han’s code for the project in Neuromatch Academy: Deep Learning https://github.com/hanhou/meta_rl/blob/bd9b5b1d6eb93d217563ff37608aaa2f572c08e6/han/environment/dynamic_bandit_env.py

See also Po-Chen Kuo’s implementation: https://github.com/pckuo/meta_rl/blob/main/environments/bandit/bandit.py

class aind_behavior_gym.dynamic_foraging.task.base.DynamicForagingTaskBase(reward_baiting: bool = False, allow_ignore: bool = False, num_arms: int = 2, num_trials: int = 1000, seed=None)

Bases: Env

A general gymnasium environment for the dynamic bandit task.

Adapted from https://github.com/thinkjrs/gym-bandit-environments/blob/master/gym_bandits/bandit.py

generate_new_trial(): Generate p_reward for a new trial. Note that self.trial has already been incremented by 1 when this method runs.

generate_reward(action): Compute the reward. Subclasses can override this method to implement more complex reward structures.

get_choice_history(): Return the history of actions in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.

get_p_reward(): Return the reward probabilities for each arm in each trial, in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.

get_reward_history(): Return the history of rewards in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.

reset(options={}): The reset method will be called to initiate a new episode. You may assume that the step method will not be called before reset has been called. Moreover, reset should be called whenever a done signal has been issued. This should NOT automatically reset the task! Resetting the task is handled in the wrapper.

step(action): Execute one step in the environment. Should return (observation, reward, terminated, truncated, info). If terminated or truncated is true, the user needs to call reset().
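The reset/step contract described above can be sketched with a toy stand-in. This example does not import aind_behavior_gym: `ToyTwoArmedBandit` and `run_episode` are hypothetical names, the reward probabilities are fixed, and `reset` is assumed to return an (observation, info) pair as in the Gymnasium convention. It only illustrates how a caller might consume the five-element tuple returned by step.

```python
import random


class ToyTwoArmedBandit:
    """Hypothetical stand-in that mimics the reset/step contract.

    This is NOT DynamicForagingTaskBase; reward probabilities never change.
    """

    def __init__(self, p_reward=(0.2, 0.8), num_trials=100, seed=None):
        self.p_reward = p_reward
        self.num_trials = num_trials
        self.rng = random.Random(seed)
        self.trial = 0

    def reset(self):
        # Start a new episode; return (observation, info).
        self.trial = 0
        return 0, {}

    def step(self, action):
        # Bernoulli reward for the chosen arm.
        reward = 1.0 if self.rng.random() < self.p_reward[action] else 0.0
        self.trial += 1
        terminated = self.trial >= self.num_trials
        truncated = False
        return 0, reward, terminated, truncated, {}


def run_episode(env, policy_rng):
    """Run one episode with a uniformly random policy."""
    observation, info = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy_rng.randrange(2)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        # Either flag ends the episode; the caller must reset before stepping again.
        done = terminated or truncated
    return total_reward, steps


env = ToyTwoArmedBandit(seed=0)
total_reward, steps = run_episode(env, random.Random(1))
```

The loop stops as soon as either terminated or truncated is true, which is the point at which the documentation above says reset() must be called again.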

aind_behavior_gym.dynamic_foraging.task.coupled_block_task module

Coupled block task for the dynamic bandit environment. This is very close to the task used in mouse training.

First coded by Han for the project in Neuromatch Academy: Deep Learning https://github.com/hanhou/meta_rl/blob/bd9b5b1d6eb93d217563ff37608aaa2f572c08e6/han/environment/dynamic_bandit_env.py

class aind_behavior_gym.dynamic_foraging.task.coupled_block_task.CoupledBlockTask(block_min: int = 40, block_max: int = 80, block_beta: int = 20, p_reward_pairs: List[List[float]] | None = None, **kwargs)

Bases: DynamicForagingTaskBase

Coupled block task for dynamic foraging

The default settings roughly match those used in this paper: https://www.sciencedirect.com/science/article/pii/S089662731930529X

generate_new_trial(): Override the base class method to generate the next trial for the coupled block task.

reset(): Reset the task.

aind_behavior_gym.dynamic_foraging.task.coupled_block_task.generate_trunc_exp(lower, upper, beta, n=1, rng=None): Generate n samples from a truncated exponential distribution.
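One way to sample from a truncated exponential distribution is rejection sampling, sketched below with the standard library only. This is not the package's implementation: `sample_trunc_exp` is a hypothetical name, and the parameterization (an exponential with mean `beta`, shifted to start at `lower` and redrawn whenever it exceeds `upper`) is an assumption about what the arguments mean.

```python
import random


def sample_trunc_exp(lower, upper, beta, n=1, rng=None):
    """Draw n values from lower + Exp(mean=beta), redrawing values above upper.

    Assumed parameterization; the real generate_trunc_exp may differ.
    """
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    if beta <= 0:
        raise ValueError("beta must be positive")
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        # random.expovariate takes the rate, i.e. 1 / mean.
        x = lower + rng.expovariate(1.0 / beta)
        if x <= upper:
            samples.append(x)
    return samples


# Example with the CoupledBlockTask defaults (block_min=40, block_max=80, block_beta=20).
block_lengths = sample_trunc_exp(40, 80, 20, n=1000, rng=random.Random(0))
```

With these values about 86% of draws are accepted, since the probability that an exponential with mean 20 stays below 40 is 1 - exp(-2). A task that needs integer block lengths would still have to round the samples.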

aind_behavior_gym.dynamic_foraging.task.random_walk_task module

Random walk task for the dynamic bandit environment.

class aind_behavior_gym.dynamic_foraging.task.random_walk_task.RandomWalkTask(p_min=[0, 0], p_max=[1, 1], sigma=[0.15, 0.15], mean=[0, 0], **kwargs)

Bases: DynamicForagingTaskBase

Generate a reward schedule with a random walk

(see Miller et al. 2021, https://www.biorxiv.org/content/10.1101/461129v3.full.pdf)

generate_new_trial(): Generate a new trial. Overrides the base class method.

plot_reward_schedule(): Plot the reward schedule and compute the auto-correlation.

reset(seed=None): Reset the task. Remember to call the base class reset at the end.

aind_behavior_gym.dynamic_foraging.task.random_walk_task.auto_corr(data): Utility function to compute the auto-correlation of the data.
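A bounded Gaussian random walk for a single arm might look like the sketch below. The function name `random_walk_schedule`, the uniform starting value, and the use of clipping at the bounds are all assumptions; RandomWalkTask may initialize the walk or handle the boundaries differently. The arguments mirror one element of the constructor's p_min, p_max, sigma, and mean lists.

```python
import random


def random_walk_schedule(num_trials, p_min=0.0, p_max=1.0, sigma=0.15, mean=0.0, rng=None):
    """Return per-trial reward probabilities for one arm.

    Each trial adds a Gaussian step N(mean, sigma) and clips the result to
    [p_min, p_max]. Start value and clipping are assumptions of this sketch.
    """
    rng = rng or random.Random()
    p = rng.uniform(p_min, p_max)
    schedule = [p]
    for _ in range(num_trials - 1):
        p = p + rng.gauss(mean, sigma)
        p = min(max(p, p_min), p_max)  # keep the probability inside the bounds
        schedule.append(p)
    return schedule


# Two independent arms, using the default parameters from the class signature.
rng = random.Random(0)
left = random_walk_schedule(1000, rng=rng)
right = random_walk_schedule(1000, rng=rng)
```

Clipping makes the walk spend extra time exactly at the bounds; a reflecting boundary would be an alternative design with different statistics.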
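For reference, a normalized auto-correlation can be computed as in the following sketch. The name `auto_corr_sketch`, the `max_lag` argument, and the normalization (dividing by the lag-0 sum of squares so that lag 0 equals 1) are assumptions; the package's auto_corr may normalize differently or rely on NumPy.

```python
def auto_corr_sketch(data, max_lag=None):
    """Return the auto-correlation of data for lags 0..max_lag.

    Values are normalized by the lag-0 sum of squared deviations,
    so the first element is always 1.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(data) / n
    centered = [x - mean for x in data]
    denom = sum(c * c for c in centered)
    if denom == 0:
        raise ValueError("data has zero variance")
    if max_lag is None:
        max_lag = n - 1
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]


acf = auto_corr_sketch([1, 2, 3, 4, 5], max_lag=2)
```

Applied to a random-walk reward schedule, a slowly decaying auto-correlation indicates that reward probabilities on nearby trials remain similar.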

aind_behavior_gym.dynamic_foraging.task.uncoupled_block_task module

Uncoupled block task for the dynamic bandit environment.

See /test/test_uncoupled_block_task.py for usage.

class aind_behavior_gym.dynamic_foraging.task.uncoupled_block_task.UncoupledBlockTask(rwd_prob_array=[0.1, 0.5, 0.9], block_min=20, block_max=35, persev_add=True, perseverative_limit=4, max_block_tally=4, **kwargs)

Bases: DynamicForagingTaskBase

Generate uncoupled block reward schedule (by on-line updating)

Adapted from the Cohen lab’s Arduino code (with some bug fixes): https://github.com/JeremiahYCohenLab/arduinoLibraries/blob/master/libraries/task_operantMatchingDecoupledBait/task_operantMatchingDecoupledBait.cpp

See Grossman et al. 2022:

In the final stage of the task, the reward probabilities assigned to each lick spout were drawn pseudorandomly from the set {0.1, 0.5, 0.9} in all the mice from the behavior experiments (n=46), all the mice from the DREADDs experiments (n=10), and half of the mice from the electrophysiology experiments (n=2). The other half of mice from the electrophysiology experiments (n=2) were run on a version of the task with probabilities drawn from the set {0.1, 0.4, 0.7}. The probabilities were assigned to each spout individually with block lengths drawn from a uniform distribution of 20–35 trials. To stagger the blocks of probability assignment for each spout, the block length for one spout in the first block of each session was drawn from a uniform distribution of 6–21 trials. For each spout, probability assignments could not be repeated across consecutive blocks. To maintain task engagement, reward probabilities of 0.1 could not be simultaneously assigned to both spouts. If one spout was assigned a reward probability greater than or equal to the reward probability of the other spout for 3 consecutive blocks, the probability of that spout was set to 0.1 to encourage switching behavior and limit the creation of a direction bias. If a mouse perseverated on a spout with a reward probability of 0.1 for 4 consecutive trials, 4 trials were added to the length of both blocks. This procedure was implemented to keep mice from choosing one spout until the reward probability became high again.
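Some of the rules quoted above can be sketched in a few lines. The code below covers only four of them: independent block lengths drawn uniformly from 20 to 35 trials, a staggered first block of 6 to 21 trials for one side, no repeated probability across consecutive blocks on the same side, and no simultaneous 0.1 on both sides. It omits the three-consecutive-blocks rule and the perseveration extension, treats the ranges as inclusive, and uses hypothetical names (`draw_prob`, `uncoupled_schedule`); it is not the logic of UncoupledBlockTask.

```python
import random

PROBS = (0.1, 0.5, 0.9)


def draw_prob(previous, other_side, rng, probs=PROBS):
    """Pick a probability that differs from this side's previous value and
    does not leave both sides at the lowest value."""
    lowest = min(probs)
    candidates = [
        p for p in probs
        if p != previous and not (p == lowest and other_side == lowest)
    ]
    return rng.choice(candidates)


def uncoupled_schedule(num_trials, rng, block_min=20, block_max=35):
    """Return a list of (p_left, p_right) tuples, one per trial."""
    p = [None, None]
    p[0] = draw_prob(None, None, rng)
    p[1] = draw_prob(None, p[0], rng)
    remaining = [rng.randint(block_min, block_max) for _ in range(2)]
    # Stagger: one side (chosen at random here) gets a shorter first block.
    remaining[rng.randrange(2)] = rng.randint(6, 21)

    schedule = []
    for _ in range(num_trials):
        for side in (0, 1):
            if remaining[side] == 0:
                # This side's block has ended; start a new one.
                p[side] = draw_prob(p[side], p[1 - side], rng)
                remaining[side] = rng.randint(block_min, block_max)
        schedule.append((p[0], p[1]))
        remaining[0] -= 1
        remaining[1] -= 1
    return schedule


schedule = uncoupled_schedule(500, random.Random(0))
```

Because each side keeps its own countdown, block switches on the two sides drift apart over the session, which is what makes the schedule uncoupled.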

auto_shape_perseverance()

Anti-perseverance mechanism

See Grossman et al. 2022:

If a mouse perseverated on a spout with reward probability of 0.1 for 4 consecutive trials, 4 trials were added to the length of both blocks. This procedure was implemented to keep mice from choosing one spout until the reward probability became high again.
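The rule quoted above can be expressed as a small counter. In this sketch, `PerseverationTracker` is a hypothetical helper rather than part of the package; its `limit` and `extension` arguments are illustrative and are not claimed to match how the class's perseverative_limit is used. Restarting the count after each extension is also an assumption, since the quoted text does not say what happens on a fifth consecutive choice.

```python
class PerseverationTracker:
    """Count consecutive choices of a side whose reward probability is low."""

    def __init__(self, low_prob=0.1, limit=4, extension=4):
        self.low_prob = low_prob
        self.limit = limit
        self.extension = extension
        self.side = None
        self.count = 0

    def update(self, choice, p_reward):
        """Return the number of trials to add to both blocks after this choice.

        choice: index of the chosen side (0 or 1).
        p_reward: (p_left, p_right) for the current trial.
        """
        if p_reward[choice] != self.low_prob:
            # Chosen side is not the low-probability one: no perseveration.
            self.side, self.count = None, 0
        elif choice == self.side:
            self.count += 1
        else:
            self.side, self.count = choice, 1

        if self.count >= self.limit:
            self.count = 0  # assumption: start counting again after extending
            return self.extension
        return 0


tracker = PerseverationTracker()
# Five consecutive choices of side 0 while it pays out with probability 0.1.
added = [tracker.update(0, (0.1, 0.9)) for _ in range(5)]
```

Here the fourth consecutive low-probability choice triggers a four-trial extension, and the fifth starts a new count.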

generate_first_block(): Generate the first block. Note that the stagger is applied.

generate_new_trial(): Generate a new trial. Overrides the base class method.

generate_next_block(side, check_higher_in_a_row=True, check_both_lowest=True): Generate the next block for both sides (yes, very complicated logic…)

plot_reward_schedule(): Plot the reward schedule with annotations showing forced block switches.

reset(seed=None): Reset the task.

Module contents

Task module