aind_behavior_gym.dynamic_foraging.task package
Submodules
aind_behavior_gym.dynamic_foraging.task.base module
A general gymnasium environment for dynamic foraging tasks in AIND.
Adapted from Han’s code for the project in Neuromatch Academy: Deep Learning https://github.com/hanhou/meta_rl/blob/bd9b5b1d6eb93d217563ff37608aaa2f572c08e6/han/environment/dynamic_bandit_env.py
See also Po-Chen Kuo’s implementation: https://github.com/pckuo/meta_rl/blob/main/environments/bandit/bandit.py
- class aind_behavior_gym.dynamic_foraging.task.base.DynamicForagingTaskBase(reward_baiting: bool = False, allow_ignore: bool = False, num_arms: int = 2, num_trials: int = 1000, seed=None)
Bases: Env
A general gymnasium environment for dynamic bandit tasks.
Adapted from https://github.com/thinkjrs/gym-bandit-environments/blob/master/gym_bandits/bandit.py
- generate_new_trial()
Generate p_reward for a new trial. Note that self.trial has already been incremented by 1 at this point.
- generate_reward(action)
Compute the reward. Subclasses can override this for more complex reward structures.
- get_choice_history()
Return the history of actions in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.
- get_p_reward()
Return the reward probabilities for each arm in each trial, in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.
- get_reward_history()
Return the history of rewards in a format compatible with other libraries such as aind_dynamic_foraging_basic_analysis.
- reset(options={})
The reset method will be called to initiate a new episode. You may assume that the step method will not be called before reset has been called. Moreover, reset should be called whenever a done signal has been issued. This should NOT automatically reset the task! Resetting the task is handled in the wrapper.
aind_behavior_gym.dynamic_foraging.task.coupled_block_task module
Coupled block task for the dynamic bandit environment. This is very close to the task used in mouse training.
First coded by Han for the project in Neuromatch Academy: Deep Learning https://github.com/hanhou/meta_rl/blob/bd9b5b1d6eb93d217563ff37608aaa2f572c08e6/han/environment/dynamic_bandit_env.py
- class aind_behavior_gym.dynamic_foraging.task.coupled_block_task.CoupledBlockTask(block_min: int = 40, block_max: int = 80, block_beta: int = 20, p_reward_pairs: List[List[float]] | None = None, **kwargs)
Bases: DynamicForagingTaskBase
Coupled block task for dynamic foraging
This default setting roughly matches what has been used in this paper: https://www.sciencedirect.com/science/article/pii/S089662731930529X
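One plausible reading of block_min, block_max, and block_beta is sketched below. It assumes that block_beta is the mean of an exponential component added to block_min and that the result is capped at block_max; the page does not confirm this. The helper names and the example probability pairs are hypothetical and are not the package defaults.

```python
import random


def sample_block_length(rng, block_min=40, block_max=80, block_beta=20):
    # Assumption: block_beta is the mean of the exponential component.
    length = block_min + rng.expovariate(1.0 / block_beta)
    return int(min(length, block_max))


def build_coupled_schedule(num_trials, p_reward_pairs, rng, **block_kwargs):
    """Hypothetical helper: one (p_left, p_right) tuple per trial."""
    schedule = []
    while len(schedule) < num_trials:
        pair = list(rng.choice(p_reward_pairs))
        rng.shuffle(pair)  # randomly assign the pair to the two arms
        n = sample_block_length(rng, **block_kwargs)
        # Both arms switch together, which is what makes the blocks coupled.
        schedule.extend([tuple(pair)] * n)
    return schedule[:num_trials]


rng = random.Random(0)
lengths = [sample_block_length(rng) for _ in range(1000)]

# Illustrative pairs only; not the defaults used when p_reward_pairs is None.
example_pairs = [[0.4, 0.05], [0.3375, 0.1125]]
schedule = build_coupled_schedule(200, example_pairs, rng)
```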
aind_behavior_gym.dynamic_foraging.task.random_walk_task module
Random walk task for the dynamic bandit environment.
- class aind_behavior_gym.dynamic_foraging.task.random_walk_task.RandomWalkTask(p_min=[0, 0], p_max=[1, 1], sigma=[0.15, 0.15], mean=[0, 0], **kwargs)
Bases: DynamicForagingTaskBase
Generate reward schedule with random walk
(see Miller et al. 2021, https://www.biorxiv.org/content/10.1101/461129v3.full.pdf)
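A bounded Gaussian random walk with the same parameter names can be sketched as follows. This is a minimal illustration, not the class's code: it assumes each arm takes an independent Gaussian step per trial and is clipped to [p_min, p_max], and that the starting value is drawn uniformly within those bounds. The function name is hypothetical.

```python
import random


def random_walk_schedule(num_trials, rng, p_min=(0, 0), p_max=(1, 1),
                         sigma=(0.15, 0.15), mean=(0, 0)):
    """Hypothetical helper: per-trial reward probabilities for each arm."""
    num_arms = len(p_min)
    # Assumption: the walk starts at a uniform random point within the bounds.
    p = [rng.uniform(p_min[a], p_max[a]) for a in range(num_arms)]
    schedule = [tuple(p)]
    for _ in range(num_trials - 1):
        p = [
            # Gaussian step, then clip to the allowed range (assumed boundary rule).
            min(max(p[a] + rng.gauss(mean[a], sigma[a]), p_min[a]), p_max[a])
            for a in range(num_arms)
        ]
        schedule.append(tuple(p))
    return schedule


walk = random_walk_schedule(100, random.Random(0))
```

A nonzero mean would add drift to an arm, and a narrower [p_min, p_max] keeps that arm's probability within a restricted band.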
aind_behavior_gym.dynamic_foraging.task.uncoupled_block_task module
Uncoupled task for the dynamic bandit environment.
See /test/test_uncoupled_block_task.py for usage.
- class aind_behavior_gym.dynamic_foraging.task.uncoupled_block_task.UncoupledBlockTask(rwd_prob_array=[0.1, 0.5, 0.9], block_min=20, block_max=35, persev_add=True, perseverative_limit=4, max_block_tally=4, **kwargs)
Bases: DynamicForagingTaskBase
Generate an uncoupled block reward schedule (by on-line updating).
Adapted from the Cohen lab’s Arduino code (with some bug fixes): https://github.com/JeremiahYCohenLab/arduinoLibraries/blob/master/libraries/task_operantMatchingDecoupledBait/task_operantMatchingDecoupledBait.cpp
See Grossman et al. 2022:
In the final stage of the task, the reward probabilities assigned to each lick spout were drawn pseudorandomly from the set {0.1, 0.5, 0.9} in all the mice from the behavior experiments (n=46), all the mice from the DREADDs experiments (n=10), and half of the mice from the electrophysiology experiments (n=2). The other half of mice from the electrophysiology experiments (n=2) were run on a version of the task with probabilities drawn from the set {0.1, 0.4, 0.7}. The probabilities were assigned to each spout individually with block lengths drawn from a uniform distribution of 20–35 trials. To stagger the blocks of probability assignment for each spout, the block length for one spout in the first block of each session was drawn from a uniform distribution of 6–21 trials. For each spout, probability assignments could not be repeated across consecutive blocks. To maintain task engagement, reward probabilities of 0.1 could not be simultaneously assigned to both spouts. If one spout was assigned a reward probability greater than or equal to the reward probability of the other spout for 3 consecutive blocks, the probability of that spout was set to 0.1 to encourage switching behavior and limit the creation of a direction bias. If a mouse perseverated on a spout with a reward probability of 0.1 for 4 consecutive trials, 4 trials were added to the length of both blocks. This procedure was implemented to keep mice from choosing one spout until the reward probability became high again.
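The anti-perseveration rule at the end of the passage above can be sketched as a small state update. The function and argument names are hypothetical, and the number of trials added is assumed to equal the streak limit (4 in the quoted text); this is not the class's own code.

```python
def update_perseveration(streak, chose_lowest, block_ends, limit=4):
    """Hypothetical helper: return (new_streak, new_block_ends) after one trial.

    streak: consecutive trials so far on a spout whose reward probability is lowest
    chose_lowest: True if this trial's choice was such a spout
    block_ends: trial index at which each spout's current block ends
    """
    streak = streak + 1 if chose_lowest else 0
    if streak >= limit:
        # Extend the blocks of BOTH spouts, as in the quoted procedure.
        block_ends = [end + limit for end in block_ends]
        streak = 0
    return streak, block_ends


streak, block_ends = 0, [30, 25]
for _ in range(4):
    streak, block_ends = update_perseveration(streak, True, block_ends)
```

Because both spouts cannot be at the lowest probability at the same time, counting consecutive choices of a lowest-probability spout within a block amounts to counting choices of the same spout.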
- auto_shape_perseverance()
Anti-perseverance mechanism
See Grossman et al. 2022:
If a mouse perseverated on a spout with reward probability of 0.1 for 4 consecutive trials, 4 trials were added to the length of both blocks. This procedure was implemented to keep mice from choosing one spout until the reward probability became high again.
- generate_next_block(side, check_higher_in_a_row=True, check_both_lowest=True)
Generate the next block for both sides (yes, very complicated logic…)
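Two of the constraints quoted from Grossman et al. 2022 can be sketched for a single spout: a probability is not repeated across consecutive blocks, and both spouts are never at the lowest probability at once. The rule about one spout being higher for 3 consecutive blocks (presumably what check_higher_in_a_row controls) is left out. The function name and structure are assumptions for illustration, not the method's actual logic.

```python
import random


def draw_next_block(side, current_p, rng, rwd_prob_array=(0.1, 0.5, 0.9),
                    block_min=20, block_max=35):
    """Hypothetical helper: return (new_probability, block_length) for one spout."""
    other = 1 - side
    lowest = min(rwd_prob_array)
    candidates = [
        p for p in rwd_prob_array
        if p != current_p[side]  # no repeat across consecutive blocks
        and not (p == lowest and current_p[other] == lowest)  # never both lowest
    ]
    new_p = rng.choice(candidates)
    block_len = rng.randint(block_min, block_max)  # uniform, both ends inclusive
    return new_p, block_len


rng = random.Random(0)
p = [0.5, 0.1]
history = []
for i in range(200):
    side = i % 2
    new_p, block_len = draw_next_block(side, p, rng)
    # Record (previous value, new value, other spout's value, block length).
    history.append((p[side], new_p, p[1 - side], block_len))
    p[side] = new_p
```

Each spout is updated independently here, which is what distinguishes the uncoupled schedule from the coupled one.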
Module contents
Task module