This is often called fictive learning (Hayden et al., 2009). When sequences switched, actions in the sequence following the switch that were the same as actions in the sequence that preceded the switch were given the value they had before the switch. In other words, the values were copied into the new block. This was consistent with the fact that the animal did not know when the sequence switched, and so it could not update its action values until it received feedback that the previous action was no longer correct. Actions from the previously correct sequence that were not possible in the new sequence were given a value of 0. The learning rate parameters ρf and an additional inverse temperature parameter, β, were estimated separately for each session by minimizing the log-likelihood of the animals' decisions using fminsearch in Matlab, as we have done previously (Djamshidian et al., 2011). If β is small, the animal is less likely to pick the higher value target, whereas if β is large the animal is more likely to pick the higher value target, for a fixed difference in target values.
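As a rough illustration, the carry-over rule at a sequence switch can be sketched as follows (a minimal sketch with hypothetical variable names: vals maps action labels to learned values, and newSeqActions lists the actions possible in the new sequence):

```matlab
% Hypothetical sketch: carry action values across a sequence switch.
% Shared actions keep their pre-switch value; actions from the old sequence
% that are not possible in the new sequence are reset to 0.
oldActions = keys(vals);                 % vals: containers.Map of action -> value
for k = 1:numel(oldActions)
    a = oldActions{k};
    if ~ismember(a, newSeqActions)       % newSeqActions: cell array of action labels
        vals(a) = 0;                     % no longer possible in the new sequence
    end                                  % otherwise the old value is simply retained
end
```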
To estimate the log-likelihood we first calculated choice probabilities using
$$d_i(t) = \frac{e^{\beta v_i(t)}}{\sum_{j=1}^{2} e^{\beta v_j(t)}}. \qquad \text{(Equation 2)}$$
The sum is over the two actions possible at each point in the sequence. We then calculated the log likelihood (ll) of the animal's decision sequence as
$$ll = -\sum_{t=1}^{T} \log\bigl(d_i(t)\,c_i(t) + (1 - d_i(t))(1 - c_i(t))\bigr). \qquad \text{(Equation 3)}$$
The sum is over all decisions in a recording session, T. The variable $c_i(t)$ models the chosen action and has a value of 1 for action 1 and 0 for action 2.
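A compact sketch of Equations 2 and 3 and the session-wise fit (hypothetical variable names: v is a T × 2 matrix of action values and chosen is a T × 1 vector coding the chosen action as 1 for action 1 and 0 for action 2; the full model also fits the learning-rate parameters, which regenerate v on every iteration, whereas this sketch fits β alone):

```matlab
% Equation 2: probability of choosing action 1, given beta and both action values
p1 = @(beta, v) exp(beta * v(:,1)) ./ (exp(beta * v(:,1)) + exp(beta * v(:,2)));

% Equation 3: negative log-likelihood of the observed decision sequence
negll = @(beta, v, c) -sum(log(p1(beta, v) .* c + (1 - p1(beta, v)) .* (1 - c)));

% Minimize with fminsearch, as in the paper; 1.0 is an arbitrary starting value
betaHat = fminsearch(@(b) negll(b, v, chosen), 1.0);
```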
Average optimal values for β were 1.858 ± 0.03 and 1.910 ± 0.025 for monkeys 1 (n = 34 sessions) and 2 (n = 61 sessions), respectively. Average optimal values for ρf,positive were 0.440 ± 0.015 and 0.359 ± 0.008 for monkeys 1 and 2. Average optimal values for ρf,negative were 1.042 ± 0.03 and 0.656 ± 0.013 for monkeys 1 and 2. The value of the action that was taken, $v_i(t)$, was then correlated with neural activity in the ANOVA model. We modeled the integration of sequence, or learned action value, and color bias information on choices in the fixed condition. We used action value as an estimate of sequence learning, because knowing the sequence entails knowing the actions. Although it is possible that some actions are known before the complete sequence, the structure of the task is such that knowing actions and knowing sequences are highly correlated. Further, we found that the behavioral weight estimated by action value significantly predicted sequence representation in lPFC neurons (Figure 9). We estimated the relative influence of action value and color bias information by using logistic regression to predict behavioral performance (fraction correct, fc) as a function of color bias (CB) and action value.
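A minimal sketch of that regression (hypothetical per-trial variables: correct is a 0/1 vector and CB and actionValue are column vectors of the two predictors; the paper's fit is to fraction correct, which glmfit can also take as a [successes, trials] response):

```matlab
% Logistic regression of performance on color bias and action value;
% the fitted coefficients index the relative behavioral weight of each cue
X = [CB, actionValue];
b = glmfit(X, correct, 'binomial', 'link', 'logit');
% b(1): intercept, b(2): weight on color bias, b(3): weight on action value
```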