Spontaneous behaviour is structured by reinforcement without explicit reward

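A list of reagents and resources is provided in Extended Data Table 1.

All experimental procedures were approved by the Harvard Medical School Institutional Animal Care and Use Committee (protocol number 04930) and were performed in compliance with the ethical regulations of Harvard University as well as the Guide for Animal Care and Use of Laboratory Animals.

MoSeq (described previously in refs. 4,27,60) is an unsupervised machine learning method that identifies the brief, re-used behavioural motifs that mice perform spontaneously. MoSeq takes 3D imaging data of mice as its input and returns a set of behavioural 'syllables' that characterize the behaviour those mice express, together with the statistics that govern the order in which these syllables are expressed in an experiment. MoSeq was used here as it was originally described to explore the relationship between endogenous DLS dopamine release and behaviour. As described below, the technique was further adapted for real-time syllable identification, for closed-loop manipulation of neural activity. Importantly, the underlying fitted autoregressive hidden Markov model (AR-HMM) was identical for the 'offline' and 'online' variants of MoSeq used in this study, which allows syllable-associated neural activity to be compared across the experiments in which syllables were identified and manipulated.

MoSeq consists of two basic workflows: one that preprocesses depth data and converts it into a low-dimensional time series describing pose dynamics, and one that models the low-dimensional time-series data. As previously described, to focus on pose dynamics, raw depth frames were first background subtracted to convert depth units from distance to height from the floor (in mm). Next, the location of the mouse was identified by finding the centroid of the contour with the largest area using the OpenCV findContours function. An 80 × 80 pixel bounding box was drawn around the identified centroid, and the orientation was estimated using an ellipse fit (with a previously described correction for ±180-degree ambiguities4,27). The mouse was rotated in the bounding box to face the right side. The 80 × 80 pixel depth video of the centred, oriented mouse was then used to estimate pose dynamics.

To accommodate noise for online syllable estimation, as well as other sources of variation in the depth images that were not due to changes in pose dynamics (for example, occluding objects such as fibre-optic cables), we designed a denoising convolutional autoencoder. The network was designed using TensorFlow to process images in <33 ms, the time between frame captures on the Microsoft Kinect V2 (ref. 61). On the encoder side, 4 layers of 2D convolutions (ReLU activation) followed by max pooling were used to downsample the 80 × 80 images to 5 × 5. Another 4 layers of 2D convolutions with successive upsampling layers were used on the decoding side to reconstruct the 80 × 80 images (10,310,041 total parameters). Batch normalization was used during training with a batch size of 128. In order to train the network, we used a size- and age-matched dataset (7–8 weeks of age). Mouse images were corrupted through rotation, position jitter, zooming in and out (that is, changing size), and superimposing depth images of fibre-optic cables. The network was fed corrupted mouse images as input and was trained to minimize the reconstruction loss of the original, corresponding uncorrupted mouse images (Extended Data Fig. 8a–c). The model was trained for 100 epochs using stochastic gradient descent with early stopping. Both online and offline variants of MoSeq included the size-normalizing network to ensure results were comparable.
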
In order to represent pose dynamics in a common space for all experiments, principal components and an AR-HMM time-series model were trained offline on a sample dataset of genotype- and age-matched mice. The parameters describing the principal components and AR-HMM model were saved. All depth videos acquired for this paper were then projected onto these same principal components for all experiments, whether they used the online or offline variant. As previously described, principal components were estimated from cropped, oriented depth videos, and the AR-HMM was trained on the top 10 principal components. Since the denoising autoencoder was used for all experiments, mouse videos from the size-and-age-matched dataset were fed through the denoising autoencoder prior to principal component estimation.

In the offline variant, the Viterbi algorithm was used to estimate the most probable discrete latent state sequence according to the trained AR-HMM for each experiment post hoc. This variant was used to analyse all data except for the Opto-DA experiments shown in Figs. 3 and 4.

In the online variant, syllable likelihoods were computed and updated by computing the forward probabilities of the discrete latent states for each frame as they arrived from the depth sensor. To avoid spurious syllable detections, the targeted syllable probability had to cross a user-defined threshold for three consecutive frames.

Mice were euthanized following completion of behavioural tests. Mice were first perfused with cold 1× PBS and subsequently with 4% paraformaldehyde. Fifty-micrometre sections of extracted brain tissue were sliced on a Leica VT1000 vibratome. All slices were mounted on glass slides using Vectashield with DAPI (Vector Laboratories) and imaged with an Olympus VS120 Virtual Slide Microscope.
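The detection rule of the online variant described above (a per-frame forward-probability update followed by a three-consecutive-frame threshold test) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the two-state toy transition matrix and the threshold of 0.8 are all assumptions, and the per-frame observation likelihoods, which an AR-HMM would derive from each state's autoregressive model, are simply passed in as numbers.

```python
def forward_step(prev_probs, transition, obs_likelihood):
    """One normalized forward-filter update for a discrete-state HMM.

    prev_probs[i]     : filtered probability of state i at the previous frame
    transition[i][j]  : probability of moving from state i to state j
    obs_likelihood[j] : likelihood of the current frame under state j
                        (hypothetical input; an AR-HMM would compute it
                        from each state's autoregressive model)
    """
    n_states = len(prev_probs)
    unnormalized = [
        obs_likelihood[j]
        * sum(prev_probs[i] * transition[i][j] for i in range(n_states))
        for j in range(n_states)
    ]
    total = sum(unnormalized)
    return [p / total for p in unnormalized]


def detect_target(target_probs, threshold, n_consecutive=3):
    """Return the frame indices at which the target probability has just
    stayed above `threshold` for `n_consecutive` frames in a row."""
    detections, run = [], 0
    for frame, p in enumerate(target_probs):
        run = run + 1 if p > threshold else 0
        if run == n_consecutive:  # fire once per uninterrupted run
            detections.append(frame)
    return detections


# Toy usage; every number below is made up for illustration.
probs = forward_step([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.8, 0.2])
hits = detect_target(
    [0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9, 0.1], threshold=0.8
)
```

In this sketch a two-frame excursion above threshold produces no detection, which is the purpose of requiring consecutive frames: brief, noisy spikes in the target probability are ignored.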
dLight1.1 was selected to visualize dopamine release dynamics in the DLS owing to its rapid rise and decay times, comparatively lower dopamine affinity (so as to not saturate binding), as well as its responsiveness over much of the physiological range of known DA concentrations in freely moving rodents31,62,63,64.

Since dopamine-free and dopamine-bound excitation spectra have yet to be reported for the dLight1.1 sensor, a series of in vitro experiments was performed to identify an excitation wavelength whose fluorescence was stable and independent of dopamine levels, and which therefore could be used for post hoc motion artefact correction. Like GCaMP, dLight1.1 uses cpGFP as a chromophore, and various generations of GCaMP have been shown to: (1) have an increase in ligand-free fluorescence when excited with 400 nm wavelengths and (2) have an isosbestic wavelength in the UV to blue region65,66,67. To test whether UV excitation could be a suitable reference wavelength for dLight1.1, HEK 293 cells (ATCC, cells were validated by ATCC via short tandem repeat analysis and were not tested for mycoplasma) were transfected with the dLight1.1 plasmid (Addgene 111067-AAV5) using Mirus TransIT-LT1 (MIR 2304). Cells were imaged using an Olympus BX51WI upright microscope and a LUMPlanFl/IR 60×/0.90W objective. Excitation light was delivered by an AURA light engine (Lumencor) at 400 and 480 nm with 50 ms exposure time. Emission light was split with an FF395/495/610-Di01 dichroic mirror and bandpass filtered with an FF01-425/527/685 filter (all filter optics from Semrock). Images were collected with a CCD camera (IMAGO-QE, Thermo Fisher Scientific), at a rate of one frame every two seconds, alternating the excitation wavelengths in each frame. Image acquisition and analysis were performed using custom-built software written in MATLAB68 (Mathworks). Cells were segmented from maximum-projection fluorescence images using Cellpose69.
Cells with a diameter of less than 30 pixels were excluded from downstream analysis. Fluorescence traces were denoised using a Hampel filter (window size 10 and threshold set to 2 median absolute deviations from the median) and normalized to ΔF/F0. Cells were included if their maximum ΔF/F0 exceeded 5%. F0 was computed by fitting a bi-exponential function to the time series.

Eight- to ten-week-old C57BL/6J (n = 6 mice, The Jackson Laboratory stock no. 000664) mice of either sex were anaesthetized using 1–2% isoflurane in oxygen, at a flow rate of 1 l min⁻¹ for the duration of the procedure. AAV5.CAG.dLight1.1 (Addgene #111067, titre: 4.85 × 10¹²) was injected at a 1:2 dilution (either sterile PBS or sterile Ringer's solution) into the DLS (AP 0.260; ML 2.550; DV −2.40), in a total volume of 400 nl per injection. For all stereotaxic implants, AP and ML were zeroed relative to bregma, DV was zeroed relative to the pial surface, and coordinates are in units of mm. Injections were performed by a Nanoject II or a Nanoject III (Drummond) at a rate of 10 nl per 10 s, unilaterally in each mouse. A single 200-µm diameter, 0.37–0.57 NA fibre cannula was implanted 200 µm above the injection site at the DLS (DV −2.20) for photometry data collection. Finally, medical-grade titanium headbars (South Shore Manufacturing) were secured to the skull with cyanoacrylate glue (Loctite 454).

Mice were group-housed prior to stereotaxic surgery procedures, and following surgery were individually housed on a 12-hour dark–light cycle (09:00–21:00). All behavioural recordings were done between 10:00 and 17:00.

Six- to twelve-week-old DAT-IRES-cre mice (n = 10 mice, The Jackson Laboratory stock no. 006660) of either sex were injected with the same dLight1.1 virus described above into the right hemisphere DLS.
Additionally, using the same previously described surgical procedure, 350 nl of AAV1.Syn.Flex.ChrimsonR.tdTomato (UNC Vector Core, titre: 4.1 × 10¹²) was injected into the right hemisphere SNc (AP −3.160; ML 1.400; DV −4.200 from pia), in a 1:2 dilution for calibration and stimulation experiments (see below). Mice were implanted unilaterally with a 200 µm core 0.37–0.57 NA fibre over the DLS for simultaneous stimulation and photometric data collection.

Two of the ten mice were used to calibrate optogenetic stimulation (see 'dLight calibration experiments'). The other 8 mice injected with dLight and ChrimsonR were also run through the 3 complete closed-loop experiments described in 'Closed-loop DLS dopamine stimulation experiments' (one experiment with 250 ms continuous wave (CW) stimulation, one with 2 s CW stimulation, and another with 3 s pulsed stimulation, 25 Hz frequency with 5 ms pulse width). Baseline data from these experiments were combined with mice described in 'Fibre Photometry for dLight recordings', thus yielding a total of n = 14 mice. Two of the 12 dLight only mice did not pass our quality control criteria for dLight recordings and were thus excluded from all dLight analysis (note that they were included in Extended Data Fig. 2a–b,d only, which strictly used behavioural data). Baseline data were considered data from the day prior to a stimulation day, or the day after with the targeted syllable excluded (yielding n = 378 experiments total). If the targeted syllable could not be reasonably excluded then data from the day after a stimulation day was excluded entirely.

Depth videos of mouse behaviour were acquired at 30 Hz using a Kinect 2 for Windows (Microsoft) using a custom user interface written in Python (similar to ref. 60) on a Linux computer. For all OFA experiments, except where noted, mice were placed in a circular open field (US Plastics 14317) in the dark for 30 min per experiment, for 2 experiments per day.
As described previously, the open field was sanded and painted black with spray paint (Acryli-Quik Ultra Flat Black; 132496) to eliminate reflective artefacts in the depth video.

To assess whether spontaneous dLight transients in the DLS were of appreciable magnitude compared to reward consumption-related transients, a series of separate dLight photometry experiments were run to measure reward consumption-related transient magnitudes (n = 6 mice). For two days prior to the experiment, mice were habituated to the open field arena for two 30-min experiments on each day. On the morning of the experiment, to increase the salience of food reward, mice were habituated to the experimental room and food and water restricted for 3–5 h prior to beginning the experiment. Mice were placed in the arena, and behaviour and photometry data were simultaneously acquired. Chocolate chips (Nestle Toll House Milk Chocolate) were divided into quarters and introduced into the arena at random intervals and locations decided by the experimenter (with an average of 1 chocolate chip piece every 4 min) for mice to freely consume for a total of 30 min. To identify reward consumption-related responses, a human observer indicated each moment in time during the experiment where mice began to consume the chocolate via post hoc inspection of the infrared video captured by the Kinect. Photometry signal peaks for Fig. 2a were identified at the onset of consumption. Spontaneous transient peaks had a mean magnitude of 2.12 ± 0.80 ΔF/F0 (z) (n = 5,247 transients). By comparison, reward consumption-associated transients had a mean magnitude of approximately 2.36 ± 0.92 ΔF/F0 (z) (n = 10 transients).

Photometry and behavioural data were collected simultaneously. A digital lock-in amplifier was implemented using a TDT RX8 digital signal processor as previously described27.
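A 470 nm (blue) LED and a 405 nm (UV) LED (Mightex) were sinusoidally modulated at 161 Hz and 381 Hz, respectively (these frequencies were chosen to avoid harmonic cross-talk). Modulated excitation light was passed through a three-colour fluorescence mini-cube (Doric Lenses FMC7_E1(400-410)_F1(420-450)_E2(460-490)_F2(500-540)_E3(550-575)_F3(600-680)_S), then through a pigtailed rotary joint (Doric Lenses B300-0089, FRJ_1x1_PT_200/220/LWMJ-0.37_1.0m_FCM_0.08m_FCM) and finally into a low-autofluorescence fibre-optic patch cord (Doric Lenses MFP_200/230/900-0.37_0.75m_FCM-MF1.25_LAF or MFP_200/230/900-0.57_0.75m_FCM-MF1.25_LAF) connected to the optical implant in the freely moving mouse. Emission light was collected through the same patch cord, then passed back through the mini-cube. Light on the F2 port was bandpass filtered for green emission (500–540 nm) and sent to a silicon photomultiplier with an integrated transimpedance amplifier (SensL MiniSM-30035-X08). Voltages from the SensL unit were collected through the TDT Active X interface using 24-bit analogue-to-digital convertors at >6 kHz, and the voltage signals driving the UV and blue LEDs were also stored for offline analysis.

The PMT output was then demodulated into the components generated by the blue and UV LEDs. The voltage signal was multiplied by each of the two driving signals (corresponding to blue and UV LED excitation, respectively) and low-pass filtered using a third-order elliptic filter (maximum ripple: 0.1; stop attenuation: 40 dB; corner frequency: 8 Hz). The UV component was used as a reference signal.

To align photometry and behavioural data, a custom IR LED-based synchronization system was implemented. Two sets of 3 IR (850 nm) LEDs (Mouser part #720-SFH4550) were attached to the walls of the recording bucket and directed towards the Kinect depth sensor. The signal used to power the LEDs was digitally copied to the TDT. An Arduino was used to generate a sequence of pulses for each LED set. One LED set transitioned between on and off states every 2 s, while the other LED set transitioned into an on state at random every 2–5 s and remained in the on state for 1 s. The sequence of on and off states of each LED set was detected in the photometry data acquired with the TDT and in the IR video captured by the Kinect. The timestamps of the sequences were aligned across each recording modality, and the photometry recordings were downsampled to 30 Hz to match the depth video sampling rate. The same mechanism was used to align photometry data with the keypoint data shown in the Extended Data.

Demodulated photometry traces were normalized by first computing ΔF/F0. F0 was estimated by computing the 10th percentile of the photometry amplitude using a 5-s sliding window, to account for slow, correlated fluorescence changes between the dLight and UV reference channels. Both the dLight and the reference channels were normalized using this procedure. Because the UV reference signal captures non-ligand-associated fluctuations in fluorescence (arising from haemodynamics, pH changes, autofluorescence, motion artefacts, mechanical shifts and so on), a fitted reference signal was subtracted from the dLight channel (see 'Photometry active referencing'). Finally, the referenced dLight traces were z-scored using a 20-s sliding window, slid one sample at a time across the experiment, to remove slow trends in ΔF/F0 amplitude due to long-timescale effects, for example, photobleaching. Only experiments in which the maximum percentage ΔF/F0 exceeded 1.5 and the dLight-to-reference correlation was below 0.6 were included for further analysis.

To remove the effects of motion and mechanical artefacts from downstream analysis, a fitted reference signal was subtracted from the demodulated dLight traces, as initially mentioned in 'Photometry preprocessing'31,54 (Extended Data Fig. 1g). First, the reference signal was low-pass filtered with a second-order Butterworth filter (3 Hz corner frequency). Next, to account for differences in gain or DC offset, RANSAC ordinary least squares regression was used to find the slope and bias that transform the reference signal so as to minimize the difference between the reference and dLight photometry traces. Finally, the transformed reference trace was subtracted from the dLight trace.

To capture 3D keypoints, mice were recorded in a multi-camera open field arena with a transparent floor and walls. Near-infrared video recordings were acquired from six cameras (Microsoft Azure Kinect; cameras were placed above, below and at the four cardinal directions). Separate deep neural networks with an HRNet architecture were trained to detect keypoints in each view (top, bottom and side) using approximately 1,000 hand-labelled frames70. Frame labelling was crowdsourced through a commercial service (Scale AI), and included the tail tip, tail base, three points along the spine, the ankle and toe of each hind limb, the forepaws, ears, nose and implant. After detecting 2D keypoints for each camera, 3D keypoint coordinates were triangulated and then refined using GIMBAL, a model-based approach that leverages anatomical constraints and motion continuity71. GIMBAL requires learning an anatomical model and then applying the model to multi-camera behaviour recordings. For model fitting, we followed the approach described in ref. 71, using 50 pose states and excluding outliers using sklearn's EllipticEnvelope method. To apply GIMBAL to behaviour recordings, we again followed ref. 71, setting the parameters obs_outlier_variance, obs_inlier_variance and pos_dt_variance to 1e6, 10 and 10, respectively, for all keypoints.

To compute 2D translational velocity, the centroid of the keypoints associated with the spine (approximating whole-body movement) was computed for the x and y planes (the z plane was disregarded). Then, the velocity was computed from the difference in position between every 2 frames and divided by 2 (to provide a smoother estimate of velocity). 3D translational velocity was computed the same way, except the z plane was included in the calculation. The average velocity of the keypoints associated with the forepaws was used to compute 3D forelimb velocity.

To compute the relationship between dLight and forelimb velocity, other kinematic parameters known to be correlated with dLight were partialed out of the dLight fluorescence signal. Specifically, 2D velocity, 3D velocity and height were partialed out of dLight using linear regression. Then, the correlation between the partialed dLight signal and 3D forelimb velocity was computed and compared to 1,000 bootstrapped shuffles.

A changepoint detection algorithm was used to find moments where mice transitioned from periods of relative stillness to movement.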
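To capture long bouts of movement, the velocity of the 2D centroid of the mouse was z-scored across each experiment and then smoothed with a 50-point (1.67 s) boxcar window. To find sharp changes in velocity, the derivative of the smoothed velocity trace was computed, and the result was raised to the third power. Peaks in this velocity changepoint score were discovered using SciPy's find_peaks function with the following parameters: height 1, width 1, prominence 1 so that consecutive data points around each peak were disregarded.

To account for variability in syllable duration, time warping was used for Extended Data Fig. 4a. Here, all dLight traces were linearly interpolated using the numpy.interp function to a duration of 0.83 s, or 25 samples. Thus, syllables longer than 0.83 s were linearly compressed, and syllables shorter than 0.83 s were linearly expanded. We obtained similar results when warping traces to 0.4 s; the duration to which instances are warped therefore does not affect the interpretation of subsequent analyses.

For the dLight waveforms shown in Fig. 1f (top and bottom), h, i, k and Extended Data Figs. 4c–g, 5c,f and 7c, onset-aligned waveforms were first z-scored using the mean and s.d. of fluorescence values from 10 s before to 10 s after onset. Next, to account for differences in the number of syllable instances (trials) in each average, waveforms were further normalized by z-scoring relative to the mean and s.d. of 1,000 shuffled averages, in which individual trials were shuffled before averaging.

To decode syllable identity from dLight waveforms or dLight peaks, a random forest classifier72 (cuRF; 1,000 trees, maximum depth = 1,000, number of bins = 128, cross-validated on 5 folds of the data) was trained to predict syllable and syllable group identity on held-out data (similar to ref. 27). Syllable groups were created by hierarchically clustering syllables based on pairwise MoSeq distance (see below) with increasing thresholds, using a distance cut-off step of 0.2. Inputs to the random forest classifier were either (1) the maximum z-scored dLight value from syllable onset to 300 ms after onset for each syllable instance, or (2) the dLight waveform and its derivative from syllable onset to 300 ms after onset for individual syllable instances. Held-out accuracy was compared with that of 100 shuffles of syllable identity.

To decode turning direction from dLight waveforms (Extended Data Fig. 5c), a linear support vector machine was trained to classify whether a particular syllable instance was a left- or right-turning syllable, using cross-validation with five folds of the data. To sample the behavioural space of turning syllables, the eight syllables with the largest angular velocities were selected, four for each turning direction. The model was fit on the waveform and its derivative from syllable onset to 300 ms after onset for individual syllable instances, and was tested on held-out data.

The MoSeq distance between two syllables was computed as described previously27. Briefly, the estimated autoregressive matrices for each syllable were used to generate synthetic trajectories through principal component space (that is, in the space defined by the top 10 principal components of the depth videos). Then, the correlation distance between the trajectories was computed for all pairs of syllables. Because the online and offline variants of MoSeq used the same autoregressive matrices, these distances are equivalent across the online and offline variants.

The dLight fluorescence associated with syllable transitions was computed as the maximum z-scored dLight value from syllable onset to 300 ms after onset for each syllable, to account for jitter in dopamine release or technical jitter in defining syllable changepoints. Throughout the text, we refer to this syllable-associated waveform peak amplitude as 'syllable-associated dLight', in units of z-scored ΔF/F0. These dLight values were then averaged for each syllable and each experiment. To assess the correlation between syllable-associated dLight and syllable counts, the averaged dLight values were z-scored across syllables within each experiment. These normalized dLight peaks indicate whether the dLight associated with a syllable was relatively high or low during a given experiment. Finally, the experiment-normalized dLight values, along with syllable counts, were averaged across experiments for each mouse, leaving one value per mouse and per syllable.

To measure the linear relationship between dLight peaks and syllable counts, a robust linear regression using the Huber regressor73 was used to predict average syllable counts from average dLight peaks. The regression model was evaluated using five-fold cross-validation repeated 100 times. The correlation values reported in Figs. 1j and 2 were estimated on held-out data. P values were estimated by comparing held-out correlation values with correlation values estimated from linear models computed on shuffled data. To remove syllables whose estimates varied owing to finite-size effects, only syllables that occurred at least 100 times in total across all experiments for each mouse were included.

To compute syllable entropy (an estimate of the randomness of the outgoing transitions associated with each syllable), the outgoing transition probabilities associated with each syllable for each mouse were computed by counting the number of transitions from that syllable to all other syllables across experiments and expressing the counts as a probability distribution. Next, the Shannon entropy was estimated from each syllable's outgoing transition probabilities. Finally, a linear regression was estimated using exactly the same procedure used for syllable counts.

In total, 379 experiments were queried for this series of analyses. To capture the correlation between syllable-associated dLight peaks and syllable-associated behavioural features (syllable frequency, syllable entropy) on an experiment-by-experiment basis, the maximum z-scored dLight amplitude from syllable onset to 300 ms after onset was first computed for each syllable transition. These syllable-associated dLight peaks were averaged for each experiment and syllable. Then, the dLight peak averages for each syllable and mouse were z-scored separately across experiments. Additionally, to place the variation of each syllable on the same scale, the behavioural features for each syllable and mouse were also z-scored across experiments (Fig. 2b,i, bottom). Next, to reduce variability in the estimates, values were aggregated across syllables for each experiment, leaving one value per experiment and mouse. To remove syllables whose estimates varied owing to finite-size effects, only syllables that occurred at least 50 times per experiment were considered for downstream analysis. A linear model (Huber regressor) was fit to the resulting average dLight peaks, syllable frequency and syllable entropy, and was evaluated as described in the previous section.

In total, 760 syllable–experiment pairs were queried for this series of analyses. dLight peak values were estimated as the maximum dLight value from onset to 300 ms after onset of each syllable transition. The velocity, syllable counts and dLight peak values for each syllable were averaged over expanding bin sizes; that is, velocity and syllable counts were estimated over the n syllables following the transition at which the dLight value was computed, where n varied from 5 to 400 syllables (Fig. 2e). For sequence randomness, to avoid finite-size effects, the dLight values for each syllable were binned into 20 evenly spaced bins (Fig. 2k). Transition matrices were then combined across all syllables for each mouse and within each time bin. Finally, Pearson correlation values were computed between dLight values and the behavioural features estimated for each bin size. Pearson coefficients were z-scored using the mean and s.d. of Pearson coefficients estimated after shuffling dLight peak values.

Note that, to prevent consistent non-stationarities in behaviour from affecting these measurements, the correlations were computed separately within the five segments indicated by the dashed lines in Fig. 2e. The segment-wise correlations were then averaged.

The time constants (τ) associated with the correlation between dLight values and behavioural features were estimated by fitting an exponential decay curve to the correlation values at each bin size using SciPy's curve_fit function74. The decay function was fit to 1,000 bootstrap resamples of the data. The distributions shown describe the taus fit on each resample.

The dLight fluorescence associated with all instances of a given syllable was binned over a three-minute window (chosen based on the decay in Fig. 2f) and correlated with the usage of the same syllable in a three-minute window, where the window was shifted by the indicated amount (x axis). Correlation values (in Fig. 2g,h) were z-scored using the mean and s.d. from shuffles. P values were estimated via a shuffle test.

Syllables were manually assigned to one of 6 classes by hand-labelling crowd videos summarizing model output4,27,60. Syllable-associated dLight was then averaged across all syllables in each class.

As in the linear regression analyses (previous sections), dLight peaks were estimated by taking the maximum z-scored amplitude from syllable onset to 300 ms after onset. Behavioural features (entropy, velocity and syllable counts) following each transition were computed over each bin size, as described in 'Analysing the relationship between dLight and syllable statistics within an experiment'. The bin sizes used were 5, 10, 25, 50, 100, 200, 400, 800 and 1,600 syllables. Syllable frequency, syllable entropy and velocity were averaged for each experiment, syllable and bin size. These per-syllable and per-experiment averages were then z-scored separately for each mouse, and then averaged for each mouse and each syllable. To remove correlations between the behavioural features, they were whitened using zero-phase component analysis (ZCA) whitening. The whitened behavioural features were then fed into a Bayesian linear regression model to predict the average dLight peak amplitude of each syllable for each mouse according to the following equation:

$$y \sim \mathcal{N}(\beta x, \sigma)$$

where x denotes the features, β the regression coefficients, y the dLight peak, σ the s.d. and N the normal distribution. A normal prior was placed on the regression coefficients and an exponential prior on the s.d. Samples were drawn from the posterior via the No-U-Turn Sampler (NUTS) using NumPyro (n = 1,000 warm-up samples, then n = 3,000 samples)75. To assess the temporal relationship between behavioural features and dLight, a separate model was fit for each lag (here, features were whitened separately for each lag; Extended Data Fig. 6c). Overall model performance was quantified by feeding the model features computed at their approximately optimal bin sizes. For kinematic parameters and entropy, this bin size (lag) was 10 timesteps; for syllable counts, this bin size (lag) was 100 timesteps (in syllable time). Each feature was then fed to the model separately to quantify the performance of feature subsets.

To predict instantaneous dLight amplitude from syllable counts, syllable entropy, velocity (2D, angular and height velocity) and acceleration, a series of convolution kernels was estimated, each mapping from one behavioural feature to dLight amplitude. Mathematically, the model can be written as follows:

$$\mathrm{dLight}(t) = \sum_{i} (\beta_i * f_i)(t)$$

where dLight(t) is the dLight amplitude at timestep t, f_i(t) is the i-th behavioural feature at timestep t, β_i are the weights of the corresponding convolution kernel and * denotes convolution. Kernel weights were optimized with a Huber loss using the JAX library76. That is, for each sample, each feature was passed through its convolution kernel to predict dLight amplitude, and the results were summed across features. The model was trained and evaluated on a per-experiment basis, and performance was assessed as the Pearson correlation between predicted and actual dLight amplitude on held-out experiments. To remove the influence of high-frequency noise on training and evaluation, dLight traces were smoothed with a 60-sample (2-s) boxcar filter before training and evaluation.

The decoding model was designed to capture the two main effects of dopamine on behavioural statistics: usage and sequencing. The goal of the decoding model is to predict the likelihood of a sequence of syllables given past dopamine. The model includes two key features: (1) a component that scales syllable usage by past syllable-associated dopamine, and (2) a component that scales the randomness of the next syllable choice by past global dopamine. In the equation that summarizes the model, s_t is the syllable a mouse performs at time t during a behaviour experiment, da_t is the peak dLight recorded for syllable s_t, τ_a and τ_b describe the timescale of the usage and choice randomness components respectively, α_a and α_b scale the usage and choice randomness components respectively, and δ is the Dirac delta function (that is, one-hot encoding) that returns 1 when s_{t−1} = i and 0 otherwise.

The parameters α_b, τ_a and τ_b were fixed using approximations from the analysis of behavioural data (Fig. 2), and only α_a was learned, by maximizing the likelihood of the above function given the sequence of syllables that mice performed in each experiment and the sequence of peak dLight measured in each experiment. This was done by evaluating the likelihood of the function over multiple values of α_a. τ_a (describing the effect of dopamine on future syllable usage/counts) was fixed at 100 syllable timesteps, and τ_b (describing the effect of dopamine on syllable sequence entropy) was fixed at 10 syllable timesteps. These values were approximated from the median τ values reported in Fig. 2.

To test model performance, data were split into 5 folds of training and test experiments and repeated 100 times using repeated K-fold cross-validation. We then computed the Pearson correlation between syllable counts from model simulations and actual syllable counts after smoothing with a 50-point rolling average. The one free parameter was fit using the training dataset and assessed on the test dataset. To avoid degradation in performance due to syllable sparsity, the top 10 syllables were used. The model was compared to a suite of control models, each evaluated over the same folds.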
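The dopamine phase shift model was evaluated on the same data, but with all dopamine traces circularly shifted by a random integer between 1 and 1,000, and the noise model was evaluated with dopamine traces replaced by numbers drawn from a unit variance random normal distribution (since the traces were z-scored). In order to determine the maximum possible performance, the per experiment number of counts per syllable was correlated with the across-experiment average. Here, the model performed significantly better than controls. Median Pearson correlation between held-out predictions and observed data: actual model r = 0.20, phase shift control r = 0.04, noise model r = 0.04. Comparison between actual model and controls, P = 7 × 10⁻¹⁸, U = 2,500, f = 1, Mann–Whitney U test, n = 50 model restarts.

To test the hypothesis that endogenous and exogenous dopamine combine linearly to alter the usage of individual behavioural syllables, the current decoding model was modified. The amount of dopamine added to (or subtracted from) the dLight observed on catch trials (referred to as 'extra DA') that yielded the maximum correlation was identified (Fig. 4g–i). The model-based log-likelihood of held-out syllable choices in stimulation-day experiments was then computed. Other versions of the model (shown in Fig. 4h) included: (1) a control model in which no 'extra DA' was added to the model ('no offset'), (2) a control using a phase-shifted version of the dLight trace ('random shift'), and (3) a model using random numbers drawn from a normal distribution matched to the signal ('noise').

To characterize the kinetics and amplitude of evoked dopamine transients in the open field, dLight transients were elicited by brief optogenetic stimulation of ChrimsonR-expressing SNc axons in the DLS while mice freely explored the open field arena77. A number of stimulation parameters were tested, varying the light intensity, the stimulation length, and whether light was delivered as a single continuous-wave pulse or as multiple rapid short pulses. A single, short (250 ms; roughly the timescale of syllables), continuous stimulation pulse of red light at 10 mW (Opto Engine MRL-III-635; SKU: RD-635-00500-CWM-SD-03-LED-0) most effectively matched the amplitude and dynamics of endogenous dLight transients observed in the open field. The average opto-DA peak was measured at 2.18 ± 0.85 ΔF/F0 (z), compared with an average spontaneous peak of 2.23 ± 0.62 ΔF/F0 (z) and a 99th-percentile spontaneous peak of 3.40 ΔF/F0 (z). In addition, pulsed stimulation can evoke prolonged dopamine release78,79,80. Note that the efficiency with which light evokes spikes in ChrimsonR-expressing neurons when excited with 635 nm light is similar to the efficiency with which blue light evokes spikes in ChR2-expressing neurons77.

After a single continuous pulse of 10 mW light had been preliminarily selected as the desired optogenetic stimulus for eliciting dopamine release from DLS dopamine axons, another round of open-loop stimulation in the open field was performed with these stimulation parameters in 2 of the 10 total mice injected with dLight and ChrimsonR. In these two mice, the interval between stimulations was randomly selected by drawing an integer delay between 6 and 17 s for each stimulation. This range was chosen to ensure that each animal received at least 100 stimulations over the course of an experiment. This enabled the analysis of more stimulation trials with the intended parameters, to verify that the amplitude of evoked transients was within the same order of magnitude as that of spontaneous transients (Fig. 3c).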
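As a rough check on why integer delays of 6 to 17 s guarantee at least 100 stimulations, the sketch below counts the stimulations that fit in a session when every delay takes its maximum value, and simulates one random schedule. It rests on assumptions that are not stated for the calibration experiments: a 30-min session (the length used for other open field experiments here), delays measured from one stimulation onset to the next, and uniformly distributed integer delays. The function names are illustrative.

```python
import random


def worst_case_stimulations(duration_s, max_delay_s):
    """Fewest stimulations that occur strictly before the session ends,
    reached when every delay takes its maximum value."""
    return (duration_s - 1) // max_delay_s


def simulate_stimulation_times(duration_s, min_delay_s, max_delay_s, seed=0):
    """Draw integer delays (inclusive bounds) until the session is over and
    return the stimulation onset times in seconds."""
    rng = random.Random(seed)
    times, t = [], 0
    while True:
        t += rng.randint(min_delay_s, max_delay_s)
        if t >= duration_s:
            return times
        times.append(t)


SESSION_S = 30 * 60  # assumption: a 30-min session

fewest = worst_case_stimulations(SESSION_S, max_delay_s=17)
times = simulate_stimulation_times(SESSION_S, min_delay_s=6, max_delay_s=17)
print(fewest, len(times))
```

Under these assumptions even the slowest possible schedule yields 105 stimulations (17 s × 105 = 1,785 s), so the 100-stimulation minimum holds for any random draw.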
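As a series of control experiments to establish the specificity of dopamine encoding in the DLS, dLight recordings were performed in the DMS using the same techniques described above. Stereotaxic injections were performed in wild-type mice of either sex (C57BL/6J, n = 8) at AP 0.26, ML 1.5 and DV −2.2. Photometry fibres (in C57BL/6J mice, n = 8; n = 64 recording experiments) were implanted in the manner described above at the coordinates AP 0.26, ML 1.5, DV −2.0. Open field behavioural recordings and encoding models were run on these data as described above.

Eight- to fifteen-week-old DAT-IRES-cre::Ai32 mice were used, generated by crossing DAT-IRES-cre mice (The Jackson Laboratory, 006660) with Ai32 mice (The Jackson Laboratory, 012569). The double-transgenic DAT-IRES-cre::Ai32 line has previously been used for specific activation of dopaminergic neurons10,81,82. A surgical procedure similar to that described above was used, except that two 200-µm, 0.37 NA multimode optical fibres were implanted over the DLS (AP 0.260; ML 2.550; DV −2.300) in DAT-IRES-cre::Ai32 mice (n = 20). Control animals of both sexes (DAT-IRES-cre mice, n = 12) were implanted bilaterally at the same coordinates, with 6 of these animals implanted in the nucleus accumbens (AP 1.300; ML 1.000; DV −4.000). These animals are collectively referred to as 'no-opsin controls' throughout the manuscript. Medical-grade titanium headbars were secured to the skull with cyanoacrylate. Optical stimulation experiments were then performed 2–3 weeks after surgery.

For two days before the closed-loop stimulation schedule (Fig. 3d), mice were habituated to the bucket for two 30-min experiments each day. To test for changes in the statistics of specific syllables through syllable-triggered optogenetic stimulation, experiments were run on a three-day schedule for each of six selected target syllables. On day one, two 30-min experiments were run for each mouse to characterize baseline target syllable usage. On day two, each mouse was run through two 30-min 'stimulation' experiments. In these experiments, blue light (470 nm, 10 mW, a single 250-ms continuous-wave pulse) was delivered on 75% of target syllable detections. Stimulation was not conditioned on the syllable that occurred before the target. Finally, on day three, the baseline recordings were repeated to assess memory of post-reinforcement syllable usage and the decay of that usage. For half of the targeted syllables for each mouse (randomized across mice), the pre-stimulation baseline experiments were the same as the post-stimulation baseline experiments for a different syllable (see Fig. 3d). The three-day cadence with multiple short behavioural recording experiments per day was chosen to minimize non-stationarities in syllable usage within an experiment, and to avoid exposing mice to the behavioural arena for more than one hour in total per day. To control for order effects on changes in target syllable usage over time, animals were randomly split into two groups, each with a unique ordering of target syllables over the six stimulation days of the three-week cadence. The interval between the first and second experiment of each day for each mouse (recording or stimulation) was on average 195 min ± 58 min (s.d.). Upon completion of behavioural testing, mice were euthanized and histology was performed using the procedures described above.

To assess the effect of increased dopamine release, these experiments were repeated in n = 3 DAT-cre::Ai32 and n = 2 control (DAT-IRES-cre) animals using 3-s pulsed stimulation (25 Hz, 5 ms pulse width).

DAT-IRES-cre::Ai32 mice (n = 5) underwent 90-min recording and manipulation experiments. During the first 30 min, we estimated the velocity distribution of a specific target syllable. Then, over the following 30 min, optogenetic stimulation was triggered when the syllable was expressed according to our closed-loop system and the animal's syllable-specific velocity either exceeded the 75th percentile or fell below the 25th percentile. Experiments were only analysed if mice received at least 50 stimulations and increased their usage of the target syllable relative to its average baseline (established through separate recording experiments without stimulation).

First, the number of times the target syllable was performed was computed in 30-s sliding windows (non-overlapping) for each 30-min stimulation experiment. Then, the cumulative sum was computed. To turn the result into an estimate of excess target counts, the cumulative sum was also computed for the morning and evening experiments of the nearest baseline day. Finally, the morning and evening baseline estimates were averaged and subtracted.

'Learner' mice were defined as mice whose average change in target counts over baseline across all syllables exceeded the maximum average change in target counts exhibited by the no-opsin control animals. These n = 9 animals were used for subsequent analyses of target kinematics and learning specificity (Extended Data Fig. 10).

To assess whether syllables that, on average, occurred close in time to the target were also reinforced as a result of optogenetic stimulation, the temporal proximity of each syllable to the target was quantified. Specifically, the average time between each non-target syllable and the target was computed, along with its change in counts above baseline. Syllables were then binned according to their average time relative to the target. Finally, for each experiment, the weighted average of the change in counts over baseline was computed across all syllables in each bin.

To understand whether syllables similar to the target were also reinforced, the average velocity of each syllable from onset to offset was computed for each experiment. Then, the average velocity of the target was subtracted from the average velocity of each syllable. Finally, each syllable's change in counts over baseline was binned by its velocity difference from the target.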
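The excess-target-count computation described above (counts in non-overlapping 30-s bins, a cumulative sum, then subtraction of the mean of the two baseline cumulative sums) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names and the representation of each experiment as a list of target-syllable onset times in seconds are assumptions.

```python
from itertools import accumulate


def binned_counts(onset_times_s, duration_s=1800, bin_s=30):
    """Count target-syllable onsets in non-overlapping bins of `bin_s` s."""
    counts = [0] * (duration_s // bin_s)
    for t in onset_times_s:
        if 0 <= t < duration_s:
            counts[int(t // bin_s)] += 1
    return counts


def cumulative_excess(stim_onsets, baseline_am_onsets, baseline_pm_onsets,
                      duration_s=1800, bin_s=30):
    """Cumulative target count on a stimulation experiment minus the mean
    cumulative count of the two baseline experiments, bin by bin."""
    def cumulative(onsets):
        return list(accumulate(binned_counts(onsets, duration_s, bin_s)))

    stim = cumulative(stim_onsets)
    am = cumulative(baseline_am_onsets)
    pm = cumulative(baseline_pm_onsets)
    return [s - (a + p) / 2 for s, a, p in zip(stim, am, pm)]


# Toy 90-s example with made-up onset times (three 30-s bins).
excess = cumulative_excess(
    stim_onsets=[5, 10, 40, 70, 80],
    baseline_am_onsets=[5, 65],
    baseline_pm_onsets=[35],
    duration_s=90,
)
print(excess)
```

The final element of the returned list is the total number of target expressions beyond baseline over the whole experiment, which is the kind of per-experiment summary that the learner criterion above relies on.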
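To quantify the effect of optogenetic stimulation on kinematic parameters and sequence randomness, these quantities were estimated in non-overlapping bins five syllables long, starting from stimulation onset. This window was chosen to minimize noise in downstream computations while retaining reasonable temporal resolution. To compensate for behavioural non-stationarities across experiments, mice and target syllables, the entropy, velocity and acceleration before stimulation onset were subtracted from the post-stimulation values. Finally, these baseline-subtracted values were z-scored using the mean and s.d. estimated from catch trials.

As described above, the eight mice injected with dLight and ChrimsonR were also run through closed-loop reinforcement experiments. The reinforcement experiments with 250-ms, 10 mW CW stimulation were used for a decoding analysis of how exogenous dopamine release altered syllable usage over the course of an experiment in which 'extra DA' was added (Fig. 4g–i).

To use the decoding model to predict the magnitude of exogenously evoked dLight fluorescence, on every instance in which a mouse expressed the target syllable and received stimulation, the dLight value was replaced with the average dLight fluorescence of the target syllable observed on catch trials, in which no optical stimulation was delivered. An offset (referred to as 'extra DA') was then added to each syllable instance on which the mouse received stimulation. The likelihood of the syllable sequences expressed during opto-DA experiments was computed for a range of extra DA offsets (and thus a range of exogenously added dopamine). The model was evaluated using exactly the same steps described in 'Decoding model to predict behaviour from dLight', except that repeated K-fold cross-validation (5-fold splits repeated 100 times) was performed over the stimulation experiments. The 'extra DA' output of the model was compared with empirical photometry data collected from animals undergoing ChrimsonR-mediated closed-loop reinforcement (Fig. 4i).

To assess whether the influence of dopamine at baseline could predict opto-DA reinforcement, we used the within-experiment correlations between dopamine fluctuations and syllable statistics (usage and entropy). Specifically, we computed the correlations between dLight levels and usage and entropy as described in 'Analysing the moment-to-moment relationship between dLight and syllable statistics within an experiment' (Fig. 2e,k), except that the correlations were evaluated for each mouse and each syllable. Values at each bin size were z-scored using the mean and s.d. of correlations computed on shuffled data. Here, n = 100 shuffles were used for correlations with entropy to improve computational efficiency. To determine the modulation depth of these correlation curves for each mouse and syllable, we used the s.d. of the correlation values across bin sizes. This resulted in one value reflecting the short-term influence of dopamine on usage (endo-DA count) and on entropy (endo-DA entropy) for every syllable–mouse pair. Finally, for Fig. 4b–d, these estimates were averaged for each syllable. The log2 fold change in target counts on stimulation days relative to baseline days was then used as the estimate of opto-DA learning. To mitigate mouse-to-mouse variability, the log2 fold change in target counts was normalized by computing the log2 fold change in target counts between every pair of non-stimulation days for each mouse. The mean and s.d. of this distribution for each mouse were used to z-score opto-DA learning.

A Bayesian linear regression model was used in Fig. 4b,c. A normal prior was placed on the regression coefficients and an exponential prior on the variance. Samples were drawn from the posterior via the No-U-Turn Sampler (NUTS) using NumPyro (n = 1,000 warm-up samples, n = 2,000 samples)75. Performance was assessed using leave-two-out cross-validation. The linear regression model presented in Fig. 4f used a Huber regressor73. The performance of the Huber regressor was assessed using five-fold cross-validation repeated five times.

RL models have four key components: a reward signal, a state, a set of available actions and a policy (which governs how actions are chosen). Here, a simple Q-learning agent with a softmax policy was designed to model mouse behaviour in the open field as an RL process driven by endogenous dopamine levels44. RL was recast in our model (specifically, a Q-learning agent with a softmax policy) to use endogenous dopamine (that is, syllable-associated dLight) as the reward signal, behavioural syllables as states, and transitions between behavioural syllables as actions. The dLight peak occurring during the syllable at time t, given the syllable at time t + 1, was considered the 'reward'. The model's Q-table was initialized with a uniform matrix with the diagonal set to 0 since, by definition, there are no self-transitions in our data. At each step of each simulation, given the currently expressed syllable (that is, the state), the model sampled possible future syllables (actions) according to the behavioural policy and the expected dLight transient magnitude associated with each syllable transition (the expected reward, specified by the Q-table). The model then selected an action according to the softmax equation:

$$P(a \mid s) = \frac{e^{Q(s,a)/\tau}}{\sum_{a'} e^{Q(s,a')/\tau}}$$

where τ is the temperature. The model was fed 30-min experiments of actual data. Data were formatted as a sequence of states and syllable-associated dopamine. Given the current state, the model selected an action according to the softmax equation. To update the Q-table and simulate the effect of endogenous dopamine as reward, syllable-associated dopamine was expressed as the reward in the standard Q-learning equation. Specifically, the Q-table was then updated according to:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

where Q is the Q-table defining the probability of action a in state s, α is the learning rate, r is the reward associated with action a and state s (the dLight peak at the transition between syllable s and syllable a), γ is the discount factor and s′ is the next state. Performance was assessed via the Pearson correlation between the Q-table produced by the model at the end of the simulation and the empirical transition matrix observed in the experimental data. Here, each row of the empirical transition matrix and of the Q-table was z-scored separately before computing the Pearson correlation. Note that, in this formulation, the learned Q-table is functionally equivalent to a transition matrix. To avoid degradation in performance due to syllable sparsity, the top 10 syllables were used.

To account for the short-term effect of dopamine on sequence randomness, a dopamine-dependent term was added to the policy of the baseline model, so that the temperature is time dependent. In this formulation, τ_decay corresponds to the time constant with which the effect of dopamine on temperature decays, τ_baseline is the baseline temperature, ν is the amount by which the temperature increases if r(t) is above a threshold λ, and n is the number of timesteps since the threshold crossing. Experiments were split into training and test datasets via two-fold cross-validation, and all free parameters were fit using the training set. To compare the dynamic model with the reinforcement-only model, ν was set to 0, which switches off the temperature-varying component of the dynamic model. Note that we observed qualitatively similar results under an alternative formulation in which, rather than being fed 30-min experiments of actual data, the model was allowed to choose actions freely and rewards were drawn at random from the dLight peaks associated with that action.

Models were fit using observed dopamine magnitudes as either (1) the reward term (see above) or (2) a reward prediction error term. For each model type, a grid search was performed over α (learning rate), γ (discount factor, used only in the reward model) and temperature (randomness of the next action). Fits were compared using held-out log-likelihoods, which were z-scored using the mean and variance of held-out log-likelihoods from models fit to data shuffled across experiments (n = 10 shuffles). This comparison is only valid for our specific model formulation. There may be alternative formulations in which dopamine as a reward prediction error is consistent with our data.

All hypothesis tests were non-parametric. Effect sizes for Mann–Whitney U tests are presented as the common-language effect size f. The significance of correlations was established by comparison with correlations from n = 1,000 shuffles (referred to as a shuffle test throughout the manuscript). For shuffle tests, if the observed correlation exceeded all 1,000 shuffles, the P value is listed as P < 0.001 rather than P = 0. Where appropriate, P values were adjusted for multiple comparisons using the Holm–Bonferroni step-down procedure. Sample sizes were not predetermined but are consistent with sample sizes typically used in the field. For examples using similar techniques, see refs. 10,14. No blinding was performed, but MoSeq-based behavioural analysis is automated.

Box plots (here and throughout) obey standard conventions: edges represent the first and third quartiles, and whiskers extend to the farthest data point within 1.5 interquartile ranges of the first or third quartile.

In addition to the analysis-specific packages cited in the relevant sections above, the following packages were used for analysis: NumPy83, python84, seaborn85, Matplotlib86 and Python 3 (ref. 87).

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.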
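The two statistical conventions described above, the shuffle test with its P < 0.001 floor and the Holm–Bonferroni step-down adjustment, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the shuffle test is written as one-sided (counting shuffles whose correlation is at least as large as the observed one), which the text does not specify.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def shuffle_test(x, y, n_shuffles=1000, seed=0):
    """One-sided shuffle test (assumption): fraction of shuffles whose
    correlation is at least as large as the observed correlation."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_shuffled = list(y)
    n_exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        if pearson(x, y_shuffled) >= observed:
            n_exceed += 1
    p = n_exceed / n_shuffles
    # Report a floor of 1 / n_shuffles instead of P = 0.
    label = f"P < {1 / n_shuffles:g}" if n_exceed == 0 else f"P = {p:g}"
    return observed, p, label


def holm_bonferroni(p_values):
    """Holm-Bonferroni step-down adjusted P values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted, running_max = [0.0] * m, 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Toy usage with made-up data.
r, p, label = shuffle_test(list(range(20)), list(range(20)))
adjusted = holm_bonferroni([0.01, 0.04, 0.03])
```

With 1,000 shuffles the smallest P value that can be resolved is 1/1,000, which is why an observed correlation that beats every shuffle is reported as P < 0.001 rather than as zero.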
