Homework 3 (105 points)

Submit on Canvas by Tuesday 10/25/22 at 11:53 PM. Ask questions and discuss on Piazza. Have fun!

Data set description

The data (in the file ReachData.npz) is available here and can be loaded as shown below. Please note that, due to size restrictions on GitHub, a .mat file of the same data has been made available under Files --> Data on Canvas for those who prefer MATLAB; unfortunately, I will not be able to provide any MATLAB coding support. A demo notebook that plays around a little with the data (the notebook from lecture) will be shared at this link here.

import numpy as np
reachDataFile = np.load("ReachData.npz", allow_pickle=True)
spikeTimes = reachDataFile['spikeTimes']                # spike times per trial and neuron (ms)
timeTouchHeld = reachDataFile['timeTouchHeld']          # per-trial plan-window start (target onset)
timeGoCue = reachDataFile['timeGoCue']                  # per-trial go cue time (ms)
timeTargetAcquire = reachDataFile['timeTargetAcquire']  # per-trial target acquisition time (ms)
cfr = reachDataFile['cfr']                              # per-trial reach condition/target label

Data description: There are $N = 1127$ trials and $D = 190$ neurons. The rows of the data arrays are in trial order, so row $n$ of each variable corresponds to row $n$ of every other variable. Please note that all times are given in ms.
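A quick, optional sanity check that everything loaded trial-aligned (a minimal sketch; it assumes the loading code above and that each per-trial variable has one entry per trial):

N = len(spikeTimes)  # expected: 1127 trials
for name in ['timeTouchHeld', 'timeGoCue', 'timeTargetAcquire', 'cfr']:
    assert len(reachDataFile[name]) == N, f"{name} is not trial-aligned"
print(N, "trials;", len(spikeTimes[0]), "neurons on trial 0")  # expected: 1127 and 190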

Creating spike count vectors from the spike times provided is a big part of all three questions. These vectors will be of size $D \times 1$, where $D$ is the total number of neurons (190). All of the classification will be done on these vectors, and you should combine across neurons by taking the product of probabilities (not by doing anything that would collapse across neurons or mix their information).
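In practice, a product of 190 per-neuron probabilities underflows floating point, so combine them in log space. A minimal sketch of this combination rule, assuming SciPy is available (the function name is illustrative):

from scipy.stats import poisson

def trialLogLik(counts, rates, T):
    # Log Pr(counts | rates, T) for one trial: `counts` is a D-vector of spike
    # counts, `rates` a D-vector in spikes/ms, and T the window length in ms.
    # Summing log-pmfs across neurons equals taking the product of probabilities.
    return np.sum(poisson.logpmf(counts, rates * T))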

Problem 1

Plan data only vs movement data (35 points)

Define two windows of spikes: the "plan window," encompassing spikes emitted between target onset and the go cue, and the "movement window," encompassing spikes emitted between the go cue and the time movement ends (note that both windows have different lengths on different trials). Here is example code to compute the array of plan-window spike counts:

planSpikes = []
for trialIdx, trialSpikes in enumerate(spikeTimes):
    # Count each neuron's spikes between target onset and the go cue.
    planSpikes.append([np.sum((st > timeTouchHeld[trialIdx]) &
                              (st < timeGoCue[trialIdx])) for st in trialSpikes])
planSpikes = np.array(planSpikes)  # will be 1127 x 190
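The movement window can be counted the same way; here timeTargetAcquire is assumed to mark the time movement ends (check this against the data description):

moveSpikes = []
for trialIdx, trialSpikes in enumerate(spikeTimes):
    # Count each neuron's spikes between the go cue and the end of movement.
    moveSpikes.append([np.sum((st > timeGoCue[trialIdx]) &
                              (st < timeTargetAcquire[trialIdx])) for st in trialSpikes])
moveSpikes = np.array(moveSpikes)  # also 1127 x 190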
Randomly choose 50 trials per target to be set as the training data (400 trials total). The remaining trials will be used as test data.

We can describe the number of spikes that a neuron produces in each window (vectors of neural counts $\textbf{d}_{plan}$ and $\textbf{d}_{move}$, where $\textbf{d} \in \mathbb{R}^{D}$ and $D$ is the number of neurons) using a wide variety of random process models. Use the training data to estimate the parameters of a target-direction-dependent vector Poisson process, i.e., $\Pr(\textbf{d}_n \mid \text{target}_k) \sim \text{Poisson}(\boldsymbol{\lambda}_k) = \prod_{d=1}^{D} \text{Poisson}(\lambda_{k,d})$. In other words, calculate a Poisson model for each neuron and class; the rate for a class will be a vector the size of the number of neurons.

For each of the remaining test trials, estimate the maximum log-likelihood reach target. What is the overall decoding accuracy (i.e., what fraction of trials are decoded correctly) using only the plan window data? Using only the movement window data? Combining both plan and movement data? Why might these be different? Do the accuracies match your expectations? Why or why not?
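A minimal sketch of the full fit/decode pipeline for the plan window, assuming cfr is a length-1127 vector of target labels (check this against the demo notebook) and planSpikes is built as above; variable names are illustrative:

from scipy.stats import poisson

rng = np.random.default_rng(0)
targets = np.unique(cfr)             # 50 trials/target and 400 train trials imply 8 targets
planDur = timeGoCue - timeTouchHeld  # per-trial plan-window length (ms)

# Train/test split: 50 random trials per target for training, the rest for testing.
trainIdx = np.concatenate([rng.choice(np.where(cfr == k)[0], 50, replace=False)
                           for k in targets])
testIdx = np.setdiff1d(np.arange(len(cfr)), trainIdx)

# Rate per (target, neuron): total training spikes / total training time (spikes/ms).
rates = np.stack([planSpikes[trainIdx][cfr[trainIdx] == k].sum(0)
                  / planDur[trainIdx][cfr[trainIdx] == k].sum() for k in targets])
rates = np.maximum(rates, 1e-6)      # floor zero rates (see the bewares below)

# ML decoding: per test trial, sum log-pmfs over neurons (the product of
# probabilities, in log space) with mean rates * T, then take the best target.
logL = np.stack([poisson.logpmf(planSpikes[testIdx],
                                rates[i] * planDur[testIdx, None]).sum(axis=1)
                 for i in range(len(targets))])
accuracy = np.mean(targets[np.argmax(logL, axis=0)] == cfr[testIdx])
# Movement-only and combined decoding are analogous: swap in moveSpikes and its
# durations, or add the plan and movement log-likelihoods before the argmax.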

Bewares and one helpful hint!

  1. If you are getting decoding numbers below 90%, you probably have not calculated rates correctly or have not applied them to the right window durations on each trial. Remember that for a Poisson process the expected count in a window of length $T$ is $\lambda T$! (See the sketch after these hints.)
  2. If you have estimated a rate to be 0 for a particular reach direction, this will likely cause problems (a zero rate makes the log-likelihood $-\infty$ for any trial where that neuron spikes). Be pseudo-Bayesian. Never go full Bayesian. And please replace the 0s with a smol number.
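To make beware 1 concrete: because window lengths differ across trials, estimate a rate in spikes per ms (total spikes over total time) rather than a raw mean count, and rescale it by each trial's $T$ when evaluating likelihoods. A minimal contrast, reusing names from the Problem 1 sketch above:

mask = cfr[trainIdx] == targets[0]              # training trials for one target
meanCount = planSpikes[trainIdx][mask].mean(0)  # spikes; ignores window length -- beware!
rate = (planSpikes[trainIdx][mask].sum(0)
        / planDur[trainIdx][mask].sum())        # spikes/ms; the Poisson mean is rate * T

The zero-rate fix from beware 2 is the np.maximum floor in that same sketch; the exact floor value is an arbitrary ("smol") choice.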

Problem 2

Amount of plan data (40 points)

  1. The plan periods in the data are either 755 or 1005 ms. Decode using less than the full plan period: generate new models for plan windows of increasing size (50 ms to 750 ms in 50 ms increments, where all windows begin at target onset). Using maximum likelihood estimation, decode the reach target for both the train and test data using these different-sized windows of neural data. Plot the decoding accuracy as a function of the size of the decoding window. Describe what you see and why it might look this way. (See the counting sketch after this list.)

  2. Now, instead of using an increasing window size, use a constant 200 ms window, but slide the window's start time from target onset to 550 ms after target onset (in 50 ms steps). Generate new models for each window location and decode the reach target for the train and test data. Plot the decoding accuracy as a function of the temporal location of the decoding window. Describe what you see, ponder why you see what you see, and finally type up your pondering thoughts.
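Both parts reduce to re-counting spikes in a per-trial window and re-running the Problem 1 fit/decode with the matching window length $T$. A minimal counting helper, assuming (as in the plan-window code above) that timeTouchHeld marks target onset; names are illustrative:

def countWindow(startMs, stopMs):
    # Per-trial, per-neuron spike counts in [onset + startMs, onset + stopMs).
    counts = []
    for trialIdx, trialSpikes in enumerate(spikeTimes):
        lo = timeTouchHeld[trialIdx] + startMs
        hi = timeTouchHeld[trialIdx] + stopMs
        counts.append([np.sum((st >= lo) & (st < hi)) for st in trialSpikes])
    return np.array(counts)                   # trials x neurons

for stop in range(50, 751, 50):               # part 1: growing windows from target onset
    counts = countWindow(0, stop)             # T = stop ms on every trial
    # ... refit rates on trainIdx, decode train and test, record both accuracies

for start in range(0, 551, 50):               # part 2: sliding, constant 200 ms window
    counts = countWindow(start, start + 200)  # T = 200 ms on every trial
    # ... refit, decode, record accuracies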

Problem 3

Number of neurons (30 points)

In your model from Problem 1, you decoded maximum likelihood targets using all 190 neurons. Now, perform a "neuron dropping analysis": starting with your full model, randomly eliminate neurons and evaluate decoding accuracy on the reduced dataset. Eliminate between 0 and 180 neurons in steps of 20, so that you decode using between 10 and 190 neurons. For each point (each number of available neurons), average over 20 repetitions of randomly choosing which neurons to drop, to account for variability, since some neurons probably contribute more information than others. Plot decoding accuracy as a function of the number of neurons available to the decoder. Describe what you see and hypothesize why this might be the case. (A sketch of the dropping loop follows.)
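A minimal sketch of the dropping loop; decodeAccuracy is a hypothetical helper that refits the Poisson model and decodes as in Problem 1 using only the given subset of neurons (columns of planSpikes and/or moveSpikes):

rng = np.random.default_rng(1)
D = planSpikes.shape[1]                    # 190 neurons
meanAcc = {}
for nKeep in range(10, D + 1, 20):         # decode with 10, 30, ..., 190 neurons
    accs = []
    for rep in range(20):                  # 20 random subsets per point
        keep = rng.choice(D, nKeep, replace=False)
        accs.append(decodeAccuracy(keep))  # hypothetical: refit + decode on these neurons
    meanAcc[nKeep] = np.mean(accs)         # average over subsets for the plot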