Recent neuroscientific and technical developments of brain machine interfaces have put increasing demands on neuroinformatic databases and data handling software, especially when managing data in real time from large numbers of neurons. Extrapolating these developments we here set out to construct a scalable software architecture that would enable near-future massive parallel recording, organization and analysis of neurophysiological data on a standard computer. To this end we combined, for the first time in the present context, bit-encoding of spike data with a specific communication format for real time transfer and storage of neuronal data, synchronized by a common time base across all unit sources. We demonstrate that our architecture can simultaneously handle data from more than one million neurons and provide, in real time (<25 ms), feedback based on analysis of previously recorded data. In addition to managing recordings from very large numbers of neurons in real time, it also has the capacity to handle the extensive periods of recording time necessary in certain scientific and clinical applications. Furthermore, the bit-encoding proposed has the additional advantage of allowing an extremely fast analysis of spatiotemporal spike patterns in a large number of neurons. Thus, we conclude that this architecture is well suited to support current and near-future Brain Machine Interface requirements. Keywords Electrophysiology · Databases · Data encoding · Real time Introduction past decades. Within the field of neural prosthetics, for instance, the general feasibility of real time control of Brain Machine Interfaces (BMI) and Brain Computer robotic arms using multi-electrode-array recordings of Interfaces (BCI) have developed substantially during the cortical neural activity has been demonstrated (Wessberg et al. 2000) and, more recently, a robotic device allowing advanced arm and hand movements has been successfully implemented in tetraplegic subjects (Velliste et al. 2008; This project was sponsored by a Linnaeus grant (project number Collinger et al. 2013; Hochberg et al. 2012; Gilja et al. 60012701) from the Swedish Research Council and The Knut and Alice Wallenberg Foundation (project number: KAW 2004.0119) 2012). This development has depended partly on the identification of important principles of motor control, Jens Schouenborg and Martin Garwicz contributed equally. revealed by neurophysiological investigations of neural Bengt Ljungquist activity in awake, behaving animals (Monfils et al. 2005), email@example.com and partly on advances within the field of robotics Jens Schouenborg (Velliste et al. 2008). However, a functional interaction firstname.lastname@example.org between the brain and robotic devices or computers Martin Garwicz also requires sophisticated neuroinformatics to ensure an email@example.com efficient organization and analysis of neural data. The demands on hardware and software used in the Neuronano Research Center, Integrative Neurophysiology and context of BMI/BCI are already high, as recent studies Neurotechnology, Lund University, Scheelevagen ¨ 2, 223 81 have used recordings of up to 1792 channels for a Lund, Sweden single subject (Schwarz et al. 2014). However, demands Department of Integrative Medical Biology, Umea ˚ University, can be foreseen to further increase in the near future, Linneus ´ vag ¨ 9, 901 87 Umea, ˚ Sweden 218 Neuroinform (2018) 16:217–229 as BMI and BCI reach new levels of sophistication, (Metcalfe and Boggs 1976), which we are using for for example by including real time feedback protocols. transferring data in the present study, PCI Express has Furthermore, in January 2016 DARPA (U.S. Defense slightly lower latencies, but is less developed and proven Advanced Research Project Agency) announced the Neural as a technique; therefore we have chosen Ethernet. Future Engineering System Design (NESD) program (Miranda development may yield PCI Express as a viable option et al. 2015), in which it aims for fully implantable also for our architecture, especially since it could be used devices able to record from up to one million neurons to communicate directly with GPUs (Graphics Processing and stimulate 100 000 neurons, supporting a bidirectional Unit) and FPGAs (Field Programmable Gate Array) (Bittner BMI. Although the hardware part to perform this feat and Ruf 2013). Commercial acquisition systems such as is a challenge in itself, the neuroinformatic requirements Neuralynx and Plexon also have online transfer of data, but are immense, surpassing those previously implemented the data format for the online transfer is not open, although in currently available neuroinformatics systems. Unless software exists for accessing and interacting with the data adequately addressed, this may potentially become the stream. For all current initiatives the data stream is limited rate limiting factor for the development of this field of to the number of channels in a single system, typically 512 neurotechnology in the near future. or 1024 channels. In a previous paper (Ljungquist et al. 2011), we showed The aim of this study was to develop a system how a traditional Relational DataBase Managements architecture and data encoding that enables handling of System (RDBMS), used for storing neural data, might electrophysiological data from very large numbers of be combined with a data transfer mechanism. RDBMSs neurons (1 million or more) in real time (<25 ms) over long have also been used by other groups to structure and periods of time. store electrophysiological data (Rautenberg et al. 2011). However, as RDBMSs are not built for large data sets, they scale poorly with binary data and multidimensional time Method series data such as electrophysiological and neuroimaging data. Hierarchical Data Format (HDF5) https://www. The System Architecture hdfgroup.org, by contrast, is a data format that potentially fulfills the requirements outlined above and has received In order to achieve the aim of this study, we suggest an increasing attention within the biomedical community for architecture composed of the following parts (see Fig. 1): storing data (Dougherty et al. 2009;Moucek ˇ et al. 2014; Chintaluri et al. 2014). It supports a large variety of data Master clock pulse (a) - a pulse generator with the same types, and is designed for flexible, efficient and parallel frequency as the acquisition systems sampling frequency, I/O also for complex data, including high dimensional data. synchronizing them through a clock input to the Here we take advantage of the HDF5 file format support for respective systems and providing a common timebase for high parallel throughput and introduce bit-encoding to allow all recorded data. Sub-microsecond precision is needed compression of large amounts of data. Although progress and is possible using an implementation IEEE 1588 has been made recently (Rossant et al. 2016), spike sorting precision time protocol (Correll et al. 2005) still remains a challenge. In this work, we assume that Acquisition systems (b) for storing electrophysiological a correct online spike sorting and tracking is performed data are optimized for reading a limited number of by each recording system. We also assume that the spike channels with high fidelity. Furthermore, we assume (as sorting results in single units being identified. not all systems may allow for example spike template Current initiatives to build open source hardware as definition during an experiment), that they do the well as software for electrophysiological recording and following preprocessing of data: 1) band pass filtering analysis, are using both HDF5 formats as well as custom 2) spike detection 3) spike template generation, usually formats to store data, for example Open Ephys http://www. by clustering in a feature space 4) classification of open-ephys.org. However, in order to store data for transfer new spikes into any of the clusters. Some systems may of data between data acquisition system and computer, also perform online clustering and tracking, that is, Open Ephys transfers the raw signal over USB (Universal adaptively changing cluster definitions during recording. serial bus) link, although work is under development to In this work, we assume that electrodes and spike sorting transfer data over PCI (Peripheral Component Interconnect) software of sufficient quality to only generate single units express. PCI Express has the advantages of lower latency and no multi units. Multi units may still be recorded and impressive transfer speeds. Compared to Ethernet by our suggested architecture, but the data format, as Neuroinform (2018) 16:217–229 219 Fig. 1 A proposed system architecture overview of storage for a data grid of neurons * time bins (c). The resulting data grid is seri- large amounts of real time data. Arrows indicate direction and path alized and sent over to spike data storage (d), as well as to narrow of data flow. A master clock pulse (a) synchronizes n acquisition sys- band (f) and waveform data storage (g). In this work, a and b are sim- tems (b), which handles band pass filtering, spike sorting (for spike ulated, c and d are implemented, while f and g are suggested not yet data) and down-sampling (for narrow band data), receiving electro- implemented components physiological data from subject (e). Neuronal spike data is encoded in explained below, does not allow for spikes closer in It should be noted that a massive multi channel recording time than 1 ms. As an experiment in the future can must consist of many different acquisition systems, which be expected to contain data from millions of neurons each have an array of electrodes for a specific part of the simultaneously, a recording of a single subject may be central nervous system of the subject, since a processor composed of many acquisition systems; with today’s (at least with current computational power) only can processing power of a single system, an acquisition handle a limited number of operations during a certain system is likely to be composed of maximally around time, for example online spike sorting. As the acquisition 1000 channels (commercial systems such as Neuralynx, systems are running in parallel, they gather data inde- Plexon and Blackrock Microsystems all have a maximum pendently as they do not share a common memory area. configuration of 512 channels, while TDT have 1024) In our suggested system architecture, they are integrated and are not designed to scale above this number of in the spike data storage and the waveform storage,per- channels. The output of such a system is thus: mitting extremely fast analysis of spatiotemporal spike patterns. - The spike times, sorted per neuron - This is the main Encoder (c) - composed of in-house developed software input data into the Encoder (see below) providing a real time interface for the acquisition systems - Narrow-band data (usually 1-300 Hz, down-sampled to an encoded format for the spike data, which is to 600 Hz, but could be set at other frequencies as described in detail below. well), recording Local Field Potentials (LFPs). This data is sent to the Narrow-band storage Spike data storage (d) - After transmission of data over - The waveform characterizations that are deemed to network, the spike data storage, a custom built software come from a single cell, e.g. templates of waveform written in Python, receives the data from the acquisition shape or similar descriptors. Waveforms characteriza- system encoder/adapter and then writes and integrates it tions (typically 32 data-points per waveform, assum- into the common data storage. Also the storage buffers ing 32 kHz sampling rate), are periodically sent to the notifies listening applications that data is available for Waveform storage. analysis and/or visualization. 220 Neuroinform (2018) 16:217–229 Narrow-band storage (f) - Since Local Field Potentials Such a ‘smallest set’ is termed a thread of execution. A (LFP) may carry essential information, we foresee the single-threaded solution will result in blocking as it will run use of storage for defined narrow bands, for example low on a single processor, handling many sources of data serially frequency bands, in future extensions of the architecture. instead of in parallel, and thus blocking communication Waveform storage (g) - Since the spike waveforms are with other acquisition systems when serving one. Multi- essential in order to evaluate cell identity (Bar-Hillel threaded design, on the other hand, allows each thread to et al. 2006; Fraser and Schwartz 2012), changes in their receive data from its corresponding acquisition system (and features have to be tracked over time. For this purpose, set of neurons) and execute independently. we also propose a secondary storage for the waveforms MPI, Message Passing Interface (Gabriel et al. 2004), that only records a subset of the neurons at a given time is a commonly used system which enables multithreading and cycles through all the neurons periodically in order and also defining a communication mechanism (message to track the waveforms shape progression of all neurons. passing) through a common memory area, which is needed The waveform storage may reside on the same computer when performing parallel computing in which the results as the spike data storage, since it does not require such are dependent on each other, which is the case in our extensive bandwidth as the spike data storage, and for application. By choosing MPI, we overcome the limitations practical rapid access if using waveforms in analysis of threading of the programming language, as this is calculations. abstracted away from the program implementation. For example, in Python the so called global interpretation lock Of these parts, only the encoder and the spike data (https://docs.python.org/3/c-api/init.html) stops true multi- storage are currently implemented in our architecture, with threading in pure Python, but as each Python runtime environment is run in a separate MPI thread, this is the acquisition systems being simulated. As a new session is started, the project parameters, e.g. bypassed. MPI also works together with HDF5 to support multi-parallel execution when writing and accessing data electrode type and number of neurons are fetched from the project server database, a MySQL database server. If from the data storage (https://support.hdfgroup.org/HDF5/ needed, session specific meta data is entered, for example PHDF5). Using HDF5 together with MPI thus supports our the stimulus type of the specific session. The controller aim to integrate and store neuronal data from distributed application then starts up the various system components. systems. Once the session is established and data is acquired, the At the software level, Python has a wide range of data flows from the acquisition system(s) through the available libraries, most importantly Scipy and NumPy system components to the data storages, as indicated in (van der Walt et al. 2011) for handling data arrays, Fig. 1. mpi4Py (Dalcin et al. 2008) which provides a Python API (Application Programming Interface) to MPI as well Software and Data Format as h5py which provides a Python API to HDF5, but also an increasing number of neuroscientific libraries such Our aim was to develop a system architecture that enables as OpenElectrophy (Garcia and Fourcaud-Trocme ´ 2009), handling of electrophysiological data from large numbers of Neo (Garcia et al. 2014) and NeuraPy https://github. neurons in real time over long periods of time. When using com/kghose/neurapy. Thus, Python is a suitable choice BMI in applications of read-out from the nervous system for a more complex software project. As an interpreted for control of a device, and/or stimulation for control of programming language, performance is however slow for the nervous system, computations need to be done within computationally heavy tasks. This might however be a certain time limit in order to be relevant in relation to addressed by precompiling critical parts, which in turn nervous system state. In general, the real time capacity of maps efficiently to machine code instructions. We have used a system refers to its capability of responding within the Cython (Behnel et al. 2011) for this purpose for the parts of specified time limit. Here we define this time limit as 25 our system that require extensive computations. ms (Markram et al. 1997). In order to achieve such short response times lots of information needs to be processed in Software Implementation parallel. This can be achieved by using different processors As the experiment starts, the system sets up project or cores in a computer or by time slicing of processor/core execution time. A limiting factor in this context is the size parameters, such as number of acquisition systems and their neuron count, through a user interface (either human of the smallest set of instructions that may be run separately. Neuroinform (2018) 16:217–229 221 interaction or through a REST-ful API, (Representational the HDF5 file. It is then possible to check for a pattern State Transfer) (Fielding and Taylor 2002). For each occurring at a previous instance in time by specifying which acquisition system, a thread is started using MPI. As time slot in the HDF5 file to compare to. In Fig. 2,an recording starts, data is sent from acquisition system to data overview of the messaging between the program execution storage through TCP (Transmission Control Protocol), in threads of the clients and server is shown. Below follows our case over a high-speed LAN (Local Area Network), to sample Python code for the key operations, the encoding which acquisition systems and data storage are connected. and transmission over the network (writing and reading Before the experiment starts all parameters have been from HDF5 file omitted, please see GitHub repository defined when configuring the project and the recording at https://github.com/NRC-lund/spikebit.git for full code). session parameters, for example the number of acquisition Also, the calls of routines have been omitted for readability systems, the number of neurons in all acquisition systems, and are instead visible in Fig. 2 together with some the acquisition system network ID, etc. The synchronized programming declarations such as type conversions. Some clock signal of the acquisition systems works together with of the parts below are precompiled using Cython to C the data format to ensure timing; the place in the data for speed in the actual application. In the first sample, ‘matrix’ constitutes also a binning in time. Assuming that Client thread - encoder, we show how the encoding is done acquisition systems handle the clock pulse correctly, only in practice by using sparse matrix representation of the checkup for data length and data sequence is needed, as spike train and the unit IDs to perform a transformation TCP handles low level checks of the transmission. As into bit encoding. In the second sample, Client thread - the expected data length is known once the recording sending data, we show how the data is sent by using session is started and parameters are set for data size, the numpy internal data buffer representation to direct the real time data transfer can thus be minimalistic; each transfer from numpy to binary format, while we in the received message is checked if it has the expected number third sample, Server thread - receiving data, show how of rows and columns. TCP handles that the messages the data is received into a numpy array using the same are received in order. The storage system then receives internal buffer to fill up the array from the binary data the defined number of 32-bit integers for each set of received, and then directly writing the data to the HDF5 32 neurons and assigns it to the corresponding place in file. Fig. 2 UML (Unified Modeling Language, Rumbaugh et al. (2004)) sequence diagram showing high-level overview of message exchange between acquisition system clients and server threads. Arrows correspond to function call in executing thread and/or message sent to other simultaneous threads on the same system. Multiple clients are communicating with multiple threads on the server side. The function/method arguments as well as the multiple threads function calls have been omitted for clarity 222 Neuroinform (2018) 16:217–229 Neuroinform (2018) 16:217–229 223 Encoding Algorithm and Spike Data Storage As data rate transfers in the architecture will be limited due to hardware and software constraints, it is crucial to use data size as efficiently as possible, without trading speed. For this purpose, we developed an encoding algorithm and data storage format, which reduces data size to a minimal format needed for rapid integration, analysis and interaction with real time data. Each spike is considered as a binary event in discrete time, either spiking during that period of time or not, which thus may be encoded as a single bit during a specific time as part of a 32-bit integer (64 bit could also be used if so desired). As for the time resolution, we assume that no neuron will fire with a frequency greater than 1000 Hz. The result is an array of 32-bit integers, of length N/32, where N is the number of neurons recorded in total at the time, see Fig. 3. The input to our system is spike train data, that is waveforms, and time of the threshold crossing (or other time of detection, if for example a wavelet-based detection of spikes was used). If recording a large numbers of neurons, say up to 1 million, different systems are encoding a smaller part of the total number of neurons, although they should share a common timebase. The algorithm is then as follows: 1. Divide the neurons in segments of 32, each correspond- In order to evaluate potential for reduced data size during ing to a bit in the 32-bit integer to be encoded. If not transfer, we tested compressing data using lzf- gzip and enough neurons exist for a segment, the reserve bits are szip compression. These compression techniques reduced set to 0. If new neurons are detected, the reserve bits are the data approximately to, respectively, 75%, 60% and 50% allocated in order. A finite number of segments (N/32) of original data size, but increased time to store data 6, 30 are pre-allocated in memory and file system cache, cov- and 13 times, respectively. Since real time requirements in ering the maximum number of neurons expected to be our case were deemed more important than data storage recorded during the session in each acquisition system. If a size, we did not use any other compression technique than segment is filled, a new bit segment from that acqui- our own bit-encoding in neither transfer nor storage of data. sition systems segments is allocated. Neurons which To keep integrity of data from different datasets, we are not recognized to have been identified before will used parallel HDF5 and MPI together, as mentioned always get “new” IDs. We did not reuse IDs, but rather previously, allowing parallel processes communicating and created a new bit group in order to store this information. writing to HDF5 file using MPI. File integrity is preserved 2. For each neuron, if a spike was detected since last as the communicating processes are not writing to the transfer, the bit corresponding to that neuron is set to 1, same dataset, but instead to different datasets, which are otherwise it is set to 0. The result is a 32-bit integer. separately filled with data. If data is not available from one 3. Each integer in its position of the array to be sent is set of the acquisition systems, this is detected as the expected to the resulting value, yielding a N/32 length array. sequence number in the header of the received message 4. The array, which now encodes whether the neurons is erroneous. The missing data points are then filled with recorded by the acquisition system fired or not during the zeros, corresponding to no spikes received. Wide band data latest millisecond, is sent to data storage (HDF5 file in storage is handled by each of the connected acquisition our system) for concatenation with previously recorded systems. It will therefore be possible to perform offline arrays and arrays from the other acquisition systems. analysis of wideband data together with the data in the integrated spike storage of our system in order to check This encoding results in a data format, in form of a m integrity of the data. xn size matrix, with m times sampled, in 1 ms bins, and 224 Neuroinform (2018) 16:217–229 n*32 neurons. A recording of 15 minutes of 1 000 000 pattern was detected). Test success criterion: Time to neurons would thus result in a 900 000 x 31 250 matrix, feedback stays below 25 ms. which if stored non-compressed would be of a size of 105 To test these parameters, we built a software test suite GB when using an HDF5 file to store the data. The software emulating an acquisition system in Python (with critical architecture is capable of recording longer periods of time; parts precompiled in Cython), which allowed for varying the size of the file system is the limit as HDF5 as data format the number of neurons, as well as running parallel sessions, has no limit in file size. simulating additional systems (in our tests with a fixed number of neurons, 320 000), which all were simulated by Scalability Evaluation a single computer for sending over a local area network, or the same computer as the data were stored on. The test Scalability refers to how well a system handles increasing suite did not however limit the sampling rate to the typically load. We supervised maximum sampling rate and time to expected of 1000 Hz per neuron, but rather sent data as feedback for different configurations in order to evaluate the quickly as possible. If the system indeed was able to handle scalability for our system architecture. The assumption for at least 1000 Hz, it was considered to be successfully able the evaluation is however that the architecture is provided to store the data. No “new” neurons were created during input from synchronized (through the clock signal) acqui- the testing. Spike trains were generated to correspond to an sition systems, each handling in the range of thousands of encoded array of integers in sequence (e.g. array of 1001 neurons each. We have tested the following scaling principles: at time T, array of 1002 at time T+1, array of 1003 at – Neuron scaling: Maximum sample rate for an increas- time T+2, etc), interrupted by events corresponding to firing ing number of neurons from a single acquisition sys- simultaneously (array of integer (2 − 1) = 4294967295). tem without losing data. Test success criterion: The This pattern of spike trains is not likely to be observed sampling rate stays above 1000 Hz sampling rate per physiologically, but is a demonstration of the capability of neuron, which corresponds to the maximum expected the system to detect a firing pattern. neuronal firing rate. Data were sent in two different ways: 1) directly over a 10 – Acquisition system scaling: Maximum sample rate for Gbit Ethernet wired connection or 2) through a local loop- an increasing number of systems integrated in real time back in the case where the simulator resided on the same into the same data storage without losing data. Number computer as the storage. In order to validate that the system of neurons fixed at 320 000 neurons per system. Test stored all generated data, the software test suite generated success criterion: The sampling rated stays above 1000 a predefined spiking stream, consisting of consecutive bit- Hz sampling rate per neuron, which corresponds to the encoded integers. The time to feedback was calculated as maximum expected neuronal firing rate. the time after which the simulator had generated a data – Time to feedback: The time from that the data is stream that should be detected by the pattern analysis acquired, analyzed (using a basic threshold detection (typically a logical comparison of a signal mask and the algorithm, calculating a moving average of the data), most recently recorded array of unit firing patterns) and sent it to the stimulator. and eventually sent back to a stimulator (if a specific Fig. 3 Proposed encoding of spike data in 32-bit integer data grids. At any given time, each neuron spike/non-spike is encoded as a binary bit (1 or 0) in the corresponding time bin. The binary bits of groups of 32 neurons are then expressed as a 32-bit integer. The 32-bit integers are then grouped together in a data grid, where each row corresponds to a specific time bin and each column to the activity in a group of 32 neurons Neuroinform (2018) 16:217–229 225 For the varied number of neurons, a total of 1000 data different numbers of neurons, as could be expected since the messages of nx20 (corresponding to the 20 ms window) local simulating process will be competing for resources on were sent over, where n is the number of neurons in the the same computer as the process writing to the database. trial, for which the speed was measured from the size of As each bit has the capacity to encode a spike for a the data received and the time it took to transfer it. This single neuron, as described above in the encoding section, was then repeated for 10 trials. For the increased number 50 MSamples/s corresponds to 1,6 GSpikes/s. This limit of systems, the same procedure was used, except for only is dependent on hardware, mainly network interface card, running one trial, since the parallel systems themselves RAM memory and hard drive, but also on the length of the are generating an increased number of messages by each network buffer. For our simulations, we have used a network sending 1000 messages. From the acquired times statistics buffer of 20 samples (corresponding to 20 ms, assuming a were calculated, pooling all samples for all trials for a sampling rate per neuron of 1000 Hz) for which the data specific number of neurons for the neuron scaling evaluation corresponding of all different neurons is sent over together. using MATLAB box plot. When scaling the number of For the systems scaling, the limit was found to be the acquisition system, we instead pooled samples from each of transfer of the network, or, if running locally, the speed of the different acquisition systems. the memory, provided that large enough amount of RAM- We also performed a longitudinal test of the software to memory is provided. Thus if not reaching up to either record for a longer period of time. We allowed 4 systems, network or memory speed, a linear scaling pattern could each having 100 000 neurons, thus in total 400 000 neurons, be observed, limited by the number of parallel acquisition to write to the repository during 24 hours. The number systems tested, in our case up to 10 acquisition systems. of neurons was limited for practical reasons in terms of For 10 acquisition systems, we had a transfer rate of around file system size, but could easily increased for larger file 400 million samples per second when running locally (if systems, also for longer periods of time. using 95% confidence interval limit), which corresponds Data were analyzed and stored using a dedicated HP to a spiking frequency of 4 000 Hz (corresponding to the Proliant HL350E Server with Ubuntu Linux 15.04, 16 GB total number of samples for 320 000 neurons per acquisition RAM memory, dual Intel XEON Quad Core 2.4 GHz system with 10 acquisition systems) per second, well over processors and a single HP Solid State Drive. All network the test success criterion, which is 1000 Hz per neuron. transfers were done using a 10 GBit/s network (for the When running over 10 GbE an increased performance setup streaming from different acquisition systems). Data was achieved with 600 million samples per second, which were generated by a HP z400 workstation with 12 GB corresponds to spiking frequency of 6 000 Hz. The reason Ram memory and dual Intel XEON Quad Core 2.4 Ghz for this is the same as for the single system example, processors. All involved hardware is thus quite ordinary generating and writing data is competing for computational commercial off the shelf, and may readily be utilized by resources from the same CPUs. In both cases, the lower different laboratories. The data from the longitudinal test bound of the 95% confidence interval was also above the was stored on a Samsung NVMe SSD 960 PRO Solid State test success criterion of 1000 Hz per second. Drive. The longitudinal test from 400 000 neurons resulted in a database (HDF5-file) with a size of 1014 GB (ca 1 TB). When checking for data integrity, it was confirmed that no Results data was missing. Scaling Results Analysis Results Throughput for our proposed data format and software The time to feedback stayed around 2.14 ms when including architecture was measured in samples/s, where a sample is the threshold analysis operation. However, if the detected defined as corresponding to a 32-bit integer, in which each event is occurring early in a 20 ms window (i.e. the window bit is encoding the spiking of a neuron as defined earlier. It time used, as described above), the detection of the event should be noted that performance is evaluated as maximal could be delayed a further maximum of 19ms. The result is, throughput, where the times between the spikes for a single however, well below 25 ms, fulfilling the test criterion. cell during the simulations are less than 1 ms. Using this As already mentioned, the data format may be suitably measure, neurons scaling stayed typically at a minimum sliced and concatenated with previous recordings, and level of 50 MSamples/s when running locally (using 95% is thus suitable for being sent over a TCP/IP network confidence interval) for the localhost also when increasing connection, or for local storage, as needed. As we outlined number of neurons above 450 000 or more, see Fig. 4. above, it may also be stored together with other previously The 10 GbE LAN connection performed more equally for recorded arrays as a multidimensional data storage of the 226 Neuroinform (2018) 16:217–229 Fig. 4 Box plots of throughput in samples/s as a function of number of neurons for a simulation with data a from a single system generating data locally, b from a single system sending data over 10 GbE LAN. Throughput in samples/s as a function of number of neurons c from an increasing number of systems generating data locally, d from an increasing number of systems sending data over 10 GbE LAN. For c and d the number of neurons were fixed at 320 000/system. Box plots are constructed with a 95% confidence interval (n=1000). The blue dashed lines show the samples per second corresponding to the theoretical limit from the test success criteria with the sampling rate of 1000 Hz per neuron, which, as may be seen, the tested architecture is well above firing pattern for all neurons of all acquisition systems. neurons, and the data rate is well over the required theo- Performing pattern matching and recognition is very fast retical limit of 3.2 million neurons spiking each with 1000 as the bit-wise comparisons are of low computational Hz; the mean was 6 times higher when streaming over complexity and in Python for example, efficient and LAN and 4 higher when streaming from a local simula- optimized software libraries exist for performing these tor program on the storage server. The system however operations. In our system, a pattern matching of data from has to be supported by databases which periodically record one million neurons to a specific pattern over 10 time bins waveform data and verifies cell identity over time, as well (10 ms) took around 0.5 ms. as recordings of the narrow band signal as shown in Fig. 1. Regarding time to feedback, a feedback within 25 ms could be achieved. Increasing the buffer will lead to higher laten- Discussion cies, but having a shorter buffer will lead to inefficient transfer of data, as many small packets take longer time As intracranial electrode technology develops, an increasing than a corresponding larger packet. This performance was number of neurons may be recorded from. The ability to reached on quite modest hardware and could probably be analyze and process the data in real time from this larger increased further using for example RAIDed hard drives set of neurons must be considered a fundamental feature of and/or cluster of servers with each having a separate net- future Brain Machine Interfaces. In this study, we develop work connection. Furthermore, the proposed architecture and evaluate a scalable software architecture, with a defined allows for both online feedback back to the subject as well interchange data format for mainly spike data which enables as online analysis of data. The present solution has been storage and analysis from at least a million of neurons shown to be highly scalable, seemingly limited by only the simultaneously in real time. transfer rate across the network, both when scaling up num- The system architecture, including a bit-encoding data ber of neurons and acquisition systems. The bit-encoding format and multi-threading storage software, has been tested data format limits firing rate to 1 kHz maximal firing rate for a maximum of 10 concurrent acquisition systems (acqui- with a bin size of 1 ms. However, the bin size may be chosen sition systems recording from a specific set of neurons smaller if desired to capture finer resolution, but the num- in the subject), each handling a subset of the recorded ber of neurons that are able to be recorded will be lesser. For Neuroinform (2018) 16:217–229 227 spikes from a single neuron, this is well below the minimum interfaces have been used in setups in which external world refractory period for most neurons. However, if data orig- states decide which intracranial feedback to provide, as inates from unsorted MUA (Multi Unit Activity), our data well as for studying and inducing spike-timing-dependent format will not be able to capture multiple spikes within plasticity in corticospinal connections using an implanted the bintime. We expect, however, that development of elec- chip for providing feedback (Nishimura et al. 2013). In the trodes will result in improved signal to noise ratio and more latter study, feedback within 25 ms was required, which is precise and stable identification of single units, and some within the capabilities of our proposed system architecture, promising electrode designs already exist (Agorelius et al. although with a small margin. If set too low, however, it 2015). affects data transfer performance. It furthermore enables In databases of scientific data there has previously trigging response trains also due to internal states, tracked typically been a trade off between organization of data andfollowedinrealtimebythe database. and performance in terms of data write and read rates Our proposed architecture also allows for integration for larger data sets. Relational data bases, at the very end of distributed recording of many groups of channels, of the spectrum with regard to organizational capabilities each handled by a separated processor or computer. In with extensive support for complex relationships between a neural interface with a large number of channels, data, have been used by many research groups for this is required. Todays acquisition systems are in the storing scientific data, also recently for storing larger maximum range of around 500 channels, which is around files such as images, movies and also electrophysiological the limit of what a single computer can cope with. But data (Ljungquist et al. 2011; Ljungquist et al. 2016). distributing the computing power to possibly thousands of However, if storing data values as single data entries in acquisition systems, which all are writing to the same data the data base, performance drops quickly, as INSERT structure, allows for integration of these data sources in statements in relational databases are costly (Schwartz real time, which is a necessity for performing calculations et al. 2012). One way to go around this is to store concerning all of the channels of the neural interface. item as so called BLOBs, binary large objects. However, This allows for handling the massive amounts of data performance drops with data sizes over a couple of that may come from future hardware technology, which MegaBytes in a BLOB (Stancu-Mara et al. 2014). It is would allow for recording of many more neurons than also possible to DROP indexes during insertion, in order today. When recording from these larger number of neurons, to increase performance slightly, but this would also result for performance reasons, it should be avoided to put in problems during simultaneous INSERT and readout of preprocessing such as spike sorting and thresholding on data and during repeated INSERT statements. Relational the same processing device as the process handling the databases in data intense real time applications is thus not storage; thus we predict a distributed setup of smaller feasible, although it might be fine for offline applications. devices collaborating in a local network to store data into an In contrast, acquisition system developers have recently integrated repository to be the future of neural interfaces. developed data formats which are prioritizing performance. In addition to integration, the suggested bit encoding For example the recent PL2 format by Plexon (http:// results in reduced data size, for transfer and storage of data. www.plexon.com/pl2-file-system-overview) is capable of If, for example, spikes would be encoded with timestamp impressive data read and write speed. In Memory Data Grids and unit ID, which would occupy 32 bits each, they would (IMDG) is another technique which recently has received need to be encoded with 64 bits per spike. Our suggested increasing attention in some real time intensive applications sampling rate of 1000 bit/s would then correspond to a in other fields, such as finance (e.g. Hazelcast https:// maximum spiking rate per neuron of 1000/64 =16 Hz. hazelcast.com and Grid-gain https://www.gridgain.com). However, during a simultaneous discharge, e. g. so called These databases, while accepting impressive throughput neuronal avalanches, as seen in vitro, Beggs (2007)the in terms of transactions per second, however do not neuronal firing rate may increase to many times this value, handle persistence and the requirement of being able to thus a system based on timestamp and unit ID for spike data correlate the data to previously recorded data. Our suggested would not be able to cope with the information. On the other system architecture allows for multi-trial and multi-subject hand, the bit format is currently more memory intensive as integration or cross-system integration within the same compared to representing spikes with unit ID and timestamp data source, combining the real time performance ability for firing rates below 16 Hz. of instantaneous persistence together with capabilities of Another important benefit of the new bitencoded data correlating the recently recorded data to a previously format proposed in the present work is the capability for recorded pattern. analyzing complex data patters during short periods of In previous studies of sensorimotor feedback loop time. In closed loop experiments, latency must be kept learning (O’Doherty et al. 2011), bidirectional neural minimal in real time interaction with data, as mentioned 228 Neuroinform (2018) 16:217–229 previously. This also includes analysis; in our test setup, Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http:// we have implemented a basic pattern matching, which creativecommons.org/licenses/by/4.0/), which permits unrestricted will return a similarity measure of the recorded pattern use, distribution, and reproduction in any medium, provided you give toward a desired pattern or previously recorded patterns. appropriate credit to the original author(s) and the source, provide a As the patterns, as well as the signals, are bit-encoded, link to the Creative Commons license, and indicate if changes were made. these operations are very fast since whole arrays may be compared bit-wise simultaneously. In particular, it will enable binary neural network classifiers, which are deep References neural networks constrained to operate on data and have weights that are binary (Courbariaux et al. 2016)to Agorelius, J., Tsanakalis, F., Friberg, A., Thorbergsson, P.T., Pettersson, L.M.E., Schouenborg, J. (2015). An array of highly operate on the recorded data. This type of classifiers have flexible electrodes with a tailored configuration locked by gelatin recently attracted interest due to their significant lower during implantation-initial evaluation in cortex cerebri of awake computational and power cost in implementations, and have rats. Frontiers in Neuroscience, 9, 331. been shown to have both very high throughput (in the range Bar-Hillel, A., Spiro, A., Stark, E. (2006). Spike sorting: Bayesian clustering of non-stationary data. Journal of Neuroscience of teraoperations per second) as well as very low latency Methods, 157(2), 303–316. (for most datasets in the microsecond range) in classifier Beggs, J. (2007). Neuronal avalanche. Scholarpedia, 2(1), 1344. implementations on specialized computational hardware Behnel, S., Bradshaw, R., Citro, C., Dalcin, L., Seljebotn, D.S., Smith, such as FPGAs (Umuroglu et al. 2016;Fraseretal. 2017) K. (2011). Cython: The Best of Both Worlds. Computing in Science & Engineering, 13(2), 31–39. in benchmarks of pattern recognition on established test Bittner, R., & Ruf, E. (2013). Direct GPU / FPGA Communication Via data sets. In addition to impressive computational results, PCI Express. Cluster Computing. power consumption needed for calculations is comparably Chintaluri, H.C., Ray, S., Bhalla, U.S., Wojcik, D.K. (2014). Neuro- low (Umuroglu et al. 2016), which may result in future science Simulation Data Format (NSDF) : HDF-based format for development of specialized analysis hardware in field large simulation datasets. Frontiers in Neuroinformatics, (26). Collinger, J.L., Wodlinger, B., Downey, J.E., Wang, W., Tyler-Kabara, conditions, e.g. carried or implanted electronics in the E.C., Weber, D.J., McMorland, A.J.C., Velliste, M., Boninger, recording subject. M.L., Schwartz, A.B. (2013). High-performance neuroprosthetic Further work involves implementation of the suggested control by an individual with tetraplegia. The Lancet, 381(9866), architecture in a more extensive electrophysiological data 557–564. analysis project, fully implementing also the suggested parts Correll, K., Barendt, N., Branicky, M. (2005). Design considerations for software only implementations of the IEEE 1588 precision of the proposed architecture, for example the waveform time protocol. In Conference on IEEE, (Vol. 1588 pp. 11–15). storage. Also, the integration of multiple modalities, for Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, example imaging, with electrophysiological data remains to Y. (2016). Binarized Neural Networks: Training Deep Neural be done. The work could be used together with existing Networks with Weights and Activations Constrained to +1 or -1. In arXiv (p. 9). methods for analysis of binned representations of massive Dalcin, L., Paz, R., Storti, M., D’Elia, J. (2008). MPI for Python: parallel spike train data such as detection of synfire chain Performance improvements and MPI-2 extensions. Journal of activity and sequences of synchronous events (Schrader Parallel and Distributed Computing, 68(5), 655–662. et al. 2008; Torre et al. 2016). Dougherty, M.T., Folk, M.J., Zadok, E., Bernstein, H.J., Bernstein, Lastly, the suggested architecture enables storage of data F.C., Eliceiri, K.W., Benger, W., Best, C. (2009). Unifying Biological Image Formats with HDF5. Queue, 7(9), 20. from various subjects in real time in the same database, Fielding, R.T., & Taylor, R.N. (2002). Principled design of the modern which opens up entirely new possibilities for analysis. Web architecture. ACM Transactions on Internet Technology, 2(2), Cross subject phenomena, for example electrophysiological 115–150. recordings of social interactions between two subjects, Fraser, G.W., & Schwartz, A.B. (2012). Recording from the same neurons chronically in motor cortex. Journal of neurophysiology, could be investigated. This opens up for a range of new 107(7), 1970–1978. research that may be undertaken. Fraser, N.J., Umuroglu, Y., Gambardella, G., Blott, M., Leong, P., Jahre, M., Vissers, K. (2017). Scaling Binarized Neural Networks on Reconfigurable Logic. In Proceedings of the 8th Workshop and Information Sharing Statement 6th Workshop on Parallel Programming and Run-Time Manage- ment Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms All source code is freely available in a public repository at - PARMA-DITAM ’17 (pp. 25–30). New York: ACM Press. GitHub: https://github.com/NRC-Lund/spikebit.git. Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, Acknowledgements The authors thank Stephan Gerhard for fruitful A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S. discussions about the HDF5 data format and Palmi Thor Thorbergsson (2004). Open MPI: Goals, Concept, and Design of a Next for constructive feedback during the verification process of the Generation MPI Implementation. In Recent Advances in Parallel software. Virtual Machine and Message Passing Interface: 11th European Neuroinform (2018) 16:217–229 229 PVM/MPI Users’ Group Meeting Budapest, Hungary, September Nishimura, Y., Perlmutter, S.I., Eaton, R.W., Fetz, E.E. (2013). Spike- 19 - 22, 2004. Proceedings (pp. 97-104). Berlin: Springer Berlin timing-dependent plasticity in primate corticospinal connections Heidelberg. induced during free behavior. Neuron, 80(5), 1301–1309. Garcia, S., & Fourcaud-Trocme, ´ N. (2009). OpenElectrophy: an elec- O’Doherty, J.E., Lebedev, M.A., Ifft, P.J., Zhuang, K.Z., Shokur, S., trophysiological data- and analysis-sharing framework. Frontiers Bleuler, H., Nicolelis, M.A.L. (2011). Active tactile exploration in Neuroinformatics, 3, 14. using a brain-machine-brain interface. Nature, 479(7372), 228–31. Garcia, S., Guarino, D., Jaillet, F., Jennings, T., Propper, ¨ R., Rautenberg, P.L., Sobolev, A., Herz, A.V.M., Wachtler, T. (2011). A Rautenberg, P.L., Rodgers, C.C., Sobolev, A., Wachtler, T., database system for electrophysiological data. In Transactions on Yger, P., Davison, A.P. (2014). Neo: an object model for large-scale data-and knowledge-centered systems IV (pp. 1–14): handling electrophysiology data in multiple formats. Frontiers in Springer. neuroinformatics, 8(February), 10. Rossant, C., Kadir, S.N., Goodman, D.F.M., Schulman, J., Hunter, Gilja, V., Nuyujukian, P., Chestek, C.A., Cunningham, J.P., Byron, M.L.D., Saleem, A.B., Grosmark, A., Belluscio, M., Denfield, M.Y., Fan, J.M., Churchland, M.M., Kaufman, M.T., Kao, J.C., G.H., Ecker, A.S., et al. (2016). Spike sorting for large dense Ryu, S.I., et al. (2012). A high-performance neural prosthesis electrode arrays. Technical report, Nature Publishing Group. enabled by control algorithm design. Nature neuroscience, 15(12), Rumbaugh, J., Jacobson, I., Booch, G. (2004). Unified modeling 1752–1757. language reference manual, the. Pearson Higher Education. Hochberg, L.R., Bacher, D., Jarosiewicz, B., Masse, N.Y., Simeral, Schrader, S., Grun, ¨ S., Diesmann, M., Gerstein, G.L. (2008). J.D., Vogel, J., Haddadin, S., Liu, J., Cash, S.S., van der Smagt, P., Detecting synfire chain activity using massively parallel spike et al. (2012). Reach and grasp by people with tetraplegia using a train recording. Journal of neurophysiology, 100(4), 2165–2176. neurally controlled robotic arm. Nature, 485(7398), 372–375. Schwartz, B., Zaitsev, P., Tkachenko, V. (2012). High performance Ljungquist, B., Petersson, P., Schouenborg, J., Johansson, A.J., MySQL: optimization, backups, and replication, 3rd edn. CA: Garwicz, M.A. (2011). novel framework for storage, analysis and O’Reilly Media Inc. integration through mediation of large-scale electrophysiological Schwarz, D.A., Lebedev, M.A., Hanson, T.L., Dimitrov, D.F., Lehew, data. In 2011 5th International IEEE EMBS Conference on Neural G., Meloy, J., Rajangam, S., Subramanian, V., Ifft, P.J., Li, Z., Engineering, NER, (Vol. 2011 pp. 203–207). Ramakrishnan, A., Tate, A., Zhuang, K.Z., Nicolelis, M.A.L. Ljungquist, B., Jensen, T., Etemadi, L., Thelin, J., Lind, G., (2014). Chronic, wireless recordings of large-scale brain activity Garwicz, M., Petersson, P., Tsanakalis, F., Schouenborg, J. in freely moving rhesus monkeys. Nature Methods, (May 2013). (2016). Discrepancies between cortical and behavioural long-term Stancu-Mara, S., Baumann, P., Marinov, V. (2014). A Comparative readouts of hyperalgesia in awake freely moving rats. European Benchmark of Large Objects in Relational Databases. Technical Journal of Pain, 20(10), 1689–1699. report, Jacobs University. Markram, H., Lubk ¨ e, J., Frotscher, M., Sakmann, B. (1997). Torre, E., Canova, C., Denker, M., Gerstein, G., Helias, M., Gruen, S. Regulation of synaptic efficacy by coincidence of postsynaptic (2016). ASSET: analysis of sequences of synchronous events in APs and EPSPs. Science, 275(5297), 213–215. massively parallel spike trains. PLoS Computers in Biology, 12(7), Metcalfe, R.M., & Boggs, D.R. (1976). Ethernet: distributed packet e1004939. switching for local computer networks. Communications of the Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P., ACM, 19(7), 395–404. Jahre, M., Vissers, K. (2016). FINN: A Framework for Fast, Miranda, R.A., Casebeer, W.D., Hein, A.M., Judy, J.W., Krotkov, E.P., Scalable Binarized Neural Network Inference. In Proceedings of the Laabs, T.L., Manzo, J.E., Pankratz, K.G., Pratt, G.A., Sanchez, 2017 ACM/SIGDA International Symposium on Field-Programmable J.C., Weber, D.J., Wheeler, T.L., Ling, G.S.F. (2015). DARPA- Gate Arrays - FPGA ’17 (pp. 65–74). New York: ACM Press. funded efforts in the development of novel brain-computer van der Walt, S., Colbert, S.C., Varoquaux, G. (2011). The interface technologies. Journal of Neuroscience Methods, 244, NumPy Array: A Structure for Efficient Numerical Computation. 52–67. Computing in Science & Engineering, 13(2), 22–30. Monfils, M.-H., Plautz, E.J., Kleim, J.A. (2005). search of the motor Velliste, M., Perel, S., Spalding, M.C., Whitford, A.S., Schwartz, engram: motor map plasticity as a mechanism for encoding motor A.B. (2008). Cortical control of a prosthetic arm for self-feeding. experience. The Neuroscientist, 11(5), 471–483. Nature, 453(7198), 1098–101. Moucek, ˇ R., Jezek, ˇ P., Vaeka, L., Rond´ık,T.,Brha,P.,Papez, ˇ V., Wessberg, J., Stambaugh, C.R., Kralik, J.D., Beck, P.D., Laubach, M., Mautner, P., Novotny, ´ J., Prokop, T., Stebet ˇ ak, ´ J. (2014). Software Chapin, J.K., Kim, J., Biggs, S.J., Srinivasan, M.A., Nicolelis, and hardware infrastructure for research in electrophysiology. M.A. (2000). Real-time prediction of hand trajectory by ensembles Frontiers in neuroinformatics, 8(March), 20. of cortical neurons in primates. Nature, 408(6810), 361–5.
– Springer Journals
Published: Mar 5, 2018