# AN FPGA BASED BIO-MIMETIC IMPLEMENTATION OF NEURAL SIGNAL PROCESSING IN BATS

C.Clarke<sup>1</sup>, L.Qiang<sup>1</sup>, H.Peremans<sup>2</sup> and R.Müller<sup>3</sup>

# 1. INTRODUCTION

Biologists are increasingly interested in hardware models of natural sensorimotor systems, whether visual [1], olfactory [2], or auditory [3], [4]. Biomimetics merges biology with robotics and has developed rapidly in recent years. FPGA hardware implementations allow researchers to evaluate plausible hypotheses and to explore the sensory processing mechanisms of animals in the real world. The flexibility and real-time performance of FPGA implementations make them highly suitable for biomimetic research.

Digital implementations have been realized using application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) and digital signal processing (DSP) chips. Traditionally, general-purpose (programmable) DSP chips are used for low-rate applications, while special-purpose DSP chips and ASICs are used for higher rates but incur a higher cost. FPGAs keep the custom functionality provided by an ASIC whilst avoiding its high development costs and the inability to make design modifications after fabrication. The FPGA also offers inherent design flexibility and adaptability with optimal device utilization whilst conserving both board space and system power, which is often not the case for DSP chips. The FPGA may also offer a better solution when time to market is critical and/or design adaptability is crucial. With FPGA technologies becoming more competitive, we can consider using them to perform acoustic signal processing tasks known to be mastered by bats. High achievable throughput rates make it possible to perform these tasks on a representation that is in rough quantitative agreement with the base representation formed in the auditory nerve, through which information travels from the ear to the brain of the bat. This work is part of the biomimetic bat head constructed in the project CIRCE (Chiroptera Inspired Robotic CEphaloid).

This paper presents an efficient acoustic signal processing model and its implementation. In particular, we describe the use of an FPGA target technology to perform multi-channel bandpass filtering with demodulation by half-wave rectification and lowpass filtering. The rest of the paper is organized as follows: Section 2 presents a simplified model based on the functional features of the auditory signal processing performed in the peripheral auditory system of bats (up to and including the auditory nerve). Section 3 describes in detail an efficient FPGA implementation of multi-channel bandpass filtering with demodulation. The FPGA test and its results are shown in Section 4.

# 2. ACOUSTIC SIGNAL PROCESSING MODEL

Bats are true flying mammals and their auditory system is structured in the same way as that of other mammals [5]. Sound enters their hearing system through an outer ear (pinna), which is responsible for generating a spatial sensitivity pattern (directivity). The outer ear is followed by a middle ear, which performs impedance matching between air and the body fluids of the inner ear. Inside the inner ear, the sensory cells in charge of transducing mechanical into electrical energy are embedded in the basilar membrane of the cochlea. The vibration modes of the basilar membrane are responsible for a frequency analysis of the incoming sound. The sensory cells form connections (synapses) with the neurons that make up the auditory nerve. In the auditory nerve, the signal in each frequency channel is represented by a neural spike code. The spikes are carried into the brainstem and up into the further stages of the brain's auditory pathway. Since very little is known about the functional significance of many features of the mammalian hearing system in the context of biosonar, a parsimonious acoustic signal processing model was adopted for the bio-mimetic bat (Fig. 1). This model attempts to reproduce the functionally salient features of the bats' neural code and to generate a representation that is quantitatively similar to the one conveyed by the auditory nerve.

Vol.26. Pt.6. 2004

<sup>1</sup> University of Bath, Department of Electronic and Electrical Engineering, Claverton Down, Bath, BA2 7AY, United Kingdom, {C.T.Clarke, L.Qiang}@bath.ac.uk

<sup>2</sup> University of Antwerp, Department of Environment and Technology Management, Prinsstraat 13, B-2000 Antwerpen 1, Belgium, herbert.peremans@ua.ac.be

<sup>3</sup> Maersk Institute, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark, rolfm@mip.sdu.dk

The model consists of linear bandpass filters (BPF), half-wave rectifiers (HWR), lowpass filters (LPF), automatic gain control (AGC) and neural spike generation. The bandpass filtering is followed by a signal demodulation (envelope extraction) realized by a half-wave rectifier and lowpass filtering. The stages correspond to the auditory periphery of the hearing system [6]. The bandpass filters split the input stimulus into a filter bank representation. The cochlea of bats contains between 700 and 2200 sensory cells (inner hair cells [7]), each of which constitutes one primary bandpass channel (although with heavily overlapping passbands). The sensory cells are innervated in a divergent fashion by 13000 to 50000 neurons in the auditory nerve (spiral ganglion cells). This divergence is modelled here in terms of firing thresholds (see below).
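The demodulation stage described above can be illustrated with a minimal floating point sketch. It assumes a first-order "leaky integrator" as the lowpass filter; the smoothing factor `alpha`, the 50 kHz carrier and the test burst are hypothetical values chosen only for illustration, and only the 1 MHz sample rate is taken from the system description.

```python
import math


def half_wave_rectify(samples):
    """Keep positive samples and replace negative ones with zero."""
    return [s if s > 0.0 else 0.0 for s in samples]


def leaky_integrator(samples, alpha):
    """First-order lowpass: y[n] = (1 - alpha) * y[n-1] + alpha * x[n]."""
    out = []
    y = 0.0
    for x in samples:
        y = (1.0 - alpha) * y + alpha * x
        out.append(y)
    return out


def envelope(samples, alpha=0.05):
    """Demodulate by half-wave rectification followed by lowpass filtering."""
    return leaky_integrator(half_wave_rectify(samples), alpha)


FS_HZ = 1_000_000       # sample rate used in the system
CARRIER_HZ = 50_000     # hypothetical carrier inside one bandpass channel
N = 2000

# Hypothetical test signal: a loud tone followed by a quiet tone.
signal = [
    (1.0 if n < N // 2 else 0.2) * math.sin(2.0 * math.pi * CARRIER_HZ * n / FS_HZ)
    for n in range(N)
]
env = envelope(signal)
```

The envelope `env` is non-negative and follows the amplitude of the tone rather than its carrier, which is the property the later thresholding stage relies on.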



Fig. 1. Neural signal processing model

Bandpass filters in the auditory nerve of bats can have a very high filter quality; this is the case in the auditory fovea of certain bat species (CF-FM bats), which are capable of very fine frequency discrimination. A closed loop control is used to perform automatic gain control (AGC). In spike generation, neural spikes occur when the threshold is crossed. Together with the lowpass filter of the demodulation stage, which takes the form of a "leaky integrator", this thresholding operation can be seen to form an integrate-and-fire neuron model, which in turn can be regarded as an approximation of the Hodgkin-Huxley equations [8]. The number of threshold levels is set according to bat innervation ratios (spiral ganglion cells / inner hair cells) and under the assumption that each of the spiral ganglion cells has a slightly different firing threshold. Between 10 and 80 threshold levels [6] are applied to each frequency channel output.
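The thresholding operation can be sketched as follows. This is an illustrative reading only: it assumes that a spike is emitted when the envelope crosses a threshold in the upward direction, and the function name, envelope values and threshold levels are hypothetical.

```python
def threshold_crossings(envelope, thresholds):
    """Return, for each threshold, the sample indices of upward crossings."""
    spikes = {level: [] for level in thresholds}
    previous = 0.0
    for n, value in enumerate(envelope):
        for level in thresholds:
            # Assumed rule: fire when the envelope rises from below the level
            # to at or above it between two consecutive samples.
            if previous < level <= value:
                spikes[level].append(n)
        previous = value
    return spikes


# Hypothetical envelope of one frequency channel and three firing thresholds.
env = [0.0, 0.1, 0.3, 0.6, 0.4, 0.1, 0.5, 0.9, 0.2]
levels = [0.2, 0.5, 0.8]
spikes = threshold_crossings(env, levels)
# spikes == {0.2: [2, 6], 0.5: [3, 6], 0.8: [7]}
```

Low thresholds fire on weak echoes and high thresholds only on strong ones, so the set of thresholds that fire encodes the envelope amplitude, in the spirit of the slightly different firing thresholds assumed for the spiral ganglion cells.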

In the model described above we have chosen a realistic number of frequency channels, as this allows us to have a quantitatively correct representation of the neural code at the auditory nerve level in our biomimetic FPGA implementation. In this paper we will concentrate on the multichannel frequency bandpass filtering combined with demodulation, i.e. half wave rectification and lowpass filtering, as implemented with FPGA technology.

# 3. IMPLEMENTATION OF MULTI-CHANNEL FILTERING WITH DEMODULATION

## 3.1 Selection of filter type

In the mammalian hearing system, the mechanical filtering of the basilar membrane can be modelled using a bank of bandpass filters. Gammatone filters have been applied widely in auditory studies [9]. The advantage of this filter is its easy parametrization in terms of centre frequency and bandwidth. These filter properties may be modified to allow researchers to examine the relevant features of the auditory basis representation [10]. An IIR implementation of fourth-order Gammatone filters has been described in [9]. Here, we choose an IIR Butterworth filter, which can be implemented efficiently in hardware due to its low number of unique coefficients. Since we are only interested in filter bandwidth and centre frequency (and their ratio, filter quality), and not in the precise shape of the transfer function, the implementation efficiency of the Butterworth filter outweighs any difference in the shape of the transfer function.
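To illustrate why such a filter needs few unique coefficients, the sketch below designs one second-order bandpass section by applying the bilinear transform to a first-order Butterworth prototype. The filter order and design route are assumptions made for this illustration and are not stated in the text; the function names are hypothetical. The numerator is symmetric up to sign, so a single feed-forward coefficient `b0` and two feedback coefficients `a1`, `a2` are sufficient.

```python
import cmath
import math


def bandpass_section(f_centre_hz, quality, fs_hz):
    """Return (b0, a1, a2) of H(z) = b0 * (1 - z^-2) / (1 + a1*z^-1 + a2*z^-2)."""
    w0 = 2.0 * math.pi * f_centre_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * quality)
    a0 = 1.0 + alpha
    return alpha / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0


def filter_samples(coeffs, samples):
    """Run the section using three multiplications per output sample."""
    b0, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * (x - x2) - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def gain(coeffs, f_hz, fs_hz):
    """Magnitude of the frequency response at f_hz."""
    b0, a1, a2 = coeffs
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return abs(b0 * (1.0 - z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1))


# Hypothetical channel: 50 kHz centre frequency, filter quality 10, 1 MHz sampling.
coeffs = bandpass_section(50_000.0, 10.0, 1_000_000.0)
```

With these formulas the gain at the centre frequency is unity, and the filter quality alone sets the bandwidth, which matches the two properties of interest named above.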

## 3.2 Selection of target FPGA chip

Currently, the highest density Xilinx FPGA chip contains around 8 million system gates. The Xilinx Virtex II architecture has on-chip block RAMs, hardware multipliers, configurable logic blocks (CLBs) and clock distribution resources. Virtex II CLBs [11] can also operate as shift registers or distributed memory. The true dual-ported 18 kbit block RAMs allow higher system throughput, and their outputs can directly feed the embedded multiplier blocks every cycle. The 18x18 bit embedded multiplier blocks can be configured as pipelined or flow-through.

In our system, the input echo signals from the left and the right receivers are sampled at 1 million samples per second and are stored as 12 bit signed integers. Bats have between 700 and 2200 inner hair cells and between 10 and 80 threshold levels per inner hair cell, depending on the species [6]. Implementing the computational model on a single FPGA chip and adopting up to 2200 filters and 80 threshold levels per filter output, the model requires roughly 1.5 Mbits of on-chip memory and 8 MBytes for threshold crossing information, which has to be stored off-chip.

Given the high processing speed requirement, we have chosen a Xilinx Virtex II XC2V6000-4 FPGA as the platform for our system. The chip contains 6 million system gates, 144 multipliers and 144 block RAMs, up to 2.6 Mbits of block RAM and 1 Mbit of distributed RAM (created from CLBs). The multiplier has a sub-6 ns propagation delay. The close association of block RAMs and multipliers provides very good performance in our filter design.

## 3.3 Storage elements

The block RAM elements provided on the FPGA are used in a number of ways. For some sections of the implementation, the block RAM is treated as a continuous dual-ported memory space. In other parts of the design, the block RAM is split into two smaller single-port RAMs by tying the top address bit of each port to a different value. We also make use of distributed RAM modules to store all coefficients for the bandpass filters and lowpass filters.

## 3.4 Fully pipelined parallel implementation

Fig.2 shows the top level implementation diagram of the 704-channel filter set module. In the design, the control path has registers to deal with potential fan-out issues. This allows the control path to operate at the same speed as the datapath which simplifies clock issues. Each input echo sample enters the bandpass filtering module, to be split into 704 frequency sub bands and to produce the pipelined parallel filtering outputs. The bandpass filter outputs are sent to the half wave rectifiers. The pipelined rectifier outputs are passed directly to the lowpass filtering stage. All operations are under the control of the same clock signal. Pipelining is used extensively to maximise utilisation, and to keep the clock rate as high as possible.

All filter coefficients are stored in advance in 128x16 bit single-port distributed RAMs. Their addresses are distributed as shown in Fig. 3. When A13 is 0, the coefficients of the bandpass filters are addressed; otherwise, the coefficients of the lowpass filters are addressed. Address bits A12-A8 select among the 22 parallel filter channels, each with 32 pipelined bandpass filters. A7-A0 is the addressing space for all the coefficients in a single filter channel.
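The address fields described above can be decoded as in the following sketch. The field boundaries (A13, A12-A8, A7-A0) come from the text; the function names and the dictionary keys are hypothetical, and the sketch does not model how the fields map onto the individual distributed RAMs.

```python
def decode_coefficient_address(addr):
    """Split a 14 bit coefficient address into its three fields."""
    if not 0 <= addr < (1 << 14):
        raise ValueError("address must fit in 14 bits (A13..A0)")
    return {
        "is_lowpass": bool((addr >> 13) & 0x1),  # A13: 0 = bandpass, 1 = lowpass
        "channel": (addr >> 8) & 0x1F,           # A12-A8: parallel filter channel
        "offset": addr & 0xFF,                   # A7-A0: coefficient within the channel
    }


def encode_coefficient_address(is_lowpass, channel, offset):
    """Inverse of decode_coefficient_address."""
    if not (0 <= channel < 32 and 0 <= offset < 256):
        raise ValueError("channel needs 5 bits and offset needs 8 bits")
    return (int(bool(is_lowpass)) << 13) | (channel << 8) | offset


address = encode_coefficient_address(True, 5, 42)
fields = decode_coefficient_address(address)
```

The five bits A12-A8 can distinguish 32 values, so the 22 parallel channels fit with room to spare.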



Fig. 2 The top level implementation diagram



Fig. 3 The coefficients address distribution

Fig. 4 shows the signal processing flow timing for one channel. In our implementation, the 22 channels perform the same operation in parallel (but using different filter coefficients). Initially, the system is reset and the coefficients are written into the RAMs. When the first sample arrives at the input, it is registered, and the bandpass filter processes the sample using the first set of coefficients and the previously stored values from this filter. Three cycles later, the same input sample is processed against the second set of coefficients and previous values. The pipelined nature of the system allows this to happen even though the first filter processing takes 7 cycles. The main constraint in the design is usage of the multiplier. Each filter requires 3 multiplications: two are required for the feed-back coefficients, but only one is required for the feed-forward coefficients due to the symmetric nature of the Butterworth filter. With the system operating at 100 MHz and an input sample rate of 1 MHz, there are 100 cycles between input samples. With 7 cycles required for the first filter output [12], and subsequent outputs arriving every 3 cycles, 32 filters can be implemented using a single multiplier. This multiplier is doing useful operations 96% of the time.
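The cycle budget above can be checked with a few lines of arithmetic. The constants are taken from the text; the variable names are hypothetical, and the sketch assumes that the last bandpass output must be available within the 100 cycles between two input samples.

```python
CLOCK_HZ = 100_000_000       # system clock
SAMPLE_RATE_HZ = 1_000_000   # input sample rate
FIRST_OUTPUT_LATENCY = 7     # cycles until the first filter output
ISSUE_INTERVAL = 3           # cycles between subsequent filter outputs
MULTS_PER_FILTER = 3         # two feed-back and one feed-forward multiplication
PARALLEL_CHANNELS = 22       # multipliers working side by side

cycles_per_sample = CLOCK_HZ // SAMPLE_RATE_HZ


def output_cycle(filter_index):
    """Cycle at which the output of the given pipelined filter appears."""
    return FIRST_OUTPUT_LATENCY + ISSUE_INTERVAL * filter_index


# Largest number of filters whose outputs all appear within one sample period.
filters_per_multiplier = (cycles_per_sample - FIRST_OUTPUT_LATENCY) // ISSUE_INTERVAL + 1

# Fraction of cycles in which the multiplier performs a useful multiplication.
utilisation = filters_per_multiplier * MULTS_PER_FILTER / cycles_per_sample

total_filters = PARALLEL_CHANNELS * filters_per_multiplier
```

Under these assumptions `filters_per_multiplier` is 32, `utilisation` is 0.96 and `total_filters` is 704, in agreement with the figures quoted for the design.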



Fig. 4 The one-channel signal processing flow timing

Once the bandpass filter produces the first output, HWR\_sampi goes to '1' for a single cycle. Two cycles later, the rectified output is passed to the input of the lowpass filter (LPF\_input).

The structure of the low pass filter is very similar to that of the band pass filter with the notable exception that the individual pipelined filters receive their own rectified data streams rather than a single common sample.

When the input sample rate is 1 MHz and the clock speed is 100 MHz, the 22 parallel filter channels calculate 704 filter sets in $1\mu s$ (i.e. in real time). Note that the last filter outputs from the first input sample occur some 9 cycles after the second input sample has arrived, but the pipelined nature of the design means that this does not affect the processing of the second input sample.

# 4. FPGA TEST AND RESULTS

In the above implementation, the input echo signal is stored as 12 bit signed integer data at a 1 MHz sample rate; its frequency ranges from 20 kHz to 200 kHz. The coefficients are stored in single-port RAM (16 RAM128X1Ss) as 16 bit signed integer data. The outputs of the bandpass filters are truncated to 16 bit signed integers. The outputs of the half-wave rectifiers and the lowpass filters are truncated to 16 bit unsigned integers.



Fig. 5 A typical error curve of the hardware implementation compared with a floating point arithmetic software implementation

Due to the feedback in the Butterworth filters, the precision of the current output depends not only on the current input, but also on the precision of many (possibly infinitely many) previous input values. Since the input precision is fixed, the precision with which intermediate values are stored is a vital factor. In order to ensure output value precision, two multipliers (MULT18X18Ss) are used in parallel to perform one multiplication (18 bit coefficient x 36 bit data), and two block RAMs (RAMB16\_S18\_S18s) are used to store the values. All intermediate values are stored with 32 bit precision. They are truncated to 16 bits at the filter output.
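One common way to build a wide multiplication from two 18x18 bit signed multipliers is to split the wide operand into a signed upper part and a non-negative lower part. The sketch below illustrates this idea for a 32 bit intermediate value; the 17 bit split point and the function names are assumptions made for the example and are not taken from the actual design.

```python
LOW_BITS = 17  # assumed split point: the low part stays non-negative in 18 signed bits


def fits_signed(value, bits):
    """True if value is representable as a two's complement number of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def split_multiply(coeff, data):
    """Multiply an 18 bit signed coefficient by 32 bit signed data using two
    products whose operands each fit an 18x18 bit signed multiplier."""
    if not fits_signed(coeff, 18) or not fits_signed(data, 32):
        raise ValueError("operand out of range")
    low = data & ((1 << LOW_BITS) - 1)  # lower 17 bits, always non-negative
    high = data >> LOW_BITS             # arithmetic shift keeps the sign
    # Both partial operands fit in 18 signed bits, so each product maps to
    # one hardware multiplier; the shift and add recombine them.
    return ((coeff * high) << LOW_BITS) + coeff * low


product = split_multiply(-12345, 123456789)
```

The two partial products are independent, so they can be computed in the same cycle and combined by a shift and an addition.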

Fig. 5 shows a typical error curve over time of the lowpass filter output of the FPGA implementation, compared with a floating point arithmetic software implementation. The experimental results have shown that the energy error of the lowpass filter is less than 0.85%, as expected.
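The energy error is not defined explicitly here. A minimal sketch, assuming it denotes the energy of the difference between the hardware output and the floating point reference, expressed as a percentage of the reference energy, is given below; the function name and the sample values are hypothetical, and other definitions (for example the relative difference of the two total energies) are equally possible.

```python
def energy_error_percent(reference, measured):
    """Energy of (measured - reference) as a percentage of the reference energy."""
    if len(reference) != len(measured):
        raise ValueError("signals must have the same length")
    reference_energy = sum(r * r for r in reference)
    if reference_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    error_energy = sum((m - r) ** 2 for r, m in zip(reference, measured))
    return 100.0 * error_energy / reference_energy


# Hypothetical floating point reference and fixed point output samples.
reference = [0.0, 1.0, 2.0, 1.0]
measured = [0.0, 1.1, 2.0, 0.9]
error = energy_error_percent(reference, measured)
```

For identical signals the function returns zero, and it grows with the squared deviation of the hardware output from the reference.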

# 5. CONCLUSIONS

A simplified neural signal processing model of a bat's auditory processing has been presented in this paper. We have described an efficient FPGA bio-mimetic implementation of multi-channel bandpass filtering and demodulation (via rectification and lowpass filtering). The system has a frequency range from 20 kHz to 200 kHz. Compared with a floating point arithmetic software implementation, a low energy error has been achieved in the hardware implementation. This FPGA implementation can be realized on a single Xilinx Virtex II FPGA (XC2V6000-4) device and supports real-time performance.

# 6. ACKNOWLEDGEMENTS

This work is supported by the European Union (IST Program, Life-like Perception Systems Initiative, CIRCE Project IST-2001-35144).

# REFERENCES

- [1] P. Foldesy, A. Rodriguez-Vazquez. A behavioural modelling technique for visual microprocessor mixed-signal VLSI chips. International journal of circuit theory and application, Vol. 30, 2002, pp. 139-163
- [2] Mark A. Glover, Alister Hamilton, Leslie S. Smith. Analogue VLSI Integrate and Fire Neural Network for Clustering Onset and Offset Signals in a Sound Segmentation System. International ICSC / IFAC Symposium on Neural Computation, Vienna, Austria, September 23-25, 1998
- [3] Simon Jones, Ray Meddis, Seow Chuan Lim and A. Robert Temple. Toward a Digital Neuromorphic Pitch Extraction System. IEEE Transactions on Neural Networks, Vol. 11, No. 4, July 2000, pp. 978-987
- [4] H. Peremans and R. Müller. A comprehensive robotic model for neural & acoustic signal processing in bats. Proc. of the 1st Int. IEEE EMBS Conf. on Neural Engineering, Capri, March 2003, pp. 458-461
- [5] J. O. Pickles. An Introduction to the physiology of hearing. Academic Press, London, 1982
- [6] T. Dau and D. Püschel. A quantitative model of the "effective" signal processing in the auditory system. I. model structure. *J. Acoust. Soc. Am.*, Vol. 99, 1996, pp. 3615-3622
- [7] M. Vater. Cochlear physiology and anatomy in bats. Animal Sonar Processes and Performance by Nachtigall, P.E. Moore, P.W.B, Plenum Press, New York, 1988, pp. 225-242
- [8] W. M. Kistler, W. Gerstner, and J. L. van Hemmen. Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. *Neural Computation*, Vol. 9, 1997, pp. 1015-1045
- [9] R. F. Lyon. The all-pole gammatone filter and auditory models. In Computational models of signal processing in the auditory system, ForumAcusticum 96, Antwerp, Belgium, 1996, unpublished.
- [10] R. Müller and H.-U. Schnitzler. Acoustic flow perception in cf-bats: Properties of the available cues. *J. Acoust. Soc. Am.*, Vol. 105, 1999, pp. 2958-2966
- [11] Virtex-II Complete Data Sheet, Xilinx Inc., October 14, 2003
- [12] C.T. Clarke, L. Qiang, H. Peremans and Á. Hernández. FPGA implementation of a neuromimetic cochlea for a bionic bat head. FPL International Conference, August 2004, Antwerp, in press.