M J D Bishop

Defence Research Agency, Maritime Division, Portland, Dorset, UK

#### 1. INTRODUCTION

Multi-rate filter structures are powerful building blocks in a Digital Signal Processing (DSP) toolkit, particularly when used to implement decimation and band selection functions prior to detection processing, or interpolation prior to digital-to-analog conversion.

While equivalent filter specifications can be met by single-stage designs, the computational effort required often restricts the attainable throughput. Multi-rate filters need more complex control logic and data organisation than single-stage filters, but where these requirements can be met, higher throughput is obtained through their use.

This paper gives performance data for multi-rate filters implemented on the Motorola DSP 56001 processor. Typical throughputs for both finite impulse response (FIR) and wave digital filter (WDF) decimators are given and contrasted with equivalent single-stage designs.

Comments on the mechanisation of both multi-rate structures and filter kernels on the DSP 56001 architecture are also made, as it is by the careful exploitation of the processor architecture that high performance is obtained.

#### 2. FILTERING REQUIREMENTS AND ALGORITHMS

Contemporary signal processing systems take analog signals, filter them, perform detection processing, and often output a derived analog signal. A generic "detection" system is shown in figure 1. Sigma-delta converters are used for both analog to digital and digital to analog conversion, with a programmable Digital Signal Processor (DSP) used to implement the band selection and detection process.

#### 2.1 Specimen Filters

Examples of multi-rate low pass filter designs using FIR and wave digital filter (WDF) stages are:

- a) An FIR decimator, figure 2, providing decimation by 24, with 2 kHz transition bandwidth and stopband attenuation of 70 dB.
- b) A WDF decimator, figure 3, providing decimation by 32, with 500 Hz transition bandwidth and stopband attenuation of 95 dB.

Frequency responses are given in figures 4 and 5, respectively.

Both designs, when used to implement the decimation and interpolation filtering requirements of the P and Q channels of figure 1 at a sampling rate of 96 kHz on a 27 MHz DSP 56001, take approximately 40% of the available computational effort.
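As a rough check on the 40% figure for the WDF design, the timings quoted with table 1 can be combined as follows. This assumes that the 333 µs total is the period of one decimated output sample (32 input samples at 96 kHz), and that the 66 µs and 69 µs figures are the decimation and interpolation times per output sample:

$$
T_{\text{out}} = \frac{32}{96\ \text{kHz}} \approx 333\ \mu\text{s},
\qquad
\frac{66\ \mu\text{s} + 69\ \mu\text{s}}{333\ \mu\text{s}} \approx 0.41
$$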

For each of these filter implementations, the operation counts per stage, referenced to the filter output rate, are tabulated in figures 2 and 3. In both cases, the efficiency of the multi-rate structure, in comparison with the operation counts of an equivalent single-stage FIR filter, is plain.

Table 1 compares predicted and measured timings for the multi-rate WDF structure of figure 3. The correlation between prediction and measurement is excellent, with utilisation of computational effort accurately accounted for. An important point is that the preponderance of computational effort is taken by the high rate operations: mixing down to complex baseband, and the initial filter stages.

#### 2.2 Band Selection and the Complex Baseband Representation

Typically, only part of the passband is required for detection processing. In that case the use, as in figure 1 and the examples, of a complex baseband signal representation, with decimation by low pass filters to perform the band selection function [1], is attractive, particularly as experience indicates that the band selection function frequently requires more computational effort than detection processing.

The use of a complex baseband signal representation has several desirable characteristics:

- a) By performing the translation from passband to complex baseband, the desired portion of the spectrum is located between DC and some maximum frequency. Consequently, the sampling rate can subsequently be reduced by decimation.
- b) Low pass filters can implement the decimators.
- c) If aliasing in don't-care bands is acceptable, then the more efficient Nth-band class of filters can be used.
- d) Complex baseband data flows naturally into the Discrete Fourier Transform, a classic "detection" algorithm.

As the interpolators required to translate between the complex baseband and passband representations are the duals [1] of the decimators used to translate between passband and complex baseband, the following discussion is restricted to decimators.
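The translation from passband to complex baseband described in this section can be sketched as below. This is a minimal illustration, not the DSP 56001 implementation: the function names, the 12 kHz band centre and the block-averaging decimator are assumptions chosen for the example, and the averaging merely stands in for the properly designed low pass decimators of figures 2 and 3. The real and imaginary parts of the result are taken here to correspond to the P and Q channels of figure 1.

```python
import cmath
import math


def mix_to_baseband(samples, centre_hz, sample_rate_hz):
    """Multiply a real passband signal by exp(-j*2*pi*f0*n/fs)."""
    step = -2.0 * math.pi * centre_hz / sample_rate_hz
    return [x * cmath.exp(1j * step * n) for n, x in enumerate(samples)]


def boxcar_decimate(samples, factor):
    """Average non-overlapping blocks of `factor` samples.

    A crude low pass decimator, used only as a placeholder.
    """
    blocks = len(samples) // factor
    return [sum(samples[k * factor:(k + 1) * factor]) / factor
            for k in range(blocks)]


fs = 96_000.0   # sampling rate quoted in the text
f0 = 12_000.0   # hypothetical band centre, chosen for the example
tone = [math.cos(2.0 * math.pi * f0 * n / fs) for n in range(96)]

# A tone at the band centre lands at DC with half its original amplitude;
# the image at -2*f0 is removed by the low pass decimator.
baseband = boxcar_decimate(mix_to_baseband(tone, f0, fs), 8)
```

With these values every element of `baseband` is close to `0.5 + 0j`: the wanted component sits at DC, and the sampling rate has been reduced from 96 kHz to 12 kHz.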

#### 2.3 Digital Filter Structures and Design

Digital filter structures and design techniques have been extensively researched over the past 30 years, providing an extensive literature on which to draw for specific applications; see for example [2,3,4,5].

The use of multi-rate filter structures is well described by, for example, [1,6,7] and the primary literature [8,9,10]. The multi-rate structures of figures 2 and 3 derive their efficiency from leaving the hardest part of the job (i.e. the tightest filter specification) until the sampling rate has been reduced.

The use of decimation by two at each stage is essentially optimal. Decimation by factors other than two should be used only as a last resort; part of the skill in system design is to make decimation by two possible without constraining the application. Where disparate decimation rates are used, the largest decimation factors should be applied first [7].

A wide range of approaches to the design of digital filters is available. For FIR structures, the Remez exchange algorithm [11], together with some Nth-band tricks [12,13], provides linear phase filters with low noise levels. For IIR structures, wave digital filter structures [14,15] provide low noise levels, low sensitivity to signal and coefficient quantisation, and low computational requirements. Importantly, for both FIR and WDF structures all computations are performed at the (decimated) output rate, a property that is unusual for recursive structures.

For wave digital filters, one approach to decimate-by-two designs is that of Valenzuela and Constantinides [15], which provides closed form expressions for the filter coefficients, together with design formulae relating transition bandwidth, stopband attenuation and filter order. Filters designed by this method generally exhibit non-linear phase. If decimation by a factor other than two, or constant group delay, is required, an approximately linear phase design approach can be used [16,17,18].

#### 3. DIGITAL FILTER MECHANISATION ON DSPs

The architecture of the digital signal processor (DSP) on which a digital filter is implemented has profound consequences for that implementation. The DSP on which the filters described in this paper have been implemented is the Motorola DSP 56001 [19,20]. This DSP is commonly used for SONAR applications in consequence of its 24 bit word length.

For maximum throughput, the programmer has to mechanise the algorithmic kernel such that every DSP resource is occupied during every cycle. Currently, hand coding in assembly language is the only design methodology that extracts maximum throughput from the processor.

A range of complexity measures can be used in assessing DSP algorithms. Commonly quoted measures are memory requirements and operation counts (particularly multiplications). While relevant for the DSP 56001, they are not sharp indicators of performance; measures which are significant for the DSP 56001 architecture must be used instead. The data presented in figure 3 and table 1 make plain that these are primarily ALU utilisation, tempered by memory access requirements.

#### 3.1 FIR Filters

As FIR filters are simply convolutions of data and coefficients, they are straightforward to implement on a DSP, and their control logic is trivial. However, single stage filter designs, equivalent to the filter designs of figures 2 and 3, would require long filters of 189 and 1083 taps, respectively. Consequently, most data and coefficient storage would be in off-chip memory, and the additional cycles required would significantly reduce throughput.

In general, FIRs have a number of desirable attributes. No data movement is required, as data shifting can be effected by the addressing mechanisms. Consequently, only N+1 accesses to data memory are required for an N point FIR. The filter output is, of course, computed only at the output rate.
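The two attributes just described, data shifting by addressing alone and computation at the output rate only, can be illustrated with a short sketch. It is written in Python rather than DSP 56001 assembly, and the class name, three-tap coefficient set and decimation factor are assumptions made for the example. The modulo index arithmetic stands in for the processor's modulo addressing.

```python
class FirDecimator:
    """FIR low pass followed by decimation, computed at the output rate."""

    def __init__(self, taps, factor):
        self.taps = list(taps)
        self.factor = factor
        self.history = [0.0] * len(self.taps)  # circular data buffer
        self.head = 0      # index at which the next sample is written
        self.pending = 0   # inputs received since the last output

    def push(self, sample):
        """Store one input; return an output every `factor` inputs, else None."""
        self.history[self.head] = sample
        newest = self.head
        self.head = (self.head + 1) % len(self.history)
        self.pending += 1
        if self.pending < self.factor:
            return None    # no arithmetic is done for discarded outputs
        self.pending = 0
        # Walk backwards from the newest sample. Only the index wraps;
        # the stored data is never moved.
        n = len(self.history)
        return sum(c * self.history[(newest - k) % n]
                   for k, c in enumerate(self.taps))


dec = FirDecimator([0.25, 0.5, 0.25], factor=2)   # illustrative taps only
outputs = [y for y in (dec.push(1.0) for _ in range(8)) if y is not None]
```

For a constant unit input, `outputs` is `[0.75, 1.0, 1.0, 1.0]`: one start-up value while the buffer fills, then the DC gain of the taps.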

A multi-rate filter built up of FIR stages is somewhat more complex to implement on a DSP. While the data and coefficients will typically fit the available memory without a speed penalty, the control logic is non-trivial as data has to cascade through the stages of the system. Nonetheless, their implementation is quite feasible.
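The cascading of data through the stages can be sketched as follows. This is one possible control scheme under stated assumptions, not the scheme used on the DSP 56001: each stage is a placeholder that averages pairs of samples, and the names `HalveRate` and `push_through` are hypothetical. The point of interest is the control flow, in which a stage runs only when the stage before it has produced an output.

```python
class HalveRate:
    """Placeholder decimate-by-two stage: averages each pair of inputs."""

    def __init__(self):
        self.held = None   # first sample of the current pair, if any
        self.calls = 0     # number of outputs produced so far

    def push(self, sample):
        if self.held is None:
            self.held = sample
            return None
        out = 0.5 * (self.held + sample)
        self.held = None
        self.calls += 1
        return out


def push_through(stages, sample):
    """Feed one input sample to the cascade; return a final output or None."""
    for stage in stages:
        sample = stage.push(sample)
        if sample is None:   # this stage is still waiting for data
            return None
    return sample


stages = [HalveRate() for _ in range(5)]   # five halvings: decimation by 32
outputs = [y for y in (push_through(stages, 1.0) for _ in range(64))
           if y is not None]
calls_per_stage = [stage.calls for stage in stages]
```

After 64 inputs the cascade has produced two outputs, and `calls_per_stage` is `[32, 16, 8, 4, 2]`, that is 16, 8, 4, 2 and 1 calls per output sample, which matches the Rate column of table 1 for the five filter stages.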

#### 3.2 Wave Digital Filters

Wave filter stages are implemented quite differently from FIR stages, with all computations for each stage being performed at once, the recursive nature of the stages making this possible. While this approach simplifies the control logic, the complexity of the filter stages increases significantly, as the recursions for parallel cascades of first-order sections are much more complex to implement than the simple structure of an FIR stage.
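A decimate-by-two stage built from two parallel cascades of first-order allpass sections can be sketched as below. This is a direct-form illustration of the general two-branch structure, not a wave-adaptor realisation and not the code used on the DSP 56001. The class names are hypothetical, and the coefficients 0.125 and 0.5625 are arbitrary stable values chosen for the example, not the result of the design method of [15].

```python
class FirstOrderAllpass:
    """A(z) = (alpha + z^-1) / (1 + alpha * z^-1), one multiply per sample."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.x_prev = 0.0
        self.y_prev = 0.0

    def step(self, x):
        # Subtract, multiply, add: each operation needs the previous result.
        y = self.alpha * (x - self.y_prev) + self.x_prev
        self.x_prev, self.y_prev = x, y
        return y


class AllpassHalfbandDecimator:
    """Two parallel allpass cascades, both running at the output rate."""

    def __init__(self, branch0_alphas, branch1_alphas):
        self.branch0 = [FirstOrderAllpass(a) for a in branch0_alphas]
        self.branch1 = [FirstOrderAllpass(a) for a in branch1_alphas]

    def step(self, older, newer):
        """Consume two input samples and produce one output sample."""
        for section in self.branch0:
            newer = section.step(newer)
        for section in self.branch1:
            older = section.step(older)
        return 0.5 * (newer + older)


# Illustrative coefficients only.
dc_stage = AllpassHalfbandDecimator([0.125], [0.5625])
dc_out = [dc_stage.step(1.0, 1.0) for _ in range(200)]

nyquist_stage = AllpassHalfbandDecimator([0.125], [0.5625])
nyquist_out = [nyquist_stage.step(1.0, -1.0) for _ in range(200)]
```

Whatever stable coefficients are used, a constant input settles to unity gain and an input alternating at half the input sampling rate settles to zero. The coefficients determine only the shape of the response between those two frequencies.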

#### 4. CONCLUSIONS

Multi-rate filters can be implemented effectively on DSPs, providing precise filtering at a high sampling rate. Two contrasting implementation examples have been discussed, highlighting the efficiency of multi-rate complex baseband designs, in comparison to single stage designs.

Wave digital filters provide excellent filter stages. However, in consequence of its non-canonic, recursive stages, the WDF is poorly matched to the DSP architecture. First order stages, which are preferable for their low noise, are particularly awkward. The ALU is, in the best case, busy only 66% of the time, as against 100% for an FIR.

Interestingly, this mismatch may be getting worse. The recently introduced Motorola DSP 96002 has an explicit pipeline delay between the multiply and add operations. If this cannot be ameliorated, the additional instruction cycle required by each first order stage will decrease ALU utilisation to 57%.

#### 5. ACKNOWLEDGEMENTS

Acknowledgement is due to Professor T E Curtis of the Defence Research Agency, Maritime Division for introducing the author to WDFs, and for kindly providing some of the WDF design software.

#### 6. REFERENCES

- [1] R E Crochiere & L R Rabiner, 'Multi-rate Digital Signal Processing', Prentice-Hall, 1983
- [2] L R Rabiner & B Gold, 'Theory and Application of Digital Signal Processing', Prentice-Hall, 1975
- [3] A V Oppenheim & R W Schafer, 'Digital Signal Processing', Prentice-Hall, 1975
- [4] A Bateman & W Yates, 'Digital Signal Processing Design', Pitman, 1988
- [5] J G Proakis & D G Manolakis, 'Introduction to Digital Signal Processing', Macmillan, 1988
- [6] D F Elliott & K R Rao, 'Fast Transforms : Algorithms, Analyses, Applications', Academic, 1982
- [7] R E Crochiere & L R Rabiner, 'Interpolation and Decimation of Digital Signals - A Tutorial Review', Proc IEEE, 69 p300 (1981)
- [8] R E Crochiere & L R Rabiner, 'Optimum FIR Digital Filter Implementations for Decimation, Interpolation, and Narrow-Band Filtering', IEEE Trans, ASSP-23 p444 (1975)
- [9] L R Rabiner & R E Crochiere, 'A Novel Implementation for Narrow-Band FIR Digital Filters', IEEE Trans, ASSP-23 p457 (1975)
- [10] R E Crochiere & L R Rabiner, 'Further Considerations in the Design of Decimators and Interpolators', IEEE Trans, ASSP-24 p296 (1976)
- [11] J H McClellan, T W Parks & L R Rabiner, 'A Computer Program for Designing Optimum FIR Linear Phase Digital Filters', IEEE Trans AU-21 p506 (1973)
- [12] F Mintzer, 'On Half-Band, Third-Band and Nth Band FIR Filters and their Design', IEEE Trans ASSP-30 p734 (1982)
- [13] P P Vaidyanathan & T Q Nguyen, 'A "Trick" for the Design of FIR Half-Band Filters', IEEE Trans CAS-34 p297 (1987)
- [14] A Fettweis, 'Wave Digital Filters : Theory and Practice', Proc IEEE 74 p270 (1986)
- [15] R A Valenzuela & A G Constantinides, 'Digital Signal Processing Schemes for Efficient Interpolation and Decimation', IEE Proc 130 Pt G p225 (1983)
- [16] R Ansari & B Liu, 'A Class of Low-Noise Computationally Efficient Recursive Digital Filters with Applications to Sampling Rate Alterations', IEEE Trans ASSP-33 p90 (1985)
- [17] M Renfors & T Saramaki, 'Recursive Nth-Band Digital Filters - Part I : Design and Properties', IEEE Trans CAS-34 p24 (1987)
- [18] M Renfors & T Saramaki, 'Recursive Nth-Band Digital Filters - Part II : Design of Multistage Decimators and Interpolators', IEEE Trans CAS-34 p40 (1987)
- [19] 'DSP 56001 Data Sheet', Motorola, DSP56001/D, 1988
- [20] 'DSP 56000 Digital Signal Processor User's Manual', Motorola, DSP56000UM/AD, Rev 2, 1990

Any views expressed are those of the author and do not necessarily represent those of the Department / HM Government.

British Crown Copyright 1991 / MOD. Published with the permission of the Controller of Her Britannic Majesty's Stationery Office.

| Operation         | Predicted: cycles per call | Predicted: rate (calls per output) | Predicted: cycles per output | Predicted: % | Measured: time (µs) | Measured: % |
|-------------------|----------------------------|------------------------------------|------------------------------|--------------|---------------------|-------------|
| Interrupt Service | 13                         | (66/333)*32                        | 88                           |              |                     |             |
| Mix to P+Q        | 5                          | 32                                 | 160                          | 23           |                     |             |
| 96 > 48           | 9                          | 16                                 | 144                          | 21           | 12                  | 18          |
| 48 > 24           | 17                         | 8                                  | 136                          | 20           | 13                  | 20          |
| 24 > 12           | 17                         | 4                                  | 68                           | 10           | 7                   | 11          |
| 12 > 6            | 29                         | 2                                  | 58                           | 8            | 5                   | 8           |
| 6 > 3             | 41                         | 1                                  | 41                           | 6            | 3.5                 | 5           |
| Totals            |                            |                                    | 695                          |              | 66                  |             |

Measurements for 27 MHz DSP 56001; 0 WS Memory; 6 WS IO

Timings: 66 µs Decimation, 69 µs Interpolation, 333 µs Total

Table 1 Predicted and Measured Timings for the WDF Design
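The predicted columns of table 1 can be reproduced with a few lines of arithmetic, sketched here in Python. The sketch assumes that the second cycles column is the product of cycles per call and call rate per output sample, which is consistent with every filter row. The interrupt-service row is entered as its tabulated 88 cycles, because its rate is given only as an expression.

```python
# (operation, cycles per call, calls per output sample), taken from table 1.
stages = [
    ("Mix to P+Q", 5, 32),
    ("96 > 48", 9, 16),
    ("48 > 24", 17, 8),
    ("24 > 12", 17, 4),
    ("12 > 6", 29, 2),
    ("6 > 3", 41, 1),
]
INTERRUPT_CYCLES = 88   # tabulated value, not recomputed here

# Cycles spent in each operation per decimated output sample.
cycles = {name: per_call * rate for name, per_call, rate in stages}

# Total predicted cycles per output sample, including interrupt service.
total = INTERRUPT_CYCLES + sum(cycles.values())

# Each operation's share of the total, rounded to a whole percentage.
share = {name: round(100 * c / total) for name, c in cycles.items()}
```

The total is 695 cycles, and the shares are 23, 21, 20, 10, 8 and 6 per cent, in agreement with the predicted percentage column. The first two rows alone account for more than 40% of the total, which illustrates the dominance of the high rate operations.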



Figure 1 Generic "Detection" System



Figure 2 Multi-Rate Decimation by 24



Equivalent Single Stage FIR Filter (P or Q) = 1083 Instr; Total Cycles > 2166 (P+Q)

Figure 3 Multi-Rate Decimation by 32



