

# Efficient AFIR Filter Based on Distributed Arithmetic

Bharathkumar. M<sup>1</sup>, Siddarthraju. K<sup>2</sup>

PG Scholar, M.E. VLSI Design, K P R Institute of Engineering and Technology, Coimbatore, India<sup>1</sup>

Assistant Professor, ECE Department, K P R Institute of Engineering and Technology, Coimbatore, India<sup>2</sup>

Abstract: The Finite Impulse Response (FIR) digital filter is widely used as a basic tool in various signal processing applications, and real-time realization of FIR filters with less hardware and lower latency has become increasingly important. The multiply-accumulate (MAC) operation is the core of FIR filter implementation. Distributed Arithmetic (DA) is an important technique for implementing digital signal processing (DSP) functions in Field Programmable Gate Arrays (FPGAs). It provides a multiplier-less implementation of the FIR filter: multiplication is performed with a Look-Up Table (LUT) that stores precomputed values which can be read out easily. This makes DA-based computation well suited for FPGA realization, because the LUT is the basic component of an FPGA. The major disadvantage of the DA technique is that the size of the DA-LUT increases exponentially with the length of the input. Several efforts have been made to reduce the DA-LUT size for efficient realization of DA-based designs. In this paper, the LUT is partitioned into smaller LUTs, so that the LUT size can be reduced to one fourth (the size of the table is reduced from one 4N\*2B LUT to four N\*2B tables).

Keywords: Distributed Arithmetic, Finite Impulse Response, Look Up Table.

#### **I. INTRODUCTION**

A digital filter is a system that performs mathematical operations on a sampled or discrete-time signal to reduce or enhance certain aspects of that signal. One type of digital filter is the FIR filter. It is stable and gives a linear phase response. Pipelining and parallel processing techniques are used in FIR filters. Pipelining operates in an interleaved manner and is done by inserting latches (delay elements) in the system. It increases the overall speed of the architecture, but the hardware and the system latency also increase because of the inserted pipelining latches. M-level pipelining requires M-1 delay elements. Latency is the difference between the availability of the first output in the sequential system and in the pipelined system. Parallel processing operates on multiple inputs and produces multiple outputs in every clock cycle, and it requires extra hardware. Both pipelining and parallel processing therefore have disadvantages. For FIR filters, the output is a linear convolution of the weights and the inputs. For an Nth-order FIR filter, the generation of each output sample takes N+1 multiply-accumulate (MAC) operations. Multiplication is the most expensive operation because it is repeated addition: it requires a large portion of the chip area and consumes more power.

Memory-based structures are more regular than multiply-accumulate structures and have many other advantages, e.g., greater potential for high-throughput and reduced-latency implementation, and they are expected to have lower dynamic power consumption because memory read operations involve less switching activity than conventional multipliers. Memory-based structures are well suited to many digital signal processing (DSP) algorithms that involve multiplication with a fixed set of coefficients. Distributed Arithmetic is one way to implement convolution with a multiplier-less unit, where the MAC operations are replaced by a series of LUT accesses and summations. Distributed Arithmetic is a different approach to implementing digital filters. The basic idea is to replace all multiplications and additions by a table and a shifter-accumulator. LUTs are the kind of logic used in SRAM-based FPGAs. Basically, each look-up table is a bunch of single-bit memory cells, each storing an individual bit value. Memory access time is short in SRAM, so static RAM is fast. Distributed Arithmetic provides cost-effective and area-time-efficient computing structures.

Digital Finite Impulse Response (FIR) filters are essential building blocks in most Digital Signal Processing (DSP) systems. A large application is telecommunication, where filters are needed in receivers and transmitters, and an increasing portion of the signal processing is done digitally. However, power dissipation of the digital parts can be a limiting factor, especially in portable, battery-operated devices. Scaling of feature sizes and supply voltages naturally helps to reduce power. For a given technology, there are still many architectural and implementation approaches available to the designer. Owing to advances in Very Large Scale Integration (VLSI) technology, FIR filters are realized on Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) platforms.
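The cost of N+1 multiply-accumulate operations per output sample, stated earlier in this section, can be made concrete with a minimal direct-form sketch. The function name `fir_direct`, the coefficient values, and the zero-padding of samples before the start of the signal are assumptions made for illustration only; they are not taken from the paper.

```python
def fir_direct(coeffs, samples):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0.

    Returns the output samples and the total number of MAC operations.
    """
    order = len(coeffs) - 1
    outputs, mac_count = [], 0
    for n in range(len(samples)):
        acc = 0
        for k in range(order + 1):
            xk = samples[n - k] if n - k >= 0 else 0
            acc += coeffs[k] * xk      # one multiply-accumulate
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


h = [1, 2, 3, 4]        # hypothetical order-3 filter (N = 3)
x = [1, 0, 0, 0, 2]     # hypothetical input samples
y, macs = fir_direct(h, x)
```

Each of the five output samples costs N+1 = 4 multiplications here, which is the work that the memory-based and DA structures aim to replace with table reads.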

### II. LUT DESIGN FOR MEMORY BASED MULTIPLICATION

The basic principle of memory-based multiplication is depicted in Fig. 1. Let A be a fixed coefficient and X be an input word to be multiplied with A. If we assume X to be an unsigned binary number of word-length L, there can be $2^L$ possible values of X, and accordingly, there can be $2^L$ possible values of the product C = A·X.

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 2, Issue 5, May 2014



Figure 1. LUT based multiplier

Therefore, for the conventional implementation of memory-based multiplication, a memory unit of $2^L$ words is required to be used as a look-up table consisting of pre-computed product values corresponding to all possible values of X. The product word $A \cdot X_i$, for $0 \le X_i \le 2^L - 1$, is stored at the memory location whose address is the same as the binary value of $X_i$, such that if the L-bit binary value of $X_i$ is used as the address for the memory unit, then the corresponding product value is read out from the memory.
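The look-up scheme described above can be sketched in a few lines. This is an illustrative model only: the helper names `build_product_lut` and `lut_multiply` and the values chosen for A and L are hypothetical, and a Python list stands in for the memory unit.

```python
def build_product_lut(coefficient, word_length):
    """Precompute A * X for every unsigned L-bit input X (2**L words)."""
    return [coefficient * x for x in range(1 << word_length)]


def lut_multiply(lut, x):
    """Read the product from the table, using the input word as the address."""
    if not 0 <= x < len(lut):
        raise ValueError("input does not fit the table's word length")
    return lut[x]


A = 13   # fixed coefficient (hypothetical value)
L = 4    # input word length in bits (hypothetical value)
lut = build_product_lut(A, L)
product = lut_multiply(lut, 9)   # same result as A * 9, with no multiplier
```

The table holds one word per possible input, so every extra input bit doubles the memory while the read itself stays a single access.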

#### **III. DISTRIBUTED ARITHMETIC**

In Distributed Arithmetic, the LUT size increases exponentially: the LUT size is $2^n$. The LUT is therefore partitioned into smaller LUTs, so that the LUT size can be reduced to one fourth (the size of the table is reduced from one 4N\*2B LUT to four N\*2B tables).
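The exponential growth, and the effect of partitioning, can be seen from the usual bit-level expansion of an inner product. The following is a sketch under assumptions not stated in the text: unsigned inputs $x_k$ of $B$ bits, fixed coefficients $A_k$, and $n$ taken as the number of terms whose bits address the table.

$$
y = \sum_{k=0}^{n-1} A_k x_k, \qquad x_k = \sum_{b=0}^{B-1} x_{k,b}\, 2^{b}, \quad x_{k,b} \in \{0, 1\}
$$

$$
y = \sum_{b=0}^{B-1} 2^{b} \left[ \sum_{k=0}^{n-1} A_k x_{k,b} \right]
$$

The bracketed term depends only on the $n$ bits $x_{0,b}, \dots, x_{n-1,b}$, so it takes one of $2^n$ precomputed values. Splitting the $n$ terms into four groups of $n/4$ gives

$$
y = \sum_{b=0}^{B-1} 2^{b} \sum_{g=0}^{3} \left[ \sum_{k=gn/4}^{(g+1)n/4-1} A_k x_{k,b} \right]
$$

where each bracket needs only a $2^{n/4}$-entry table, at the cost of three extra additions per bit slice to combine the four table outputs.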

Here, a pipelined architecture is used to increase the speed of the design: the next data is fetched while the current computation is executing. The basic DA architecture is implemented in three main stages: the shift register unit, the LUT unit, and the shift-and-add unit. This architecture represents a 16-tap FIR filter. In this case, the LUT size is ($2^n=256$), where n is the filter order. However, the LUT is divided from one 4N\*2B table into four N\*2B tables, so the LUT size is now 16, and four LUTs are used.
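A minimal software model of the LUT unit and the shift-and-add unit, with the table split into four 16-entry LUTs for a 16-tap filter, might look as follows. It is a sketch under assumptions: inputs are unsigned 8-bit samples, the coefficient and sample values are hypothetical, the function names are invented for illustration, and the pipelined shift register stage is not modeled.

```python
def build_da_lut(coeffs):
    """Entry `addr` holds the sum of coeffs[k] for every bit k set in `addr`."""
    size = 1 << len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(size)]


def da_inner_product(coeffs, samples, bits, group=4):
    """Bit-serial DA inner product using one small LUT per group of taps.

    Assumes unsigned `bits`-wide samples; signed data would need the
    sign-bit slice to be subtracted instead of added.
    """
    groups = [coeffs[i:i + group] for i in range(0, len(coeffs), group)]
    luts = [build_da_lut(g) for g in groups]
    acc = 0
    for b in range(bits):                        # LSB-to-MSB bit slices
        slice_sum = 0
        for g, lut in enumerate(luts):
            addr = 0
            for k in range(len(groups[g])):
                addr |= ((samples[g * group + k] >> b) & 1) << k
            slice_sum += lut[addr]               # one LUT read per group
        acc += slice_sum << b                    # shift-and-add stage
    return acc, luts


# 16 hypothetical taps and 16 hypothetical unsigned 8-bit samples
coeffs = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3]
samples = [17, 200, 5, 99, 0, 255, 34, 1, 128, 64, 77, 13, 250, 8, 91, 42]
y, luts = da_inner_product(coeffs, samples, bits=8)
```

Each of the four tables is addressed by four bits, so it has 16 entries, and the result matches the ordinary multiply-accumulate sum without any multiplication being performed.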

#### IV. LUT LESS ARCHITECTURE

The DA technique proves to be a powerful technique for implementing the MAC unit as a multiplier-less algorithm through the use of memory (ROM or LUT) to store pre-computed partial sums of inner products. This computational efficiency has made DA popular in various DSP applications in which one of the multiplication operands is fixed. Unfortunately, the DA technique suffers from a major drawback: the dramatic growth of the LUT size when the filter order or the number of input variables increases. This paper presents new structures for the distributed arithmetic LUT that address this drawback by making the LUT size independent of the filter order or the number of input variables.

> The proposed structure of the DA-based adaptive filter of length N=4 is shown in figure 6.1.

> It consists of a four-point inner-product block and a weight-increment block, along with additional circuits for the computation of the error value e(n) and the control word t for the barrel shifters.

> The four-point inner-product block includes a DA table consisting of an array of 15 registers, which stores the partial inner products $y_l$ for $0 < l \le 15$, and a 16:1 multiplexer (MUX) to select the content of one of those registers. Bit slices of the weights $A = \{w_{3l}\, w_{2l}\, w_{1l}\, w_{0l}\}$ are fed to the MUX as control in LSB-to-MSB order, and the output of the MUX is fed to the carry-save accumulator. After L bit cycles, the carry-save accumulator has shift-accumulated all the partial inner products and generates a sum word and a carry word of (L+2) bits each. The carry and sum words are shifted and added with an input carry "1" to generate the filter output, which is subsequently subtracted from the desired output d(n) to obtain the error e(n).



Figure 3. Adaptive FIR filter
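The four-point inner-product block described above can be modeled roughly as follows. This is a behavioral sketch, not the hardware: the weights are assumed to be unsigned L-bit integers, the carry-save accumulator is replaced by an ordinary integer accumulator, and the function names and numeric values are hypothetical.

```python
def build_input_table(x):
    """16 entries: entry l is the sum of x[k] for every bit k set in l.

    Entry 0 is always zero, so only 15 values need to be stored.
    """
    return [sum(x[k] for k in range(4) if (l >> k) & 1) for l in range(16)]


def four_point_inner_product(weights, x, L):
    """Shift-accumulate table entries selected by the weight bit slices."""
    table = build_input_table(x)
    acc = 0
    for bit in range(L):                              # LSB-to-MSB order
        addr = 0
        for k in range(4):
            addr |= ((weights[k] >> bit) & 1) << k    # bit slice {w3 w2 w1 w0}
        acc += table[addr] << bit                     # MUX select, then shift-add
    return acc


def filter_error(weights, x, desired, L):
    """Error e(n) = d(n) - y(n) for one set of four inputs."""
    return desired - four_point_inner_product(weights, x, L)


weights = [5, 3, 0, 12]     # hypothetical unsigned 4-bit weights
x = [10, -4, 7, 2]          # hypothetical input samples
y = four_point_inner_product(weights, x, L=4)
e = filter_error(weights, x, desired=70, L=4)
```

Note that in this structure the table holds sums of the input samples and the weight bits act as the address, which is why the table must be refreshed as new samples arrive while the weights are free to adapt.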

In this design, all the bits of the error except the most significant one are ignored, so that multiplication of the input $x_k$ by the error is implemented by a right shift through the number of locations given by the number of leading zeros in the magnitude of the error. The magnitude of the computed error is decoded to generate the control word t for the barrel shifter.




This logic generates the control word t used by the barrel shifter. The convergence factor $\mu$ is usually taken to be O(1/N). We have taken $\mu = 1/N$; however, one can take $\mu$ as $2^{-i}/N$, where i is a small integer. The number of shifts t in that case is increased by i, and the input to the barrel shifter is pre-shifted by i locations accordingly to reduce the hardware complexity. The weight-increment unit consists of four barrel shifters and four adder or subtractor cells.
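The shift-based weight update described above can be sketched as below. Several details are assumptions rather than facts from the paper: the error is treated as an integer whose magnitude fits in `width` bits, the leading-zero count is taken over that width, the 1/N scaling is not modeled separately from the extra shift i, and the function names and values are hypothetical.

```python
def control_word(error, width, i=0):
    """Shift count t: leading zeros in the `width`-bit error magnitude, plus i.

    Returns None when the error is zero, meaning no weight update.
    """
    magnitude = abs(error)
    if magnitude == 0:
        return None
    if magnitude >= 1 << width:
        raise ValueError("error magnitude exceeds the assumed word width")
    return (width - magnitude.bit_length()) + i


def weight_increment(weights, x, error, width, i=0):
    """Add or subtract the right-shifted inputs according to the error sign."""
    t = control_word(error, width, i)
    if t is None:
        return list(weights)
    sign = 1 if error > 0 else -1
    # One barrel shifter and one adder/subtractor cell per weight.
    return [w + sign * (xk >> t) for w, xk in zip(weights, x)]


x = [64, -32, 16, 8]                                   # hypothetical inputs
updated = weight_increment([0, 0, 0, 0], x, error=5, width=8)
```

A larger error has fewer leading zeros and therefore a smaller shift, so the inputs contribute a larger increment, which is how the shift stands in for the multiplication by the error.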

## V. RESULT

The conventional DA-based structure was designed and simulated in ModelSim; it consumes more area, whereas the proposed DA-based structure uses a combinational block. The 8-bit LUT was designed and occupies more area. The combinational-block LUT was designed and occupies less area, so this LUT is used in the proposed DA system. Simulation results of the various LUTs are shown in the figure below.



Fig. Simulation results of AFIR filter
