Cascaded integrator comb filters with smoothly varying coefficients for reduced delay in synchrotron feedback loops

The Rapid Cycling Synchrotron (RCS) of the J-PARC complex in Tokai, Japan, is designed to accelerate a high-intensity proton beam from 181 MeV (later 400 MeV) to 3 GeV in 20 ms within the 40 ms machine cycle. The beam power of up to 1 MW demands stable beam control to avoid excessive losses and activation of the accelerator chain. The fully digital control system is based on quadrature modulation and demodulation. In the amplitude control loops, standard FIR filters separate the harmonics (h = 2) and (h = 4) after down conversion. For the phase loops at (h = 2) and (h = 4), which are intended to damp synchrotron oscillations, the delay of an FIR filter would limit the loop stability. Cascaded integrator comb filters, also called CIC filters, provide a shorter delay because they filter the longitudinal beam signal only where it is necessary: the notches are located at multiples of the revolution frequency of the proton beam. For fixed-frequency accelerator applications, digital comb filters with a fixed clock frequency are widely used to improve loop stability. For variable-frequency accelerator applications, such as a proton synchrotron, where the frequency swing is larger than the notch width, the clock frequency of the comb filter is usually variable and chosen to be an integer multiple of the particle revolution frequency. At J-PARC RCS, the clock frequency has to be fixed, and tracking the revolution frequency would require a variable, noninteger number of filter taps. Here we present a filter based on the weighted output of 2 CIC filters with variable length that differ by one tap. The filter function looks like a CIC filter with smoothly varying coefficients, whose notches follow the revolution frequency of the proton beam. The delay of this filter is approximately half that of the corresponding FIR filter, so that the phase loops have a higher stability margin.


I. INTRODUCTION
The Rapid Cycling Synchrotron (RCS) of the J-PARC complex [1] in Tokai, Japan, is designed to accelerate a high-intensity proton beam to 3 GeV in 20 ms at a repetition frequency of 25 Hz, starting from 181 MeV in a first stage and from 400 MeV in a later stage. The beam power will be up to 1 MW; therefore it is mandatory to keep the beam stable to avoid excessive losses and activation of the accelerator chain.
The acceleration cavities for RCS [2] are loaded with magnetic alloy and adjusted to a quality factor of approximately 2, which allows dual-harmonic operation. Unlike ferrite cavities, these cavities require no tuning loop.
Dual-harmonic operation [3] uses (h = 2) for acceleration and (h = 4) for longitudinal beam shaping. For precise control and synchronism of the amplitudes and phases of the 11 cavities, the fully digital control system [4] uses a common reference oscillator to supply a phase reference for both harmonics to the cavity driver modules. The signal processing is based on digital quadrature modulation and demodulation. For the amplitude control loops [5], standard FIR filters [6] are used to separate the harmonics. For the phase loops at (h = 2) and (h = 4), which damp synchrotron oscillations, the delay of an FIR filter would limit the loop stability.
Digital cascaded integrator comb filters, also called CIC filters, have a shorter delay because they filter the signal only where it is necessary, i.e., the notches of the filter are located at multiples of the revolution frequency of the proton beam. Originally, CIC filters were introduced for audio applications [7]. A reference for the application of digital comb filters in a high-energy proton synchrotron is given in [8]. For variable-frequency accelerator applications, such as a proton synchrotron, where the frequency swing is larger than the notch width, the clock frequency of the comb filter is an integer multiple of the particle revolution frequency [9], so that a one-turn delay is achieved. For fixed-frequency accelerator applications, such as electron or positron synchrotron storage rings [10], digital comb filters [11,12] with a fixed clock frequency are widely used to improve loop stability.
At J-PARC RCS, the clock frequency is fixed at 36 MHz, synchronized to a common 12 MHz reference. The filter presented here is based on the weighted output of 2 CIC filters with variable length that differ by one tap. The filter function looks like a CIC filter with smoothly varying coefficients, whose notches follow the revolution frequency of the proton beam.

A. Digital receiver basics
The basic subsystem for digital down conversion of a revolution harmonic is shown in Fig. 1. The longitudinal beam signal is detected by a fast current transformer (FCT), filtered by an analog low-pass filter, and sampled by an AD converter. A digital local oscillator multiplies the digital data by the sine and cosine of the selected center frequency. The digital low-pass filters remove the upper sidebands after mixing. Then a digital coordinate transformer [13] processes the in-phase and quadrature components of the detected signal, so that amplitude and phase appear at the outputs. The components of such a digital receiver are described in [14].
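The receiver chain can be illustrated with a short Python sketch. It assumes an ideal sampled cosine at the revolution frequency and the 36 MHz clock; a plain block average stands in for the digital low-pass filters, and `math.hypot`/`math.atan2` stand in for the coordinate transformer (a CORDIC in hardware). The helper `demodulate` is hypothetical and not part of the RCS firmware.

```python
import math

F_CLK = 36e6  # fixed sample clock (Hz)


def demodulate(samples, f_center, f_clk=F_CLK):
    """Quadrature down conversion of `samples` at f_center.

    Returns (amplitude, phase). The mixer products are averaged over the
    whole record, which acts as a crude low-pass filter that removes the
    component at 2 * f_center when the record spans whole periods.
    """
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(samples):
        w = 2.0 * math.pi * f_center * n / f_clk  # local-oscillator phase
        i_sum += x * math.cos(w)                  # in-phase mixer
        q_sum -= x * math.sin(w)                  # quadrature mixer
    # Factor 2 restores the amplitude halved by the product-to-sum identity.
    i = 2.0 * i_sum / len(samples)
    q = 2.0 * q_sum / len(samples)
    # Coordinate transformation (a CORDIC in hardware).
    return math.hypot(i, q), math.atan2(q, i)


f_rev = 469.25e3
n_samples = 144_000  # exactly 1877 periods of f_rev at 36 MHz
signal = [0.8 * math.cos(2.0 * math.pi * f_rev * n / F_CLK + 0.3)
          for n in range(n_samples)]
amplitude, phase = demodulate(signal, f_rev)
print(round(amplitude, 6), round(phase, 6))  # 0.8 0.3
```

Because the record spans a whole number of revolution periods, the mixer product at twice the center frequency averages out exactly; with arbitrary record lengths, proper low-pass filters such as those discussed in this paper are needed.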
The digital filter after down conversion of the selected harmonic has a low-pass characteristic that rejects unwanted signals. In a synchrotron we expect longitudinal signals at each harmonic of the revolution frequency; therefore the cutoff frequency of the filters has to be smaller than the revolution frequency. At injection time of RCS (181 MeV), the revolution frequency is $f_{rev} = 469\,250$ Hz. At extraction time (3 GeV), it is 835 867 Hz.
If we were allowed to use a variable system clock, we could select a fixed number of taps $n_{tap}$ and then apply a variable clock frequency $f_{clk}$ that tracks the revolution frequency $f_{rev}$ of the synchrotron:

$$f_{clk} = n_{tap}\, f_{rev}. \qquad (1)$$

For a CIC filter, which is a special type of FIR filter in which all taps have the same weight, this means that the notch spacing follows the acceleration in the synchrotron.

B. Standard CIC filters
An example of such a filter, shown in Fig. 2, has 32 taps. The clock frequency for this filter would vary from 15.016 MHz to approximately 26.7477 MHz during the RCS cycle. This type of filtering is standard for the low-level system of COSY [15], with a revolution frequency between 490 and 1578 kHz. At COSY the number of taps is set to 16, so that the maximum clock frequency is not much higher than 25 MHz. This sample-rate limit was defined by the 12-bit AD converters and 16-bit multipliers available around 1995.
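Eq. (1) can be checked numerically. The following sketch, using a hypothetical helper `cic_clock`, reproduces the clock ranges quoted above for the 32-tap RCS example and the 16-tap COSY case.

```python
def cic_clock(n_tap, f_rev):
    """Clock frequency that places the CIC notches at multiples of f_rev, Eq. (1)."""
    return n_tap * f_rev


# 32-tap filter over the RCS sweep (injection, extraction), in Hz
rcs_clk = (cic_clock(32, 469.25e3), cic_clock(32, 835.867e3))
# 16-tap filter at the highest COSY revolution frequency, in Hz
cosy_clk = cic_clock(16, 1578e3)

print([f"{f / 1e6:.4f} MHz" for f in rcs_clk], f"{cosy_clk / 1e6:.3f} MHz")
# ['15.0160 MHz', '26.7477 MHz'] 25.248 MHz
```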
The transfer function of a CIC filter with $n_{tap}$ taps is

$$h(f) = \frac{1}{n_{tap}} \sum_{k=0}^{n_{tap}-1} e^{-j 2\pi k f/f_{clk}}.$$

With $\theta = \pi f/f_{clk}$, it becomes

$$h(\theta) = \frac{\sin(n_{tap}\theta)}{n_{tap}\sin\theta}\, e^{-j(n_{tap}-1)\theta}.$$

The transfer function of the filter with 32 taps is plotted in Fig. 3. The first zero of the transfer function is at $f/f_{clk} = 1/32 = 0.03125$. All harmonics of the revolution frequency $f_{rev}$ below $f_{clk}$ are attenuated. However, this solution is not applicable to J-PARC RCS, because the clock frequency of the digital system for RCS is fixed at 36 MHz. In J-PARC, the injector Linac, the RCS, and the main ring (MR) are synchronized to a common 12 MHz reference oscillator [16]. With a fixed clock frequency, the other option is to change the number of taps as a function of the revolution frequency during RCS acceleration. However, this results in another conflict. The frequency pattern of the RCS cycle is a smooth function, and one expects the filter notches to track it accordingly. The number of taps of a simple CIC filter can only be changed in integer steps, so that the resulting frequency step of the notches is bigger than the notch width. In other words, there is no straightforward way to realize a digital delay of a fraction of a clock cycle. This restriction is removed by linear interpolation between 2 CIC filters with different numbers of filter taps, as shown in Fig. 4 and explained in the following.
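As a sanity check of the closed-form transfer function $\sin(n_{tap}\theta)/(n_{tap}\sin\theta)\,e^{-j(n_{tap}-1)\theta}$ with $\theta = \pi f/f_{clk}$, the sketch below compares it with the direct sum of $n_{tap}$ unit taps and confirms that, with $f_{clk} = 32 f_{rev}$, every revolution harmonic below $f_{clk}$ falls on a notch. The function names are illustrative only.

```python
import cmath
import math


def cic_direct(f, n_tap, f_clk):
    """h(f): average of n_tap unit-weight taps (direct sum)."""
    return sum(cmath.exp(-2j * math.pi * k * f / f_clk) for k in range(n_tap)) / n_tap


def cic_closed_form(f, n_tap, f_clk):
    """sin(n*theta) / (n*sin(theta)) * exp(-j(n-1)theta) with theta = pi f / f_clk."""
    if f == 0:
        return 1.0 + 0.0j  # unity gain at dc
    theta = math.pi * f / f_clk
    return (math.sin(n_tap * theta) / (n_tap * math.sin(theta))
            * cmath.exp(-1j * (n_tap - 1) * theta))


f_clk = 32 * 469.25e3  # variable clock locked to f_rev at injection
for f in (0.0, 100e3, 250e3, 1.2e6):
    assert abs(cic_direct(f, 32, f_clk) - cic_closed_form(f, 32, f_clk)) < 1e-12

# Every harmonic of f_rev below f_clk falls on a notch.
worst = max(abs(cic_direct(m * 469.25e3, 32, f_clk)) for m in range(1, 32))
print(f"largest residual at the harmonics: {worst:.1e}")
```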

C. Interpolation at fixed clock frequency
In contrast to Figs. 5(a) and 5(b), here we assume that both filters, the one with 76 taps and the other with 77 taps, are running at a fixed clock of 36 MHz. The transfer functions of the two CIC filters have their maximum attenuation at different frequencies: the notches are at multiples of 36 MHz/77 = 467.53 kHz for the filter with 77 taps, and at multiples of 36 MHz/76 = 473.68 kHz for the filter with 76 taps. This is shown for the first 2 notches in Fig. 7. The frequency to be suppressed, $f_{rev} = 469.25$ kHz, lies between the notches of the two filters. The optimum number of taps $k_{opt}$ is noninteger; here

$$k_{opt} = \frac{f_{clk}}{f_{rev}} = \frac{36\ \mathrm{MHz}}{0.46925\ \mathrm{MHz}} = 76.718. \qquad (5)$$

We define the integer number of taps

$$k_{int} = \mathrm{int}(k_{opt}) = \mathrm{int}(f_{clk}/f_{rev}); \quad \text{here } k_{int} = 76, \qquad (6)$$

and the remaining fraction

$$k_{frac} = k_{opt} - k_{int} = k_{opt} - \mathrm{int}(k_{opt}); \quad \text{here } k_{frac} = 0.718.$$

Then a scaling process similar to linear interpolation between two points is applied. The output of the filter with 77 taps is scaled by $k_{frac} = 0.718$, the output of the filter with 76 taps by $1 - k_{frac} = 1 - 0.718 = 0.282$, and both scaled values are added. The combined filter shown in Fig. 8 has a transfer function, plotted in Fig. 9, that is almost like the transfer function of a single CIC filter with variable clock. The D-flip-flop resources for the delays 1 to 76 can be used for both filters, and the generation of the sum can also be shared. A filter version with optimized resource usage, which is important for implementation in an FPGA (field programmable gate array), is shown in Fig. 10. The result for the average of 76 taps is available one clock cycle earlier than the result for the 77 taps; therefore another D flip-flop is inserted before the scaling by (1/76). The filter in Fig. 10 is equivalent to a nonsymmetric FIR filter, shown in Fig. 11, in which the first 76 coefficients are unity and coefficient 77 is variable between 0 and 1; finally, a scaling is applied to obtain unity gain at dc.
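The interpolation can be sketched in Python by building the equivalent nonsymmetric FIR filter directly from $k_{frac}$, assuming unity-gain moving averages for both CIC filters. The sketch compares the attenuation at $f_{rev} = 469.25$ kHz of the 76-tap, 77-tap, and interpolated filters; `interpolated_cic_taps` and `response` are hypothetical names.

```python
import cmath
import math

F_CLK = 36e6


def boxcar(n_tap):
    """Unity-dc-gain CIC (moving average) coefficients."""
    return [1.0 / n_tap] * n_tap


def interpolated_cic_taps(f_rev, f_clk=F_CLK):
    """Equivalent FIR of k_frac * CIC(k_int + 1) + (1 - k_frac) * CIC(k_int)."""
    k_opt = f_clk / f_rev      # Eq. (5), noninteger
    k_int = int(k_opt)         # Eq. (6)
    k_frac = k_opt - k_int
    common = (1.0 - k_frac) / k_int + k_frac / (k_int + 1)
    # k_int equal coefficients, plus one smaller coefficient from the longer filter
    return [common] * k_int + [k_frac / (k_int + 1)]


def response(taps, f, f_clk=F_CLK):
    """Complex frequency response of an FIR coefficient list at frequency f."""
    return sum(h * cmath.exp(-2j * math.pi * n * f / f_clk) for n, h in enumerate(taps))


f_rev = 469.25e3
taps = interpolated_cic_taps(f_rev)
for name, h in (("76 taps", boxcar(76)), ("77 taps", boxcar(77)), ("interpolated", taps)):
    print(f"{name:>12}: {20 * math.log10(abs(response(h, f_rev))):6.1f} dB at f_rev")
```

The ratio of the last coefficient to the others (about 0.72 here) plays the role of the variable coefficient 77 in the nonsymmetric FIR view, before the final scaling to unity gain at dc.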
At first glance, it seems obvious to combine the cascaded multipliers to save resources. However, the final scaling for unity gain at dc requires a signal-processing environment that supports floating-point numbers.

D. Interpolated CIC filters with a variable number of taps
Between injection and extraction of RCS, the revolution frequency (h = 1) will sweep from 469.25 kHz to approximately 835.866 kHz. In a later stage, when the injector Linac is upgraded to 400 MeV, the injection frequency of RCS will rise to 613.690 kHz, which is included in the filter operating range. For the expected sweep, the ratio of clock frequency to revolution frequency, $k_{opt} = f_{clk}/f_{rev}$, decreases from 76.718 at 181 MeV injection to 43.069 at 3 GeV extraction. In the case of 400 MeV injection, $k_{opt}$ varies between 58.662 at injection and 43.069 at extraction. To be prepared for future changes, and for testing purposes, $k_{opt}$ is defined with some safety margin: the chosen valid range for $k_{opt}$ is between 40 and 80. The delay structure is composed of a fixed delay and a variable delay. For a fixed delay of 32 taps, which is easy to realize in a tree structure, the remaining variable delay can be changed from 8 to 48 taps. The number of delays $k_{int}$ is an integer, with $k_{int} = \mathrm{int}(k_{opt}) = \mathrm{int}(f_{clk}/f_{rev})$ from Eq. (6). The corresponding circuit is shown in Fig. 12. Its function is explained by an example. The revolution frequency is assumed to be $f_{rev} = 816.326$ kHz. Then $k_{opt} = 36\ \mathrm{MHz}/0.816326\ \mathrm{MHz} = 44.1$, $k_{int} = 44$, and $k_{frac} = 0.1$. The fixed delay of 32 taps and the variable delay of 12 taps result in a total of $k_{int} = 44$ delays. As $k_{opt}$ is closer to 44 than to 45, the interpolated CIC filter transfer function is expected to be close to that of a single CIC filter with 44 taps. In the upper signal path, the sum of 45 taps is scaled by (1/45). In the next multiplier, it is scaled by $k_{frac} = 0.1$.
In the lower signal path, the sum of 44 taps is first scaled by (1/44). In the next multiplier, it is scaled by $1 - k_{frac} = 1 - 0.1 = 0.9$.
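The worked example can be reproduced with a small helper. The function `filter_settings` and its dictionary keys are hypothetical; the fixed delay of 32 taps and the valid $k_{opt}$ range of 40 to 80 are taken from the text. The two gains combine the $1/(k_{int}+1)$ or $1/k_{int}$ scaling with the $k_{frac}$ or $(1 - k_{frac})$ weighting of each signal path.

```python
FIXED_DELAY = 32          # taps realized in the fixed tree
K_OPT_RANGE = (40.0, 80.0)


def filter_settings(f_rev, f_clk=36e6):
    """Hypothetical helper returning the parameters of the variable-length structure."""
    k_opt = f_clk / f_rev
    if not K_OPT_RANGE[0] <= k_opt <= K_OPT_RANGE[1]:
        raise ValueError(f"k_opt = {k_opt:.3f} is outside the valid range")
    k_int = int(k_opt)
    k_frac = k_opt - k_int
    return {
        "k_opt": k_opt,
        "k_int": k_int,
        "variable_delay": k_int - FIXED_DELAY,
        # upper path: sum of k_int + 1 taps, times 1/(k_int + 1), times k_frac
        "upper_gain": k_frac / (k_int + 1),
        # lower path: sum of k_int taps, times 1/k_int, times (1 - k_frac)
        "lower_gain": (1.0 - k_frac) / k_int,
    }


s = filter_settings(816.326e3)
print(s["k_int"], s["variable_delay"], round(s["k_opt"] - s["k_int"], 3))  # 44 12 0.1
```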

A. Scaling factors ($k_{frac}$) and ($1 - k_{frac}$)
For implementation of the filter, the scaling factors have a limited definition range. The scale factor ($k_{frac}$) in the upper filter path of Fig. 12 is defined for numbers from zero up to, but not including, 1. The scale factor ($1 - k_{frac}$) in the lower part of the filter is defined for positive numbers greater than zero (zero is not included) up to and including 1. To avoid floating-point operations, fixed-point arithmetic is implemented, and the scaling factors are stored in a pattern memory.
The sum of the scaling factors, $k_{frac} + (1 - k_{frac})$, is always 1; therefore it is sufficient to store only one of them in the pattern memory. As a straightforward implementation, $k_{opt}$ is stored as a fixed-point number in a 16-bit-wide pattern memory with 40 000 locations. This is compatible with ramping function generation for the low-level rf system of RCS [17], where, for example, the voltage is defined by pattern memories with 40 000 entries, so that a new value is available every microsecond during the 40 ms RCS machine cycle. The upper 6 bits (MSB) contain the integer part ($k_{int} - 32$). The value 32 is subtracted for the fixed delay of 32 taps and to keep the number within the 6-bit range. The remainder $k_{frac} = k_{opt} - k_{int}$ is multiplied by 1024, which results in a 10-bit positive integer between 0 and 1023 (LSB). Table I gives examples of the pattern as a function of frequency. The data format is given in Table II.
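A sketch of the 16-bit pattern word follows, assuming the fractional part is truncated rather than rounded, which keeps it within 0 to 1023 as stated. `encode_pattern` and `decode_pattern` are hypothetical helper names, not the RCS firmware interface.

```python
FIXED_DELAY = 32
FRAC_BITS = 10


def encode_pattern(k_opt):
    """Pack k_opt into a 16-bit word: 6-bit (k_int - 32) MSB, 10-bit fraction LSB."""
    k_int = int(k_opt)
    high = k_int - FIXED_DELAY
    if not 0 <= high < 2 ** 6:
        raise ValueError("k_int - 32 does not fit into 6 bits")
    frac_code = int((k_opt - k_int) * 2 ** FRAC_BITS)  # truncated to 0..1023
    return (high << FRAC_BITS) | frac_code


def decode_pattern(word):
    """Recover (k_int, k_frac) from a 16-bit pattern word."""
    return (word >> FRAC_BITS) + FIXED_DELAY, (word & (2 ** FRAC_BITS - 1)) / 2 ** FRAC_BITS


word = encode_pattern(36e6 / 816.326e3)
print(word, hex(word), decode_pattern(word))  # 12390 0x3066 (44, 0.099609375)
```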
The full range of the table for $f_{rev}$ from 444.445 to 900.000 kHz was checked for consistency. Within 20 ms, the momentum of RCS is ramped from $p_{inj} = 610.259$ MeV/c to $p_{top} = 3824.87$ MeV/c, so that the kinetic energy increases from 181 to 3000 MeV. Near injection, the lowest bit of the filter pattern changes when the revolution frequency changes by 6 Hz; the first change occurs at 16 μs. Near extraction, the lowest bit of the filter pattern changes when the revolution frequency changes by 12 Hz; the last change during the 20 ms acceleration time happens at 19.854 ms. Between 2 and 4 ms, the revolution frequency changes by 37 to 49 Hz per μs, so the pattern value for the CIC filter changes by 5 or 6 in each microsecond step. Fig. 13 is a 17-bit number in 2's complement. Therefore the multiplier for scaling needs a 17-bit input. Alternatively, the number could be changed to 32 767, because the resulting error is small, so that a 16-bit input is sufficient.
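The 6 Hz figure near injection follows from a first-order estimate: since $k_{opt} = f_{clk}/f_{rev}$, one LSB of $2^{-10}$ taps corresponds to a revolution-frequency change of $f_{rev}^2/(2^{10} f_{clk})$. The sketch below evaluates this linearized estimate; the helper name is hypothetical.

```python
def freq_step_per_lsb(f_rev, f_clk=36e6, frac_bits=10):
    """Revolution-frequency change that moves k_opt = f_clk/f_rev by one LSB.

    |d k_opt / d f_rev| = f_clk / f_rev**2, and one LSB is 2**-frac_bits taps.
    """
    return f_rev ** 2 / (f_clk * 2 ** frac_bits)


print(f"{freq_step_per_lsb(469.25e3):.2f} Hz per LSB at injection")
# 5.97 Hz per LSB at injection
```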

B. Scaling with the number of taps
The circuit shown in Fig. 12 contains one multiplier to scale by $1/(k_{int} + 1)$ and another to scale by $1/k_{int}$. This is shown in detail in Fig. 14. [...] is necessary, which is implemented by selecting the appropriate multiplier outputs. Finally, 3 CIC filters as shown in Fig. 12 will be cascaded. Therefore the scaling table has [...]

D. Cascading 3 interpolating CIC filters
Using only one of these interpolating CIC filters is not sufficient for good suppression of unwanted signals. For the digital receiver subsystem, which operates with 16-bit arithmetic, we expect attenuation on the order of 80 dB or better. Also, the notch is not wide enough when the maximum synchrotron frequency of RCS (approximately 6 kHz) is used as the criterion. Increasing the number of filters in series improves the notch depth, and selecting a different number of taps for each of the cascaded filters increases the notch width. As a good compromise between total delay and desired transfer function, 3 filters with different variable delays are put in series. According to Fig. 16, the original delay variable ($k_{int}$) is replaced by either ($k_{int} - \Delta k$) or ($k_{int} + \Delta k$).
The value $\Delta k$ depends slightly on the revolution frequency, as shown in Fig. 17. Near RCS injection, at low frequency, the filter stop band is narrow; there $\Delta k = 3$ is chosen. For revolution frequencies higher than 600 kHz, $\Delta k = 2$ gives better performance.
This criterion can be expressed as a function of $k_{int}$: for $k_{int} \le 60$, choose $\Delta k = 2$; for $k_{int} > 60$, choose $\Delta k = 3$. If the injection energy of RCS is upgraded to 400 MeV, the value $\Delta k = 2$ will stay constant during the whole RCS cycle.
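The cascade and the $\Delta k$ selection rule can be sketched as follows. The sketch assumes that each of the three stages is an interpolated CIC filter that reuses the same $k_{frac}$, with $k_{int} - \Delta k$, $k_{int}$, and $k_{int} + \Delta k$ taps; the function names are hypothetical, and the code only evaluates the combined response at the revolution frequency itself.

```python
import cmath
import math

F_CLK = 36e6


def delta_k(k_int):
    """Tap offset between the cascaded stages, following the rule in the text."""
    return 2 if k_int <= 60 else 3


def cic(f, n_tap, f_clk=F_CLK):
    """Unity-dc-gain CIC (moving average) response at frequency f."""
    return sum(cmath.exp(-2j * math.pi * k * f / f_clk) for k in range(n_tap)) / n_tap


def cascade_response(f, f_rev, f_clk=F_CLK):
    """Three interpolated CIC stages with k_int - dk, k_int, k_int + dk taps.

    Assumption: every stage reuses the same k_frac as the centre stage.
    """
    k_opt = f_clk / f_rev
    k_int = int(k_opt)
    k_frac = k_opt - k_int
    dk = delta_k(k_int)
    h = 1.0 + 0.0j
    for n in (k_int - dk, k_int, k_int + dk):
        h *= (1.0 - k_frac) * cic(f, n, f_clk) + k_frac * cic(f, n + 1, f_clk)
    return h


def db(x):
    return 20.0 * math.log10(abs(x))


for f_rev in (469.25e3, 613.690e3, 835.866e3):
    print(f"f_rev = {f_rev / 1e3:7.2f} kHz: {db(cascade_response(f_rev, f_rev)):7.1f} dB")
```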

A. Simulation of the 3 filters in series
The filter structure shown in Fig. 16 was simulated with SCILAB [18]. For the case $k_{int} = 76$ and $\Delta k = 3$ at injection, the filter length in total sample points is $(76 - 3) + 76 + (76 + 3) = 228$ taps. The impulse response is shown in Fig. 18. In the real circuit, the delay is slightly longer due to latches in the adder trees. When a ramp with a rise time of 1000 clock cycles (27.78 μs) is applied as a stimulus to the filter, the delay of the filter is half the total number of taps: at the 50% signal level, the response of the filter is delayed by 114 clock cycles, or 3.166 μs. This is shown in Fig. 19. For higher revolution frequencies, the number of taps goes down, but the delay does not become shorter, because the delay flip-flops are still in the signal path. As a result, the behavior of the phase loop in which the variable CIC filter is used does not change much during the acceleration cycle. The resulting filter function at injection is shown in Fig. 20. Figure 21 shows the pass band, and Fig. 22 the suppression of the next revolution harmonic, which is better than 90 dB.
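The impulse response and ramp delay of the cascade can be estimated with plain convolution. For a filter with unity gain at dc, a ramp is delayed in steady state by the centroid $\sum_n n\,h[n]$ of the impulse response. The sketch assumes the same three-stage structure as above (same $k_{frac}$ in each stage), ignores the adder-tree latches, and uses hypothetical helper names.

```python
def interpolated_taps(n_int, k_frac):
    """FIR equivalent of (1 - k_frac) * CIC(n_int) + k_frac * CIC(n_int + 1)."""
    common = (1.0 - k_frac) / n_int + k_frac / (n_int + 1)
    return [common] * n_int + [k_frac / (n_int + 1)]


def convolve(a, b):
    """Full linear convolution of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cascade_impulse_response(f_rev, f_clk=36e6):
    k_opt = f_clk / f_rev
    k_int = int(k_opt)
    k_frac = k_opt - k_int
    dk = 2 if k_int <= 60 else 3
    h = [1.0]
    for n_int in (k_int - dk, k_int, k_int + dk):
        h = convolve(h, interpolated_taps(n_int, k_frac))
    return h, k_opt


def ramp_delay(h):
    """Steady-state lag (clock cycles) of a ramp through h: the centroid of h."""
    return sum(n * c for n, c in enumerate(h)) / sum(h)


h, k_opt = cascade_impulse_response(469.25e3)
delay = ramp_delay(h)
print(len(h), round(delay, 2), f"{delay / 36e6 * 1e6:.3f} us")
```

At injection the sketch yields a lag of about 113.6 clock cycles (about 3.15 μs), close to the 114 clock cycles quoted above.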

B. Comparing with a standard FIR filter
The digital low-pass filter for amplitude control in RCS [19] uses a combination of 3 CIC filters with 8 taps each (Fig. 23) for down sampling from the 36 MHz clock to a 9 MHz clock, followed by a 63-tap FIR filter running at 9 MHz. The FIR filter has 2 delays between coefficients; since each 9 MHz delay corresponds to 4 clock cycles at 36 MHz, the length of the FIR filter is equivalent to 63 × 8 = 504 clock cycles at the 36 MHz clock. The FIR filter coefficients are shown in Fig. 24. To allow for troubleshooting, the well-tested FIR filter for amplitude control is also available for the phase-control circuit as a backup solution. Both the FIR filter and the variable CIC filter are realized in the same FPGA, and one of them is selected for phase control at boot time of the FPGA. Figure 25 shows the response of the FIR filter in Fig. 24 to a ramp stimulus, the same type of stimulus as applied to the variable CIC filter in Fig. 19. The FIR filter delay is more than twice that of the variable CIC filter in Fig. 19. The frequency characteristics of the FIR filter combined with the CIC filter for down sampling are shown in Fig. 26(a) from dc to the Nyquist frequency limit. Since the FIR filter is clocked at 9 MHz, Fig. 26(b) confirms that the CIC down-sampling filter provides enough attenuation at multiples of 4.5 MHz. Figure 26(c) shows the pass-band characteristic. The FIR filter has a longer delay than the variable CIC filter because the total frequency span that it attenuates is wider.
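The 3-stage, 8-tap CIC prefilter can be sketched in the textbook recursive (Hogenauer) form: integrators at the input rate, decimation, then combs at the output rate. This is an assumption about structure only; the RCS hardware apparently uses adder trees, but both realize the same cascade of three 8-tap moving sums. The sketch checks the recursive form against direct convolution for decimation by 4 (36 to 9 MHz) and by 2 (36 to 18 MHz); the helper names are hypothetical.

```python
def cic_decimator(x, stages=3, taps=8, r=4):
    """Recursive (Hogenauer) CIC decimator: `stages` integrators at the input
    rate, decimation by r, then `stages` combs with delay taps // r at the
    output rate. Integer arithmetic; dc gain is taps**stages."""
    m = taps // r                          # comb delay at the output rate
    y = list(x)
    for _ in range(stages):                # integrators (running sums)
        acc, out = 0, []
        for v in y:
            acc += v
            out.append(acc)
        y = out
    y = y[::r]                             # keep every r-th sample
    for _ in range(stages):                # combs: y[i] - y[i - m]
        y = [y[i] - (y[i - m] if i >= m else 0) for i in range(len(y))]
    return y


def direct_decimator(x, stages=3, taps=8, r=4):
    """Reference: convolve with `stages` boxcars of length `taps`, then decimate."""
    h = [1]
    for _ in range(stages):
        h = [sum(h[j] for j in range(len(h)) if 0 <= i - j < taps)
             for i in range(len(h) + taps - 1)]
    full = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
    return full[::r]


x = [(7 * n) % 13 - 6 for n in range(64)]
assert cic_decimator(x, r=4) == direct_decimator(x, r=4)   # 36 MHz -> 9 MHz
assert cic_decimator(x, r=2) == direct_decimator(x, r=2)   # 36 MHz -> 18 MHz
print(cic_decimator([1] * 40, r=4)[-1])  # 512
```

The output still carries the dc gain $8^3 = 512$, which could be removed by a 9-bit right shift.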

C. Simulation results with a phase loop
The digital phase loop for RCS is simulated with PSPICE to check the stability margin [20]. For a synchrotron frequency of 6 kHz, both control loops, either with the FIR low-pass filter for down conversion [Fig. 27(a)] or with the variable CIC filter [Fig. 28(a)], are stable. For an assumed synchrotron frequency of 30 kHz, the loop with the FIR filter [Fig. 27(b)] is near instability and shows strong ringing, while the phase loop with the variable CIC filter [Fig. 28(b)] is stable. The reason for the smaller stability margin with the FIR filter is its longer delay, which translates into a phase shift of −90° at 33.89 kHz.
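Treating each filter as a pure time delay (a simplification; the PSPICE model covers the full loop), the quoted numbers can be related: a phase of −90° at 33.89 kHz implies a delay of about 7.38 μs, whereas the 114-cycle delay of the variable CIC filter gives roughly −34° at 30 kHz. The helper names below are hypothetical.

```python
def delay_phase_deg(f, tau):
    """Phase of a pure time delay tau (s) at frequency f (Hz), in degrees."""
    return -360.0 * f * tau


def delay_for_phase(f, phase_deg):
    """Delay that produces phase_deg at frequency f."""
    return -phase_deg / (360.0 * f)


tau_fir = delay_for_phase(33.89e3, -90.0)  # delay implied by -90 deg at 33.89 kHz
tau_cic = 114 / 36e6                       # ramp delay of the variable CIC filter
for name, tau in (("FIR", tau_fir), ("variable CIC", tau_cic)):
    print(f"{name:>12}: {tau * 1e6:5.2f} us, {delay_phase_deg(30e3, tau):6.1f} deg at 30 kHz")
```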

V. SAVING FPGA RESOURCES BY DOWN SAMPLING
One variable CIC filter needs 82 16-bit D flip-flops (D-FF) for the delays, 4 multipliers in the delay circuit, and 79 adders in the adder tree. For 3 variable CIC filters in series running at a 36 MHz clock, approximately 250 16-bit D-FF, 12 multipliers, 250 adders, and a lookup table for 40 scale values are needed. There are several ways to save resources.

A. Optimizing the circuit
The 3 filters in series in Fig. 16 have different numbers of taps, for example, ($k_{int} - 2$), ($k_{int}$), and ($k_{int} + 2$). The maximum value of $k_{int}$ is 76 at injection. It is possible to remove the taps that are never used in the filters with ($k_{int}$) and ($k_{int} - 2$). In this way, 6 to 8 16-bit D-FF can be saved, and the total delay becomes approximately 200 ns shorter. As a disadvantage, each CIC filter stage looks different.

B. Down sampling to 9 MHz clock frequency
This approach is the same as the one used to reduce the resources of the FIR filter for amplitude control, shown in Fig. 24.
The 3-stage CIC filter with 8 taps from Fig. 23 at the input side provides low-pass filtering, so that the sample rate can be reduced to 9 MHz. However, for the 3 variable CIC filters in series, $\Delta k$ then has to be smaller than 1: for $\Delta k = 1$, the spacing between the harmonics for ($k_{int}$) and ($k_{int} + 1$) becomes too big. Therefore the difference in the number of taps between the filter stages, $\Delta k$, becomes noninteger. An example with ($k_{int} \pm 0.5$) is shown in Fig. 29. Some translation logic for the filter with ($k_{int} \pm 0.5$) is necessary, which makes the circuit more difficult to understand and less efficient in terms of resources.

C. Down sampling to 18 MHz clock frequency
As in Sec. V B and Fig. 23, a 3-stage CIC filter with 8 taps is used as a prefilter. For the phase-feedback board (PFB) [21], it reduces the sample rate from 36 to 18 MHz (see Fig. 30). This prefilter introduces an additional delay of 36 clock cycles: 12 of these clock cycles are delays in the adder tree, and 24 are related to the filter taps. For slow signals like a ramp, the delays in the filter taps contribute only half. The 3-stage 8-tap CIC filter therefore introduces a delay of 12 + 24/2 = 24 clock cycles, or approximately 667 ns, for a ramp stimulus. This filter already exists in the FPGA for the FIR filter (Sec. IV B), which is the backup solution; therefore its design is reused. One CIC filter then needs approximately 55 D-FF for the delays, 4 multipliers, and 41 adders. In discussions with the company that is building the RCS digital low-level rf system, the structure in Fig. 30 was defined as the preferred filter for phase control in RCS. In the 36 MHz version, $\Delta k$ was either 2 or 3, depending on the revolution frequency. In the 18 MHz version, $\Delta k$ is constant with $\Delta k = 1$. The circuit for the single variable CIC filter then differs from that of Fig. 12. Table VII shows the limits of ($k_{int} \pm \Delta k$) for the 6 places where these values appear.
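The delay arithmetic and the rescaling of the tap offset to a lower sample rate can be written out in a few lines. The helper names are hypothetical; `equivalent_dk` simply converts a tap offset at 36 MHz into the offset that spans the same time at another clock, which for an offset of 2 taps at 36 MHz gives 1 tap at 18 MHz and 0.5 tap at 9 MHz.

```python
F_FAST = 36e6   # ADC / system clock
F_PFB = 18e6    # sample rate after the prefilter on the PFB


def prefilter_ramp_delay(adder_cycles=12, tap_cycles=24, f_clk=F_FAST):
    """Ramp delay of the 3-stage, 8-tap prefilter: adder-tree latency counts
    fully, the tap delays only by half."""
    cycles = adder_cycles + tap_cycles / 2
    return cycles, cycles / f_clk


def equivalent_dk(dk_at_36mhz, f_clk):
    """Tap offset at f_clk spanning the same time as dk_at_36mhz taps at 36 MHz."""
    return dk_at_36mhz * f_clk / F_FAST


cycles, seconds = prefilter_ramp_delay()
print(cycles, f"{seconds * 1e9:.1f} ns")                     # 24.0 666.7 ns
print(equivalent_dk(2, 18e6), equivalent_dk(2, 9e6))         # 1.0 0.5
print([round(F_PFB / f, 3) for f in (469.25e3, 835.866e3)])  # k_opt range at 18 MHz
```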
The transfer function of the 3 interpolated CIC filters (18 MHz version) is shown for 3 different revolution frequencies in Fig. 32. Figure 33(a) shows the attenuation of the combination of the CIC prefilter and the 3 variable CIC filters (the structure of Fig. 30) from dc up to 18 MHz, the Nyquist frequency of a 36 MHz clock system. Figure 33(b) gives an enlarged view of the frequency span from dc to 4.5 MHz. Thanks to the prefiltering, the suppression of the higher harmonics of the revolution frequency is 90 dB or better, which is an improvement

VI. APPLICATION: 90° DELAY (HILBERT TRANSFORMATION)
The basic idea of the interpolating CIC filter can also be applied to generate a delay that is proportional to the revolution period of a particle beam. This allows ''one-turn delays'' [22] or 90° delays for Hilbert transformation to be realized in a fixed-clock digital system. Figure 34 shows the structure of a digital 90° phase shifter for Hilbert transformation that follows the revolution frequency while operating at a fixed clock. The optimum number of delays for 90° at harmonic h is

$$k_{opt} = \frac{f_{clk}}{4\,h\,f_{rev}}.$$

The integer number of delays is $k_{int} = \mathrm{int}(k_{opt})$. The corresponding scaling factors $a_1$ and $a_2$ are computed such that the output amplitude follows the input amplitude and the phase between input and output is 90°, as expected.
The output after ($k_{int}$) delays is a complex vector $z_1 = x_1 + j y_1$, and the output after ($k_{int} + 1$) delays is a complex vector $z_2 = x_2 + j y_2$. After scaling with the real numbers $a_1$ and $a_2$, we get the sum in the frequency domain

$$S(f) = a_1 z_1 + a_2 z_2 = a_1 (x_1 + j y_1) + a_2 (x_2 + j y_2).$$

For a 90° phase shift, the real part has to be zero,

$$a_1 x_1 + a_2 x_2 = 0, \qquad (11)$$

and the amplitude has to be constant, here equal to 1:

$$(a_1 x_1 + a_2 x_2)^2 + (a_1 y_1 + a_2 y_2)^2 = 1. \qquad (12)$$

Equation (11) inserted into (12) simplifies to

$$a_1 y_1 + a_2 y_2 = \mp 1. \qquad (13)$$

Solving Eqs. (11) and (13) for $a_1$ and $a_2$ gives

$$a_1 = \frac{\pm x_2}{x_1 y_2 - y_1 x_2}, \qquad a_2 = \frac{\mp x_1}{x_1 y_2 - y_1 x_2}. \qquad (14)$$

Now we insert the transfer functions of the delays,

$$z_1 = e^{-j 2\pi h k_{int} f/f_{clk}} = \cos(-2\pi h k_{int} f/f_{clk}) + j \sin(-2\pi h k_{int} f/f_{clk}), \qquad (15)$$

$$z_2 = e^{-j 2\pi h (k_{int}+1) f/f_{clk}} = \cos(-2\pi h (k_{int}+1) f/f_{clk}) + j \sin(-2\pi h (k_{int}+1) f/f_{clk}). \qquad (16)$$

The variation of ($k_{int}$) and of the amplitude scaling factors $a_1$ and $a_2$ over the frequency swing of RCS at (h = 1) is shown in Fig. 35. The components of $z_1 = x_1 + j y_1$ and $z_2 = x_2 + j y_2$ from Eqs. (15) and (16) were inserted into Eq. (14), and the upper solution was used to obtain positive scaling factors $a_1$ and $a_2$. The scaling factors $a_1$ and $a_2$ and the variable delay selector ($k_{int}$) are put into a pattern memory, so that they can be accessed as a function of revolution frequency, as in the case of the interpolated CIC filter.
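The scaling factors can be computed as in the sketch below. It assumes $k_{opt} = f_{clk}/(4 h f_{rev})$ (a quarter period of harmonic h in clock cycles), reads f in Eqs. (15) and (16) as the revolution frequency, and takes the sign choice of Eq. (14) that yields positive $a_1$ and $a_2$, for which the output equals $-j$, i.e., a 90° lag with unit amplitude. `hilbert_delay_settings` is a hypothetical helper.

```python
import cmath
import math


def hilbert_delay_settings(f_rev, h=1, f_clk=36e6):
    """Interpolated 90-degree delay at harmonic h on a fixed clock.

    Returns (k_int, a1, a2, S) where S = a1*z1 + a2*z2 should equal -j.
    """
    k_opt = f_clk / (4 * h * f_rev)          # quarter period of harmonic h, in clocks
    k_int = int(k_opt)
    z1 = cmath.exp(-2j * math.pi * h * k_int * f_rev / f_clk)        # Eq. (15)
    z2 = cmath.exp(-2j * math.pi * h * (k_int + 1) * f_rev / f_clk)  # Eq. (16)
    x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
    det = x1 * y2 - y1 * x2
    a1, a2 = x2 / det, -x1 / det             # sign choice giving positive a1, a2
    return k_int, a1, a2, a1 * z1 + a2 * z2


for f_rev in (469.25e3, 835.866e3):
    k_int, a1, a2, s = hilbert_delay_settings(f_rev)
    print(k_int, round(a1, 4), round(a2, 4), round(abs(s), 12),
          round(math.degrees(cmath.phase(s)), 6))
```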

VII. OUTLOOK
The filter structure explained in this document provides a way to translate the idea of a CIC filter with variable clock into designs with a fixed clock. In this way, a recipe for notch filters that track a reference frequency is given. These filters have a shorter delay than standard FIR filters, so that the stability of loops using these filters is improved. This is important for the high-intensity proton synchrotron RCS, where beam loss has to be kept as small as possible. Both filter types (FIR and variable CIC) are realized in FPGA technology, so that both configurations can be tested in the commissioning phase of RCS. We plan to use this filter structure for the main ring (MR) of J-PARC, too. In addition, this filter design method can be applied to generate a 90° delay for Hilbert transformation or a ''one-turn delay'' for fixed-clock digital systems.