Jun 7, 2017

Time Domain Processing Techniques Using Ring Oscillator-Based Filter Structures

Lieuwe B. Leene, Timothy G. Constandinou

Department of Electrical and Electronic Engineering, Imperial College London, SW7 2BT, UK

Centre for Bio-Inspired Technology, Institute of Biomedical Engineering, Imperial College London, SW7 2AZ, UK

1 Abstract

The ability to process time-encoded signals with high fidelity is becoming increasingly important for time domain (TD) circuit techniques that are used at advanced nanometre technology nodes. This work proposes a compact oscillator-based subsystem that performs precise filtering of asynchronous pulse-width modulation (PWM) encoded signals and makes extensive use of digital logic, enabling low voltage operation. First and second order primitives are introduced that can be used as TD memory or to enable analogue filtering of TD signals. These structures can be modelled precisely to realise more advanced linear or nonlinear functionality using an ensemble of units. This paper presents the measured results of a prototype fabricated using a 65 nm CMOS technology to realise a 4$^{th}$ order lowpass Butterworth filter. The system utilises a 0.5 V supply voltage with asynchronous digital control for closed-loop operation to achieve a 73 nW power budget. The implemented filter achieves a maximum signal to noise and distortion ratio (SNDR) of 53 dB with a narrow 5 kHz bandwidth resulting in a figure-of-merit (FOM) of 8.2 fJ/pole. With this circuit occupying a compact 0.004 mm² silicon footprint, this technique promises a substantial reduction in size over conventional Gm-C filters whilst additionally offering direct integration with digital systems.

2 Introduction

Modern digital architectures and energy constrained devices are being increasingly challenged by device variability and probabilistic computation that are incompatible with today’s digital paradigm ¹. In contrast, many biological processes such as the human visual system are robust to such challenges. This has inspired research to explore alternative means for signal representation and computation based on phenomena observed in the natural world ². This has led to the re-emergence of processing in the analogue domain as an ‘accelerator’ inside a digital framework³. This is because the efficiency of analogue processing can be far superior to its digital equivalent for specific applications⁴⁵. However there remain many challenges that in practice prevent such architectures from achieving a clear advantage. Current systems demand an integrated System on Chip (SoC) solution using digital CMOS technologies to realise cost effective performance. This substantially degrades analogue performance and ultimately leads to the use of time domain (TD) circuits to mitigate a number of these issues⁶. Out of the different signal modalities that have been established: continuous-time continuous value (i.e. traditional analogue), discrete-time continuous value (i.e. switched cap analogue), discrete-time discrete value (i.e. traditional digital), these TD circuits represent the continuous-time discrete value (i.e. asynchronous digital) approach of representing information.

TD systems rely on encoding signals in terms of the delay between instantaneous events such as clock edges or digital pulses that can be manipulated using asynchronous or synchronous digital logic with very high efficiency⁷. The digital nature of these signals provides immunity to supply noise and flexibility in signal representation that is less sensitive to operating conditions when compared to conventional voltage or current mode processing. In fact these techniques have become increasingly widespread in recent years, extending from the typical use in phase locked loops (PLL) towards sensing⁸ and processing applications⁹. Moreover the ongoing trends in supply voltage reduction and technology scaling will make time-based alternatives increasingly favourable for digital system integration¹⁰.

Figure 1: Concept of processing multi-phase time-encoded signals using digital logic, in combination with oscillator-based memory elements for retaining system states.

It is becoming increasingly important to establish which techniques can process time-encoded signals in a way that is robust towards noisy digital environments and the nonlinear characteristics of nanometre-scale CMOS. Several methods have already been developed for PLL subsystems such as using noise to linearise time quantisation¹¹ or using two-dimensional vernier lines to perform noise shaping¹². One example of a TD processing system is the event-driven digital filter ¹³ that uses a reconfigurable delay line to process TD signals asynchronously. This work applies different weights to the delay line outputs to realise finite impulse response (FIR) filtering without introducing clocked time quantization.

Delay based techniques for amplification¹⁴, addition¹⁵, and subtraction¹⁵ have been particularly successful for MHz/GHz signals but tend to be incompatible with low frequency control or when dealing with signals of dissimilar bandwidths. This drawback is also characteristic of FIR techniques due to the fact that millisecond delay lines are easily prone to noisy aggressors and may require an exhaustive number of delay elements. Other systems use open loop voltage-controlled oscillator (VCO) structures for transducing low frequency signals with reduced complexity¹⁶¹⁷. These tend to rely on the linearity of capacitive discharge or voltage-controlled frequency generation for precise processing. However this dependency is particularly vulnerable to process, voltage and temperature variations or device dependent nonlinearities if correction/compensation is not performed. Digital techniques have been proposed to reduce the overhead from correction logic⁸ but it would be desirable to reduce such sensitivities.

This work proposes a ring oscillator based filter (ROF) structure that reduces the complexity of existing TD systems to realise a compact TD filter with closed loop operation for ultra-low-power computationally intensive applications¹⁸¹⁹. The dynamics of this architecture are, in some way, similar to asynchronous delta sigma modulators²⁰ or asynchronous delta modulators²¹²². The difference is that the input and output are time-encoded signals such that the functionality is strictly focused on processing. This is illustrated in Fig. 1. This topology aims to exclusively use digital logic and asynchronous control loops to adjust the phase of an oscillator which is in turn used to generate digital feedback signals to realise a continuous-time dynamical system or infinite impulse response (IIR) in the digital domain.

Similarly to the three prior works, the presented implementation also targets near-threshold voltage operation by reducing or in this case eliminating the analogue nodes that necessitate a large voltage swing. Instead the large signal components are encoded using an asynchronous digital representation. The presented technique relies on encoding phase using PWM signals and utilising current-controlled oscillators to achieve low distortion without requiring any overhead for calibration. This approach considers the oscillator as a TD memory element analogous to a capacitor in a Gm-C circuit. The resulting circuit is operated asynchronously but the concept of TD memory can also be found in clocked time to digital converters (TDC)²³. Furthermore, the auxiliary digital subsystem will feature additional functionality and flexibility in terms of event-driven/nonlinear outputs and gain control.

The remainder of this paper is organised as follows: Section 3 describes the basic first/second order ROF structures and ‘analogue’ processing characteristics; Section 6 elaborates on digital processing techniques for manipulating TD signals; Section 9 details the transistor level implementation; Section 14 presents measured results and device characteristics; and Section 20 concludes this work with respect to the achieved performance.

3 Analogue Processing using ROFs

The concept for the proposed topology that filters TD signals and allows local feedback without external clocking or control is shown in Fig. 1.

This uses digital control to switch a transconductive element adjusting the oscillator phase according to the intended filter response. The feedback utilises the anti-aliasing properties provided by the current controlled phase modulation to reject high frequency errors in the digital computation thereby allowing the approximate computation presented in Sec. 6.

Figure 2: Analogy between conventional analogue circuits and TD circuits in relation to the four signal modalities.

The different signal representations and associated processing domains are illustrated in Fig. 2. This shows how the oscillator-based processing concept presented herein relates to conventional analogue circuits, using phase information instead of magnitude to represent signals. Traditional analogue (continuous value and time) employs integration in voltage (or time) using a transconductive element that is loaded by a memory circuit. If the time however is discretized (i.e. sampled-time analogue), there is a requirement for a fast switch and large sampling capacitor. Alternatively, sampling the phase information of an oscillator can be achieved by simply using a clocked register (as the time encoded signal is inherently quantized). This implies that TD systems are able to utilize digital memories to significantly increase information capacity with minimal demand on resource. The analysis that follows develops expressions for this configuration by considering structures that are analogous to single and two-stage amplifiers²⁴.

The oscillator’s phase (φ) is extracted using an XOR-based phase detector (PD). By using a differential structure, the phase output will not need an external reference since the XOR output will represent the phase difference ($\Delta$φ) of two synchronized oscillators. In fact this phase measurement is a key feature that mitigates the need for external clocking or digital differentiation, as found in other realizations⁶¹⁰. Moreover the XOR PD does not experience distortion from band-limiting digital gates such as the pulse swallowing seen in ¹⁵. Instead reducing the phase difference and equivalent PWM modulation depth leads to a smaller digital bandwidth requirement, which is not the case for the register based PD.

4 Single-Stage ROF

Figure 3: Implementation of the single-stage ROF topology showing: (a) the switched current source driving an oscillator with closed loop feedback control of the TD signals D, Q, E; (b) the simplified s-domain equivalent model based on an ideal integrator in feedback.

A block diagram of the single-stage ROF is shown in Fig. 3. The signals D & Q are PWM encoded TD signals that are compared and subsequently generate a third output that injects current into the differential oscillator such that the two pulse widths are matched. This control will either increase or decrease the relative phase and proportionally adjust the pulse width of Q in a closed loop fashion. The operation depends on the integral relationship that the output phase φ has with respect to injecting a small signal current $i_{\Delta}$. This is characterised using an impulse sensitivity function (ISF) originally developed for analysing CMOS oscillators ²⁵.

$$ \phi (t) = \int_{-\infty}^{t} \Gamma_{i\Delta}(\omega_0 , \tau) \, i_{\Delta}(\tau) \, d\tau $$

Eq. 1 models the ISF due to $i_{\Delta}$ as $\Gamma_{i\Delta}$. This implies the simplified s-domain model yields an integration factor k1 $\approx I_{\Delta}\Gamma_{i\Delta}$. Strictly the ISF is a cyclostationary function implying that $\Gamma$ may have phase dependent sensitivity with respect to $i_{\Delta}$. However because the current is injected into the virtual supply node VR, this sensitivity is small as seen from $\Gamma_{ig}$ in Fig. 10 and instead will be assumed phase independent (for clarity). This allows for a relatively simple argument to be made to estimate $\Gamma_{i\Delta}$ for low-power ring oscillators because the low-voltage operation implies that essentially all biasing current will be used to charge and discharge capacitors on each oscillator node. More specifically the contribution of short circuit current is negligible due to strictly non-overlapping conduction of the NMOS & PMOS transistors in the oscillator and similarly the transistor area will be sufficiently small to assume that the gate leakage component is much smaller than IB.

Suppose qmax is the amount of charge dissipated by the oscillator each period. Then it should follow that qmax = IB/fosc by definition but this factor should also relate the total amount of capacitance switched every cycle as qmax = N VRG Cgate, where N, Cgate, VRG are the number of oscillator stages, total capacitive load at the output of every oscillator stage, and voltage across the oscillator respectively. More interestingly if we now consider injecting some excess charge every cycle then its impact is simply normalised by qmax leading to $\Gamma_{i\Delta}$ = 2π/qmax. The integration factor is then k1 $\approx$ 2π $I_{\Delta}$/qmax in rad/s, so if this integrator is configured for unity-gain feedback its bandwidth f3dB = k1/2π can be summarised in Eq. 2.

$$ f_{3dB} = \frac{I_{\Delta}}{q_{max}} = f_{osc} \frac{I_{\Delta}}{I_{B}} = \frac{I_{\Delta}}{N V_{RG} C_{gate} } $$

The relations above point out a defining characteristic of the single-stage ROF, which is that the oscillator frequency is directly related to the circuit bandwidth. Moreover in practice it would make sense that the ratio $I_{\Delta}$/IB is close to unity to both maximise bandwidth efficiency and minimise the input referred offset due to any difference in fosc between the two oscillators. A ratio larger than 1 inherently leads to nonlinearity as VRG will become strongly dependent on the dynamic current being switched and therefore vary f3dB as a function of input. Instead VRG should be well-defined in terms of the biasing current such that it can be estimated using sub-threshold device operation VRG = Vth+η UT ln(2IB/Ispec) where Vth, η, UT, and Ispec are the transistor model parameters for threshold voltage, slope factor, thermal voltage, and device specific current respectively²⁶. This formulation allows the nonlinear signal compression to be estimated as $\epsilon$ which is expanded in Eq. 3 to determine an appropriate ratio $\Delta$ = $I_{\Delta}$/IB where IC = IB/Ispec.
Finally note the desirable property that the open-loop gain is inherently infinite and independent of any operating conditions. Moreover the digital output can virtually drive any type of load without affecting the circuit bandwidth.

$$ \epsilon = \frac{ V_{th} + \eta U_{T} \, \ln[ 2 I_{C} (1 + \Delta) ] }{ V_{th} + \eta U_{T} \, \ln( 2 I_{C} ) } - 1 $$
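As a rough numerical illustration of Eq. 2 and Eq. 3, the following Python sketch evaluates the single-stage bandwidth and the compression term $\epsilon$ for a few current ratios. The function names and all device values used here (stage count, VRG, Cgate, Vth, η, UT, IC) are hypothetical placeholders chosen for this example and are not taken from the fabricated design.

```python
import math


def rof_bandwidth_hz(i_delta, n_stages, v_rg, c_gate):
    """Eq. 2: f3dB = I_delta / q_max, with q_max = N * V_RG * C_gate."""
    q_max = n_stages * v_rg * c_gate
    return i_delta / q_max


def oscillator_frequency_hz(i_bias, n_stages, v_rg, c_gate):
    """f_osc = I_B / q_max, which follows from q_max = I_B / f_osc."""
    q_max = n_stages * v_rg * c_gate
    return i_bias / q_max


def compression(delta, v_th, eta, u_t, ic):
    """Eq. 3: relative change of V_RG when the switched current adds
    a fraction delta = I_delta / I_B on top of the bias current."""
    nominal = v_th + eta * u_t * math.log(2.0 * ic)
    loaded = v_th + eta * u_t * math.log(2.0 * ic * (1.0 + delta))
    return loaded / nominal - 1.0


# Hypothetical example values, for illustration only.
N_STAGES, V_RG, C_GATE, I_B = 7, 0.1, 2e-15, 5e-9

for ratio in (0.25, 0.5, 1.0):
    f3db = rof_bandwidth_hz(ratio * I_B, N_STAGES, V_RG, C_GATE)
    eps = compression(ratio, v_th=0.4, eta=1.3, u_t=0.0259, ic=0.1)
    print(f"I_delta/I_B = {ratio:4.2f}  f3dB = {f3db:.3e} Hz  epsilon = {eps:.4f}")
```

With a current ratio of one the computed bandwidth equals the oscillator frequency, and $\epsilon$ grows with the ratio, which is the trade-off the text describes.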

5 Two-Stage ROF

Figure 4: The compensated two-stage ROF topology which uses the first order structure and introduces a more explicit pole due to the switched current and load capacitor CL. Shown are: (a) implementation; and (b) the s-domain equivalent as two ideal integrators in feedback.

Figure 5: Characteristic phase and magnitude response of the two-stage ROF structure with capacitive compensation.

The two-stage ROF structure is shown in Fig. 4. This provides more degrees of freedom in the design with a small increase in complexity over the single-stage ROF. The main difference here is that a more conventional charge pump now precedes the oscillator and is responsible for the filtering characteristics. By having the digital output drive the capacitor CL the TD integrator is both compensated and able to operate at maximum efficiency irrespective of oscillator frequency. The s-domain coefficients are therefore k1 = $I_{\Delta}$/CL and k2 = gmMB/qmax. The factor C in Fig. 4 accounts for the total capacitance CT on $V_{P/N}$ that may attenuate the feedback by defining it as C = CL/CT and gmMB is the transconductance of the biasing transistor MB. This means that bandwidth efficiency of the VCO integrator is now boosted by the transistor’s sub-threshold slope 1/η UT. The requirement of fosc in fact becomes relaxed and may actually be smaller than the circuit’s bandwidth if multiple phases are used to represent Q in parallel denoted as K.

The impact of processing multiple taps from the ring oscillator is two-fold. First the stability requirement for the VCO pole location to lie outside the circuit bandwidth f3dB = k1/2π generally becomes negligible as it is easy to guarantee k1 < C K k2. This condition implies that the loop has a phase margin > 45° when the two pole structure is put in unity-gain configuration. Secondly the combined value of Q will in effect have K+1 quantisation levels that due to the capacitive feedback onto $V_{P/N}$ presents high frequency quantisation noise with an amplitude of VDD/K. Similarly to the previous linearity requirement regarding $I_{\Delta}$/IB, K should intentionally be large to obtain linear behaviour of MB and the oscillators. Fortunately K does not affect the efficiency or power dissipation of this circuit as the product of K fosc is a constant for a fixed current in IMB. Instead K influences circuit complexity to some extent. Another benefit of the two-stage configuration is that although the band-limiting capacitor needs to be broken up into K units to accommodate all phases it is an explicit capacitor and unlike the single-stage configuration it does not rely on the precise control/matching of parasitic capacitance to determine the pole location. Moreover charge pump circuits and the associated dynamics have been studied extensively in PLL circuits²⁷ and can easily be applied here. That said, it can be concluded that the two-stage ROF should be used in scenarios when the output and bandwidth characteristics need to be precise and the single-stage ROF should be applied when focus lies with performing asynchronous computation with diminished requirements.

$$ H(s) = \frac{k1 k2}{s^2+C k2 s} \cdot e^{-s t_d} $$

The open loop system response with capacitive compensation is characterised by Eq. 4. This is derived using the linearised model and introducing the impact of digital gate delay (td) for further processing Q to the frequency response¹⁰. The corresponding Bode plot is shown in Fig. 5. The second order roll-off will assist in rejecting high frequency artefacts due to any approximations made in the digital processing system. It should also be evident from the linearised model that high pass behaviour can be realised with the same feedback but instead taking the output from the digital processing block which is driving the charge-pump circuit.
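To make the stability argument concrete, the sketch below evaluates the open loop response of Eq. 4 in Python and estimates the phase margin at the unity-gain crossover. The helper names are hypothetical, K is taken as one so that the pole sits at C k2, and the coefficient and gate delay values in the example are illustrative assumptions rather than extracted circuit parameters.

```python
import cmath
import math


def open_loop(s, k1, k2, c, td=0.0):
    """Eq. 4: H(s) = k1*k2 / (s^2 + C*k2*s) * exp(-s*td)."""
    return k1 * k2 / (s * s + c * k2 * s) * cmath.exp(-s * td)


def unity_gain_crossover(k1, k2, c, td=0.0):
    """Find w (rad/s) where |H(jw)| = 1 by bisection on a log scale.
    |H(jw)| decreases monotonically with w, so bisection is sufficient."""
    lo, hi = 1e-3, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if abs(open_loop(1j * mid, k1, k2, c, td)) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def phase_margin_deg(k1, k2, c, td=0.0):
    """Phase margin of the unity-gain loop, using the analytic phase of
    Eq. 4 (integrator, pole at C*k2, delay) to avoid phase wrapping."""
    w = unity_gain_crossover(k1, k2, c, td)
    phase = -math.pi / 2.0 - math.atan(w / (c * k2)) - w * td
    return 180.0 + math.degrees(phase)


# Illustrative values: k1 at 2*pi*5 kHz, pole at twice that, C = 1.
k1 = 2.0 * math.pi * 5e3
k2 = 2.0 * math.pi * 10e3
for td in (0.0, 1e-6):  # hypothetical digital gate delays in seconds
    pm = phase_margin_deg(k1, k2, 1.0, td)
    print(f"td = {td:.0e} s  phase margin = {pm:.1f} deg")
```

In this simplified model the margin stays above 45° whenever k1 does not exceed C k2 and the gate delay is negligible, in line with the stated condition.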

6 Digital Processing using ROFs

From the introduction it is clear that there is a large variety of techniques being used to process time domain signals with asynchronous logic. This section will present specific techniques for manipulating the multi-phase PWM signals that can be obtained from the ROF without introducing delay lines. Applying Boolean functions to PWM signals can be divided into two scenarios: coherent and incoherent operation. This relates to the cases when the signals being operated on are the exact same frequency (e.g. different phases of the oscillator) or when they are different frequencies (e.g. when processing the signals D & Q). It will be shown that these two cases lead to significantly different behaviour.

7 Coherent Operations

Figure 6: Average PWM output for simple Boolean functions with a coherent input. The output is evaluated with respect to the pulse width of A and the delay $\Delta$T.

Figure 7: Average PWM output and the analytical result for a gain of 2x (blue), gain of 4x (green), and the complement of the absolute value for x-0.5 with the exact Boolean operator **B** annotated.

Coherent operations are useful for manipulating the multiple phases output by a single ring oscillator because these delays are relatively well matched with respect to the oscillator period, thereby allowing predictable outputs irrespective of oscillator frequency. These simple operations are summarised in Fig. 6. This visualises the average PWM output Q subject to a PWM input A, the delay $\Delta$T and a Boolean function B. Here A is a periodic function with a normalised periodicity of one. As expected Q is linear with respect to the pulse width x of A. Let A be formally defined in terms of Eq. 5 such that Q can be evaluated as Eq. 6. This calculates the mean value of Q over the period of A denoted as T.

$$ A(\tau,x) = \begin{cases} 1 & \tau \bmod 1 < x \\ 0 & \text{otherwise} \end{cases} $$

$$ E[Q(x,\Delta T)] = \int_{0}^{1} \mathbf{B}(A(\tau,x),A(\tau-\Delta T,x)) \, d\tau $$

However most of these operations can be visualised in terms of adding and removing pulses using delayed components of A. For instance using an OR gate with a delay of 0.5T will add an identical pulse at half the period, which realises an equivalent ‘gain’ of 2x and effectively doubles the frequency of A. This example also illustrates that clipping will occur if x exceeds 0.5T as a natural consequence of overflow/saturation. Note that the output of B for the AND and OR gates has 3 regions that exhibit saturation, linear dependency, or gain. The interesting aspect here is that the point of clipping can be chosen freely by closely inspecting the region in Fig. 6 for which B is always 1. An underflow will occur for a pulse width smaller than c when using an AND gate with a delay of cT and an overflow will occur for a pulse width larger than (1-c) if an OR gate is used with a delay of cT. The clipping regions will not exceed 0.5 unless we combine more phases to realise larger ‘gain’ factors as illustrated in Fig. 7.
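The gain and clipping behaviour described above can be checked numerically by evaluating Eq. 6 on a fine grid. The following Python sketch does this for two-input gates. The function names `pwm` and `coherent_mean` are hypothetical, and the period is normalised to one as in Eq. 5.

```python
import operator


def pwm(tau, x):
    """Eq. 5: unit-period PWM signal that is high for a fraction x."""
    return 1 if (tau % 1.0) < x else 0


def coherent_mean(boolean_op, x, delay, steps=10000):
    """Eq. 6: mean of B(A(tau, x), A(tau - delay, x)) over one period,
    approximated with midpoint samples."""
    total = 0
    for i in range(steps):
        tau = (i + 0.5) / steps
        total += boolean_op(pwm(tau, x), pwm(tau - delay, x))
    return total / steps


# OR with a half-period delay doubles the pulse width until it clips.
for x in (0.1, 0.3, 0.5, 0.7):
    print(f"x = {x:.1f}  OR, delay 0.5 -> {coherent_mean(operator.or_, x, 0.5):.3f}")

# AND with delay c underflows for x < c; OR with delay c overflows for x > 1 - c.
c = 0.25
print(f"AND, x = 0.2 -> {coherent_mean(operator.and_, 0.2, c):.3f}")
print(f"OR,  x = 0.8 -> {coherent_mean(operator.or_, 0.8, c):.3f}")
```

Passing a different two-input function in place of `operator.or_` or `operator.and_` allows other Boolean operators to be explored in the same way.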

8 Incoherent Operations

Figure 8: Average PWM output for simple Boolean functions with incoherent inputs A & B. The output is evaluated with respect to the pulse width of each input.

Figure 9: The result from applying an AND gate (blue), OR gate (green), and XOR gate (red) to two PWM signals that have equal pulse width but are modulated by different frequencies, with the analytical polynomial annotated as a function of pulse width x.

Typically it will be the case that the signals of interest will not have the same frequency which requires us to consider how the two PWM signals A & B interact with one another. The primary interest will still lie with the average or near-DC behaviour of the Boolean function because the ROF is inherently lowpass in response. The main concern is associated with the beat frequency of the two PWM carrier frequencies fA-fB. This is because this spur needs to lie sufficiently outside of the f3dB bandwidth for us to make the approximation that B is uncorrelated with respect to A. This implies that the pulse B can be assumed uniformly distributed with respect to A. The circuit bandwidth will represent the averaging time constant and should ideally not be subject to carrier dependent tones such that a precise output is maintained. The oscillator frequencies are easily perturbed and subject to drift making this assertion quite reasonable in practice. As a result the average output Q due to two PWM signals with pulse width x & y can be calculated using the expression in Eq. 7.

$$ E[Q(x,y)] \approx \int_{0}^{1} \int_{0}^{1} \mathbf{B}(A(\tau,x),B(t-\tau,y)) \, d\tau \, dt $$

This type of processing uses concepts from stochastic computation \cite{SDSP, sto_cmp} since the two digital signals interact with respect to a probability distribution that is shaped using the Boolean operator. The difference however is that these bitstreams themselves are not stochastic in the large signal sense and they are not clocked by some specific frequency. Instead the bitstreams are intentionally decorrelated by choosing different carrier frequencies. The primitive operations are summarised in Fig. 8 with respect to the PWM signals A and B. In some cases these operations will lead to nonlinear or polynomial behaviour which can be observed in Fig. 9. In addition the inverse of these functions can also be realised by manipulating the feedback and using B(Q,R) instead of Q directly, where R is the output of a single-stage ROF in unity-gain feedback with the input Q but with the carrier frequency doubled to decorrelate R from Q.
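Under the assumption that the two carriers are uncorrelated, Eq. 7 can be evaluated on a grid to recover the polynomial behaviour of each gate. The Python sketch below is one way to do this. The function names are hypothetical, and the closed forms in the comments follow from treating the two inputs as independent with probabilities x and y of being high.

```python
import operator


def pwm(phase, width):
    """Unit-period PWM signal that is high for a fraction `width`."""
    return 1 if (phase % 1.0) < width else 0


def incoherent_mean(boolean_op, x, y, n=200):
    """Eq. 7: average of B(A(tau, x), B(t - tau, y)) over all relative
    alignments of the two signals, using an n-by-n grid."""
    total = 0
    for j in range(n):
        t = j / n
        for i in range(n):
            tau = (i + 0.5) / n  # half-step offset keeps samples off pulse edges
            total += boolean_op(pwm(tau, x), pwm(t - tau, y))
    return total / (n * n)


x, y = 0.3, 0.5
print("AND:", incoherent_mean(operator.and_, x, y), "vs x*y =", x * y)
print("OR: ", incoherent_mean(operator.or_, x, y), "vs x+y-x*y =", x + y - x * y)
print("XOR:", incoherent_mean(operator.xor, x, y), "vs x+y-2*x*y =", x + y - 2 * x * y)
```

Setting y equal to x gives the single-variable polynomials x², 2x - x², and 2x - 2x² for the AND, OR, and XOR gates respectively.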

9 Circuit Implementation

Figure: Detailed transistor level implementation of the second-order ROF structure. Here the digital gates in (a) implement a difference operator; (b) is the switched current DAC; (c) is the floating differential ring oscillator structure; (d) is the differential delay cell; and (e) is the corresponding buffer that amplifies the oscillator voltage to full swing. All device sizes are shown in (f).

This particular implementation focuses on achieving robust low-voltage operation and minimising analogue complexity to enable larger multi-channel systems. A commercially available TSMC 65 nm CMOS LP MS RF technology (1P9M 6X1Z1U RDL) was used to develop a lowpass filter that processes the signals from the TD instrumentation circuit in ²⁸ and illustrates the basic performance characteristics of the ROF structure. The proposed circuit is detailed in the transistor level figure above, which can be divided into four sub-blocks: digital control (a), analogue integrator (b), TD integrator (c), and the oscillator stages (d & e).

10 Charge Pump

The switches $S_{A/B/C}$ control how a reference current IB is pumped differentially into nodes $V_{P/N}$. Transistors $M_{1-2}$ provide common mode regulation on $V_{P/N}$ and mirror the biasing current into the ring oscillators using $M_{3-4}$. This is extended for multi-phase inputs by operating several charge pumps in parallel. Any resulting voltage difference across $V_{P/N}$ injects a differential current into the TD integrator as $M_{3-4}$ represent a pseudo-differential pair. Although it is not shown, $M_{3-4}$ is split up into 5 devices of which two have their drain connected to the opposite polarity, which allows us to manipulate the $I_{\Delta}$/IB ratio. This leads to a smaller VCO bandwidth and induces more filtering with better linearity. Using high Vth devices for $M_{1-4}$ allows the common mode of $V_{P/N}$ to be placed close to 250 mV which leaves enough voltage headroom for the switches and biasing transistors.

Figure 10: Post-layout simulation results showing one of the oscillator outputs in (a) for reference and the ISF $\Gamma_{ig}$, $\Gamma_{io}$, $\Gamma_{ir}$ for injecting a small signal charge at the virtual ground, oscillator output, and virtual rail nodes.

11 Differential Oscillator

Each oscillator consists of 7 differential delay stages each of which use a cross coupled load resulting in a total of 14 outputs. This structure is based on ²⁹ to achieve additional supply noise rejection when compared to the conventional ring oscillator. The 5 nA biasing current for each charge pump will lead to sub-threshold operation of all analogue devices which means the oscillator output that swings around VR & VG is only 100 mVpp with a transition time of 1/(14 fosc). Amplifying this output to improve signal transition time with high efficiency is achieved by a buffer that recovers the digital signal integrity and also uses positive feedback provided by $M_{16-17}$. This particular configuration requires some consideration with respect to the optimal operating conditions of the buffer.

The charge sensitivity for this oscillator is shown in Fig. 10. The ISF has been extracted using post-layout simulation results. The sensitivities $\Gamma_{ig}$, $\Gamma_{ix}$, and $\Gamma_{ir}$ are evaluated by injecting 1 fC of charge $\Delta$Q into the nodes VGP, VOP, VR and evaluating the change in phase with respect to having no charge injected. Then $\Gamma$ is characterised by systematically injecting charge at some point in time (tq) with respect to the oscillator period and performing normalisation as $\Gamma$(tq)=2π $\Delta$φ(tq) fosc/$\Delta$Q to obtain the small signal equivalent. This illustrates the phase independent characteristic of $\Gamma_{ir}$ as well as the 100 mV swing of the oscillator. Note that noisy aggressors coupled through $\Gamma_{ir}$ are common to both phase outputs and rejected by the low impedance from M5. The behaviour of $\Gamma_{ix}$ is also interesting because when the output is not transitioning the coupling is shorted to either virtual supply and therefore has equivalent sensitivity. However during a transition there is a brief doubling of sensitivity as the charge is being injected into one node instead of being loaded by the differential structure. It should be noted that $\Gamma_{ix}$ is not very representative for modelling how noise couples at the output since many sources will be pseudo-common to all stages (e.g. substrate noise) and the transistor noise is further affected by the operating point of the device itself.

12 TDFA unit

The PWM difference operator or time-domain full-adder (TDFA) unit is detailed more clearly in Fig. 11. This shows that a crucial aspect of computing with incoherent TD signals lies with carefully using different signal representations. In this case the nonlinearity that would have been expected from Sec. 8 is negated by using a 1.5 bit ternary encoding. Instead the output Q is linearly dependent on the difference in pulse width of D & Q without distortion. This is important because in-band distortion is not shaped by the filter and any nonlinearity from B will propagate to the output including down modulated PWM carrier spurs.

Figure 11: Implementation of the linearised TDFA unit which calculates the difference with respect to the two PWM encoded signals D & Q.

13 Fabricated Prototype

The fabricated device is shown in Fig. 12. This prototype integrates a number of TD sensing systems together where the TD ROF structure is located in the lower left section. This subsystem operates together with an asynchronous analogue to time converter (ATC) such that the measured characterisation reflects system-level performance. Moreover this mitigates any difficulty associated with precisely generating PWM encoded signals off-chip and transmitting them to the filter under low noise conditions. The entire system is 7200 μm² in size and one ROF is around 30$\times$40 μm². Excluding the ATC this filter structure has a 3600 μm² silicon footprint. There is also a reconfigurable asynchronous DSP block that realises several different coherent Boolean operations intermediate to the ATC and ROF blocks. In particular there are variable-gain blocks that use the gain function from Sec. 7.

Figure 12: Microphotograph of the fabricated device showing the chip with annotated floor plan in (a) while the P1,M1,M2 layers of the ROF layout are highlighted in (b) (n.b. metal fill omitted for clarity).

14 Measured Results

Figure 13: Experimental setup used for characterising the ROF filters. Various off-chip instruments are used to supply power and analogue test signals to the device while a Saleae Logic digital acquisition tool samples the PWM output from the chip.

Figure 14: Photograph of the custom printed circuit board used for testing the ASIC.

15 Experimental Setup

A custom test platform was developed to characterise the fabricated ASIC using a Raspberry Pi 3 development board to provide a graphical interface that automates the low level device control and test routines. This setup is illustrated in Fig. 13 with a photograph of the custom PCB in Fig. 14. The SPI interface allows the hardware to be reconfigured using a configuration register where 3 bits are used to fine tune the biasing current IB and another 10 bits are used for variable gain (VG) settings and output control. As shown the ROF signal chain consists of 6 blocks in the following order: ATC, VG, ROF, VG, MUX. The ATC will sense and amplify 5 mVpp differential signals and generate a PWM encoded signal with a 450 kHz carrier frequency. The VG blocks can select additional X1-X4 gain settings using only digital logic. The cascaded ROF provides a 4th order lowpass filter and the MUX gives control over which signals are sent off chip. Not all the TD phases will be sent off chip because of noise and overhead concerns. Instead the MUX will output one phase from the ATC or ROF for preliminary characterisation during asynchronous operation. The digital bit stream appearing at the output is then acquired at 100 MS/s over 1 second using a digital scope.

Figure 15: Spectral power densities of the ROF PWM output with a 4 mVpp 1 kHz differential input signal where the distortion has been annotated in red and the oscillator harmonics are annotated in blue.

16 Filter Characteristics

Taking the Fourier transform of the PWM output gives the spectrum shown in Fig. 15. Here the ROF oscillator frequency is observed at around 35 kHz with the corresponding higher harmonics. The bandwidth of this filter was designed to be 5 kHz, which means these aggressors are sufficiently rejected for most applications. In fact, the measured filter response in Fig. 16 shows the cascaded ROF will reject these harmonics by more than 50 dB. More practically, when the ROF output needs to be sampled without the interference of such harmonics, this structure can easily be transformed into an oversampling TDC that decimates the PWM signal and filters out-of-band components ¹⁷²⁸. This particular setup uses a 5.2 nA biasing current, which leads to the charge-pump pole being precisely situated at 5 kHz. Because the VCO pole location suffers from increased variability, it is intentionally placed at twice the charge-pump cut-off frequency. It is evident from Fig. 16 that verifying the post-fabrication pole position and the corresponding variance remains challenging. If necessary, this pole location can be calibrated using established techniques such as trimming M$_{3-4}$ or introducing a digitally-switched capacitive load ³⁰ at the cost of increasing circuit complexity.
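To put the quoted harmonic rejection in context, the sketch below evaluates the textbook magnitude response of an ideal 4th order Butterworth lowpass with a 5 kHz cut-off at the 35 kHz oscillator frequency. This is an idealised model rather than the measured response of Fig. 16, and the function name is hypothetical.

```python
import math

def butterworth_attenuation_db(f_hz, f_c_hz, order):
    """Attenuation in dB of an ideal Butterworth lowpass at frequency f_hz."""
    return 10.0 * math.log10(1.0 + (f_hz / f_c_hz) ** (2 * order))

f_c = 5e3     # designed bandwidth from the text
f_osc = 35e3  # observed ROF oscillator frequency from the text

print(round(butterworth_attenuation_db(f_c, f_c, 4), 2))    # 3.01
print(round(butterworth_attenuation_db(f_osc, f_c, 4), 1))  # 67.6
```

The ideal response gives about 67.6 dB at 35 kHz, which is consistent with a measured rejection of more than 50 dB. Pole variability in the fabricated filter means exact agreement with the ideal curve should not be expected.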

17 Linearity

Using a 1 kHz tone, the linearity characteristics are shown in Fig. 17. It is important to note that the use of an on-chip ATC implies that the distortion also includes nonlinearity from the amplifying ATC. The signal processing chain can accept a maximum input of 4 mVpp before the ATC feedback loop starts to overload the asynchronous $\Delta\Sigma$ modulator. These measurements show that a maximum total harmonic distortion (THD) and spurious-free dynamic range (SFDR) of 53 dB is achievable for a 0.6 mVpp input amplitude. The noise floor is slightly higher than -80 dB, and calculating the integrated noise over 10 kHz indicates that the maximum SNR is 55 dB for a 4 mVpp input signal that has a THD of 44 dB. In order to minimise the impact of ATC nonlinearity, a 2x VG setting is used during this test such that the ATC output is at -4 dB of the full range but the ROF processes signals near the full input dynamic range.

18 Supply Noise Sensitivity

The PSRR has been tested using a 10 mVpp tone at different frequencies while the ATC inputs were shorted together. The result is presented in Fig. 18. This perturbation induces output tones at -55 dB of the full range, which, when referred to the 4 mV input range, implies a PSRR of 63 dB. This level of supply coupling is difficult to improve because of this measurement setup and the ADC nature of the ATC. The implementation of the ATC uses VDD as the reference voltage, such that it is coupled asymmetrically to analogue nodes, degrading supply rejection even in differential configurations. Although the impact of using the differential oscillator structure is not well represented, any further degradation in PSRR is prevented and the input referred noise floor is not corrupted by supply noise coupled from the digital switching. Moreover, this figure should be very representative for larger scale or multi-channel systems, as this implementation only uses a 2.5 pF decoupling capacitor for the shared 0.5 V supply. It can thus be expected that using more decoupling capacitance or separating the supplies will further improve this figure at the cost of allocating more resources.
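The step from a -55 dB output tone to a 63 dB PSRR can be reproduced with a short calculation. This is a sketch that assumes the output tone scales linearly back to the 4 mV input range, as the text implies; the variable names are illustrative.

```python
import math

supply_tone_vpp = 10e-3    # test tone applied on top of the supply
input_range_vpp = 4e-3     # input range used as the full-scale reference
output_tone_dbfs = -55.0   # measured output tone relative to full range

# Refer the output tone back to the input, then compare with the supply tone.
input_referred_vpp = input_range_vpp * 10 ** (output_tone_dbfs / 20.0)
psrr_db = 20.0 * math.log10(supply_tone_vpp / input_referred_vpp)

print(round(input_referred_vpp * 1e6, 2), round(psrr_db, 1))  # 7.11 63.0
```

The supply tone therefore appears as roughly 7 μVpp when referred to the input, which corresponds to the quoted 63 dB.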

Figure 16: Measured filter response due to a 4 mVpp differential sinusoidal input at frequencies from 1 kHz to 100 kHz.

Figure 17: Measured harmonics due to a 1 kHz differential input tone with increasing input amplitudes. The spectral power of the output tones is calculated with respect to the maximum output dynamic range.

Figure 18: Measured PSRR of the entire system due to a 10 mVpp sinusoidal signal on top of a 0.5 V bias driving the system's V<sub>DD</sub> at frequencies from 50 Hz to 60 kHz.

19 Performance Summary

Table 1: System Characteristics and Comparison with State-of-the-Art

| Parameter [unit] | **This Work** | ¹³ $^\ddagger$ | ⁶ | ¹⁰ | ¹⁷ | ²⁰ | ³¹ |
|---|---|---|---|---|---|---|---|
| Tech. [nm] | 65 | 130 | 90 | 65 | 40 | 130 | 180 |
| Modality | Time | Time | Time | Time | Time | Volt. | Volt. |
| Type | **TD-IIR** | TD-FIR | TD-IIR | TD-IIR | TD-$\Delta\Sigma$ | GmC-$\Delta\Sigma$ | GmC-IIR |
| Order | 4 | 16 | 4 | 4 | 2 | 1 | 5 |
| Supply-V [V] | 0.5 | 1 | 0.55 | 0.6 | 0.9 | 0.25 | 0.5 |
| Supply-I [A] | 146 n | 0.46 m | 5.27 m | 43.7 m | 2.8 m | 72 n | 1.2 m |
| Bandwidth [Hz] | 5 k | 70 k | 7 M | 70 M | 40 M | 1.9 k | 135 k |
| DR [dB] | **55**$^\dagger$ | 50 | 61 | 58 | 61 | 58 | 61 |
| Area [mm²] | 0.004 | 5 | 0.29 | 0.38 | 0.017 | 0.08$^\star$ | 0.29 |
| FOM [fJ/pole] | 8.17 | 1299 | 92 | 118 | 28 | 12 | 520 |

$^\dagger$ using a 10 kHz integrated noise figure; $^\ddagger$ performance quoted from full system asynchronous PWM operation; $^\star$ uses external passive components.

The filter performance is summarised in Table 1. The circuit power consumption has been measured to be 73 nW, of which simulation results indicate 16 nW is dissipated in the charge pump plus oscillator circuits and 21 nW is dissipated by the biasing circuits. The remaining 36 nW is due to digital control and PWM switching. One of these contributions comes from applying digital feedback onto the capacitor $C_L$, which is 560 fF. This is expected to dissipate power according to $f_{osc} C_L V_{DD}^2$, or in this particular case 3 nW. This latter component can become substantial if the supply voltage is not small enough or if very low-noise performance is required, since in-band noise performance is directly dependent on $C_L$. However, the achieved performance is comparable to that of other works and the circuit operates with good energy efficiency. This is evaluated using the FOM from ³¹, which is defined in Eq. 8 using the system power ($P_{sys}$) and the number of poles ($N_{poles}$) to normalise performance. The most substantial gain from the ROF filter is that the reduced complexity leads to a very compact implementation that is not only considerably smaller than the state-of-the-art but also more capable of reconfigurable functionality. Based on kT/C relations, we may expect all-analogue processing to be more power efficient in a noise limited scenario because such systems can take advantage of the transistor sub-threshold slope. This drawback is similar to the noise performance of all-digital PLLs in comparison to sub-sampling PLLs. However, the time-domain circuits will allow far superior linearity and dynamic range during ultra low voltage operation, which the all-analogue systems cannot achieve. The 65 nm technology primarily influences the impact of excess digital switching from the asynchronous logic and overhead. Using an advanced CMOS technology allows most of the power to be dissipated in the oscillator and enables more efficient performance.

$$ FOM = \frac{P_{sys}}{N_{poles} f_{3dB} DR} $$
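As a worked check of this definition, the sketch below evaluates the FOM for two columns of Table 1. It assumes that DR enters as a linear amplitude ratio, $10^{DR_{dB}/20}$, and that system power is the product of the tabulated supply voltage and current; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
def fom_fj_per_pole(p_sys_w, n_poles, f_3db_hz, dr_db):
    """FOM = P_sys / (N_poles * f_3dB * DR), returned in fJ/pole.

    Assumption: DR is converted from dB to a linear amplitude ratio.
    """
    dr_linear = 10 ** (dr_db / 20.0)
    return p_sys_w / (n_poles * f_3db_hz * dr_linear) * 1e15

# This work: 73 nW, 4 poles, 5 kHz, using the 53 dB figure quoted in the text.
print(round(fom_fj_per_pole(73e-9, 4, 5e3, 53), 2))        # 8.17
# Reference 6 column: 0.55 V x 5.27 mA, 4 poles, 7 MHz, 61 dB.
print(round(fom_fj_per_pole(0.55 * 5.27e-3, 4, 7e6, 61)))  # 92
```

Under these assumptions the tabulated 8.17 fJ/pole is reproduced with the 53 dB figure, whereas the 55 dB value from the DR row would give about 6.5 fJ/pole.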

20 Conclusion

This work presents the first system to explicitly deliver IIR or analogue filtering for PWM encoded signals asynchronously using standard CMOS technology. The implementation and model of a low-complexity oscillator based filter is detailed to complement existing FIR and delay line based techniques for clockless processing of time-encoded signals. The proposed topology can deliver 53 dB SFDR with a maximum SNR of 55 dB while operating at 0.5 V. The extensive use of digital logic allows highly flexible and reconfigurable oscillator based computing for future ultra-low-power systems in nanometre CMOS. Measured results demonstrate 8.17 fJ/pole efficiency for the 5 kHz bandwidth and an area requirement of 0.004 mm². In fact, this topology is substantially more efficient and compact than the state-of-the-art at processing asynchronous TD signals that have reduced bandwidths or require low frequency filtering. Moreover, the ROF primitives and digital processing techniques presented here can be directly applied to ultra-low-power $\Delta\Sigma$ modulators and mixed signal systems due to their simplicity and affinity for low voltage mixed signal operation.

21 Acknowledgement

The authors would like to thank Dr. Pantelis Georgiou, and the Europractice Advanced Technology Stimulation programme for providing access to the TSMC 65nm technology. The authors additionally thank Michal Maslik for the helpful comments and assistance with improving this manuscript.

References:

I.L. Markov, “Limits on fundamental limits to computation,” Nature, vol. 512, pp. 147–154, August 2014. [Online]: http://dx.doi.org/10.1038/nature13570
R. Sarpeshkar, “Analog versus digital: Extrapolating from electronics to neurobiology,” Neural Computation, vol. 10, no. 7, pp. 1601–1638, Oct 1998. [Online]: http://dx.doi.org/10.1162/089976698300017052
N. Guo et al., “Energy-efficient hybrid analog/digital approximate computation in continuous time,” IEEE Journal of Solid-State Circuits, vol. 51, no. 7, pp. 1514–1524, July 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2543729
M. Verhelst and A. Bahai, “Where analog meets digital: Analog-to-information conversion and beyond,” IEEE Solid-State Circuits Magazine, vol. 7, no. 3, pp. 67–80, September 2015. [Online]: http://dx.doi.org/10.1109/MSSC.2015.2442394
Y. Chen, E. Yao, and A. Basu, “A 128-channel extreme learning machine-based neural decoder for brain machine interfaces,” IEEE Transactions on Biomedical Circuits and Systems, vol. 10, no. 3, pp. 679–692, June 2016. [Online]: http://dx.doi.org/10.1109/TBCAS.2015.2483618
B. Drost, M. Talegaonkar, and P.K. Hanumolu, “Analog filter design using ring oscillator integrators,” IEEE Journal of Solid-State Circuits, vol. 47, no. 12, pp. 3120–3129, Dec 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2225738
G.W. Roberts and M. Ali-Bakhshian, “A brief introduction to time-to-digital and digital-to-time converters,” IEEE Transactions on Circuits and Systems—Part II: Express Briefs, vol. 57, no. 3, pp. 153–157, March 2010. [Online]: http://dx.doi.org/10.1109/TCSII.2010.2043382
T. Anand, K.A.A. Makinwa, and P.K. Hanumolu, “A VCO based highly digital temperature sensor with 0.034 $^\circ$C/mV supply sensitivity,” IEEE Journal of Solid-State Circuits, vol. 51, no. 11, pp. 2651–2663, Nov 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2598765
V. Pourahmad et al., “Nonboolean pattern recognition using chains of coupled CMOS oscillators as discriminant circuits,” IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 3, pp. 1–9, Dec 2017. [Online]: http://dx.doi.org/10.1109/JXCDC.2017.2654300
B. Vigraham, J. Kuppambatti, and P.R. Kinget, “Switched-mode operational amplifiers and their application to continuous-time filters in nanoscale CMOS,” IEEE Journal of Solid-State Circuits, vol. 49, no. 12, pp. 2758–2772, Dec 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2354641
S. Zheng and H.C. Luong, “A WCDMA/WLAN digital polar transmitter with low-noise ADPLL, wideband PM/AM modulator, and linearized PA,” IEEE Journal of Solid-State Circuits, vol. 50, no. 7, pp. 1645–1656, July 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2413846
P. Lu, Y. Wu, and P. Andreani, “A 2.2-ps two-dimensional gated-vernier time-to-digital converter with digital calibration,” IEEE Transactions on Circuits and Systems—Part II: Express Briefs, vol. 63, no. 11, pp. 1019–1023, Nov 2016. [Online]: http://dx.doi.org/10.1109/TCSII.2016.2548218
C. Vezyrtzis et al., “A flexible, event-driven digital filter with frequency response independent of input sample rate,” IEEE Journal of Solid-State Circuits, vol. 49, no. 10, pp. 2292–2304, Oct 2014. [Online]: http://dx.doi.org/10.1109/JSSC.2014.2336532
H.J. Kwon et al., “Analysis of an open-loop time amplifier with a time gain determined by the ratio of bias current,” IEEE Transactions on Circuits and Systems—Part II: Express Briefs, vol. 61, no. 7, pp. 481–485, July 2014. [Online]: http://dx.doi.org/10.1109/TCSII.2014.2328800
W. Yu, K. Kim, and S. Cho, “A 0.22 ps rms integrated noise 15 MHz bandwidth fourth-order $\Delta\Sigma$ time-to-digital converter using time-domain error-feedback filter,” IEEE Journal of Solid-State Circuits, vol. 50, no. 5, pp. 1251–1262, May 2015. [Online]: http://dx.doi.org/10.1109/JSSC.2015.2399673
W. Jiang et al., “A ±50 mV linear-input-range VCO-based neural-recording front-end with digital nonlinearity correction,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 173–184, Jan 2017. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2624989
M. Hovin et al., “Delta-sigma modulators using frequency-modulated intermediate values,” IEEE Journal of Solid-State Circuits, vol. 32, no. 1, pp. 13–22, Jan 1997. [Online]: http://dx.doi.org/10.1109/4.553171
Y. Liu, J.L. Pereira, and T.G. Constandinou, “Clockless continuous-time neural spike sorting: Method, implementation and evaluation,” in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2016, pp. 538–541. [Online]: http://dx.doi.org/10.1109/ISCAS.2016.7527296
M. Yang et al., “A 0.5 V 55 $\mu$W 64$\times$2 channel binaural silicon cochlea for event-driven stereo-audio sensing,” IEEE Journal of Solid-State Circuits, vol. 51, no. 11, pp. 2554–2569, Nov 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2604285
L.H.C. Ferreira and S.R. Sonkusale, “A 0.25-V 28-nW 58-dB dynamic range asynchronous delta sigma modulator in 130-nm digital CMOS process,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 5, pp. 926–934, May 2015. [Online]: http://dx.doi.org/10.1109/TVLSI.2014.2330698
R. Mohan et al., “A 0.6 V, 0.015-mm$^2$, time-based ECG readout for ambulatory applications in 40 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 298–308, Jan 2017. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2615320
S. Patil et al., “A 3-10 fJ/conv-step error-shaping alias-free continuous-time ADC,” IEEE Journal of Solid-State Circuits, vol. 51, no. 4, pp. 908–918, April 2016. [Online]: http://dx.doi.org/10.1109/JSSC.2016.2519396
J.P. Caram, J. Galloway, and J.S. Kenney, “Harmonic ring oscillator time-to-digital converter,” in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2015, pp. 161–164. [Online]: http://dx.doi.org/10.1109/ISCAS.2015.7168595
W.-H. Ki, “Signal flow graph analysis of feedback amplifiers,” IEEE Transactions on Circuits and Systems—Part I: Fundamental Theory and Applications, vol. 47, no. 6, pp. 926–933, Jun 2000. [Online]: http://dx.doi.org/10.1109/81.852948
A. Hajimiri, S. Limotyrakis, and T. Lee, “Phase noise in multi-gigahertz CMOS ring oscillators,” in IEEE Proceedings of the Custom Integrated Circuits Conference, May 1998, pp. 49–52. [Online]: http://dx.doi.org/10.1109/CICC.1998.694905
C.C. Enz and E.A. Vittoz, “Charge-based MOS transistor modeling: The EKV Model for Low-Power and RF IC design,” John Wiley & Sons, August 2006. [Online]: http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470855452.html
A. Homayoun and B. Razavi, “On the stability of charge-pump phase-locked loops,” IEEE Transactions on Circuits and Systems—Part I: Regular Papers, vol. 63, no. 6, pp. 741–750, June 2016. [Online]: http://dx.doi.org/10.1109/TCSI.2016.2537823
L.B. Leene and T.G. Constandinou, “A 0.5V time-domain instrumentation circuit with clocked and unclocked $\Delta\Sigma$ operation,” in IEEE Proceedings of the International Symposium on Circuits and Systems, May 2017, pp. 2619–2622.
W.S.T. Yan and H.C. Luong, “A 900-MHz CMOS low-phase-noise voltage-controlled ring oscillator,” IEEE Transactions on Circuits and Systems—Part II: Analog and Digital Signal Processing, vol. 48, no. 2, pp. 216–221, Feb 2001. [Online]: http://dx.doi.org/10.1109/82.917794
Y. Zhang et al., “A 0.35-0.5 V 18-152 MHz digitally controlled relaxation oscillator with adaptive threshold calibration in 65 nm CMOS,” IEEE Transactions on Circuits and Systems—Part II: Express Briefs, vol. 62, no. 8, pp. 736–740, Aug 2015. [Online]: http://dx.doi.org/10.1109/TCSII.2015.2433531
P. Khumsat and A. Worapishet, “A 0.5-V R-MOSFET-C filter design using subthreshold R-MOSFET resistors and OTAs with cross-forward common-mode cancellation technique,” IEEE Journal of Solid-State Circuits, vol. 47, no. 11, pp. 2751–2762, Nov 2012. [Online]: http://dx.doi.org/10.1109/JSSC.2012.2216708
