# A Charge-Accumulation Based High-Performance CMOS Circuit 

Sherif M. Sharroush<br>Department of Electrical Engineering, Faculty of Engineering, Port Said University, Port Said, Egypt<br>Email: smsharroush@gmail.com<br>DOI: 10.21608/PSERJ.2021.57591.1088

Received 21-1-2021
Revised 3-10-2021
Accepted 9-12-2021
© 2022 by Author(s) and PSERJ.

This is an open access article licensed under the terms of the Creative Commons Attribution International License (CC BY 4.0).
http://creativecommons.org/licenses/by/4.0/



#### Abstract

There is no doubt that complementary metal-oxide semiconductor (CMOS) circuits with wide fan-in suffer from degraded performance. In this paper, a circuit that depends on charge accumulation is proposed as an alternative to conventional CMOS design. The proposed scheme is investigated quantitatively and verified by simulation using the predictive technology model (PTM) of the 45 nm CMOS technology with a power-supply voltage, $V_{DD}$, equal to 1 V. Although the proposed scheme is more sensitive to process variations than static CMOS, the comparative analysis and simulation results confirm the superiority of the proposed scheme from the points of view of speed, area, power consumption, and unity-noise gain. It is verified that the proposed scheme has a smaller area, power consumption, and delay compared to the conventional CMOS design when the number of inputs, $n$, exceeds four, two, and three, respectively. The impacts of process variations, component mismatches, and technology scaling are also investigated. The speed advantage gained from the proposed scheme is expected to be more obvious when operating in the subthreshold region. A figure of merit including the unity-noise gain, area, power consumption, and time delay is defined, and the proposed scheme shows superior performance compared to the conventional CMOS logic when $n$ exceeds four. Finally, the proposed scheme is compared with various previous schemes.


Keywords: Area, CMOS technology, charge accumulation, component mismatch, process variations, time delay.

## 1. INTRODUCTION

Static CMOS circuits have been dominant during the last three decades due to their low static-power consumption and high noise immunity [1]. However, increasing the fan-in of conventional CMOS circuits causes the performance of these circuits to degrade. Specifically, increasing the number of inputs increases the number of either the serially connected PMOS transistors in the pull-up network (PUN) or the serially connected NMOS transistors in the pull-down network (PDN), thus lengthening the low-to-high or the high-to-low propagation delay. The need to charge the parasitic capacitances associated with the NMOS and PMOS devices also increases the dynamic-switching power consumption. The matter becomes worse when adopting pseudo-NMOS or dynamic CMOS logic families due to the dc current and the keeper-contention current, respectively [2]. This is typically the case with wide fan-in NAND and NOR gates. Comparators, multiplexers, and microprocessor circuits are examples of applications that require wide fan-in [3]. In this paper, a quick survey of the previous solutions to this problem is given and an alternative design that is based on charge accumulation is proposed.

The remainder of this paper is organized as follows: Section 2 provides a quick survey of the previous solutions to the problem of the degraded performance of wide fan-in CMOS circuits with their pros and cons. The proposed solution is presented qualitatively in Section 3, and the comparative analysis is presented in Section 4. The impact of process variations and component mismatches on the proposed scheme is discussed in Section 5. Section 6 includes the simulation results, discussions, and comparisons with other schemes. Finally, the paper is concluded in Section 7.

## 2. LITERATURE REVIEW

In this section, some of the previously proposed schemes used to speed up the response of wide fan-in CMOS circuits are discussed along with their pros and cons. Before delving into these schemes, refer first to Figs. 1 and 2 for the circuit schematic of the standard domino logic and the timing diagram of the clock signal, $CLK$, respectively [4]. During the precharge phase, the $CLK$ signal is at logic "0," thus activating the PMOS header transistor, $Q_{P}$, deactivating the NMOS footer transistor, $Q_{N}$, and charging the dynamic-node capacitance. If the inputs are allowed to change during this phase, there will be no change in the status of the output.

During the evaluation phase, the $CLK$ signal is at logic "1," thus activating $Q_{N}$. Depending on the status of the inputs, the dynamic-node capacitance discharges or remains charged. If there is no discharging path through the PDN, the keeper keeps the dynamic node charged at $V_{DD}$ in spite of the leakage current in the PDN. The PMOS keeper, however, slows down the discharging process due to its contention current.

In [5], the strength of the PMOS keeper in a typical domino logic was changed by changing its threshold voltage through double capacitive body biasing so that both the leakage power can be reduced and the speed can be enhanced [6-8]. In [9 and 10], a footer voltage feedforward domino was proposed in which the footer transistor is dispensed with; however, two parallel paths can be activated during the evaluation phase using charge sharing. According to this technique, the speed is enhanced by a feedforward path and the noise immunity is enhanced using self-reverse biasing [11].


Figure 1: The circuit schematic of the standard domino logic [4].


Figure 2: The timing diagram of the CLK signal according to the standard domino logic [4].

The techniques used to resolve the trade-off between noise immunity and speed in wide fan-in domino logic can be classified into two categories: using a conditional keeper or raising the voltage of the source terminals of the PDN transistors [9]. The first category depends on controlling the strength of the PMOS keeper while the second category utilizes the stacking effect [12] or modifies the PDN [13]. Keeper types include the conventional feedback keeper [14], the XOR-based keeper [15], and the conditional keeper [16]. The first type is a single keeper with contention current; although simple in design, it has the slowest speed due to the contention current. The second type is a single keeper without contention, thus resulting in the highest speed, however, at the expense of a wrong output in case of charge sharing. The third type is a dual keeper, which has the most complicated design [17].

In [18], a conditional isolation keeper was utilized that reduces the dynamic-switching power consumption by reducing the voltage swing associated with the dynamic node and separating the dynamic node from the PDN by an NMOS device, thus reducing the parasitic capacitance. M. Nasserian et al. have proposed controlling the voltage swing of the dynamic node [19], thus decreasing the power consumption of wide fan-in gates without degrading the other metrics such as speed, area, or noise immunity [20].

In [21], a technique was proposed that depends on performing a comparison of mirrored current of the PUN with its worst-case leakage current. According to this technique, the parasitic capacitance of the dynamic node was isolated from the PDN, thus resulting in a smaller keeper contention current, power consumption, and delay. Other keeper designs can be found in [6, 22-29]. In [30 and 31], the dynamic-switching power consumption was reduced by using charge sharing between a small dummy capacitor that was charged to a low-swing output voltage and a predischarged capacitance. The small swing at the output can then be detected and restored to a rail-to-rail voltage swing using a proper sense amplifier.

In [32], a diode-connected transistor is serially connected to the transistors in the PDN, thus limiting the subthreshold leakage by the stacking effect [33]. Also, a current mirror was added to speed up the evaluation. In [34], two modifications were made. The first one is using a buffer to delay the operation of the PMOS keeper, thus inhibiting its contention current. The second modification is adopting a variation-coupled keeper in order to compensate for the variation of the leakage current with process variations [35]. In [36], a diode-connected NMOS transistor was connected in series with the PMOS keeper in order to reduce the contention current during the evaluation phase. In [37], a current mirror was adopted in order to enlarge the discharging current of the PDN and speed up the operation.

In [38], a negative capacitance was adopted in conjunction with the node with the highest parasitic capacitance, thus improving the timing yield. S. Narang has proposed varying the supply voltage in order to compensate for the negative bias temperature instability [39]. P. K. Pal et al. have proposed using a voltage comparison circuit in order to achieve a smaller power dissipation and higher speed by reducing the current of stacked transistors and the number of switching nodes [40]. Also, dual-threshold voltage can be adopted [41]. Feedthrough logic was used in [42] in order to improve the performance by partially evaluating the voltage in the computational block before developing the final steady-state output. In [43], a voltage-comparison based domino logic was proposed to compare between the voltages of upper and lower nodes of the PDN by a proper voltage comparator resulting in lower power consumption and higher noise immunity without significant delay increment for wide fan-in gates.

An improved keeper that is based on a graphical representation of the trade-off between the performance and noise margin was suggested in [35]. One of the most important applications that have wide fan-in is the comparator. In [44], a parallel prefix structure was used to implement a large bitwidth comparator without unnecessary transitions. According to this design, the comparator was constructed by locally interconnecting a limited number of CMOS gates whose fan-in and fan-out do not exceed five and four, respectively. Also, the circuit shown in Fig. 3 was proposed in [45]. It depends on initially charging $C_{L}$, then deciding to keep it charged or discharge it depending on the voltage divider consisting of the always activated PMOS transistor and the NMOS branches.

Other realizations of wide fan-in CMOS gates can be found in [46-48]. In [47], the inputs are applied to NMOS transistors resulting in a current that is proportional to the number of the activated inputs, which then enters a current race. This scheme, although faster, consumes more power. In [47], charge sharing is performed between two capacitors, the size of one of which depends on the number of the activated inputs. In [48], a pulse with an adjusted width is adopted in order to produce a proper output in a high fan-in circuit.


Figure 3: A previously proposed alternative to NMOS stacks with eight inputs [45].

In [49 and 50], the delay variability was reduced by adopting a dual keeper with clock control; this was achieved by reducing the loop gain of the feedback circuitry. In [51], the contention current was reduced using a clock-delayed dual-keeper technique. In addition, by virtue of the stack effect utilized in the keeper circuitry, the size of the keeper could be increased to enhance the robustness of the circuit without sacrificing the speed. In [52], the keeper was controlled using a controlling network; $31.42 \%$ and $31.91 \%$ reductions in the power consumption and power-delay product, respectively, were reported for a 32-input OR gate. In [53], the contention current was eliminated at the beginning of the evaluation phase by modifying the keeper; specifically, an NMOS device was added in series with the keeper. In [54], a multiplexer was used for gating the clock signal, thus reducing the power consumption. Garg et al. have proposed using the stack effect to reduce the noise effect and leakage currents, however, at the cost of a delay penalty due to the addition of an inverter between the dynamic node and the output node [55]. In [56], a delay network containing an odd number of inverters was used to control the keeper.

Finally, in [57], a floating-gate MOS transistor whose equivalent resistance depends on the number of the activated inputs was adopted. In a nutshell, the domino CMOS logic suffers from degraded performance with wide fan-in in addition to the trade-off between noise immunity and speed. In the next section, the proposed scheme is presented. As will be observed, the proposed scheme has a speed advantage that becomes clearer with wide fan-in. This is due to the fact that the proposed scheme depends on parallel operation in the input paths instead of the series sequential operation of the conventional CMOS logic and domino logic.

## 3. THE PROPOSED SCHEME

Refer to Fig. 4 for an illustration of the proposed scheme. According to this scheme, there are two phases: the predischarge and evaluation phases. The circuit operates as follows: During the predischarge phase, the $CLK$ signal is activated, thus turning on the associated NMOS transistors and discharging any remnant charge on any of the capacitors with value $C$ as well as the parasitic capacitance at the inverter input, $C_{L}$. The reason for this will be clear shortly. The floating capacitors can be implemented as interpoly (metal parallel plate) capacitors or fringe capacitors. If any one of the inputs is activated during the predischarge phase, there will be no response at the $V_{CL}$ node because the series NMOS transistors in the input paths, which are controlled by the $\overline{CLK}$ signal, are deactivated. Now, during the evaluation phase, the $CLK$ signal will be at logic "0" and consequently, if any one of the inputs is activated, then the capacitor in the corresponding branch will charge $C_{L}$ by a specific amount. The larger the number of the activated inputs, the larger will be the voltage developed across $C_{L}$. In order for the scheme to operate properly, the threshold voltage of the inverter, $V_{\text{thinv}}$, must be adjusted such that if at least one of the inputs is not activated, then the generated voltage across $C_{L}$ will be smaller than $V_{\text{thinv}}$ and thus the output voltage will be at logic "1." On the other hand, if all the inputs are activated, then the generated voltage across $C_{L}$ must be larger than $V_{\text{thinv}}$ with the result that $V_{\text{out}}$ will be at logic "0" as it must be.

If one of the inputs is activated in a certain cycle then deactivated in the next cycle, then the charge remnant on the corresponding capacitor with value $C$ will affect the voltage across $C_{L}$ and the output voltage may be erroneous. So, it is necessary to initially discharge all these capacitors. This is the reason why the $CLK$ signal is used along with the corresponding NMOS transistors to discharge any remnant charge. For the same reason, $C_{L}$ must be initially discharged. If it were not for the series NMOS transistors in the input paths controlled by the $\overline{CLK}$ signal, activation of any one of the inputs would create a contention path with the discharging of $C_{L}$, thus slowing down the discharging process or resulting in no discharging. Thus, there is a need to cut the input paths during the predischarge phase. Alternatively, the NMOS transistor discharging $C_{L}$ can be sized properly and the $n$ NMOS transistors controlled by the $\overline{CLK}$ signal can be dispensed with. In this case, the sizing of the discharging transistor must be done adopting the worst case; that is, assuming all the $n$ inputs are activated during the predischarge phase. Finally, the reason behind connecting the inputs to both the gate and drain terminals of the pass devices is as follows: If one of the inputs is at logic "0," then the corresponding pass device will be deactivated, thus not affecting the charge accumulated across $C_{L}$. The proposed circuit can thus operate as an alternative to the NMOS stacks. Any even number of inverters can be used as a buffer in order to obtain a rail-to-rail voltage swing at the output node.

Note also that all the $n$ input paths are identical. This is due to adopting the assumption that all the $n$ inputs have the same probability of occurrence and thus there is no preference for an input over the others. It is obvious that the voltage difference at $C_{L}$ between the cases of all-activated-inputs and all-except-one-activated inputs decreases with increasing $n$ until this difference becomes smaller than that caused by process variations, which is an obvious degradation in the performance of the proposed scheme for large values of $n$. A solution to this dilemma can be found in Fig. 5. In this figure, the inputs are divided between two identical circuits, each of which is similar to that of Fig. 4. Then, the outputs of these two circuits are ORed together so that the final output will be at logic "0" only when all the inputs are activated. In this case, the robustness of the circuit is enhanced, however, at the cost of the additional delay of the OR gate.

The version of the proposed scheme which acts as an alternative to the PMOS stack is the same as that in Fig. 4 with only one modification. The threshold voltage of the inverter must be smaller than $V_{C L}$ in case only one input is activated. This requires lowering $V_{\text {thinv }}$ by a large amount which requires using a huge NMOS transistor in the inverter in order for the circuit to operate properly for all the input combinations. As an alternative, the circuit shown in Fig. 6 can be used to replace the PMOS stack. Finally, the proposed scheme can also be extended for large values of $n$ using the circuit shown in Fig. 7 where a NOR gate is added.

### 3.1 Operation in the Subthreshold Regime

The drain current of the N-channel MOSFET in the subthreshold region is given by [58]

$$
\begin{equation*}
I_{D}=I_{0 N} e^{\left(\frac{V_{G S}-V_{t h n}}{n_{n} V_{t h}}\right)}\left[1-e^{\frac{-V_{D S}}{V_{t h}}}\right], \tag{1}
\end{equation*}
$$

where $V_{G S}$, $V_{D S}$, and $V_{t h n}$ are the gate-to-source voltage, the drain-to-source voltage, and the threshold voltage, respectively. $I_{0 N}$ is given by

$$
\begin{equation*}
I_{0 N}=\mu_{0 n} C_{o x}\left(\frac{W}{L}\right)_{n}\left(n_{n}-1\right) V_{t h}^{2}, \tag{2}
\end{equation*}
$$

where $\mu_{0 n}$ is the electron mobility, $C_{o x}$ is the gate-oxide capacitance per unit area, $W$ is the channel width, $L$ is the channel length, $n_{n}$ is the subthreshold-swing coefficient, and $V_{t h}$ is the thermal voltage.

It is apparent that the drain current in the subthreshold region depends exponentially on both $V_{D S}$ and $V_{G S}$. Thus, for stacks operating in the subthreshold region, the drain current decreases significantly which causes the charging/discharging of the parasitic capacitance at the output node or the internal nodes to be much slower. This is in contrast to the charge-accumulation process of the proposed scheme which is performed in a parallel fashion. So, the speed advantage gained from the proposed scheme is expected to be more obvious when operating in the subthreshold regime.
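
The exponential dependence in Eq. (1) can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the values of `I_0N`, `V_THN`, and `N_N` are hypothetical and chosen only for illustration. It shows that lowering $V_{GS}$ by $n_{n} V_{th} \ln 10$ reduces the drain current by one decade.

```python
import math

def subthreshold_current(v_gs, v_ds, i_0n, v_thn, n_n, v_th=0.02585):
    """Eq. (1): drain current of an NMOS device in the subthreshold region."""
    gate_term = math.exp((v_gs - v_thn) / (n_n * v_th))
    drain_term = 1.0 - math.exp(-v_ds / v_th)
    return i_0n * gate_term * drain_term

# Hypothetical device values, for illustration only (not from the paper).
I_0N = 1e-7      # A
V_THN = 0.4      # V
N_N = 1.5        # subthreshold-swing coefficient
V_TH = 0.02585   # V, thermal voltage near room temperature

i_high = subthreshold_current(0.30, 0.30, I_0N, V_THN, N_N, V_TH)

# Lowering V_GS by n_n * V_th * ln(10) should cut the current by one decade.
decade_step = N_N * V_TH * math.log(10)
i_low = subthreshold_current(0.30 - decade_step, 0.30, I_0N, V_THN, N_N, V_TH)

print(f"decade step = {decade_step * 1e3:.1f} mV, current ratio = {i_high / i_low:.2f}")
```

In a stack, each device sees a reduced $V_{GS}$ and $V_{DS}$ because of the intermediate node voltages, so this steep dependence is what makes series stacks slow in the subthreshold regime.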

## 4. COMPARATIVE ANALYSIS

In this section, the proposed scheme of the circuit shown in Fig. 4 is investigated quantitatively from six aspects. The first one is the determination of the valid range of $V_{\text{thinv}}$ in order for the proposed scheme to operate properly. The second aspect is the comparison between the high-to-low propagation delay of the proposed and conventional schemes. The third aspect is the area comparison between the conventional and proposed schemes. The fourth one is the power-consumption comparison between the conventional and proposed schemes. The fifth aspect is the noise immunity. The sixth and last aspect is the determination of the minimum frequency of operation. In the following analysis, we will assume that the parasitic capacitances at the output according to the conventional and proposed schemes are represented by $C_{\text{outc}}$ and $C_{\text{out}}$, respectively.

### 4.1 The Proper Range of $V_{\text {thinv }}$

As stated in the previous section, for the circuit of Fig. 4 to operate properly, $V_{\text{thinv}}$ must be larger than the $V_{CL}$ that is generated when at least one of the inputs is not activated and smaller than the $V_{CL}$ that is generated when all the inputs are activated. The tightest case, in which exactly one input is not activated, sets the minimum range for $V_{\text{thinv}}$ and thus represents the worst-case scenario. When all the inputs are activated, a charge is deposited on $C_{L}$ due to the currents of the $n$ activated paths.

The series combination of the two NMOS transistors in each input branch can be represented by a proper equivalent resistance, $R$ [59]. Thus, in steady state, the capacitors act as open circuits and consequently there will be a zero-voltage drop across each resistance. By KVL, we obtain

$$
\begin{equation*}
V_{D D}=V_{C L n}+V_{C}, \tag{3}
\end{equation*}
$$

where $V_{C L n}$ is the voltage developed across $C_{L}$ in case of $n$ activated inputs and $V_{C}$ is the voltage across the parallel combination of the capacitors, each with capacitance $C$. The voltage developed across $C_{L}$ can be found from the charge accumulated across it by using

$$
\begin{equation*}
Q=C_{L} V_{C L n} . \tag{4}
\end{equation*}
$$

This charge is nothing but the charge extracted from the capacitors in the activated paths. So, for $n$ activated inputs, $Q$ can be written as

$$
\begin{equation*}
Q=n C V_{C}, \tag{5}
\end{equation*}
$$

where $n C$ is the parallel combination of the capacitors each with capacitance $C$. So, from Eqs. (4) and (5), we obtain

$$
\begin{equation*}
C_{L} V_{C L n}=n C V_{C} . \tag{6}
\end{equation*}
$$

Substituting $V_{C}$ from Eq. (6) into Eq. (3) results in

$$
\begin{equation*}
V_{C L n}=\frac{n C V_{D D}}{n C+C_{L}} . \tag{7}
\end{equation*}
$$

Had we assumed that there are $n-1$ activated inputs, then the voltage developed across $C_{L}$ would have been

$$
\begin{equation*}
V_{C L(n-1)}=\frac{(n-1) C V_{D D}}{(n-1) C+C_{L}} . \tag{8}
\end{equation*}
$$

The difference between these two voltages is thus

$$
\begin{equation*}
\Delta V_{C L}=V_{C L n}-V_{C L(n-1)}=\frac{n C V_{D D}}{n C+C_{L}}-\frac{(n-1) C V_{D D}}{(n-1) C+C_{L}} . \tag{9}
\end{equation*}
$$
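
As a worked simplification (an added step, not part of the original derivation), the two fractions of Eq. (9) can be placed over a common denominator. The cross terms in the numerator cancel, leaving a single-fraction form:

$$
\begin{aligned}
\Delta V_{C L} &= V_{D D}\,\frac{n C\left[(n-1) C+C_{L}\right]-(n-1) C\left[n C+C_{L}\right]}{\left(n C+C_{L}\right)\left[(n-1) C+C_{L}\right]} \\
&= \frac{C\, C_{L}\, V_{D D}}{\left(n C+C_{L}\right)\left[(n-1) C+C_{L}\right]} .
\end{aligned}
$$

For fixed $C$ and $C_{L}$, both factors of the denominator grow with $n$, so this form makes it explicit that $\Delta V_{C L}$ shrinks as inputs are added.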



Figure 4: The circuit schematic of the proposed scheme which acts as an alternative to NMOS stacks.


Figure 5: Decomposing the inputs into two circuits for enhancing the robustness of the scheme alternative to NMOS stacks.



Figure 6: The circuit schematic of the proposed scheme which acts as an alternative to PMOS stacks.


Figure 7: Decomposing the inputs into two circuits for enhancing the robustness of the scheme alternative to PMOS stacks.

One important comment about Eq. (9) is in order here. The difference in voltage across $C_{L}$ does not depend on the absolute values of the capacitances; rather, it depends on the ratio between them. Taking into consideration that integrated circuits suffer from poor device tolerances but have the advantage of good component matching, this is an important merit [4]. $C_{L}$ depends on the gate capacitances of the transistors of the inverter and on the wiring and interconnections to the input branches and to the inverter input [4]. We will neglect the latter component for simplicity and adopt the convention that the capacitance associated with each terminal of the MOS transistor is proportional to its aspect ratio [60]. Also, the PMOS devices are assumed to have twice the size of the NMOS ones to compensate for the mobility difference [60]. Let the capacitance associated with each terminal of the transistor be $C_{1}$ for minimum-sized devices (i.e., with aspect ratio equal to 1 and channel length equal to the minimum-feature size). So, if the parasitic capacitance due to connecting one floating capacitor is equal to that associated with the MOS-transistor terminal and the transistors of the inverter are minimum-sized except the PMOS device of the inverter, which has an aspect ratio of 2, then the parasitic capacitance, $C_{L}$, will be

$$
\begin{equation*}
C_{L}=(n+4) C_{1} . \tag{10}
\end{equation*}
$$

$C_{1}$ for the adopted 45 nm CMOS technology is 0.045 fF [60]. So,

$$
\begin{equation*}
C_{L}=0.045(n+4) \mathrm{~fF} . \tag{11}
\end{equation*}
$$

It is, of course, preferable to maximize the voltage difference, $\Delta V_{C L}$, so as to make the scheme as robust as possible, thus making the scheme survivable in spite of the process variations and component mismatches. Refer to Figs. 8, 9, and 10 for the plots of $\Delta V_{C L}$ versus $n, C$, and $C_{L}$, respectively, for $V_{D D}=1 \mathrm{~V}$.


Figure 8: The relationship between $\Delta V_{C L}$ and $n$ for $C=0.1$ fF.


Figure 9: The relationship between $\Delta V_{C L}$ and $C$ for $n=8$.


Figure 10: The relationship between $\Delta V_{C L}$ and $C_{L}$ for $n=8$ and $C=0.1$ fF.

It is obvious from Fig. 8 that $\Delta V_{C L}$ decreases monotonically with the increase in $n$. This makes sense because the voltage generated on $C_{L}$ cannot exceed $V_{D D}$ irrespective of the number of inputs. So, increasing $n$ causes the effect of each activated input on $C_{L}$ to decrease. On the other hand, from Figs. 9 and 10, it is clear that there are optimum values for $C$ and $C_{L}$ at which $\Delta V_{C L}$ is maximum. As is always the case with any curve featuring an optimum, there must be two opposing effects associated with varying the parameter at hand. In this respect, increasing $C_{L}$ causes the voltage developed across it to decrease due to the $Q=C V$ relationship. However, increasing $C_{L}$ also causes its capacitive reactance to decrease, so a larger part of the input voltage appears across $C$. As a result, a larger charge is transferred from $C$ to $C_{L}$, which increases $\Delta V_{C L}$. Similar statements can be made about increasing $C$.

Of course, the optimum values of $C$ and $C_{L}$ can be found by differentiating $\Delta V_{C L}$ from Eq. (9) with respect to $C$ and with respect to $C_{L}$, then equating each derivative to zero. Alternatively, the optimum values of $C$ and $C_{L}$ can be found from the plots of Figs. 9 and 10 to be 0.07 fF and 0.75 fF, respectively, with the maximum value of $\Delta V_{C L}$ equal to 33.4 mV for the two cases.
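
The optimum values read from the plots can be cross-checked numerically. The sketch below evaluates Eq. (9) directly and uses the stationary points obtained by setting the derivatives to zero, namely $C_{L}=C \sqrt{n(n-1)}$ for fixed $C$ and $C=C_{L} / \sqrt{n(n-1)}$ for fixed $C_{L}$. These closed forms are derived here and are not stated in the text. It is also an assumption that Fig. 9 uses $C_{L}$ from Eq. (11).

```python
import math

V_DD = 1.0  # V, supply voltage used in the paper

def delta_v_cl(n, c, c_l, v_dd=V_DD):
    """Eq. (9): V_CLn - V_CL(n-1); capacitances in any common unit."""
    v_n = n * c * v_dd / (n * c + c_l)
    v_n_minus_1 = (n - 1) * c * v_dd / ((n - 1) * c + c_l)
    return v_n - v_n_minus_1

n = 8
s = math.sqrt(n * (n - 1))

# Fig. 10 setting: C = 0.1 fF fixed, C_L varied.
c_l_opt = 0.1 * s

# Fig. 9 setting (assumed): C_L = 0.045 * (n + 4) fF from Eq. (11), C varied.
c_l_fig9 = 0.045 * (n + 4)
c_opt = c_l_fig9 / s

print(f"C_L,opt = {c_l_opt:.3f} fF, dV = {delta_v_cl(n, 0.1, c_l_opt) * 1e3:.1f} mV")
print(f"C_opt   = {c_opt:.3f} fF, dV = {delta_v_cl(n, c_opt, c_l_fig9) * 1e3:.1f} mV")
```

Under these assumptions the sketch gives roughly 0.75 fF and 0.07 fF with a peak of about 33.4 mV in both cases, in line with the values quoted from Figs. 9 and 10.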

For the proposed scheme to operate properly, $V_{\text{thinv}}$ must satisfy the following two conditions:

$$
\begin{equation*}
V_{\text{thinv}}>\frac{(n-1) C V_{D D}}{(n-1) C+C_{L}} \tag{12}
\end{equation*}
$$

and

$$
\begin{equation*}
V_{\text{thinv}}<\frac{n C V_{D D}}{n C+C_{L}} . \tag{13}
\end{equation*}
$$

The proper range for $V_{\text{thinv}}$ is thus

$$
\begin{equation*}
\frac{(n-1) C V_{D D}}{(n-1) C+C_{L}}<V_{\text{thinv}}<\frac{n C V_{D D}}{n C+C_{L}} . \tag{14}
\end{equation*}
$$

The optimum value of $V_{\text{thinv}}$ is the average of these two limits, as it leaves equal margins to both of them.
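
To give a feel for how narrow this window is, the following sketch evaluates the two limits and their midpoint. The function names are introduced here for illustration only. The numbers use $n=8$, $C=0.1$ fF as in Fig. 8, and $C_{L}$ from Eq. (11).

```python
def v_cl(k, c, c_l, v_dd=1.0):
    """Voltage on C_L with k activated inputs (Eqs. (7) and (8))."""
    return k * c * v_dd / (k * c + c_l)

def v_thinv_window(n, c, c_l, v_dd=1.0):
    """Return (lower, upper, midpoint) of the valid V_thinv range."""
    lower = v_cl(n - 1, c, c_l, v_dd)
    upper = v_cl(n, c, c_l, v_dd)
    return lower, upper, 0.5 * (lower + upper)

n, c = 8, 0.1            # C in fF, as in Fig. 8
c_l = 0.045 * (n + 4)    # fF, Eq. (11)

lower, upper, mid = v_thinv_window(n, c, c_l)
print(f"{lower:.3f} V < V_thinv < {upper:.3f} V, midpoint {mid:.3f} V")
print(f"window width = {(upper - lower) * 1e3:.1f} mV")
```

For these values the window is only about 32.5 mV wide, which is why the inverter threshold must be set carefully and why process variations matter for large $n$.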

### 4.2 Time-Delay Comparison

In this section, the time delays of the proposed and conventional schemes are compared and plotted versus the number of inputs, $n$, for the high-to-low transition at the output. The high-to-low propagation delay according to the proposed scheme, $t_{d p}$, contains five subcomponents (assuming a buffer containing two inverters is added at the output to obtain a rail-to-rail voltage swing). The first one, $t_{d p 1}$, is the time required to discharge $C_{L}$. For typical values of $V_{C L}$ and $V_{D D}$, the discharging transistor operates in the deep-triode region and thus can be replaced by an equivalent resistance, $R_{d i s}=1 /\left(k_{n}^{\prime}(W / L)_{n}\left(V_{D D}-V_{t h n}\right)\right)$, where $k_{n}^{\prime}$ and $(W / L)_{n}$ are the process-transconductance parameter and the aspect ratio of the NMOS devices, respectively. So, $t_{d p 1}$ is equal to $2.3 R_{d i s} C_{L}$, in which the discharging-time delay is computed at the instant at which $V_{C L}(t)$ is equal to $0.1 V_{C L n}$. The second subcomponent, $t_{d p 2}$, is the time required for $C_{L}$ to charge to $V_{\text{thinv}}$ and the third subcomponent, $t_{d p 3}$, is the high-to-low propagation delay of the first inverter in case all the inputs are activated. The fourth and fifth subcomponents are associated with the two inverters of the added buffer. To find $t_{d p 2}$, the instantaneous voltage, $V_{C L}(t)$, must first be found; then $V_{C L}(t)$ is replaced by $V_{\text{thinv}}$ and $t$ by $t_{d p 2}$. Refer to Fig. 11 for illustration where $R$ is the sum of the equivalent resistances of the two NMOS transistors in each input path. Toward that end, each of the access transistors is substituted by its equivalent resistance, $R_{1}$, where $R_{1}$ is given by [59]

$$
\begin{equation*}
R_{1}=\frac{V_{D S}}{k_{n}^{\prime}\left(\frac{W}{L}\right)_{n}\left(V_{G S}-V_{t h n}\right)^{\alpha}}+\frac{V_{D S}}{k_{n}^{\prime}\left(\frac{W}{L}\right)_{n}\left[\left(V_{G S}-V_{t h n}\right) V_{D S}-\frac{1}{2} V_{D S}^{2}\right]} \tag{15}
\end{equation*}
$$

where $\alpha$ is a parameter that accounts for the short-channel effects and is equal to 1.3 for short-channel devices [61]. One simplification is to represent the two serially connected NMOS transistors in each input path by one equivalent transistor with half the aspect ratio [62]. The two voltages, $V_{G S}$ and $V_{D S}$, of this transistor can be substituted by their average values in which the source and drain of the equivalent transistor are at $C_{L}$ and the input terminal, respectively. The initial values of $V_{G S}$ and $V_{D S}$ are both $V_{D D}$ as the source terminals of the NMOS transistors controlled by the $\overline{C L K}$ signal are initially at 0 V while their final values are $V_{D D}-V_{C L n}$ when $C_{L}$ is assumed to be charged to $V_{C L n}$. So, the average values of $V_{G S}$ and $V_{D S}$ are $0.5\left(2 V_{D D}-V_{C L n}\right)$. Another evaluation for the transistor's equivalent resistance is simply to divide its average $V_{D S}$ by its average drain current. After simple circuit analysis, we obtain

$$
\begin{equation*}
V_{C L}(t)=\frac{n C V_{D D}}{n C+C_{L}}\left[1-e^{-\left(\frac{n C+C_{L}}{R C C_{L}}\right) t}\right] . \tag{16}
\end{equation*}
$$


Figure 11: The equivalent circuit of the proposed scheme when all the inputs are activated.

To find $t_{d p 2}$ assuming that $t=0$ corresponds to the instant of time at which $C_{L}$ begins charging, we equate $V_{C L}(t)$ to $V_{\text{thinv}}$. So,

$$
\begin{equation*}
t_{d p 2}=\frac{R C C_{L}}{n C+C_{L}} \ln \left[\frac{n C V_{D D}}{n C V_{D D}-V_{\text{thinv}}\left(n C+C_{L}\right)}\right] . \tag{17}
\end{equation*}
$$
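
As a consistency check between the charging waveform and the resulting expression for $t_{d p 2}$, the sketch below computes $t_{d p 2}$ and substitutes it back into the waveform. The branch resistance `r` is a hypothetical value chosen only for illustration. $C$ and $C_{L}$ follow Fig. 8 and Eq. (11), and $V_{\text{thinv}}$ is placed midway between the voltages for $n-1$ and $n$ activated inputs.

```python
import math

def time_constant(n, r, c, c_l):
    """R/n in series with nC and C_L gives tau = R*C*C_L / (n*C + C_L)."""
    return r * c * c_l / (n * c + c_l)

def v_cl_t(t, n, r, c, c_l, v_dd=1.0):
    """Voltage on C_L at time t with all n inputs activated."""
    v_final = n * c * v_dd / (n * c + c_l)
    return v_final * (1.0 - math.exp(-t / time_constant(n, r, c, c_l)))

def t_dp2(v_thinv, n, r, c, c_l, v_dd=1.0):
    """Time for C_L to charge from 0 V to V_thinv."""
    arg = n * c * v_dd / (n * c * v_dd - v_thinv * (n * c + c_l))
    return time_constant(n, r, c, c_l) * math.log(arg)

n = 8
r = 20e3                       # ohms, hypothetical branch resistance
c = 0.1e-15                    # F, as in Fig. 8
c_l = 0.045e-15 * (n + 4)      # F, Eq. (11)

# Midpoint of the valid threshold window for V_DD = 1 V.
v_low = (n - 1) * c / ((n - 1) * c + c_l)
v_high = n * c / (n * c + c_l)
v_thinv = 0.5 * (v_low + v_high)

t = t_dp2(v_thinv, n, r, c, c_l)
print(f"t_dp2 = {t * 1e12:.2f} ps, V_CL(t_dp2) = {v_cl_t(t, n, r, c, c_l):.4f} V")
```

Because $V_{\text{thinv}}$ lies close to the final value of $V_{C L}$, the logarithm is several time constants long, so $t_{d p 2}$ is sensitive to where the threshold sits inside the window.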

The high-to-low propagation delay of the first inverter, $t_{d p 3}$, is [4]

$$
\begin{equation*}
t_{d p 3}=\frac{2 C_{\text{out1}}}{k_{n}^{\prime}\left(\frac{W}{L}\right)_{n}\left(V_{D D}-V_{t h n}\right)}\left[\frac{V_{t h n}}{V_{D D}-V_{t h n}}+\frac{1}{2} \ln \left(\frac{3 V_{D D}-4 V_{t h n}}{V_{D D}}\right)\right], \tag{18}
\end{equation*}
$$

where $C_{\text{out1}}$ is the parasitic capacitance at the first-inverter output. The fourth and fifth subcomponents can be evaluated in a similar manner. In contrast, the time delay of the conventional NMOS stack, $t_{d c}$, can be approximated by the following relationship

$$
\begin{equation*}
t_{d c}=0.7 n R_{c} C_{\text {outc }}, \tag{19}
\end{equation*}
$$

where $C_{\text {outc }}$ is the parasitic load capacitance at the output of the conventional stack and $R_{c}$ is the equivalent resistance of each of the NMOS transistors in the stack. The delay is estimated to the $50 \%$ point. Eq. (19) was based on the assumption that each transistor in the stack was replaced by its equivalent resistance and that these resistances are equal. $R_{c}$ is approximated by

$$
\begin{equation*}
R_{c}=\frac{1}{k_{n}^{\prime}\left(\frac{W}{L}\right)_{n}\left(V_{D D}-V_{t h n}\right)}, \tag{20}
\end{equation*}
$$

where each of these transistors is assumed to operate in the deep-triode region [45]. The effect of the internal capacitances was neglected here with respect to that of $C_{\text {outc }}$.

When adopting the previously described convention for evaluating the parasitic capacitances, we get $C_{\text {outc }}=(3 n) \mathrm{fF}$. The parameters of the 45 nm CMOS technology extracted from [63, 64] are adopted. $t_{d p}$ and $t_{d c}$ are plotted versus the number of inputs, $n$, in Fig. 12, assuming that $V_{\text {thinv }}$ is at its optimum value and that the fan-out capacitance is 1 fF. Three notes are in order here. First, the proposed scheme has a smaller time delay when the number of inputs exceeds three. Second, the percentage reduction in the time delay grows with increasing $n$. Third, and most important, the time delay of the proposed scheme is approximately constant and does not increase with the number of inputs. This is due to the parallel operation inherent in the proposed scheme, represented by the simultaneous flow of the currents in the input branches. To be more accurate, increasing the number of inputs causes a proportional increase in the value of $C_{L}$, a slight effect that can be safely neglected. This must be compared with the series operation of the conventional stack, which is inherently slow and slows further with increasing $n$.
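
To make the growth of the conventional delay with $n$ explicit, $C_{\text {outc }}=(3 n) \mathrm{fF}$ can be substituted into Eq. (19). This is only a restatement under the same assumptions, treating $R_{c}$ as independent of $n$:

$$
\begin{equation*}
t_{d c}=0.7 n R_{c}(3 n \mathrm{~fF})=2.1 n^{2} R_{c} \times 1 \mathrm{~fF},
\end{equation*}
$$

so the estimated delay of the conventional stack grows quadratically with the number of inputs.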


Figure 12: The high-to-low propagation delays of the conventional and proposed schemes versus $n$.

### 4.3 Area Comparison

As a rough estimation of the area, we adopt the approximation that the area of a certain transistor is proportional to the sum of its channel, drain diffusion, and source diffusion areas. Assume that the area of each of the source and drain diffusions is equal to that of the gate. Adopting the convention that the PMOS transistor has twice the area of the NMOS one to compensate for the mobility difference, and adopting the conventional sizing strategy of increasing the aspect ratio of the transistors in a stack of $n$ transistors by $n$ in order to compensate for the delay increase [4], the areas of the conventional and proposed schemes, $A_{c}$ and $A_{p}$, can be approximated by

$$
\begin{equation*}
A_{c}=3\left(n^{2}+2 n\right) W L \tag{21}
\end{equation*}
$$

and

$$
\begin{equation*}
A_{p}=3(4 n+13) W L, \tag{22}
\end{equation*}
$$

respectively. In Eq. (22), the area of the capacitor was taken equal to that of the minimum-sized transistor. Refer to Fig. 13 for the plots of $A_{c}$ and $A_{p}$ versus $n$. It can be concluded from this rough estimation of the area that the proposed scheme has an area advantage when $n$ exceeds four.
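
The crossover point quoted above can be reproduced directly from the two area estimates. The sketch below, with hypothetical function names, evaluates $A_{c}$ and $A_{p}$ in units of $W L$ and searches for the smallest $n$ at which the proposed scheme is smaller.

```python
def area_conventional(n):
    """A_c in units of W*L: 3(n^2 + 2n)."""
    return 3 * (n * n + 2 * n)


def area_proposed(n):
    """A_p in units of W*L: 3(4n + 13)."""
    return 3 * (4 * n + 13)


def first_n_with_area_advantage(max_n=64):
    """Smallest n for which the proposed scheme has the smaller area estimate."""
    for n in range(1, max_n + 1):
        if area_proposed(n) < area_conventional(n):
            return n
    return None


for n in (2, 4, 5, 8, 16):
    print(n, area_conventional(n), area_proposed(n))
print("first n with an area advantage:", first_n_with_area_advantage())
```

Setting the two expressions equal gives $n^{2}-2 n-13=0$, whose positive root is $1+\sqrt{14} \approx 4.74$, so the first integer with an advantage is $n=5$, in agreement with the statement that $n$ must exceed four.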


Figure 13: The estimated areas of the conventional and proposed schemes versus $n$.

### 4.4 Power-Consumption Comparison

In this section, the dynamic-switching powers of the conventional and proposed schemes, $P_{c}$ and $P_{p}$, are compared. The short-circuit and leakage components are neglected here for simplicity. The power consumption of the conventional stack is

$$
\begin{equation*}
P_{c}=\alpha_{s w} f V_{D D}^{2}\left[(2 n(n-1)+n(n+2)+3 n) C_{1}+C_{f a n}\right], \tag{23}
\end{equation*}
$$

where $\alpha_{s w}$ is the switching activity and $C_{f a n}$ is the fan-out capacitance. This expression includes the switching power required to charge the parasitic capacitances at the gates of the NMOS and PMOS devices to $V_{D D}$ and also the parasitic capacitances at the internal nodes. Now, for the proposed scheme, the short-circuit power consumption of the three inverters must be taken into account if $V_{D D}>V_{t h n}+\left|V_{t h p}\right|$, where $V_{t h p}$ is the threshold voltage of PMOS devices. The adopted values of $V_{D D}$, $V_{t h n}$, and $\left|V_{t h p}\right|$ are $1 \mathrm{~V}$, $0.34 \mathrm{~V}$, and $0.23 \mathrm{~V}$, respectively. Although $V_{D D}$ is larger than $V_{t h n}+\left|V_{t h p}\right|$, the rise time of the adopted pulses is short enough to neglect the short-circuit power consumption [65]. Adopting the previously described strategies for computing the parasitic capacitances at the nodes, we arrive at the following equation for $P_{p}$:

$$
\begin{equation*}
P_{p}=\alpha_{s w} f C_{1} V_{D D}\left[V_{C L n}(n+4)+6 V_{D D}+V_{D D}(2 n+7)+2 n V_{D D}\right]+\alpha_{s w} f C_{f a n} V_{D D}^{2} . \tag{24}
\end{equation*}
$$

The first term represents the switching power required to charge $C_{L}$ (assuming the worst case, in which charging occurs to $V_{C L n}$). The second and third terms of Eq. (24) represent the switching power required to charge the parasitic capacitances at the outputs of the three inverters and those associated with the $C L K$ and $\overline{C L K}$ signals to activate the corresponding transistors, respectively. The fourth and fifth terms are associated with the parasitic capacitances at the gates of the NMOS devices in the input paths and the fan-out capacitance, respectively. Refer to Fig. 14 for the plots of $P_{c}$ and $P_{p}$ versus $n$ for $f=1 \mathrm{GHz}$ and $C_{f a n}=1 \mathrm{fF}$. The proposed scheme has a lower power consumption when $n$ exceeds two. Although the conventional static CMOS stack has negligible leakage power consumption due to the stack effect [33], the main reason for the smaller power consumption of the proposed scheme is the reduction of the parasitic capacitances, which is associated with the smaller area. Finally, refer to Figs. 15 and 16 for the plots of the power-delay products and the energy-delay products of the conventional and proposed schemes versus $n$.
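
The power crossover can be sketched in the same way as the area crossover. The Python snippet below evaluates both expressions after dividing by the common factor $\alpha_{s w} f V_{D D}^{2}$, so the results are in units of capacitance. It assumes $C_{L}=(n+4) C_{1}$ and, by default, $C=C_{1}$, which are the relations implied by the later noise analysis; these assumptions, the normalisation, and the function names are illustrative rather than the paper's own numerical setup.

```python
def p_conventional(n, c1=1.0, c_fan=0.0):
    """P_c / (alpha_sw * f * V_DD^2), in the same capacitance units as c1."""
    return (2 * n * (n - 1) + n * (n + 2) + 3 * n) * c1 + c_fan


def p_proposed(n, c1=1.0, c=None, c_fan=0.0):
    """P_p / (alpha_sw * f * V_DD^2).

    Assumes C_L = (n + 4) * C_1 and, unless c is given, C = C_1.
    """
    c = c1 if c is None else c
    c_l = (n + 4) * c1
    v_ratio = n * c / (n * c + c_l)          # V_CLn / V_DD
    return c1 * (v_ratio * (n + 4) + 6 + (2 * n + 7) + 2 * n) + c_fan


def first_n_with_power_advantage(max_n=64, **kwargs):
    """Smallest n for which the proposed scheme has the lower estimate."""
    for n in range(1, max_n + 1):
        if p_proposed(n, **kwargs) < p_conventional(n, **kwargs):
            return n
    return None


for n in (2, 3, 4, 8, 16):
    print(n, p_conventional(n), round(p_proposed(n), 2))
print("first n with a power advantage:", first_n_with_power_advantage())
```

Because $C_{f a n}$ adds the same amount to both expressions, it does not move the crossover.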


Figure 14: The power consumption according to the conventional and proposed schemes versus $n$.


Figure 15: The power-delay products according to the conventional and proposed schemes versus $n$.


Figure 16: The energy-delay products according to the conventional and proposed schemes versus $n$.

In a nutshell, the range of $n$ over which the proposed scheme has an advantage compared to the conventional stack is determined by the area and robustness. The lower limit of this range is dictated by the area and the upper limit is dictated by the effect of the process variations.

### 4.5 The Noise Immunity

There are several metrics for estimating the noise immunity including the noise margins for low and high inputs, the unity-noise gain, the unity-noise average, the average noise threshold energy (ANTE), and the energy normalized ANTE [66]. In this paper, the unity-noise gain ( $U N G$ ) is used in comparing the noise immunity of the conventional and proposed schemes. It is defined as the amplitude of the input noise that causes a noise pulse with the same amplitude at the output node [67]. The noise level can be varied by changing the amplitude or the width of the noise pulse. However, in this estimation, the pulse width is assumed to be constant with the amplitude of the noise pulse varied. Toward estimating the $U N G$, we will assume that all the inputs are connected to the same noise source which represents the worst-case scenario from the point of view of robustness.

For estimating the $U N G$ of the proposed scheme, $U N G_{p}$, the circuit of Fig. 17 (a) is adopted. Since the target is to find the $U N G_{p}$, the steady-state equivalent circuit of Fig. 17 (b) is adopted. It can be easily shown that the voltage, $V_{C L}$, is given by

$$
\begin{equation*}
V_{C L}=V_{i n} \frac{n C}{n C+C_{L}} . \tag{25}
\end{equation*}
$$

Substituting $C_{L}$ from Eq. (10) and assuming that $C$ is equal to $C_{1}$, we get

$$
\begin{equation*}
V_{C L}=V_{i n} \frac{n}{2 n+4} . \tag{26}
\end{equation*}
$$
Toward finding the relationship between $V_{\text {out }}$ and the circuit input, $V_{i n}$, the relationship between the input of the inverter, $V_{C L}$, and the inverter output, $V_{\text {out }}$, must first be found. From the definition of the $U N G$, the inverter is most likely to operate in the transition region. To simplify the analysis, two assumptions are adopted. First, the voltage-transfer characteristics of the inverter will be represented in a piecewise linear manner as shown in Fig. 18. Second, the input-low voltage and the input-high voltage are assumed to be $V_{t h n}$ and $V_{D D}-\left|V_{t h p}\right|$, respectively. So, the relationship between $V_{C L}$ and $V_{\text {out }}$ in the transition region can be represented by

$$
\begin{equation*}
V_{\text {out }}=\frac{\left(V_{D D}-\left|V_{t h p}\right|\right) V_{D D}-V_{C L} V_{D D}}{V_{D D}-V_{t h n}-\left|V_{t h p}\right|} . \tag{27}
\end{equation*}
$$

Substituting $V_{C L}$ from Eq. (25) into Eq. (27) results in

$$
\begin{equation*}
V_{\text {out }}=\frac{\left(V_{D D}-\left|V_{t h p}\right|\right) V_{D D}-V_{i n}\left(\frac{n V_{D D}}{2 n+4}\right)}{V_{D D}-V_{t h n}-\left|V_{t h p}\right|} . \tag{28}
\end{equation*}
$$

Putting both $V_{\text {in }}$ and $V_{o u t}$ equal to the unity-noise gain of the proposed scheme, $U N G_{p}$, in Eq. (28) results in

$$
\begin{equation*}
U N G_{p}=\frac{V_{D D}\left(V_{D D}-\left|V_{t h p}\right|\right)}{V_{D D}\left(1+\frac{n}{2 n+4}\right)-V_{t h n}-\left|V_{t h p}\right|} . \tag{29}
\end{equation*}
$$
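
The closed-form expression for $U N G_{p}$ is easy to evaluate numerically. The following Python sketch uses the adopted values $V_{D D}=1 \mathrm{~V}$, $V_{t h n}=0.34 \mathrm{~V}$, and $\left|V_{t h p}\right|=0.23 \mathrm{~V}$; the function name and the chosen values of $n$ are illustrative.

```python
def ung_proposed(n, v_dd=1.0, v_thn=0.34, v_thp_abs=0.23):
    """Unity-noise gain of the proposed scheme from the piecewise-linear model."""
    numerator = v_dd * (v_dd - v_thp_abs)
    denominator = v_dd * (1.0 + n / (2 * n + 4)) - v_thn - v_thp_abs
    return numerator / denominator


for n in (8, 16, 32):
    print(n, round(ung_proposed(n), 4))
```

Since $n /(2 n+4)$ grows monotonically toward $1 / 2$, the denominator grows with $n$ and the computed $U N G_{p}$ decreases as inputs are added.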

Now, to find the unity-noise gain of the conventional CMOS stack, $U N G_{c}$, both the PUN and the PDN are replaced by their equivalent resistances. The $n$ serially connected NMOS devices in the PDN are represented by a single device with an aspect ratio equal to $(1 / n)(W / L)_{n}$. Similarly, the $n$ parallel connected PMOS devices in the PUN are represented by a single device with an aspect ratio equal to $(n)(W / L)_{p}$, where $(W / L)_{p}$ is the aspect ratio of a single PMOS device. The equivalent resistances of the PDN and the PUN are thus represented by

$$
\begin{equation*}
R_{N}=\frac{1}{k_{n}^{\prime}\left(\frac{1}{n}\right)\left(\frac{W}{L}\right)_{n}\left(V_{i n}-V_{t h n}\right)} \tag{30}
\end{equation*}
$$

and

$$
\begin{equation*}
R_{P}=\frac{1}{k_{p}^{\prime} n\left(\frac{W}{L}\right)_{p}\left(V_{D D}-\left|V_{t h p}\right|-V_{i n}\right)}, \tag{31}
\end{equation*}
$$

respectively. The output voltage can thus be found by applying a simple voltage division as follows:

$$
\begin{equation*}
V_{\text {out }}=V_{D D} \frac{R_{N}}{R_{N}+R_{P}} . \tag{32}
\end{equation*}
$$



Figure 17: (a) The equivalent circuit of the proposed scheme, (b) The steady-state equivalent circuit of the proposed scheme.


Figure 18: The piecewise-linear approximated characteristics of the inverter.

For a fair comparison between the conventional and proposed schemes, the NMOS devices are assumed to have minimum size while the PMOS devices are assumed to have double the size of NMOS ones to compensate for the mobility difference. Substituting $R_{N}$ and $R_{P}$ from Eqs. (30) and (31) into Eq. (32) and putting both $V_{\text {in }}$ and $V_{\text {out }}$ equal to the unity-noise gain, $U N G_{c}$, result after simple mathematical manipulations in

$$
\begin{equation*}
U N G_{c}=\frac{-b-\sqrt{b^{2}-4 a c}}{2 a} \tag{33}
\end{equation*}
$$

where $a, b$, and $c$ are given by

$$
\begin{align*}
a & =k_{n}^{\prime}-2 n^{2} k_{p}^{\prime}, \tag{34}\\
b & =2 n^{2} k_{p}^{\prime}\left(V_{D D}-\left|V_{t h p}\right|\right)-k_{n}^{\prime} V_{t h n}+2 n^{2} k_{p}^{\prime} V_{D D}, \tag{35}\\
c & =-2 n^{2} k_{p}^{\prime} V_{D D}\left(V_{D D}-\left|V_{t h p}\right|\right), \tag{36}
\end{align*}
$$

respectively. The other solution is rejected as it does not have a valid physical interpretation. The plots of the unity-noise gains of the conventional and proposed schemes versus the number of inputs are shown in Fig. 19. As expected, both metrics degrade as the number of inputs increases. The conventional scheme has a better unity-noise gain for all values of $n$. This is not unexpected, as is evident from the previous discussion.


Figure 19: The plots of the unity-noise gains of the conventional and proposed schemes versus $\boldsymbol{n}$.

Finally, as a combination of the previous performance metrics, a figure of merit is defined as

$$
\begin{equation*}
F O M=\frac{U N G}{A t_{d} P} . \tag{37}
\end{equation*}
$$

Fig. 20 shows the plots of the figures of merit according to the conventional and proposed schemes versus $n$. As evident, the proposed scheme is superior to the conventional one when $n$ exceeds four.


Figure 20: The plots of the figures of merit of the conventional and proposed schemes versus $n$.

### 4.6 The Minimum Frequency of Operation

It is evident that the operation of the proposed scheme depends on the charge accumulated across $C_{L}$, thus the inevitable leakage will set a lower limit on the frequency of operation. The leakage of $C_{L}$ occurs through the gate-oxide tunneling currents of the attached transistors in addition to the subthreshold leakage of the discharging transistor. Assuming that the leakage current is $I_{\text {leak }}$, then the minimum frequency of operation can be estimated as

$$
\begin{equation*}
f_{\min }=\frac{I_{\text {leak }}}{C_{L}\left[V_{C L n}-V_{C L(n-1)}\right]} . \tag{38}
\end{equation*}
$$

Eq. (38) is based on the fact that the proposed scheme operates properly if $V_{C L}$ does not discharge below $V_{C L(n-1)}$. For $n=8, C_{L}=1 \mathrm{fF}$, and $I_{\text {leak }}=10 \mathrm{pA}, f_{\text {min }}$ is approximately 300 kHz which is much smaller than the adopted frequencies (which are in the MHz or the GHz range). Certainly, increasing $n$ causes $I_{\text {leak }}$ to increase, $C_{L}$ to increase, and the voltage difference, $V_{C L n}-V_{C L(n-1)}$, to decrease. However, the dominant effect is that of increasing $I_{\text {leak }}$, so $f_{\text {min }}$ increases with increasing the number of inputs.
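
The quoted figure of roughly 300 kHz can be reproduced from Eq. (38) once $V_{C L n}$ and $V_{C L(n-1)}$ are evaluated. The sketch below takes them as the steady-state values $k C V_{D D} /\left(k C+C_{L}\right)$ for $k=n$ and $k=n-1$, and assumes $C=0.1 \mathrm{fF}$ (the value used in the simulation setup) and $V_{D D}=1 \mathrm{~V}$; the function names are hypothetical.

```python
def v_cl_final(k, c, c_l, v_dd=1.0):
    """Steady-state voltage on C_L when k inputs are activated."""
    return k * c * v_dd / (k * c + c_l)


def f_min(n, c, c_l, i_leak, v_dd=1.0):
    """Minimum operating frequency: I_leak / (C_L * (V_CLn - V_CL(n-1)))."""
    delta_v = v_cl_final(n, c, c_l, v_dd) - v_cl_final(n - 1, c, c_l, v_dd)
    return i_leak / (c_l * delta_v)


# n = 8, C_L = 1 fF, I_leak = 10 pA as in the text; C = 0.1 fF is assumed.
F_MIN = f_min(n=8, c=0.1e-15, c_l=1e-15, i_leak=10e-12)
print(f"f_min = {F_MIN / 1e3:.0f} kHz")
```

With these values the voltage window is about 33 mV and the estimate comes out at about 306 kHz, consistent with the approximate value stated above.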

## 5.IMPACT OF PROCESS VARIATIONS AND COMPONENT MISMATCHES

Process variations and component mismatches are inevitable in any integrated circuit. Process variations include variations in aspect ratio, carrier mobility, threshold voltage, and oxide thickness. The component mismatch that is investigated here is the mismatch in the value of $C$. This is because, as obvious from the two inequalities, (12) and (13), the only component mismatch that affects $\Delta V_{C L}$ is $C$. It is assumed that the mismatches along the input paths are the same for simplicity. The target is to ensure the reliable operation of the proposed scheme in the existence of these variations. The most critical voltages for the sound operation of the proposed scheme are the voltage difference, $\Delta V_{C L}$, and $V_{\text {thinv }}$. This is due to the fact that if $V_{\text {thinv }}$ lies out of the valid range, the circuit will operate erroneously. So, in this section, the effect of the mismatch of $C$ in the input paths on $\Delta V_{C L}$ is investigated. Also, the effects of the variations in $V_{t h n}, V_{t h p}, k_{n}=k_{n}{ }^{\prime}(W / L)_{n}$, and $k_{p}=k_{p}{ }^{\prime}(W / L)_{p}$ of the access transistors on $V_{\text {thinv }}$ are considered.

Let $C$ have a mismatch of $\Delta C$, so replace each $C$ by $C+\Delta C$ in the two inequalities (12) and (13) to obtain the modified limits. So, the range of $V_{\text {thinv }}$ is thus

$$
\begin{equation*}
\Delta V_{C L}=\frac{n(C+\Delta C) V_{D D}}{n(C+\Delta C)+C_{L}}-\frac{(n-1)(C+\Delta C) V_{D D}}{(n-1)(C+\Delta C)+C_{L}} . \tag{39}
\end{equation*}
$$

After simple mathematical manipulations, it can be shown that the change in $\Delta V_{C L}$ (due to $\Delta C$ ) is given by
$$
\begin{equation*}
\Delta\left(\Delta V_{C L}\right)=\Delta C\left[\frac{V_{D D} C_{L}}{n^{2} C^{2}-n C^{2}+C C_{L}(2 n-1)+C_{L}^{2}}-\frac{C V_{D D} C_{L}\left(2 n^{2} C+2 n C_{L}-2 n C-C_{L}\right)}{\left(n^{2} C^{2}-n C^{2}+C C_{L}(2 n-1)+C_{L}^{2}\right)^{2}}\right], \tag{40}
\end{equation*}
$$

where the terms containing $(\Delta C)^{2}$ were neglected. The plot of $\Delta\left(\Delta V_{C L}\right)$ versus $\Delta C$ for $n=16$, $V_{D D}=1 \mathrm{~V}$, $C_{L}=0.045(4+n) \mathrm{fF}=0.9 \mathrm{fF}$, and $C=0.1 \mathrm{fF}$ is shown in Fig. 21. It can be shown that for a $20 \%$ change in $C$, there will be only approximately 8 mV change in $\Delta V_{C L}$.
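
One way to see where the common denominator in this expression comes from is to combine the two fractions of Eq. (39) before linearizing. The step below is a sketch of that intermediate algebra, with $C^{\prime}=C+\Delta C$ introduced here only as shorthand:

$$
\begin{equation*}
\Delta V_{C L}=\frac{V_{D D} C^{\prime} C_{L}}{\left(n C^{\prime}+C_{L}\right)\left((n-1) C^{\prime}+C_{L}\right)}=\frac{V_{D D} C^{\prime} C_{L}}{n^{2} C^{\prime 2}-n C^{\prime 2}+C^{\prime} C_{L}(2 n-1)+C_{L}^{2}} .
\end{equation*}
$$

Differentiating this ratio with respect to $C^{\prime}$ at $C^{\prime}=C$ and multiplying by $\Delta C$ gives the two bracketed terms: the first from the numerator and the second from the derivative of the denominator, $2 n^{2} C+2 n C_{L}-2 n C-C_{L}$.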


Figure 21: The relationship between $\Delta\left(\Delta V_{C L}\right)$ and $\Delta C$.

Toward finding the variations in $V_{\text {thinv }}$ due to the variations in $V_{t h n}, V_{t h p}, k_{n}$, and $k_{p}$, we use the following approximations:

$$
\begin{equation*}
\frac{1}{1 \pm x} \approx 1 \mp x \tag{41}
\end{equation*}
$$

and

$$
\begin{equation*}
(1+x)^{ \pm 1 / 2} \approx 1 \pm \frac{x}{2} . \tag{42}
\end{equation*}
$$

Also, the terms containing $\left(\Delta V_{t h n}\right)^{2},\left(\Delta V_{t h p}\right)^{2},\left(\Delta k_{p}\right)^{2}$, and $\left(\Delta k_{n}\right)^{2}$ can safely be neglected. Since the threshold voltage of the inverter is given by [4]

$$
\begin{equation*}
V_{\text {thinv }}=\frac{\sqrt{\frac{k_{p}^{\prime}(W / L)_{p}}{k_{n}^{\prime}(W / L)_{n}}}\left(V_{D D}-\left|V_{t h p}\right|\right)+V_{t h n}}{1+\sqrt{\frac{k_{p}^{\prime}(W / L)_{p}}{k_{n}^{\prime}(W / L)_{n}}}}, \tag{43}
\end{equation*}
$$

it can be shown that the variations in $V_{\text {thinv }}$ due to the variations in $V_{t h n}, V_{t h p}, k_{n}$, and $k_{p}$ are given by

$$
\begin{align*}
& \Delta V_{\text {thinv } 1}=\frac{\Delta V_{t h n}}{1+\sqrt{\frac{k_{p}}{k_{n}}}},  \tag{44}\\
& \Delta V_{\text {thinv } 2}=\frac{-\sqrt{\frac{k_{p}}{k_{n}}} \Delta V_{t h p}}{1+\sqrt{\frac{k_{p}}{k_{n}}}},  \tag{45}\\
& \Delta V_{\text {thinv } 3}=\frac{1}{2} \frac{\sqrt{\frac{k_{p}}{k_{n}}}\left(\frac{\Delta k_{n}}{k_{n}}\right)\left[\sqrt{\frac{k_{p}}{k_{n}}}\left(V_{D D}-\left|V_{t h p}\right|\right)+V_{t h n}\right]}{\left(1+\sqrt{\frac{k_{p}}{k_{n}}}\right)^{2}}-\frac{1}{2} \frac{\sqrt{\frac{k_{p}}{k_{n}}}\left(\frac{\Delta k_{n}}{k_{n}}\right)\left(V_{D D}-\left|V_{t h p}\right|\right)}{1+\sqrt{\frac{k_{p}}{k_{n}}}}, \tag{46}
\end{align*}
$$

and

$$
\begin{equation*}
\Delta V_{\text {thinv } 4}=\left\{\frac{1}{2} \frac{\Delta k_{p}}{k_{p}\left(1+\sqrt{\frac{k_{p}}{k_{n}}}\right)}\right\} \frac{\left(\sqrt{\frac{k_{p}}{k_{n}}}\left(V_{D D}-\left|V_{t h p}\right|\right)+V_{t h n}\right)}{\left(1+\sqrt{\frac{k_{p}}{k_{n}}}\right)}-\frac{1}{2} \frac{V_{t h p}\left(\frac{\Delta k_{p}}{k_{p}}\right)}{\left(1+\sqrt{\frac{k_{p}}{k_{n}}}\right)}, \tag{47}
\end{equation*}
$$

respectively. Assuming that the variations in these parameters are uncorrelated, then the total variation of $V_{\text {thinv }}$ can be expressed as the sum of the products of the sensitivities of $V_{\text {thinv }}$ by the change in every parameter [68]. Thus,
$$
\begin{equation*}
\Delta\left(V_{\text {thinv }}\right)=\Delta V_{t h n}\left(\frac{\partial V_{\text {thinv }}}{\partial V_{t h n}}\right)+\Delta V_{t h p}\left(\frac{\partial V_{\text {thinv }}}{\partial V_{t h p}}\right)+\Delta k_{n}\left(\frac{\partial V_{\text {thinv }}}{\partial k_{n}}\right)+\Delta k_{p}\left(\frac{\partial V_{\text {thinv }}}{\partial k_{p}}\right) . \tag{48}
\end{equation*}
$$
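
The individual terms of this first-order sum can be cross-checked numerically without using the closed-form sensitivities. The Python sketch below estimates each partial derivative of $V_{\text {thinv }}$ with a central difference and multiplies it by an equal fractional change in each parameter. The choice $k_{n}=k_{p}$ (a matched inverter) is an assumption made only for illustration, the threshold voltages are the values adopted in Section 4.4, and the function names are hypothetical.

```python
import math


def v_thinv(k_n, k_p, v_thn, v_thp_abs, v_dd=1.0):
    """Inverter switching threshold for given k_n, k_p and threshold voltages."""
    r = math.sqrt(k_p / k_n)
    return (r * (v_dd - v_thp_abs) + v_thn) / (1.0 + r)


def contributions(params, frac, v_dd=1.0):
    """First-order shift of V_thinv when each parameter changes by `frac`.

    Sensitivities come from central differences, so this is a numerical
    cross-check rather than an evaluation of the closed-form expressions.
    """
    shifts = {}
    for name, value in params.items():
        h = 1e-6 * value
        up = dict(params, **{name: value + h})
        down = dict(params, **{name: value - h})
        sensitivity = (v_thinv(v_dd=v_dd, **up) - v_thinv(v_dd=v_dd, **down)) / (2 * h)
        shifts[name] = sensitivity * frac * value
    return shifts


# Assumed, illustrative operating point: k_n = k_p (only their ratio matters).
PARAMS = {"k_n": 1.0, "k_p": 1.0, "v_thn": 0.34, "v_thp_abs": 0.23}
SHIFTS = contributions(PARAMS, frac=0.10)
for name, shift in SHIFTS.items():
    print(f"{name:10s} {shift * 1e3:+.2f} mV")
```

Because $V_{\text {thinv }}$ depends on $k_{n}$ and $k_{p}$ only through their ratio, equal fractional changes in the two produce first-order contributions of equal magnitude and opposite sign.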

Assuming that the percentage changes in all these parameters are equal, refer to Fig. 22 for the plot of the absolute change in $V_{\text {thinv }}$ versus this percentage change. It can be shown that the variation in $V_{\text {thinv }}$ is approximately 30 mV for a percentage variation of $10 \%$ in these parameters. From Fig. 8, the corresponding maximum number of inputs is eight. So, when $n$ exceeds eight, one must resort to the version of Fig. 5.


Figure 22: The absolute change in $V_{\text {thinv }}$ versus the percentage change in the parameters.

## 6.SIMULATION RESULTS AND DISCUSSIONS

### 6.1 Simulation Setup

In this section, the proposed scheme is simulated and compared with previous schemes including the conventional stack. The predictive technology model (PTM) of the 45 nm CMOS technology is adopted with $V_{D D}$ equal to 1 V [64]. The $50 \%$ criterion is adopted for estimating the time delays. Unless otherwise specified, the load capacitance is set equal to $1 \mathrm{fF}, n=8$, and the frequency of operation is $1 \mathrm{GHz} . C$ is set equal to 0.1 fF . The room temperature of $27{ }^{\circ} \mathrm{C}$ is adopted.

### 6.2 Results

The high-to-low propagation delays of the proposed scheme according to the analysis and the simulation are shown versus the load capacitance in Fig. 23. The high-to-low propagation delays of the conventional CMOS and the proposed scheme are shown versus the load capacitance according to the simulation in Fig. 24. Figs. 25 and 26 are the counterparts of Figs. 23 and 24 for the average power consumption. Finally, refer to Table 1 for a comparison between the proposed scheme and previous schemes from the points of view of the important performance metrics.


Figure 23: The high-to-low propagation delays versus the load capacitance of the proposed scheme according to the analysis and the simulation.


Figure 24: The high-to-low propagation delays versus the load capacitance of the conventional CMOS and proposed scheme according to the simulation results.


Figure 25: The average power consumption versus the load capacitance of the proposed scheme according to the analysis and the simulation.


Figure 26: The average power consumption versus the load capacitance of the conventional CMOS and proposed scheme according to the simulation results.

Table 1: A comparison between the proposed scheme and some of the previous schemes.

|  | [66] | [69] | [49] | The proposed scheme |
| :---: | :---: | :---: | :---: | :---: |
| Technology | 90 nm CMOS | 45 nm CMOS | 180 nm CMOS | 45 nm CMOS |
| Power-supply voltage (V) | 0.9 | 1 | 1.8 | $\mathbf{1}$ |
| Gate type | 32-input OR | 64-input OR | 16-input OR | 16-input OR |
| Time delay (ps) | NA* | 93.64 | 180 | $\mathbf{42}$ |
| Average power consumption ($\mu \mathrm{W}$) | 1.99 | 28.02 | 20 | $\mathbf{32.34}$ |
| Power-delay product (aJ) | NA* | 2623.8 | 3600 | $\mathbf{1358}$ |
| Energy-delay product ($10^{-30}$ J·s) | NA* | 245692 | 648000 | $\mathbf{57047}$ |

*NA means not available.
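
The last two rows of Table 1 follow from the delay and power rows. As a consistency check, the short Python sketch below recomputes the power-delay product (power times delay) and the energy-delay product (power-delay product times delay) from the listed values; the dictionary layout and function names are illustrative.

```python
# (time delay in ps, average power in uW) as listed in Table 1
TABLE = {
    "[69]": (93.64, 28.02),
    "[49]": (180.0, 20.0),
    "proposed": (42.0, 32.34),
}


def pdp_aj(delay_ps, power_uw):
    """Power-delay product in attojoules: 1 uW * 1 ps = 1e-18 J."""
    return power_uw * delay_ps


def edp_1e30(delay_ps, power_uw):
    """Energy-delay product in units of 1e-30 J*s: 1 aJ * 1 ps = 1e-30 J*s."""
    return pdp_aj(delay_ps, power_uw) * delay_ps


for name, (delay, power) in TABLE.items():
    print(f"{name:9s} PDP = {pdp_aj(delay, power):8.1f} aJ, "
          f"EDP = {edp_1e30(delay, power):9.0f} x 1e-30 J*s")
```

The recomputed values agree with the tabulated ones to within rounding.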

As a further comparison of the proposed scheme with previous work, the scheme in [40] depends on utilizing a sense amplifier to decide on the output status. However, due to the cascaded nature of this scheme and the need to stack some of the transistors, it is expected that the area and time delay of this scheme are larger than those of the proposed scheme. The scheme in [70] depends on partitioning the dynamic-node capacitance through using a splitter transistor, thus reducing the power consumption compared to the conventional domino logic. The power consumed according to this scheme will, however, be larger than the proposed one due to the need to charge and discharge several internal node capacitances. The scheme proposed in [71] depends on utilizing multi-threshold devices, thus minimizing the leakage and the associated power consumption. However, the associated cost of fabrication is expected to be relatively large.

## 7.CONCLUSIONS

In this paper, a scheme that depends on charge accumulation was presented as an alternative to the conventional wide fan-in CMOS circuits to enhance the performance. It was concluded that the range of the inputs above which the proposed scheme has an advantage compared to the conventional CMOS logic is dictated by the area, power consumption, and speed. It was found that the proposed scheme has smaller area, power consumption, and delay when the number of inputs exceeds four, two, and three, respectively. However, there is no contender to static CMOS logic from the point of view of robustness for any number of inputs. The proposed scheme was compared with various previous schemes and showed better power-delay and energy-delay products.

It was evident that the percentage reduction in the average time delay can increase with increasing the number of inputs. The speed advantage of the proposed scheme is attributed to the reduction of the parasitic capacitances due to the use of smaller sized transistors and the parallel operation in the input paths instead of that in the series connection of the conventional CMOS logic or domino logic. Also, it can be concluded that the propagation delay of the proposed scheme increases with increasing the number of inputs at a very slow rate compared to the conventional CMOS logic.

## Declaration of Competing Interest

The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## Declaration of Funding

This work was not funded by any institute or organization.

## 8.REFERENCES

[1] B. Razavi, Design of Analog CMOS Integrated Circuits, Second Edition, McGraw-Hill, New York, 2016.
[2] V. Kursun and E. G. Friedman, Multi-Voltage CMOS Circuit Design, John Wiley \& Sons Ltd., Great Britain, 2006.
[3] R. J. Tocci, N. S. Widmer, and G. L. Moss, Digital Systems: Principles and Applications, Tenth Edition, Prentice-Hall, New Jersey, 2007.
[4] A. S. Sedra and K. C. Smith, Microelectronic Circuits, Seventh Edition, Oxford University Press, New York, 2015.
[5] H.-T. Tung, J.-P. Son, C.-R. Kim, N.-N. Wang, and S.-W. Kim, "Low power high speed domino logic based on double capacitive body biased keeper," 8th International Conference on Solid-State and Integrated Circuit Technology Proceedings, pp. 1870 - 1872, 2006.
[6] A. Alvandpour, P. Larsson-Edefors, and C. Svensson, "A leakage-tolerant multi-phase keeper for wide domino circuits," Proceedings of the 6th IEEE International Conference on Electronics, Circuits and Systems (ICECS), vol. 1, pp. 209-212, 1999.
[7] L. Wang, R. K. Krishnamurthy, K. Soumyanath, and N. R. Shanbhag, "An energy-efficient leakage-tolerant dynamic circuit technique," Proceedings of 13th Annual IEEE International ASIC/SOC Conference, pp. 221-225, 2000.
[8] F. Frustaci, P. Corsonello, S. Perri, and G. Cocorullo, "High-performance noise-tolerant circuit techniques for CMOS dynamic logic," IET Circuits, Devices \& Systems, vol. 2, Issue: 6, pp. 537 - 548, 2008.
[9] R. Singh, A. Kim, and S. Kim, "Footer voltage feedforward domino technique for wide fan-in dynamic logic," 23rd IEEE International SOC Conference, pp. 224-229, 2010.
[10] R. Singh, G.-Moon Hong, and S. Kim, "Bitline techniques with dual dynamic nodes for low-power register files," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, Issue: 4, pp. 965-974, 2013.
[11] K. Roy, S. Mukhopadhyay, and H. Mahmoodi, "Leakage current mechanisms and leakage reduction techniques in deep submicrometer CMOS circuits", Proceedings of the IEEE, vol. 91, no. 2, pp. 305-327, Feb. 2003.
[12] H. Mahmoodi-Meimand and K. Roy, "Diode-footed domino: A leakage-tolerant high fan-in dynamic circuit design style," IEEE Transactions on Circuits and Systems I, vol. 51, no. 3, pp. 495-503, Mar. 2004.
[13] G. Balamurugan and N. R. Shanbhag, "The twin-transistor noise-tolerant dynamic circuit technique," IEEE Journal of Solid-State Circuits, vol. 36, no. 2, pp. 273-280, Feb. 2001.
[14] V. G. Oklobdzija and R. K. Montoye, "Design-performance trade-offs in CMOS-domino logic," IEEE Journal of Solid-State Circuits, vol. 21, pp. 304-306, Apr. 1986.
[15] C.-Hsien Hua, W. Hwang, and C.-Kai Chen, "Noise tolerant XOR-based conditional keeper for high fan-in dynamic circuits," IEEE International Symposium on Circuits and Systems, vol. 1, pp. 444-447, 2005.
[16] A. Alvandpour, R.K. Krishnamurthy, K. Soumyanath, and S.Y. Borkar, "A sub-130-nm conditional keeper technique," IEEE Journal of Solid-State Circuits, vol. 37, pp. 633-638, May 2002.
[17] C.-Hsun Huang, T.-Lin Wu, and Y.-Ming Wang, "Compact precharging-transistor-less dynamic circuits for high noise-immunity applications," Proceedings of the International Symposium on VLSI Design, Automation and Test, pp. 266-269, 2010.
[18] W.-Hao Chiu and H.-Rern Lin, "A conditional isolation technique for low-energy and high-performance wide domino gates," IEEE Region 10 Conference, pp. 1-4, 2007.
[19] M. Nasseriana, M. K.-Kangi, M. M.-Nejad, and F. Moradi, "A low-power fast tag comparator by modifying charging scheme of wide fan-in dynamic OR gates," Integration, the VLSI journal, vol. 52, pp. 129-141, 2016.
[20] A. Dadoria, K. Khare, T. K. Gupta, and R. P. Singh, "A Novel high-performance leakage-tolerant, wide fan-in domino logic circuit in deep-submicron technology," Circuits and Systems, vol. 6, pp. 103-111, 2015.
[21] A. Peiravi and M. Asyaei, "Current-comparison-based domino: New low-leakage high-speed domino circuit for wide fan-in gates," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, Issue: 5, pp. 934 - 943, 2013.
[22] K.-I. Oh and L.-S. Kim, "A clock delayed sleep mode domino logic for wide dynamic OR gate," Proceedings of the International Symposium on Low Power Electronics and Design, ISLPED, pp. 176 - 179, 2003.
[23] M. W. Allam, M. H. Anis, and M. I. Elmasry, "High-speed dynamic logic styles for scaled-down CMOS and MTCMOS technologies," Proceedings of the International Symposium on Low Power Electronics and Design, ISLPED, pp. 155-160, 2000.
[24] S. Heo and K. Asanovic, "Leakage-biased domino circuits for dynamic fine-grain leakage reduction," Symposium on VLSI Circuits. pp. 316-319, 2002.
[25] F. Moradi and A. Peiravi, "An improved noise-tolerant domino logic circuit for high fan-in gates," International Conference on Microelectronics, pp. 116-121, 2005.
[26] S. Aiswarya, "CCD based new domino circuit using clamped bit line sense amplifier," International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1-6, 2015.
[27] M. Manzoor, S. Verma, and M. Manzoor, "Noise tolerant techniques in dynamic CMOS logic style: A review paper," Fifth International Conference on Communication Systems and Network Technologies, pp. 876-880, 2015.
[28] B. Fu and P. Ampadu, "Techniques for robust energy efficient subthreshold domino CMOS circuits," IEEE International Symposium on Circuits and Systems, pp. 1247 - 1250, 2006.
[29] L. Wang and N. R. Shanbhag, "An energy-efficient noise-tolerant dynamic circuit technique," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, Issue: 11, pp. 1300-1306, 2000.
[30] M. M. Khellah and M. I. Elmasry, "Use of charge sharing to reduce energy consumption in wide fan-in gates," Proceedings of the IEEE International Symposium on Circuits and Systems, ISCAS, vol. 2, pp. 9-12, 1998.
[31] M. M. Khellah, "Low-power digital CMOS VLSI circuits and design methodologies," Doctor of Philosophy Thesis, Waterloo, Ontario, Canada, 1999.
[32] H. M.-Meimand and K. Roy, "Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, Issue: 3, pp. 495-503, 2004.
[33] J. T. Kao, "Subthreshold leakage control techniques for low power digital circuits," Doctor of Philosophy Thesis, Massachusetts Institute of Technology, May 2001.
[34] V. Mahor, A. Chouhan, and M. Pattanaik, "A process variation tolerant low contention keeper design for wide fan-in dynamic OR gate," International Symposium on Electronic System Design (ISED), pp. 151-153, 2012.
[35] H. F. Dadgour and K. Banerjee, "A Novel variation-tolerant keeper architecture for high-performance low-power wide fan-in dynamic OR gates," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, Issue: 11, pp. 1567 - 1577, 2010.
[36] S. Mehrotra, S. Patnaik, and M. Pattanaik, "Design technique for simultaneous reduction of leakage power and contention current for wide fan-in domino logic based 32:1 multiplexer circuit," IEEE Conference on Information \& Communication Technologies, pp. 905-910, 2013.
[37] S. M. Sharroush, Y. S. Abdalla, A. A. Dessouki, and E.-S. A. El-Badawy, "A novel technique for speeding up domino CMOS circuits containing a long chain of NMOS transistors," International Conference on Electronic Design, pp. 1-9, 2008.
[38] H. Mostafa, M. Anis, and M. Elmasry, "Novel timing yield improvement circuits for high-performance low-power wide fan-in dynamic OR gates," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, Issue: 8, pp. 1785 - 1797, 2011.
[39] S. Narang, "Circuit level technique for mitigating effects of NBTI for wide fan-in domino logic circuits using supply voltage tuning," International Conference on Circuits, Power and Computing Technologies (ICCPCT), pp. 1-7, 2015.
[40] P. K. Pal, A. K. Dubey, S. R. Kassa, and R. K. Nagaria, "Voltage comparison based high speed \& low power domino circuit for wide fan-in gates," IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), pp. 96-99, 2016.
[41] B.-Z. Guo, N. Gong, and J.-H. Wang, "Leakage current characteristics of footed dual Vt dominos in nanometer CMOS technologies," 8th International Conference on Solid-State and Integrated Circuit Technology Proceedings, pp. 260-262, 2006.
[42] A. Dev and R. K. Sharma, "An efficient design technique for low power dynamic feedthrough logic with enhanced performance for wide fan-in gates," 2nd International Conference on Signal Processing and Integrated Networks (SPIN), pp. 908-912, 2015.
[43] M. Asyaei, "A new leakage-tolerant domino circuit using voltage-comparison for wide fan-in gates in deep sub-micron technology," Integration, the VLSI Journal, vol. 51, pp. 61-71, 2015.
[44] S. A.-Hafeez, A. G.-Ross, and B. Parhami, "Scalable digital CMOS comparator using a parallel prefix tree," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21, Issue: 11, pp. 1989-1998, 2013.
[45] S. M. Sharroush, Y. S. Abdalla, A. A. Dessouki, and E. A. El-Badawy, "Speeding-up MOS circuits containing stacks," Port Said Engineering Research Journal, Faculty of Engineering, Port Said University, vol. 16, no. 1, pp. 165-175, Mar. 2012.
[46] S. M. Sharroush, "A novel current-race fast CMOS circuit," 5th International Conference on Electronic Devices, Systems and Applications (ICEDSA), pp. 1-4, 2016.
[47] S. M. Sharroush, "A novel high-speed CMOS circuit based on a gang of capacitors," International Journal of Electronics, 2017.
[48] S. M. Sharroush, "A novel high-performance time-balanced wide fan-in CMOS circuit," Alexandria Engineering Journal, vol. 55, Issue: 3, pp. 2565-2582, 2016.
[49] A. A. Angeline and V. Kanchana Bhaaskaran, "High speed wide fan-in designs using clock controlled dual keeper domino logic circuits," ETRI Journal, vol. 41, no. 3, pp. 383-395, 2019.
[50] A. A. Angeline and V. Kanchana Bhaaskaran, "Design impacts of delay invariant high-speed clock delayed dual keeper domino circuit," IET Circuits, Devices \& Systems, vol. 13, no. 8, pp. 1134-1141, 2019.
[51] A. A. Angeline and V. K. Bhaaskaran, "Speed enhancement techniques for Clock-Delayed Dual Keeper Domino logic style," International Journal of Electronics, vol. 107, no. 8, pp. 1239-1253, 2020.
[52] A. K. Pandey, S. Upadhyay, T. K. Gupta, and P. K. Verma, "Low power, high speed and noise immune wide-OR footless domino circuit using keeper controlled method," Analog Integrated Circuits and Signal Processing, vol. 100, no. 1, pp. 79-91, 2019.
[53] A. Mehra, S. Singhal, and U. Tripathi, "A Novel Domino Logic with Modified Keeper in 16nm CMOS Technology," Electronics, vol. 23, no. 2, 2019.
[54] S. Singhal and A. Mehra, "Gated Clock and Revised Keeper (GCRK) Domino Logic Design in 16 nm CMOS Technology," IETE Journal of Research, pp. 1-8, 2021.
[55] S. Garg and T. K. Gupta, "Low power domino logic circuits in deep-submicron technology using CMOS," Engineering Science and Technology, an International Journal, vol. 21, pp. 625-638, 2018.
[56] A. K. Pandey, T. K. Gupta, A. Gupta, and D. Pandey, "Keeper Effect on Nano Scale Silicon Domino Logic Transistors," Silicon, 2021.
[57] S. M. Sharroush, "An alternative to CMOS stacks based on a floating-gate transistor," IEEE International Conference on Electronics, Circuits, and Systems (ICECS), pp. 109-112, 2015.
[58] J. Yao, "Dual-threshold voltage design of subthreshold circuits," Doctor of Philosophy Thesis, Auburn University, Auburn, 2014.
[59] J. P. Uyemura, Chip Design for Submicron VLSI: CMOS Layout and Simulation, First Edition, Thomson, USA, 2006.
[60] N. H. E. Weste and D. M. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, Fourth Edition, Addison-Wesley, Massachusetts, USA, 2011.
[61] A JSSC classic paper: "The simple model of CMOS drain current," IEEE Solid-State Circuits Society Newsletter, vol. 9, Issue: 4, pp. 4-5, Oct. 2004.
[62] S. M. Sharroush, "Design techniques for high performance MOS digital integrated circuits," Doctor of Philosophy Thesis, Port Said University, Egypt, 2011.
[63] S. M. Sharroush and Y. S. Abdalla, "Parameter extraction and modelling of the MOS transistor by an equivalent resistance," Journal of Mathematical and Computer Modelling of Dynamical Systems, vol. 27, Issue: 1, 2021.

[64] Predictive Technology Model, http://ptm.asu.edu/modelcard/45nm_MGK.pm, last access: 15 Sep. 2021.

[65] J. E. Ayers, Digital Integrated Circuits: Analysis and Design, CRC Press, Boca Raton, 2005.
[66] S. R. Ghimiray, P. Meher, and P. K. Dutta, "Ultralow power, noise immune stacked-double stage clocked-inverter domino technique for ultradeep submicron technology," International Journal of Circuit Theory and Applications, pp. 1-15, 2018.
[67] M. Asyaei and E. Ebrahimi, "Low power dynamic circuit for power efficient bit lines," International Journal of Electronics and Communications (AEÜ), vol. 83, pp. 204-212, 2018.
[68] S. J. Rad, "Design and analysis of robust variability-aware SRAM to predict optimum access-time to achieve yield enhancement in future nano-scaled CMOS," Doctor of Philosophy Thesis, University of California, Santa Cruz, USA, 2012.
[69] A. Kumar and R. K. Nagaria, "A new leakage-tolerant high speed comparator based domino gate for wide fan-in OR logic for low power VLSI circuits," Integration, the VLSI Journal, vol. 63, pp. 174-184, 2018.
[70] S. Patnaik, S. Mehrotra, and M. Pattanaik, "A high-speed circuit design for power reduction \& evaluation contention minimization in wide fan-in OR gates," IEEE Conference on Information \& Communication Technologies, pp. 911-916, 2013.
[71] N. Sah and E. Mittal, "An improved domino logic," International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 2641-2646, 2017.

