# High Performance Level Conversion for Dual $V_{DD}$ Design

Sarvesh H. Kulkarni, Student Member, IEEE, and Dennis Sylvester, Member, IEEE

Abstract—Multi- $V_{DD}$  design is an effective way to reduce power consumption, but the need for level conversion imposes delay and energy penalties that limit the potential gains. In this paper, we describe new level converting circuits that provide 10%-61% lower energy consumption at equivalent or better speeds compared to those available in the literature. Furthermore, we make the argument that level converters should be evaluated largely by their maximum speed since slower level converters consume valuable timing slack that can be used to reduce the energy of other gates in the circuit. Based on this criterion, we find the new structures to offer up to a 25% speed improvement over conventional level converters. Using an efficient dual  $V_{DD}$  voltage assignment algorithm, we show that this speed improvement can yield a reduction of up to 7.3% in total circuit power in small benchmark circuits. We also propose embedding the functionality of logic gates into the level converting circuits. For typical values of the second supply voltage, this technique can reduce delay by 15% at constant energy or lower energy by up to 30% at fixed delay.

Index Terms—Dual  $V_{DD}$  design, level conversion, low-power design.

# I. INTRODUCTION

Dynamic power dissipation in CMOS circuits is proportional to the square of the supply voltage $(V_{DD})$. A reduction in $V_{DD}$ thus considerably lowers the power dissipation of the circuit. Dual $V_{DD}$ (or, more generally, multi-$V_{DD}$) design is an important scheme that exploits this concept to reduce power consumption in integrated circuits (ICs) [1], [2]. Since a reduction in $V_{DD}$ degrades circuit performance, dual $V_{DD}$ designs maintain performance by assigning cells along critical paths to the higher power supply (VDDH) and cells along noncritical paths to the lower power supply (VDDL). Level conversion (from VDDL to VDDH) becomes essential wherever a VDDL-driven cell drives a VDDH-supplied cell. Without it, an undesirable static current flows, because the logic "HIGH" signal of the VDDL-driven cell cannot completely turn off the pMOS pull-up network of the following VDDH cell.
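The static-current condition can be checked to first order. The sketch below is a simplification: it assumes the pMOS pull-up of the following VDDH gate has its source at VDDH and its gate held at the VDDL logic HIGH, so that $V_{SG} = \mathrm{VDDH} - \mathrm{VDDL}$. It uses the 1.2-V VDDH from Section II and an assumed pMOS threshold magnitude of 0.21 V, and it ignores subthreshold conduction, which adds leakage even when $V_{SG}$ is below threshold.

```python
def pmos_stays_on(vddh: float, vddl: float, vthp_abs: float) -> bool:
    """First-order check: is the pMOS of a VDDH gate still on when its input is a VDDL HIGH?

    The pMOS source sits at VDDH and its gate at VDDL, so V_SG = VDDH - VDDL.
    If V_SG exceeds |V_TH,p|, the pull-up conducts and a static current flows
    through the VDDH gate. Subthreshold conduction is ignored.
    """
    v_sg = vddh - vddl
    return v_sg > vthp_abs


if __name__ == "__main__":
    VDDH = 1.2       # V, from Section II
    VTHP_ABS = 0.21  # V, assumed |V_TH| of the pMOS pull-up
    for vddl in (0.6, 0.7, 0.8):
        v_sg = VDDH - vddl
        on = pmos_stays_on(VDDH, vddl, VTHP_ABS)
        print(f"VDDL = {vddl:.1f} V: V_SG = {v_sg:.2f} V, pMOS on: {on}")
```

For all three VDDL values studied here, $V_{SG}$ (0.4–0.6 V) exceeds 0.21 V, which is why conversion is needed at every VDDL-to-VDDH boundary.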

The use of level converters (LCs) is largely determined by the algorithm used to assign $V_{DD}$ to gates. The two major algorithms used for $V_{DD}$ assignment are 1) clustered voltage scaling (CVS) [1] and 2) extended clustered voltage scaling (ECVS) [3]. In CVS, the cells driven by each power supply are grouped (clustered) together, and level conversion is needed only at sequential element outputs (referred to as *synchronous level conversion*). In ECVS, the cell assignment is flexible, allowing level conversion anywhere in the circuit, not just at the sequential element outputs. This is referred to as *asynchronous level conversion*. Level converters inevitably impose penalties on both the power dissipation and the performance of the circuit, and limiting these penalties is very important in any multi-$V_{DD}$ design. Since ECVS allows more freedom in $V_{DD}$ assignment, it can provide greater power reductions than CVS [3]. To realize these reductions, fast, low-power asynchronous level converters must be available to designers.

The authors are with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA (e-mail: shkulkar@eecs.umich.edu; dennis@eecs.umich.edu).

Digital Object Identifier 10.1109/TVLSI.2004.833667

Another approach to reducing system-level power dissipation is to uniformly lower the $V_{DD}$ of the entire design while scaling down the threshold voltage (VTH) or using multiple threshold voltages to maintain the same circuit delay. This method can provide appreciable power savings without level conversion or dual-supply routing; however, these savings are much smaller than those offered by dual $V_{DD}$ design. For example, [4] showed that dual $V_{DD}$/dual VTH provides a larger power reduction (as much as 1.7X) than single $V_{DD}$ design, which makes dual $V_{DD}$ design, and the impact of level conversion on it, an important topic of study.

In this paper, we propose six new asynchronous level converters that consume less power and are often faster than previously presented circuits. The dual  $V_{DD}$  design scheme can be generalized into a dual  $V_{DD}$ /dual VTH design scheme where the threshold voltage of the transistors may take one of two different values (we denote the lower VTH by "VTHL" and the higher VTH by "VTHH") [5]–[8]. Most high-performance CMOS processes today offer dual threshold voltages. Level converter circuits can leverage the availability of this second VTH to maintain good speed characteristics when converting from very low voltages such as those we describe in this work [9]. Transistors in the level converting circuits that we study are selectively assigned to VTHL or VTHH based on their delay criticality within the overall circuit.

ECVS (and hence, these asynchronous LCs) will be most effective when the clock cycle is not highly aggressive, since the delay overhead of several such level converters per logic path is prohibitive for heavily pipelined designs like microprocessors (having a clock cycle of only 10–15 FO4 inverter delays [10]). While the level converters we propose here could be useful in any multi- $V_{DD}$  design, they are ideally suited to ASIC designs that have moderate clock speeds and stringent power requirements (for instance, due to plastic packaging constraints).

The penalties (with respect to delay and energy) imposed by level converters can also be mitigated by a new class of circuits that we describe. These circuits embed level conversion functionality into standard logic gates and hence effectively function as *level converting logic circuits*. We demonstrate substantial improvements in delay and power dissipation when using such embedded circuits instead of traditional level converting circuits placed either before or after logic gates.

Manuscript received September 30, 2003; revised March 18, 2004. This work was supported in part by the MARCO/DARPA Gigascale Silicon Research Center and in part by the Semiconductor Research Corporation under Contract 2001-TJ-915.

Our paper is organized as follows. In Section II, we detail our simulation setup. In Section III, we describe the level converting circuits that we study and investigate their performance and robustness. We also discuss the impact of level converter performance on system-level power in this section. In Section IV, we present new embedded logic level converting circuits and discuss their performance. Section V concludes the paper.

# II. SIMULATION SETUP

We use the HSPICE Levenberg–Marquardt-based circuit optimizer to optimize all the circuits we study [11], and we compare the circuits through their energy-delay design spaces. For each design point, a delay target is set and the energy target is swept upward until the optimizer meets both the energy and delay constraints. In this way, we obtain the minimum-energy device sizing for a given delay. The delay target is also swept to smaller values until it can no longer be met. Thus, we also report the fastest possible configuration of each circuit.
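The sweep procedure above can be sketched as follows. This is a minimal illustration, not the HSPICE optimizer interface: `meets(delay, energy)` is a hypothetical callback standing in for one optimization run that reports whether both targets were met, and `toy_meets`, the energy step, and the energy budget are invented for the example.

```python
from typing import Callable, List, Optional, Tuple

# meets(delay_ps, energy_fj) -> True if one optimizer run satisfies both targets.
# Hypothetical stand-in for an HSPICE optimization run; not an HSPICE API.
Oracle = Callable[[float, float], bool]


def min_energy_at(delay: float, meets: Oracle,
                  e_start: float, e_step: float, e_max: float) -> Optional[float]:
    """Raise the energy target until the optimizer meets both constraints."""
    e = e_start
    while e <= e_max:
        if meets(delay, e):
            return e
        e += e_step
    return None  # delay target unreachable within the energy budget


def sweep_design_space(meets: Oracle, delays: List[float], e_start: float = 1.0,
                       e_step: float = 0.5, e_max: float = 200.0) -> List[Tuple[float, float]]:
    """Return (delay, minimum energy) points from relaxed to aggressive delay targets.

    The sweep stops at the first delay target that cannot be met, so the last
    point returned is the fastest configuration found.
    """
    curve: List[Tuple[float, float]] = []
    for d in sorted(delays, reverse=True):
        e = min_energy_at(d, meets, e_start, e_step, e_max)
        if e is None:
            break
        curve.append((d, e))
    return curve


def toy_meets(delay: float, energy: float) -> bool:
    # Illustrative tradeoff only: infeasible at or below 60 ps, and the
    # required energy grows as the delay target tightens.
    return delay > 60 and energy >= 10 + 400 / (delay - 60)


if __name__ == "__main__":
    for d, e in sweep_design_space(toy_meets, [150, 130, 110, 90, 70, 60, 50]):
        print(f"{d:>4} ps -> {e:5.1f} fJ")
```

Each returned pair is one point of an energy-delay curve such as those in Fig. 4; the last pair plays the role of the leftmost (fastest) point.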

All simulations use an industrial 0.13-$\mu$m CMOS technology. The higher supply voltage (VDDH) is 1.2 V, while the lower supply voltage (VDDL) is varied among 0.6 V, 0.7 V, and 0.8 V. The nominal nMOS threshold voltage in this process is 0.23 V and is used as the higher threshold voltage (VTHHN). The lower nMOS threshold voltage (VTHLN) is varied among 0.11 V, 0.15 V, and 0.23 V (the latter value implies a single-VTH process). Similarly, the lower pMOS threshold voltage (VTHLP) is varied among -0.09 V, -0.13 V, and -0.21 V. The fanout-of-four (FO4) inverter delay at 1.2 V and nominal threshold voltages is 40 ps.

Fig. 1 shows our simulation setup; our testbench is similar to the one used in [12] and [13]. We constrain the input capacitance of each circuit to $\leq 4$ fF, which allows each circuit to take on its own fanout ratio that is optimal for speed in its particular configuration. The delay across the feeding inverter shown in Fig. 1 is used to monitor and limit the input capacitance of each circuit. This 2X-drive inverter also models a typical gate feeding the input of the circuit under study. The delay target given to the optimizer is measured from the input of this inverter to the output node, which prevents the optimizer from sizing the input transistors of the circuit under study arbitrarily large. All circuits are simulated with a load capacitance of 17 fF (accounting for wiring capacitance as well as the next-stage input capacitance). This load represents the input capacitance of four 2X-drive inverters plus a typical wire length of 35 $\mu$m. The minimum device width in this process is 0.25 $\mu$m. Energy is measured using a 2-ns clock period (Tcycle) and a switching activity of 10% to represent typical on-chip signal behavior and to capture both leakage and switching energy. In a separate analysis, we also use a 20-ns period with the same switching activity to compare the energy consumption of these circuits at low frequencies. The energy consumed by the feeding inverter (neglecting its own input capacitance) is included in the reported energy values, so the loading of these circuits on the previous stage is taken into account.

Fig. 1. Simulation setup.

# III. ASYNCHRONOUS LEVEL CONVERTER DESIGN AND ANALYSIS

## A. Circuit Topologies

Fig. 2(a) and (b) show two previously published level converters, and Fig. 2(c)–(h) show the new level converters that we propose. Fig. 2(a) shows a traditional level converter, which is a differential cascode voltage switched (DCVS) logic gate [3]. Here, the input arrives at VDDL and is up-converted to VDDH through the cross-coupled pMOS pair formed by transistors M4 and M6. This converter consumes significant energy due to contention at the nodes where the cross-coupled pair connects to the pull-down nMOS network formed by M3 and M5. Fig. 2(b) [labeled PG] shows a level converter described in [14] that is based on a weak feedback pull-up device (M4) and an nMOS pass gate (M1). The pass gate isolates the input of the pMOS M3 from the previous logic stage, so the feedback device M4 can pull up the internal node without disturbing the prior logic, which runs at VDDL. This level converter consumes less energy than the DCVS level converter because it has fewer devices and less contention. All the new level converters that we propose are based on this level converter.

Fig. 2(c) [STR1] shows the first new level converter that we propose. As seen in the figure, the feedback device M4 (keeper) from Fig. 2(b) is split into two devices, M4 and M5. This split keeper is a known technique in high-performance dynamic design; here, its advantage is that it reduces the capacitive load (the gate capacitance of the keeper device) on node N [in Fig. 2(c)]. When sized properly, M5 is larger than M4 (which tends toward minimum width and length), so the keeper loads transistors M2 and M3 less. This allows M2 and M3 to be sized smaller, reducing the total energy consumption. Fig. 2(d) [STR2] is an extension of STR1 in which an inverter, INV (supplied by VDDL), is added to drive the keeper device M5. The goal is to turn off the feedback path faster (as soon as the input starts falling) and thereby speed the falling transition at the output, since a falling transition at the input defines the critical path of the basic PG level converter. The addition of INV reduces the contention caused by the keeper when the input goes low, because the keeper M5 is substantially weakened, though not completely turned OFF (INV, which is supplied by VDDL, cannot completely turn OFF M5 since the source of M5 is connected to VDDH). In Fig. 2(e) [STR3], we propose another structure targeted at speeding the critical path. Here, transistor M6 is added to augment the efforts of M3. In this case, the inverter INV drives M6 rather than the keeper device M5 as in STR2. The techniques used in STR2 and STR3 can be used together; the resulting structure is shown in Fig. 2(f) [STR4].

Fig. 2. Existing and proposed asynchronous level converter topologies. Transistors labeled with (\*) indicate low-VTH devices.

Fig. 2(g) [STR5] shows another level converter that we propose. To understand how this circuit works, recall that the pass transistor M1 in the PG level converter exists solely to isolate the previous logic stage from the VDDH supply of the level converter. In the absence of this protective transistor, a reverse current would flow from the VDDH supply of the level converter, through M4, and then through the ON pMOS transistor of the previous-stage gate (assumed to be an inverter) into the supply of that gate (VDDL). This reverse current is a source of leakage power for the circuit and hence must be controlled. Even with M1 present, the same current flows if the gate voltage of M1, V(G) in Fig. 2(g), exceeds $VDDL + VTH_{M1}$, because M1 then turns on with its source at VDDL. To illustrate, Fig. 3 depicts the reverse current, $I_{reverse}$, that flows if $V(G) > VDDL + VTH_{M1}$. Given this constraint, we can raise V(G) up to $VDDL + VTH_{M1}$ while preserving circuit isolation. The PG level converter is a special case of this circuit in which V(G) is fixed at VDDL, which results in suboptimal performance of both M1 and the entire circuit.

This concept is exploited in STR5, with transistors M5 and M6 added to realize a higher V(G). Transistor M5 acts as a pull-up device and raises V(G) to $VDDH - VTH_{M5}$. The number of such series-connected pull-up transistors (and their respective threshold voltages) needed to raise V(G) to its maximum value of $VDDL + VTH_{M1}$ depends on the actual values of VDDL, VDDH, and $VTH_{M1}$ used in the circuit. In practice, it will not always be possible to meet this value exactly. However, V(G) can be set close to the target value by an appropriate choice of the pull-up devices. In particular, it will always be possible to raise this voltage above VDDL, which is the value of V(G) in the PG level converter and all previously discussed variants. This larger gate voltage improves the performance of M1 and reduces the contention at node C, so the level converter performs better overall. Transistor M6 is added to prevent V(G) from rising above its maximum allowed value of $VDDL + VTH_{M1}$. This is achieved by connecting M6 as shown in the figure and giving it the same threshold voltage as M1. Without M6, V(G) could rise above its allowed value due to the leakage current of M5; hence, M6 is essential. A buffer capacitance, Cbuf, is added to stabilize V(G) at its designed value. This is needed because the gate-drain overlap capacitance of M1 can cause node G to spike as node IN transitions. We use a value of 8 fF for Cbuf in our analyses; this corresponds to roughly $0.5\,\mu\mathrm{m}^2$ of gate area in this process. In this circuit, a leakage current can flow into VDDL through M6 when V(G) rises above VDDL. However, this current can be limited to a negligible value by proper choice of the pull-up transistors that set V(G). The value of V(G) thus represents a tradeoff between leakage power and circuit speed. We have ensured that this leakage current is small in all our designs. The added devices M5 and M6 in Fig. 2(g) for STR5 are minimum sized, and their intrinsic capacitances do not toggle, so their energy and area overhead is small.

Fig. 3. Reverse current flow mechanism in pass-gate based level converters if the gate voltage of M1 becomes too large.
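The choice of pull-up devices described above can be explored with a small search. This is a sketch under stated assumptions, not the authors' design procedure: each series pull-up device is assumed to drop exactly its threshold voltage (generalizing $V(G) = VDDH - VTH_{M5}$ and ignoring body effect and subthreshold conduction), the candidate thresholds 0.11 V and 0.23 V are taken from Section II, and `best_pullup_chain` and its `max_devices` limit are hypothetical. Any extra margin for the leakage-versus-speed tradeoff is not modeled.

```python
from itertools import combinations_with_replacement
from typing import Iterable, Optional, Tuple


def vg_from_chain(vddh: float, chain: Tuple[float, ...]) -> float:
    """V(G) after a series chain of pull-up devices, each assumed to drop exactly its VTH.

    Generalizes the single-device case V(G) = VDDH - VTH_M5; body effect and
    subthreshold conduction are ignored.
    """
    return vddh - sum(chain)


def best_pullup_chain(vddh: float, vddl: float, vth_m1: float,
                      available_vth: Iterable[float],
                      max_devices: int = 3) -> Optional[Tuple[float, Tuple[float, ...]]]:
    """Highest V(G) that keeps M1 isolating, i.e., V(G) <= VDDL + VTH_M1.

    Chains are tried from shortest to longest, so among equal V(G) values the
    chain with fewer devices is kept. Returns None if no chain lowers V(G) enough.
    """
    limit = vddl + vth_m1
    best = None
    for n in range(1, max_devices + 1):
        for chain in combinations_with_replacement(sorted(available_vth), n):
            vg = vg_from_chain(vddh, chain)
            if vg <= limit and (best is None or vg > best[0]):
                best = (vg, chain)
    return best


if __name__ == "__main__":
    for vddl in (0.8, 0.6):
        vg, chain = best_pullup_chain(1.2, vddl, vth_m1=0.23, available_vth=(0.11, 0.23))
        print(f"VDDL = {vddl} V: limit = {vddl + 0.23:.2f} V, V(G) = {vg:.2f} V via {chain}")
```

Under these assumptions, the best chain gives V(G) of about 0.98 V at VDDL = 0.8 V and about 0.75 V at VDDL = 0.6 V. Both fall short of the exact limit but stay above VDDL, as the text anticipates.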

Finally, Fig. 2(h) [STR6] shows the last level converter that we propose. This level converter combines the techniques used in STR1 and STR5. By investigating the split-keeper structure (STR1) and the boosted gate voltage technique (STR5) independently, we can assess their relative contributions. Then, with STR6, we can evaluate the total improvement obtained by using both proposed techniques together.

Fig. 4. Energy versus delay design space for the various level converters.

In all the above circuits, we assign the lower threshold voltage to a selected subset of transistors to balance performance and power dissipation. Devices set to the lower threshold voltage are marked with a (\*) in Fig. 2.

## B. Simulation Results

Fig. 4 shows the energy-delay behavior of the different level converters in Fig. 2. The leftmost data point of each curve corresponds to the fastest achievable delay of that circuit. The reported energy excludes the energy of the load capacitance and of the input capacitance of the input inverter, both of which are held fixed over all simulations. The energy-delay analysis was carried out at different values of VDDL and of the low threshold voltages (VTHLN and VTHLP), as indicated on the respective plots.

While STR1, STR5, and STR6 perform better than both existing LCs (DCVS and PG) for *all* studied combinations of VDDL and VTHLN/VTHLP, STR2, STR3, and STR4 were appreciably better only under certain conditions (in the remaining cases, they performed as well as DCVS and PG, or marginally better). For readability, we therefore include the energy-delay plots of STR2, STR3, and STR4 only for these select cases.

Under a common VDDL scenario (VDDL = 0.8 V, or 2/3 of the nominal supply) with dual thresholds, the level converters achieve delays below 80 ps, i.e., less than 2 FO4 inverter delays in the reference 1.2-V technology. The fastest structures under all scenarios are STR5 and STR6, owing to the larger gate voltage applied to the pass transistor. As the plots show, STR1, STR5, and STR6 have the best overall energy-delay performance over all combinations of VDDL and VTHLP/VTHLN. The improved energy and delay of all the new circuits are most evident in the VDDL = 0.6 V, VTHLP/VTHLN = -0.21 V/0.23 V scenario. Under these conditions, STR1 and STR3 consume 40%–50% less energy than the existing level converters at their minimal-delay points [~150 ps in Fig. 4(d)], while STR2 and STR4 are only slightly less efficient.

STR5 and STR6 are most effective when VDDL is low and VTHLN is high [e.g., Fig. 4(d) and (f)]. This is expected, since the boosted gate voltage used in these level converters helps most when the gate overdrive (VGS–VTH) of M1 is small. In highly scaled sub-1 V technologies [15], the threshold voltage is not expected to scale as aggressively as the supply voltage because of leakage constraints. This leads to reduced $V_{DD}$/VTH ratios and severely reduced gate overdrive. In these cases, and in cases where a second VTH is not available or is conservatively set, STR5 and STR6 are the preferred choices for level conversion despite their higher design complexity.
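The dependence on gate overdrive can be made concrete with a back-of-the-envelope comparison. This sketch assumes that on a falling input M1 passes 0 V (so its $V_{GS}$ equals V(G)), that the boosted circuit reaches the ideal bound $V(G) = VDDL + VTH_{M1}$, that $VTH_{M1}$ equals the listed VTHLN purely for illustration, and that drive current follows an alpha-power law with an assumed $\alpha = 1.3$. In practice, V(G) set by the pull-up devices sits somewhat below this bound.

```python
def overdrive_gain(vddl: float, vth_m1: float, alpha: float = 1.3) -> tuple:
    """Relative overdrive and drive-current gain of M1 for an ideal boosted gate.

    PG:    V(G) = VDDL,           overdrive = VDDL - VTH_M1 (source at 0 V on a falling input)
    Boost: V(G) = VDDL + VTH_M1,  overdrive = VDDL           (ideal upper bound)
    Drive current is assumed to follow an alpha-power law, I ~ (V_GS - V_TH)**alpha.
    """
    od_pg = vddl - vth_m1
    vg_max = vddl + vth_m1
    od_boost = vg_max - vth_m1
    ratio = od_boost / od_pg
    return ratio, ratio ** alpha


if __name__ == "__main__":
    for vddl, vth in ((0.8, 0.11), (0.7, 0.15), (0.6, 0.23)):
        r, i = overdrive_gain(vddl, vth)
        print(f"VDDL = {vddl} V, VTH_M1 = {vth} V: overdrive x{r:.2f}, current x{i:.2f}")
```

The overdrive ratio reduces to VDDL/(VDDL − VTH_M1), so it grows as VDDL falls and VTH rises: about 1.16 at 0.8 V/0.11 V versus about 1.62 at 0.6 V/0.23 V, consistent with the trend seen in Fig. 4.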

Indeed, all the new level converters perform best in cases with small gate overdrive, making them well suited to future technologies. As evidence, the PG and DCVS level converters become much less efficient in the VDDL = 0.6 V, VTHLP/VTHLN = -0.21 V/0.23 V case [Fig. 4(d)], while the proposed circuits retain excellent delay and energy properties in this case.

At larger values of VDDL, the STR1 level converter consumes 40% less energy than the DCVS structure and 15% less energy than the PG level converter [e.g., Fig. 4(a) at a fixed delay of ~99 ps]. At lower values of VDDL with high VTHLP/VTHLN, STR1 consumes 31%–33% less energy and is 3%–4% faster than the DCVS and PG level converters [Fig. 4(d)]. At lower values of VDDL with low VTHLP/VTHLN [Fig. 4(e)], it consumes 37% and 15% less energy than the DCVS and PG level converters, respectively, for the same delay (~105 ps).

#### TABLE I

COMPARISON OF STR5 AND STR6 TO THE DCVS AND PG LEVEL CONVERTERS FROM THE LITERATURE. NOTATION: "SPD+" GIVES THE DEGREE TO WHICH STR5/STR6 OUTPERFORM DCVS/PG IN MINIMAL DELAY. "SAME EN." REPRESENTS THE DELAY IMPROVEMENT OF STR5/STR6 OVER DCVS/PG FOR THE SAME ENERGY; THE ENERGY AT WHICH THIS COMPARISON IS MADE IS REPORTED IN PARENTHESES. SIMILARLY, "SAME DEL." REPRESENTS THE ENERGY CONSUMPTION IMPROVEMENT OF STR5/STR6 COMPARED TO DCVS/PG FOR THE SAME DELAY; THE DELAY AT WHICH THIS COMPARISON IS MADE IS REPORTED IN PARENTHESES.

(a) STR5

| VDDL, VTHLN, VTHLP  | DCVS SPD+ | DCVS SAME EN. | DCVS SAME DEL. | PG SPD+ | PG SAME EN. | PG SAME DEL. |
|---------------------|-----------|---------------|----------------|---------|-------------|--------------|
| 0.8V, 0.23V, -0.21V | 10.50%    | 19% (33fJ)    | 50% (95ps)     | 13%     | 13% (27fJ)  | 26% (100ps)  |
| 0.8V, 0.11V, -0.09V | 6.50%     | 15% (45fJ)    | 33% (77ps)     | 7.70%   | 4% (40fJ)   | 8% (78ps)    |
| 0.7V, 0.15V, -0.13V | 5.40%     | 17% (35fJ)    | 42% (93ps)     | 7.40%   | 7% (35fJ)   | 28% (95ps)   |
| 0.6V, 0.23V, -0.21V | 17%       | 20% (27fJ)    | 50% (153ps)    | 16%     | 18% (27fJ)  | 50% (150ps)  |
| 0.6V, 0.11V, -0.09V | 3%        | 12% (40fJ)    | 36% (100ps)    | 8.50%   | 8% (37fJ)   | 19% (105ps)  |
| 0.7V, 0.23V, -0.21V | 20%       | 22% (31fJ)    | 48% (120ps)    | 15%     | 17% (31fJ)  | 40% (115ps)  |

(b) STR6

| VDDL, VTHLN, VTHLP  | DCVS SPD+ | DCVS SAME EN. | DCVS SAME DEL. | PG SPD+ | PG SAME EN.  | PG SAME DEL. |
|---------------------|-----------|---------------|----------------|---------|--------------|--------------|
| 0.8V, 0.23V, -0.21V | 14.5%     | 21% (32.5fJ)  | 55% (96ps)     | 17.0%   | 14% (27.5fJ) | 38% (98ps)   |
| 0.8V, 0.11V, -0.09V | 7.9%      | 17% (47.5fJ)  | 35% (76ps)     | 11.0%   | 4% (40fJ)    | 10% (77ps)   |
| 0.7V, 0.15V, -0.13V | 6.7%      | 21% (34fJ)    | 48% (92ps)     | 10.5%   | 12% (34fJ)   | 34% (95ps)   |
| 0.6V, 0.23V, -0.21V | 22.7%     | 25% (29fJ)    | 61% (155ps)    | 21.7%   | 24% (29fJ)   | 59% (155ps)  |
| 0.6V, 0.11V, -0.09V | 3.0%      | 17% (32fJ)    | 46% (101ps)    | 9.4%    | 14% (32fJ)   | 32% (105ps)  |
| 0.7V, 0.23V, -0.21V | 24.6%     | 25% (35fJ)    | 53% (120ps)    | 20.0%   | 17% (32fJ)   | 42% (115ps)  |
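The fixed-delay comparisons in Table I amount to reading two sampled energy-delay curves at a common delay. The sketch below shows one way to do this, assuming linear interpolation between sampled points and assuming SPD+ is the relative reduction in minimal delay. The two curves are hypothetical and are not data from Fig. 4.

```python
from bisect import bisect_left
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (delay in ps, energy in fJ), sorted by delay


def energy_at(curve: Curve, delay: float) -> float:
    """Linearly interpolate energy at a given delay on a sampled energy-delay curve."""
    xs = [d for d, _ in curve]
    if not xs[0] <= delay <= xs[-1]:
        raise ValueError("delay outside the sampled range")
    i = bisect_left(xs, delay)
    if xs[i] == delay:
        return curve[i][1]
    (d0, e0), (d1, e1) = curve[i - 1], curve[i]
    return e0 + (e1 - e0) * (delay - d0) / (d1 - d0)


def same_delay_saving(ref: Curve, new: Curve, delay: float) -> float:
    """Fractional energy reduction of `new` versus `ref` at the same delay ("SAME DEL.")."""
    return 1.0 - energy_at(new, delay) / energy_at(ref, delay)


def min_delay_speedup(ref: Curve, new: Curve) -> float:
    """Fractional reduction in minimal delay ("SPD+"), assumed measured relative to ref."""
    return 1.0 - new[0][0] / ref[0][0]


if __name__ == "__main__":
    # Hypothetical curves for illustration only; not data from Fig. 4.
    ref = [(90.0, 60.0), (100.0, 45.0), (120.0, 35.0)]
    new = [(80.0, 50.0), (100.0, 30.0), (120.0, 25.0)]
    print(f"SPD+      : {min_delay_speedup(ref, new):.1%}")
    print(f"SAME DEL. : {same_delay_saving(ref, new, 110.0):.1%} at 110 ps")
```

A "SAME EN." comparison works the same way with the axes swapped, interpolating delay as a function of energy.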

We emphasize the significant improvement obtained by adding the split-keeper topology to the PG LC (15%–31% energy reduction at fixed delay). Since there is little quantitative analysis of how much benefit the split keeper offers in traditional domino circuits, we performed a similar analysis with a simple footless domino buffer replacing the level converter, keeping the same input and output capacitance constraints and switching activity. The split-keeper topology gave no energy improvement at either of two delay points, indicating that the PG LC topology is particularly well suited to the split-keeper technique.

The boosted gate voltage approach of STR5 provides excellent results overall. Table I(a) summarizes the performance benefits of this LC. At higher values of VDDL, it operates 5%–13% faster than the DCVS and PG level converters. At lower values of VDDL, this improvement ranges from 3% to 17%, as inferred from the plots. At a fixed delay, and at high values of VDDL and VTHLP/VTHLN, STR5 consumes 50% and 26% less energy than DCVS and PG, respectively.

Table I(b) summarizes our results for STR6. This level converter (and also STR5) provides excellent energy and delay properties for all the studied VDDL and VTHs. In particular, STR6 provides up to a 25% speed-up over the existing level converters (VDDL = 0.7V, VTHLP/VTHLN = -0.21 V/0.23 V). It also provides up to 61% lower energy consumption for the same delay (VDDL = 0.6 V, VTHLP/VTHLN = -0.21 V/0.23 V).

While the presented circuits offer the greatest benefit at lower VDDL values, there are signal-integrity concerns when such low-voltage signals are exposed to VDDH-generated noise. Since VDDL values on the order of 50% of VDDH appear to minimize power consumption at the system level [2], [9], conservative design guidelines such as increased spacing between VDDL and VDDH signals or aggressive shielding policies may be required. The added area and power due to such guidelines would need to be weighed against the power savings achievable with ultra-low VDDL values.

#### TABLE II

COMPARISON OF LEVEL CONVERTERS AT DIFFERENT CLOCK FREQUENCIES (Tcycle = 2 ns AND Tcycle = 20 ns). THE ENERGIES ARE COMPARED AT A FIXED DELAY OF ~100 ps FOR VDDL = 0.8 V, VTHLP/VTHLN = -0.21 V/0.23 V

| Tcycle (ns) | DCVS   | PG     | STR1   | STR5   | STR6   |
|-------------|--------|--------|--------|--------|--------|
| 2           | 42.7fJ | 28.2fJ | 23.4fJ | 21.0fJ | 17.0fJ |
| 20          | 47.5fJ | 31.3fJ | 26.2fJ | 34.4fJ | 31.4fJ |

In Section III-A, it was pointed out that the pass-transistor gate voltage V(G) creates a tradeoff between leakage and speed for STR5 (and similarly STR6). We have constrained the leakage current to a small value by appropriately controlling V(G). However, with longer clock cycles, this leakage power erodes the energy savings of STR5 and STR6. Table II compares STR1, STR5, STR6, DCVS, and PG under such conditions (Tcycle = 20 ns compared to the original Tcycle = 2 ns). The reported energies for all level converters are at a fixed delay of ~100 ps (VDDL = 0.8 V, VTHLP/VTHLN = -0.21 V/0.23 V). STR5 continues to outperform DCVS but consumes about 10% more energy than PG. STR6 also outperforms DCVS and consumes almost the same energy as PG. However, STR5 and STR6 may still save more energy at the system level if they are designed for minimum delay, since their speed advantage can allow more cells to be assigned to VDDL. Also, the remaining new level converters (STR1, 2, 3, and 4), which do not use a raised gate voltage on the pass transistor, continue to provide the same energy-saving trends and comparable or faster speeds at the larger Tcycle; the STR1 numbers in Table II confirm this. The value of V(G) in STR5 and STR6 can be set to trade leakage power against speed at the expected circuit activity level.

#### TABLE III

COMPARISON OF THE DIFFERENT LEVEL CONVERTERS WITH RESPECT TO THE $ED^2$ METRIC. THE MINIMAL $ED^2$ FOR EACH CIRCUIT IS REPORTED HERE (NORMALIZED TO THE DCVS LEVEL CONVERTER). VALUES IN PARENTHESES ARE THE DELAYS OF EACH CIRCUIT AT THE MINIMAL $ED^2$ POINT OF OPERATION. VALUES ARE REPORTED FOR THE DIFFERENT CHOICES OF VDDL, VTHLP/VTHLN STUDIED IN THIS WORK

| VDDL, VTHLN, VTHLP  | DCVS        | PG             | STR1           | STR2           | STR3           | STR4           | STR5           | STR6           |
|---------------------|-------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| 0.8V, 0.23V, -0.21V | 1 (118.1ps) | 0.68 (112.4ps) | 0.59 (113ps)   | 0.69 (118.9ps) | 0.66 (119.9ps) | 0.70 (119.4ps) | 0.60 (105.4ps) | 0.66 (83.6ps)  |
| 0.8V, 0.11V, -0.09V | 1 (92.6ps)  | 0.65 (94.7ps)  | 0.60 (96.3ps)  | 0.67 (93.4ps)  | 0.69 (92.8ps)  | 0.74 (91.4ps)  | 0.71 (86.9ps)  | 0.53 (99.6ps)  |
| 0.7V, 0.15V, -0.13V | 1 (122.4ps) | 0.70 (115.5ps) | 0.61 (116.3ps) | 0.72 (109.2ps) | 0.69 (124.4ps) | 0.73 (112.4ps) | 0.66 (102.1ps) | 0.60 (97.8ps)  |
| 0.6V, 0.23V, -0.21V | 1 (172.6ps) | 0.79 (179.1ps) | 0.57 (160.3ps) | 0.66 (168.3ps) | 0.62 (171.1ps) | 0.68 (167.9ps) | 0.55 (143.2ps) | 0.44 (135.6ps) |
| 0.6V, 0.11V, -0.09V | 1 (118.6ps) | 0.72 (133.3ps) | 0.61 (124.9ps) | 0.69 (131ps)   | 0.71 (119.1ps) | 0.75 (130ps)   | 0.75 (108ps)   | 0.65 (109.4ps) |
| 0.7V, 0.23V, -0.21V | 1 (142.8ps) | 0.71 (129.9ps) | 0.58 (136.5ps) | 0.66 (132.0ps) | 0.63 (132.8ps) | 0.68 (129.8ps) | 0.65 (108.8ps) | 0.56 (103.9ps) |
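The leakage argument for STR5 and STR6 can be made roughly quantitative with the two clock periods in Table II. The sketch fits a two-point linear model, $E(T_{cycle}) = E_{dyn} + P_{leak}\,T_{cycle}$, to each column. It assumes that the reported energies are per-cycle values and that the switching energy per cycle is identical at both periods, so all extra energy at 20 ns is attributed to leakage. This is an illustrative reading of the table, not an analysis performed in this work.

```python
# Energies from Table II (fJ) at Tcycle = 2 ns and 20 ns, fixed delay ~100 ps,
# VDDL = 0.8 V, VTHLP/VTHLN = -0.21 V/0.23 V.
TABLE_II = {
    "DCVS": (42.7, 47.5),
    "PG": (28.2, 31.3),
    "STR1": (23.4, 26.2),
    "STR5": (21.0, 34.4),
    "STR6": (17.0, 31.4),
}
T_SHORT, T_LONG = 2.0, 20.0  # ns


def split_energy(e_short: float, e_long: float) -> tuple:
    """Fit E(T) = E_dyn + P_leak * T through the two measured periods.

    Assumes the switching (dynamic) energy per cycle is the same at both periods,
    so any extra energy at 20 ns is attributed to leakage. Note fJ/ns equals uW.
    """
    p_leak = (e_long - e_short) / (T_LONG - T_SHORT)
    e_dyn = e_short - p_leak * T_SHORT
    return e_dyn, p_leak


def crossover_period(a: tuple, b: tuple) -> float:
    """Clock period (ns) at which two fitted models give equal energy per cycle."""
    (e_a, p_a), (e_b, p_b) = a, b
    return (e_b - e_a) / (p_a - p_b)


if __name__ == "__main__":
    fits = {name: split_energy(*e) for name, e in TABLE_II.items()}
    for name, (e_dyn, p_leak) in fits.items():
        print(f"{name:5s} E_dyn = {e_dyn:5.1f} fJ, P_leak = {p_leak:.2f} uW")
    for name in ("STR5", "STR6"):
        t = crossover_period(fits[name], fits["PG"])
        print(f"{name} matches PG at Tcycle = {t:.1f} ns")
```

Under this model, STR5 and STR6 have roughly four to five times the fitted leakage of PG. STR6 reaches parity with PG near a 20-ns period, matching Table II, and STR5 reaches parity near 15 ns.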

In [16], the authors suggested that a VDDL of (VDDH + VTHH)/2 and a VTHL equal to VTHH (i.e., a single-VTH process) minimize the power dissipation of a dual-$V_{DD}$ system. With VDDH = 1.2 V and VTHH = 0.23 V, this rule gives VDDL ≈ 0.72 V, close to the 0.7-V supply studied here. Hence, in Fig. 4(f) we report results obtained for these values of VDDL and VTHLN/VTHLP. STR1, STR5, and STR6 continue to provide substantially improved results, as in the cases described earlier. In particular, STR6 provides a 20%–25% speed-up over DCVS and PG. It also enables 42% and 53% reductions in energy consumption at a fixed delay compared to PG and DCVS, respectively. In addition, STR2, STR3, and STR4 also provide energy and speed improvements over DCVS and PG at this supply and threshold voltage combination.

When comparing energy-delay products (EDP) [17], a popular metric for low-power design, we find that STR1 provides 8%–22% lower EDP than the PG design and 40% lower EDP than the DCVS structure. The other designs (STR2, STR3, STR4, STR5, and STR6) yield EDP values that are up to 29% smaller than PG and 15%–46% lower than DCVS. However, we suggest that EDP is a somewhat misleading metric or design guide for level converters, because level converters with smaller delays allow more logic gates to be assigned to VDDL and thus lead to lower total power. Since optimizing EDP leads to design points that are far from the minimal-delay point, a minimal-EDP level converter is undesirable. Instead, a metric that weights delay more heavily than energy is more useful for level converter comparisons. In [18] and [19], the authors suggest a metric based on energy multiplied by delay raised to a power m. As m becomes large (e.g., > 3), the $ED^m$ product approaches a delay-only metric. It is shown in [18] that $ED^2$ (m = 2) is best suited for trading off switching power against standby power in a design. In Table III, we show the resulting minimal $ED^2$ products for each of the level converter structures. Minimal-EDP design points typically correspond to 10%–20% larger delays than the minimal-$ED^2$ design points. Overall, the new structures achieve 25%–56% lower $ED^2$ than DCVS and up to 45% lower $ED^2$ than the PG structure. The new structures provide lower $ED^2$ products at all design points, with STR1 and STR6 performing particularly well throughout due to their low energy and delay.
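The effect of the exponent m can be seen by picking the minimal-$ED^m$ point on a single energy-delay curve for several values of m. The design points below are hypothetical and only illustrate the trend. They are not taken from Fig. 4 or Table III.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (delay in ps, energy in fJ)


def best_edm_point(points: List[Point], m: float) -> Point:
    """Design point minimizing E * D**m; m = 1 is EDP, m = 2 is ED^2."""
    return min(points, key=lambda p: p[1] * p[0] ** m)


if __name__ == "__main__":
    # Hypothetical sized design points along one energy-delay curve.
    curve = [(80, 60), (90, 40), (100, 32), (110, 28), (130, 24), (160, 22)]
    for m in (1, 2, 3, 4):
        d, e = best_edm_point(curve, m)
        print(f"m = {m}: minimum at {d} ps, {e} fJ")
```

On this toy curve, the optimum moves from 110 ps (m = 1) to 100 ps (m = 2) and reaches the fastest point at m = 4. The minimal-EDP point is thus 10% slower than the minimal-$ED^2$ point, in line with the gap reported above.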

As the data show, these level converters can achieve delays as low as 1.75 times the FO4 inverter delay in the reference technology. This indicates that it would be prohibitive to use several such level converters on a single logic path in a design with a short clock cycle on the order of 10–15 FO4 delays (e.g., heavily pipelined microprocessors). Thus, we suggest that these circuits, and asynchronous level conversion in general, are most appropriate for high-performance ASIC designs with tight power budgets (in which clock cycles are more on the order of 50–70 FO4 delays [20]).

Fig. 5. Level converter supply and process variation sensitivity. (s1 = % spread of delay at $\pm 10\%$ $V_{DD}$ corners measured from the nominal (typical) corner. s2 = % spread of delay at fast and slow process corners from the typical corner.)
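The cycle-time argument above is simple arithmetic. The sketch uses the 40-ps FO4 delay from Section II and the 1.75-FO4 best-case converter delay quoted above. The 12- and 60-FO4 cycles are picked from the quoted 10–15 and 50–70 FO4 ranges, and the number of converters per path is hypothetical.

```python
FO4_PS = 40.0  # FO4 inverter delay at 1.2 V, nominal VTH (Section II)


def lc_cycle_fraction(lc_delay_fo4: float, n_lcs: int, cycle_fo4: float) -> float:
    """Fraction of the clock cycle consumed by n level converters on one path."""
    return n_lcs * lc_delay_fo4 / cycle_fo4


if __name__ == "__main__":
    lc = 1.75  # FO4, best-case level converter delay
    print(f"LC delay = {lc * FO4_PS:.0f} ps")
    for label, cycle in (("pipelined CPU", 12), ("ASIC", 60)):
        for n in (1, 2, 3):
            frac = lc_cycle_fraction(lc, n, cycle)
            print(f"{label:13s} {cycle:2d} FO4 cycle, {n} LCs: {frac:.0%}")
```

Two converters consume about 29% of a 12-FO4 cycle but under 6% of a 60-FO4 cycle.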

## C. Robustness Analysis

Maintaining robustness is an important concern when circuits operate at voltages as low as those we consider. Also, the proposed circuits have a pass transistor at the input, so they may appear more susceptible to noise because of the lack of input isolation. However, as we explain below, this is not the case with these circuits, since the gate of the exposed pass transistor is always tied high. The proposed circuits were found to be closely comparable in robustness to the DCVS circuit and to standard logic gates such as inverters. Typical pass-transistor circuits require input isolation because they may pass erroneous values that are then sampled at the output. In contrast, the PG-based level converters in this work use their pass transistor only to pass the input voltage to an internal node that is connected to the gate of another MOSFET. Since the pass transistor is always ON, a noisy signal cannot be sampled (i.e., disconnected from the input) and stored on the internal node. Thus, from a noise perspective, the circuit behaves like the case where the input is tied directly to the gate of the pull-up pMOS [e.g., M3 in Fig. 2(b)]. In particular, the problematic "Pass 0" noise source [21], where a negative noise pulse on the input can turn ON an nMOS device with 0 V at its gate and mistakenly pass a 0 to the output, cannot occur here since the gate of the pass transistor is tied high. We studied and compared the robustness of the various LCs using the following methods, which represent typical on-chip switching behavior.

Fig. 6. Circuit robustness analysis.

We first studied the performance of the level converters at different process corners and with varying supply voltage and temperature, which gives insight into the sensitivity of each circuit to such variations. Results for VDDL = 0.8 V and VTHLP/VTHLN = -0.09 V/0.11 V are shown in Fig. 5. We evaluated all level converters under $\pm 10\%$ dc supply noise on both VDDL and VDDH and across the worst-case 130 nm fast and slow process corners. The delay variation is nearly the same for all level converters and shows an acceptable spread. For comparison, the FO4 inverter delay in this technology varies by 18% for $\pm 10\% V_{DD}$ variation and by 51% across the fast/slow process corners, with these numbers rising to 20% and 56%, respectively, at reduced voltages ($V_{DD} = 0.8 \text{ V}$ and VTHLP/VTHLN = -0.09 V/0.11 V).
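The spreads s1 and s2 in Fig. 5 are defined only in words. The sketch below uses one plausible reading, which is an assumption: the difference between the slower and faster corner delays, normalized to the typical-corner delay. The function name and the delay values are hypothetical and serve only to illustrate the calculation.

```python
def corner_spread_pct(nominal_ps: float, corner_a_ps: float, corner_b_ps: float) -> float:
    """Percent delay spread between two corners, normalized to the typical corner.

    Assumed definition: (slower corner - faster corner) / typical * 100.
    """
    if nominal_ps <= 0:
        raise ValueError("nominal delay must be positive")
    slow, fast = max(corner_a_ps, corner_b_ps), min(corner_a_ps, corner_b_ps)
    return (slow - fast) / nominal_ps * 100.0


# Hypothetical delays (ps), chosen only to illustrate the calculation.
typical = 100.0
s1 = corner_spread_pct(typical, 110.0, 92.0)   # -10% VDD and +10% VDD corners
s2 = corner_spread_pct(typical, 121.0, 70.0)   # slow and fast process corners
print(f"s1 = {s1:.1f}%, s2 = {s2:.1f}%")
```

If the figure instead reports the largest one-sided deviation from typical, only the return expression changes.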

In addition, triangular noise pulses with a base width of 80 ps (2 FO4 inverter delays) and a peak magnitude of 0.3 V (25% of VDDH and 37.5% of VDDL in this case) were applied to the inputs of each level converter, with each converter sized for optimal speed. In all cases there was no output glitching whatsoever, implying that these asynchronous level converters tolerate substantial input noise. The static voltage transfer characteristics all show high gain in their transition regions, which lie within 50 mV of VDDL/2 in all cases.
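A triangular glitch of this kind is typically described to a circuit simulator as a piecewise-linear (PWL) source. The sketch below builds an HSPICE-style PWL source card for the 80 ps, 0.3 V pulse described above; the helper name, source names, node name, and 100 ps start time are hypothetical, and the exact card syntax should be checked against the simulator manual.

```python
def triangular_glitch_pwl(name: str, node: str, baseline_v: float, peak_v: float,
                          t_start_ps: float, base_width_ps: float) -> str:
    """Return an HSPICE-style PWL voltage-source card for a triangular glitch.

    The source holds baseline_v, ramps linearly to baseline_v + peak_v at the
    midpoint of the base, and returns to baseline_v at the end of the base.
    A negative peak_v produces a negative-going glitch. Times are in ps.
    """
    t_mid = t_start_ps + base_width_ps / 2
    t_end = t_start_ps + base_width_ps
    points = [
        (0.0, baseline_v),
        (t_start_ps, baseline_v),
        (t_mid, baseline_v + peak_v),
        (t_end, baseline_v),
    ]
    body = " ".join(f"{t:g}p {v:g}" for t, v in points)
    return f"{name} {node} 0 PWL({body})"


# 80 ps base (2 FO4) and 0.3 V peak as in the text; names are hypothetical.
print(triangular_glitch_pwl("Vnoise_pos", "lc_in", 0.0, 0.3, 100, 80))
print(triangular_glitch_pwl("Vnoise_neg", "lc_in", 0.8, -0.3, 100, 80))
```

The second call models a negative-going glitch on an input resting at VDDL = 0.8 V.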

#### TABLE IV CIRCUIT ROBUSTNESS ANALYSIS

[VDDL = 0.6 V; VTHLP/VTHLN = -0.09 V/0.11 V] (a) Failure voltage of the circuits for both polarities of input noise glitch (positive glitch starting and settling at 0 V; negative glitch starting and settling at VDDL). (b) Failure coupling capacitance for both swing directions of the aggressor (VDDH to 0 and 0 to VDDH). (c) The analysis in (a) repeated in the presence of +10% VDDL and VDDH variation.

(a)

| Glitch type                                      | INVERTER | DCVS   | STR5   |
|--------------------------------------------------|----------|--------|--------|
| Positive-going (higher value means more robust)  | 0.48 V   | 0.53 V | 0.53 V |
| Negative-going (lower value means more robust)   | 0.06 V   | 0.14 V | 0.14 V |

(b)

| Aggressor swing direction                   | INVERTER | DCVS    | STR5    |
|---------------------------------------------|----------|---------|---------|
| VDDH to 0 (higher value means more robust)  | 6.5 fF   | 5.8 fF  | 8.5 fF  |
| 0 to VDDH (higher value means more robust)  | 9.4 fF   | 11.2 fF | 10.8 fF |

(c)

| Glitch type                                      | INVERTER | DCVS   | STR5   |
|--------------------------------------------------|----------|--------|--------|
| Positive-going (higher value means more robust)  | 0.51 V   | 0.56 V | 0.56 V |
| Negative-going (lower value means more robust)   | 0.09 V   | 0.18 V | 0.18 V |

Since circuit robustness is expected to be worst at the lowest supply voltages (VDDL = 0.6 V), we further investigated robustness at such low voltages. We applied more pessimistic triangular noise pulses with a base width of 120 ps (twice the FO4 delay at VDDL = 0.6 V) and varied the peak amplitude (Vpk) until the circuit failed, i.e., until the output reached half of its nominal output-high voltage (1.2 V for the level converters in our studies and 0.6 V for the inverter used here for comparison). Fig. 6(a) shows this setup. We compared the DCVS and STR5 level converters to an inverter with similar input and output capacitance and observed that the robustness of these circuits compares closely with that of standard logic gates such as inverters. Table IV(a) reports our results for this study. Among the PG-based LCs, we report numbers only for STR5, since its raised pass-transistor voltage is expected to make it the most susceptible to noise; the robustness of PG, STR1, STR2, STR3, STR4, and STR6 is expected to be comparable to or better than that of STR5.
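The amplitude sweep described above amounts to a search for the smallest Vpk that triggers the failure criterion. The sketch below implements that criterion and a bisection search, assuming failure is monotonic in Vpk. The transient simulation is replaced by a hypothetical toy model (`toy_lc_output`) whose coefficients are arbitrary and do not represent any circuit in this paper.

```python
from typing import Callable, Sequence


def output_failed(v_out: Sequence[float], nominal_high_v: float, output_starts_high: bool) -> bool:
    """Failure criterion from the text: the output reaches half its nominal high level."""
    threshold = 0.5 * nominal_high_v
    if output_starts_high:
        return min(v_out) <= threshold
    return max(v_out) >= threshold


def find_failure_amplitude(fails: Callable[[float], bool], v_lo: float, v_hi: float,
                           tol_v: float = 1e-3) -> float:
    """Bisect for the smallest glitch amplitude at which `fails` is True.

    Assumes failure is monotonic in amplitude, fails(v_lo) is False and
    fails(v_hi) is True. Returns an amplitude within tol_v above the threshold.
    """
    if fails(v_lo) or not fails(v_hi):
        raise ValueError("need fails(v_lo) == False and fails(v_hi) == True")
    while v_hi - v_lo > tol_v:
        v_mid = 0.5 * (v_lo + v_hi)
        if fails(v_mid):
            v_hi = v_mid
        else:
            v_lo = v_mid
    return v_hi


# Hypothetical stand-in for a transient simulation of a level converter whose
# output idles low: maps glitch amplitude to a few output samples (arbitrary model).
def toy_lc_output(vpk: float) -> list[float]:
    peak = min(1.2, max(0.0, (vpk - 0.4) * 6.0))
    return [0.0, peak, 0.0]


vpk_fail = find_failure_amplitude(
    lambda vpk: output_failed(toy_lc_output(vpk), nominal_high_v=1.2, output_starts_high=False),
    v_lo=0.0, v_hi=0.6)
print(f"failure amplitude ~ {vpk_fail:.3f} V")
```

For negative-going glitches, the same search can be run on the dip depth below VDDL, with the failure voltage reported as VDDL minus that depth.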

We also studied a scenario in which the level converter is part of a larger dynamic circuit [Fig. 6(b)]. The input of the circuit under test acts as the victim line (a dynamic node with a weak keeper), and a capacitively coupled aggressor operating at VDDH serves as the coupling noise source. For a fixed victim-line ground capacitance of 10 fF, the coupling capacitance was increased until the circuit failed. Table IV(b) summarizes the results; the tabulated value is the coupling capacitance at which the circuit failed, so a higher capacitance implies superior robustness. In this scenario, too, we found the level converters to be at least as robust as the inverter (i.e., they required a larger coupling capacitance, and hence more coupled noise, to fail).
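To relate the coupling capacitances in Table IV(b) to a noise amplitude, the sketch below uses a standard first-order charge-sharing estimate, which is an assumption here and not the simulation used in the study. It treats the victim as a floating node (no keeper) hit by an instantaneous full-VDDH aggressor edge, so it gives a worst-case glitch. The function name is hypothetical; the 10 fF ground capacitance and VDDH = 1.2 V come from the text.

```python
def coupled_noise_peak_v(c_couple_f: float, c_ground_f: float, v_aggressor_swing: float) -> float:
    """First-order charge-sharing estimate of coupled noise on a floating victim node.

    Ignores the keeper and assumes an instantaneous aggressor edge, so it is a
    worst-case estimate of the glitch a real dynamic node would see.
    """
    return v_aggressor_swing * c_couple_f / (c_couple_f + c_ground_f)


C_GROUND_F = 10e-15  # victim ground capacitance used in the study
VDDH_V = 1.2

for cc_ff in (6.5, 9.4, 10.8, 11.2):
    dv = coupled_noise_peak_v(cc_ff * 1e-15, C_GROUND_F, VDDH_V)
    print(f"Cc = {cc_ff:4.1f} fF -> ~{dv:.2f} V coupled onto the victim")
```

Because the keeper restores the node and the aggressor edge has finite slew, the actual glitch at each failure capacitance is smaller than this estimate.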

The scenario of Fig. 6(a) was also examined in the presence of +10% dc supply noise on both VDDH and VDDL to test the circuits under harsher noise conditions. Table IV(c) reports the results of this study; again, the level converters are comparable in robustness to the inverter.

#### D. Level Converter Area

Table V compares the areas (calculated using total transistor width) of the level converter circuits for the VDDL = 0.8 V, VTHLP/VTHLN = -0.21 V/0.23 V case. All new level converters except STR5 and STR6 have less total device width than DCVS. Compared with PG, STR1 and STR2 have comparable area, while STR3, STR4, STR5, and STR6 have higher area. The sharp rise in the area of STR5 and STR6 is due to the added buffer capacitance, Cbuf, of approximately 0.5 $\mu m^2$ (estimated for the 8 fF Cbuf based on the use of a gate-oxide capacitor). Note, however, that PG, STR1, STR5, and STR6 have no pMOS devices with sources tied to VDDL, which allows a single n-well in the cell. In contrast, DCVS, STR2, STR3, and STR4 require an additional VDDL n-well, and the associated well spacing, because of the inverters at their inputs. This offsets the area penalty of STR5 and STR6 in particular, so the relative areas reported in Table V are conservative for PG, STR1, STR5, and STR6.
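The Cbuf area estimate and the Table V comparison can both be reproduced with a few lines. The sketch below models Cbuf as a parallel-plate gate-oxide capacitor, A = C / Cox with Cox = k·ε0 / tox. The oxide thickness of about 2 nm is an assumption (the paper does not state it), and the helper names are hypothetical. The areas are copied from Table V and normalized to DCVS.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity
K_SIO2 = 3.9              # relative permittivity of SiO2

# Table V areas in um^2
TABLE_V_AREA_UM2 = {"DCVS": 0.91, "PG": 0.61, "STR1": 0.67, "STR2": 0.67,
                    "STR3": 0.82, "STR4": 0.81, "STR5": 1.27, "STR6": 1.33}


def gate_oxide_cap_area_um2(cap_f: float, tox_m: float) -> float:
    """Plate area of a parallel-plate gate-oxide capacitor, A = C / Cox."""
    cox_f_per_m2 = K_SIO2 * EPS0_F_PER_M / tox_m
    return cap_f / cox_f_per_m2 * 1e12  # m^2 -> um^2


def area_relative_to(ref: str, areas: dict[str, float]) -> dict[str, float]:
    """Normalize every entry to the area of the reference circuit."""
    return {name: a / areas[ref] for name, a in areas.items()}


# Assumed oxide thickness (~2 nm); not stated in the paper.
print(f"Cbuf (8 fF) area: {gate_oxide_cap_area_um2(8e-15, 2e-9):.2f} um^2")
for name, ratio in area_relative_to("DCVS", TABLE_V_AREA_UM2).items():
    print(f"{name}: {ratio:.2f} x DCVS")
```

With tox = 2 nm, the estimate is about 0.46 µm². This is consistent with the ~0.5 µm² figure above and with the 0.46 µm² gap between STR5 and STR4 in Table V.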

#### E. Impact of Level Converter Performance on System Level Power Dissipation

To investigate how the new faster and lower energy LCs will benefit overall chip-level (or system) power consumption, we first point to empirical results reported in [3], [22], in which level converters in earlier multi-V<sub>DD</sub> designs (using ECVS) contributed an 8%–10% power overhead. Extrapolating from this data point, an LC with identical delay properties and 40% lower energy could reduce overall system power by about 4% (40% of a 10% overhead). In [3], DCVS-based level converters were used; in this case, the energy reduction due to the newly proposed LCs can approach 55%, and the system power reduction could be as much as 5.5%. Alternatively, the proposed LCs, by virtue of their speed improvement, could save timing slack and thus allow a greater portion of the logic cells to be assigned to VDDL. Without implementing a complete ECVS infrastructure, however, it is difficult to determine precisely how much impact a faster or lower power LC will have.
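The extrapolation above multiplies the LC share of total power by the fractional LC energy reduction. The sketch below, with a hypothetical function name, evaluates that first-order model over the 8%–10% overhead range and the 40% and 55% energy reductions. Like the text, it assumes unchanged delay and no re-optimization of the rest of the circuit.

```python
def system_power_saving_pct(lc_overhead_pct: float, lc_energy_cut_pct: float) -> float:
    """First-order system power saving from a lower-energy LC at unchanged delay.

    saving = (LC share of total power) x (fractional LC energy reduction).
    Ignores any re-optimization enabled by faster LCs.
    """
    return lc_overhead_pct * lc_energy_cut_pct / 100.0


for overhead in (8.0, 10.0):
    for cut in (40.0, 55.0):
        print(f"overhead {overhead:.0f}%, LC energy -{cut:.0f}% -> "
              f"system power -{system_power_saving_pct(overhead, cut):.1f}%")
```

The 10% overhead cases reproduce the 4% and 5.5% figures; an 8% overhead gives 3.2% and 4.4%.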

To address this, the authors of [23] extended their tool to implement an ECVS approach, allowing us to study the system-level impact of the level converters for various ISCAS85 benchmark circuits [24]. This optimization tool is a linear-programming-based approach that minimizes total (static + dynamic) power under delay constraints (i.e., it holds the circuit's total delay fixed) by using dual supplies, dual threshold voltages, and gate resizing simultaneously, while accounting for the level converter power–delay penalties. The sensitivity of the system-level power dissipation to level converter performance was obtained by sweeping the level converter power and delay overheads. Fig. 7 shows this sensitivity for two sample circuits (c1908 and c880) [VDDH = 1.3 V, VDDL = 0.84 V].<sup>1</sup> The reported power saving in this figure is the power saved compared to an initially optimized design that uses a single $V_{DD}$. The level converter energy and delay were swept within the range of improvements offered by the proposed level converters, and both are shown normalized to the DCVS level converter. Thus, the data point corresponding to a normalized LC energy and LC delay of 1.0 represents the design obtained when the DCVS level converter is used.

TABLE V Level Converter Area

| LC   | AREA<br>(μm²) |
|------|---------------|
| DCVS | 0.91          |
| PG   | 0.61          |
| STR1 | 0.67          |
| STR2 | 0.67          |
| STR3 | 0.82          |
| STR4 | 0.81          |
| STR5 | 1.27          |
| STR6 | 1.33          |

Fig. 7. Sensitivity of total power savings to LC delay and energy for c880 and c1908 benchmarks. Reported power saving is with respect to the initial single $V_{DD}$ optimized design.

Fig. 8. Concept of embedded logic level converting circuits.

Fig. 9. Level converting NAND circuit. (a) DCVS implementation. (b) STR1 implementation.

Given a maximum expected energy improvement of 50% at fixed delay, the total system power can be reduced by 3.8% [e.g., moving from a normalized LC energy of 1.0 to 0.5 in Fig. 7(a)]. Using the STR6 topology, which is $\sim 20\%$ faster than traditional LCs at constant energy, the total system power can be cut by up to 7.3% [e.g., moving from a normalized LC delay of 1.0 to 0.8 in Fig. 7(b)]. These results also support the position that faster LCs are potentially more beneficial to total system power than lower energy designs. Furthermore, these appreciable reductions in total system power require no design tradeoffs, since the new LCs can simply replace the traditional LC designs in a standard cell library.
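One crude way to compare the two routes is to divide each system-level saving by the size of the LC improvement that produced it. The sketch below does this for the two endpoints quoted above. It assumes savings scale linearly with the improvement, which the Fig. 7 surfaces need not satisfy, and the function name is hypothetical.

```python
def saving_per_point(system_saving_pct: float, lc_improvement_pct: float) -> float:
    """System power saved (in %) per percentage point of LC improvement (linear assumption)."""
    return system_saving_pct / lc_improvement_pct


energy_route = saving_per_point(3.8, 50.0)   # 50% lower LC energy -> 3.8% system saving
delay_route = saving_per_point(7.3, 20.0)    # 20% lower LC delay  -> 7.3% system saving
print(f"energy: {energy_route:.3f} %/pt, delay: {delay_route:.3f} %/pt, "
      f"ratio: {delay_route / energy_route:.1f}x")
```

Under this normalization, each point of LC delay improvement is worth roughly 4.8 times as much as each point of LC energy improvement.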

<sup>1</sup>Although the VDDH used here differs from that used in the remainder of the paper, we expect the general trends to be unaffected by this small discrepancy.

# IV. EMBEDDED LOGIC LEVEL CONVERTING CIRCUITS

#### A. Concept

Fig. 8 shows the possible VDDL-driven/VDDH-driven cell placement scenarios that may arise in a dual $V_{DD}$ system. In Fig. 8(a), level conversion is performed by a dedicated LC at the output of the logic gate (a NAND in this case), before the next gate is fed. Fig. 8(b) shows the case where level conversion is performed before the logic gate. Both scenarios are suboptimal in energy as well as delay, since the separate level conversion stage adds a penalty to both metrics. In Fig. 8(c), a single circuit acts as both a logic gate and a level converter; it behaves as a "level converting NAND gate," as depicted by the signal waveforms. The level converting functionality of such embedded logic circuits can be built using any of the circuits discussed earlier. We studied the performance of such a level converting 2-input NAND built from the standard DCVS level converter and from level converter STR1; the corresponding circuits are shown in Fig. 9.
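At the logic level, the embedded configuration of Fig. 8(c) combines a NAND function with a VDDL-to-VDDH voltage translation. The sketch below is an idealized behavioral model, not a transistor-level description of the circuits in Fig. 9. It reads VDDL-domain inputs against an assumed VDDL/2 switching threshold and drives the output to full VDDH/0 rails; the function name is hypothetical.

```python
def level_converting_nand(a_v: float, b_v: float, vddl: float, vddh: float) -> float:
    """Idealized behavioral model of a level-converting 2-input NAND [Fig. 8(c)].

    Inputs are VDDL-domain voltages interpreted against an assumed VDDL/2
    switching threshold; the output swings to full VDDH/0 rails, which is what
    lets it drive VDDH-supplied gates directly.
    """
    a_high = a_v > vddl / 2
    b_high = b_v > vddl / 2
    return 0.0 if (a_high and b_high) else vddh


VDDL, VDDH = 0.8, 1.2
for a in (0.0, VDDL):
    for b in (0.0, VDDL):
        print(f"A={a:.1f} V, B={b:.1f} V -> OUT={level_converting_nand(a, b, VDDL, VDDH):.1f} V")
```

The VDDL/2 threshold is a simplification. It is in line with the transition regions within 50 mV of VDDL/2 reported for the level converters in the robustness study.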

#### B. Simulation Results

We studied the performance of these circuits with the value of VDDL set to 0.6 V and 0.8 V and the value of VTHLP/VTHLN set at -0.09 V/0.11 V. All three configurations shown in Fig. 8 were optimized using the HSPICE optimizer to draw comparisons among the three circuits.



Fig. 10. Energy versus delay characteristics for embedded NAND configurations compared to traditional implementations.

Fig. 10 shows the energy–delay design space of the embedded logic circuits of Fig. 9(a) and (b), together with the corresponding traditional implementations. The leftmost data point on each curve corresponds to the fastest achievable delay of that circuit.

These plots show that appreciable gains in both energy and delay are possible with the embedded logic circuits, especially at higher values of VDDL. For both types of level converters, the NAND-LC configurations [Fig. 8(a)] are considerably slower than both the corresponding LC-NAND configurations [Fig. 8(b)] and the embedded NAND configurations. This degradation occurs because NAND-LC combines two logic stages with slow stacked nMOS devices operating at low voltage. The embedded NAND structure also has stacked nMOS devices at low voltage, but it has only one logic stage.

At 0.8 V [Fig. 10(a)], the embedded STR1 circuit is 4% faster than the embedded DCVS NAND structure, with 55% lower energy consumption, making it the best choice at VDDL = 0.8 V from the standpoint of both speed and energy. The embedded DCVS circuit is 17% faster than the corresponding LC-NAND circuit, but at 56% higher energy; at the same energy (~ 55 fJ), it is still 10% faster. Similarly, the embedded STR1 circuit is 15% faster than the corresponding LC-NAND circuit at the same energy (~ 45 fJ). Thus, it is desirable to use embedded logic structures for level conversion whenever possible at this value of VDDL.
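As another way to read these speed/energy pairs, they can be combined into a relative energy-delay product (EDP). The paper itself does not use EDP as a figure of merit. The sketch below reads "X% faster" as "X% lower delay", which is an assumption, and the function name is hypothetical.

```python
def relative_edp(delay_ratio: float, energy_ratio: float) -> float:
    """Energy-delay product of circuit X relative to circuit Y (both ratios X/Y)."""
    return delay_ratio * energy_ratio


# Embedded STR1 vs. embedded DCVS at VDDL = 0.8 V: 4% lower delay, 55% lower energy.
print(f"{relative_edp(0.96, 0.45):.3f}")
# Embedded DCVS vs. DCVS LC-NAND at VDDL = 0.8 V: 17% lower delay, 56% higher energy.
print(f"{relative_edp(0.83, 1.56):.3f}")
```

By this measure, the embedded STR1 NAND improves EDP by roughly 57%. The fastest embedded DCVS design, by contrast, raises EDP by about 1.29x, which is one reason the equal-energy comparison (10% faster at ~55 fJ) is informative.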

At 0.6 V [Fig. 10(b)], the LC-NAND (STR1) circuit outperforms the embedded circuits: owing to the superiority of STR1, it is 8% faster than the corresponding DCVS circuit at the same energy (~ 47 fJ). In contrast to the VDDL = 0.8 V case, the embedded DCVS circuit is the faster of the two embedded circuits; it is 8% faster than the embedded STR1 circuit at a 32% energy penalty. Most existing multi-$V_{DD}$ designs use a VDDL of ~ 70% of VDDH, similar to the 0.8 V case in our analyses [16], [25], [26]. However, recent work has shown that total power is minimized by using a very low second supply voltage of roughly half of VDDH, or 0.6 V in our studies [2], [9]. Our results imply that the usefulness of embedded logic level converters is a strong function of the voltage levels used in a particular multi-$V_{DD}$ design.

We point out that in the 0.6 V study [Fig. 10(b)], the embedded DCVS circuit is faster than the corresponding LC-NAND circuit, whereas the opposite holds for the STR1 circuits. This is because the DCVS embedded circuit has an output buffer while the STR1 embedded circuit does not. We used these specific circuit structures in order to compare circuits implementing the NAND logic function, which precludes adding an output buffer to the STR1 case. Without the buffer, the output node of the STR1 circuit [node "out" in Fig. 9(b)] sees a high load, which slows the circuit considerably: a high load on this node requires larger pull-down and pull-up stacks, aggravating the problem of contention. The DCVS embedded circuit, on the other hand, has an output buffer that reduces the load on the analogous node (the buffer's input), resulting in superior performance. This behavior is aggravated by the pull-down stack, which consists of two series nMOS transistors whose gates are at 0.6 V, leading to a gate overdrive ($V_{DD}$ - VTH) of only 0.6 V - 0.11 V = 0.49 V (further reduced by the body effect). The nMOS stack also explains why the embedded DCVS circuit is 16% faster than the corresponding LC-NAND configuration for VDDL = 0.8 V, but only 5% faster at VDDL = 0.6 V. The LC-NAND circuits, in comparison, have no VDDL-driven nMOS stacks. With further voltage and process scaling, we expect these observations to become more pronounced, mainly because gate overdrive will be strongly reduced by leakage-imposed limits on VTH scaling.
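The overdrive argument can be made quantitative with the alpha-power-law MOSFET model, in which saturation current scales as (VGS − VTH)^α. This model, and the value α = 1.3, are assumptions here, not the paper's analysis. The sketch below compares the per-device drive at VDDL = 0.6 V and 0.8 V using VTHN = 0.11 V from the text, ignoring the body effect and the series stack; the function names are hypothetical.

```python
def gate_overdrive_v(vgs_v: float, vth_v: float) -> float:
    """Gate overdrive VGS - VTH."""
    return vgs_v - vth_v


def relative_drive(vov_v: float, vov_ref_v: float, alpha: float = 1.3) -> float:
    """Alpha-power-law saturation current ratio, Id ~ (VGS - VTH)^alpha."""
    return (vov_v / vov_ref_v) ** alpha


VTHN = 0.11  # nMOS threshold used in this section (body effect ignored)
vov_06 = gate_overdrive_v(0.6, VTHN)   # 0.49 V
vov_08 = gate_overdrive_v(0.8, VTHN)   # 0.69 V
print(f"overdrive: {vov_06:.2f} V vs {vov_08:.2f} V; "
      f"relative drive ~ {relative_drive(vov_06, vov_08):.2f}")
```

With α = 1.3, each nMOS in the stack has roughly 36% less drive at 0.6 V than at 0.8 V, before the stack and body effect reduce it further.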

# V. CONCLUSIONS

In this paper, we presented six new asynchronous level converters that are 10%–61% more energy efficient than previously presented level converters. We described a level converter that can be designed up to 25% faster than existing level converters. The advantage of the new level converters over those previously proposed grows when the second power supply is aggressively scaled (e.g., 0.6 V in a 130 nm process) and gate overdrive is small, which is the case in sub-1 V technologies. The impact of these improved level converters on total system power is estimated using an ECVS-based algorithm; we find that improving delay by 20% can reduce total power by up to 7.3%, while an energy savings of 50% per LC yields savings of 3.8%. Our study also indicates that the delay of an asynchronous level converter can be reduced to less than two FO4 delays of the technology being used. Finally, we described a new class of circuits that embed the functionality of standard logic gates into the level converting circuit. These circuits promise major improvements in performance (15% faster or 30% lower energy) and help mitigate the cost of level conversion in multi-$V_{DD}$ designs.

# ACKNOWLEDGMENT

The authors wish to acknowledge the helpful discussions with V. Sathe of the University of Michigan, Ann Arbor, regarding the operation and optimization of circuit STR5. We also thank D. Chinnery from the University of California, Berkeley, for generating the data for the impact of level converters on system-level power dissipation.

# REFERENCES

- [1] K. Usami and M. Horowitz, "Clustered voltage scaling technique for low-power design," in *Proc. Int. Symp. Low-Power Electronics Design*, 1995, pp. 3–8.
- [2] C. Chen, A. Srivastava, and M. Sarrafzadeh, "On gate level power optimization using dual-supply voltages," *IEEE Trans. VLSI Syst.*, vol. 9, pp. 616–629, Oct. 2001.
- [3] K. Usami, M. Igarashi, F. Minami, M. Ishikawa, M. Ichida, and K. Nogami, "Automated low-power technique exploiting multiple supply voltages applied to a media processor," *IEEE J. Solid-State Circuits*, pp. 463–472, Mar. 1998.
- [4] R. Puri, L. Stok, J. Cohn, D. Kung, D. Pan, D. Sylvester, A. Srivastava, and S. Kulkarni, "Pushing ASIC performance in a power envelope," in *Proc. Design Automation Conf.*, 2003, pp. 788–793.
- [5] N. Rohrer *et al.*, "A 480 MHz RISC microprocessor in a 0.12-μm L<sub>eff</sub> CMOS technology with copper interconnects," in *Proc. Int. Solid-State Circuits Conf.*, 1998, pp. 240–241.
- [6] S. Sirichotiyakul, T. Edwards, C. Oh, J. Zuo, A. Dharchoudhury, R. Panda, and D. Blaauw, "Stand-by power minimization through simultaneous threshold voltage selection and circuit sizing," in *Proc. Design Automation Conf.*, 1999, pp. 436–441.
- [7] Q. Wang and S. Vrudhula, "Algorithms for minimizing standby power in deep submicron, dual–V<sub>t</sub> CMOS circuits," *IEEE Trans. Computer-Aided Design*, vol. 21, pp. 306–318, Mar. 2002.
- [8] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS," *IEEE J. Solid-State Circuits*, pp. 847–854, Aug. 1995.
- [9] A. Srivastava and D. Sylvester, "Minimizing total power by simultaneous V<sub>dd</sub>/Vth assignment," in *Proc. Asia South Pacific Design Automation Conf.*, 2003, pp. 400–403.
- [10] R. Ho, K. Mai, and M. Horowitz, "The future of wires," *Proc. IEEE*, vol. 89, pp. 490–504, Apr. 2001.
- [11] Avant! Star-Hspice Manual, 2001.2 ed., 2001.
- [12] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors," in *Proc. Int. Symp. Low-Power Electronics Design*, 2001, pp. 147–152.
- [13] F. Ishihara, F. Sheikh, and B. Nikolic, "Level conversion for dual-supply systems," in *Proc. Int. Symp. Low-Power Electronics Design*, Aug. 2003, pp. 164–167.
- [14] M. Hamada *et al.*, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," in *Proc. Custom Integrated Circuits Conf.*, 1998, pp. 495–498.
- [15] "International Technology Roadmap for Semiconductors," 2001.
- [16] M. Hamada, Y. Ootaguro, and T. Kuroda, "Utilizing surplus timing for power reduction," in *Proc. Custom Integrated Circuits Conf.*, 2001, pp. 89–92.
- [17] R. Gonzalez and M. Horowitz, "Energy dissipation in general purpose microprocessors," *IEEE J. Solid-State Circuits*, vol. 31, pp. 1277–1284, Sept. 1996.
- [18] K. Takeuchi and T. Mogami, "A new multiple transistor parameter design methodology for high speed low power SoC's," in *Proc. Int. Electron Devices Meeting*, 2001, pp. 22.6.1–22.6.4.
- [19] V. Zyuban and P. Strenski, "Unified methodology for resolving power-performance tradeoffs at the microarchitectural and circuit levels," in *Proc. Int. Symp. Low-Power Electronics Design*, 2002, pp. 166–171.
- [20] D. Chinnery and K. Keutzer, *Closing the Gap Between ASIC and Custom*. Norwell, MA: Kluwer, 2002.

- [21] K. Bernstein, K. Carrig, C. Durham, P. Hansen, D. Hogenmiller, E. Nowak, and N. Rohrer, *High Speed CMOS Design Styles*. Norwell, MA: Kluwer, 1998.
- [22] S. H. Kulkarni, A. Srivastava, and D. Sylvester, "A new algorithm for improved V<sub>DD</sub> assignment in low power dual V<sub>DD</sub> systems," in *Proc. Int. Symp. Low-Power Electronics Design*, Newport Beach, CA, 2004.
- [23] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, and K. Keutzer, "Minimization of dynamic and static power through joint assignment of threshold voltages and sizing optimization," in *Proc. Int. Symp. Low-Power Electronics Design*, 2003, pp. 158–163.
- [24] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinational benchmark circuits and a target translator in Fortran," in *Proc. Int. Symp. Circuits and Systems*, May 1985, pp. 695–698.
- [25] M. Takahashi *et al.*, "A 60-mW MPEG4 video codec using clustered voltage scaling with variable supply-voltage scheme," *IEEE J. Solid-State Circuits*, vol. 33, pp. 1772–1780, Nov. 1998.
- [26] T. Kuroda and M. Hamada, "Low-power CMOS digital design with dual embedded adaptive power supplies," *IEEE J. Solid-State Circuits*, vol. 35, pp. 652–655, Apr. 2000.



**Sarvesh H. Kulkarni** (S'01) received the B.Tech. degree in electrical engineering and the M.Tech. degree in microelectronics from the Indian Institute of Technology, Bombay, India, in 2001, and the M.S. degree in electrical engineering from the University of Michigan, Ann Arbor, MI, in 2003.

Since 2001, he has been a Research Assistant with the VLSI Design–Automation Laboratory at the University of Michigan, where he is currently working toward the Ph.D. degree in electrical engineering. In winter 2004, he was with the High-Performance Circuits Research Group at Intel Corporation's Circuit Research Laboratory, Hillsboro, OR, where he worked on high-performance arithmetic circuits and interconnect signaling circuit techniques. His current research interests include circuit design and algorithmic approaches for low-power VLSI design.



**Dennis Sylvester** (S'95–M'00) received the B.S. degree in electrical engineering, summa cum laude, from the University of Michigan, Ann Arbor, in 1995, and the M.S. and Ph.D. degrees in electrical engineering from University of California, Berkeley, in 1997 and 1999, respectively.

From 1996 to 1998, he was with Hewlett-Packard Laboratories, Palo Alto, CA. After working as a Senior R&D Engineer in the Advanced Technology Group of Synopsys, Mountain View, CA, he is now an Assistant Professor of Electrical Engineering at the University of Michigan, Ann Arbor. He has published numerous articles in his field of research, which includes the modeling, characterization, and analysis of on-chip interconnect, low-power circuit design and design automation techniques, and variability-aware circuit approaches.

Dr. Sylvester's doctoral dissertation research was recognized with the 2000 David J. Sakrison Memorial Prize as the most outstanding research in the UC-Berkeley Electronics Engineering and Computer Science department. He received an NSF CAREER award, the 2000 Beatrice Winner Award at ISSCC, two outstanding research presentation awards from the Semiconductor Research Corporation, and a best student paper award at the 1997 International Semiconductor Device Research Symposium. He is the recipient of the 2003 ACM SIGDA Outstanding New Faculty Award and the 1938E Award for teaching and mentoring, which is the highest award given to a junior faculty member in the Michigan College of Engineering. He has served on the technical program committee of numerous design automation and circuit design conferences and was general chair for the 2003 ACM/IEEE System-Level Interconnect Prediction (SLIP) Workshop. In addition, he helps to define the circuit and physical design roadmap as a member of the International Technology Roadmap for Semiconductors (ITRS) U.S. Design Technology Working Group. He is a member of ACM, American Society of Engineering Education, and Eta Kappa Nu.