<processor> An implementation of the Advanced RISC Machine microprocessor
architecture using the micropipeline design style. In April 1994 the Amulet
group in the Computer Science department of Manchester University took delivery
of the AMULET1 microprocessor. This was their first large-scale asynchronous
circuit and the world's first implementation of a commercial microprocessor
architecture (ARM) in asynchronous logic.
Work was begun at the end of 1990 and the design despatched for fabrication in
February 1993. The primary intent was to demonstrate that an asynchronous
microprocessor can consume less power than a synchronous design.
The design incorporates a number of concurrent units which cooperate to give
instruction level compatibility with the existing synchronous part. These
include an Address unit, which autonomously generates instruction fetch requests
and interleaves (nondeterministically) data requests from the Execution unit; a
Register file which supplies operands, queues write destinations and handles
data dependencies; an Execution unit which includes a multiplier, a shifter and
an ALU with data-dependent delay; a Data interface which performs byte
extraction and alignment and includes an instruction prefetch buffer, and a
control path which performs instruction decode. These units only synchronise to
exchange data.
The design demonstrates that all the usual problems of processor design can be
solved in this asynchronous framework: backward instruction set compatibility,
interrupts and exact exceptions for memory faults are all covered. It also
demonstrates some unusual behaviour, for instance nondeterministic prefetch
depth beyond a branch instruction (though the instructions which actually get
executed are, of course, deterministic). There are also some unusual problems
for compiler optimisation: the metric used to compare alternative code
sequences is execution time, which is continuous rather than a discrete count of
clock cycles, and the nondeterminism in external behaviour must also be taken
into account.
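The claim that prefetch depth past a branch is nondeterministic while the executed instruction stream is not can be illustrated with a toy model. This is a hypothetical sketch, not AMULET1's actual logic: the program, its addresses and the `prefetch_depth` parameter are all invented for illustration. In the real chip the depth depends on circuit timing; here it is simply an argument, so the same program can be run with different depths and the results compared.

```python
# Hypothetical toy program: address -> (kind, operand).
# Addresses 2-4 lie on the wrong path after the unconditional branch.
PROGRAM = {
    0: ("op", "a"),
    1: ("branch", 10),
    2: ("op", "x"),
    3: ("op", "y"),
    4: ("op", "z"),
    10: ("op", "b"),
    11: ("halt", None),
}


def run(program, prefetch_depth):
    """Return (fetched, executed) lists of addresses.

    prefetch_depth (hypothetical) stands in for timing: how many
    sequential addresses past a branch the autonomous address unit
    fetches before the branch target reaches it. Those wrong-path
    fetches are discarded and never executed.
    """
    fetched, executed = [], []
    pc = 0
    while True:
        fetched.append(pc)
        executed.append(pc)
        kind, operand = program[pc]
        if kind == "halt":
            return fetched, executed
        if kind == "branch":
            # Speculative sequential fetches issued before the redirect.
            for k in range(1, prefetch_depth + 1):
                fetched.append(pc + k)
            pc = operand
        else:
            pc += 1


for depth in (0, 2):
    fetched, executed = run(PROGRAM, depth)
    print(depth, fetched, executed)
# 0 [0, 1, 10, 11] [0, 1, 10, 11]
# 2 [0, 1, 2, 3, 10, 11] [0, 1, 10, 11]
```

The memory traffic seen from outside (the `fetched` list) changes with the depth, while the `executed` list is the same for every depth, which is the distinction the entry draws.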
The chip was designed using a mixture of custom datapath and compiled control
logic elements, as was the synchronous ARM. The fabrication technology is the
same as that used for one version of the synchronous part, reducing the number
of variables when comparing the two parts.
Two silicon implementations have been received and preliminary measurements have
been taken from these. The first is on a 0.7 µm process and has achieved about
28 kDhrystones running the standard benchmark program. The other is a 1 µm
implementation and achieves about 20 kDhrystones. For the faster part,
this is equivalent to a synchronous ARM6 clocked at around 20MHz; in the case of
AMULET1 it is likely that this speed is limited by the memory system cycle time
(just over 50ns) rather than the processor chip itself.
A fair comparison of devices at the same geometries gives the AMULET1
performance as about 70% of that of an ARM6 running at 20MHz. Its power
consumption is very similar to that of the ARM6; the AMULET1 therefore delivers
about 80 MIPS/W (compared with around 120 MIPS/W for a 20MHz ARM6). Multiplication is
several times faster on the AMULET1 owing to the inclusion of a specialised
asynchronous multiplier. This performance is reasonable considering that the
AMULET1 is a first generation part, whereas the synchronous ARM has undergone
several design iterations. AMULET2 (currently under development) is expected to
be three times faster than AMULET1 (about 120 kDhrystones) and to use less power.
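The efficiency figures follow from the performance and power comparison: MIPS/W scales with throughput and inversely with power. A minimal sketch of that arithmetic, assuming (as a simplification) that "very similar" power means exactly equal power; the constant and function names are illustrative only:

```python
# Figures quoted in the entry (approximate, not measured here).
ARM6_MIPS_PER_W = 120.0     # 20 MHz ARM6
AMULET1_PERF_RATIO = 0.70   # AMULET1 throughput relative to that ARM6
POWER_RATIO = 1.0           # assumption: "very similar" power taken as equal


def estimated_mips_per_watt(perf_ratio, ref_mips_per_w, power_ratio=1.0):
    """Scale a reference efficiency by relative throughput and power."""
    return ref_mips_per_w * perf_ratio / power_ratio


est = estimated_mips_per_watt(AMULET1_PERF_RATIO, ARM6_MIPS_PER_W, POWER_RATIO)
print(f"{est:.0f} MIPS/W")  # 84 MIPS/W
```

The result, about 84 MIPS/W, is consistent with the rounded figure of about 80 MIPS/W given above.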
The macrocell size (without pad ring) is 5.5 mm by 4.5 mm on a 1 micron CMOS
process, which is about twice the area of the synchronous part. Some of the
increase can be attributed to the more sophisticated organisation of the new
part: it has a deeper pipeline than the clocked version and it supports multiple
outstanding memory requests; there is also specialised circuitry to increase the
multiplication speed. Although there is undoubtedly some overhead attributable
to the asynchronous control logic, this is estimated to be closer to 20% than to
the 100% suggested by the direct comparison.
AMULET1 is code compatible with ARM6 and so is capable of running existing
binaries without modification. The implementation also includes features such as
interrupts and memory aborts.
The work was part of a broad ESPRIT funded investigation into low-power
technologies within the European Open Microprocessor Systems Initiative (OMI)
programme, where there is interest in low-power techniques both for portable
equipment and (in the longer term) to alleviate the problems of the increasingly
high dissipation of high-performance chips. This initial investigation into the
role asynchronous logic might play has now demonstrated that asynchronous
techniques can be applied to problems of the scale of a complete microprocessor.