Day 4 - Neuromorphic Circuits: past, present and future - Melika Payvand, Giacomo Indiveri, Johannes Schemmel

today's authors: Eleni Nisioti, Muhammad Aitsam, Tobi Delbruck

Florian Engert setting off for morning run, big surf


Today was probably the windiest morning so far, with surfable waves forming at the hotel beach. Quite appropriately, in this Day 4 lecture we are going to talk about "riding the wave of Moore's law".

The lecture is titled "Neuromorphic Circuits: past, present and future", and Giacomo Indiveri from ETH Zurich is talking about the past (which made him realise the passage of time and ruminate over a jacuzzi invitation from his post-doc years in Switzerland).

Giacomo said he would talk about two circuits. The first one is a simple MOSFET transistor: you apply a voltage between the Gate and the Source, and a current flows from the Drain to the Source.


If we look at the relationship between the gate voltage and the current, we get the two-phase curve we see on the left: there is a first region where the current grows steeply (exponentially, in fact, so it looks linear on a log plot), and then the growth saturates; the transition happens at the threshold voltage.

Normally we treat this as a digital system: what is below the threshold is off and what is above is on. 

But why ignore this interesting subthreshold region, which gives you large gains as you increase the voltage?

This is where the neuromorphic community comes into the picture, sometime in the '90s. A long story of anecdotes and papers begins with people trying to leverage silicon for neuromorphic computation. This region is reminiscent of how currents in biological neurons scale (although, as Tobi pointed out, these electronic devices have carrier mobilities about 10 million times higher than those of the ions in solution that brains use).
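A minimal sketch of this subthreshold behaviour, in the style of the EKV weak-inversion model. The constants below (`I0`, `KAPPA`, `UT`) are typical textbook values, not numbers from the talk:

```python
import math

# Illustrative subthreshold (weak-inversion) MOSFET model.
# All constants are generic textbook values, not from the lecture.
I0 = 1e-12      # off-current prefactor (A)
KAPPA = 0.7     # subthreshold slope factor
UT = 0.025      # thermal voltage kT/q at room temperature (V)

def subthreshold_current(vgs):
    """Drain current in saturation for Vgs below threshold:
    exponential in the gate voltage, like a biological ion channel."""
    return I0 * math.exp(KAPPA * vgs / UT)

# A ~100 mV increase in gate voltage multiplies the current ~16x:
gain = subthreshold_current(0.3) / subthreshold_current(0.2)
```

This exponential voltage-to-current relationship is what gives the "large gains" mentioned above, and why small gate-voltage changes go such a long way below threshold.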


The problem with this region is that any noise is amplified along with the signal, making analogue systems much less robust.

Then he showed another circuit, the differential pair universally used for the input stage of differential amplifiers: three transistors which, by taking the difference of two signals, can reject noise that is common to both.
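A sketch of why the differential pair rejects common noise: in subthreshold operation its differential output depends only on the voltage *difference*, following a tanh of that difference (bias current and constants below are illustrative, not from the talk):

```python
import math

IB = 1e-9            # tail bias current (A), illustrative
KAPPA, UT = 0.7, 0.025

def diff_pair_output(v1, v2):
    """Differential output current of a subthreshold differential pair:
    depends only on v1 - v2, saturating at +/- IB."""
    return IB * math.tanh(KAPPA * (v1 - v2) / (2 * UT))

# Noise added to BOTH inputs (common mode) leaves the output unchanged:
clean = diff_pair_output(0.52, 0.50)
noisy = diff_pair_output(0.52 + 0.05, 0.50 + 0.05)
# clean and noisy are essentially identical
```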

One big limitation of these circuits was that the capacitors were very big, so only a few biophysically realistic neurons fit on a chip. The size of transistors kept decreasing though, which enabled putting more of them on a chip. Riding Moore's law meant being able to fit more things on a chip and, therefore, do more interesting things. Different big ideas and projects came about, such as the Silicon Cortex.

The evolution of this hardware closely tracked that of neuroscience. As we discovered the importance of synapses and dendrites, people progressively introduced them into neuromorphic hardware.

He then drew a schematic of a neuron on neuromorphic hardware. Its input is a single line on which multiple synapses are placed one after the other. All these currents are summed on the line, which performs the accumulation.
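The accumulation really is just Kirchhoff's current law on the shared wire. In toy numbers (made up for illustration):

```python
# On a shared input wire, Kirchhoff's current law does the sum "for free":
# the current into the neuron is simply the sum of all synaptic currents.
# Values below are made up for illustration.
synaptic_currents = [2e-9, -1e-9, 0.5e-9, 3e-9]   # excitatory and inhibitory, in amps
dendrite_current = sum(synaptic_currents)          # total current into the soma
```

No adder circuit is needed; the physics of the wire performs the summation.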

In a classical von Neumann architecture you have a single circuit that is time-multiplexed to implement multiple neurons. Neuromorphic hardware, like biology, instead dedicates a physical circuit to each neuron in the computation. This makes it costly, as it requires more silicon for the same amount of computation.

Yet there is an advantage in memory consumption that is closely linked to the in-memory computing capacity of these circuits. In a von Neumann architecture, you need to move parameters around to do computation, and moving bits can be very expensive. To convey the scale, Giacomo described a chip as if it had the size of a city: moving bits within level-1 cache is like travelling one kilometre; within SRAM, 50 kilometres; to a GPU, 400 kilometres. If you need to transfer to the cloud, you are reaching Jupiter. In contrast, in-memory computing hardware computes with local circuits, including capacitors and non-volatile memory, that store information where it is used.
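Putting the talk's city-scale numbers side by side (these are the figures as quoted, informally, in the lecture):

```python
# The lecture's "chip as a city" analogy: how far a bit effectively
# travels per access, at city scale (figures as quoted in the talk).
distance_km = {
    "L1 cache": 1,
    "SRAM": 50,
    "GPU memory": 400,
}

# Fetching a parameter from GPU memory is a trip ~400x longer than an
# L1 hit; in-memory computing keeps the trip at essentially zero by
# computing where the bits are stored.
ratio = distance_km["GPU memory"] / distance_km["L1 cache"]
```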

Fabricating neuromorphic hardware is costly. But if you want something like a one-layer perceptron to do measurements on a device, it can be a good solution. Complex neural networks are still a bit far off (but how far are they really? We talked a bit about this later in the lecture, when Johannes Schemmel talked about the future).

Why implement things like spikes if they create practical problems? One reason could be that it is a better methodology for understanding biological mechanisms. Von Neumann architectures do not have a problem with multiple timescales, but organisms do. Or: how does the brain implement backprop?

We then opened into an introspective discussion. Should we stick to silicon or explore other materials? Giacomo said that both should be explored and that, although he has been focusing on silicon, there are other approaches, such as the self-constructing hardware that Rodney Douglas brings up as a prototype for systems that develop from a compressed genetic code. And what is the path to becoming more robust? 1) using populations of neurons instead of a mean firing rate, which can be noisy, 2) using feedback, 3) averaging across populations.

We then moved to the present of neuromorphic hardware with Johannes Schemmel. Looking into the present just requires looking into this room filled with neuromorphic computing experts and enthusiasts.

Johannes began by drawing a simple schematic of a neuron as a leaky capacitor: a resistor and a capacitor connected in parallel. This gives us an analogue signal that can suffer both from noise and, when simulated digitally, from quantization errors.
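A quick forward-Euler simulation of that leaky capacitor, with made-up component values (the lecture gave no specific numbers):

```python
# Forward-Euler simulation of the leaky-capacitor neuron:
# C dV/dt = -V/R + I_in.  Component values are illustrative only.
C = 1e-9        # membrane capacitance (F)
R = 1e7         # leak resistance (ohms) -> tau = R*C = 10 ms
I_IN = 5e-8     # constant input current (A)
DT = 1e-5       # time step (s)

v = 0.0
for _ in range(100_000):              # simulate 1 s, long past tau
    v += (-v / R + I_IN) * DT / C

# v settles at the steady state I_IN * R = 0.5 V
```

The time step DT is also where the digital quantization trade-off appears: a coarser step is cheaper but less faithful to the analogue dynamics.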


After this we entered a long round table discussing what kind of neuromorphic hardware different people at the workshop are using. We positioned the tools in a table, keeping track of features such as whether they are analog or digital, and synchronous or asynchronous.

| Name | Analog, asynchronous | Analog, synchronous | Digital, asynchronous | Digital, synchronous | Software |
| --- | --- | --- | --- | --- | --- |
| Loihi2 |  |  |  | x |  |
| SpiNNaker2 |  |  |  | x | x |
| BrainScale2 | x |  |  |  | * |
| Dynaps | x |  |  |  |  |
| GeNN |  |  |  |  | x |
| SynSense (Xylo) |  |  |  | x |  |
| SynSense (Speck) |  |  | x |  |  |
| Lui | x |  |  |  |  |
| Lx3D |  |  |  |  | x |
| Norse |  |  |  |  | x |
| Aqua sensor | x |  |  | x |  |
| NEST |  |  |  |  | x |
| TPU |  |  |  | x |  |
| Tactile sensor | x |  |  | x |  |
| DVS camera | x |  |  | x |  |
This information is being collected in the CapoCaccia24 neuromorphic platforms Google sheet.

When we came back from the coffee break, we discussed what patterns stood out in our table. Why is it so imbalanced? Why is there so much hardware?

Johannes then moved to discussing the future, starting off with the disclaimer that it is uncertain. Turning back to the original neuron circuit that Giacomo drew, we now look at it with a different eye: before, we cared more about the current we get when the transistor is on. But an equally big question is the current when it is off (we can see it is not zero). It is the ratio of these two, the ON/OFF ratio, that we want to maximize, so that system power consumption is not dominated by FETs that are off, which at any one time is nearly all of them.
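A rough sense of why the off-current matters at scale; the device counts and leakage figures below are generic illustrative numbers, not from the talk:

```python
# Why the ON/OFF ratio matters: with billions of transistors, nearly
# all off at any instant, off-state leakage sets the static power floor.
# All numbers are illustrative, not from the lecture.
N_FETS = 10e9          # transistors on a large chip
FRACTION_OFF = 0.99    # almost everything is idle at a given moment
I_OFF = 10e-12         # off-state leakage per device (A)
VDD = 0.8              # supply voltage (V)

static_power = N_FETS * FRACTION_OFF * I_OFF * VDD  # watts of pure leakage
```

With these numbers the chip burns about 80 mW doing nothing; a 10x worse ON/OFF ratio would push that to nearly a watt.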

As transistor dimensions shrink to achieve higher density, the channel becomes short, which leads to undesired leakage currents that degrade performance. This has led to the design of the FinFET and the Gate-All-Around (GAA) nanosheet and multichannel transistors, which are now three-dimensional. The first has a raised channel structure (the fin), and the latter wraps the gate entirely around the channel. Measuring the channel length is no longer obvious in these new transistor types: to report improvements, people extrapolate an effective length from the performance. The fabrication of these GAA FETs is fiendishly complex, and even the largest multinationals share the development and tooling costs via the near-monopoly suppliers of fabrication equipment like ASML.


Johannes pointed out that there is a misconception that these transistors are not appropriate for analog. They are: the underlying transistor I-V physics does not change. It is just that there is no design support for it yet.

Could you run ChatGPT on a chip with these transistors? Whether you see it as a thought experiment or a serious question, it was a nice exercise for putting some numbers on the table.

GPT-3 has 175e9 parameters. With FinFETs in 16 nm technology, using the most compact PWM synaptic weighting scheme sketched below, you get about 1 megasynapse per square millimetre. We would need about 175,000 mm² of silicon to fit this model, i.e. roughly three full 300 mm wafers. So it is not infeasible, but at this stage it is less cost-effective than a GPU with HBM DRAM, which needs about 4,200 mm² of silicon, roughly 40 times less than the wafer-and-SRAM approach.
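A quick sanity check of this arithmetic, assuming standard 300 mm wafers (the wafer size is my assumption; the talk gave the wafer count, not the diameter):

```python
import math

# Back-of-the-envelope check of the numbers from the discussion.
PARAMS = 175e9                 # GPT-3 parameters, one synapse each
SYNAPSES_PER_MM2 = 1e6         # ~1 megasynapse/mm^2 in the PWM scheme

area_mm2 = PARAMS / SYNAPSES_PER_MM2          # 175,000 mm^2 of silicon
wafer_mm2 = math.pi * (300 / 2) ** 2          # one 300 mm wafer, ~70,686 mm^2
wafers = math.ceil(area_mm2 / wafer_mm2)      # rounds up to 3 wafers

hbm_mm2 = 4200                                # GPU + HBM DRAM estimate
silicon_ratio = area_mm2 / hbm_mm2            # ~40x more silicon
```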



But there is a longer discussion to be had here. Training on GPUs takes GPU-months so more efficient algorithms could actually save cost once the infrastructure is there.

Melika Payvand from the University of Zurich then came on stage to talk to us about memory, a crucial aspect of any AI technology, neuromorphic included. First, we need to keep in mind that when talking about memory, people may mean two different things: storage and dynamics. Melika focused on storage, which can be volatile (it disappears when you disconnect power) or non-volatile.

A problem with MOSFETs is electron tunneling: even when the transistor is off, some electrons can tunnel through, causing leakage current. You can reduce this by increasing the length, but that also increases the area, which is the main cost.

Ideally, we would like memory with zero size, cost, and latency, and infinite capacity. Since cost rules that out, we need to find a good trade-off. The current solution is a hierarchy of memories: SRAM, DRAM, SSD, HDD and other types that trade off cost against speed.
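As a rough illustration of that trade-off (the access times and relative costs below are order-of-magnitude guesses for illustration, not figures from the talk):

```python
# The memory hierarchy trades cost per bit against access time.
# Order-of-magnitude figures, for illustration only.
hierarchy = [
    # (name, access time in seconds, relative cost per bit)
    ("SRAM", 1e-9, 1000),
    ("DRAM", 100e-9, 10),
    ("SSD", 100e-6, 1),
    ("HDD", 10e-3, 0.1),
]

# Each level is orders of magnitude slower but cheaper per bit than the
# one above it, which is why systems stack them instead of picking one.
fastest = min(hierarchy, key=lambda m: m[1])[0]
cheapest = min(hierarchy, key=lambda m: m[2])[0]
```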


When we read a spike from a neuron, we want to stay on-chip to do event-based processing. But on-chip memory is limited, which can be remedied by:

  • Doing local computing as much as possible. This requires clever architectural and algorithmic solutions
  • Trying to make SRAM hold much more by increasing its local density. Riding Moore's law is how we have done it so far, but it seems we are near the end of it (not totally: you can still get large gains, for example through stacking). We can also use memristors, which sandwich a material between electrodes to obtain embedded memory. The principle is that you can change the properties of the material by applying a voltage and controlling the current flow. However, a memristor requires a transistor connected in series with it in order to access the memory, which increases its cost.
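The in-memory computation such devices enable can be sketched as an analog matrix-vector multiply: weights are stored as conductances, inputs are applied as voltages, and Ohm's law plus Kirchhoff's current law produce the result in one step. Values below are toy numbers, and a real array would also need the series access transistors:

```python
# In-memory matrix-vector multiply on a memristor crossbar:
# weights stored as conductances G, inputs applied as row voltages V;
# each column wire sums G[i][j] * V[i] by Kirchhoff's current law.
G = [
    [1e-6, 2e-6],   # cell conductances (siemens), row 0
    [3e-6, 4e-6],   # row 1
]
V = [0.2, 0.1]      # input voltages on the rows (V)

# Column output currents, computed in a single physical step on chip:
I_out = [sum(G[i][j] * V[i] for i in range(len(V)))
         for j in range(len(G[0]))]
```

The multiply-accumulate happens where the weights live, which is exactly the point of in-memory computing: no parameter ever travels to a distant ALU.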

Melika then plotted the evolution of density with time, showing us an exponential improvement over time for all types of memory technology, though the density of monolithic SRAM is finally flattening out, despite the development of so-called 2 nm GAA nodes.

There was some discussion at the end on how much memory we really need.

We also talked about an exciting research direction: integrating non-volatile memory (such as NAND) with logic. For people like us, jumping into this before the industry takes over can be a great path (but also a very challenging one as it requires expert analog, digital and material knowledge). 

Tobi shared with us some work in the direction of embedded RRAM that he admires.

Johannes pointed out that traditional NAND flash, which offers by far the highest memory density with its stacks of hundreds of layers of SiN trapping sites, is not compatible with CMOS (for reasons that are not clear, at least to Tobi). Industry is instead mainly exploring non-volatile RRAM and MRAM, because strings of NAND flash bits on CMOS are simply too costly in 2D area. Hence the drive to increase embedded memory density, to enable storing massive programs or static data (e.g. weights for embedded DNNs) on microcontrollers.

The talk left us excited about the cataclysmic developments we expect in the future of hardware, but we were even more surprised by the cataclysmic hail that followed during lunch.


Tomorrow will be another exciting day. 
Till tomorrow goodbye!

Comments

  1. Wish I could be there... But perhaps we should be looking to more recent neurophysiology, as well as considering novel fabrication techniques, like memristors (e.g. Sarwat, S. G., Moraitis, T., Wright, C. D. & Bhaskaran, H. Chalcogenide optomemristors for multi-factor neuromorphic computation. Nat. Commun. 13, 2247 (2022)). In particular pyramidal neocortical neurons with two sites of integration (see e.g. Adeel, A. et al. Unlocking the potential of two-point cells for  energy-efficient and resilient training of deep nets. Arxiv (2022) doi:10.48550/arxiv.2211.01950.). Time to move on from the simple LIF neurons - based on 75 year old analogues of neurons!
