Day 11 - Evolution of intelligent systems + Neuromorphic Opera!

Today's discussion, with Catherine Schuman, Sebastian Risi, Dániel Barabási, and Dániel Czégel, was about the process that gave rise to the neural mechanisms our community is so fond of: evolution.

The first speaker was Catherine Schuman. Catherine talked to us on Day 8 about her projects on optimizing robots with neuromorphic hardware. Today she gave us more insight into how evolutionary algorithms can be used to solve these kinds of problems.

She first drew a block diagram of the main components of evolutionary search: initialization, fitness, selection, and reproduction.

(Top) Catherine's block diagram of an evolutionary process (Below) Classification of evolutionary algorithms


You initialize a population of solutions, evaluate them on some task to compute their fitness, use some criterion to select who will reproduce and, during reproduction, you may apply cross-over and mutations. Not every algorithm uses every component, but these are the building blocks you will find across algorithms.

Then, she drew a table categorizing algorithms along two design choices:

  • do they optimize the weights or the network structure?
  • do they optimize the parameters they are interested in directly or some indirect encoding that will then lead to them?


We, then, went through the cells of this table.

If you are in the top-left box (directly encoding the weights), then your genome is simply a list of synaptic values.

Catherine said that if that's what you are doing, then you might as well evolve the structure too. This brings you to the box below, where your genome has the form of a graph.

Then, we had some questions from the audience.

What are the scales of the solutions we are talking about? Catherine said you should expect no more than 500 to 1000 neurons, with two to three synapses per neuron.

And is there any class of problems we cannot express with a fitness function? Catherine said that it's a very general framework: you can map both supervised and reinforcement learning onto it.

What about evolution strategies? This is a family of algorithms that has recently become popular in deep learning, as they can perform on par with state-of-the-art deep learning methods. The algorithm is conceptually simple, lacking, for example, cross-over during reproduction. To this, Catherine said that she believes cross-over is a useful component that enables big jumps in search space, while mutations are useful for local search.

We, then, moved to the second column: indirect encodings. By this, we mean that we do not directly optimize the object of interest but some rule that produces it. For example, you can optimize learning rules for updating the weights, as is done in Hebbian learning, or you can optimize a developmental algorithm that grows a structure.

Evolutionary algorithms are expensive: you are walking around a high-dimensional space without using derivatives. This can be useful when your system is not differentiable. Catherine's words of wisdom: "evolution is always the second best solution". For any given problem there is probably a more efficient method, but evolution will eventually find a way.

Catherine said that she finds the bottom-right box of her table the most exciting: evolving developmental rules for growing neural networks. An example is HyperNEAT, an algorithm that evolves the weights of neural networks using an indirect encoding called Compositional Pattern Producing Networks (CPPNs).

Despite its scary name, this encoding is quite intuitive. You position the neurons on a grid so that each has a spatial location (x, y). To predict the weight of a connection, you feed the locations of the two neurons (so four scalar values) into a neural network (the CPPN) and get a single scalar output. The trick is that the CPPN uses activation functions (such as sines and Gaussians) that induce regularities like symmetry, repetition, and modularity, so it can be seen as an abstract model of development.


From the original HyperNEAT paper: the nodes of a neural network are positioned on a grid (left) and their coordinates are given as input to the CPPN, which predicts the weight of each connection

We, then, discussed the pros and cons of evolutionary approaches (compared to the alternative of gradient-based approaches). Some of the pros we mentioned are that they are naturally parallelizable, they can be multi-objective (Catherine's favorite algorithm for this is NSGA-II), and they can be implemented in hardware relatively easily. A disadvantage (one that has haunted Catherine in her projects) is that they are very good at exploiting flaws in your design (e.g., in the definition of the fitness function or the design of the environment). Perhaps their biggest disadvantage is that they require many samples compared to methods that leverage gradients.


Some of the pros and cons of evolutionary algorithms

Then Catherine shared an anecdote from her experience about how evolutionary algorithms can cheat: they were trying to solve a classification task on a balanced dataset while evolving the architecture of the network (including its size). The algorithm came up with an empty network that scored 50% accuracy ...

Why do people prefer gradient-based methods over evolutionary algorithms? Perhaps the reason is that the latter do not scale well with problem size. Then again, reinforcement learning is widely used even though it is also sample inefficient.

*****

We, then, moved to Sebastian Risi from the IT University of Copenhagen. Sebastian has been working with evolutionary algorithms for many years, but he uses them in a rather alternative way: instead of searching for the optimal solution to a specific problem, he is interested in the ability of these algorithms to discover diverse solutions that remind us of the open-endedness of biological life. Sebastian is fascinated by the fact that biological evolution gave us this universe in a single run, while most artificial algorithms, whether based on evolution or reinforcement learning, can generate only very limited diversity and complexity.

He started off by circling the "creativity" pro and adding a con in Catherine's table: local optima.

He then took us on a tour of the open-ended algorithms he has studied, starting with the drawing of a maze:

(Top) The maze, with x being the target location, the center of the smudge being the agent and lines showing the walls of the maze (Bottom) The t-maze, a classical problem for studying lifetime plasticity

The agent wants to reach the target and needs to navigate around the walls. How would you reward the agent? Someone from the audience said "random search". This was not the answer Sebastian was waiting for: random search would eventually find the target, but that could take a while. Then someone else said: "distance to target". That's indeed the most common fitness function in navigation tasks. But in a maze it is quite sub-optimal: the agent would keep bumping into the wall separating it from the target forever. This is what Sebastian calls a deceptive task.

He, then, mentioned Picbreeder, an evolutionary algorithm developed by his PhD supervisor Kenneth Stanley that has by now acquired a sort of cult status in the Artificial Life community. Picbreeder is an example of interactive evolution, where you put a human in the loop of evolutionary optimization in order to capture a fitness function, creativity, that is hard to define in closed form. The system evolves images (encoded using the CPPNs we saw above), humans choose which pictures they like, and the algorithm reproduces the preferred ones.

Picbreeder evolved some beautiful pictures resembling complex objects such as cars. The interesting bit was that, when they tried to re-evolve these pictures as targets without human feedback, they just could not reproduce them. This suggests that the fitness function of "human creativity" is hard to capture.

This study gave them the idea of novelty search: instead of optimizing for performance on a task, this method optimizes for reaching behaviors that have not been encountered before. In the case of the maze, it drives the agent to explore the grid and eventually find the target.

Sebastian emphasized that novelty search may sound a bit like random search but is much more efficient: for example, if you are optimizing a bipedal walker with it, there are only a few ways of falling down. Once you learn how to stand, the only remaining novelty is learning to move forward.

Florian said that this algorithm sounds a bit like spreading tokens around the maze for the agent to pick up. Sebastian said that this is a nice parallel for the maze but less useful for the high-dimensional spaces where the method is also applied.

He, then, talked about another task: the t-maze. Here there is a corridor with one rewarding element on the left and one on the right. One of them is more rewarding than the other, say one gives a reward of 1 (R) and the other of 0.5 (r). The agent needs to go down the corridor and then make the correct turn: toward the more rewarding element. The trick with this task is that the two elements switch locations at some point (around the middle of the agent's lifetime).

Solving this task with an evolutionary algorithm can be challenging. A non-adaptive agent (one that does not detect when the goal location switches) can find a sub-optimal solution with relatively good reward: always going to the same place gets you the high reward about half the time. This solution will out-compete an adaptive one that detects the switch but goes to r instead of R. That is bad, because the adaptive solution would have been a good stepping stone toward the optimal one. This is an example of how a non-adaptive strategy can be a local optimum for evolution.

Novelty search comes with its own limitations. In unconstrained environments (imagine removing the walls of the maze) it may never get around to exploiting the optimal solution.

Sebastian, then, moved to another family of algorithms: Quality-Diversity.


 MAP-Elites is a Quality Diversity algorithm

Here we explicitly optimize for diversity by keeping a map of solutions. The map is usually two-dimensional, with each axis representing a behavioral dimension (for example, the x-axis could be the movement of the robot's right leg and the y-axis the movement of its left leg). At each generation of the evolutionary algorithm, we characterize the behavior of an individual and assign it to the corresponding cell of the map, keeping only the fittest individual in each cell. In this way, we optimize for both fitness and diversity.

Are there any cool examples of applying this algorithm to a robot? Sebastian mentioned the work Robots that can adapt like animals, which appeared in Nature in 2015 and showed that, using this method, a robot can recover from damage such as losing a leg. Since it maintains a diversity of solutions, it can fetch the one that works best despite the unexpected event.

We closed the discussion with a perspective on how considering diverse environments can have the same effect as Quality-Diversity: each niche in the environment can be seen as a cell of the map. As related work, Sebastian mentioned POET, an algorithm that co-evolves agents (in particular bipedal walkers) and environments (different kinds of terrain).


*****

The next speaker was Dániel Barabási, who talked about Neural Architecture Search (NAS). This sub-field of machine learning is concerned with finding better architectures for neural networks. Architectures have themselves experienced an evolution: we went from multi-layer perceptrons (MLPs) to recurrent neural networks to LSTMs to the Transformer. But these architectures were found by human engineering, not by NAS. In nature, by contrast, neural architectures were found by evolution. How did nature do it?

Dániel thinks that an important part of the answer is that nature created building blocks (such as cortical columns) that you can then copy and use in parallel. How can we implement this in an evolutionary algorithm? He mentioned three different principles.

The first is using neural weight embeddings. By this he means that you do not learn the weights directly but some process that maps to them. He, then, wrote the equation describing how his model does this mapping:

W_ij = x_i^T O x_j, where W_ij is the weight of the synapse connecting neurons i and j, x_i and x_j are the gene (expression) vectors of the two neurons, and O is a matrix representing a linear mapping between them. He, then, drew the diagram of this operation below.
 
To compute the expression vector x_i of each neuron, they positioned the neurons on a two-dimensional grid and modeled each gene's expression as a Gaussian over that grid, parameterized by three variables (mean in the x direction, mean in y, and variance).
 
When applied to image recognition (MNIST), this model discovered convolution-like filters, a good indication that the method can infer useful biases for the task.
 
The second principle is having composable tasks, i.e., tasks whose solutions are compositions of previous solutions. He, then, turned to the audience, saying that there is a lack of such environments. People in the audience mentioned relevant deep learning environments where agents can manipulate and combine items, such as OpenAI's Hide-and-seek and DeepMind's Alchemy.

The last principle is having composable, growable networks. In this case it is not just composition that matters but also the ingredients you start with.

Dániel with his model

 

***** 

The last speaker was Dániel Czégel from Arizona State University, who said that he would convince us that the universe is turtles all the way down. He also said that biology, not physics, is the fundamental science of our world.

Dániel is one of the authors of the Assembly Theory paper that created quite a lot of controversy on Twitter for over-selling its contribution (explaining the universe). He explained its basic parts to us.

He, first, asked how you can describe the space of all hierarchical designs. The main intuition, he said, is that the optimal design is the one that builds structures in the shortest possible way.

He, then, drew an example: a sequence of bits. How can we describe how the sequence was composed? The simplest explanation is that you have the module 1001, which you copy to obtain the full sequence, and which itself breaks down into two smaller modules. This forms a tree.


He then said that they are interested in whether you can tell, just by looking at an object, whether it is the result of historical contingency or of selection.

Dániel said that they don't have any ideas on how this theory can be useful in practice, so the discussion stayed at an abstract level.

*****

This concluded our series of lectures. But before dipping our heads into our demos for the next day, we had a unique evening performance of Anna Mura's neuromorphic opera in the foyer of our hotel. Organized by Anna Mura (the MC), with Saray (the human), Ameca (the machine), Stan Kertsjens (piano), Jakub Fil (Ameca coding), Friedrich Wolf (electronic sounds), Paul Verschure (animations), and Andreas Andreou (declaimer in Greek toga and LED spiking-neuron crown), the performance was an artistic exploration of our relationship with our field.


Short version from Tobi

Full version from Giacomo

The stage was set next to the piano, played by Stan. In the middle there was a glass case filled with the spiking-neuron toys that the BrainScaleS team, led by Yannik Stradmann, had built at the workshop to showcase their educational value. These objects flash their LEDs to show spiking and synaptic activity, and they gave the ambiance of a neuromorphic Christmas tree. On the wall above the scene were projections of swimming dolphin patterns reminiscent of cellular automata.





Ameca, the robot brought by Jens Stuckmeir, Jakub Fil, and Fred Wolf, all from Waiys robotics, was one of the two protagonists. The other was Saray, and over four acts the opera explored their growing interaction, the realm of human experience and emotions, and the ability of humans and robots to participate in it together in an optimistic future.

It is hard to put what happened into words, so you are better off watching some of the videos that people took during the performance:


The performance concluded with the poem Ithaka, read by Andreas in his Greek bass, with a coda warning about the "BEWARE OF AI" encountered along the journey through life.

One thing that was discussed afterwards is that many found Ameca more agreeable, a bit less artificial, after this performance. Despite the many days we had them around in the disco room, it took this special experience to see them from a different perspective.

Spiky Piano, a project by Muhammad Aitsam, is a unique piano that leverages event camera technology to detect small objects. In this innovative project, piano skills take a backseat to creativity. Participants are invited to drop rice grains one at a time into a designated slit. Each grain triggers a distinct piano note, allowing the individual to craft melodies based purely on the timing and sequence of their drops. This showcases the impressive capabilities of DVS cameras in capturing fast-moving, minute objects—something traditional RGB/frame-based cameras struggle with.



What a fun and inspiring day it has been! Looking forward to seeing everyone tomorrow for the final day of the CapoCaccia workshop. 

Let's make it a memorable conclusion!




 




