@Florida. Loved your clip, but its relevance eludes me.
@Bloodmarks. Thanks for the link. It is difficult to judge based on blog articles; I hope better documentation of this algorithm will show up soon.
I remember philosophizing one day with a cousin, over life, the universe and everything. It was the time when 800x600 was considered a high-quality resolution for a monitor. I then had what I thought was a brilliant epiphany. I told him: imagine a screen of 800x600 pixels, and all the images that could be created with every possible combination of colors at every pixel. You would have, in fact, everything that has ever been, or will ever be, in those images.
I found out a few days later that somebody else, the Argentinian writer Borges, had beaten me to the idea some 50 years before. But since they did not have computers at that time, he had used the analogy of a library (he was in fact a librarian; see The Library of Babel on Wikipedia), in which every possible content of a 410-page volume was to be found.
Object recognition is something like that: any image is just one in this practically infinite set of possible images. So, to remain tractable, algorithms have, one way or another, to limit the number of combinations before searching through the remaining possibilities.
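Just to give an idea of the size of that space, here is a rough count for the 800x600 example above, assuming 24-bit color (the bit depth is my assumption; any reasonable value gives a similarly absurd number):

```python
import math

# Rough count of every possible 800x600 image, assuming 24-bit color
# (2**24 distinct colors per pixel). The bit depth is an assumption;
# the point is only the order of magnitude.
pixels = 800 * 600
colors_per_pixel = 2 ** 24

# Total images = colors_per_pixel ** pixels; report it as a power of ten.
digits = pixels * math.log10(colors_per_pixel)
print(f"Roughly 10^{digits:,.0f} possible images")  # about 10^3,467,866
```

No algorithm can search a space like that by brute force.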
Researchers know that as long as processing capabilities remain far below those of a (human) brain, any algorithm will be a palliative for that lack of power.
I think they are right and wrong at the same time. As long as they consider the brain as a computer, they will keep looking in the same direction, and confusing reaction speed with processing power.
Let me try to put it in practical terms.
The brain has billions of neurons (see How Many Neurons Are in the Brain?), and every neuron has hundreds of connections to other neurons, so the number of possible connection patterns is truly astronomical. If we could build a similar computing device, the quality of search algorithms would stop being critical: almost any algorithm would do.
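As a back-of-the-envelope figure, using the numbers above (billions of neurons, hundreds of connections each; published estimates run even higher):

```python
# Order-of-magnitude count of connections in a brain-like device.
# 86 billion neurons and 500 connections each are assumed figures;
# published estimates of synapses per neuron run into the thousands.
neurons = 86_000_000_000
connections_per_neuron = 500

total_connections = neurons * connections_per_neuron
print(f"About {total_connections:.1e} connections")  # ~4.3e+13
```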
But what do I mean by reaction speed? As I said a short while back, many groups of neurons are involved in the perception of a single object. And even though I do not believe the binding problem is a real one, there remains the mystery of how the brain assembles different sensations into unified objects.
In physics/chemistry, H2O gives water; that is an empirical fact. In vision, red + green = yellow (additive mixing). Yellow is easily distinguishable from red and green, and we never think of its composition when we look at a yellow object. Why would the brain need to? If a red neuron and a green neuron are activated at the same time, we have a yellow sensation. Let us take this line of argument and extrapolate a little. When looking at a multicolored patch, many neurons are activated at the same time, giving us many color sensations. What almost all researchers presuppose is that those different sensations have to be processed individually, each neuron computing its own reaction based on the feedback it receives from other neurons.
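To make the color example concrete, here is additive mixing in the simplest possible terms (plain RGB tuples, purely for illustration):

```python
# Additive color mixing: a red signal plus a green signal is perceived
# as yellow, with no decomposition needed afterwards.
red = (255, 0, 0)
green = (0, 255, 0)

# Channel-wise sum, clipped to the displayable range.
mixed = tuple(min(a + b, 255) for a, b in zip(red, green))
print(mixed)  # (255, 255, 0) -> yellow
```

The yellow is simply there, without any step that computes it from its parts.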
What if there were no computing involved at all, and the mere fact that a neuron is activated were in itself meaningful?
That would mean that every configuration is unique as such. The melange of sensations when we look at one rendering of B is similar to, but also different from, the one we get from another rendering of B. We do not need to analyze the image into its components. The fact that the vertical side of the character is more pronounced in one case than in the other is certainly a distinctive, identifying trait. But talking about it and taking it into consideration means that we have already seen it. We could probably train animals to distinguish between these two images.
This approach has often been called holistic. Gestalt theory is such a holistic approach. It would certainly advocate the idea that we "immediately" see both characters as distinct (which does not mean we have to be able to pinpoint the difference).
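Here is a minimal sketch of what I mean by a holistic comparison: the two glyphs are compared as whole pixel patterns, without extracting strokes or any other features. The tiny 5x5 bitmaps are made up for illustration only:

```python
def holistic_similarity(a, b):
    """Fraction of pixels on which two same-sized binary images agree."""
    flat_a = [p for row in a for p in row]
    flat_b = [p for row in b for p in row]
    matches = sum(1 for x, y in zip(flat_a, flat_b) if x == y)
    return matches / len(flat_a)

# A thin-stemmed "B"...
b_thin = [
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0],
]
# ...and the same letter with a heavier vertical stem.
b_bold = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
]

print(holistic_similarity(b_thin, b_bold))  # 0.92: similar, but not identical
```

The two configurations come out close yet distinct, and at no point does the code ask where the difference lies.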
So far, nothing really original. But the question is: why is the brain able to look at images, or even at movies (25 images a second), and immediately grasp the gist of each scene?
Remember the idea of black boxes? What if, every time we look at an image or a scene, certain black boxes light up while others are turned off, or remain off? And what if any such configuration is what we call the gist? That is not such a far-fetched idea if you consider that we think of an object as the sum of different features. It might also explain how difficult it can be to express some of our ideas: we get them holistically, but expressing them means translating those holistic configurations into a sequential series of sounds or letters. Artists know that, and they prefer to paint or draw rather than speak about their inspirations. Brush strokes may also be sequential, but they are far coarser than sounds or characters.
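In the same spirit, here is the black-box idea as a sketch: the gist of a scene is nothing more than which boxes happen to be on. The detector names are invented for illustration; any set of boxes would do:

```python
def gist(active_boxes):
    """The gist is just the configuration itself: the set of active boxes."""
    return frozenset(active_boxes)

def gist_overlap(g1, g2):
    """Shared boxes as a fraction of all boxes involved (Jaccard overlap)."""
    return len(g1 & g2) / len(g1 | g2)

beach_scene = gist({"sky", "water", "sand", "horizon"})
lake_scene = gist({"sky", "water", "trees", "horizon"})
office_scene = gist({"desk", "screen", "indoor_light"})

print(gist_overlap(beach_scene, lake_scene))    # 0.6: similar gists
print(gist_overlap(beach_scene, office_scene))  # 0.0: nothing in common
```

No individual box is analyzed; the pattern of activations is the whole story.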
But I wanted to be practical. I will try again.
Since we cannot have neural networks with the precision of a human brain, we will have to settle for the next best thing. Take an image of, let us say, 256x256 pixels or less. Let each pixel be connected to at least two values: one precise, and one a range of values.
We now have a precise translation of the image, which can be compared with the translations of other images, and an array of 256x256 arrays of values. I realize that the number of elements to be compared is staggering, but what I am interested in is testing the theory first; its practicality comes later.
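A minimal sketch of that translation, on a toy scale (the tolerance of 16 gray levels and the 3x3 images are arbitrary stand-ins for the 256x256 case):

```python
TOLERANCE = 16  # arbitrary width of the "range of values" around each pixel

def translate(image):
    """Turn a 2-D list of gray values into (precise, (low, high)) pairs."""
    return [[(v, (max(0, v - TOLERANCE), min(255, v + TOLERANCE)))
             for v in row] for row in image]

def match_fraction(translated_a, image_b):
    """Fraction of image_b's pixels that fall inside image_a's ranges."""
    total = hits = 0
    for row_a, row_b in zip(translated_a, image_b):
        for (_, (low, high)), v in zip(row_a, row_b):
            hits += low <= v <= high
            total += 1
    return hits / total

# Tiny 3x3 stand-ins for the 256x256 images discussed above.
img_a = [[100, 120, 130], [90, 110, 200], [50, 60, 70]]
img_b = [[104, 118, 140], [95, 112, 10], [55, 58, 69]]  # close to img_a, one outlier

print(match_fraction(translate(img_a), img_b))  # ~0.89: most pixels fall in range
```

The precise values give exact comparison; the ranges give the looser, partial kind of matching I have in mind.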
I think that is how the brain does it, except that it does not take into consideration all possible combinations, only those that it already knows, more or less partially. That would make it perfectly legitimate to reduce the search domain with the use of wow databases.