Schankian Remindings

November 6th, 2012

Steven Pinker is a linguist, and so you might think it natural for him to be coining new terms regularly. But in fact, as he explained in The Stuff of Thought (which i’ve finally finished, and so this may be my last post on it – but don’t count on it), coinages rarely catch on, and so he knows enough not to bother trying. (Consider, courtesy of Rich Hall, “peppier” n. The waiter at a fancy restaurant whose sole purpose seems to be walking around asking diners if they want ground pepper. How perfect is that? I’ll try to remember to use it, but i probably won’t. If you just heard this for the first time, it’s because no one else did either.)

The truth is, i doubt that Pinker even meant to coin anything by invoking the term “Schankian Remindings”. Like so much else in his books, it probably just sounded clever or appropriate at the time. And sure enough when i ask Google about it, the only real results are references to the book. (Making it not quite a googlenope, a rare term that did actually catch on.)

But i was so intrigued by the idea that this is now my official term for: episodes in which one event reminds us of another. It was coined thus because Roger Schank gave some examples of them during a talk. Schank’s main area of interest appears to be learning, and i didn’t find any evidence of him exploring the idea too deeply. Pinker gives about three pages over to the topic, which is pretty insignificant in a book of 439 (without references and notes), especially when he justifiably labours over metaphors in general for 44. I’m inclined to rectify this, and i believe the easiest place to start is to reproduce the examples in the book.

Someone told me about an experience of waiting in a long line at the post office and noticing that the person ahead had been waiting all that time to buy one stamp. This reminded me of people who buy a dollar or two of gas in a gas station.

X described how his wife would never make his steak as rare as he liked it. When this was told to Y, it reminded Y of a time, 30 years earlier, when he tried to get his hair cut in a short style in England, and the barber would not cut it as short as he wanted it.

X’s daughter was diving for sand dollars. X pointed out where there were a great many sand dollars, but X’s daughter continued to dive where she was. X asked why. She said that the water was shallower where she was diving. This reminded X of the joke about the drunk who was searching for his keys under the lamppost because the light was better there even though he had lost the keys elsewhere.

While jogging, I was listening to songs on my iPod, which were being selected at random by the “shuffle” function. I kept hitting the “skip” button until I got a song with a suitable tempo. This reminded me of how a baseball pitcher on the mound signals to the catcher at the plate what he intends to pitch: the catcher offers a series of finger-codes for different pitches, and the pitcher shakes his head until he sees the one he intends to use.

While touching up a digital photo on my computer, I tried to burn in a light patch, but that made it conspicuously darker than a neighboring patch, so I burned that patch in, which in turn required me to burn in yet another patch, and so on. This reminded me of sawing bits off the legs of a table to stop it from wobbling.

A colleague said she was impressed by the sophistication of a speaker because she couldn’t understand a part of his talk. Another colleague replied that maybe he just wasn’t a very good speaker. This reminded me of the joke about a Texan who visited his cousin’s ranch in Israel. “You call this a ranch?” he said. “Back in Texas, I can set out in my car in the morning and not reach the other end of my land by sunset.” The Israeli replied, “Yes, I once had a car like that.”

In college, a friend and I once sat through a painful performance by a singer with laryngitis. When she returned from intermission and began the first song of the set, her voice was passably clear. My friend whispered, “When you put the cap back on a dried-out felt pen, it writes again, but not for long.”

Later in the book, Pinker provides another example, although he doesn’t identify it as such. He is speaking about how children’s names go in and out of fashion.

… Then the next generation of parents will react to the [now-common names] by looking for a new new thing… The dynamic is reminiscent of Yogi Berra’s restaurant review: “No one goes there anymore. It’s too crowded.”

Such remindings might not be much to write home about if they were individualistic. If it only made sense to me, it could be explained away as experiential or evidence of mental issues. But each of the above examples, even though i know little or nothing about the people who originally uttered them, is meaningful to me in a way that is either funny or profound or both.

In a non-Schankian way, these remindings called to mind Stan Franklin’s presentation on sparse distributed memory (available here), and in general ways of creating relational memory systems. The problem of knowledge representation has been around since day one, i.e. how do you take the input to a system and store it so that it can be retrieved in useful ways. It’s been well known for decades that humans do not literally store visual input as if the brain were a movie camera. The input is processed through dozens of systems that decompose it into objects and relationships, tag these with labels and multiple types of timestamps, and then submit the result to multiple types of memory systems, in which the content may or may not stick. Memories are probably retrieved by snippets of the content, which cause the matching objects to activate, causing their relationships to activate, which causes the relationships’ relationships to activate, eventually recomposing the original input, at least in its objectified version.
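The retrieval process described above is often called spreading activation. Here is a minimal sketch of it in Python, offered only as an illustration: the node names, the `decay` and `threshold` parameters, and the toy memory are all assumptions made up for this example, and it is not how sparse distributed memory or any particular framework actually stores things.

```python
from collections import defaultdict


def link(pairs):
    """Build an undirected relationship graph from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def spread_activation(graph, cues, decay=0.5, threshold=0.05):
    """Return the activation level reached by every node near the cues.

    Each cue starts at 1.0. Activation is multiplied by `decay` at every
    hop and stops spreading once it falls below `threshold`.
    """
    activation = defaultdict(float)
    frontier = [(cue, 1.0) for cue in cues]
    while frontier:
        node, level = frontier.pop()
        if level < threshold or level <= activation[node]:
            continue  # too weak, or already reached at least as strongly
        activation[node] = level
        for neighbour in graph.get(node, ()):
            frontier.append((neighbour, level * decay))
    return dict(activation)


# Hypothetical toy memory: two scenes decomposed into labelled features.
memory = link([
    ("singer", "laryngitis"), ("singer", "intermission"),
    ("laryngitis", "infirmity"), ("intermission", "recovery"),
    ("felt pen", "dried out"), ("felt pen", "cap back on"),
    ("dried out", "infirmity"), ("cap back on", "recovery"),
])

recalled = spread_activation(memory, ["singer"])
print(recalled["felt pen"])  # 0.0625, four hops from the cue
```

In this toy graph the singer and the felt pen share no direct link. The pen is reached only through the shared "infirmity" and "recovery" nodes, which is the kind of indirect route a reminding would need.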

The question is: what are the bottom-line concepts that all humans understand that allow us to share things like Schankian remindings? We already know that all humans have a very similar sense of 3-dimensional space, and that we all separate humans from non-humans and animals from non-animals. We all have similar understandings of causation, and universally our relationships fall into the categories of sharing, ranking, and trading. So, it’s not far out there to assume that there are basic concepts into which we decompose ideas. And in the same way that we combine a finite set of words to express infinite ideas, the bottom-line concepts can be combined in novel ways to prevent us from falling into conceptual ruts. Also, i believe we can have a finite set of concepts (they could number in the hundreds or thousands) that we can decorate in order to give an idea just the right colour or texture out of a practically infinite set of choices.

So, what are the bottom-line concepts that humans use? If we knew, of course, we’d already be building relational memory systems that behave in ways similar to humans. The reason the Schankian remindings were so intriguing to me is that, i believe, they provide the necessary clues to solving the puzzle. The examples above each provide two scenes that are widely divergent in their sensory content; the ways in which they relate may reveal the ways that the brain most deeply stores the content. This is because it can only be through at least one common feature that the one scene recalls the other.

Take the case of the singer with laryngitis. We’d have to reach pretty deep into our trick bag to figure out how a woman reminds us of a felt-tip pen, and even if we did find something i doubt that any rationale would, for most people, pass the giggle test. So what is the similarity? There is an essence of time to the story: the singer took an intermission; the felt pen had its cap put back on for some period. But time isn’t enough. There is also an element of recovery. So, perhaps the basic concept is recovery after a period of time. But then why didn’t the singer remind the hearer of a flu-sufferer, or how a battery recovers its charge if you turn off the device for a minute or two?

I think the similarity has to be a bit deeper, or in each scene the core similarity has to be decorated with the same traits. The closer the trait match the better. In the case of the felt pen, the period during which the cap should be left on and the typical time for an intermission are similar. Plus, the length of time before the pen stops writing and the time until the singer’s hoarseness returns are also similar. Taking this second feature into account, perhaps the core concept is fatigue from exertion and recovery after time, something all humans are familiar with. By matching the respective time periods in each scene, we have the basis of our reminding. Even still, there is another decoration: the singer has laryngitis, and the felt pen is dried out. This comparison is not exact, because the singer will presumably recover from the illness, while the pen is done for (unless it is refilled, which no one ever does), but we can still finger infirmness as the concept, something else all humans know well enough.

I’ll go one speculative step further and suggest that the analogy gets better (funnier and more profound) the more the core similarity matches, and the less everything else does. Comparing the singer with laryngitis to a flu-sufferer is calling a rock a stone: the comparison is trite. But comparing her to a dying battery is actually pretty good. The time periods are similar, and the infirmness is pretty much the same as for the pen. Actually, the battery compares better to the pen than either does to the singer, come to think of it.
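That speculation can be written down as a toy scoring rule. The sketch below is only one possible reading of it: the split of each scene into "core" and "surface" feature sets, the feature labels themselves, and the choice of Jaccard overlap are all assumptions invented for the example.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def reminding_score(scene, other):
    """High when core concepts match and surface details do not."""
    core = jaccard(scene["core"], other["core"])
    surface = jaccard(scene["surface"], other["surface"])
    return core * (1.0 - surface)


# Hypothetical hand-labelled scenes; a real system would have to derive these.
singer = {"core": {"fatigue", "recovery", "infirmity"},
          "surface": {"person", "voice", "illness"}}
flu = {"core": {"fatigue", "recovery", "infirmity"},
       "surface": {"person", "illness", "bed rest"}}
pen = {"core": {"fatigue", "recovery", "infirmity"},
       "surface": {"object", "ink", "cap"}}

print(reminding_score(singer, flu))  # 0.5: same core, but trite
print(reminding_score(singer, pen))  # 1.0: same core, nothing else shared
```

The hard part, of course, is the one the sketch skips: deciding which features count as core in the first place.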

So, the only question – at least for this example – that remains is: why should fatigue and recovery and infirmness be bottom-line human concepts? I think it’s obvious, but i’m in a pedagogical mood, so i’ll go on. It’s critical for all animals to understand at least their own physical limitations and recovery, and in the case of social animals, prey, and predators that of others as well. We gauge many of our actions and resolve on our stamina vs that of our competitors and collaborators. And it is critical to be able to recognize infirmity as a source of both danger (don’t catch the disease) and opportunity (go after the wounded member of the herd because it will be easier to catch). Now, how these concepts get shaken out of visual input is a big question, but hopefully a more technical one once we agree that that is what we need to do.

Dear reader(s), i would be delighted if you could, 1) provide your own Schankian remindings, and/or 2) provide your analysis for those given here or provided by others. Maybe if there is enough interest in this topic – and i for one can’t imagine how there could not be, but i’ve been wrong before – a web site could be spun up to manage the information.

 

And another reason we don’t have AGI yet

November 1st, 2012

My Iraqi friend (whether he knows it or not) was on TV again tonight. God bless the man (his god or mine or whatever) because i always get an idea or two out of his broadcasts. For the record, his name is Jim Al-Khalili, and this time the show was called Shock and Awe: The History of Electricity. I can’t help but throw the man some props for his choice of show title, him being from Iraq and all, although i’m not sure if he attended the second run of Desert Storm or not.

Anyway, lately i’ve been playing around a bit more with the LIDA framework, which i described in my last post. After several discussions with Ryan McCall (PhD student at U of Memphis and apparently the current primary LIDA maintainer), i’m still very hopeful about the potential of the work. But i do now, i think, have a better handle on where it stands. The framework as it is represents a good implementation of a good theory of cognition. It has at least interfaces for all of the major components, and in many cases concrete implementations of those components. It is an excellent start. Of course, there are components that are really just stubs at the moment, and other components that are very basic, and they will need work before any serious run at AGI (with LIDA) can be made. And even when such a run is made, many of the more complete components will need substantial rework, and it’s pretty much certain that the overall design of the framework will go through many iterations (which, each time, will have major implications for all of the existing components). All of which brought me to the conclusion that there is a lot of work to do. A lot. Like years. Decades even. Maybe lots of decades.

Kind of like electricity. I know that the typical AGI comparison is with human flight, but after watching only a bit of Jim’s show i wonder if electricity is a better one. Flight had all of its naysayers in the face of existence proof and all, and people wondering what the point of it all could be anyway, but after only a bit of research i think the study of electricity provides an analogy that better parallels the profound scientific mysteries of the subject.

According to Wikipedia, the earliest recorded recognition of something like electricity was from ancient Egyptian texts that described the “Thunderers of the Nile” – now known as electric fish – back in 2750 BC. The Egyptians either killed all the little buggers off or shrugged their shoulders and stayed out of the water, because there doesn’t seem to be much that happened on the topic for around 4.5 millennia, until William Gilbert in 1600 described the difference between the lodestone effect and rubbing a balloon on his head. If we take this as the real start of electrical research, we need only decide on when we feel we had a decent handle on the topic to determine how long the whole effort took. Maxwell’s work in 1861/62 seems like a reasonable choice except that it was only theoretical. Again, Wikipedia provides great satisfaction to me if only by its choice of words: “… the late 19th century would see the greatest progress in electrical engineering” (my emphasis).

We can then conclude that it took about 300 years to convert electrical study from research into engineering. Readers of my last post will recall that i made a big deal about the difference between treating AGI as research vs engineering. When i first started working with LIDA i had hoped that we were entering the engineering stage, but it appears my hope was premature. There is still a lot of research to do. Those who are frustrated that after over 60 years we still have little to show might take some comfort in knowing that something as similarly mysterious, but relatively simple, as electricity took nearly 3 centuries for some very big-headed fellows to get their heads around.

Ray Kurzweil was most likely optimistic in predicting the existence of AGI by 2035. Which sucks, really, because i was very much looking forward to climbing into my Vanilla Sky reality simulation over being spoon fed by R2D2 prototypes in a seniors’ home.

To round off… The other reason we don’t have AGI yet? It’s really, really hard.

The real reasons we don’t have AGI yet

October 22nd, 2012

I was as perturbed as anyone trying to help the AGI effort after reading the piece by David Deutsch with the same name as this post. Like many i thought he made a number of fair points, but many unfair ones as well. In particular, i think it’s odd to seemingly ignore the progress in psychology where people such as Bernard Baars have developed very compelling theories of mind and consciousness, as if to say that such things are not contributing to AGI at all, when in another sense this appears to be exactly what he says is missing. He refers to AGI needing a grand idea, but it’s unlikely that some researcher is going to solve the AGI problem with a shower-time epiphany. More likely we are doing a whole bunch of things wrong, and there will need to be a kind of aligning of the stars to get us going in the right direction.

Personally, i think a great deal of aligning was done by Baars, and Stan Franklin et al at the University of Memphis’ Cognitive Computing Research Group (CCRG) have done the field a great service by implementing Baars’ concepts in the LIDA framework. I’ve had some time recently to play with it a bit, and must say that the work is impressive. As a Java developer i can confirm that the technical architecture is sound and well written. As an AGI researcher i’m excited by the potential, because not only does it embody a compelling theory of consciousness, but it does so using a cognitive cycle, i.e. the temporal stuff that i’ve been banging on about for years now. What is most interesting is that it takes the massive problem of AGI and divides it up into still-very-difficult-but-far-more-manageable chunks: modules like sensory memory, workspaces, declarative memory, procedural memory, action selection, etc. The framework wires them all together based upon Baars’ theories (there are a great number of implementation issues that the CCRG had to solve themselves) and also provides a number of module implementations, such as a slip-net for perceptual associative memory. Some modules are not implemented (like sensory motor memory), and some have very basic implementations that will certainly need to be expanded upon. But still, if you agree with me that the approach makes sense you will probably agree that the work is a significant advancement of the field. Arguably the most important feature of the framework is that, as an AGI researcher, your work is now much more defined in a concrete way. If you have an interest in transient episodic memory, for example, you can now look at the API for it, and you will understand the constraints under which your code has to work in order to integrate with the rest of the system, rather than having to build an entire system from scratch.

I have to thank Ben Goertzel for pointing me towards LIDA. And i also recommend reading his well-penned response to Deutsch. And in an attempt to emulate the fine way Ben has of gently placing laurels on the heads of those he’s about to contradict, i must commend him for all of his extensive and tireless efforts – many effective – to forward the much-maligned field of AGI over what must be multiple decades now.

But when he mentions the paper Mapping the Landscape of Human-Level General Intelligence, i think he ended up scoring an own goal. This paper is chock full of all sorts of rephrasings of the Turing Test, and certainly not in any useful way. The gist of it is to mark out a path to AGI by following the psychological development of a child. The authors are careful not to specify an order, but some of the milestones include motor control development such as building block towers, story comprehension and retelling, and school learning. Of course no such paper would be complete without including video game playing – it’s part of every complete childhood. Just to round out the weirdness the Wozniak Test is thrown in too, in which the embodied agent has to knock on the door of an unfamiliar house, be let in, and then go to the kitchen and make a cup of coffee. No 10 year old, artificial or otherwise, has yet knocked on my door to make me a coffee, so although i can’t speak for the realism of the test, i can appreciate the technical difficulty.

But the difficulty is precisely the problem: none of the tasks that are mentioned in the paper – much less the milestones – are even remotely on the engineering horizon for AGI right now. It was all well and good of Turing to provide us with his test back in the 1950s, but here we are still scratching our heads. The existence of his test has not helped a whit, and neither will the well-intentioned but pie in the sky Landscape paper.

Rather, what is needed is a path to AGI that at least begins with goals that are achievable using incremental improvements to the engineering that we have now. The Landscape paper decries this approach as inviting narrowly defined solutions, but i disagree. If we start by saying that solutions must be part of an overall AGI framework such as LIDA (or perhaps OpenCog, although i don’t know enough about it to say), then researchers will be compelled to at least give a solemn nod towards generality. If we go further and say that qualifying solutions must address at least two tasks of different classes, the benefit of generality will eventually overcome the temptation of narrowness.

We must discard the notion of AGI development goals based upon ontogenetic development. A newborn child is not a blank slate on which knowledge and skills are written. There is staggering complexity in the built-in wetware, the surface of which we are only beginning to scratch. It’s nonsense to assume we can casually step over this massive knowledge chasm and merely concern ourselves with how we can get a computer to understand a children’s book. Besides, the way in which a child understands a book is a product of the entire brain; there is no “book comprehension” lobe to which we can neatly restrict our work. Without simulating an entire human brain, however young, you will never have human-like comprehension.

Instead, we should be focusing on the phylogenetic development of the brain. Tasks should begin with the challenges that the first nervous systems faced. It is well within our current capabilities to build a virtual world that is rich enough to provide a reasonable simulation. (This was the intention behind GoiD, although it never got to where i really wanted it to be, and without help from others likely never will. I should say, “more help”: many thanks to Max Harms for his contributions.) It is reasonable to assume that, as we make the environment more difficult and otherwise add more challenges, we will clearly see the reasons why brains evolved the way that they did, and why they work now the way they do. This approach also has the great benefit of providing clear grounding to early ideas, which has the same effect as the adoption of the LIDA framework: you understand the constraints under which you need to make improvements, which turns research problems into engineering problems. Basically, we need to start doing less AGI research, and more AGI engineering in the form of incrementally improving on an agreed-upon general framework.

In a future post i will write some ideas for phylogenetically based tasks. Individually they will sound like fodder for narrowness, but the important point to remember is that the same, programmatically unmodified agent must be able to solve multiple tasks.

Relevancy and common sense

September 21st, 2012

So, it was that time of the year again, and i was watching TV. This time it was a space show hosted by an Iranian fellow with a British accent. Yeah, him. He was talking about how vast the universe is, and how Einstein figured out that gravity bends three-dimensional space into a fourth dimension. There was also talk about how the universe is expanding, which naturally leads (for me anyway) into the question of, expanding into what? Just like back in high school when i thought about this stuff, my brain felt like it was going to hull-breach in multiple places. I can just manage to visualize a 3D lemon in my head and turn it around, roll it, flip it, etc. (This is something a dyslexic – lysdexic? – friend of mine said he couldn’t do. I told him he wasn’t really missing much.) But, as logical as four dimensions is – three plus one, right? How simple is that? – i can’t convince my brain to visualize it.

Turns out this is a common thing: even Stephen Hawking complains of the same inability, and here’s a guy whose thing pretty much is trying to visualize stuff. What gives? Why is there such an obvious disconnect between our logical facilities that can so easily conceive of a fourth dimension, along with our mathematical knowledge that so easily incorporates it, and our visual facilities which refuse to even entertain the possibility? (If you actually can visualize more than three dimensions, you are not normal, and you should go volunteer for studies at your nearest neurological research institute immediately.)

I submit that the reason is our brains’ built-in architecture, which was born and raised over millions of years in a 3D world, and until very recently (very, genetically speaking) had no need of any more dimensions. We are incapable of such visualizations because our visual processing is physically incapable of it. This is unfortunate at the moment, but that’s what evolution is like: if you don’t need it to survive phylogenetically, then it’s just taking up valuable space, no pun intended.

I’ll go further and say that these kinds of neurological built-ins are often referred to as “common sense”. Along with our innate conceptions of space and time, we are also born with linguistic structure. All of the languages in the world – i’m told, i don’t know them all – treat verbs not simply as an action, but an action that has an agonist and an antagonist, from which we can deduce causality. But the actors that are the agonist and antagonist can’t just be anything. It’s not the match that lit the campfire; nor was it the heat from the strike nor the oxygen in the air nor the carbohydrates in the kindling nor any other necessary ingredient. All of that stuff was already naturally there or a downstream result, and so is not relevant to causal determination. The causal pathways lead back to the person who struck the match and applied the flame to the kindling. The reason is that, of all of the necessary elements in the scenario, only the person can be influenced either by praise or censure to modify future behaviour. You can’t teach the oxygen to burn, or the match to strike. But you can thank the nice ranger for lighting a safe, toasty warm campfire, and hope that by doing so she will do it again tomorrow night. It’s common sense. Only that which can be changed is relevant to causation. And identifying what can be influenced is very important to the achievement of human goals.

In general it is necessary to be able to home in on relevant information within the torrent of data that enters the mind. What is relevant depends entirely upon the goals of the agent. Some people look at trees and see shade. Some see fruit, some see wood. Most people do not count the individual leaves, even though they could.

This reveals a problem with AGI approaches that are too general. When presented with pictures of trees, what patterns will such a system extract? If counting leaves is only one of the things it does, then it’s very likely wasting its resources. There is typically only a very small amount of data in the world that is relevant to humans, and our innate common sense structures help us tease it out and attend to it. You might conceive of an AGI with such wide open goals that it includes a requirement to count tree leaves, but i’d suggest that that might not be a practical place to start.

An AGI will require some manner of common sense, e.g. built-in pattern recognizers, mechanisms that work on given temporal intervals, DSPs that only work on frequencies in the range of human speech, that sort of thing. The specifics will depend only upon the goals of the AGI, which are also required.

Where’s the learning?

September 12th, 2012

Boris asked in a comment on the previous post how i could be ignoring learning in my AGI implementations. My caveat was that it was temporary, but of course any implementation without learning isn’t going to be much of an AGI. Still, i believe i have good reasons.

In an early post, i made an analogy with the mazes you find on kids’ place mats in family restaurants. Basically, it’s faster to solve the mazes by making a start at both ends because you get a better idea of the overall structure of the maze. In retrospect it’s a terrible analogy because the mazes are sooooo dead easy, but hopefully you get the point.

And the point is that learning is not all that brains do. Sure, it’s important. But i’d argue that learning in any particular limited field – such as language – is asymptotic. Once you are fluent your brain primarily does inference, at least to do with building sounds into words, and words into concepts. Also, in some athletic situations – imagine getting a breakaway in a soccer game – you go on autopilot and simply act. There’s no point in arguing that the brain isn’t learning in some manner during these times, but hopefully you get the point.

The question was, how does this inference/autopilot work? You could go the full Monty and build a system that is hard-coded with known patterns, and that takes an input, does a future projection, references its action options to determine goal optimization, and performs an action. I contend that this system is difficult enough to build without worrying about learning, which is probably a bigger problem than all of this put together. (Astute readers will note that i left out not only learning to facilitate future projections, but also learning how actions influence the environment to facilitate goal optimization, which are related, but still different things.)

In fact, i went even simpler. I built a hierarchical system that takes Morse code as input. Possible signals are dot, dash, letter space, word space, and all other space. Signals are temporal, and noisy both temporally (the input can speed up and slow down) and spatially (dots and dashes are roughly 0.8 and spaces roughly 0.2, but have overlapping Gaussian distributions), plus i also added deliberate mistakes. I hard-coded patterns into the system that would match on the input and produce an output, say 1 to 5, and send it up the hierarchy. The higher level was hard-coded with letter patterns that would match on the 1 to 5 input sequences and produce a character, say 1 – 50. (26 alpha, 10 numeric, and a bunch of punctuation.) A third level converted letter patterns into words. The words were output as they were recognized.

That’s it. There was no motor planning or actuation. All it did was take an input and produce a future projection at multiple hierarchical levels. It worked pretty well, i have to say, but that wasn’t the goal. The goal was to inform me about what the output of learning should be. Before beginning i didn’t precisely know what a “pattern” was, i.e. the specific software implementation in terms of structure and behaviour. When i was done, i knew what patterns were for inference purposes, which didn’t tell me how to build a learning system but at least placed constraints on the design of such a thing. Plus, i solved a lot of implementation issues around hierarchy communication. Overall i’d call the endeavour a great success, relative to other AGI efforts.
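To make the three levels concrete, here is a much-reduced sketch in Python. It is not the original implementation: the thresholds, the `unit` parameter, the `?` placeholder for unmatched letters, and the function names are all assumptions, it handles only amplitude and timing jitter (no deliberate mistakes), and it covers only the 26 letters.

```python
# Level 2 patterns: dot/dash sequences for the 26 letters.
MORSE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
    "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
    "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
    "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
    "-.--": "Y", "--..": "Z",
}


def classify(amplitude, duration, unit=1.0):
    """Level 1: turn one noisy pulse into a symbol."""
    if amplitude >= 0.5:                        # key down: dot or dash
        return "." if duration < 2 * unit else "-"
    if duration < 2 * unit:                     # short gap inside a letter
        return ""
    return " " if duration < 5 * unit else "/"  # letter space or word space


def decode(pulses, unit=1.0):
    """Levels 2 and 3: symbols -> letters -> words."""
    words, letters, symbols = [], [], ""

    def end_letter():
        nonlocal symbols
        if symbols:
            letters.append(MORSE.get(symbols, "?"))  # "?" marks no match
            symbols = ""

    def end_word():
        end_letter()
        if letters:
            words.append("".join(letters))
            letters.clear()

    for amplitude, duration in pulses:
        symbol = classify(amplitude, duration, unit)
        if symbol == " ":
            end_letter()
        elif symbol == "/":
            end_word()
        else:
            symbols += symbol
    end_word()
    return words


# (amplitude, duration) pairs with a little jitter, spelling "HI".
pulses = [(0.82, 1.1), (0.15, 0.9), (0.77, 0.8), (0.22, 1.2), (0.90, 1.0),
          (0.18, 1.0), (0.80, 0.9), (0.20, 3.2), (0.79, 1.1), (0.25, 0.8),
          (0.85, 1.0)]
print(decode(pulses))  # ['HI']
```

Even at this size the point about constraints shows up: whatever a learning system eventually produces has to look like the `MORSE` table and the thresholds in `classify`, because that is what the inference side consumes.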

It’s certain that bolting in a learning system will be no simple task, as the hierarchy will then also need to be able to work with patterns that appear, change, and possibly disappear as knowledge builds up. But still, reducing the scope of a vast research problem can only be seen as a good thing.

I’m now planning to do roughly the same with a system capable of actuation. The plan is to use someone else’s architecture, like LIDA or CogPrime, mostly to help educate myself on the details of those efforts.

Time and time again

September 9th, 2012

“Do not squander time,” said Benjamin Franklin, “for that is the stuff life is made of.” This quote appears midway through chapter 4 of The Stuff of Thought, by Steven Pinker. Once again i find myself banging on about the importance of basing AGI development on cognitive loops instead of on-event algorithms. My apologies if i’m preaching to the choir, but some folks seem to need more convincing. I’ll let Dr. Pinker take over, quoting from the same spot.

Our consciousness, even more than it is posted in space, unrolls in time. I can imagine abolishing space from my awareness – if, say, I were floating in a sensory deprivation tank or became blind and paralyzed – while still continuing to think as usual. But it’s almost impossible to imagine abolishing time [Pinker's emphasis] from one’s awareness, leaving the last thought immobilized like a stuck car horn, while continuing to have a mind at all. For Descartes the distinction between the physical and the mental depended on this difference. Matter is extended in space, but consciousness exists in time as surely as it proceeds from “I think” to “I am”.

Pinker later drops a William James quote:

The practically cognized present is no knife-edge, but a saddleback, with a certain breadth of its own on which we sit perched, and from which we look in two directions into time. The unit of composition of our perception of time is a duration, with a bow and a stern, as it were – a rearward- and forward-looking end…. We do not first feel one end and then feel the other after it, and from the perception of the succession infer an interval of time between, but we seem to feel the interval of time as a whole, with its two ends embedded in it.

James called this the “specious present”. Pinker elaborates:

How long is the specious present? The neuroscientist Ernst Pöppel has proposed an answer in a law: “We take life three seconds at a time.” That interval, more or less, is the duration of an intentional movement like a handshake; of the immediate planning of a precise movement, like hitting a golf ball; of the flips and flops of an ambiguous figure [referring to optical illusions elsewhere in the book]; of the span within which we can accurately reproduce an interval; of the decay of unrehearsed short-term memory; of the time to make a quick decision, such as when we’re channel-surfing; and of the duration of an utterance, a line of poetry, or a musical motif, like the opening of Beethoven’s Fifth Symphony.

Practically speaking, at a very low level: if an AGI implementation were monitoring a data stream, one that was merely event-based (i.e. running an algorithm only on the receipt of data) would be incapable of acting upon the absence of data, because when nothing arrives, nothing runs. This was the original reason why i started to focus more on cognitive loops, but these days the more i read stuff like Pinker’s books, the more i’m convinced that this is the only way to go.
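To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration (the function names, the one-second tick, the three-second silence threshold, and the simulated clock); it is not taken from any real AGI project. The event-driven handler only ever reports the values it receives, while the loop ticks on its own clock and so can notice a gap.

```python
def event_driven(samples):
    """Runs only when data arrives, so a gap in the stream is invisible to it."""
    return [f"got {value}" for _, value in samples]


def cognitive_loop(samples, tick=1.0, silence_after=3.0, end_time=10.0):
    """Ticks on its own (simulated) clock; arriving data merely feeds the loop.

    samples: list of (arrival_time, value) pairs, sorted by arrival_time.
    """
    pending = list(samples)
    log = []
    last_seen = 0.0
    alarmed = False
    now = 0.0
    while now <= end_time:
        # Feed in whatever has arrived by this tick.
        while pending and pending[0][0] <= now:
            arrival, value = pending.pop(0)
            log.append(f"t={now:.0f} got {value}")
            last_seen = arrival
            alarmed = False
        # The loop runs even when nothing arrived, so it can act on silence.
        if not alarmed and now - last_seen >= silence_after:
            log.append(f"t={now:.0f} silence")
            alarmed = True
        now += tick
    return log


samples = [(1.0, "a"), (2.0, "b"), (8.0, "c")]
print(event_driven(samples))
print(cognitive_loop(samples))
```

A real loop would run against wall-clock time rather than a simulated counter, but the structural point is the same: the silence branch can only exist in the version that runs whether or not data shows up.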

I am personally unaware of any serious AGI project that uses cognitive looping as a basic architectural feature. Do any readers know of any?

The full HTM

September 5th, 2012

It seems congratulations are in order: Dileep George recently landed $15 million (!) in Series A funding from the likes of Peter Thiel et al. (See their press release.) Dileep started the company about 1.5 years ago by partnering with a fellow who specialized in finding funding for startups, so… mission accomplished I guess. Now the hard part.

The last time i spoke with Dileep was around the time that he was starting the new venture. He gave me a brief overview of what his goals were (besides funding). It was high level, but enough information for me to know that it wasn’t what I wanted to work on even if Dileep wanted to hire me (which he never said he did). But thoughtfully, he directed me to speak with Subutai at Numenta, saying they were planning a big engineering push and needed bit-heads for it, which shortly thereafter resulted in my consulting gig with Numenta for roughly 9 months.

Ok, nothing new so far. It’s just that the conversations at the time were interesting. I had always had a problem with the initial implementation of the HTM as described in Jeff’s book, On Intelligence, as implemented by Dileep and Jeff at Numenta all those years ago. I was a member of what was known then as the Algorithmics Working Group (AWG), which met, as I recall, a grand total of twice. I attended the first meeting. The second one took place a few weeks after the birth of my daughter, so I skipped it on that excuse, but I honestly wasn’t overly interested in going. The problem was that what Numenta really had built back then was not an HTM, but an HM. There was no notion of temporality in the Zeta1 algorithms. The HM did some cool stuff, and they had built some serious clustering technology around it, but the blank looks on the faces around the room when I asked how to process aural signals were enough for me to know they weren’t heading my direction.

What I gathered from Dileep, Jeff, and Subutai was that some time in 2010 there were grave existential conversations happening in the Numenta offices. Apparently, Jeff and Subutai had come around to thinking that temporality in the algorithms might be useful after all. Dileep, on the other hand, was perfectly happy with what they had done so far. I assume a flurry of NDAs and other lawyerly invoicing ensued, with the eventual result of Numenta embarking on an entirely new conceptual course, and Dileep taking his Zeta1s and going home. (The original Vicarious office appeared to be Dileep’s apartment, although I never saw for myself, thank goodness.)

So while Vicarious was only interesting to me at the time by way of the inexplicable origins of the company name (which I still don’t understand but have a few theories), Numenta’s appeal had vastly improved. After reading their whitepapers and having a few conversations with Subutai, I had confirmed to my great satisfaction and excitement that they, indeed, had rediscovered the ‘T’, and were embracing it fully. Hallelujah!

Well, sort of. As I mentioned before, I was hired there as a web development engineer. (If you saw the early web application, or used the original API, that was pretty much all me – at first at least. There was some great code in there, if I say so myself.) Very soon I understood at a high level what the engine did, and early on I decided not to attend the algorithm meetings, both because I had a lot of web development stuff to do, and because I wanted to maintain a distance from the gory details. In short, I wanted to be able to plausibly claim that the algorithms I was personally developing were not influenced significantly by what Numenta was doing.

The reason for this was that I again realized that Numenta hadn’t built an HTM. This time they had built a TM. Conspicuously missing was any notion of a hierarchy. Which was really weird because it was so prominently a part of the previous effort. WTF? Additional problems included the implementation of the temporality, which is restricted to a single “time step”. This may be one second, five minutes, a day, or whatever you like, but to me it is still an eye-roller of a constraint. The TM was pretty good at predicting – in my synthetic data – that value B at t+1 was the same as value A at t, but it completely missed in a different data set that B at t+2 was the same as value A at t. It was surprisingly good at predicting the next value in data sets such as 15 minute energy consumption and daily grocery sales, but these were the data around which the algorithms were tuned, so maybe it would be a bigger surprise if it couldn’t predict the next value.
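The lag problem is easy to reproduce with synthetic data. The sketch below is only an illustration of the symptom, not Numenta's algorithm: the function names, the random-digit streams, and the brute-force scan over candidate lags are all assumptions. It builds a stream B that echoes stream A after a chosen lag, then measures how often "B at t+k equals A at t" holds for a given k.

```python
import random


def make_streams(n, lag, seed=0):
    """Return (a, b) where b repeats a after `lag` steps."""
    rng = random.Random(seed)
    a = [rng.randint(0, 9) for _ in range(n)]
    # The first `lag` values of b are filler, since a has nothing to echo yet.
    b = [-1] * lag + a[:n - lag]
    return a, b


def hit_rate(a, b, lag):
    """Fraction of steps where b[t + lag] equals a[t]."""
    pairs = [(a[t], b[t + lag]) for t in range(len(a) - lag)]
    return sum(x == y for x, y in pairs) / len(pairs)


def best_lag(a, b, max_lag):
    """Scan candidate lags instead of assuming a single time step."""
    return max(range(1, max_lag + 1), key=lambda k: hit_rate(a, b, k))


a, b = make_streams(200, lag=2)
print(hit_rate(a, b, 1))   # what a one-step-only model gets to see
print(hit_rate(a, b, 2))   # the relationship that is actually there
print(best_lag(a, b, 5))
```

A model that can only look one step ahead sees little better than chance on the lag-2 data, even though the relationship is perfectly regular. Scanning lags is the crudest possible fix; it is shown here only to make the constraint visible.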

My own work at the time was around erasing the time step constraint and reinventing the hierarchy, which is why I lost interest in the TM. (I still like the SDR/FDR stuff though, which certainly helped the TM deal with noisy data.) I’m happy to say that I did find a bit of time to work on my ideas, and indeed also found a commercial application. (Selling it is another matter.) I’m even happier to say that I should have some time coming up to work on it more.

Somewhere around November of last year Numenta offered me stock options to pretend that I was a full time employee. (The logistics of actually hiring a Canadian living in Canada by a U.S. company are frightful I’m told. You basically have to open a Canadian branch office. I offered the den in my house in this regard, but they declined.) I turned them down, both because I felt there was a strong resistance to algorithm ideas that didn’t come from high enough up, and because of the strange lawyerly (there’s that word again – if it is a word) language around how the options actually worked. About 2 months later, I knew my gig was up. There was less work coming my way, and a general hate-on for anything that wasn’t python. (I worked in Java.) It was about another month and a half before I was asked if I could come to the meeting room.

Numenta is stacked up and down with talented and inspiring people. I’d easily say my time there was the best job I’d ever had, full time or contract. If they asked me to come back and help again, I would do so happily, do my best, and as before keep my formidable cynicism to myself. But I do hope they revisit their algorithms again sometime soon.

A final note about Vicarious. Their Recursive Cortical Network (™!) sounds interesting, although I have a history of being sucked in by the marketing. The reason is that one of the ways in which I avoid a time step is by running a cognition in a loop, rather than just running some code when you get a new row of data. (“Recursive” is a way sexier word.) Yes, this runs in a digital computer, and so you can never really get away from a time step, but you can make it as small as practical for your application, which in the limit will approach analog. Then, as new data arrive, no matter when they arrive, you feed them into the loop and they influence the cognition. (In fact, the timing of the arrival can influence the cognition regardless of what the data actually are.) If Dileep does something similar, and has also managed to keep the old hierarchy from harm, he may be on to something. Maybe even the full HTM.

Quick disclaimer

June 3rd, 2011

The rumours are true… I’m working with Numenta. I’ve been silent for a while not because i can’t talk about the work (i can – at least i think so), but because there is just a ton of work to do. I hope to have some new posts up over the next few weeks.

Athleticism – the final frontier

March 5th, 2011

A while back i read about how the most highly respected of the economists who make predictions are those whose predictions are usually wrong. More specifically, their fame is typically the result of a single prediction that somehow managed to come true. Then their subsequent predictions are received as divinely inspired, and even though they turn out to be spectacularly wrong, no one bothers to follow up, and so they never have to face the music. This post is not about how human brains do the same thing (although there is a lot of evidence that they do). It’s not even about how Monsieur Kurzweil could be called to task on some of the predictions in his Age of <insert here> Machines books (although he could). No, this post is for me to make some predictions. I figure that in the 10 or so years it would take before someone could come to do some reckoning, this post will be hopelessly buried under at least 4 other posts, and i’ll end up scot-free. Unless they come true, in which case i’ll be posting links to this from everywhere and i’ll be a superstar. Yeah!

1) I predict that within the next 10 years significant progress will have been made in understanding how human intelligence works, and in the development of computer algorithms that simulate it. There will still be a long way to go before they start taking over truly intelligent tasks, but enough of a stir will have been created that…

2) A burgeoning industry will develop around this intelligence beachhead as it will be easy to see the massive economic incentives involved. At the same time there will be individuals, groups, lobbyists, NGOs, and all other manner of organizations (including governments) that will be wringing their hands in collective worry about what the implications of this work will be. And rightly so, since it will be difficult to follow.

3) Some time later – no time prediction except inevitable – truly intelligent machines will be developed. I’ve written about this before, and so i won’t go over the details, but suffice it to say i believe there is a roughly 95% chance that these machines will be beneficial to human civilization. Cross your fingers.

4) The machines will get into every human endeavour – and many more of which humans have never conceived – except one: athleticism. In the same way that narrow AI has produced artificial checkers, chess, and Jeopardy! champions, manufacturing has produced robots that can create products orders of magnitude better than any human with dumb tools could hope. (Check out the guy on TED that made a toaster from scratch.) There’s certainly been progress on both sides. But there’s a big difference. Continued advancement in AGI can, by all indications, be achieved with better software and more computing power. Better robots need these too, but they will also need better sensors and actuators, and you won’t have truly athletic robots until you have both the brains and the bodies. By sensors, i mean touch, heat, pinch, proprioception, et al. There are places where such things are being developed, but i believe they will lag far behind brain development. Besides, with the ability to create any specialized hardware they need for their own purposes, there will be as little incentive to create athletic robots as there is to create airplanes that fly like birds.

(Corollary: the T1000 will never be created. This may seem unfortunate, but let’s face it… If the machines want to off us they will just start a city-size grow-op, burn it, and then roll around on wheels whacking us while we lie giggling in the streets. Fish in a barrel, man.)

A final note. While i watched the last World Cup, i marveled at the skill of the players. Not only can they stand up and balance (hard enough for robots); not only can they run (beyond current robots); not only can they run and kick a ball at the same time (no comment needed)… These players can manage the game on at least 5 levels all at the same time. They can 1) manage to get a ball moving in the direction that they want, 2) keep said ball away from opponents by reading their movements and predicting their next actions, 3) track their teammates to assess who may be in a good position, 4) determine where to kick the ball such that a chosen teammate, rather than an opponent, can successfully receive the pass, and 5) keep in mind and integrate the overall strategy of play for the entire game. Anyone who gives any thought to this cannot seriously give credence to the phrase “dumb jock”. These guys are athletic geniuses, and i predict they will be considered such for a long, long time to come.

Animal electricity

February 18th, 2011

Was inspired by TV again. This time it was a TVO program (originally BBC) called The Story of Science. It’s a clever show where, “Michael Mosley takes an informative and ambitious journey exploring how the evolution of scientific understanding is intimately interwoven with society’s historical path”. I’m thinking this TV thing might actually be useful. If i get inspired by Two and a Half Men, i’ll know i’m really on to something.

Anyway, Mosley was talking about “animal electricity”, a concept pioneered by Luigi Galvani. It was later to be rechristened bioelectromagnetism, but at the time Galvani truly thought that he was on to the source of life itself. And honestly, you can’t really blame him. The year was roughly 1780, and although people knew about it, electricity was still pretty much a mystery. You can cut the guy some slack if he concluded somewhat prematurely that this type of energy – which even the majority of people alive today don’t understand – held a higher place in the meaning of life than it actually turns out to hold. From his point of view, if this electricity thingy could cause muscles to move, well, that kind of solves it, no? If you’re still skeptical, recall that Galvani – in his obvious enthusiasm – quickly turned his jumper cables towards cadavers, fully expecting that the bodies would leap up from the slab like Jason Statham in Crank (two hours i’ll never get back). I honestly think he was quite disappointed – Galvani, that is. I would have been too, because, seriously, i would have tried it.

I’ve mentioned before that my obsession with AGI causes me to make parallels with damn near everything that enters my brain, but this one caused a whole clan of neurons to fire. Remember when pattern classification was the very soul of AI? And then memory became the hot thing. Then learning, whatever the hell that was; but whatever it was, we needed it. But wait, what about inference? Hierarchies? Fuzzy logic? Bah! Fools! All you need is LOGIC (sans fuzziness)!

Well, we now know that Galvani, god bless him, was correct in that electricity is a necessary component of most life as we know it, but not a sufficient one. To really fulfill his Frankensteinian dreams he would also need to be intimate with any number of biological disciplines, not least of which is cellular chemistry. And likewise, most of us AGI folk ought to be on to the fact that any of the narrow AI approaches are likely necessary in some way as well, but, alas, insufficient on their own. To some degree this can be considered the Binding Problem. (It’s a leap, i admit. But i did say, “to some degree”.)

Probably the reason why this thought jumped out at me (oddly, from inside my brain) is that to an extent i’ve already been working on such approaches. I imagine others are too, but i claim that it was a completely original thought, having not actually heard of it anywhere else. It first started when i was working on predicting single value data streams, such as those produced in industrial control. I was trying to characterize “modes” of operation of equipment from the streams, and realized that this could be done if i measured multiple attributes of the values, especially over time. Example attributes would be instantaneous amplitude, boxcar amplitude, instantaneous change, volatility, sudden change of averages, etc. Independently, none of these attributes is normally of much use, but when they are converted into events (which interestingly start to look like neuron traces) they can start to be predicted. And when they are formed into temporal patterns and/or arranged into hierarchies it gets even better. Multi-modal prediction can occur by creating a hierarchical level that looks for patterns in the identification and prediction of events in the combination of individual streams. And then, those predictions can be arranged into hierarchies just as they are at the lower levels.
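As a rough illustration of turning attributes into events, here is a small Python sketch. It covers only two of the attributes listed above (instantaneous change, and volatility over a boxcar window), and the function name, event names, and thresholds are all invented for the example rather than taken from the original work.

```python
from collections import deque
from statistics import pstdev


def attribute_events(stream, window=4, jump=5.0, vol_limit=3.0):
    """Convert a single-value stream into a list of (t, event_name) pairs."""
    recent = deque(maxlen=window)   # boxcar window over the latest values
    events = []
    prev = None
    was_volatile = False
    for t, value in enumerate(stream):
        recent.append(value)
        # Instantaneous change: fire when one step moves far enough.
        if prev is not None and abs(value - prev) >= jump:
            events.append((t, "jump"))
        # Volatility: fire only when the windowed spread crosses the limit.
        if len(recent) == window:
            is_volatile = pstdev(recent) >= vol_limit
            if is_volatile != was_volatile:
                events.append((t, "volatile_on" if is_volatile else "volatile_off"))
            was_volatile = is_volatile
        prev = value
    return events


print(attribute_events([10, 10, 10, 10, 20, 20, 20, 20]))
```

The raw values say little on their own, but the sparse event list is something a higher level can look for patterns in, and the same conversion can be applied again to streams of events.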

Such arranging has its limits, of course. But it’s easy to make psychological parallels with this approach, so it remains intriguing. The problem i personally have, as i complained about in my last post, is finding decent data to work with.

Going off on a tangent for a moment, let me just say that i like MLComp and Kaggle. The latter especially so because they’ve formed a commercial structure around their stuff, which inevitably is going to be more resilient and spawn more ingenuity than open source (speaking from long experience here). But the big problem that i have is that their data sets are static. The algorithm has no opportunity to affect its environment. No matter what output it provides, the next set of data will always be the same. It’s like reading a book: no matter what you think of it or mutter to yourself, the next page will have all the same words it always had. Contrast this with a dynamic environment, where the actions of the agent can change what will happen next, like a choose-your-own-ending book. I hate to bang on about it again, but this is exactly what GoiD provides. It allows the AGI implementations you write to have the element of action. And if you don’t think that’s useful, well, thanks for reading this far.
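The distinction can be shown in a few lines of Python. This is a toy sketch only: the class names, the `step` method, and the one-dimensional world are assumptions for illustration and say nothing about how MLComp, Kaggle, or GoiD actually work.

```python
class StaticDataset:
    """Replays fixed rows; the agent's output never changes what comes next."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.index = 0

    def step(self, action):
        row = self.rows[self.index % len(self.rows)]
        self.index += 1
        return row  # the action is ignored entirely


class DynamicWorld:
    """A hypothetical one-dimensional world in which the action moves the agent."""

    def __init__(self, position=0):
        self.position = position

    def step(self, action):
        self.position += action
        return self.position  # the next observation depends on the action


def run(env, policy, steps):
    """Alternate between choosing an action and observing the result."""
    observation = None
    seen = []
    for _ in range(steps):
        action = policy(observation)
        observation = env.step(action)
        seen.append(observation)
    return seen


print(run(StaticDataset([1, 2, 3]), lambda obs: +1, 3))
print(run(StaticDataset([1, 2, 3]), lambda obs: -1, 3))
print(run(DynamicWorld(), lambda obs: +1, 3))
print(run(DynamicWorld(), lambda obs: -1, 3))
```

Two opposite policies see identical observations from the static data set, and completely different ones from the dynamic world.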

But back at AGI, the bottom line in my opinion is that diversity is good. Don’t bother looking for silver bullets – they don’t exist. Create small algorithms that do something useful well enough, and then find a way to arrange them such that an overall structure can develop confidence in the ones that work in particular contexts, and ignore them when they don’t.
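One possible reading of that arrangement, sketched in Python: keep a separate confidence score for each small algorithm in each context, raise it when the algorithm is right, lower it when it is wrong, and defer to whichever one is currently most trusted. The class name, the 0.5 starting confidence, the learning rate, and the two toy predictors are all assumptions for illustration, not a description of any existing system.

```python
from collections import defaultdict


class ContextualEnsemble:
    """Tracks, per context, how much each small predictor can be trusted."""

    def __init__(self, predictors, learning_rate=0.2):
        self.predictors = predictors          # name -> callable(x) -> prediction
        self.learning_rate = learning_rate
        # confidence[context][name]; every predictor starts at a neutral 0.5
        self.confidence = defaultdict(lambda: {name: 0.5 for name in predictors})

    def predict(self, context, x):
        scores = self.confidence[context]
        best = max(scores, key=scores.get)    # most trusted predictor in this context
        return self.predictors[best](x)

    def learn(self, context, x, actual):
        scores = self.confidence[context]
        for name, predictor in self.predictors.items():
            hit = 1.0 if predictor(x) == actual else 0.0
            # Move the confidence a small step towards 1 on a hit, 0 on a miss.
            scores[name] += self.learning_rate * (hit - scores[name])


ensemble = ContextualEnsemble({"same": lambda x: x, "double": lambda x: 2 * x})
for _ in range(5):
    ensemble.learn("steady", 3, 3)   # in this context the value repeats
    ensemble.learn("growth", 3, 6)   # in this one it doubles

print(ensemble.predict("steady", 5))
print(ensemble.predict("growth", 5))
```

Neither predictor is any good in general, but each earns trust in the context where it works and is ignored in the one where it doesn't.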