The full HTM

September 5th, 2012

It seems congratulations are in order: Dileep George recently landed $15 million (!) in Series A funding from the likes of Peter Thiel et al. (See their press release.) Dileep started the company about 1.5 years ago by partnering with a fellow who specialized in finding funding for startups, so… mission accomplished I guess. Now the hard part.

The last time I spoke with Dileep was around the time that he was starting the new venture. He gave me a brief overview of what his goals were (besides funding). It was high level, but enough information for me to know that it wasn’t what I wanted to work on even if Dileep wanted to hire me (which he never said he did). But thoughtfully, he directed me to speak with Subutai at Numenta, saying they were planning a big engineering push and needed bit-heads for it, which shortly thereafter resulted in my consulting gig with Numenta for roughly 9 months.

Ok, nothing new so far. It’s just that the conversations at the time were interesting. I had always had a problem with the initial implementation of the HTM as described in Jeff’s book, On Intelligence, as implemented by Dileep and Jeff at Numenta all those years ago. I was a member of what was known then as the Algorithmics Working Group (AWG), which met, as I recall, a grand total of twice. I attended the first meeting. The second one took place a few weeks after the birth of my daughter, so I skipped on that excuse, but I honestly wasn’t overly interested in going. The problem was that what Numenta really had built back then was not an HTM, but an HM. There was no notion of temporality in the Zeta1 algorithms. The HM did some cool stuff, and they had built some serious clustering technology around it, but the blank looks on the faces around the room when I asked how to process aural signals were enough for me to know they weren’t heading my direction.

What I gathered from Dileep, Jeff, and Subutai was that some time in 2010 there were grave existential conversations happening in the Numenta offices. Apparently, Jeff and Subutai had come around to thinking that temporality in the algorithms might be useful after all. Dileep, on the other hand, was perfectly happy with what they had done so far. I assume a flurry of NDAs and other lawyerly invoicing ensued, with the eventual result of Numenta embarking on an entirely new conceptual course, and Dileep taking his Zeta1s and going home. (The original Vicarious office appeared to be Dileep’s apartment, although I never saw for myself, thank goodness.)

So while Vicarious was only interesting to me at the time by way of the inexplicable origins of the company name (which I still don’t understand but have a few theories), Numenta’s appeal had vastly improved. After reading their whitepapers and having a few conversations with Subutai, I had confirmed to my great satisfaction and excitement that they, indeed, had rediscovered the ‘T’, and were embracing it fully. Hallelujah!

Well, sort of. As I mentioned before, I was hired there as a web development engineer. (If you saw the early web application, or used the original API, that was pretty much all me – at first at least. There was some great code in there, if I say so myself.) Very soon I understood at a high level what the engine did, and early on I decided not to attend the algorithm meetings, both because I had a lot of web development stuff to do, and because I wanted to maintain a distance from the gory details. In short, I wanted to be able to plausibly claim that the algorithms I was personally developing were not influenced significantly by what Numenta was doing.

The reason for this was that I again realized that Numenta hadn’t built an HTM. This time they had built a TM. Conspicuously missing was any notion of a hierarchy. Which was really weird because it was so prominently a part of the previous effort. WTF? Additional problems included the implementation of the temporality, which is restricted to a single “time step”. This may be one second, five minutes, a day, or whatever you like, but to me it is still an eye-roller of a constraint. The TM was pretty good at predicting – in my synthetic data – that value B at t+1 was the same as value A at t, but it completely missed in a different data set that B at t+2 was the same as value A at t. It was surprisingly good at predicting the next value in data sets such as 15-minute energy consumption and daily grocery sales, but these were the data around which the algorithms were tuned, so maybe it would be a bigger surprise if it couldn’t predict the next value.
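To make the t+1 versus t+2 point concrete, here is a minimal Python sketch of that kind of synthetic data. It is not Numenta’s TM or anything derived from it; the function names (make_offset_series, lag_error, best_lag) are made up for illustration, and the data are just seeded random numbers. All it shows is that a rule fixed at one time step scores perfectly on offset-1 data and badly on offset-2 data, while scanning a few lags finds the offset right away.

```python
import random


def make_offset_series(n, offset, seed=0):
    """Build n rows of (A, B) where B at row t equals A at row t - offset.

    B is 0.0 for the first `offset` rows, since there is no earlier A yet.
    """
    rng = random.Random(seed)
    a = [rng.random() for _ in range(n)]
    b = [a[t - offset] if t >= offset else 0.0 for t in range(n)]
    return list(zip(a, b))


def lag_error(rows, lag):
    """Mean absolute error of the rule 'B at t+lag equals A at t'."""
    errors = [abs(rows[t + lag][1] - rows[t][0]) for t in range(len(rows) - lag)]
    return sum(errors) / len(errors)


def best_lag(rows, max_lag):
    """Return the lag in 1..max_lag with the smallest error."""
    return min(range(1, max_lag + 1), key=lambda lag: lag_error(rows, lag))


if __name__ == "__main__":
    one_step = make_offset_series(200, offset=1)
    two_step = make_offset_series(200, offset=2)
    print("lag-1 rule on offset-1 data:", lag_error(one_step, 1))
    print("lag-1 rule on offset-2 data:", lag_error(two_step, 1))
    print("best lag on offset-2 data:", best_lag(two_step, 5))
```

A brute-force scan over lags is obviously not a learning algorithm. It is only here to show how little it takes to see a relationship that a hard-wired single step cannot.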

My own work at the time was around erasing the time step constraint and reinventing the hierarchy, which is why I lost interest in the TM. (I still like the SDR/FDR stuff though, which certainly helped the TM deal with noisy data.) I’m happy to say that I did find a bit of time to work on my ideas, and indeed also found a commercial application. (Selling it is another matter.) I’m even happier to say that I should have some time coming up to work on it more.

Somewhere around November of last year Numenta offered me stock options to pretend that I was a full-time employee. (The logistics of actually hiring a Canadian living in Canada by a U.S. company are frightful I’m told. You basically have to open a Canadian branch office. I offered the den in my house in this regard, but they declined.) I turned them down, both because I felt there was a strong resistance to algorithm ideas that didn’t come from high enough up, and because of the strange lawyerly (there’s that word again – if it is a word) language around how the options actually worked. About 2 months later, I knew my gig was up. There was less work coming my way, and a general hate-on for anything that wasn’t Python. (I worked in Java.) It was about another month and a half before I was asked if I could come to the meeting room.

Numenta is stacked up and down with talented and inspiring people. I’d easily say my time there was the best job I’d ever had, full-time or contract. If they asked me to come back and help again, I would do so happily, do my best, and as before keep my formidable cynicism to myself. But I do hope they revisit their algorithms again sometime soon.

A final note about Vicarious. Their Recursive Cortical Network (™!) sounds interesting, although I have a history of being sucked in by the marketing. The reason is that one of the ways in which I avoid a time step is by running a cognition in a loop, rather than just running some code when you get a new row of data. (“Recursive” is a way sexier word.) Yes, this runs in a digital computer, and so you can never really get away from a time step, but you can make it as small as practical for your application, which in the limit will approach analog. Then, as new data arrive, no matter when they arrive, you feed them into the loop and they influence the cognition. (In fact, the timing of the arrival can influence the cognition regardless of what the data actually are.) If Dileep does something similar, and has also managed to keep the old hierarchy from harm, he may be on to something. Maybe even the full HTM.
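For the curious, the loop idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not my actual algorithms and certainly not whatever Vicarious is doing: the “cognition” here is a single number that decays on every tick, the LoopedState class and run function are hypothetical names, and a logical tick counter stands in for a real clock so the result is repeatable. The only point is that the loop keeps running whether or not data show up, so the same values arriving at different times leave the state in a different place.

```python
from collections import deque


class LoopedState:
    """Hypothetical stand-in for a 'cognition': one value that decays each tick."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.value = 0.0

    def tick(self, inputs):
        # The state evolves on every tick, even when nothing has arrived.
        self.value *= self.decay
        # Whatever did arrive during this tick is folded into the state.
        for x in inputs:
            self.value += x


def run(events, n_ticks, decay=0.9):
    """Run the loop for n_ticks and return the state value after each tick.

    events is a list of (arrival_tick, value) pairs sorted by arrival_tick.
    """
    pending = deque(events)
    state = LoopedState(decay)
    trace = []
    for now in range(n_ticks):
        arrived = []
        while pending and pending[0][0] <= now:
            arrived.append(pending.popleft()[1])
        state.tick(arrived)
        trace.append(state.value)
    return trace


if __name__ == "__main__":
    # Same two values, different arrival times.
    close_together = run([(0, 1.0), (1, 1.0)], n_ticks=10)
    far_apart = run([(0, 1.0), (8, 1.0)], n_ticks=10)
    print("final state, arrivals at ticks 0 and 1:", close_together[-1])
    print("final state, arrivals at ticks 0 and 8:", far_apart[-1])
```

Shrinking the tick (more ticks per unit of real time, with a decay adjusted to match) is the “as small as practical” knob mentioned above.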

2 Responses to “The full HTM”

  1. Could you please clarify this:

    “that value B at t+1 was the same as value A at t, but it completely missed in a different data set that B at t+2 was the same as value A at t.”

    How can A and B be the same?

  2. Matthew Lohbihler says:

    Here’s a sample of the “t+2” data, with A as the first column and B as the second. Each row is a new t. I.e. it’s just a temporal offset.

    0.387034905 0
    0.392073992 0
    0.492282476 0.387034905
    0.922831358 0.392073992
    0.651688379 0.492282476
    0.780181491 0.922831358
    0.361700701 0.651688379
