During the research for the second edition of my book, I became fascinated by the work of a scientist called Jon Kleinberg. He’s the developer of an algorithm known as HITS (Hypertext Induced Topic Search). The intuition behind HITS is very important as it’s based on the notion of “hubs and authorities”, a term I’m sure you’ve heard before.
The simplest way to explain it, perhaps, is to think about links between documents this way:
- Authority comes from in-edges (pages which point to yours)
- Being a good hub comes from out-edges (pages which you point to)
This creates a mutually reinforcing relationship:
- A good authority is a page that is pointed to by many good hubs.
- A good hub is a page that points to many good authorities.
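For readers who want to see that mutual reinforcement in action, here is a minimal Python sketch of the iterative update described in Kleinberg’s published HITS work: each page’s authority score becomes the sum of the hub scores of the pages pointing to it, each hub score becomes the sum of the authority scores of the pages it points to, and both are rescaled every round. The `hits` function and the toy link graph are hypothetical illustrations, not Teoma’s (or anyone’s) production code.

```python
import math


def hits(graph, iterations=50):
    """Minimal HITS sketch. `graph` maps each page to the pages it links to."""
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority comes from in-edges: add up the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                auth[dst] += hub[src]

        # Hub quality comes from out-edges: add up the authority of pages linked to.
        hub = {p: sum(auth[dst] for dst in graph.get(p, [])) for p in pages}

        # Rescale each vector to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm

    return hub, auth


# Toy graph (hypothetical): three hub pages pointing at two candidate authorities.
links = {"hub1": ["a", "b"], "hub2": ["a", "b"], "hub3": ["a"]}
hub_scores, auth_scores = hits(links)
print(sorted(auth_scores, key=auth_scores.get, reverse=True)[:2])  # ['a', 'b']
```

Page `a` ends up as the strongest authority because every hub points to it, while `hub1` and `hub2` outrank `hub3` as hubs because they point to both authorities: good hubs and good authorities lift each other, exactly as the two bullets describe.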
However, it’s vital to remember that this process is a way of identifying not just linkage patterns, but also web communities and the major players within them. If you’ve attended any of my presentations, you’ll know I describe it quite simply this way: the web sees all links as being equal. No link has any preference over another. With search engines, however, some links are certainly more equal than others, and some are infinitely more equal.
This is why we talk about “link quality” and not just quantity. A single quality link can frequently have 50 times more power than 100 random or “less qualified” links.
Both Google and Teoma are prime examples of search engines which base their ranking algorithms on the nature and characteristics of linkage data.
There’s a lot of information online about PageRank, the Google algorithm, but not as much about HITS. And HITS is possibly the most influential algorithm in information retrieval on the web. I wrote a document about HITS (which is free) a short while ago. If you don’t have it, I’ll give a link at the end of this long and very in-depth feature.
Teoma has been fairly open about the similarities between its algorithm and HITS. But the technology they have developed goes well beyond what was achieved by Kleinberg, or by CLEVER, an IBM project which built another variation on the algorithm.
I was delighted that Paul Gardi, SVP Search at Ask Jeeves/Teoma, was willing to meet me for lunch to help find an analogy, useful to readers of this newsletter, for how the ranking process at search engines is based as much on the social sciences as on pure linkage data. I wasn’t disappointed at all. If you’ve ever wondered how you’re supposed to do well in the linkage-based race to the top of the major search engines when you’re a brand new and “linkless” site, then pay very special attention at the point where Paul says: “I’d say the same thing as I’d say to some guy who’s just arrived in town and said, you know, I need a job. I’d say: What are you good at?”
By the way: I’m not sponsored by McCormick & Schmick’s! Even if they do get mentioned so many times in this newsletter (but if you have one in your city, be sure you’ll find me sitting there when I visit!). So, here is where I’m having lunch with Paul Gardi. Paul is based in New York and I’m in Boston for a conference which Paul is also attending. So, it quite simply makes sense to meet in… you know where!
If this is the first time you’ve read this newsletter, let me tell you that I publish the entire transcript of my interviews verbatim. It’s just as though you were sitting at the table listening. However, as you’ll discover, this interview ends with my tape being switched off. You’ll see why.
Paul and I are also joined by the delightful Alexa Rudin, Director of Communications at Ask Jeeves (third party witness). If you REALLY want to know where search technology is going: read between the lines!
<< Play >>
Mike: [Settling down into the corner of a cosy booth] I can’t quite place your accent Paul.
Paul: It’s South African.
Mike: Ahh! It’s not that strong, but I could just detect a little “something”. So, anyway, I always do this, it makes it a little easier to get up to speed: you’ve mentioned South Africa, so what’s your background before Jeeves/Teoma?
Paul: Well, my background is Procter and Gamble on the brand management side; venture capital with a couple of Internet companies and then I invested in Teoma.
Mike: Right. So you were approached about the Teoma deal and saw that it was going to be something big?
Paul: Yes, when we met Apostolos [Apostolos Gerasoulis, founder of Teoma], we could tell that he had a very definite vision. Also that his approach was different from any other approach being taken in the search space, and he just had the background and the credibility to build this kind of thing. From the way he described his idea to us, and after due diligence, it became clear to us that his path was unique. And we developed enough confidence in him and his ability to actually build it.
When we initially invested, it wasn’t really anything yet. It was a bunch of early algorithms which had been put together as the DiscoWeb project. But there was certainly enough there to see that this would definitely become something. And we decided to back him… that’s really how I got involved, on the investor side at first.
About a year later, the company had matured, its algorithms had matured, and they approached me and asked me if I’d come in and consult for them for a short while. They just wanted a little assistance on the way forward… you know, how to take it to the next level…
[And, as if he had been following me since my conversation with Jill Whalen in the last issue - up pops the waiter - completely oblivious to the interview taking place. He hands Paul his Coke, makes some small talk about the weather and then... notices the microphone. At which point, he skilfully returns to the kitchen with an eyes-down, backward-gliding motion.]
Paul: [continuing] … my background’s really technology VC and business building…
Mike: I was going to ask about that, because you’d probably have fried your brains going through all of that technology and the HITS algorithm, as well as everything else, if you didn’t have some sort of [makes air quotes, fingers twitching like rabbit ears] “techie” kind of background too.
Paul: Yes, and for a lot of it, not many people in the world, even those with PhDs, understand what’s really going on with Teoma.
Mike: [With knowing look - and laughing] I know… it took me one year just to try and put some of this down. To put it down in simple English so that people may be able to understand at least some of the principal operators in this type of information retrieval!
While we’re on the subject of technology, I’ve been very lucky in that, during the research for my book, I’ve been able to speak to some of the prime movers in the industry. I talked to Brian Pinkerton, developer of the web’s first full-text retrieval search engine [WebCrawler], and discussed how he implemented early IR [Information Retrieval] technology such as the vector space model and also how linkage data was used back then. But I think now the science and technology is developing in its own right. In the early days of the web, the only work to call upon had been done in the field of text retrieval for digital libraries and that sort of thing. Now, at least we have years of research into information retrieval on the web to call upon.
Paul: I’d say it’s now more years of research into what I call social network theory. In universities, people like Kleinberg [Jon Kleinberg, developer of the HITS algorithm] are working on understanding how people interact and how networked structures are predictive of certain links, like hubs and authorities. Basically, being able to draw out of a structure some predictive and very interesting analytics.
Really, what we’re doing now is taking a well-known theoretical construct, developed by others, on how this should work, and putting it into a practical application.
Mike: Yeah, I know what you mean. I’ve tried to explain this, about bibliometrics and citation analysis… in fact, I had a conversation some time ago with Andrei Broder, former chief scientist with AltaVista, about the best way to describe it. Trying to simplify it is the hard part. Trying to get the average SEO to look beyond basic HTML code and the technical stuff surrounding the creation of web pages and become more aware of the social sciences aspect…
Paul: Well, that’s really where I got into Teoma. The company was made up of a bunch of really smart guys who had taken a technology from point X, where they started, on to point Y, where they passed all the milestones that were required for us to continue to invest. But what they hadn’t done… well, put it this way, it was so advanced that no one could understand what they were doing; it wasn’t down in plain English. It was very difficult because they had no real way of communicating to investors or customers what they were doing. By that I mean, they hadn’t developed a way of simplifying it down to a 30-second spiel. Two hours was hardly enough time to explain the technology behind it!
So, one of the first things I had to do was to try and distil it down, into a language that makes sense to someone outside… a set of concepts that make sense to someone in, say, five minutes of conversation. We haven’t quite done it yet - but we’re getting there.
Mike: I’m just quietly sitting here grinning and agreeing. When I saw the initial press release for Teoma, I thought, well… I’m not quite sure that it says everything it could - but it certainly gets a lot of it down in two pages - considering it took me personally about five hours to cover it! [bursts out laughing]
Paul: That original press release was a bit “non-specific” to anything we were doing, to be honest. In fact, you say it didn’t say everything it could: it might have been “off the mark” deliberately [adopts a slight "knowing" grin]. We did not want anyone to know exactly, especially because, around that time, there were several people coming out and saying that what we were doing wasn’t possible. CLEVER [an IBM project which further developed Jon Kleinberg's HITS algorithm] had not demonstrated the success of…
Mike: You mean the run-time-analysis problem?
Paul: Yes, basically being able to calculate the local structure in real time… or in any time, in a usable fashion. They [CLEVER] had not demonstrated that they could get the additional information, the much more pertinent information about these pages, about who they are, what they are and how they fit, which allows you to rank them. No one had succeeded in bringing that to the surface. And there were some papers (and some people) going around saying that this was just not possible and that you’d need a server farm the size of Texas and, basically, all the electricity in the world to calculate the amount of information that’s required to work out the communities: the local subject communities for every subject, so you could then understand who the hubs and authorities are.
Mike: Am I right in saying that in the early stages, as the DiscoWeb project, they managed to get it to about ten minutes?
Paul: It might have been about ten minutes. But our speed run now is under 0.03 seconds…
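The “local subject community” Paul describes corresponds, in Kleinberg’s published HITS paper, to a query-specific base set: start from a root set of pages that match the query text, add the pages they link to and a capped number of pages linking to them, then run the hub/authority iteration only on that subgraph. The Python sketch below shows just that expansion step. The function names, the `max_in` cap parameter and the dictionary-based link indexes are assumptions for illustration; it omits refinements from the paper (such as discarding links between pages on the same site) and says nothing about how Teoma achieved its speed.

```python
def base_set(root, out_links, in_links, max_in=50):
    """Expand a query's root set into a Kleinberg-style base set (sketch).

    root      -- pages that matched the query text (hypothetical input)
    out_links -- dict: page -> list of pages it links to
    in_links  -- dict: page -> list of pages that link to it
    max_in    -- cap on in-links pulled in per root page, so very popular
                 pages do not swamp the subgraph
    """
    base = set(root)
    for page in root:
        base.update(out_links.get(page, []))
        base.update(in_links.get(page, [])[:max_in])
    return base


def induced_subgraph(base, out_links):
    """Keep only the links whose two ends both lie inside the base set."""
    return {p: [q for q in out_links.get(p, []) if q in base] for p in base}


# Toy link indexes (hypothetical): r1 matched the query; z and q are off-topic.
out_links = {"r1": ["x"], "y": ["r1"], "z": ["q"]}
in_links = {"r1": ["y"], "x": ["r1"], "q": ["z"]}
base = base_set(["r1"], out_links, in_links)
print(sorted(base))  # ['r1', 'x', 'y']
graph = induced_subgraph(base, out_links)
```

The resulting `graph` is small enough to feed into a hub/authority iteration, which is why restricting the computation to a local community, rather than the whole web, matters so much for response time.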
Mike: And that’s just so fantastic. I know when I was going through my research for the last edition of my book, I was thinking: “ten minutes to do all that is pretty good - but nobody’s ever gonna wait ten minutes for a result!”
Paul: And if you process this on the fly, I mean… if you were to try and present this information… like there’s a lot of talk about how they’re [other search engines] all providing information on clusters and communities and being local subject specific. But unless they’re coming to our engine and secretly scraping our results, well, to do even one calculation would be… CLEVER and IBM were trying to do it using some very smart people, but it just takes a very long time unless you approach it in the way we approached it.
Mike: The patent for Kleinberg’s HITS algorithm is owned by IBM, as it’s wrapped in the CLEVER project. I don’t believe that CLEVER was ever used in a commercial search engine, though. I never actually saw a search engine with the words “Powered by CLEVER” or “Powered by IBM” or anything like that. So it doesn’t seem to have made it into the commercial world of search at all.
Paul: I can’t say whether CLEVER did work or not, I don’t work for IBM. We certainly haven’t seen anything in the “real world” that appears to be an outgrowth of CLEVER. I believe the project still exists at IBM, I just don’t really know what they’re working on right now.
Mike: I’d like to come back to the HITS algorithm and Teoma in a few minutes, but can we talk a little about the connection with Jeeves? How did that come about?
Paul: Basically we were moving along and thinking about our business model and we approached all the players in the industry at one point or another. We were looking to see if we could find the right partner. And Jeeves were really the ones that came back with the right level of interest. They gave us the right resources to make the thing happen. It had to be taken through to sub-second processing times and we knew that this was going to be the next big thing in search. We’d solved the problem: This was like the Holy Grail in the search industry. And Jeeves were great, they showed that they were committed to taking it all the way, right through to the end.
Of course, my responsibility was to both the shareholders and the scientists who’d been shedding blood, sweat and tears on the project for three years. And those guys needed to be somewhere where they could see the way forward.
Mike: And Teoma must have been an attractive proposition because of the crawling aspect, as Jeeves didn’t actually crawl the web anyway.
Paul: Actually Mike, Jeeves were working on their own technology at that time. And part of that was crawling technology. We actually integrated a lot of what they had into Teoma. There was some good stuff. They’d done some good work. They had a good team and done some good research.
Mike: But from a search engine marketing point of view, they did present some problems in that, you simply couldn’t get into Jeeves… well, not without some clever moves. They didn’t crawl the web, they were a kind of directory you couldn’t submit to… they were… yuk! I’ve started using the paid inclusion service for some of my clients, and we’re seeing some pretty healthy traffic on certain searches now. So, I guess the crawling and the paid inclusion kind of… paid off!
Paul: Paid inclusion? You know, some people are saying that it’s solved the problem, some say we’re simply scratching the surface… I think we’re only just moving our hand down to the surface.
I think the paid inclusion program, along with a number of other models such as PPC [pay per click], is going to provide a much more positive ROI for significantly expanded audiences in the search industry.
The combination of algorithmic search and other data we have to identify structures is incredible. What we do ensure with paid inclusion, though, is that it has no impact whatsoever on relevance, i.e. paid inclusion is guaranteed entry to the index, but no priority or preference is shown. We maintain absolute integrity within the ranked results. We’ve come so far with all of this research into structures and hubs and authorities in order to be able to determine exactly what the authoritative sites are. So we’re all about absolute relevance. If you see the shift in Jeeves - Jeeves has come a long way in terms of relevance. It’s so much better than it ever was a few years ago…
Mike: As I mentioned earlier, it was pretty hard to describe what Jeeves was at the time.
Paul: Yeah I know: Was it a question and answer service, was it a directory, was it a search engine? I think users were a little confused about exactly what it was.
Mike: When I first got involved in search engine marketing, or search engine optimisation as it was… Oh, for heaven’s sake, Spam is probably what it was back then… There was a very shaky kind of “them and us” division with search engines. And certainly there was a technology battle going on between the big-time Spammers and the search engines. The division isn’t so great these days because of the commercial element which has crept in. I mean, I kind of figured that we’d had such a good free ride with the search engines that they’d have to come and charge us for something anyway. But the models which have been developed, I mean, pay for consideration: what is that? And pay for inclusion… I use paid inclusion for my clients, but you know, when they ask me what it means, I sometimes say to them: “You know how you used to be at 13,397 in the index? Well, now you’re paying to be there [laughs].”
Paul: Well, the motivation for paid inclusion is not just about getting into the engine. I mean, we’re not doing our job if it simply becomes a way of getting into the engine. The real motivation is about being refreshed often enough in the index. You know, making sure the fresh information on your page is just that. And if you’re building sites with dynamic pages etc., then you’re targeting in the proper way if you use paid inclusion. The ability for a user to find your pages in the index is much higher. You know what you just said about being 13,397 in the index and paying for it - well, you may be, if your pages are not relevant to a user query. With paid inclusion you get a chance to constantly update and optimise your pages so that they become all the more relevant to user queries.
Sure, if someone is looking in the index to see if they’re included in there, then paid inclusion is a way of doing it. But one thing is certain, as I said earlier, there is no way that paid inclusion will in any way artificially increase or improve the rank of your pages. Of course, our crawlers are random, so it’s likely that there are some sites we’re just going to miss (as with any other crawler engine). And out of the stuff we do crawl there’s a whole load which simply has to get dumped because it’s no good. If it’s just about being found, then you can try submitting if it’s free anywhere, but if you’re linked well - we’ll find you eventually anyway.
Mike: I think it’s important, because of the kind of mysteries which go on at search engines and also the fact that I have two audiences for my newsletter, those who are very new to search engine marketing and seasoned veterans, that I always pitch two different types of questions.
So, I do need to cover some of the basics. For me personally, I use paid inclusion because I know how to optimise pages for specific search engines. But for the guy who’s new to all this and simply gets his web development team to create a beautiful 100% Flash web site with a neat PHP back-end powering it up - even with paid inclusion he’s still in trouble, yeah?
Paul: Well, actually, with Flash, through the methods we use, we’d be able to find that page more easily. We would have a better picture of what that page is and what it’s about than most. But more to the point, we’d understand why people were looking for it. Even though it’s a Flash page. Whereas, if our crawler looked at that Flash page - it just sees nothing… maybe just a couple of words that really aren’t relevant…
So, Flash, dynamically generated pages, these are the areas where paid inclusion is almost essential at this time. But in the long term, the real win is gonna be that any company, selling anything, has an opportunity to significantly expand the information about their pages which they can make available to us. So then we can be much smarter about finding them when our algorithm tells us it’s relevant. That’s the opportunity which is developing here. It’s the local corner store, sending me data about all of the products they have, and even if there’s only one in ten million search queries which is relevant for that store, we’ll make sure it comes up at number one [if that's where it belongs].
We’re going well beyond, so far beyond what humans could do or achieve. We’re moving into a realm of very, very smart algorithms looking at information in a way which will really improve the relevance and the customer experience.
Mike: Learning machines? This is something I’m really into at the moment; support vector machines are something that I’m looking at very closely. I’m just so into AI [artificial intelligence] my wife is going to kill me for getting so excited about a lot of intelligent zeros and ones [bursts out laughing]… Seriously, it’s a good job my wife is a Russian intellectual - actually, she’s so smart she scares me sometimes…
Paul: They’re learning all the time. That’s why no one can know what they really are. These algorithms are continually being tweaked and tuned…
Mike: Is it too far-fetched to imagine these machines starting to think for themselves?
Paul: This is real, what’s actually happening. It becomes an interesting philosophical discussion…
Mike: So you realise I’m asking that question for myself and nobody else [laughs]
Paul: How intelligent are these machines? Think about your brain and think about how things work, like how do we make decisions? So how could a computer make decisions? How different is that? It works in the same way, I guess, that the brain uses facts, figures and intuition.
What the algorithms are doing is gathering this type of information. And in the case of Teoma, it turns out, because we’re going down to the level and depth of information we are doing, it happens to be extremely valuable information. This is very targeted stuff and these machines are very smart and they can find this very valuable information and then process it at these almost unimaginable speeds. No one could have imagined this… It is artificial intelligence. Our job is to put all of that information, to input it to the engine so that it gets smarter and smarter and has more and more to think about and things come together in a better completeness.
Mike: There has been a bit of a technological battle going on between the search engines and the search engine optimisers or, Spammers, I guess is what I’m really saying. And that’s one of the major problems that search engines have to endure… I mean, how difficult is that? How difficult is it to deal with the Spam?
Paul: I don’t really want to talk about Spam, Mike. I don’t want to give the impression to some hackers out there that… well… Spam is a significant issue on the web because it affects our user experience and that’s what we are concerned about. If I take another perspective on it, the truth is, Spammers spend more time looking at these methods when, in fact, if they spent more time creating great content, they’d score anyway - without fear of retribution. You know, we have ways of dealing with Spam. We don’t talk about them, naturally. We’re successful where we have to be. We always see new techniques being used and we watch it, and if we don’t like it - why would anyone else?
Mike: I think just generally dealing with it is a problem, but the fact is, there is just so much bad advice and downright bulls–t out there. A lot of it is out of ignorance and a lot of it’s usually some jerk who’s proclaimed himself some sort of SEO Guru and he’s trying to sell his latest eBook nonsense or “magical” search engine software scam…
I never get tied down on this whole ethical issue about what happens online - it’s a free place and it’s a big place - so there’ll be good and bad guys out there. I just wish that these guys selling trash to poor innocent newbies would have the guts to tell them that they’re being sold Spam techniques and software. Some of these poor guys, and they’re not all just mom and pop operations, there are still larger companies buying into this garbage, and yet they are the ones who find themselves delivering their brand into the search engine Spam cesspit.
Paul: The problem is, once you get caught, it’s just so difficult to bring yourself back into the good books. Our machines will find it… you know, a significant amount of Spam we see is not touched by human hands. So, you have machines making decisions about whether you’re good or bad - and the bar is very, very high. So if you’re doing something to breach the norm, then you may never actually get to a human to decide whether you really are on the good or bad side.
Mike: So, I have to ask you, if you were talking to some guy who’s just starting with this stuff… You know, I think I know what best practice is but, there’s a lot of stuff that’s just not about basic web pages. There’s linkage data and all that. If you’re just new to the web and you have no linkage and don’t know much about the “black-box” side of search engines… what do you do?
Paul: He wants to be a good… Hmmm… I’d say the same thing as I’d say to some guy who’s just arrived in town and said, you know… I need a job. I’d say: “What are you good at?”
And then what kind of advice would you give that person? Well this is exactly the way the web works. I’d say knock on a few doors. Go and meet people. Talk to them and tell them what you’re good at. And the ones who will like you for that will put you in their address book or their filing system so they have a record of you. They can then refer to you when they have a question, or refer you to someone who may be looking for whatever it is you do. You become known for something and you become a member of a community. You mix in that community.
If you’re good at tennis, you join the tennis club. You become a member of a community and if you’re a genuine value provider, of genuine interest they will treat you and accept you as part of the community as well. And it doesn’t take as long as some people think it takes. The structures are already there, we’re refreshing the pages all the time. If a new page comes up, we may not find it straight away, so you can use paid inclusion so we actually know you’re there right at the beginning. And if you’ve got some links on your page going out, we’ll see where they are going and we’ll start understanding what community you’re in and… well I can’t go much further, but that’s the key to success. Certainly at our level of understanding. We’re mostly concerned with, are you in communities and are they good communities. Are you an authority or not an authority? And by that I don’t mean that you have to be the leading authority at something, it’s just about being weighted as part of that subject community.
Of course, we don’t know what the subject is, we simply can’t know exactly what all these subjects are, but when someone types in a word [or words] that brings up that community, as a subject specific community and related to that word [those words] then you’re part of that and we know. And you’ll be found quicker than you might think. It’s not that hard to do. And communities don’t have to be that big. If it’s a good community on a subject…
Mike: Okay, a pet subject here, while we’re discussing communities, and that’s the subject of themed web sites. I’m sure you know all about this. [Paul has a knowing smile here!] It’s long been bandied around that if you have a web site which sticks to one theme and one theme only, which is centred around a few keywords then this is the ticket to success. A themed web site wins by pure mass, or dense aggregation, or something…
[And timed perfectly with me getting animated... the waiter arrives with lunch... so let me hit the fast forward button << FF >> ]
So, the themed web site thing, we have some poor guys labouring away desperately trying to create one hundred pages of material on the same subject so the entire site talks about blue widgets, every page is a blue widget page…
Paul: You mean creating page after page on the same subject? Again, they’re focusing on the wrong thing…
Mike: Let me jump in again and put it this way: “Does the guy who has a blue widget web site with 100 pages beat the guy who has only one page - but one very IMPORTANT page?”
Paul: No the larger site does not do better: Because we don’t count the number of pages. We care about this: Are other pages on the same subject considering this to be a GOOD PAGE. And you know, even Google and what they do and the other methods, they can’t do this. Sure, they do look at who’s referring to the page but they don’t look at the subject - the subject of the page. Yes, we look at all the information that the others do as well as everything else…
It’s important to understand that… Let’s go back to what I was saying earlier. For instance, if you came into this room and said: “I’m popular!” Maybe you had gone to the phone book and taken 2,000, who knows, 20,000 addresses out of it and put them in your Palm Pilot and said to me: “Look how many people know me!” Now, if you create, say, one thousand sites with other people’s names and they all point back to you and say: “Look how many people know me!” I can look and say: “Funny, you tell me that 30,000 people know you - but, funnily enough - I’ve never met anyone who knows you!”
And that’s the way we work. If you want to be prominent then simply become known on your subject. Become good at what you do, become valuable to somebody else online for something. Go ahead and optimise your page - but don’t make stupid mistakes as we talked about earlier. If you’re selling, I don’t know, window dressings, just make sure you’ve got a term on there that says “window dressings”. You know, there are many people who make that kind of stupid mistake by not having the actual text on the page. And we are matching text at some level. The problem is that, some people may stick that in a graphic and we can’t read that.
But back to the main point: Become a member of your community. It’s not so hard. If you’re about something commercial there are many places you can go to get noticed. And then, of course, someone linking to you, well that’s a good reference.
Mike: Just going back to the “who knows you” thing, a while ago I discussed this with Andrei Broder, who was chief scientist at AltaVista at the time. And we discussed the premise of fake linkage and attempting to artificially inflate popularity, as it were. You know, a thousand fake links from a thousand fake domains all pointing back to you. Of course, he said that at first it may look like something on the map, but then you realise they’re not clever enough to notice that nobody then points to their fake domains.
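Broder’s observation is easy to see on a toy link graph. The Python sketch below is a hypothetical illustration, not any engine’s actual spam filter: it compares a raw in-link count with a count that only credits links from pages which are themselves linked to by someone. The function names and domains are made up for the example.

```python
from collections import Counter


def in_degree(graph):
    """Count how many distinct pages link to each page."""
    counts = Counter()
    for targets in graph.values():
        counts.update(set(targets))
    return counts


def endorsed_in_links(graph, page):
    """Count in-links to `page` coming only from pages that have in-links themselves."""
    degree = in_degree(graph)
    return sum(1 for src, targets in graph.items()
               if page in targets and degree[src] > 0)


# A link farm: 1,000 fake domains pointing at one site, with nobody pointing back at them.
graph = {f"fake{i}.example": ["spammer.example"] for i in range(1000)}
# A small genuine community: a portal links to two experts, who both cite one site.
graph["portal.example"] = ["expert1.example", "expert2.example"]
graph["expert1.example"] = ["real.example"]
graph["expert2.example"] = ["real.example"]

counts = in_degree(graph)
print(counts["spammer.example"], counts["real.example"])  # 1000 2
print(endorsed_in_links(graph, "spammer.example"),
      endorsed_in_links(graph, "real.example"))           # 0 2
```

By raw count the farm wins 1,000 to 2; once each link has to be vouched for by someone else, the farm drops to zero. A farm whose fake domains also link to one another would defeat this one-step check, which is one reason engines look at the wider link structure rather than a single hop.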
Paul: You know, if I’m an expert on a subject and there’s a charlatan running around trying to parade as an expert on the subject - and I know this - I’m not going to point at them! That’s it. Especially if they’re actually belittling my area of expertise.
Again, it’s essential to come out of the realm of only thinking about this as the realm of ones and zeros, like it’s only complex mathematical equations and very complex architectures and just think about it this way: How do I relate to other people and organisations?
Mike: Paul - I just love the analogy you use about being new in town. It’s just so perfect about how we relate and interact with each other… It’s as much about you finding people as it is about them finding you… and that’s what the web is all about.
Paul: It even goes down to, and I guess I’ll credit the CLEVER project for naming them as hubs…and that’s the only relationship we have with anything they were doing. In fact that’s it, everything else we do is different…
Mike: Actually, it was Jon Kleinberg who coined the term hubs and authorities in the original HITS algorithm. As I said in passing earlier, IBM owns the patent on CLEVER, but hubs and authorities was coined by Kleinberg himself.
Paul: Philosophically our approach is the same - or similar should I say. But the methodology is not the same. In fact completely different.
Mike: Because yours works [laughs]
Paul: Yes, because ours works. Maybe there was a crazy scientist who wasn’t building a search engine - but was solving a problem. I mean, was Apostolos [Gerasoulis, founder of Teoma] doing that? He went down a route that he really knew how to go down and he wasn’t really thinking about what anybody else was doing. It’s pure innovation.
Mike: I think his background helped a lot. It’s fairly unique in that he was already dealing with massive amounts of data… It just seemed to make the whole thing gel. And certainly, knowing what the obstacles were, I applaud him for doing that. In fact, I sometimes think: how the hell did he do that?
Paul: He’s an absolute genius. In fact, beyond that…
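For readers who want to see the mutual reinforcement behind the original HITS algorithm Mike mentions, here is a minimal sketch of the iteration as Kleinberg described it: a page’s authority score is the sum of the hub scores of pages pointing to it, its hub score is the sum of the authority scores of pages it points to, and both are normalised each round. It runs on a small toy graph rather than on the query-specific base set Kleinberg used, the graph and names are invented for illustration, and it is certainly not Teoma’s method, which Paul says is completely different.

```python
import math


def hits(edges, iterations=50):
    """Textbook HITS power iteration over (source, destination) link pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority comes from in-edges: sum the hub scores of pages pointing to you.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hubness comes from out-edges: sum the authority scores of pages you point to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


if __name__ == "__main__":
    # h1 and h2 point to both a1 and a2; h3 points only to a1; x points to a3.
    edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"),
             ("h3", "a1"), ("x", "a3")]
    hub, auth = hits(edges)
    print(max(auth, key=auth.get))  # a1
```

Note how a3, pointed to only by a weak hub, ends up with an authority score near zero: being linked from the right community matters more than simply being linked.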
Mike: I’ve been doing this job for many years now, so I’ve seen many changes in the technology and, perhaps more so, in the relationship search engines have with search engine marketers, as we are now known. In fact, just a couple of years ago, we probably wouldn’t have had this conversation. Search engine optimisers were seen as, you know, voodoo and the black arts or something. How do you see it shaping up between the two sides? I mean, do you see a kind of preferred relationship with search engine marketers, and a kind of recognition like the one advertising agencies have with conventional media?
Paul: We’re already developing relationships with selected partners. We have our partners in the paid inclusion program, and these really are trusted partners. Any partner who violated that trust would get notice immediately. Because we allow them to provide us with information that expands our ability to understand what’s there, we have to be certain it’s the right information, as it comes in just slightly below our defences.
Mike: So you’ve got two levels here. First, you’ve got third-party suppliers who work on the pay-for-inclusion side, whether that’s by subscription or through an XML trusted feed, as with Position Technologies. Then at the next level you’ve got the search engine marketing firms, the smaller agencies (and the larger ones, for that matter). Can they just apply to become a partner?
Paul: Absolutely, but I do have to say that we limit the number because we can’t manage that many ourselves right now. Personally, I’m always open to new partners coming in. We’ll work with them as long as they meet a minimum threshold. If they prove themselves to be good partners and valuable assets to their own customers, which is very important to us, then we can choose to work with them on a more permanent basis.
Mike: Let me take you to the Teoma brand, then. Where does that fit? I mean, it is kind of overshadowed by Jeeves. Is Teoma out there to compete with, say, Google? I remember the launch, and reading in the New York Times that it was Teoma up against Google…
Paul: Well, sure, it’s Teoma versus Google and all the other engines in the marketplace. Whether it’s branded Teoma, or delivering results to Ask Jeeves without being recognised, is not that important as such. We think of ourselves more as Teoma Technologies, not simply teoma.com, and it’s our goal to bring this next wave of understanding, this evolution, in fact revolution, in understanding the structure of the web to as many users as possible. Right now, 15% of web users worldwide and 25% of web users in the US are, in some way or another, being exposed to Teoma technologies. So we’re not out there to build some sort of monolith with teoma.com but we are out there…
Mike: Can I just ask about some other technology? I was wondering about activity at the user interface. The amount of data and information you can pick up from user behaviour at the interface is very telling, I guess. If we think about temporal tracking in the true sense, then the popular, or authoritative, result for a particular movie six months ago will not be the same as it is now.
Paul: Yes, there is a level of understanding which goes beyond the algorithm. It helps to determine “how useful is this page?” We collect a lot of this information and look at these numbers very closely. If you look at Ask Jeeves, for instance, we layer in what’s called ‘The Knowledge Base’. When a query falls statistically within a range where we know somebody is asking for something on which we have additional knowledge, we pull that to the top. Basically, we analyse the GUI [graphical user interface] very closely. We have the Direct Hit technology, which I’m sure you’re aware of.
We own the patent on Direct Hit technology. You mentioned earlier that other search engines use that type of technology, but if anybody tells you they’re using Direct Hit technology, you’d better tell them to give us a call [laughs]
Click popularity, as it has been known, is a very important aspect of how to rank pages. It’s a different type of information, but the trick is knowing just ‘how much’ of it to use. It isn’t really a vehicle on its own; it’s not sufficient to power a standalone search engine. It’s only good for a certain number of results, and even for those results it’s only useful for certain fragments of information. So we layer it into our algorithm simply as another factor in the ranking of every page.
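Paul doesn’t say how Teoma weights click data, so what follows is only a sketch of the general idea he describes: click-through is blended in as one more factor, its influence limited by a weight (the “how much”), and it is ignored for pages without enough impressions to be meaningful. The function name, the 0.2 weight and the 100-impression threshold are hypothetical values chosen for illustration, and the sketch assumes both the link score and the click-through rate lie between 0 and 1.

```python
def blended_score(link_score, clicks, impressions,
                  click_weight=0.2, min_impressions=100):
    """Blend a link-based score with a click-through rate (hypothetical weights).

    Both link_score and the click-through rate are assumed to lie in [0, 1].
    """
    if impressions < min_impressions:
        # Not enough user behaviour to trust: rank on the link score alone.
        return link_score
    ctr = clicks / impressions
    # click_weight is the "how much" knob: 0 ignores clicks, 1 uses nothing else.
    return (1 - click_weight) * link_score + click_weight * ctr


if __name__ == "__main__":
    pages = {
        "page-a": blended_score(0.60, clicks=20, impressions=100),  # about 0.52
        "page-b": blended_score(0.55, clicks=60, impressions=100),  # about 0.56
        "page-c": blended_score(0.70, clicks=9, impressions=10),    # 0.70, too few impressions
    }
    for name in sorted(pages, key=pages.get, reverse=True):
        print(name, round(pages[name], 2))
```

A linear blend is simply the easiest choice to read; the point is that clicks nudge the ranking rather than drive it, which matches Paul’s description of click popularity as one factor among many.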
Mike: That’s interesting, because a while ago I wrote a piece in my newsletter about HotBot still crediting Direct Hit for its results, yet I was certain they were Teoma results. I guess I was right?
Paul: Well, yes, but Direct Hit is still out there. You still see the two little Direct Hit men because we do have partners who use it in its pure form. As I say, we believe it’s a very important subset of the ranking process. We consider any information that we believe adds value to the results a customer [end user] is looking for.
Mike: Search engines come and go, Paul, don’t they? The big search engines of yesteryear are the minnows of today. I remember when AltaVista was the mighty search engine, the true flavour of the month. Now we see a few major players buying everything up and seemingly squaring up to each other. What do you think is happening?
Paul: I don’t know, you may have a better perspective than I do, Mike [laughs], but I guess the large players are realising the power behind the algorithm. Algorithms scale more efficiently and more predictably than humans, if you think about it. Teoma is a great example. Because we can use the ‘hubs’, we don’t need one hundred editors working for us. We have 50 million editors working for us [big grin]
Mike: The work that Monika Henzinger [now head of research at Google] did on further developing the HITS algorithm was tested, and it proved frequently more accurate at classification and categorisation than human editors…
<< Stop >>
At this point I want to delve more deeply into algorithmic search. Paul is happy to continue the conversation, but says he’d be much more relaxed about it without the tape running. I switch it off and we talk for another 15 minutes, during which Paul is very candid. This further information is reserved for the third edition of Search Engine Marketing: The Essential Best Practice Guide.
Not tried Teoma yet? Try it here:
< http://www.e-marketing-news.co.uk/teoma >
Free document about HITS and linkage based algorithms here:
< http://www.e-marketing-news.co.uk/topic_distillation >
Thanks so much to Paul Gardi and Alexa Rudin at Jeeves. Don’t forget, you can meet Paul and me (and Alexa) at the Search Engine Strategies Conference in San Jose, August 18-21, 2003 [see sponsor's link].