Friday, January 14, 2005

Do No Evil

We all love Google, don't we? You don't know about a particular programming paradigm, you google. You want to know the latest news, you google. Even if you don't know what you are looking for, you still google. Google today is one of those companies that command respect, and for that matter envy, from a lot of people. But a lot of the time, while searching, we end up finding something we never wanted to look for. Consider this: while I was preparing this blog post, I did a bit of research on Google, its algorithms and so on. I started with some random searches like "Google search engine" and "Search Engines design". Yes, I got a bit of what I was looking for, but nothing to make me say "Whoa!". Clever users could retort that I should be more specific about my search: put the phrase in quotes, use a more specific keyword, use the OR keyword, the NOT operator, or even the "means like" (~) operator. I don't agree. Explain or I'll flame you, is that what you are thinking? In that case I owe you an explanation.

PageRank, root cause of evil.

Google ranks the documents in its index according to PageRank, so a bit of explanation of PageRank is due before we proceed. PageRank means that every page has a rank; I bet even your cat knows that. But every ranking has a policy: you could rank yourself as someone senior in your office, while someone else could rank you as a bonehead. So what matters to us as users is how exactly Google ranks pages. Just over a decade ago, if you searched for "President of the United States", chances were that you would be taken to a page that had "President of the United States" written 500 times, followed at the very end by "Sucks". Google changed all that: it started ranking pages on the basis of hyperlinks, treating every link to a page as a vote for it, and counting a vote from a page that is itself heavily linked for more. Consider this: you have set up a personal webpage. On your front page you would have written "Welcome to my Webpage", with a lot of information about your "Projects" and "Family", and in a tiny corner lies "About Me". So how is it that when I search for "Michael Homepage" I land on your webpage? Well, if you have a lot of friends, they would have put up links like "Visit Michael's homepage". What matters to Google is that as long as people are interested in you, they will keep linking to you, and your page will rise to the top. This simple approach single-handedly changed the way search, and research, was done.
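To make the "links as votes" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank: each page splits its current score evenly among the pages it links to, and a damping factor (0.85 here, the commonly quoted value; treat it as an assumption) models a surfer who now and then jumps to a random page. The function name, the tiny link graph and the page names are all hypothetical, and Google's real ranking uses far more signals than this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simple power-iteration PageRank (textbook form, not Google's production code).

    links maps each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts with the "random surfer jumped here" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its score over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical mini-web: Michael's friends link to his homepage.
web = {
    "alice": ["michael"],
    "bob": ["michael"],
    "carol": ["michael", "bob"],
    "michael": ["alice"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page:8s} {score:.3f}")
```

Michael comes out on top because three pages vote for him, and Alice comes second largely because Michael, the best-linked page, votes for her. Carol, whom nobody links to, is left with only the random-jump share of (1 - 0.85) / 4.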

So what's wrong with this? Pretty much everything. If you use Google on a regular basis, you will be surprised by the amount of "spam" you get. I can no longer restrict myself to the "Top 10"; a lot of the time what you find in the "Top 10" is not top 10 for you. Just about a year ago, I could confidently restrict myself to the "Top 10" and be happy about it. So why the change? The trouble is that as people have realized how PageRank works, they have been setting up link farms. You could pay $10,000 and a company would set up 1,000 links to your webpage, complete with lots of "informative" hyperlinks. And if you really want to be in that "Top 10", also sign a contract with some Indian SEO company ready to optimize your website for $10,000. Take your pick. The sad truth is that corporations today are ready to do pretty much anything to put your webpage in the "Top 10", even sell their souls.
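The arithmetic of a link farm can be sketched with the commonly cited normalized PageRank formula, assuming N pages in total, a damping factor d (typically quoted as 0.85), L(q) outbound links on page q, and k farm pages that have no inbound links and link only to the target page t. This is the textbook formula, not Google's actual ranking, so treat the numbers as illustrative only.

```latex
\begin{aligned}
PR(p) &= \frac{1-d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)} \\
PR(f) &= \frac{1-d}{N} \quad \text{(farm page } f\text{: no inbound links)} \\
\text{farm share of } PR(t) &= k \cdot d \cdot \frac{PR(f)}{L(f)} = \frac{k\,d\,(1-d)}{N} \quad (L(f) = 1) \\
k = 1000,\ d = 0.85: \quad \frac{1000 \times 0.85 \times 0.15}{N} &= \frac{127.5}{N} = 850 \times \frac{0.15}{N}
\end{aligned}
```

In other words, under these assumptions 1,000 worthless pages that nobody visits hand the target 850 times the floor score an unlinked page gets, which is exactly what the farm operators are selling.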

Personalization, Tom, Dick and Harry.

You are searching for "Operating Systems", and so is your blonde girlfriend, who has decided to impress you with her geeky IQ. What do you get? You and your girlfriend get the same set of links. It's up to you to filter them to satisfy your quest for a variety of operating systems, and if you are a bit (un)lucky, you will have to filter this stuff for your girlfriend too, albeit using a different approach. People who spend their day refining their searches have a point. Why don't I put in quotes, be specific (do I want to know about memory management?), and add those keywords? Really? I thought I was promised "flying cabs" by 2010. Also consider this: today you search for "+How +to make chicken" and, as usual, start filtering the results. Some are plainly out of context; they might deal with "Why did the chicken cross the road?". You filter out lots of links. A couple of days later your girlfriend comes over and you have to make chicken for her. You search again and get the same set of links, including the ones you already threw out. What Google lacks today is intelligence. Yes, if only I could save a dime for every time I have had to say that.
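The kind of memory this paragraph asks for can be sketched in a few lines: remember which results a user rejected for a query and push them down the next time the same query comes in. This is purely a hypothetical illustration of the idea, not how Google works; the `SearchHistory` class, its crude query normalization and the example.com URLs are all made up.

```python
from collections import defaultdict


class SearchHistory:
    """Hypothetical per-user memory of results the user rejected for a query."""

    def __init__(self):
        # normalized query -> set of result URLs the user rejected
        self.rejected = defaultdict(set)

    @staticmethod
    def normalize(query):
        # Crude normalization: lowercase, drop '+' operators, sort the words,
        # so "+How +to make chicken" and "how to make chicken" are treated alike.
        words = query.lower().replace("+", " ").split()
        return " ".join(sorted(words))

    def reject(self, query, url):
        self.rejected[self.normalize(query)].add(url)

    def rerank(self, query, results):
        """Keep the engine's order, but push previously rejected URLs to the end."""
        bad = self.rejected.get(self.normalize(query), set())
        kept = [r for r in results if r not in bad]
        demoted = [r for r in results if r in bad]
        return kept + demoted


history = SearchHistory()
results = [
    "https://example.com/why-did-the-chicken-cross-the-road",
    "https://example.com/roast-chicken-recipe",
    "https://example.com/chicken-jokes",
]
# Today: the user skips the two off-topic links.
history.reject("+How +to make chicken", results[0])
history.reject("+How +to make chicken", results[2])

# A couple of days later: the recipe now comes first.
print(history.rerank("how to make chicken", results))
```

A real system would have to decide what counts as a rejection (a skipped result, a quick bounce back to the results page) and how long to remember it; the sketch simply takes the rejections as given.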

AdSense, I _sense_ a rat.

Google has a pretty amazing feature called AdSense. If you have a webpage dealing with "Pets", AdSense will "sense" that your webpage has information about pets and put up advertisements about "Selling and Buying Pets" on it, sharing a percentage of the revenue with you. AdSense is where a lot of Google's money comes from. AdSense, again, relies on the "textual" rather than the "contextual" information on your webpage. That's where the trouble starts. If you are a programmer searching for "inheritance" in comp.lang.c++, chances are that you will get links like these:


Sponsored Links

Cash for Heirs in Probate
Credit & job status not important.
Fast Cash to Heirs & Beneficiaries!

Inheritance Cash Advance
If your Inheritance is in Probate &
you want your money now - call us!

XML Inheritance Help
XML Schema Editor - Free Trial!
Visually Derive XML Data Models



And no, Houston, I did not make them up.
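The difference between "textual" and "contextual" matching can be shown with a toy ad matcher. The textual version shows every ad whose keyword appears on the page; the contextual version also asks whether the ad's other words fit the page. How AdSense actually matches ads is not public, so this is only an illustration: the ad inventory, its "context" word lists and the third ad are hypothetical, and only the first two ad titles come from the listing above.

```python
import re

# Hypothetical ad inventory: each ad bids on one keyword and lists a few
# "context" words describing what it is really about.
ADS = [
    {"title": "Inheritance Cash Advance", "keyword": "inheritance",
     "context": {"probate", "heirs", "estate", "cash"}},
    {"title": "XML Inheritance Help", "keyword": "inheritance",
     "context": {"xml", "schema"}},
    {"title": "C++ Inheritance Tutorial", "keyword": "inheritance",
     "context": {"c++", "class", "virtual", "derived"}},
]


def words(text):
    # Lowercase tokens; keep '+' and '#' so "c++" and "c#" survive.
    return set(re.findall(r"[a-z+#]+", text.lower()))


def textual_match(page_text):
    """Show every ad whose keyword appears anywhere on the page."""
    w = words(page_text)
    return [ad["title"] for ad in ADS if ad["keyword"] in w]


def contextual_match(page_text):
    """Also require the ad's context words to overlap the page; best overlap first."""
    w = words(page_text)
    scored = [(len(ad["context"] & w), ad["title"])
              for ad in ADS if ad["keyword"] in w]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]


page = ("How does virtual inheritance work in C++? My derived class "
        "calls the base class constructor twice.")
print(textual_match(page))     # all three ads, probate cash included
print(contextual_match(page))  # ['C++ Inheritance Tutorial']
```

Counting overlapping words is of course still crude; it only shows that looking past the single matched keyword is enough to keep probate loans off a C++ discussion.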

Hasta la vista, baby.

People often comment that because of Microsoft's monopoly, we have lost a lot of innovation. The trouble is that people have learnt to live with Microsoft: put any other operating system in front of them and they won't use it, simply because it won't run their favourite game or email program. Google does not have that sort of safety net. If IBM does come out with its promised "Natural Language Understandable" search engine, along with an index as wide as Google's, I would say "Hasta la vista, baby".


Being a programmer has taught me a couple of things, one of them being "Never underestimate complexity". If it's cold outside, it could very well be burning inside. Surely Google is doing a great job. It has to write a lot of clever algorithms and manage thousands of computers. No mean feat. But I feel a bit perplexed every week when Google comes out with yet another new product. Why not just go back to the drawing board and think a bit about how to fundamentally improve search as a whole? Why not fulfill that promise of "flying cabs"? Why not?

So long, people.


