Gordon And Mike's ICT Podcast: Flat World Strategies: Google and Search Wikia, Search Technology Explained [23:10] on Steno.fm

Gordon And Mike's ICT Podcast

Flat World Strategies: Google and Search Wikia, Search Technology Explained [23:10]

January 07, 2007 11:42am


Intro: Right before the 2006 holidays, Jimmy Wales, creator of the online encyclopedia Wikipedia, announced the Search Wikia project. The project will base its search results on input from the future site's community of users. In this podcast we take a look at popular search engine technologies and discuss the Search Wikia project concept.

Question: I know this project was really just announced. Before we get into the technology involved, can you tell us what phase the project is in? According to the BBC, Jimmy Wales is currently recruiting people to work for the company, and he's buying hardware to get the site up and running.

Question: What makes this concept fundamentally different from what Google or Yahoo! are doing? When Wales announced the project, he came right out and said it was needed because the existing search systems for the net were "broken". They were broken, he said, because they lacked freedom, community, accountability and transparency.

Question: This sounds a lot like Digg. Am I on the right track? Yes, you are. What you end up with is a Digg-like application, or what Wales is calling a "people-powered" search site.

Question: Can you provide a bit more detail on how Google works? Googlebot is Google's web crawling robot. Googlebot finds pages in two ways: through an add URL form, www.google.com/addurl.html, and through finding links by crawling the web.

Source: www.google.com
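To make the two discovery paths concrete, here is a minimal sketch of link-following discovery. It is not Googlebot's actual code. The toy link graph `TOY_WEB`, the `crawl` and `fetch_links` functions, and the example URLs are all hypothetical, and nothing is fetched over the network. Submitted URLs (the add URL form) are modeled simply as extra seed URLs.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "orphan.example/": [],  # no page links to this one
}


def fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seed_urls, fetch_links):
    """Breadth-first discovery of every page reachable from the seeds."""
    seen = set()
    queue = deque()
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # visit each page only once
                seen.add(link)
                queue.append(link)
    return order


# Path 1: discovery purely by following links from a known page.
discovered = crawl(["a.example/"], fetch_links)

# Path 2: a submitted URL is added as an extra seed.
with_submission = crawl(["a.example/", "orphan.example/"], fetch_links)
```

In this toy graph, the page with no inbound links is only found when it is submitted directly, which is the gap a submission form would fill.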

Question: That's Googlebot. How does the indexer work? Googlebot gives the indexer the full text of the pages it finds. These pages are stored in Google's index database. This index is sorted alphabetically by search term, with each index entry storing a list of documents in which the term appears and the location within the text where it occurs. This data structure allows rapid access to documents that contain user query terms.

Source: www.google.com
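The data structure described here is usually called an inverted index. The following is a small sketch of the idea, assuming a simple lowercase word tokenizer. The names `build_index`, `lookup`, and `DOCS`, and the sample documents, are illustrative assumptions and say nothing about how Google's indexer is actually implemented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to {doc_id: [word positions where the term occurs]}."""
    postings = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            postings[term][doc_id].append(position)
    # Store the terms in alphabetical order, as in the description above.
    return {term: dict(postings[term]) for term in sorted(postings)}


def lookup(index, query):
    """Return the IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return result


# Hypothetical sample documents.
DOCS = {
    "doc1": "Search engines crawl the web",
    "doc2": "People powered search",
    "doc3": "The web is flat",
}

index = build_index(DOCS)
matches = lookup(index, "the web")
```

Because each term points straight at its documents, answering a query means intersecting a few short lists instead of scanning every page.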

Question: So now that everything is indexed, can you describe the search query? The query processor has several parts, including the user interface (search box), the "engine" that evaluates queries and matches them to relevant documents, and the results formatter.

PageRank is Google's system for ranking web pages. A page with a higher PageRank is deemed more important and is more likely to be listed above a page with a lower PageRank.

Source: www.google.com
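For a rough feel of how a link-based importance score can be computed, here is a sketch of the simplified PageRank iteration as it is commonly described in public. It is not Google's production ranking. The damping value of 0.85, the iteration count, the `pagerank` function name, and the four-page link graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank.

    `links` maps each page to the pages it links to. Every link target
    must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: every page links back to "home".
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
    "blog": ["home"],
}

ranks = pagerank(LINKS)
top_page = max(ranks, key=ranks.get)
```

In this example "home" ends up with the highest score because every other page links to it, while "blog", which nothing links to, keeps only the baseline share.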

Question: Can you run us through, step by step, a Google search query? Sure. This is also from Google's site. Here are the steps in a typical query process:

1. The user accesses the Google web server at google.com and makes a query.

2. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book--it tells which pages contain the words that match any particular query term.

3. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.

4. The search results are returned to the user in a fraction of a second.

Source: www.google.c
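The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionaries `INDEX_SERVER` and `DOC_SERVER` stand in for the index servers and doc servers, the functions `make_snippet` and `handle_query` and the sample documents are hypothetical, and ranking is left out entirely.

```python
# Hypothetical stand-in for the index servers: term -> matching document IDs.
INDEX_SERVER = {
    "search": ["doc1", "doc2"],
    "wikia": ["doc2"],
    "google": ["doc1"],
}

# Hypothetical stand-in for the doc servers: document ID -> stored text.
DOC_SERVER = {
    "doc1": "Google runs a large search engine built on crawling and indexing.",
    "doc2": "Search Wikia is a people-powered search project announced by Jimmy Wales.",
}


def make_snippet(text, term, width=30):
    """Return a short excerpt of `text` around the first match of `term`."""
    pos = text.lower().find(term)
    if pos == -1:
        return text[: width * 2]
    start = max(0, pos - width)
    end = min(len(text), pos + len(term) + width)
    return text[start:end]


def handle_query(query):
    """Step 1: the web server receives the user's query."""
    terms = query.lower().split()
    if not terms:
        return []

    # Step 2: ask the index which documents contain every query term.
    matching = set(INDEX_SERVER.get(terms[0], []))
    for term in terms[1:]:
        matching &= set(INDEX_SERVER.get(term, []))

    # Step 3: fetch the stored documents and build a snippet for each.
    results = []
    for doc_id in sorted(matching):
        text = DOC_SERVER[doc_id]
        results.append((doc_id, make_snippet(text, terms[0])))

    # Step 4: return the results to the user.
    return results


results = handle_query("search wikia")
```

Splitting the lookup (step 2) from the document fetch (step 3) mirrors the description: the index only says which documents match, and the full text is touched only for the documents that will actually be shown.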