
April 13, 2006

Google CAPEX: Where the Money's Going

A regular Internet Outsider reader roasted me for embracing dark-fiber conspiracy theories as an explanation for where all that Google CAPEX is going, and provided the explanation below. I have heard counter-arguments to this, which suggest that Google could not possibly require twice as much hardware as Yahoo! just for search, but those arguments were not nearly as detailed.

Thanks again to the anonymous contributor (and to anyone else who cares to weigh in).  I'm happy to keep tossing out boneheaded theories as long as they prompt better-informed folks to share their knowledge.

Google's CAPEX is higher than Yahoo!'s because of spending on hardware (i.e. machines, CPUs, memory, motherboards and hard drives). First, Yahoo!'s search traffic is much smaller than Google's. In the US, Yahoo! is about half Google's size, and internationally (except in Japan) it is further behind. There is a (near) linear relationship between the number of searches served and the number of machines needed to serve them.
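The "near linear" claim amounts to simple capacity planning: divide the query load by what one machine can handle and round up. A minimal Python sketch; the per-machine throughput and query rates below are hypothetical numbers, not anything Google or Yahoo! has disclosed:

```python
import math

def machines_needed(queries_per_second: float, qps_per_machine: float) -> int:
    """Machines needed for a query load, assuming capacity scales linearly.

    Both inputs are hypothetical; neither engine has published these figures.
    """
    if qps_per_machine <= 0:
        raise ValueError("qps_per_machine must be positive")
    # Round up: a fractional machine still has to be bought whole.
    return math.ceil(queries_per_second / qps_per_machine)

# Illustrative only: twice the query load at the same per-machine throughput.
smaller_engine = machines_needed(10_000, qps_per_machine=5)
larger_engine = machines_needed(20_000, qps_per_machine=5)
print(smaller_engine, larger_engine)  # 2000 4000
```

Doubling the query load doubles the machine count, which is the contributor's point about Google serving roughly twice Yahoo!'s US search traffic.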

Secondly, Google's search index is larger than Yahoo!'s (about three times as large). This does not mean that Google needs three times as many machines (there are tricks available, since not all searches need the full index), but it does nearly double Google's hardware needs.
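The jump from "three times the index" to "nearly double the hardware" can be reproduced with a simple blended-cost model. This is a sketch under a loudly stated assumption: a fraction `full_index_fraction` of queries must touch the whole index (cost proportional to index size), while the rest are answered from a small tier whose cost is the same for both engines. The parameter names and the 0.45 value are hypothetical, picked only because they land near 2x:

```python
def hardware_factor(index_ratio: float, full_index_fraction: float) -> float:
    """Relative hardware need versus a competitor with a 1x index.

    Hypothetical model: only `full_index_fraction` of queries scan the whole
    index; the remainder hit a small tier that costs the same for both engines.
    """
    if not 0.0 <= full_index_fraction <= 1.0:
        raise ValueError("full_index_fraction must be in [0, 1]")
    small_tier_cost = (1.0 - full_index_fraction) * 1.0
    full_index_cost = full_index_fraction * index_ratio
    return small_tier_cost + full_index_cost

# A 3x index where (hypothetically) 45% of queries need the full index:
print(f"{hardware_factor(3.0, 0.45):.2f}")  # 1.90
```

At the extremes the model gives 1x (no query needs the full index) or 3x (every query does); the real fraction is unknown.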

Thirdly, Yahoo!'s non-search properties need fewer machines per serving event than a search does. Serving a web page requires bothering one machine, and each machine can probably handle hundreds of requests a second. Doing a search requires bothering thousands of machines. Thus, although Yahoo! has more total traffic, the hardware needed to serve it is much less.
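The difference between serving a page and serving a search is fan-out. A page request is answered by one machine; a search against a sharded index is typically scattered to many index machines and the partial results gathered back. A minimal scatter-gather sketch in Python, with tiny made-up shards standing in for index machines (the shard contents, `search_shard` and `scatter_gather` are hypothetical):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shard: maps a term to (doc_id, score) postings.
Shard = dict[str, list[tuple[str, float]]]

def search_shard(shard: Shard, term: str) -> list[tuple[str, float]]:
    """Stand-in for the work one index machine does for one query."""
    return shard.get(term, [])

def scatter_gather(shards: list[Shard], term: str, k: int = 10):
    """Send the query to every shard in parallel, then merge the top-k hits.

    Returns (top-k results, number of shards contacted).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    hits = [hit for partial in partials for hit in partial]
    top = heapq.nlargest(k, hits, key=lambda hit: hit[1])
    return top, len(shards)

shards: list[Shard] = [
    {"fiber": [("doc1", 0.9)]},
    {"fiber": [("doc7", 0.4)], "capex": [("doc3", 0.8)]},
    {"capex": [("doc9", 0.6)]},
]
results, touched = scatter_gather(shards, "fiber", k=2)
print(results, touched)  # [('doc1', 0.9), ('doc7', 0.4)] 3
```

Every shard is contacted even though only two hold matching documents, so per-search cost grows with the number of shards, whereas a page view costs one machine visit. How many shards a real query touches is exactly what one of the comments below disputes.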

Fourthly, Google's AdSense product requires a lot of machines. In Google's last analyst presentation they disclosed how many AdSense impressions they serve: 128 per user per month for 68% of internet users, or about 1B impressions a day. AdSense serving is similar to search (with less data to index, but more book-keeping work).
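The parenthetical arithmetic can be written out: daily impressions = internet users × reach × impressions per user per month ÷ days per month. The post does not say what internet-user population it assumes, so this Python sketch takes it as a parameter and also works backwards from the 1B figure; the 30-day month is an assumption:

```python
def daily_impressions(internet_users: float,
                      reach: float = 0.68,
                      impressions_per_user_month: float = 128,
                      days_per_month: float = 30) -> float:
    """Daily impressions implied by the post's per-user monthly figures.

    reach (68%) and impressions_per_user_month (128) come from the post;
    internet_users and the 30-day month are caller-supplied assumptions.
    """
    monthly = internet_users * reach * impressions_per_user_month
    return monthly / days_per_month

def implied_internet_users(impressions_per_day: float, **kwargs: float) -> float:
    """Work backwards: the user base at which the rates give this daily total."""
    return impressions_per_day / daily_impressions(1, **kwargs)

users = implied_internet_users(1e9)
print(f"{users / 1e6:.0f} million users")  # 345 million users
```

Under these assumptions the 1B-a-day figure corresponds to an internet population of roughly 345 million; a different population scales the daily total proportionally.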

Finally, in Google's analyst presentation, they say "All webpages included in the Google index and searched all the time".  This suggests that they intend to increase their index size.


Comments

Henry, I have been a fan since your Amazon.com call. I was about 17 at the time. I'm just busting your balls about having more content, bro. You have a good site. Not as good as guckedgoogle.com, but still good.

I found a new one today called something like Bastille's Search blog. Good site there too.

Not to perpetuate the dark fiber conspiracy, but there is an interesting theory laid out in Robert Cringely's weekly PBS column. He says that Google IS buying fiber, but not to enter the ISP market. Instead, Google is building the infrastructure to support intense data feeds from video downloads, IPTV, VoIP, and other goodies expected from Web 2.0. For those interested, check out the November 17th and November 24th posts: http://www.pbs.org/cringely/archive/2005.html

There are a few small inaccuracies I'd like to address here:

>> Secondly, Google's search index is larger than Yahoo!'s, (about three times as large).

Not true. Overall the two have about the same total index size, with Google having a slight edge. The apparent massive disparity is due to the two using different rules to decide what to drop from the total index when constructing the searchable index. Yahoo claimed to have indexed a total of 20 billion items in August '05, although much of it is clearly spam, so it doesn't make it into the main search index.
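The commenter's point is that "index size" depends on where each engine draws the line between its total crawl and its searchable index. A minimal Python sketch of that filtering step; the `spam_score` field and both thresholds are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Doc:
    url: str
    spam_score: float  # hypothetical: 0.0 = clean, 1.0 = certain spam

def searchable_index(total_index: list[Doc], max_spam_score: float) -> list[Doc]:
    """Keep only documents whose (hypothetical) spam score passes the rule."""
    return [doc for doc in total_index if doc.spam_score <= max_spam_score]

crawl = [Doc("a.example", 0.1), Doc("b.example", 0.5),
         Doc("c.example", 0.8), Doc("d.example", 0.95)]
strict = searchable_index(crawl, max_spam_score=0.3)
lenient = searchable_index(crawl, max_spam_score=0.9)
print(len(crawl), len(strict), len(lenient))  # 4 1 3
```

Both engines start from the same four-document crawl, yet the strict one reports a searchable index a third the size of the lenient one's.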

>> To do a search requires bothering thousands of machines

Not true. Search engines do a lot of preprocessing of their data and break the index up into chunks (I think Google refer to them as "shards") which contain topically related datasets. When you enter a search at google.com, you first hit a load-balancing bank, which assigns your query to the datacentre likely to be able to answer you most quickly, based on your location and current datacentre load.
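The load-balancing step can be sketched as choosing the datacentre with the lowest estimated response time. In this Python sketch the cost model (round-trip time plus a penalty proportional to load) and the 200 ms penalty are assumptions for illustration, not Google's actual policy:

```python
from dataclasses import dataclass

@dataclass
class Datacentre:
    name: str
    rtt_ms: float  # network round trip from the user's location
    load: float    # current utilisation, 0.0 to 1.0

def pick_datacentre(dcs: list[Datacentre], load_penalty_ms: float = 200.0) -> Datacentre:
    """Choose the DC with the lowest estimated response time.

    Hypothetical cost model: round-trip time plus a linear load penalty.
    """
    return min(dcs, key=lambda dc: dc.rtt_ms + dc.load * load_penalty_ms)

dcs = [Datacentre("near-but-busy", rtt_ms=20, load=0.9),
       Datacentre("farther-but-idle", rtt_ms=80, load=0.1)]
print(pick_datacentre(dcs).name)  # farther-but-idle
```

The nearby datacentre loses (estimated 200 ms versus 100 ms) because its load penalty outweighs its shorter round trip; with no load penalty, proximity alone would win.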

You then hit the query parser at the DC, which looks at your query and decides which shard or shards are most likely to contain your answer. The query is sent to the relevant machine(s), and a SERP is served. Additionally, popular searches are run ahead of time and cached, which is why popular terms and searches often come back with a sub-second response time. A given query might touch tens of machines, but not thousands. It is still a much more expensive process than serving a web page, though, and Google do serve more searches than anyone else.
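The query-parser and caching steps described here can be sketched together: route a query only to the shards whose topics it mentions, fall back to every shard for unknown topics, and answer popular queries from a precomputed cache. The routing table, shard names and placeholder results in this Python sketch are all hypothetical:

```python
# Hypothetical routing table: which shards hold which topics.
TOPIC_TO_SHARDS: dict[str, set[str]] = {
    "football": {"shard-sports"},
    "election": {"shard-news"},
    "python": {"shard-tech", "shard-animals"},  # ambiguous term: two shards
}
ALL_SHARDS = {"shard-sports", "shard-news", "shard-tech", "shard-animals"}

# Cache of normalised query -> results, filled on demand or ahead of time.
cache: dict[str, list[str]] = {}

def shards_for(query: str) -> set[str]:
    """Pick the shards whose topics appear in the query; unknown topics hit all."""
    chosen: set[str] = set()
    for term in query.split():
        chosen |= TOPIC_TO_SHARDS.get(term, set())
    return chosen or set(ALL_SHARDS)

def answer(query: str) -> tuple[list[str], int]:
    """Serve a query, returning (results, number of shards contacted)."""
    key = query.lower().strip()
    if key in cache:
        return cache[key], 0  # cache hit: no shard work at all
    shards = shards_for(key)
    results = sorted(f"{shard}:{key}" for shard in shards)  # placeholder hits
    cache[key] = results
    return results, len(shards)

def warm_cache(popular_queries: list[str]) -> None:
    """Run popular queries ahead of time so later requests are cache hits."""
    for q in popular_queries:
        answer(q)

warm_cache(["election"])
print(answer("election")[1])       # 0: served from the warmed cache
print(answer("python")[1])         # 2: routed to two shards
print(answer("weather today")[1])  # 4: unknown topic, every shard asked
```

The cached query costs no shard work, a routed query touches only the relevant shards, and only an unrecognised query falls back to asking everyone.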

>> Fourthly, Google's Adsense product requires a lot of machines.

And Yahoo's equivalent product, Yahoo Publisher Network, is run by ad fairies? AdSense is bigger because it's older, but YPN got a flying start thanks to all the "banned from AdSense" sites out there. I know a couple of major players (serving millions of contextual ad impressions a day over their networks) who prefer YPN simply because it keeps them off Google's radar.

I just can't believe that ALL the money is going on search. Yes, Google have a higher search volume to service, but Yahoo aren't slouches either. Google are also very good at keeping down the unit cost of the machines they use. They created a custom OS and filesystem specifically so they could use cheap, off-the-shelf components in their DC machines: about $1,000 per unit to run the biggest search engine on the planet.
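The objection is an arithmetic one: at roughly $1,000 per machine, how many machines would a given budget buy? In this Python sketch the $100M budget is an illustrative round number, not Google's reported CAPEX, and `overhead_fraction` (money spent on buildings, power and networking rather than machines) is a hypothetical knob the comment does not quantify:

```python
def machines_for_budget(budget_usd: float, unit_cost_usd: float = 1_000.0,
                        overhead_fraction: float = 0.0) -> int:
    """Whole machines a hardware budget buys at a given unit price.

    overhead_fraction is a hypothetical share of the budget spent on
    non-machine costs; the comment gives no figure for it.
    """
    if not 0.0 <= overhead_fraction <= 1.0:
        raise ValueError("overhead_fraction must be in [0, 1]")
    usable = budget_usd * (1.0 - overhead_fraction)
    return int(usable // unit_cost_usd)

print(machines_for_budget(100_000_000))                         # 100000
print(machines_for_budget(100_000_000, overhead_fraction=0.5))  # 50000
```

Plugging in an actual CAPEX figure and a realistic overhead share would show whether search hardware alone could plausibly absorb it.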

Even with the current upgrade cycle, I just can't see how they could spend that much cash on search hardware. Maybe it's not all going on dark fibre, but it isn't going on search...

Apparently all that money is going to fridges: I got a small fridge that says "Google AdWords: Cooler Thinking" on it. Do you think this might be their alternative to dark fiber? I know what you're thinking: this couldn't even put a dent in all that capital they have available. Before you come to that conclusion, though, keep in mind that it also comes with a connector to plug into a cigarette lighter (hello, road trip) and a button to make it a food warmer instead of a food cooler...

This is a wonderful insight into the CAPEX discrepancy Henry, and it makes a lot of sense. Thanks for the detective work!


A common trade-off in computer science is between speed and storage space. Undoubtedly, one of the ways Google has made searches fast is by storing data in less efficient ways (for instance, redundant copies of data). It wouldn't surprise me if Google requires several times the storage space its competitors require for an equivalent amount of data.
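The storage cost of redundancy is easy to quantify: keeping N full copies multiplies raw disk by N. A minimal Python sketch; the replication factor of 3 is an assumed default (it is, for example, HDFS's default replication factor), not a figure from Google:

```python
def raw_storage_tb(logical_tb: float, replicas: int = 3) -> float:
    """Raw disk needed to keep `replicas` full copies of `logical_tb` of data.

    The default of 3 copies is an assumption, not a disclosed Google figure.
    """
    if replicas < 1:
        raise ValueError("need at least one copy")
    return logical_tb * replicas

# The same (hypothetical) 100 TB of data, stored once versus in three copies:
print(raw_storage_tb(100, replicas=1), raw_storage_tb(100, replicas=3))  # 100 300
```

Extra copies cost disk but let reads be spread across more machines, which is the speed-for-space trade the comment describes.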

Henry -

Perhaps Goog is spending all their (I mean investors') money on building a V network... http://biz.yahoo.com/fool/060417/114529297707.html?.v=2


*sigh* mtv200 is another forum spammer from China. Cleanup on aisle 10!
