Google's technology

Some interesting facts about Google's technology, in particular the Google File System and MapReduce:

It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers. We’re not aware of any other sorting experiment at this scale and are obviously very excited to be able to process so much data so quickly.

An interesting question came up while running experiments at such a scale: Where do you put 1PB of sorted data? We were writing it to 48,000 hard drives (we did not use the full capacity of these disks, though), and every time we ran our sort, at least one of our disks managed to break (this is not surprising at all given the duration of the test, the number of disks involved, and the expected lifetime of hard disks). To make sure we kept our sorted petabyte safe, we asked the Google File System to write three copies of each file to three different disks.
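The replication idea can be shown with a toy model in Python. The post says only that the Google File System was asked to write three copies of each file to three different disks. It does not describe how those disks are chosen, so the uniform random placement below is an assumption, and the helpers `place_replicas` and `surviving_copies` are hypothetical names for illustration. The sketch checks that losing one disk still leaves at least two readable copies of every file.

```python
import random

# Toy model only: uniform random placement is an assumption,
# not the real Google File System placement policy.
def place_replicas(num_files, disks, copies=3, seed=0):
    """Assign each file `copies` distinct disks chosen at random."""
    rng = random.Random(seed)
    return {f: rng.sample(disks, copies) for f in range(num_files)}

def surviving_copies(placement, failed_disks):
    """For each file, keep only the replicas that are not on a failed disk."""
    return {
        f: [d for d in locs if d not in failed_disks]
        for f, locs in placement.items()
    }

if __name__ == "__main__":
    disks = [f"disk{i}" for i in range(48_000)]
    placement = place_replicas(10_000, disks)
    # Break one disk that actually holds a replica (file 0's first copy).
    failed = {placement[0][0]}
    after = surviving_copies(placement, failed)
    # With one failed disk, no file drops below two readable copies.
    print(min(len(locs) for locs in after.values()))  # prints 2
```

Because `random.sample` never picks the same disk twice for one file, a single disk failure can remove at most one copy of any file.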

It takes six hours and two minutes to sort a petabyte (a thousand terabytes, a million gigabytes) of data on 4,000 computers! The data are written to 48,000 hard drives, and during every test at least one of the disks breaks, which is why every file is written to three different hard disks.
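These figures can be cross-checked with a little arithmetic. The Python sketch below assumes decimal units (1 PB = 10^15 bytes, which matches 10 trillion 100-byte records) and an even spread of the three copies over the 48,000 disks. It averages over the whole run and ignores intermediate data written during the sort, so the results are rough orders of magnitude, not measured rates. The function `sort_stats` is a hypothetical helper.

```python
# Back-of-the-envelope figures for the 1 PB sort (decimal units assumed).
def sort_stats(total_bytes, seconds, machines, disks, copies):
    """Average rates over the whole run; assumes output is spread evenly."""
    throughput = total_bytes / seconds  # bytes per second, whole cluster
    return {
        "cluster_GBps": throughput / 1e9,
        "per_machine_MBps": throughput / machines / 1e6,
        "per_disk_GB": copies * total_bytes / disks / 1e9,
    }

if __name__ == "__main__":
    records, record_size = 10 * 10**12, 100
    total = records * record_size          # 10^15 bytes = 1 PB
    seconds = 6 * 3600 + 2 * 60            # 6 h 2 min = 21,720 s
    s = sort_stats(total, seconds, machines=4_000, disks=48_000, copies=3)
    print(f"{s['cluster_GBps']:.1f} GB/s")      # 46.0 GB/s
    print(f"{s['per_machine_MBps']:.1f} MB/s")  # 11.5 MB/s per machine
    print(f"{s['per_disk_GB']:.1f} GB")         # 62.5 GB of output per disk
```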

The full article is here:

Official Google Blog: Sorting 1PB with MapReduce

