Saturday, April 13, 2013

The Google File System (GFS)

A brief overview of GFS.

GFS was developed to meet the need for a scalable distributed file system. It supports large-scale data processing workloads on commodity hardware. In GFS, files are divided into fixed-size chunks, which are replicated across chunkservers to deliver aggregate performance and fault tolerance. Each chunk has a unique 64-bit chunk handle.
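To make the fixed-size chunking concrete, here is a minimal sketch of how a byte range in a file maps onto chunks. The 64 MB chunk size is the value reported in the GFS paper; the function names `chunk_index` and `chunk_span` are made up for this illustration and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size reported in the GFS paper


def chunk_index(offset: int) -> int:
    """Return the index of the chunk that contains the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // CHUNK_SIZE


def chunk_span(offset: int, length: int) -> list[tuple[int, int, int]]:
    """Split a byte range into (chunk index, offset within chunk, byte count) pieces."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE
        within = offset % CHUNK_SIZE
        # Take bytes up to the end of this chunk or the end of the range.
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces
```

For example, a 20-byte read that starts 10 bytes before a chunk boundary is split into two pieces, one per chunk.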

GFS has a single master, for simplicity, and multiple chunkservers that hold the chunk replicas. The master and chunkservers coordinate using heartbeat messages. GFS is fault tolerant and supports terabytes of storage.
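One use of heartbeat messages is letting the master notice which chunkservers are still alive. The sketch below covers only that liveness bookkeeping, under assumptions of my own: the class name `HeartbeatMonitor`, the 60-second timeout, and the explicit `now` parameter are all hypothetical and are not taken from the GFS paper.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat time seen from each chunkserver (illustrative only)."""

    timeout: float = 60.0  # hypothetical value, in seconds
    last_seen: dict = field(default_factory=dict)  # server name -> last heartbeat time

    def record(self, server: str, now: float) -> None:
        """Note that a heartbeat from `server` arrived at time `now`."""
        self.last_seen[server] = now

    def live_servers(self, now: float) -> list:
        """Return the servers whose last heartbeat is within the timeout, sorted by name."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )
```

Passing the time in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.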

Here is the architecture diagram from the GFS paper.



In the diagram above, the GFS client contacts the GFS master to obtain chunk locations, and then contacts one of the chunkservers to obtain the data.
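The two-step read described above can be sketched with small in-memory stand-ins. Everything here is an assumption made for illustration: the class names `Master` and `ChunkServer`, the `client_read` function, the tiny 8-byte chunk size, and the random choice of replica. The sketch also handles only reads that stay within a single chunk.

```python
import random
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # tiny demo value so the example is easy to follow


@dataclass
class ChunkServer:
    """Stores chunk data keyed by chunk handle."""

    chunks: dict = field(default_factory=dict)  # chunk handle -> bytes

    def read(self, handle: int, offset: int, length: int) -> bytes:
        return self.chunks[handle][offset:offset + length]


@dataclass
class Master:
    """Holds metadata only: which handle and which replicas back each chunk."""

    table: dict = field(default_factory=dict)  # (filename, chunk index) -> (handle, replica names)

    def lookup(self, filename: str, index: int) -> tuple:
        return self.table[(filename, index)]


def client_read(master: Master, servers: dict, filename: str, offset: int, length: int) -> bytes:
    """Read bytes that lie within one chunk: ask the master, then one chunkserver."""
    index = offset // CHUNK_SIZE
    # Step 1: the master returns the chunk handle and replica locations, not the data.
    handle, replicas = master.lookup(filename, index)
    # Step 2: the data itself comes from one of the chunkservers holding a replica.
    server = servers[random.choice(replicas)]
    return server.read(handle, offset % CHUNK_SIZE, length)
```

Note that file data never passes through the master in this sketch; it only answers the metadata lookup.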

Reference:
