CFS: Cassandra backed storage for Hadoop
- 8. ©2012 DataStax
The Solution
• InputFormat/OutputFormat
• Unfortunately, still need a DFS
• Run tasktrackers/datanodes locally
• Data Locality FTW!
• Run namenode/jobtracker somewhere
• Since Cassandra 0.6 (the dark ages)
8
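The locality idea above can be sketched as a tiny scheduling helper: with a tasktracker running on every Cassandra node, a map task is preferably assigned to a node that already holds a replica of the block it reads. This is purely illustrative (the names and addresses are made up, not CFS API):

```python
# Hypothetical sketch of data-locality-aware task placement. A tasktracker
# co-located with a replica can read the block without network transfer.

def pick_tasktracker(block_replicas, tasktrackers):
    """Return a tasktracker co-located with a replica, if any."""
    for node in tasktrackers:
        if node in block_replicas:
            return node          # local read: no network transfer needed
    return tasktrackers[0]       # fall back to any node (remote read)

replicas = {"10.0.0.2", "10.0.0.5"}
trackers = ["10.0.0.1", "10.0.0.5", "10.0.0.9"]
print(pick_tasktracker(replicas, trackers))  # → 10.0.0.5
```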
- 14. ©2012 DataStax
Static - Users Column Family
Row Key       Columns
nickmbailey   password: *   name: Nick
zznate        password: *   name: Nate   phone: 512-7777
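The table above can be modeled as a simple mapping: each row key holds a sparse set of named columns, and rows need not share the same columns (this dict layout is illustrative, not how Cassandra stores data):

```python
# Illustrative model of the static "Users" column family.
users = {
    "nickmbailey": {"password": "*", "name": "Nick"},
    "zznate":      {"password": "*", "name": "Nate", "phone": "512-7777"},
}

# Sparse columns: only zznate has a phone number.
print(users["zznate"]["phone"])         # → 512-7777
print("phone" in users["nickmbailey"])  # → False
```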
- 22. ©2012 DataStax
CF: inode
• Row Key = UUID
• Allows for file renames
• Secondary indexes for file browsing
• Columns:
Column        Value
filename      /home/nick/data.txt
parent_path   /home/nick/
attributes    nick:nick:777
TimeUUID1     <block metadata>
TimeUUID2     <block metadata>
TimeUUID3     <block metadata>
...
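The inode layout above can be sketched as follows. Because the row key is a random UUID, a rename only rewrites the filename/parent_path columns, never the row key; time-based UUIDs keep the block-metadata columns in write order. The helper name and dict layout are illustrative, not CFS code:

```python
import uuid

# Sketch of building an inode row: UUID row key plus metadata columns
# and one TimeUUID column per block.

def make_inode(filename, parent_path, attributes, num_blocks):
    row_key = str(uuid.uuid4())              # random UUID: renames are cheap
    columns = {
        "filename": filename,
        "parent_path": parent_path,
        "attributes": attributes,
    }
    for _ in range(num_blocks):
        columns[str(uuid.uuid1())] = "<block metadata>"  # time-ordered keys
    return row_key, columns

key, cols = make_inode("/home/nick/data.txt", "/home/nick/", "nick:nick:777", 3)
print(cols["filename"])  # → /home/nick/data.txt
```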
- 24. ©2012 DataStax
CF: sblocks
• Essentially, a datanode replacement
• Stores the actual contents of files
• Each row is an HDFS block
• Row Key = Block ID
Column      Value
TimeUUID1   <compressed file data>
TimeUUID2   <compressed file data>
TimeUUID3   <compressed file data>
...
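An sblocks row can be sketched the same way: the row key is the block ID and each column holds one compressed subblock, keyed by a time-based UUID so the columns sort in write order. This is a stand-in for illustration (real CFS chooses its own compression and encoding):

```python
import uuid
import zlib

# Sketch of an sblocks row: block ID → {TimeUUID: compressed subblock}.

def make_sblock_row(block_id, subblocks):
    return block_id, {str(uuid.uuid1()): zlib.compress(data)
                      for data in subblocks}

block_id, columns = make_sblock_row(
    "block-42", [b"hello " * 100, b"world " * 100])
first = next(iter(columns.values()))          # first subblock written
print(zlib.decompress(first)[:6])  # → b'hello '
```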
- 26. ©2012 DataStax
Writes
• Write file metadata
• Split into blocks
• Still controlled by ‘dfs.block.size’
• also ‘cfs.local.subblock.size’
• Read in a block
• split into subblocks
• Update inode, sblocks
• rinse, repeat
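The write path above can be sketched end to end: split the file into blocks of `dfs.block.size`, split each block into subblocks of `cfs.local.subblock.size`, and record the layout in inode/sblocks-style structures. The sizes and in-memory dicts are stand-ins; real CFS writes these rows to Cassandra:

```python
# Sketch of the CFS write path with toy sizes.
BLOCK_SIZE = 64      # stand-in for dfs.block.size
SUBBLOCK_SIZE = 16   # stand-in for cfs.local.subblock.size

def write_file(data):
    inode, sblocks = [], {}
    for b in range(0, len(data), BLOCK_SIZE):
        block = data[b:b + BLOCK_SIZE]              # read in a block
        block_id = f"block-{b // BLOCK_SIZE}"
        inode.append(block_id)                      # update inode
        sblocks[block_id] = [block[s:s + SUBBLOCK_SIZE]  # split into subblocks
                             for s in range(0, len(block), SUBBLOCK_SIZE)]
    return inode, sblocks                           # rinse, repeat per block

inode, sblocks = write_file(b"x" * 150)
print(len(inode), len(sblocks["block-0"]))  # → 3 4
```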
- 28. ©2012 DataStax
Reads
• Check for file in inode
• Determine appropriate blocks
• Request blocks via Thrift
• If data is local...
• ...get its location on the local filesystem
• If data is remote...
• ...get the actual file content via Thrift
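The local-vs-remote branch in the read path can be sketched like this. The node address, stores, and fetch callback are illustrative stand-ins; in real CFS the remote fetch goes over Thrift:

```python
# Sketch of reading one block: local replicas are read straight from the
# local filesystem, remote ones are fetched over the wire.
LOCAL_NODE = "10.0.0.5"   # hypothetical address of this node

def read_block(block_id, replicas, local_store, fetch_remote):
    if LOCAL_NODE in replicas:
        return local_store[block_id]   # local: read from local filesystem
    return fetch_remote(block_id)      # remote: pull content over the wire

local_store = {"block-0": b"local bytes"}
data = read_block("block-0", {"10.0.0.5"}, local_store,
                  lambda bid: b"remote bytes")
print(data)  # → b'local bytes'
```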
- 29. ©2012 DataStax
What Else?
• Current Implementation: 1.0.4
• <property>
<name>fs.cfs.impl</name>
<value>com.datastax.bdp.hadoop.cfs.CassandraFileSystem</value>
</property>
• Supports HDFS append()
• Immutability makes things easy
• See the first incarnation
• https://github.com/riptano/brisk