Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
BitVault
BitVault is a content-addressable distributed storage system, developed by Microsoft Research in China. BitVault uses peer-to-peer technology to distribute the tasks of storing and managing data. As such, there is no central authority responsible for management of the system. Rather, it is self-managing, provides high availability, reliability and scales up in a self-organizing manner, with low administrative overhead, which is almost constant irrespective of the size of the distributed overlay network.
BitVault system is best suited for reference data, which is large amount of data which changes very infrequently. Such data include archives of out-of-date data, as well as multimedia data like music and video, which, even though might be frequently used, changes very rarely.
Every participating peer node in BitVault architecture is a Smart Brick, which is a trimmed down PC with large disks. All Smart Bricks in a BitVault system are connected by a high-bandwidth, low latency network. A BitVault system can be easily scaled up – any computer can be configured to act as a Smart Brick by simply installing the BitVault software, and connecting it to the network, without any need for interrupting the already working nodes.
BitVault stores immutable data objects, i.e., objects which cannot be changed. The physical location of the objects is not fixed and can be on any of the bricks. Its location changes depending on its frequency of access; it can even be replicated at more than one brick. To get around this problem of changing locations, BitVault makes it accessible by means of a 160-bit key, which is unique for each object. The system dynamically references the location from which the object can be retrieved most efficiently, by using the key, and makes the object available. The unique key is generated from a hash of the data of the object, thus making the system content-addressable, as opposed to location-addressable. The hashes of the objects (key) are mapped to the physical addresses using hash tables, which are internally managed by the system and do not need any user intervention. Different sets of nodes maintain different sets of hash tables, which concern with only the data in that set of nodes, thereby giving rise to an overlay network in which the location of the data is tracked by a distributed hash table (DHT) architecture.
The BitVault architecture is composed of multiple bricks which constitute a logical 160 bit address space, each associated with hash of some data. The association is maintained in a Distributed Hash Table (DHT). The DHT partitions the entire hash table into smaller hash tables. For example, if there are n peers, the hash table would be divided into n hash tables, each starting from the row next to where its immediate predecessor ended. Each DHT has its associated brick, and the extent of the logical address space a brick is responsible for is called its Zone. The bricks communicate using peer-to-peer technology, over the Membership and Routing Layer (MRL). Lookup of any data object can be done by n bricks in parallel, in its own zone, giving an efficiency of O (log N).
Multiple copies of a single object, called replica, are stored in the BitVault system, to give enough redundancy. If any index is damaged, the nearest replica can be notified to start its repair. And if the index notices that the replica is damaged, it can initiate the repair of the replica. This method of error recovery is called the Object Driven Repair model. In order for this to work, there needs to be a membership service running which will give a logical ordering to the peers. This is achieved using the MRL. The membership service guarantees that any addition or removal of a brick is eventually and reliably informed to every other live bricks. The MRL is also responsible to route messages to and from bricks and its associated DHTs.
The MRL uses a one hop DHT to perform routing, i.e., it never takes more than one hop over a peer to route messages, when the BitVault system is stable, i.e., no new bricks are added, nor is any load balancing or repair going on. The MRL is implemented using an XRing architecture, which maintains a distributed routing table which facilitates one-hop routing.
A brick registers itself with the MRL with a 160 bit key that forms its identifier, and its zone in the DHT is from its id to just before the id of its next logical successor. The brick architecture is divided into two parts – the Index Module and the Data Module. The index module keeps a list of the list of all the replicas that are cached by the disc, mapped with their hashes. In addition, for each object that is stored, the IM also keeps a list of locations of all other replicas of the object. IM listens to the MRL and updates itself according to membership changes and also according to data being entered into BitVault system or being retrieved from it. The IM is also responsible to initiate repair of replicas once it is informed of a damaged one, and to ask for repair of replicas in its store. The IM is connected to a small Access Module, which serves as the gateway to external clients. Data module stores replicas of objects to a local disc. Along with the object, its metadata such as its hash key and its degree of replication in the BitVault system is also kept.
Hub AI
BitVault AI simulator
(@BitVault_simulator)
BitVault
BitVault is a content-addressable distributed storage system, developed by Microsoft Research in China. BitVault uses peer-to-peer technology to distribute the tasks of storing and managing data. As such, there is no central authority responsible for management of the system. Rather, it is self-managing, provides high availability, reliability and scales up in a self-organizing manner, with low administrative overhead, which is almost constant irrespective of the size of the distributed overlay network.
BitVault system is best suited for reference data, which is large amount of data which changes very infrequently. Such data include archives of out-of-date data, as well as multimedia data like music and video, which, even though might be frequently used, changes very rarely.
Every participating peer node in BitVault architecture is a Smart Brick, which is a trimmed down PC with large disks. All Smart Bricks in a BitVault system are connected by a high-bandwidth, low latency network. A BitVault system can be easily scaled up – any computer can be configured to act as a Smart Brick by simply installing the BitVault software, and connecting it to the network, without any need for interrupting the already working nodes.
BitVault stores immutable data objects, i.e., objects which cannot be changed. The physical location of the objects is not fixed and can be on any of the bricks. Its location changes depending on its frequency of access; it can even be replicated at more than one brick. To get around this problem of changing locations, BitVault makes it accessible by means of a 160-bit key, which is unique for each object. The system dynamically references the location from which the object can be retrieved most efficiently, by using the key, and makes the object available. The unique key is generated from a hash of the data of the object, thus making the system content-addressable, as opposed to location-addressable. The hashes of the objects (key) are mapped to the physical addresses using hash tables, which are internally managed by the system and do not need any user intervention. Different sets of nodes maintain different sets of hash tables, which concern with only the data in that set of nodes, thereby giving rise to an overlay network in which the location of the data is tracked by a distributed hash table (DHT) architecture.
The BitVault architecture is composed of multiple bricks which constitute a logical 160 bit address space, each associated with hash of some data. The association is maintained in a Distributed Hash Table (DHT). The DHT partitions the entire hash table into smaller hash tables. For example, if there are n peers, the hash table would be divided into n hash tables, each starting from the row next to where its immediate predecessor ended. Each DHT has its associated brick, and the extent of the logical address space a brick is responsible for is called its Zone. The bricks communicate using peer-to-peer technology, over the Membership and Routing Layer (MRL). Lookup of any data object can be done by n bricks in parallel, in its own zone, giving an efficiency of O (log N).
Multiple copies of a single object, called replica, are stored in the BitVault system, to give enough redundancy. If any index is damaged, the nearest replica can be notified to start its repair. And if the index notices that the replica is damaged, it can initiate the repair of the replica. This method of error recovery is called the Object Driven Repair model. In order for this to work, there needs to be a membership service running which will give a logical ordering to the peers. This is achieved using the MRL. The membership service guarantees that any addition or removal of a brick is eventually and reliably informed to every other live bricks. The MRL is also responsible to route messages to and from bricks and its associated DHTs.
The MRL uses a one hop DHT to perform routing, i.e., it never takes more than one hop over a peer to route messages, when the BitVault system is stable, i.e., no new bricks are added, nor is any load balancing or repair going on. The MRL is implemented using an XRing architecture, which maintains a distributed routing table which facilitates one-hop routing.
A brick registers itself with the MRL with a 160 bit key that forms its identifier, and its zone in the DHT is from its id to just before the id of its next logical successor. The brick architecture is divided into two parts – the Index Module and the Data Module. The index module keeps a list of the list of all the replicas that are cached by the disc, mapped with their hashes. In addition, for each object that is stored, the IM also keeps a list of locations of all other replicas of the object. IM listens to the MRL and updates itself according to membership changes and also according to data being entered into BitVault system or being retrieved from it. The IM is also responsible to initiate repair of replicas once it is informed of a damaged one, and to ask for repair of replicas in its store. The IM is connected to a small Access Module, which serves as the gateway to external clients. Data module stores replicas of objects to a local disc. Along with the object, its metadata such as its hash key and its degree of replication in the BitVault system is also kept.