AbstractCache
Initially developed at Eniro.se when I worked there but after I left they accepted that I released the code open source so it can be maintained and spread.
AbstractCache's main focus is to deliver various cache and storage solutions which each addresses common challenges. All implementations follow the java.util.Map interface which makes it very easy to plugin. If your application's data could suffice by being stored/retrieved by unique keys then perhaps AbstractCache is something for you.
Tailsweep.com uses the HashedDiskCache for storing 10 million blog entries and we expect this to expand to a 1000M in six months but still we trust the HashedDiskCache.
Most of the implementations are tested in real life under extreme conditions but many are not. The Open Source Community can help out by using this fine library and reporting bugs. You can contact me regarding this project on: marcus dot herou at tailsweep.com
TODO
My call in life is to create an engine which guarantess state across all nodes, is fast, stable and maintainable. When the engine is completed it should be able to hook in various storage engines i.e. any cache available. Preferrable without a namenode which is the hard trick. Using a cluster file system like HDFS or GlusterFS is the key to success in this matter I believe since it removes one variable in the equation. The easy way to achieve this is to use the HashedDiskCache on GlusterFS but it has it's drawbacks in maintainability spreading millions of files in subdirs.
the todos
- Symmetric cluster where all key/pairs reside on all machines. Initial work is started and near completion in SymmetricJgroupsCacheEngine which suites cache data but not yet reliable enough to store business critical data.
- Asymmetric sharded cluster which replicates each key/value X times and provides a unified state. Initial work is started in AsymmetricJgroupsCacheEngine which suites cache data but not yet reliable enough to store business critical data.
- HashedDiskCache should have an index at the top which knows what key maps to what file.
News
- 2008-06-08 Added thrift, which is a perfect solution for sharing state between languages and applications, currenty only support for binary data though.
- 2008-06-08 Added key locking (i.e row locking) where you just supply a org.apache.lucene.store.LockFactory to the cache, a Distributed Lock Manager will soon be released.
- 2008-06-07 Added a draft to a HTTP API to the cache in form of an embedded Jetty server.
- 2008-05-25 Added HBaseCache which uses hbase as storage. Intended for use in large clusters where HA is key.
- 2008-05-08 Replaced the JDBM B+TRee implementation with Apache Xindices implementation which is 10 times faster on writes.
- 2008-05-07 Added a Lucene Storage engine which performs really well.
Features
- Fast and Flexible
- Spring support - Most modern Java modules today have spring support so does this one.
- Generics support - Only want to store instances of the same type in the Cache huh ?
- Support for Hibernate as 2nd level cache - All caches can be used as 2nd level cache by wrapping them in the "HibernateCacheStrategy"
- Various memory based cache policies
- Persistance - Many applications would prefer if the data wasnät lost whenever the applicaton is restarted and therefor many caches which persist the objects have been implemented or each cache can be wrapped in other caches which persist the data.
- Cache chaining/wrapping. You can wrap a cache in another for instance a LruCache wrapped in a TimedCache making it evict on time and on lru.
- Clustering with JGroups, RMI or my own Distributed engine based on TCP (fast as hell)
- MemCached client, both SpyMemcached and Greg Whalin's client is implemented and tested in heavy conditions.
- All caches support the method lock(key) which makes applications able to create transaction semantics ( no rollback supported though )
- All caches have listener support which gets notified on the put, remove, clear, lock, release events.
Cache Types
- Memory based
- LFU (Least Frequent Used)
- TreeMapLfuCache - This class uses a SortedMap(TreeMap) as lfu algorithm where all keys have a counter of how frequently they are used. The TreeMap provides guaranteed log(n) time cost for the containsKey, get, put and remove operations
- PriorityBufferLfuCache - This class uses a PriorityBuffer as lfu algorithm where all keys have a counter of how frequently they are used. A PriorityBuffer's add and remove ops performs in logarithmic time
- LRU (Least Recently Used)
- TreeMapLruCache - This class uses a SortedMap as lru algorithm where all keys have a timestamp of when last used. The TreeMap provides guaranteed log(n) time cost for the containsKey, get, put and remove operations
- PriorityBufferLruCache
- LinkedHashMapLruCache - A simple extension of the LinkedHashMap which performs quite well.
- FifoCache - An UnboundedFifoBuffer's remove() and get() operations perform in constant time. The add(Object) operation performs in amortized constant time.
- HashMapCache - Just a basic HashMap implementation with some extra features as locking on keys
- Disk based
- HashedDiskCache - Each key/value pair is stored in a separate file based on the hash of the key stored in a file hiearchy where the spread/depth is configurable. It does not perform as good as a single file based cache like the BTreeCache though since every key requires an opening/close of a file.
- BtreeCache - Fast btree implementation which implements the SortedMap interface i.e it can be used instead of a TreeMap for sorted collections which needs persistance. Very suitable for creating indices.
- HtreeCache - Fast hash based alternative to the BTree implementation when sorting is unimportant.
- HdfsCache - Uses HDFS (Hadoop Distributed File System) as storage device. Each object is stored in a new File which will be distributed evenly by Hadoop. The file path is determined in the same manner as in the HashedDiskCache. The reliability of the cache is determined by the reliability of the nodes in the HDFS. Since it uses HDFS it is extremely scalable.
- Distributed caches - (or DHT - Distributed Hash Table)
- MemCached - Wraps com.danga.MemCached.MemCachedClient client which communicates with a memcached server. Memcached distributes the memory on a consistent hash basis which means that the key lookups will be as consistent as possible when new nodes join or leave.
- JgroupsCacheStrategy - Uses JGroups for membership and notifications of when a cache state has changed so a cluster can contain the same state. It can use all other cache implementations as storage. The examples covers both replication of memory and disk. Since it keeps the same state on all nodes it has very good performance but it wastes disk or memory (or both).
- SymmetricJgroupsCacheStrategy - Uses JGroups for membership and notifications of when a cache state has changed so a cluster can contain the state. It can use all other cache implementations as storage. The examples covers both replication of memory and disk. It is a DHT implementation where the underlying cache state is duplicated to X nodes. It is very alike memcached however the difference is that it can store on which node the key is located even though all nodes goes down. This makes it suitable for creating diskbased caches where every added node's disk is used together. It is extremely important that this cache uses the TCP protocol of jgroups for performance reasons.
- SqlCache - A simple database wrapper which serializes data to/from a table named "Cache". Suitable for applications which want to have a single point of contact for persistent storage of objects and being able to look them up by a key. Any DataSource can be used for connecting to the database. If performance is an issue I would recommend using a Column-oriented DBMS
for instance MonetDB
instead of the normal RDBMS.
Create table syntax: create table Cache ( createTime bigint(20), lastAccessed bigint(20), version int(11), `object` blob, regionHash int(11), cacheKeyHash int(11), key region_idx (regionHash), key key_idx (cacheKeyHash), primary key(regionHash,cacheKeyHash) ) engine=InnoDB
Cache Strategies
A strategy is merely a wrapper around either a Cache or another CacheStrategy making mixing and chaining different cache characteristics possible
- Acegi
- Hibernate
- HibernateCacheStrategy wraps a cache in the interface of a org.hibernate.cache.Cache
- JGroups
- JgroupsCacheStrategy uses jgroups for membership handling notifications of when a cache state has changed so a cluster can contain the same state.
- Timed
- TimedMapStrategy - Wraps a java.util.Map and decorates all values with a specified eviction timeout.
- TimedCacheStrategy - Wraps a org.tailsweep.abstractcache.Cache and decorates all values with a specified eviction timeout.
- SaveState
- SaveStateStrategy - Saves the state of another cache in a HTree every specifiable milliseconds. The state is persisted in the background to avoid performance degradation of the primary cache. The cache is loaded with the state of the saved cache whenever it is recreated.