Tailsweep

Archive for August, 2008

Renaming of AbstractCache

Sunday, August 3rd, 2008

Since the AbstractCache project is mostly about storing tuples in various implementations, I think the name AbstractCache is misleading. The implementations CAN be caches, but they should simply be treated as storage mechanisms for tuples.

Can you come up with a new name for this project? Perhaps something involving the word Tuple?

Tags: cache, renaming, tuple
Posted in cache | 7 Comments »

BerkeleyDB support

Sunday, August 3rd, 2008

I read about using Lucene as a database on the Lucene mailing list. Then someone threw in BerkeleyDB as an alternative. I thought: yeah right, an Oracle DB. It will probably be lightweight and easy to use. NOT!

I was wrong, however. It is easy as hell, and the tutorial on the Oracle webpage is super. I’m surprised to see such a full-blown DB, with foreign keys, various constraints etc., having such an easy API.

Check it out

And here is my Tuple implementation

Tags: berkeley db, cache, tuple
Posted in | No Comments »

Optimized SOLR indexing

Sunday, August 3rd, 2008

I noticed a really fast and cool way of posting docs to SOLR on the Solr mailing list.

Roughly the same config parameters are available in SOLR as in raw Lucene, but instead of committing based on RAM usage, SOLR commits based on elapsed time and the number of documents. You should therefore estimate how much a document weighs on average and adjust maxBufferedDocs accordingly.
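As a rough sketch of that estimate: decide how much RAM the buffered documents may use, measure an average document size, and divide. The class name, the 50 MB budget and the 2 KB average below are illustrative assumptions of mine, not values from the mailing list, and the real in-memory footprint per document depends on your fields and analyzer.

```java
// Hypothetical helper: derives a candidate maxBufferedDocs value from a RAM budget.
final class BufferedDocsEstimator {

    /**
     * @param ramBudgetBytes  memory you are willing to spend on buffered documents
     * @param avgDocSizeBytes measured average size of one document
     * @return number of documents that fit in the budget, never less than 2
     */
    static int estimateMaxBufferedDocs(long ramBudgetBytes, long avgDocSizeBytes) {
        if (ramBudgetBytes <= 0 || avgDocSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        long docs = ramBudgetBytes / avgDocSizeBytes;
        // Clamp to the int range and keep a floor of 2 so tiny budgets stay usable.
        return (int) Math.max(2L, Math.min((long) Integer.MAX_VALUE, docs));
    }

    public static void main(String[] args) {
        long budget = 50L * 1024 * 1024; // assumed budget: 50 MB
        long avgDoc = 2L * 1024;         // assumed average document size: 2 KB
        System.out.println(estimateMaxBufferedDocs(budget, avgDoc));
    }
}
```

Treat the result as a starting point and re-measure after changing it, since the on-disk size of a document is only a proxy for what it costs while buffered.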

By Jeremy Hinegardner

–clip clip–

If the xml files are available locally on the machine where the solr instances
lie you can instead tell solr to load the file from disk instead of transmitting
the file over http.

You have to set enableRemoteStreaming="true" in the solrconfig.xml and then your
curl request would I think be:

curl -d stream.file=/tmp/post.xml http://localhost:8983/solr/update

–clip clip–
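If you would rather send that request from Java than from curl, the part worth getting right is the form body. Below is a minimal sketch that only builds the body; the class and method names are my own, and I am assuming the update handler accepts the same form-encoded parameter that curl -d sends. Unlike curl -d, it URL-encodes the path so that spaces and other special characters survive.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds the form body for a stream.file update request.
final class StreamFileRequest {

    /** Returns a form-encoded body carrying the path of a file local to the Solr host. */
    static String formBody(String localPath) {
        try {
            // Encode the value; the parameter name itself needs no escaping.
            return "stream.file=" + URLEncoder.encode(localPath, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same file as in the curl example above.
        System.out.println(formBody("/tmp/post.xml"));
    }
}
```

The body would then be POSTed to the update URL with content type application/x-www-form-urlencoded, using whatever HTTP client you already have.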

Tags: indexing performance, solr
Posted in Uncategorized | No Comments »

Optimized Lucene indexing

Sunday, August 3rd, 2008

There are some nice indexing settings, which I think emerged in lucene-2.3.2, that you can use to increase the writing speed.

I use settings like this:

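// Keep only the most recent commit point; index at most 5000 terms per field.
IndexWriter indexWriter = new IndexWriter(directory, this.getAnalyzer(), new KeepOnlyLastCommitDeletionPolicy(), new IndexWriter.MaxFieldLength(5000));

// Flush after 10000 buffered documents or 10000 buffered delete terms.
indexWriter.setMaxBufferedDocs(10000);
indexWriter.setMaxBufferedDeleteTerms(10000);
// Merge segments once 10 of them have accumulated at the same level.
indexWriter.setMergeFactor(10);
// Skip the compound file format: more open files, but faster writes.
indexWriter.setUseCompoundFile(false);
// Also flush once buffered documents use 50 MB; whichever limit is hit first wins.
indexWriter.setRAMBufferSizeMB(50);
// Run segment merges in background threads.
indexWriter.setMergeScheduler(new ConcurrentMergeScheduler());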

With these settings I managed to write 100000 documents of about 2K each in about 12 sec, which is roughly 8300 docs per sec. Optimization took 1.7 sec. All this on a slow laptop with a 5400 RPM SATA disk.

Update: I forgot to turn on tokenization. With it on, indexing took 30 sec and optimizing took 6 sec, which is about 3 times slower.

If you are on a version prior to 2.3.2, you can call IndexWriter.ramSizeInBytes() on each add/update of a document to estimate when it is time to flush (commit) the IndexWriter, or create a background job which runs every 5 secs or so and does the same. Before 2.3.2 I typically used a combination of a background thread which committed on a ramThreshold and a foreground emergencyCommitThreshold. This worked really well, but since the code has now moved into the core of Lucene, I think it is wise to give the Lucene guys a chance to sort it out.
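For anyone stuck on an older version, the two-threshold idea can be sketched roughly like this. This is not the code I actually ran: RamTrackedWriter is a hypothetical stand-in for the two IndexWriter operations involved, and the class name, method names and the 5 second period are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the two writer operations the policy needs.
interface RamTrackedWriter {
    long ramSizeInBytes();
    void commit();
}

// Sketch of a two-threshold flush policy: a soft limit checked in the
// background and a hard limit checked by the indexing thread itself.
final class RamFlushPolicy {
    private final RamTrackedWriter writer;
    private final long ramThreshold;
    private final long emergencyCommitThreshold;

    RamFlushPolicy(RamTrackedWriter writer, long ramThreshold, long emergencyCommitThreshold) {
        if (ramThreshold <= 0 || emergencyCommitThreshold < ramThreshold) {
            throw new IllegalArgumentException("need 0 < ramThreshold <= emergencyCommitThreshold");
        }
        this.writer = writer;
        this.ramThreshold = ramThreshold;
        this.emergencyCommitThreshold = emergencyCommitThreshold;
    }

    /** Run periodically by the background job. Returns true if it committed. */
    synchronized boolean backgroundCheck() {
        return commitIfAtLeast(ramThreshold);
    }

    /** Run by the indexing thread after each add/update. Returns true if it committed. */
    synchronized boolean afterAddOrUpdate() {
        return commitIfAtLeast(emergencyCommitThreshold);
    }

    private boolean commitIfAtLeast(long limit) {
        if (writer.ramSizeInBytes() >= limit) {
            writer.commit();
            return true;
        }
        return false;
    }

    /** Starts the background job; the caller shuts the returned executor down when done. */
    static ScheduledExecutorService startBackgroundJob(RamFlushPolicy policy, long periodSeconds) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> policy.backgroundCheck(),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return executor;
    }
}
```

The background job does the routine commits off the indexing thread, while the foreground check only kicks in when documents arrive faster than the job can keep up with. Both paths are synchronized on the policy so they never commit at the same time.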

Tags: indexing performance, Lucene, solr
Posted in Uncategorized | No Comments »

Copyright © 2007 Tailsweep AB
