[personal profile] hirez
The most useful tool I have found for keeping an offline copy of an LJ + comments is the mildly-obviously named 'LJArchive'. It also manages to provide a fairly rapid full text search of entries + comments, which is really very useful indeed.

As far as I can tell (-> . <- this far) it's written in M$ C#, has been abandoned by the author and hasn't worked for some number of months due to $Random-XML-error which appears to be inside the comment-parsing code.

I was going to chunter on about this being unfixable b/c it would require spending ££ on the relevant part of the M$ toolchain, but it seems that the 'Express' version is free for the download (and presumably in exchange for all sorts of details that M$ can use to sell me things).

So instead of whining about it, I'd better bag yon thingy and see if the code is amenable to tinkering by a Unix Curmudgeon.

Date: 2011-12-14 08:28 pm (UTC)
From: [identity profile] hirez.livejournal.com
Lucene appears to eat all the memory in the world and frag its indexes at the first sign of trouble... No, wait. The other one. Solr.
Edited Date: 2011-12-14 08:30 pm (UTC)

Date: 2011-12-15 10:34 am (UTC)
From: [identity profile] quercus.livejournal.com
Lucene and Solr are much the same thing internally. Lucene is the nuts and bolts for building a search engine, Solr is one I built earlier, all in a box. I've only ever used Lucene, so I don't really know where the boundary between Solr and Nutch is.

Mostly I've sat Lucene on top of Oracle, so the underlying indexes were well-behaved anyway.

Date: 2011-12-23 03:12 pm (UTC)
From: [personal profile] reddragdiva
Correct. Lucene is the worst text search except all the others. It is fat and slow and horrible and works pretty well. Any serious use requires a separate box with several gig of memory just for search.
