The most useful tool I have found for keeping an offline copy of an LJ + comments is the mildly-obviously named 'LJArchive'. It also manages to provide a fairly rapid full text search of entries + comments, which is really very useful indeed.
As far as I can tell (-> . <- this far) it's written in M$ C#, has been abandoned by the author and hasn't worked for some number of months due to $Random-XML-error which appears to be inside the comment-parsing code.
I was going to chunter on about this being unfixable b/c it would require spending ££ on the relevant part of the M$ toolchain, but it seems that the 'Express' version is free for the download (and presumably in exchange for all sorts of details that M$ can use to sell me things).
So instead of whining about it, I'd better bag yon thingy and see if the code is amenable to tinkering by a Unix Curmudgeon.
no subject
Date: 2011-12-14 02:36 pm (UTC)
Otherwise just blag the high-level design and knowledge of how to poke the LJ server (which is the hard part), then clean-room it in Scala, Python or $FAVOURITE_THING_THIS_WEEK. Things that can fall over with XML errors are usually indicative of brain-dead roll-your-own-DOM coding in the first place.
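For what it's worth, the "don't roll your own DOM" point might look something like the sketch below in Python: hand the comment XML to the standard library parser and turn any parse failure into one clear error instead of falling over halfway through. The sample document, its element and attribute names, and the `parse_comments` helper are all assumptions made up for illustration; none of this is taken from LJArchive or from a confirmed description of what the LJ server actually sends.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element and attribute names are assumptions for
# illustration, not a confirmed description of the LJ server's output.
SAMPLE = """<livejournal>
  <comments>
    <comment id="1" jitemid="10" posterid="42">
      <body>First &amp; foremost</body>
      <date>2011-12-14T14:36:00Z</date>
    </comment>
    <comment id="2" jitemid="10">
      <body>Anonymous reply</body>
    </comment>
  </comments>
</livejournal>
"""


def parse_comments(xml_text):
    """Return a list of comment dicts; raise ValueError if the XML is malformed."""
    try:
        # A real parser handles entities, escaping and nesting for us.
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"malformed comment XML: {exc}") from exc

    comments = []
    for node in root.iter("comment"):
        comments.append({
            "id": int(node.get("id")),
            "jitemid": int(node.get("jitemid")),
            "posterid": node.get("posterid"),  # None when the attribute is absent
            "body": node.findtext("body", default=""),
            "date": node.findtext("date"),  # None when the element is absent
        })
    return comments


if __name__ == "__main__":
    for comment in parse_comments(SAMPLE):
        print(comment)
```

The useful property is that a truncated or mangled response surfaces as a single `ValueError` the caller can log and retry, rather than as a half-populated archive.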
no subject
Date: 2011-12-14 02:43 pm (UTC)
no subject
Date: 2011-12-14 05:02 pm (UTC)
no subject
Date: 2011-12-14 08:28 pm (UTC)
no subject
Date: 2011-12-15 10:34 am (UTC)
Mostly I've sat Lucene on top of Oracle, so the underlying indexes were well-behaved anyway.
no subject
Date: 2011-12-23 03:12 pm (UTC)