The most useful tool I have found for keeping an offline copy of an LJ + comments is the mildly-obviously named 'LJArchive'. It also manages to provide a fairly rapid full text search of entries + comments, which is really very useful indeed.
As far as I can tell (-> . <- this far) it's written in M$ C#, has been abandoned by the author and hasn't worked for some number of months due to $Random-XML-error which appears to be inside the comment-parsing code.
I was going to chunter on about this being unfixable b/c it would require spending ££ on the relevant part of the M$ toolchain, but it seems that the 'Express' version is free for the download (and presumably in exchange for all sorts of details that M$ can use to sell me things).
So instead of whining about it, I'd better bag yon thingy and see if the code is amenable to tinkering by a Unix Curmudgeon.
no subject
Date: 2011-12-14 12:52 pm (UTC)
no subject
Date: 2011-12-14 01:05 pm (UTC)
Really I'd rather not bother at all, but no bugger else looks like they're padding up and windmilling a Stuart Surridge ('Stepping up to the plate' is an Americanism with which I shall have no truck), I know root(fuck-all) about C#, merrily detest XML and haven't written production Winders code since, er, 1991.
no subject
Date: 2011-12-14 01:10 pm (UTC)
no subject
Date: 2011-12-14 09:52 pm (UTC)
[FX: Converts code from VS-2005]
[FX: Can't build debug version for reason which I'm sure makes perfect sense]
Hm. Ok. Good. Existing version is throwing the right XML error again. However, I need to see the XML it's attempting to parse.
[FX: Installs Wireshark...]
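Once the payload has been captured, a parser can at least be asked where it gives up. A minimal sketch in Python rather than C#, assuming the captured XML has been saved as text; `locate_xml_error` is a made-up helper name and nothing here comes from LJArchive itself:

```python
import xml.etree.ElementTree as ET


def locate_xml_error(payload: str, context: int = 1):
    """Return None if payload parses, else (line, column, message, snippet)."""
    try:
        ET.fromstring(payload)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = payload.splitlines()
        lo = max(0, line - 1 - context)
        hi = min(len(lines), line + context)
        # Number the surrounding lines so the failing spot is easy to find.
        snippet = "\n".join(f"{n + 1:>4}: {lines[n]}" for n in range(lo, hi))
        return line, column, str(err), snippet
    return None
```

Feeding it the captured response should narrow the hunt to a line or two, which beats reading a packet dump by eye.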
no subject
Date: 2011-12-14 10:19 pm (UTC)
no subject
Date: 2011-12-14 10:19 pm (UTC)
[FX: Grovelling through results]
Huh? It appears to fail while trying to parse a DTD from w3.org. WTF?
no subject
Date: 2011-12-14 10:27 pm (UTC)
no subject
Date: 2011-12-14 11:43 pm (UTC)
no subject
Date: 2011-12-14 11:52 pm (UTC)
It blows up big-style if I point www.w3.org at 127.0.0.1
I wonder if one could hack those bits out?
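One blunt way of hacking those bits out, sketched in Python as an illustration only (the real fix would have to live in the C# code, and `strip_doctype` is a hypothetical helper): remove the DOCTYPE declaration before the parser sees it, so there is no DTD left to go and fetch. The regex below only copes with a DOCTYPE that has no internal subset.

```python
import re
import xml.etree.ElementTree as ET

# Matches a DOCTYPE declaration that has no internal subset ("[ ... ]").
_DOCTYPE = re.compile(r"<!DOCTYPE[^>\[]*>", re.IGNORECASE)


def strip_doctype(payload: str) -> str:
    """Remove the first simple DOCTYPE so no parser has a DTD to fetch."""
    return _DOCTYPE.sub("", payload, count=1)


# Made-up sample payload; the identifiers are the standard XHTML 1.0 ones.
sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    "<p>comment body</p>"
)
root = ET.fromstring(strip_doctype(sample))
```

The price is that any named entity the DTD would have defined (`&nbsp;` and friends) becomes undefined, so this only helps if the payload sticks to the five built-in XML entities and numeric references.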
no subject
Date: 2011-12-15 10:24 am (UTC)
DTDs though, especially not for HTML, just don't need to be retrieved from the canonical w3 each time. It's not uncommon, but it's still crappy coding to rely on this.
I presume that the w3 site here gets hammered so much they must front-end it with a squid the size of Cthulhu.
no subject
Date: 2011-12-15 10:31 am (UTC)
Which might make working out which part of the DTD it's failing to parse somewhat simpler. Might.
no subject
Date: 2011-12-15 11:27 am (UTC)
I don't see how just being Windows would break the ability to frob DTD retrieval by spoofing the public identifier?
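For what it's worth, the public-identifier trick looks roughly like this with Python's expat bindings. This is a sketch under assumptions, not how LJArchive or .NET does it: `LOCAL_DTDS` and `text_with_local_dtds` are invented names, and the one-line DTD stands in for a properly saved local copy.

```python
from xml.parsers import expat

# Hypothetical local catalogue: public identifier -> DTD bytes.
# The one-line DTD is a stand-in for a real saved copy.
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": b'<!ENTITY nbsp "&#160;">',
}


def text_with_local_dtds(payload: str, catalogue=LOCAL_DTDS) -> str:
    """Parse payload, resolving external DTDs from catalogue; return its text."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to report the external DTD subset instead of ignoring it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def resolve(context, base, system_id, public_id):
        dtd = catalogue.get(public_id)
        if dtd is not None:
            # Feed the local copy to a child parser; system_id is never fetched.
            parser.ExternalEntityParserCreate(context).Parse(dtd, True)
        return 1  # non-zero tells expat to carry on

    parser.ExternalEntityRefHandler = resolve
    parser.CharacterDataHandler = chunks.append
    parser.Parse(payload, True)
    return "".join(chunks)
```

The lookup keys on the public identifier alone, so the system identifier (the www.w3.org URL) can say whatever it likes and nothing ever goes over the wire.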
no subject
Date: 2011-12-15 12:06 pm (UTC)
20052004, have turned out to be somewhat less than optimal.
Google seems to show that this is A Thing for C#/.NET
no subject
Date: 2011-12-14 02:36 pm (UTC)
Otherwise just blag the high-level design and knowledge of how to poke the LJ server (which is the hard part), then clean-room it in Scala, Python or $FAVOURITE_THING_THIS_WEEK. Things that can fall over with XML errors are usually indicative of brain-dead roll-your-own-DOM coding in the first place.
no subject
Date: 2011-12-14 02:43 pm (UTC)
no subject
Date: 2011-12-14 05:02 pm (UTC)
no subject
Date: 2011-12-14 08:28 pm (UTC)
no subject
Date: 2011-12-15 10:34 am (UTC)
Mostly I've sat Lucene on top of Oracle, so the underlying indexes were well-behaved anyway.
no subject
Date: 2011-12-23 03:12 pm (UTC)
no subject
Date: 2011-12-14 01:06 pm (UTC)
no subject
Date: 2011-12-14 01:10 pm (UTC)
no subject
Date: 2011-12-14 01:23 pm (UTC)
It certainly used to: older posts have comments attached, but nothing recent does. I hadn't noticed that before, and I don't remember seeing the pop-up before, but posts downloaded prior to this conversation are also minus their comments.
I feel it's rather cheeky to say can I have a copy if you do fix it :) The nearest I've come thus far to making C#'s acquaintance is a few unpleasant brushes with Managed C++, so I fear I'm quite unlikely to be able to offer particularly useful assistance.
no subject
Date: 2011-12-14 01:39 pm (UTC)
If (massive if) I get the thing working, I'll no doubt jabber about it. Although the last time something similar happened, it turned out my fixing was entirely redundant and gave me a migraine.
no subject
Date: 2011-12-14 01:56 pm (UTC)
That would explain it :(
I really should remember that utilities like this are often things whose innards one can poke if one wants. It just never occurs to me. I don't know why not.
no subject
Date: 2011-12-14 08:39 pm (UTC)
no subject
Date: 2011-12-14 07:02 pm (UTC)