[personal profile] hirez
The most useful tool I have found for keeping an offline copy of an LJ + comments is the mildly-obviously named 'LJArchive'. It also manages to provide a fairly rapid full text search of entries + comments, which is really very useful indeed.

As far as I can tell (-> . <- this far) it's written in M$ C#, has been abandoned by the author and hasn't worked for some number of months due to $Random-XML-error which appears to be inside the comment-parsing code.

I was going to chunter on about this being unfixable b/c it would require spending ££ on the relevant part of the M$ toolchain, but it seems that the 'Express' version is free for the download (and presumably in exchange for all sorts of details that M$ can use to sell me things).

So instead of whining about it, I'd better bag yon thingy and see if the code is amenable to tinkering by a Unix Curmudgeon.

Date: 2011-12-14 11:43 pm (UTC)
From: [identity profile] quercus.livejournal.com
Which DTDs are even at the W3C? Probably the HTML ones, which it's generally a bad idea to depend upon anyway.

Date: 2011-12-14 11:52 pm (UTC)
From: [identity profile] hirez.livejournal.com
XHTML, I think.

It blows up big-style if I point www.w3.org at 127.0.0.1

I wonder if one could hack those bits out?

Date: 2011-12-15 10:24 am (UTC)
From: [identity profile] quercus.livejournal.com
Easiest thing (wronger than a wrong thing) would be to point w3.org at 192.168.1.some_handy_apache_box and stick local copies of the DTD up on it, at the right path.

DTDs, though, and especially the HTML ones, just don't need to be retrieved from the canonical w3 each time. Relying on that isn't uncommon, but it's still crappy coding.

I presume that the w3 site here gets hammered so much they must front-end it with a squid the size of Cthulhu.

Date: 2011-12-15 10:31 am (UTC)
From: [identity profile] hirez.livejournal.com
Ugh. It's a Winders app, so I don't think that's going to work. I would suspect that a less-worse option would be to hoover down the DTDs and pull them from file://

Which might make working out which part of the DTD it's failing to parse somewhat simpler. Might.
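
For what it's worth, the "pull them from file://" idea boils down to rewriting the DTD's system identifier before the parser goes to fetch it. A rough sketch in Python (not LJArchive's C#, just an illustration): the directory `/opt/dtds`, the function name `local_dtd_uri` and the XHTML 1.0 Strict URL used as the example are all assumptions, since I don't know which DTD the app actually asks for.

```python
import posixpath
from urllib.parse import quote, urlparse

# Hypothetical directory holding the hoovered-down DTD files.
LOCAL_DTD_DIR = "/opt/dtds"

# Hosts whose DTDs we want to serve from disk instead of the network.
W3C_HOSTS = {"www.w3.org", "w3.org"}


def local_dtd_uri(system_id, dtd_dir=LOCAL_DTD_DIR):
    """Rewrite a W3C-hosted DTD URL to a file:// URI under dtd_dir.

    Anything not hosted at w3.org is returned unchanged.
    """
    parsed = urlparse(system_id)
    if parsed.hostname not in W3C_HOSTS:
        return system_id
    filename = posixpath.basename(parsed.path)
    if not filename:
        return system_id
    # dtd_dir is assumed to be an absolute POSIX-style path.
    return "file://" + quote(posixpath.join(dtd_dir, filename))


if __name__ == "__main__":
    print(local_dtd_uri("http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

This only keeps the file name, so two DTDs with the same name under different paths would collide. Good enough for poking at which DTD it chokes on, probably not for anything more.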

Date: 2011-12-15 11:27 am (UTC)
From: [identity profile] quercus.livejournal.com
The DTDs _should_ be embedded into the exe by some convenient means. If these are the HTML DTDs (or &Raggett forbid, the XHTML DTDs), then they aren't changing any time soon.

I don't see how just being Windows would break the ability to frob DTD retrieval by spoofing the public identifier?
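
On the embedding point, a minimal sketch of what "DTDs baked into the program" can look like, using Python's standard `xml.sax` resolver interface rather than anything from LJArchive itself. The class name `EmbeddedDTDResolver` and the `EMBEDDED_DTDS` table are made up for illustration, and the DTD body is a placeholder, not the real XHTML DTD. It looks entities up by public identifier and hands back an in-memory stream, so nothing goes near the network for the ones it knows about.

```python
import io
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource

# Placeholder content only: a real build would embed the full DTD text here.
EMBEDDED_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": b"<!-- embedded copy of the DTD goes here -->",
}


class EmbeddedDTDResolver(EntityResolver):
    """Serve known DTDs from memory, keyed by public identifier."""

    def resolveEntity(self, publicId, systemId):
        data = EMBEDDED_DTDS.get(publicId)
        if data is None:
            # Unknown entity: fall back to the default behaviour,
            # which is to hand the system identifier back unchanged.
            return systemId
        source = InputSource(systemId)
        source.setPublicId(publicId)
        source.setByteStream(io.BytesIO(data))
        return source
```

A resolver like this would be attached with the reader's `setEntityResolver()`. Whether the parser actually consults it depends on how external entity handling is configured, so treat this as the shape of the idea rather than a drop-in fix.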

Date: 2011-12-15 12:06 pm (UTC)
From: [identity profile] hirez.livejournal.com
Right. I think what has happened is that assumptions about the content (or encoding?) of the DTDs, made in, er, 2004, have turned out to be somewhat less than optimal.

Google seems to show that this is A Thing for C#/.NET
Edited Date: 2011-12-15 12:10 pm (UTC)
