Monday, January 24, 2005

Tools for reading

Since I got my iBook, I've been using BibDesk to manage a BibTeX database of the papers I've downloaded (and bothered to transfer to my laptop). It's quite cool. Well, once you get past a few bugs, it's quite cool. It can attach a file from Preview's recent documents, or it can move a file that you select into your "papers folder" (it won't do both, as far as I can tell).

Whilst this is pretty cool, I've been thinking that BibTeX is a little annoying (what with most styles wanting to lowercase the capital letters in titles unless you protect them with an extra pair of braces, etc.). Lately I've been thinking that perhaps an XML format might be just what the doctor ordered. Dublin Core (DC) probably supports as many fields as necessary. The only problem might be the range of publication types that bibliography styles typeset differently, though I'm sure that there is a DC element (and maybe a dictionary of terms) to describe document types.
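For what it's worth, here is a rough sketch of the sort of thing I have in mind: a paper record written out as XML using Dublin Core elements. The wrapper element name `record`, the dictionary layout, the sample author and title, and the idea of putting the BibTeX entry type straight into `dc:type` are all my own assumptions for illustration, not any established bibliographic format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(entry):
    """Build a <record> element holding Dublin Core fields for one paper.

    `entry` is a plain dict (a made-up layout, not a BibDesk structure).
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for author in entry.get("authors", []):
        ET.SubElement(record, f"{{{DC_NS}}}creator").text = author
    # The title is stored verbatim, so no brace tricks are needed
    # to keep its capital letters.
    ET.SubElement(record, f"{{{DC_NS}}}title").text = entry["title"]
    if "year" in entry:
        ET.SubElement(record, f"{{{DC_NS}}}date").text = str(entry["year"])
    # Assumption: reuse the BibTeX entry type as the document type.
    ET.SubElement(record, f"{{{DC_NS}}}type").text = entry.get("type", "article")
    return record


paper = {
    "authors": ["A. N. Author"],  # placeholder name
    "title": "An Example Paper about XML and BibTeX",
    "year": 2005,
    "type": "article",
}
xml_text = ET.tostring(to_dublin_core(paper), encoding="unicode")
print(xml_text)
```

The hard part, mapping the publication types onto something a bibliography style can typeset differently, is exactly the bit this sketch waves its hands over.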

In any case, I'm too lazy to switch from something that works (BibTeX et al.) to something that I'd have to write myself (because using someone else's would defeat the purpose :-).

Other things that might be useful are some software to convert PDF and PS documents to text, an automatic text condenser, and an outline editor. The first two would help process those pesky documents without abstracts, whilst the last would help with note-taking and the like. If anyone happens to read this and knows of any free software (for Mac OS X or UNIX) that fits these descriptions, please post links in a comment.

2 comments:

Anonymous said...

Hi, Thomas. I'm one of the main developers of BibDesk. Glad to hear you like it. I've got a few links and tips for you.

First, I'll double check that choosing one of Preview's recent documents works as expected. It really should move it to a papers folder even then.

Also, if you post the bugs you find on our SourceForge bug tracker (assuming they're not duplicates of bugs we haven't gotten to yet), we'll keep track of them that way. Of course, patches and fixes are even better!

As for an XML format, you might be interested in reading Bruce D'Arcus' weblog [1], in which he discusses XML bibliographic formats, and OpenOffice.org's bibliographic efforts [2].

Also, I know of a pdftotext utility that dumps the text of a PDF (as long as it's not just an image). BibDesk uses that utility to implement the 'preview as text' functionality. You can try it out from an editor window with a publication that has an attached PDF. A drawer with the text should open. pdftotext is part of the Xpdf package [3].
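If you'd rather script it yourself, something along these lines should work. It's only a minimal sketch: it assumes pdftotext is installed and on your PATH, the function names are made up for illustration, and it has nothing to do with how BibDesk calls the utility internally.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the argument list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of the page.
        cmd.append("-layout")
    # A text-file argument of "-" sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, keep_layout=False):
    """Return the text of a PDF, or raise if pdftotext is unavailable."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example use (the file name is a placeholder):
#     text = pdf_to_text("paper.pdf")
#     print(text[:500])
```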

It sounds like you might find the OS X "Summarize" service useful also - highlight some text and go to the application menu and choose "Summarize" from the Services menu. You can call services programmatically too, if you want to write a program to do it. I think the API is NSPerformService(), but I might be wrong.

Hope this info comes in handy, and please feel free to keep posting comments and feedback about BibDesk.

Sincerely,
Mike McCracken

[1] http://netapps.muohio.edu/blogs/darcusb/darcusb/
[2] http://bibliographic.openoffice.org/biblio-sw.html
[3] http://www.foolabs.com/xpdf/download.html

thsutton said...

The only problem with programs like pdftotext and pdf2ascii is that PDF generators occasionally decide to encode the text in drawing order, not textual order, meaning that for two-column layouts we get "line one of column one, line one of column two, line two of column one, line two of column two..." instead of what it should be.

I haven't actually looked at this phenomenon in any detail, but I imagine that it is probably something to do with distilling (from PS or DVI or whatever) as opposed to going straight to PDF.
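To make the interleaving concrete, here's a toy sketch. It assumes each text fragment comes with an (x, y) position, that the page has two equal-width columns, and that y grows down the page. The coordinates and the `reading_order` helper are invented for illustration; this is not how pdftotext or any other extractor actually works.

```python
def reading_order(fragments, page_width):
    """Sort (x, y, text) fragments column by column, top to bottom.

    Assumes two equal-width columns and y increasing down the page.
    """
    middle = page_width / 2
    # False sorts before True, so the left column comes out first;
    # within a column, fragments are ordered by vertical position.
    return sorted(fragments, key=lambda f: (f[0] >= middle, f[1]))


# Fragments in the order a PDF generator might have drawn them.
drawn = [
    (50, 100, "line one of column one"),
    (350, 100, "line one of column two"),
    (50, 120, "line two of column one"),
    (350, 120, "line two of column two"),
]

naive = [text for _, _, text in drawn]  # what a naive dump gives you
fixed = [text for _, _, text in reading_order(drawn, page_width=600)]

print(naive)
print(fixed)
```

Real documents are messier, of course: column widths vary, figures span columns, and you don't know the layout in advance.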

Other than that, the various PDF-to-text extractors work quite well. I seem to recall seeing one that tried to give you some markup as well, but I might be confusing it with a converter for a different format.