Yesterday I stumbled upon an interesting article and thought “hey, it would be nice to save it for later”. Then I thought “just saving it is not enough, because I also want to know where I found it”. But manually writing down the URL somewhere feels ugly, so I am looking for some kind of document archival tool. Here is how it might work, given a command line interface and the invented name “nodoc”:
$ nodoc get http://labs.google.com/papers/disk_failures.pdf
$ nodoc list
disk_failures.pdf
old_files_on_disks.pdf
$ nodoc info disk
ambiguous file name “disk”
$ nodoc info failures
disk_failures.pdf
URL: http://labs.google.com/papers/disk_failures.pdf
Date: Sat, 03 Mar 2007 17:46:03 +0100
LastUpdate: Sat, 03 Mar 2007 17:46:03 +0100
Title: Failure Trends in a Large Disk Drive Population
$ nodoc tags failu + google disk statistics
$ nodoc list
disk_failures.pdf google,disk,statistics
old_files_on_disks.pdf
$ nodoc update
disk_failures.pdf: File up to date
old_files_on_disks.pdf: New version available.... done downloading.
$ nodoc search happyness
happyness not found (full text search in 2 documents)
I think you get the idea. Note that I expect it to read the meta-data from PDF files (which the mentioned article does not actually contain, but it’s just an example). Does anyone know of a simple Linux program, text-based or graphical, that does this? I am considering writing it myself, but I don’t want to reinvent the wheel.
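Just to make the idea a bit more concrete, here is a rough sketch of what the “get” part could look like in Python. Everything in it is made up for illustration: the ~/.nodoc directory, the per-document .meta sidecar file, and pypdf as just one possible way to pull the title out of a PDF.

#!/usr/bin/env python3
# Hypothetical sketch of "nodoc get": fetch a document into ~/.nodoc
# and record where it came from in a sidecar .meta file.
import sys
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path

from pypdf import PdfReader  # only used to read the Title of PDF files

ARCHIVE = Path.home() / ".nodoc"

def get(url):
    ARCHIVE.mkdir(exist_ok=True)
    name = url.rstrip("/").rsplit("/", 1)[-1]
    target = ARCHIVE / name

    # Download the document itself.
    with urllib.request.urlopen(url) as resp:
        target.write_bytes(resp.read())

    # Try to read the Title from the PDF meta-data, if there is any.
    title = ""
    if name.lower().endswith(".pdf"):
        try:
            info = PdfReader(target).metadata
            title = (info.title or "") if info else ""
        except Exception:
            pass  # not every PDF carries usable meta-data

    # Remember URL, dates and title next to the document.
    now = format_datetime(datetime.now(timezone.utc).astimezone())
    meta = ARCHIVE / (name + ".meta")
    meta.write_text(
        "URL: %s\nDate: %s\nLastUpdate: %s\nTitle: %s\n" % (url, now, now, title)
    )
    print(name)

if __name__ == "__main__":
    get(sys.argv[1])

The other commands would then mostly be different views on those .meta files; “update” could re-fetch each URL and compare the server’s Last-Modified header against the stored LastUpdate.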
Have something to say? You can post a comment by sending an e-mail to me at <mail@joachim-breitner.de>, and I will include it here.
I started hacking on a basic bash script based on your idea, since it gives a reasonable example of the expected behavior. Xapian may be what you need to extract metadata and keywords from the document; it has plugins to work with some file formats. But I don't think I've seen a CLI program with that command set. Interesting idea.
cheers,
Kev
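For the full-text search part (the “nodoc search happyness” example above), Xapian’s Python bindings could indeed do the heavy lifting. Here is a rough, untested sketch, assuming the plain text has already been squeezed out of the PDF with pdftotext from poppler-utils:

import subprocess
import xapian

def index_pdf(dbpath, pdfpath):
    # Extract plain text with pdftotext ("-" means write to stdout).
    text = subprocess.run(
        ["pdftotext", pdfpath, "-"],
        capture_output=True, text=True, check=True,
    ).stdout

    db = xapian.WritableDatabase(dbpath, xapian.DB_CREATE_OR_OPEN)
    doc = xapian.Document()
    doc.set_data(pdfpath)  # remember which file this was

    tg = xapian.TermGenerator()
    tg.set_stemmer(xapian.Stem("en"))
    tg.set_document(doc)
    tg.index_text(text)

    db.add_document(doc)

def search(dbpath, querystring):
    db = xapian.Database(dbpath)
    qp = xapian.QueryParser()
    qp.set_stemmer(xapian.Stem("en"))
    qp.set_database(db)
    query = qp.parse_query(querystring)

    enquire = xapian.Enquire(db)
    enquire.set_query(query)
    for match in enquire.get_mset(0, 10):
        print(match.document.get_data().decode("utf-8"))

pdftotext is just one way to get at the text; the index itself could simply live next to the documents in the archive directory.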