@ryanwwest


PKM Annotation Woes

2023-04-28

Update: Zotero 7 will be released in 2024 and solves many of the problems discussed below, including new EPUB/HTML annotation support on all major platforms and the web. Annotations will still only be accessible within Zotero.

While videos are the easiest to digest, much of the source material I learn from for clients, research, or fun is in written form. This being the digital age, I read exclusively on laptops, my phone, or an e-reader, and thus read either from a web browser or from a file. As I explained in my post Why I Blog and How I Automate it, I save papers, blog posts, ebooks, and other writing that I find valuable and want to remember to the source and citation management system Zotero, where I can then (sometimes) annotate them and link them to my Personal Knowledge Management System (PKMS) in Obsidian.

I prefer to save or archive these written forms primarily because of link rot: if I take notes on a URL and that URL becomes invalid or changes, I have to relocate the resource online and may never find it again (and 5-15% of new links will be broken after 1 year, depending on the study). Archiving is particularly easy with written content since it usually takes up little space on disk (compared to, e.g., video). gwern.net has a great essay that dives into proactively archiving URLs, but manually deciding which ones to save is more valuable to me.

However, this brings up the question of how to locally store such written content and my annotations of that content. Academic papers are almost exclusively stored in PDF format, which is fine since they tend to follow a rigid layout (and PDF is often the only choice). But other content, like blog posts and articles, doesn’t require the same layout and can thus easily reflow, or wrap to fit on smartphones, desktop monitors, and any size or aspect ratio in between. This flexibility is a big benefit, as it is inconvenient to read PDFs on any device smaller than my 13.3 inch Boox Max Lumi e-reader, and I want to retain reflow when saving files. The same applies to EPUB files; EPUB is the most common (and open) format for ebooks and is essentially a zipped folder of HTML. While it is easy to ‘print’ a webpage to PDF, reflow is then lost. By instead using the SingleFile browser extension, it is effortless to download any webpage into a single HTML file, including images, and then view it ‘offline’ from then on. The Zotero Connector browser extension makes this even easier, as it saves webpages (via SingleFile) or online PDFs directly into Zotero including metadata, which makes it easy to pick Zotero as my ‘source archive’ of choice.
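To make the "zipped folder of HTML" point concrete, here is a minimal sketch that lists the HTML/XHTML documents inside an EPUB using only Python's standard library. The function name and the `book.epub` path in the usage note are hypothetical, and the sketch assumes the content files use `.xhtml`, `.html`, or `.htm` extensions, which is common but not guaranteed.

```python
import zipfile


def list_epub_documents(epub_path):
    """Return the names of the HTML/XHTML files stored inside an EPUB archive."""
    with zipfile.ZipFile(epub_path) as archive:
        # An EPUB is a ZIP container; its chapters are ordinary (X)HTML members.
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )
```

Calling `list_epub_documents("book.epub")` returns member paths in alphabetical order, which is not necessarily reading order. A real reader would consult the package (OPF) file referenced from `META-INF/container.xml` to find the intended sequence.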

It is critical to get annotations into my PKMS because then I can easily link to them in other Markdown notes, or even directly embed the notes in other files, and encounter them when searching or navigating through my vault, giving more chances to spawn new ideas and understanding. From within Zotero I can directly open the [paired source note] for each source Item in Obsidian using the MarkDB-Connect Zotero add-on, and I can manually sync and resync PDF annotations made in Zotero to the same paired note with obsidian-zotero-integration.

Problems

However, at this point we reach a crossroads. As outlined above, in my opinion there are three common and important types of written files that should all be archivable and annotatable:

  1. PDF
  2. HTML
  3. EPUB

I prefer HTML and EPUB files over PDFs for their flexible viewability (reflow) and also for easier copy/paste and (somewhat) easier editing. Fabian Suchanek’s excellent blog post Best File Formats for Archiving agrees that PDFs are best avoided when possible and recommends HTML and plaintext (i.e. Markdown) instead. Unfortunately, while Zotero can import, store, and manage any filetype (including HTML and EPUB) as an attachment to an Item entry, it can only directly open and annotate PDF files. HTML/EPUB annotation is a highly requested feature but doesn’t seem to be a priority. I think annotating (meaning highlighting text, taking inline notes, etc.) is critical for pulling value from a source of information, as you may notice different things each time you reread. Inline annotations let you encounter your past thoughts and make new connections more easily than a companion paired source note kept somewhere else (and I think the act of annotating, just like the act of writing, helps you pay better attention to and internalize what you are reading). But this web -> Zotero -> Obsidian knowledge workflow only works with the inferior PDF format.

Offline/local annotations are stored either within the written file itself or in a separate file/database. Zotero stores them separately, so any PDF annotations you make in the Zotero PDF viewer aren’t visible when you open that same PDF externally (you can optionally export a PDF with the annotations embedded, however). Since Zotero can’t natively open HTML/EPUB files, you need an external viewer anyway and can look for one that supports annotations. But there are a few issues with this.
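As a rough illustration of the "separate file/database" approach, here is a minimal sketch of a sidecar annotation store in Python. It is not Zotero's actual schema or storage mechanism. The function names, the JSON layout, and the choice to key annotations by a SHA-256 hash of the source file are all assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def content_key(source_path):
    """Hash the file's bytes so the key survives renames and moves."""
    return hashlib.sha256(Path(source_path).read_bytes()).hexdigest()


def _load(store_path):
    """Read the sidecar store, or return an empty one if it does not exist yet."""
    store_path = Path(store_path)
    if not store_path.exists():
        return {}
    return json.loads(store_path.read_text(encoding="utf-8"))


def add_annotation(store_path, source_path, quote, comment=""):
    """Append one annotation to the sidecar store; the source file is never modified."""
    store = _load(store_path)
    store.setdefault(content_key(source_path), []).append(
        {"quote": quote, "comment": comment}
    )
    Path(store_path).write_text(json.dumps(store, indent=2), encoding="utf-8")


def annotations_for(store_path, source_path):
    """Return all annotations recorded for this file's current contents."""
    return _load(store_path).get(content_key(source_path), [])
```

The trade-off is visible in the design: the source file stays untouched and any viewer can still open it, but no other application knows the annotations exist. Keying by content hash keeps annotations attached when a file is renamed, at the cost of losing the association if the file's bytes change.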

First, annotations are not well supported in general. PDFs support many different flavors and styles of annotation depending on the reader application used, and these aren’t always cross-app compatible (but at least Zotero is pretty good about importing annotations previously embedded in the file into its external annotations database). EPUBs are much worse: while there have been various efforts to create an open EPUB annotation standard, it seems they’ve never gotten very far or been adopted. Only a few EPUB readers support annotation, and to my knowledge, none are cross-app compatible. HTML files are even worse, as I’ve found almost nothing that supports local-first annotation. Hypothes.is shines for HTML annotation (and you can even have both public (shareable/linkable/replyable) and private annotations), but doesn’t support archived offline files out of the box.

Even once you find a good annotator program, you need a new way to sync annotations into your PKMS. Furthermore, if you use a smartphone, tablet, laptop, and/or desktop, you might need to find different applications to read and annotate each filetype across each OS and product.

App Options

My favorite solution would probably be the Obsidian Annotator Plugin, if it were not so buggy and unstable. It requires all your source material to be hosted within your Obsidian vault (meaning, somewhere in the same directory) but uses an offline version of the Hypothes.is client to let you read and annotate PDFs, EPUBs, HTML files, and even videos all within Obsidian (which is supported on all major platforms). This lets the paired source note that I explained in Why I Blog and How I Automate it literally store the entire annotation data of the source, not just keep a synced copy of it. The biggest problems are that it is unreliable and unoptimized for different platforms, sometimes successfully opening target PDFs or EPUBs only 20% of the time (HTML hardly works at all), and that development has slowed to a crawl. The Markdown-JSON storage format of the annotations is also pretty cumbersome to work with. Your source notes (with annotations) and the source files themselves are also susceptible to getting disconnected if a file is renamed or moved, especially if you sync your PKMS (and sources) with something like Syncthing (this has happened to me many times). For this reason specifically I prefer Zotero: its URL scheme gives you a cross-app, cross-OS, unchanging link to any source Item and its attachments of any file format (as long as you don’t use citekeys; even changing attachments is fine), and it now even lets you target specific annotations to open directly in Zotero (for PDFs). Zotero also isn’t dependent on Obsidian as the PKMS.
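For readers who have not seen the Zotero URL scheme, here is a small sketch that builds such links in Python. The helper names and the item keys in the usage note are made up for illustration. The link shapes assume items in the personal library (`library/items/...`), and the `page` and `annotation` query parameters reflect how Zotero 6 formats its own annotation links as far as I know, so check them against your Zotero version.

```python
from urllib.parse import urlencode


def zotero_select_link(item_key):
    """Link that selects an item in the Zotero library pane (personal library assumed)."""
    return f"zotero://select/library/items/{item_key}"


def zotero_open_pdf_link(attachment_key, page=None, annotation_key=None):
    """Link that opens a PDF attachment, optionally at a page and/or annotation."""
    params = {}
    if page is not None:
        params["page"] = page
    if annotation_key is not None:
        params["annotation"] = annotation_key
    base = f"zotero://open-pdf/library/items/{attachment_key}"
    # Only append a query string when there is something to target.
    return f"{base}?{urlencode(params)}" if params else base
```

With hypothetical keys, `zotero_select_link("ABCD1234")` yields `zotero://select/library/items/ABCD1234`, which can be pasted into a Markdown note as an ordinary link. Because the link is built from the item key rather than a file path or citekey, renaming or moving the underlying attachment does not break it.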

Excluding PDFs (which have ‘good enough’ cross-app annotation support, especially because of Zotero, Logseq, etc.), there are several programs that support annotating HTML/EPUB, but they are usually only available on one or two platforms. Some other options:

Unexplored, but promising:

Zotero clearly isn’t ideal, as its annotations only support one of the three target filetypes and, while exportable, don’t use a universally readable annotation standard either. I still use Zotero because it has had strong development for over a decade, works offline and syncs on all major platforms (except Android…), manages any source items and attachments well, and is the open-source state of the art for research citation management for my PhD work. Its more recent PDF reading and annotation features, ample PKMS integrations, and the likelihood that I will keep using it for years are enough to make it the best option for me right now.

Conclusion

So that leaves me in a rough spot. Looking at each of the three file formats that make up most written sources of information:

I thus heavily annotate and re-evaluate all PDFs, occasionally annotate EPUBs knowing that I probably won’t take the time to use the annotations again, and never annotate HTML files. For the latter two, I’ll sometimes create a paired source note in my PKMS and take notes or directly copy-and-paste passages that I want to remember, with the disadvantage that if I reread the source later without opening the paired source note, I’ll miss them. I could create PDFs of each EPUB and HTML file to get around this, but I think the loss of reflow for smaller screens (and the duplication of data) is not worth it.

My OCD tendencies really showed in this post. Regardless, I hope for the day that we have universal standards for annotation and that all of this will be a lot more seamless! If I get some free time eventually, maybe I’ll try creating an Obsidian-agnostic, browser-but-local version of the Annotator plugin that reliably uses self-hosted Hypothes.is to read and annotate all of these formats and more, and that is maybe operable inside Zotero (storing annotations for all filetypes in its database) as well. Calibre’s EPUB viewer/annotator is great, but since it can’t support HTML or PDF, it doesn’t seem like the right path to pursue. One day, it would be nice to have a well-supported open annotation standard where different versions of the same content (PDF vs HTML, v1 vs v2) can be recognized as similar and annotations can be portably shown/migrated across each or even in isolation (see beepb00p’s annotation vision), but that’s far off.
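One way annotations could follow content across versions is to anchor them by the quoted text plus a little surrounding context rather than by page or byte offset, in the spirit of the `exact`/`prefix`/`suffix` fields of the W3C Web Annotation text quote selector. The following is only a toy sketch under that assumption: the function name is hypothetical, it works on plain extracted text, and it does exact matching only (no fuzzy matching for edited passages).

```python
def find_quote(text, exact, prefix="", suffix=""):
    """Return the start offset of `exact` in `text`, or None if it is absent.

    When the quote occurs more than once, prefer the occurrence whose
    neighbouring text agrees with the stored prefix and/or suffix.
    """
    best_start, best_score = None, -1
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        score = 0
        if prefix and text[:start].endswith(prefix):
            score += 1
        if suffix and text[end:].startswith(suffix):
            score += 1
        # Keep the first occurrence with the highest context score.
        if score > best_score:
            best_start, best_score = start, score
        start = text.find(exact, start + 1)
    return best_start
```

Because the anchor is the text itself, the same stored annotation can be located in the text extracted from a PDF, an HTML archive, or a later revision, as long as the quoted passage survives unchanged. Handling reworded passages would need approximate matching, which this sketch deliberately leaves out.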

