PKM Annotation Woes

2023-04-28

Update: Zotero 7 will be released in 2024 and now solves many problems discussed below including new EPUB/HTML annotation support on all major platforms and web. Annotations are will still only be accessible within Zotero.

While videos are the easiest to digest, much of the source material I learn from for clients, research, or for fun is in written form. Being the digital age, I exclusively read on laptops, my phone, or an ereader, and thus either read from a web browser or from a file. As I explained in my post Why I Blog and How I Automate it, I save papers, blog posts, ebooks, and other writing that I find valuable and want to remember to the source and citation management system Zotero , which I can then (sometimes) annotate, link to my Personal Knowledge Management System (PKMS) in Obsidian.

I prefer to save or archive these written forms primarily because of link rot—if I take notes on a URL and that URL become invalid or changes, I have to relocate the resource online and possibly never will (and 5-15% of new links will be broken after 1 year, depending on the study). This is particularly easy with written content since it usually takes up little space on disk (compared to, e.g., video). gwern.net has a great essay that dives into proactively archiving URLs, but manually deciding to save them is more valuable to me.

However, this brings up the question of how to locally store such written content and my annotations of that content. Academic papers are almost exclusively stored in PDF format, which is fine since they tend to follow a rigid layout (and is often the only choice). But other content, like blog posts and articles, doesn’t require the same layout and can thus easily reflow, or wrap to fit on smartphones, desktop monitors, and any size or aspect ratio in between. This flexible property is a big benefit, as it is inconvenient to read PDFs on any device smaller than my 13.3 inch Boox Max Lumi e-reader, and I want to retain it when saving files. The same applies for EPUB files which is the most common (and open) format for ebooks and is essentially a zipped folder of HTML. While it is easy to ‘print’ a webpage to PDF, reflow is then lost. By instead using the SingleFile browser extension, it is effortless to download any webpage into a single HTML file, including images, and then view it ‘offline’ thenceforth. The Zotero Connector browser extension makes this even easier, as it saves webpages (via SingleFile) or online PDFs directly into Zotero including metadata, which makes it easy to pick Zotero as my ‘source archive’ of choice.

It is critical to get annotations into my PKMS because then I can easily link to them in other Markdown notes or even directly embed the notes in other files and encounter them when searching or navigating through my vault, giving more chances to spawn new ideas and understanding. From within Zotero I can directly open the [paired source note] for each source Item in Obsidian using the MarkDB-Connect Zotero add-on and manually sync and resync PDF annotations made in Zotero to the same paired note with obsidian-zotero-integration.

#
Problems

However, at this point we reach a crossroads. As outlined above, in my opinion there are three common and important types of written files that should all be archivable and annotatable:

PDF
HTML
EPUB

I prefer HTML and EPUB files over PDFs for their flexible viewability (reflow) and also for easier copy/paste and (somewhat) easier editing. Fabian Suchanek’s excellent blog post Best File Formats for Archiving agrees that PDFs are also best avoided when possible and recommends HTML and plaintext (i.e. Markdown) instead. Unfortunately, while Zotero can import, store, and manage any filetype (including HTML and EPUB) as an attachment to an Item entry, it can only directly open and annotate PDF files. This is a highly-requested feature but doesn’t seem to be a priority. I think annotating (meaning highlighting text, being able to take inline notes, etc.) is critical for pulling value from a source of information, as you may notice different things each time you reread. Past inline annotations let you encounter your past thoughts and make new connections even more easily than even having a companion paired source note somewhere else (and I think the act of annotating, just like the act of writing, helps you better pay attention to and internalize what you are reading). But this web -> Zotero -> Obsidian knowledge workflow only works with the inferior PDF format.

Offline/local annotations are stored either within a written file or in a separate file/database. Zotero stores them separately, so any PDF annotations you make in the Zotero PDF viewer aren’t visible if opening that same PDF externally (you can optionally export a PDF with the annotations embedded, however). Since Zotero can’t natively open HTML/EPUB files, you need an external viewer anyway and can find some that allow annotations there instead. But there are a few issues with this.

First, annotations are not well supported in general. PDF support many different flavors and styles of annotation depending on the reader application used, and aren’t always cross-app compatible (but at least Zotero is pretty good about importing annotations previously embedded in the file into its external annotations database). EPUBs are much worse—while there have been various efforts to create an open annotation EPUB standard, it seems they’ve never gotten very far or been adopted. Only a few EPUB readers support annotation, and to my knowledge, none are cross-app compatible. HTML files are even worse, as I’ve found almost nothing that supports local-first annotation. Hypothes.is shines for HTML annotation (and you can even have both public (shareable/linkable/replyable) and private annotations), but doesn’t support archived offline files out of the box.

Even once you find a good annotator program, you need a new way to sync annotations into your PKMS. Furthermore, if you use a smartphone, tablet, laptop, and/or desktop, you might need to find different applications to read and annotate each filetype across each OS and product.

#
App Options

My favorite solution would probably be the Obsidian Annotator Plugin, if it were not so buggy and unstable. It requires all your source material to be hosted within your Obsidian vault (meaning, somewhere in the same directory) but uses an offline version of the Hypothes.is client to let you read and annotate PDFs, EPUBs, HTML files, and even videos all within Obsidian (which is supported on all major platforms). This lets the paired source note that I explained in Why I Blog and How I Automate it literally store the entire annotation data of the source, not just keep a synced copy of them. The biggest problems are that it is unreliable and unoptimized for different platforms, sometimes only successfully opening target PDFs or EPUBs 20% of the time (HTML hardly works at all), and development has slowed to a crawl. The Markdown-JSON storage format of the annotations is also pretty cumbersome to work around. Your source notes (with annotations) and the source files themselves are also susceptible to getting disconnected if the file is renamed or moved, especially if you sync your PKMS (and sources) with something like Syncthing (this has happened to me many times). For this reason specifically I prefer Zotero because of its URL scheme, which gives you a cross-app, cross-OS, unchanging link to any source Item (and its attachments of any file format - even changing attachments is fine)(if you don’t use citekeys) and now even lets you target specific annotations to open to directly in Zotero (for PDFs). Zotero also isn’t dependent on Obsidian as the PKMS.

Excluding PDFs (which have ‘good enough’ cross-app annotation support, especially because of Zotero, Logseq, etc.), there are several programs that support annotating HTML/EPUB, but they are usually only available on one or two platforms. Some other options:

The Calibre E-book viewer is popular, cross platform, open-source and supports highlight/note annotations on EPUBs from a local application or from a (possibly mobile) browser and nicely exports to Markdown (with context-preserving URL links to jump to the highlight) or JSON. I can envision a decent integration with Zotero for opening EPUBs and syncing their annotations between apps somehow (then syncing to Obsidian), but since it can’t directly read HTMLs or PDFs, this would then require a third system for HTML files.
KOReader - Android, Linux Desktop, non-Android e-readers. E-ink friendly open-source reader of PDF, EPUB, and HTML that supports exportable annotations of all 3 in companion folders next to the files (which can be synced across devices with Syncthing), and has a lot of external tooling. No TTS, but could potentially be a good ‘front-end’ for reading/editing things stored in Zotero. Syncing annotations across devices and outside of KOReader seems like the biggest hangup, though it certainly seems possible to build out. Popular and actively developed. I’m currently working on a universal annotation-syncing program that goes between this and Zotero, but we’ll see where it goes.
Some proprietary Android apps like Moon+ Reader and NeoReader (excellent, but Boox ereaders only) support highlight and note annotations. Extracting them out of these examples and using them elsewhere is difficult but possible (and typing them up without a keyboard is annoying). Moon+ Reader technically integrates with the Calibre Content Server but doesn’t sync annotations (or work very well). I wish Librera Reader did as it is open source and otherwise superb.
This Hypothes.is + epub.js / ReadiumJS demo shows the potential for using a browser to read and annotate EPUBs, but it’s always been buggy when I’ve tried and I don’t know of an application that lets you do this on your own (again Hypothes.is is online-only by default and is the only good option I’ve found for annotating HTML files anyway).
Literal is Android only but archives copies of HTML/PDF and lets you annotate them. It uses the W3C Web Annotation Data Model and seems very portable, but is perhaps no longer maintained. It would be nice if it could store sources and annotation data directly in Zotero or some cross-app file system.
Apple Books can annotate EPUBs on Mac and must sync with iOS, but I don’t know how exportable it is and Apple-only is not a good universal fit.
Readwise is a popular paid choice among other PKM enthusiasts and lets you import annotations from articles and even from content on Amazon Kindles and sync them directly into Obsidian, but I avoid it primarily because it is paid and proprietary. I prefer self-hosting wherever possible as then the service can’t shut down at some point—if I choose to change it, it’s my decision. Google Play Books might fit in here as well as you can upload PDFs/EPUBs and annotate/sync all from the browser or the iOS/Android apps (for free). They also have a new product, Reader, which might be more inline with what I’m looking for as it supports reading and annotating all three filetypes, but I doubt it can support locally-archived versions of them.
Thorium Reader runs pretty well on Windows/MacOS/Linux and doesn’t support EPUB annotations, but it might later this year. Koodo Reader also works on Desktop OSes (for all 3 filetypes) and has syncable annotations (not sure about their exportability) but reportedly doesn’t bode well for mobile devices.
WorldBrain Memex is online, paid, and somewhat abstracts the filetype and storage of sources away. It lets you annotate webpages and PDFs in browsers, syncs across devices, and offers full-text search and many more features (maybe too many). While it claims to store data offline, I had retrieval issues when annotating PDFs and HTML sites after disconnecting from the internet. Its storage layer seems rigid and may not integrate well with other storage management systems like Zotero / a flat file system / auto-sync with Obsidian. It’s definitely something to watch, but I don’t think it supports EPUB either. It isn’t open-source, doesn’t seem export-friendly either and lacks comment discussions like Hypothes.is, and overall seems like a closed, worse Hypothes.is. I’d rather delegate the archival step to other dedicated tools. Polar seems to fit in a similar, paid, proprietary, cloud-only category.

Unexplored, but promising:

Promnesia TODO check out. This is meant to enhance your browser history but seemingly supports annotations, maybe is archive/local-friendly, probably doesn’t work on HTML/EPUB files directly… But it’s actively developed, open-source, and seems very integration-friendly.

Zotero clearly isn’t ideal as its annotations only support one of the three target filetypes and, while exportable, don’t use some universally readable annotation standard either. I still use Zotero because it has had strong development for over a decade, works offline and syncs on all major platforms (except Android…), manages any source items and attachments well, and is the open-source state-of-the-art for research citation management for my PhD work. The other recent features of PDF reading, annotation, ample PKMS integrations, and likelihood of me using it for years are enough to make it the best option for me right now.

#
Conclusion

So that leaves me in a rough spot. Looking at each of the three file formats that make up most written sources of information:

PDF sources are ‘good enough’ because they work so well with Zotero (easy to integrate them into Obsidian and keep annotations in sync). There isn’t a great option for pure Android, though only Android tablets are really big enough to read PDFs well anyway (I use the SpaceDesk Android app to turn my Boox Max Lumi 13.3 inch ereader into an extra touch-enabled Windows laptop monitor, and can then use Zotero to read and annotate PDFs that way).
EPUB sources can be read on any platform and annotated on some, but annotations are currently too hard to get into my PKMS (and keeping them up-to-date is a no-go). Unlike Zotero, where I can easily read past PDF annotations from any desktop OS and embed them into the PDF if necessary, there isn’t a good way to view EPUB annotations outside of the app I originally annotated them in.
HTML sources don’t really have any annotation options, unless you have a way to give them a unique URL and use with Hypothes.is. If you don’t care about archiving articles then you could just privately or publicly annotate the page with Hypothes.is and hope it sticks around.
(Plaintext/markdown/org files could also be sources of information, though I have few of these. I still like to keep these in Zotero with all other sources, not in my PKMS since it’s not Personal (authored by me), and I don’t have a good annotation solution for these either.)

I thus heavily annotate and re-evaluate all PDFs, occasionally annotate EPUBs knowing that I probably won’t take the time to use them again, and never annotate HTMLs. In the latter two’s case, sometimes I’ll create a paired source note in my PKMS and take notes or directly copy-and-paste passages that I want to remember, with the disadvantage that if I reread them later without opening the paired source note, I’ll miss them. I could create PDFs of each EPUB and HTML file to get around this but think the loss of reflow for smaller screens (and duplication of data) is not worth it.

My OCD tendencies really portrayed themselves in this post. Regardless, I hope for the day that we have universal standards for annotation and that all of this will be a lot more seamless! If I get some free time eventually, maybe I’ll try creating an Obsidian-agnostic, browser-but-local version of its Annotator plugin that reliably uses self-hosted Hypothes.is to read and annotate all of these and more, and maybe operable inside Zotero (and storing annotations for all filetypes in its database) as well. Calibre’s EPUB viewer/annotator is great, but since it can’t support HTML or PDF, it doesn’t seem like the right path to pursue. One day, it would be nice to have a well-supported open annotation standard where different versions of the same content (PDF vs HTML, v1 vs v2) can be recognized as similar and annotations can be portably shown/ migrated across each or even in isolation (see beepb00p’s annotation vision), but that’s far off.

If comments are missing, refresh the page. Or read/write comments directly on Github.

PKM Annotation Woes

# Problems

# App Options

# Conclusion

#
Problems

#
App Options

#
Conclusion