This morning when waking up, I had an important realization. Because I'm trying to implement some kind of hypertext system, I naturally have known entities in the texts, and also need a little bit of data to aid navigation. Additionally, I have to manage a few lists, like book lists, collected URLs, a calendar and so on. For that reason, I had been proposing to build an "open commons data federation" for a few months, and did a number of tiny experiments, for example a simple aggregator/graph-merger, a generic table manager operated solely via API endpoints, and a primitive tree/outliner-ish WWW server package.
Yesterday, for purposes of link sharing and potentially attaching notes/comments to the links, the question came up of where to enter a source table of collected links. The table manager and tree/outliner services would be a nice fit (even if I cut away table columns to make it a single-column list), except the source comes with a copyright license (Creative Commons BY), so I want to comply with the BY Attribution clause. How do I do that? Duplicate the notice in an additional column of the table? Customize the tree/outliner package and hard-code the copyright notice into it? In this case, luckily, there's a workaround: create a pseudo-user and add the entries under that user account, because the tree/outliner package then already indicates authorship in the front-end. But in principle, and for other similar cases, this may not be an option.
Therefore, I finally realized that the proper way to handle this is to break the data up/out from the sequential flat 1D list and the 2D tabular arrangement into a graph (something that's obvious anyway, and I have been doing various graph things separately; it's just that there also needed/needs to be some front-end exploration of lists, tables and so on). In the usual fashion, the graph can have one node that carries the copyright notice, with multiple edges to all the URLs that are covered by it (standard stuff).
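The idea above can be sketched in a few lines. This is a minimal illustration, not the actual data model of any of the packages mentioned; all node identifiers and URLs in it are made up. The point is that the notice lives in one node, and every covered URL merely carries an edge to it:

```python
# Nodes, keyed by a hypothetical local identifier. The license notice is
# stored exactly once, as its own node.
nodes = {
    "license:cc-by-4.0": {"type": "license",
                          "notice": "CC BY 4.0, source: example collection"},
    "url:1": {"type": "url", "href": "https://example.com/a"},
    "url:2": {"type": "url", "href": "https://example.com/b"},
}

# Directed edges as (subject, predicate, object) triples: each covered URL
# points at the single shared license node.
edges = [
    ("url:1", "licensed-under", "license:cc-by-4.0"),
    ("url:2", "licensed-under", "license:cc-by-4.0"),
]

def attribution_for(node_id):
    """Follow the licensed-under edge of a URL node to recover the notice."""
    for subject, predicate, obj in edges:
        if subject == node_id and predicate == "licensed-under":
            return nodes[obj]["notice"]
    return None
```

A front-end can then call attribution_for("url:1") to render the notice next to each link, without the notice being duplicated per row as it would be in a table column.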
However, if I start going down that route now, it's inevitable that I would reinvent and duplicate the RDF/OWL/SPARQL/graph-database approach of the Semantic Web. The good thing is that I would not be in danger of doing reasoning/"AI", and would just use it for data management. Many years ago, Christopher Gutteridge explained RDF to me - not so much the notation or the graph aspects, but how it would help with developing applications. I simply couldn't grasp the benefit, because I was focusing mostly on semantics in hypertext for augmentation, trying to be as general + generic as I can, while what Christopher described was for specific apps that work on particular domain data, and I wasn't much interested in or working on this kind of single-purpose business logic. Given that "Semantic Web" tech is pretty heavy-weight in comparison to a regular one-off WWW SaaS, the synergies/savings probably only get realized if one builds multiple domain apps on/for the same graph data; this way, the graph is inevitably going to eat up world knowledge to support more application domains.
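For comparison, the same attribution pattern already has a standard expression in RDF, which is what going down this route would end up duplicating. In the Turtle sketch below, the vocabularies (Dublin Core terms, the Creative Commons REL vocabulary) and the license URI are real; the two entry URLs and the attribution name are placeholders:

```turtle
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix cc:      <http://creativecommons.org/ns#> .

<https://example.com/a>
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> ;
    cc:attributionName "Example Collection Author" .

<https://example.com/b>
    dcterms:license <https://creativecommons.org/licenses/by/4.0/> .
```

Here too, the license is a single node in the graph, and each covered resource just gets an edge (dcterms:license) pointing at it.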
So a Goodreads clone (Library JSON via Common SenseMakers) or a "social media" platform might/should indeed go the graph data route (except there's not much synergy if the graph remains limited in use, or is all proprietary/siloed). "HyperKnowledge" might be such a thing, except I'm not aware if/when/where the "Interoperability Protocol" ever got designed and implemented, and whether there are any independent federation participants that share some data over it; it might also be primarily for representing knowledge for argument mapping (less so other use cases, like mine?). Maybe "Free Jerry's Brain" gets to a stage that allows collaborative graph editing (and then one has to wonder: who's going to curate it, and will it be openly published, libre-freely licensed?), even though at this late time TheBrain finally launched an API of their own. Or maybe NextGraph picked up steam again (after not winning funding and too many cooks spoiling the soup); or maybe NooNAO is exactly such a thing and BestOfNow such a specialized app built this way.
Given this landscape, I don't think I should invest the time to duplicate these existing efforts, including the work that went into the proper Semantic Web, competing with a late entry. On the other hand, I can't help noticing that none of these existing services seem to share any libre-freely licensed data/signals, so I, as just another user and peer, don't know of any feeds to subscribe to or pull/query from. Also, most of the tech is on/in the WWW, on some big server, with just another guy or company in control of it (not the open federation as claimed/advertised). These proprietary silos by design, implementation and/or operation seem to have the goal of eventually becoming a certain business or getting bought, with the accumulated data, secret special tech and dependent locked-in users as assets in the acquisition. For that reason, I can imagine just doing the open, public alternative to these, and not repeating the mistakes that invited all the data crap that's published as today's WWW, or that prevented the Semantic Web from ever becoming a common affordance, because no easy & cheap (converging towards gratis) tooling has been made generally available and integrated into the "browser" (or included as part of a proper system).
Instead of trying to duplicate and reinvent all of this on my own, fighting an uphill battle against these big and well-established existing models, I should rather work on other things, in areas that don't receive any work or attention. This way, I can wait and see whether, eventually and very late, an open data federation ends up emerging, or not. So far I haven't tried at all to make any graph-related things for their own sake; only by accident, and for various other needs, I ended up developing a number of small graph components. I might still do some more graph work if/when I have a personal need/benefit, or if there's a paying customer who specifically requests a particular solution. Or when I want to reference some data, it could be sufficient to simply hook it up with Wikidata, which is not that great, but is shared, open and libre-freely licensed, and that alone is far better and more useful/usable than the ones that are not. If there are still no other options around in 1-3 years, that's then another indication that something is foundationally wrong, and it might be worth a try regardless of any other attempts, given that the "Semantic Web" has had a long run already and it's still not really here/anywhere yet, and the mainstream went either towards "AI" (or the Cyc/Siri/Palantir path) or downgraded to SOLID for just "social media", against strong opposition/sabotage by the powerful actors in the space.
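"Hooking it up with Wikidata" can be as cheap as pointing a local node at the public Wikidata item instead of modelling the entity myself. A minimal sketch, assuming made-up local identifiers: the local id "book:hhgttg" is hypothetical, while Q42 (Douglas Adams), the Wikidata entity URI scheme and the owl:sameAs predicate are real:

```python
# Canonical Wikidata entity URI prefix (concept URIs use http).
WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"

def wikidata_link(local_id, qid):
    """Return a (subject, predicate, object) triple tying a local node to
    its Wikidata item, reusing the standard owl:sameAs predicate."""
    return (local_id,
            "http://www.w3.org/2002/07/owl#sameAs",
            WIKIDATA_ENTITY + qid)

# A book-list entry delegating its identity to the shared, libre-freely
# licensed Wikidata graph rather than to a private silo.
triple = wikidata_link("book:hhgttg", "Q42")
```

Once such triples exist, anything anyone publishes about Q42 elsewhere becomes referenceable data, which is exactly the shared-graph synergy the proprietary silos don't offer.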
Copyright (C) 2023 Stephan Kreutzer. This text is licensed under the GNU Affero General Public License 3 + any later version and/or under the Creative Commons Attribution-ShareAlike 4.0 International.