Personal Hypertext Report #11

It looks like C++ doesn’t come with built-in streams that can be navigated in the reverse direction. The default principle is to optimize for speed, and if data is read in the forward direction and optimized by reading larger blocks and buffering them, std::istream::unget() may fail to go back past the beginning of the current buffer block. Changing direction back and forth might result in throwing away an entire buffer block and having to read in another chunk, which defeats attempts to optimize for speed, especially if those operations occur across buffer boundaries and there’s no smartness built in to deal with dynamic buffer sizes. I either have to verify that std::istream::unget() can always go back to the beginning of the data source (which is unlikely, because it may be possible with some stream implementations and fail with others, for example with data that arrived over the network), or I have to come up with my own stream interface and implementation for a file source, which likely won’t be well optimized in terms of reading block chunks. I could also limit the stream type to file streams, but I would want to avoid that, if possible, so data can come from other places as well, as long as they’re not exclusively forward-directional. Introducing a new abstract stream class might be worth the effort for the „Reverse Streaming API for XML“: when porting to Java, Java’s streams might not have this limitation, and it can’t be worse than JavaScript with no notion of streams whatsoever (as encountered with the JsStAX port).
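To make the idea of an abstract stream class more concrete, here is a minimal sketch of what such an interface could look like. The names BidirectionalSource and SeekableStreamSource are made up for illustration and are not taken from the existing CppStAX code. The implementation assumes the underlying std::istream is seekable (a file or string stream) and simply seeks for every character instead of relying on unget(), so it trades speed for the ability to always go back to the beginning.

```cpp
#include <ios>
#include <istream>
#include <sstream> // only needed for the std::istringstream usage described below

// Hypothetical interface: a character source that can step in both directions.
class BidirectionalSource
{
public:
    virtual ~BidirectionalSource() = default;

    // Reads the character after the cursor and advances; false at the end of the data.
    virtual bool next(char& out) = 0;

    // Reads the character before the cursor and steps back; false at the beginning.
    virtual bool previous(char& out) = 0;
};

// Assumption: the wrapped stream supports seekg(), which is the case for
// file and string streams, but not necessarily for network-backed streams.
class SeekableStreamSource : public BidirectionalSource
{
public:
    explicit SeekableStreamSource(std::istream& stream) : stream_(stream) {}

    bool next(char& out) override
    {
        return readAt(position_, out) && (++position_, true);
    }

    bool previous(char& out) override
    {
        if (position_ == 0)
        {
            return false;
        }

        return readAt(position_ - 1, out) && (--position_, true);
    }

private:
    // Seeks to an absolute offset and reads one character from there.
    bool readAt(std::streamoff offset, char& out)
    {
        stream_.clear(); // reset eof/fail flags left over from earlier reads
        stream_.seekg(offset, std::ios::beg);
        return static_cast<bool>(stream_.get(out));
    }

    std::istream& stream_;
    std::streamoff position_ = 0;
};
```

Wrapping a std::istringstream that contains "abc", two calls to next() yield 'a' and 'b', and two subsequent calls to previous() yield 'b' and 'a' again. A third previous() returns false, because the cursor is back at the beginning of the data.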

Another cheap solution would be to use the recently updated change_instructions_executor_1, which generates a separate file for every change instruction. If I also added a separate file that exports the instruction(s), I could navigate back and forth between those files as specified by the change_instructions_executor_1 result information file. But this would require such files to be copied locally (so as not to rely on any change_instructions_executor_1 output that might be subject to change or removal), and the need to have that many files around for a longer period of time isn’t particularly better than keeping an exclusive lock on a change instruction file because the stream is currently reading it. In general, this option would make use of other existing tools of the system, which is a favorable property, but then we’re in a GUI tool and not necessarily in a workflow anyway, and the change_instructions_executor_1 could also still change in major ways (not that this would be a problem, but it’s something to consider).

This text is licensed under the GNU Affero General Public License 3 + any later version and/or under the Creative Commons Attribution-ShareAlike 4.0 International. See the history of how this text developed.

What’s the “Symbol” Tool?

Symbols are abstractions that are intended to act as (mental) substitutions for the real thing. As atomic entities in information encoding and communication, they can’t be manipulated without diverting the reference from the thing they were substituting to substituting for something else. If it’s about fiddling with the reference, it’s usually cheaper and easier to just create another symbol instead of trying to manipulate existing ones. The value/usefulness of a symbol is defined by its function as a disambiguation, in contrast to all the other symbols or useless noise.

There are many symbol conventions and media. Signs, sounds and words are just a few of them. Individual, abstract characters/letters are symbols themselves, but in dealing with symbols, we rarely read on the character level, but on the word level to identify the symbolic meaning. Text manipulation on the character level, as opposed to text manipulation on the word level, is rarely about manipulating symbols, because changing individual characters most of the time switches to a different word symbol and doesn’t do anything to the character as a symbol itself. Characters are the atomic encoding format/convention to construct textual symbols. Characters are atomic in the information encoding scope, but the symbol scope is one level above mere encoding concerns. The atomic entity for textually encoding a symbol is a word. Since we can’t really manipulate a word symbol, and text is a series of word symbols, what we do most of the time is manipulate the composition of word symbols within a text series.

We don’t care about the letter ‘s’ in “insight” as a symbol, because the symbols ‘s’ and “insight” are different from each other. We rarely use individual characters for their own symbolic meaning, but as building blocks to construct words via character combination. Such word symbols then can be proper, better symbols than what the limited alphabet provides. Now, if word symbols are atomic, how to manipulate them? If we start to change characters, we likely create new words/symbols, or switch to totally different symbols like “insignia” or “insider”. Changing characters in a word symbol switches to a different symbol and manipulates the text, but doesn’t change the original word symbol “insight”, and how could it?

As we have established that “insight” is a symbol, what can we do with it or how to manipulate it? There’s the option to re-define it or fiddle with its meaning, which can be considered a bad, confusing action or a very creative activity as well, depending on context. The “insight” symbol doesn’t reference a tangible physical object of course, but an abstract concept, which isn’t a big difference, it’s just giving names/identifiers to whatever we might want to talk about, as a shorthand or “link”/“address” to the real thing. The actual meaning of the symbol has a defined scope (can be more vague or more strict), which includes a sudden realization or gained understanding about something non-obvious; the long and deep study of something that leads to better, more correct understanding than what others learn from a quick look at the surface; to look into oneself as the practice of self-reflection or -analysis; that’s what our language standard says, what the dictionaries as lookup tables for resolving and dereferencing word symbols say. But then I could start to call my company “insight”, we could agree to use the term/symbol to mean the exact opposite as some kind of secret code or in ironic context, I could “abuse” the term/symbol by using it to describe/name the event that a physical object comes into view/sight of an observer (as in “the ship came into insight distance”), or similar. Notice that the symbol itself hasn’t changed and hasn’t been manipulated, I instead manipulated what the symbol is pointing to or the scope of meaning, what it can and can’t point to. Symbol manipulation in terms of changing and overloading its meaning is somewhat dangerous because the symbol becomes less useful if we do it too much.

What is symbol manipulation then? If I come up with the word “outsight” to refer to a situation in which insight never can be obtained; sudden, surprising findings about something while I was looking for something else; looking from inside outwards; general dumbness or whatever else (similarities in meaning scope are just because I followed a similar character construction rule that allows the deduction of a negated meaning, but the actual referenced concepts/meanings are different and distinct, they may or may not be opposed even, and I could have picked a different selection of meanings or a different combination of characters to refer to some or all of the mentioned concepts), it barely affects the original “insight” symbol and its meanings, only by mere accident/coincidence. One could claim that this is a symbol manipulation example because I relied on the original symbol to construct/derive the new one, so there is a relation, but I could make the point that the symbol itself is rather arbitrary. It’s perfectly fine to come up with new words that don’t have any resemblance to existing words/symbols (although it’s considered bad design) and define their meaning or meaning scope. I could just define that “anpecatrm” refers to the activity of looking out of the window (to specify the scope, specifically and only used when there is a window of an implied house, not to be used for looking out of the window/windshield of a car).

How else could symbols be manipulated? We could consider the usual manipulations of typography, typesetting, rendering, visualization, but if “insight” in red has a distinctly different meaning than in green, changing the color changes what meaning is referenced, the two symbols stay separate from each other and their color can’t be manipulated interchangeably. Such operations can be a way to trigger/hint at different connotations however, to indicate a slight difference in meaning scope, but please note that we are only able to do so after leaving the encoding convention of plain text and entering the entirely different encoding conventions (another dimension) of pictorial visualization.

If you’re an electrical engineer and encounter computers with their binary information encoding, the realization can be (see Turing) that the bit patterns are arbitrary symbols that can represent other symbols like numbers (most prominent back in the day), text, images, abstract concepts and whatever else, and just as we manipulate binary and numeric symbols, we can as well manipulate text, image, audio symbols (if we can find reasonable methods to do so, that is). For binary and numbers, arithmetic is a useful manipulation method (in contrast to useless manipulations like picking a random bit or digit of a large number and making all other bits/digits that very bit/digit). What is it for text? Converting upper-case characters to lower-case? Making a word/symbol italic (but what would that change, do we enter pictorial/visual symbolism and would it still remain the same symbol)? I have some trouble listing useful methods that manipulate pure word symbols. It may be much easier to list useful symbol manipulation methods for numbers, audio, images, but that too changes the symbol so it refers to something else (most dramatically with numbers). Whatever we do to symbols themselves, we usually have to follow pretty narrow constraints in order to preserve them as useful and correct.

So what is it that we really care about? It could be moving symbols around, combining, separating and rearranging them, “enacting” them (to attach effects to symbols and trigger them), and indeed augment their use (“writing” them or picking them from a list of symbols, insert them into other contexts as, for example, formal constructs, or whatever else). Those activities rarely change the symbols themselves as they’re supposed to retain the reference/meaning.

How would we manipulate language, if that’s similar enough to symbol manipulation, if not equivalent/synonymous? Or are (word, visual or other) symbols atomic entities and “language” the rules for where to put them? Is it about us changing vocabulary and/or grammar? Potentially to some extent, but it’s more about manipulating particular symbol sequences in compliance with the established rules. A text, for example, is encountered as a large collection of symbols, being composed in a specific language (in which our knowledge is encoded). Language/vocabulary have been in place for a long time now and can’t be changed easily because their modification requires everybody to agree on the new standard, so the meaning and the rules for dereferencing become established.

Another consideration: There is no practical obstacle whatever now to a world that exclusively operates on/with audio symbols. Noises and language received a great deal of standardization for their use in writing, reading and print serialization, but with audio interfaces and serialization, would we still hold on to the complex rules of written language composition that target the eye for visual consumption? I can easily imagine that much more efficient symbols and languages could be developed and adopted for acoustic information encoding and communication.


Personal Hypertext Report #10

With the change tracking text editor completed in its first stage, I can imagine that a lot of people can’t make much use of the XML output it produces. In order to extend it to a full writing system, I’m currently looking into programming a „change instruction navigator“, which is planned to have a rich editor control in which the additions are highlighted in green and the deletions are highlighted in red. Two buttons at the bottom should allow navigating backwards and forwards in history. There could be an option to jump to a specific instruction, and another (optional) button to select a specific version. On calling the program, one could immediately jump to a specific change instruction.
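As a rough sketch of the navigation logic behind the two buttons: the following assumes that a change instruction boils down to either an addition or a deletion of some text at a character position. The ChangeInstruction struct and the ChangeNavigator class are hypothetical names for illustration and don’t mirror the actual XML format of the editor. The instructions are held in a std::vector here only to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified change instruction: text added or deleted at a position.
struct ChangeInstruction
{
    enum class Kind { Add, Delete };

    Kind kind;
    std::size_t position;
    std::string text;
};

class ChangeNavigator
{
public:
    explicit ChangeNavigator(std::vector<ChangeInstruction> instructions)
        : instructions_(std::move(instructions)) {}

    // Applies the next instruction; false if the newest version is already shown.
    bool stepForward()
    {
        if (next_ >= instructions_.size())
        {
            return false;
        }

        apply(instructions_[next_], false);
        ++next_;
        return true;
    }

    // Reverts the most recently applied instruction; false at the very beginning.
    bool stepBackward()
    {
        if (next_ == 0)
        {
            return false;
        }

        --next_;
        apply(instructions_[next_], true);
        return true;
    }

    const std::string& text() const { return text_; }

private:
    // An inverted addition is a deletion and the other way around.
    void apply(const ChangeInstruction& instruction, bool inverse)
    {
        const bool insert = (instruction.kind == ChangeInstruction::Kind::Add) != inverse;

        if (insert)
        {
            text_.insert(instruction.position, instruction.text);
        }
        else
        {
            text_.erase(instruction.position, instruction.text.size());
        }
    }

    std::vector<ChangeInstruction> instructions_;
    std::size_t next_ = 0; // index of the instruction that stepForward() applies next
    std::string text_;
};
```

Stepping backward works because a deletion instruction carries the deleted text, so it can be re-inserted. A navigator that only knew the length of a deletion could go forward, but not back.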

I think I’ll keep a stream object on the file, which will lock it. I realized that the Java StAX API doesn’t allow moving backwards, so I’m looking into developing „Reverse StAX“, and to make things easier, I’m trying to start with a C++ reference implementation based on my existing CppStAX code, to later port it to Java. This will delay work on the navigator, but I’m not willing to keep all the instructions in memory, so I hope that it is worthwhile to invest in more powerful XML tooling.
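To illustrate what moving backwards through markup involves at the lowest level, here is a small sketch that finds the tag preceding a given position by scanning in reverse. The function readPreviousTag is a made-up helper, not part of CppStAX or any StAX API, and it operates on an in-memory string only to stay short. It also assumes that no '>' or '<' appears inside attribute values, comments or CDATA sections, which a real implementation would have to handle.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: finds the tag that ends before `position`, stores its
// text (including '<' and '>') in `tag` and moves `position` to the tag's '<'.
// Returns false if no tag precedes `position`.
bool readPreviousTag(const std::string& xml, std::size_t& position, std::string& tag)
{
    if (position == 0)
    {
        return false;
    }

    // Search backwards for the end of the closest preceding tag.
    const std::size_t close = xml.rfind('>', position - 1);

    if (close == std::string::npos)
    {
        return false;
    }

    // From there, search backwards for the start of that same tag.
    const std::size_t open = xml.rfind('<', close);

    if (open == std::string::npos)
    {
        return false;
    }

    tag = xml.substr(open, close - open + 1);
    position = open;
    return true;
}
```

Starting at the end of "<a><b>text</b></a>", repeated calls produce "</a>", "</b>", "<b>" and "<a>" in that order, and the call after that returns false. Character data between the tags is skipped in this sketch; a reverse event reader would report it as an event of its own.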


Personal Hypertext Report #9

Finally, I’ve managed to get the „change_tracking_text_editor_1“ capability working reliably enough for beta testing and prepared a downloadable package: hypertext-systems.org/downloads.php. Java 1.6 or higher is required. A description of the tool/capability can be found in this video.

From here, plenty of very interesting options to extend it present themselves, but I find it important to point out that in my opinion, tracking the development of a text is fundamental for a hypertext system and serious writing on a computer. Without it, versioning and revision can only be done retrospectively with heuristic diffs as after-the-fact analysis, which can be wrong and lacks information like the order of changes or changes that later got overwritten again. With the text’s history recorded, every individual character as the most atomic element can be addressed with a higher resolution than with indifferent, agnostic diff/patch blocks.

For a hypertext system, one has to ask where the texts it’s supposed to manage come from. If they’re from old sources, only compiled, merged, consolidated versions without a record of the history of their writing may be available, but for new texts I write myself today, I don’t want to imitate the old constraints imposed by physical production and instead make full use of the potential of digital. With writing covered for now (although very primitively at first), I can continue with tools/capabilities for resource management, publishing and reading, to eventually arrive at an integrated system that is more convenient than using the capabilities individually.

Besides the prototype of a hyperglossary capability (video) and WordPress post retriever (with subsequent conversion to different target formats), the „Change Tracking Text Editor“ is the only other contribution I was able to prepare for the 50th anniversary of Douglas Engelbart’s Great Demo while loosely participating in the Doug@50 effort during the year 2018.

Related books I’ve read while working on the editor: „Track Changes“ by Matthew Kirschenbaum and „The Work of Revision“ by Hannah Sullivan.
