JSON and XML

Sometimes the JavaScript Object Notation, JSON for short, is promoted as an alternative to the Extensible Markup Language, XML for short. On the format level, I fail to see a difference other than not repeating the name in the end tag.

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <text>words</text>
</root>

is pretty much equivalent to

{
  "root":
  {
    "text": "words"
  }
}
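
To make the equivalence concrete, here is a minimal sketch in Python using only the standard library. The helper `element_to_dict` is hypothetical (it is not part of any library) and deliberately naive: it assumes elements without attributes, without repeated child names and without mixed content.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
  <text>words</text>
</root>"""

JSON_TEXT = """{
  "root":
  {
    "text": "words"
  }
}"""


def element_to_dict(element):
    """Hypothetical helper: map an element to {tag: text} or {tag: {children}}."""
    children = list(element)
    if not children:
        # Leaf element: keep only its text content.
        return {element.tag: element.text or ""}
    merged = {}
    for child in children:
        # Assumption: child names are unique, so later ones would overwrite.
        merged.update(element_to_dict(child))
    return {element.tag: merged}


from_xml = element_to_dict(ET.fromstring(XML_TEXT))
from_json = json.loads(JSON_TEXT)
print(from_xml == from_json)  # prints: True
```

Both documents end up as the same nested dictionary, which is all the “difference in format” amounts to for this example.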

JSON saves a few bytes per tag, but memory/bandwidth can’t be the concern of JSON as it keeps commas, colons and the outermost braces. For both JSON and XML, it’s just text after all, and we generally don’t care too much about its size as the transfer is often optimized with gzip compression by the server, the multiplexing in HTTP/2 or some methods of minification as long as they don’t change tag names (removing whitespace indentation that’s only there for human readability would be the typical candidate). Is JSON concerned about ease of manual manipulation? Well, that doesn’t matter as people aren’t supposed to fiddle with it directly anyway; it’s just the lack of decent tools that forces us to. Is JSON about the tools? To some extent: JSON is for JavaScript. That’s particularly bad as JavaScript was initially designed to power DHTML. Web developers saw it as an improvement that they didn’t have to declare types, and now there’s TypeScript. Web developers saw it as an improvement that they didn’t have to declare classes, and now classes were introduced by ECMAScript 6/2015. Those extensions are good for modern JavaScript to become a better tool for building serious applications and not just stupid animations, but JavaScript still remains tied to the browser and the limitations that come from the security sandbox, as native file I/O didn’t gain traction. But even if we accept JavaScript as an equal citizen in the world of application languages, there’s not a lot of support for JSON in the other languages. Just think about it: Java Enterprise Edition has a JSON library (let’s write server backends in Java for all those websites written in JavaScript), but the Standard Edition doesn’t (why should clients written in Java talk to servers with JSON, as they’re not websites and therefore no JavaScript is involved?), which leads to projects like JSON-java. It’s wheel-reinventing for a capability that already exists on the server side, wasting valuable lifetime to compensate for web deficiencies.
For XML, on the other hand, there’s good support in almost any programming language.
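
The size question raised above can be measured rather than guessed. The following is a rough sketch in Python, standard library only; the sample data (1000 records with the made-up fields `id` and `text`) is purely hypothetical, and the numbers will differ for real payloads.

```python
import gzip
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: 1000 small records with made-up field names.
records = [{"id": str(i), "text": "words"} for i in range(1000)]

# Minified JSON: no indentation, no spaces after separators.
json_bytes = json.dumps(
    {"root": {"record": records}}, separators=(",", ":")
).encode("utf-8")

# The same data as XML without any indentation.
root = ET.Element("root")
for record in records:
    node = ET.SubElement(root, "record")
    for key, value in record.items():
        ET.SubElement(node, key).text = value
xml_bytes = ET.tostring(root, encoding="utf-8")

for label, payload in (("JSON", json_bytes), ("XML", xml_bytes)):
    compressed = gzip.compress(payload)
    print(label, len(payload), "bytes raw,", len(compressed), "bytes gzipped")
```

The raw XML is longer because every name appears twice, but repeated tag names are exactly the kind of redundancy that gzip removes, so it’s worth comparing the gzipped figures before arguing about bytes.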

All this waste of valuable lifetime for the sole reason of saving a few bytes of memory/bandwidth? More likely, it’s a mere historical coincidence. As XHTML is based on XML, browsers are XML processors anyway. Just look at what happens if you open a random XML file in your browser: it will most likely render a representation that’s different from the plain-text equivalent of the XML file in the most basic text editor without any syntax highlighting. Even more interesting is the result if the XML file contains a reference to an XSLT stylesheet, because it might end up being applied, so the “browser” is more or less expected to be an XSLT processor too. There’s the most interesting XSLTProcessor interface, which unfortunately isn’t standardized, but look at the browser compatibility list. In the browser war days, when there were only Netscape/Mozilla and Microsoft, the red “no support” by the popular Microsoft Internet Explorer might have killed the XSLT-based ViewSpecs of HyperScope, but nobody cares about Microsoft browsers anymore (did you know that Internet Explorer is based on Mosaic code that Microsoft licensed through Spyglass, NCSA’s commercial licensee, after Andreessen and his bunch left to found Netscape?).

JSON is fine for data transfer if the developer controls both endpoints, but then the web guys found out that it lacks “out-of-band” metadata like XML attributes and semantic descriptors/identifiers like XML namespaces, so now there’s JSON-LD, mimicking the mentioned XML features. Wait, LD, Linked Data, isn’t that the new name for the abandoned notion of a semantic web? Didn’t the web guys together with the browser vendors kill that effort, and just now realize that it is actually needed, rebooting it with their own JavaScript stuff slowly and with years of unnecessary delay, plus doing it wrong? It’s not difficult to predict that one day they’ll find out that they also need a JSON Web Services Description Language, a JSON Schema Definition, JSON Stylesheet Language Transformations and a JSON Path Language, but with a completely new syntax because they can. That means throwing away all the XML technology that already exists and reinventing it in the exact same way (don’t be surprised, XML concepts tend to make some sense) but in different packaging. JSON has a big chance, however, one that the web people denied XML: as soon as they recognize that they’ve made browsers into parsers of almost any arbitrary markup trash, be it through invalid, non-well-formed HTML or through the W3C’s new efforts to deliberately break the XML-ness, websites could be written in a JSON Hypertext Markup Language. For some curious reason, the JSON deserializer demands well-formedness, which turns out to be important, just as XML always demanded it and always got it, except on the web where it isn’t XHTML. Furthermore, as there’s no support for XML Cascading Stylesheets, it would be of equal help to have a JSON version of CSS.
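
To illustrate how closely a JSON-LD `@context` mirrors an XML namespace, here is a small sketch in Python with the standard library only. The vocabulary URI `http://example.org/vocab#` is a made-up placeholder, and the context is resolved by hand for a single flat level; a real JSON-LD processor does considerably more.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical vocabulary URI, used only for illustration.
VOCAB = "http://example.org/vocab#"

xml_doc = f'<v:root xmlns:v="{VOCAB}"><v:text>words</v:text></v:root>'
json_ld_doc = json.dumps({
    "@context": {"text": VOCAB + "text"},
    "text": "words",
})

# ElementTree expands each prefix into "{namespace-uri}local-name".
xml_names = [element.tag for element in ET.fromstring(xml_doc).iter()]

# Assumption: a flat, one-level "@context" that can be looked up by hand.
data = json.loads(json_ld_doc)
context = data.pop("@context")
json_names = [context.get(key, key) for key in data]

print(xml_names)
print(json_names)
```

In both cases the short local name `text` is tied to a globally unique identifier, which is the semantic anchor that plain JSON lacks.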

See, it doesn’t really matter if it is XML or JSON as they’re basically the same format-wise, except that XML is way more advanced and JSON still too primitive. It would be incredibly cool to arrive at a “programmable” web where no big, bloated browser is needed as interpreter and runtime environment, but where small clients/agents could consume and act on the semantic markup: a real “people’s web” infrastructure and data collection, fully accessible to the public without the need for centralized, lock-in Internet company services. I wonder who tries to prevent that? Isn’t it a dangerous vertical integration if those who offer web services also own the browsers and influence the standardization consortia in order to make sure that better digital technology won’t disrupt their current sources of income? Most modern web developers weren’t old enough to learn deeply about digital technology, software and networking; they grew up with “social” networks and apps already presented to them in a particular way. They’re easily fooled into hyping technological stagnation (let’s see what WebAssembly will end up as), while the smart developers conspire with the big companies, who understand digital technology perfectly well, to exploit unsuspecting markets, politicians and society with great success. Regardless of our dystopian future, let’s never forget that the centralized “cloud” isn’t everything, that personal computing is all about the independent individual, that computer liberation still needs to go on.

So let’s imagine a parallel universe in which some day somebody decided to add XML object serialization to JavaScript. It’s probably an almost trivial task if a more efficient way to parse/represent XML than DOM is already available, let’s say a JavaScript implementation of StAX, for example, or SAX if an asynchronous (in terms of Node.js and async/await) push instead of pull method were needed. That would offer the JavaScript developer a “new” native way to work with XML as if it were a JSON object (which it actually would be, and would be represented as), as other dynamically typed languages have enjoyed such a feature for quite some time now (that would be PHP’s SimpleXML). But why even bother? Is serialization really a thing that demands its own non-free license, pretty much like the well-known “CSV CRLF linebreak license” or the famous “SQL plus operator license”? No, for me, JSON is fine if I shovel non-public dumb data between two websites under my control. For everything else, I’ll just convert JSON to XML as the universal format for text-oriented data and then work with the latter. JavaScript is only a small fraction of application programming and I’ll certainly not abandon decades of improvement for seriously broken web stuff. It’s not that JSON is “bad” or something, it’s just not very helpful for the things I want and need to do.
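
Converting JSON to XML, as mentioned above, takes only a few lines. This is a minimal sketch in Python with the standard library; `dict_to_element` and `json_to_xml` are hypothetical helper names, and the mapping makes several assumptions: keys must already be valid XML names, array entries are wrapped in an invented `item` element, and numbers and booleans are stringified with Python’s `str()`.

```python
import json
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Hypothetical helper: build an element tree from a parsed JSON value."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(dict_to_element(key, child))
    elif isinstance(value, list):
        # Assumption: array entries get an invented "item" wrapper element.
        for entry in value:
            element.append(dict_to_element("item", entry))
    elif value is not None:
        element.text = str(value)
    return element


def json_to_xml(json_text):
    """Convert a JSON object with exactly one top-level key to an XML string."""
    data = json.loads(json_text)
    if not isinstance(data, dict) or len(data) != 1:
        raise ValueError("expected a JSON object with exactly one top-level key")
    tag, value = next(iter(data.items()))
    return ET.tostring(dict_to_element(tag, value), encoding="unicode")


print(json_to_xml('{"root": {"text": "words"}}'))
# prints: <root><text>words</text></root>
```

From that point on, the whole XML toolchain applies to data that arrived as JSON.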

Author: skreutzer

http://www.skreutzer.de/about.html

2 thoughts on “JSON and XML”

    1. Not everybody programs in JavaScript, not everybody is in a browser. Even in JavaScript, the lack of an equivalent to XML namespaces as identifiers for semantics leaves a consuming JavaScript module without any semantic context, unless the module is talking to itself (in the same version) or that meaning is derived implicitly.
