Balisage Paper: Fat Markup: Trimming the Fat Markup Myth one calorie at a time
https://www.balisage.net/Proceedings/vol10/html/Lee01/BalisageVol10-Lee01.html
XML is a fine format in comparison to JSON.
I hate writing and reading xml compared to json, I don’t really care if one is slightly leaner than the other. If your concern is the size or speed you should probably be rethinking how you serialize the data anyway (protobuf/DB)
XML has its strengths as a markup format. My own formatted text format ETML is based on XML, as I could recycle old HTML conventions (still has stylesheet as an option), and I can store multiple text blocks in an XML file. It’s not something my main choice of human readable format SDL excels at, which itself has its own issues (I’m writing my own extensions/refinements for it by the name XDL, with hexadecimal numbers, ISO dates, etc.).
XML is good for markup. The problem is that people too often confuse “markup” and “serialization”.
Too redundant, just use S-exprs.
(Mostly joking, but in some cases…)
Unironically.
Given the choice between S-expressions and XML, I will choose S-expressions.
A word document is xml
zipped xml!
The future if text documents were Json:
City_pic.png.xml
Lots of file formats are just zipped XML.
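For anyone who hasn't poked at one of these: a rough sketch of what "zipped XML" means in practice, using only the Python standard library. The member name `label.xml` and its contents are made up for illustration; real formats such as .docx or LBX have their own internal layout, which this does not try to reproduce.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a tiny zip archive in memory holding one XML member.
# "label.xml" and its contents are hypothetical, not a real file format.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("label.xml", "<label><text>Hello</text></label>")

# Reading it back: open the archive, then parse the member as ordinary XML.
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    root = ET.fromstring(archive.read("label.xml"))

print(member_names)            # ['label.xml']
print(root.find("text").text)  # Hello
```

Pointing `zipfile.ZipFile` at an actual document on disk and calling `namelist()` is usually the quickest way to see what is inside.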
I was reverse engineering (or rather, fucking around with) the LBX file format for our Brother label printer’s software at work, because I wanted to generate labels programmatically, and they’re zipped XML too. Terrible format, LBX, really annoying to work with. The parser in Brother P-Touch Editor is really picky too. A string is 1 character longer or shorter than the length you defined in an attribute earlier in the XML? “I’ve never seen this file format in my life,” says P-Touch Editor.
Sounds like it’s actually using XSLT or some kind of content validation. Which to be honest sounds like a good practice.
Here’s an example of a text object taken from the XML, if you’re curious: https://clips.clb92.xyz/2024-09-08_22-27-04_gfxTWDQt13RMnTIS.png
Is it because of the lower case Latin æ since it’s technically one character even if two bytes?
Nope, doesn’t seem like it.
I’m starting to like this AI thing…
Is this a tactic used by skynet to lure all humans together and then…BANG!!!
It is very cool, specifically as a human readable mark down / data format.
The fact that you can make anything a tag and it’s going to be valid and you can nest stuff, is amazing.
But with a niche use case.
Clearly the tags waste space if you’re actually saving them all the time.
Good format to compress though…
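A quick way to check the "good format to compress" hunch, assuming Python's standard `zlib` module. The record below is invented, and repeating the exact same record is close to a best case for any compressor, so treat the ratio as an upper bound rather than what real documents would achieve.

```python
import zlib

# One made-up record, repeated many times so the tag names dominate the text.
record = "<person><name>Ada</name><city>London</city></person>"
raw = ("<people>" + record * 1000 + "</people>").encode("utf-8")

compressed = zlib.compress(raw)

# The repeated tags compress away almost entirely.
print(len(raw), len(compressed))
```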
I think we did a thread about XML before, but I have more questions. What exactly do you mean by “anything can be a tag”?
It seems to me that this:
<address> <street_address>21 2nd Street</street_address> <city>New York</city> <state>NY</state> <postal_code>10021-3100</postal_code> </address>
Is pretty much the same as this:
"address": { "street_address": "21 2nd Street", "city": "New York", "state": "NY", "postal_code": "10021-3100" },
If it branches really quickly the XML style is easier to mentally scope than brackets, though, I’ll give it that.
Since XML can have attributes and children, it’s not as easy to convert to JSON.
Your JSON example is more akin to:
<address street_address="21 2nd Street" city="New York" ...></address>
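To make the attribute/child distinction concrete, here is a small sketch with Python's standard `xml.etree.ElementTree`. It parses the address both ways (shortened to two fields) and shows that the same data ends up in two different places on the node.

```python
import xml.etree.ElementTree as ET

# The same data, once as child elements and once as attributes.
as_children = ET.fromstring(
    "<address>"
    "<street_address>21 2nd Street</street_address>"
    "<city>New York</city>"
    "</address>"
)
as_attributes = ET.fromstring(
    '<address street_address="21 2nd Street" city="New York"></address>'
)

# Child elements are an ordered sequence of nodes that could nest further.
children = {child.tag: child.text for child in as_children}
# Attributes are a flat name -> string mapping on the element itself.
attributes = dict(as_attributes.attrib)

print(children == attributes)  # True: same data, stored differently
print(len(as_children))        # 2 child elements
print(len(as_attributes))      # 0 child elements, everything is in .attrib
```

A converter to JSON has to pick some convention for keeping the two apart, which is why the mapping is not one-to-one.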
Hmm, so in tree terms, each node has two distinct types of children, only one of which can have their own children. That sounds more ambiguity-introducing than helpful to me, but that’s just a matter of taste. Can you do lists in XML as well?
No, arrays are not allowed. Attributes can only be strings. But the children are kind of an array.
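A minimal sketch of both points, again with `xml.etree.ElementTree`. The `playlist`/`track` names are invented for the example: repeated children behave like an ordered list, while an attribute comes back as a string even when it looks like a number.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are made up.
doc = ET.fromstring(
    '<playlist count="3">'
    "<track>Intro</track><track>Verse</track><track>Outro</track>"
    "</playlist>"
)

# Repeated child elements keep their document order, like an array.
tracks = [track.text for track in doc.findall("track")]
# Attribute values are always strings; converting is up to the caller.
count = doc.get("count")

print(tracks)        # ['Intro', 'Verse', 'Outro']
print(repr(count))   # '3'
print(int(count) == len(tracks))  # True
```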
I’m not sure now that I think about it, but I find this more explicit and somehow more free than json. Which can’t be true, since you can just
{"anything you want":{...}}
But still, this:
<my_custom_tag> <this> <that> <roflmao> ...
is all valid.
You can more closely approximate the logical structure of whatever you’re doing without leaving the internal logic of the… syntax?
<car>
  <tyre> air, <valve>closed</valve> </tyre>
  <tyre> air, <valve>closed</valve> </tyre>
  <tyre> <valve>open</valve> </tyre>
  <tyre> air, <valve>closed</valve> </tyre>
</car>
Maybe I just like the idea of a closing tag being very specific about what it is that is being closed (?). I guess I’m really not sure, but it does feel nicer to my brain to have starting and closing tags and distinguishing between what is structure, what is data, what is inside where.
My peeve with json is that… it doesn’t properly distinguish between strings that happen to be a number and “numbers” resulting in:
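import json

myinput = {"1": "Hello", 1: "Hello"}  # a string key "1" and an integer key 1
tempjson = json.dumps(myinput)        # the integer key is written out as "1"
output = json.loads(tempjson)         # the duplicate "1" keys collapse into one
print(output)                         # {'1': 'Hello'}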
in python.
I actually don’t like the attributes in xml, I think it would be better if it was mandatory that they were also just more tagged elements inside the others, and that the “validity” of a piece of xml being a certain object would depend entirely on parsing correctly or not.
I particularly hate the idea of attributes in svg, and even more particularly the way they defined paths.
https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Paths#curve_commands
It works, but I consider that truly ugly. And also I don’t understand because it would have been trivial to do something like this:
<path><element>data</element><element>data</element></path>
Maybe I just like the idea of a closing tag being very specific about what it is that is being closed (?).
That’s kind of what I was getting at with the mental scoping.
My peeve with json is that… it doesn’t properly distinguish between strings that happen to be a number and “numbers”
Is that implementation-specific, or did they bake JavaScript type awfulness into the standard? Or are numbers even supported - it’s all binary at the machine level, so I could see an argument that every (tree) node value should be a string, and actual types should be left to higher levels of abstraction.
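For what it's worth, a short sketch with Python's standard `json` module suggests the answer is "a bit of both". JSON values do have a number type distinct from strings, but object keys are always strings in the format itself, so an integer key has to be converted somewhere. The key names below are made up.

```python
import json

# As values, 1 and "1" stay distinct through a round trip.
values = json.loads('{"as_number": 1, "as_string": "1"}')
print(type(values["as_number"]).__name__)  # int
print(type(values["as_string"]).__name__)  # str

# As keys, JSON only has strings, so json.dumps writes the int key 1 as "1".
encoded = json.dumps({"1": "a", 1: "b"})
print(encoded)               # {"1": "a", "1": "b"}

# On the way back the two "1" keys collide and the last one wins.
decoded = json.loads(encoded)
print(decoded)               # {'1': 'b'}
```

So the collision in the earlier snippet comes from using a non-string key, not from JSON being unable to tell numbers and strings apart as values.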
I actually don’t like the attributes in xml, I think it would be better if it was mandatory that they were also just more tagged elements inside the others, and that the “validity” of a piece of xml being a certain object would depend entirely on parsing correctly or not.
I particularly hate the idea of attributes in svg, and even more particularly the way they defined paths.
I agree. The latter isn’t even a matter of taste, they’re just implementing their own homebrew syntax inside a tag, circumventing the actual format, WTF.
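A sketch of what that means for anyone consuming SVG: the XML parser hands back the whole `d` attribute as one opaque string, and the path commands need a second, separate tokenizer. The regex below is a deliberately simplified assumption (single-letter commands and plain decimal numbers only), not the full SVG path grammar.

```python
import re
import xml.etree.ElementTree as ET

# A cubic curve written the way SVG defines it, inside one attribute.
path = ET.fromstring('<path d="M 10 10 C 20 20, 40 20, 50 10"/>')

# The XML layer only knows this is a string.
d = path.get("d")

# Second parsing pass: split into command letters and numbers.
# Simplified on purpose; real path data allows more number forms.
tokens = re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", d)

print(tokens)
# ['M', '10', '10', 'C', '20', '20', '40', '20', '50', '10']
```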
I don’t mind xml as long as I don’t have to read or write it. The only real thing I hate about xml is that an array of one object can be mistaken for a property of the parent instead of a list
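The one-element problem is easy to reproduce. Below is a hypothetical naive XML-to-dict converter (`naive_to_dict` is invented for this example, not a library function) that turns repeated tags into a list and a tag that appears once into a plain value, which is roughly how many quick converters behave.

```python
import xml.etree.ElementTree as ET

def naive_to_dict(element):
    """Hypothetical converter: repeated child tags become a list,
    a tag that appears only once becomes a plain value."""
    if len(element) == 0:
        return element.text
    result = {}
    for child in element:
        value = naive_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

two = naive_to_dict(ET.fromstring("<car><tyre>a</tyre><tyre>b</tyre></car>"))
one = naive_to_dict(ET.fromstring("<car><tyre>a</tyre></car>"))

print(two)  # {'tyre': ['a', 'b']}
print(one)  # {'tyre': 'a'}
```

Without a schema saying that `tyre` is always a list, the converter cannot tell the second case from an ordinary single property, so the shape of the output depends on how many items happened to be in the document.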
I disagree, with a passion.
It is soooo cluttered, so many useless redundant tags everywhere. Just give me JSON or YAML or anything really but XML…
But to each their own i guess.
YAML for human-written files, JSON for back-to-front and protobuf for back-to-back. XML is an abomination.
Having an easy on the eyes markdown that is also easy to parse would be cool.
But YAML does these things:
https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-from-hell
which are not excusable, for any reason.
YAML is good for files that have a very flexible structure or need to define a series of steps. Like github workflows or docker-compose files. For traditional config files with a more or less fixed structure, TOML is better I think
finally accurate ai
XML is fine. Namespaces have a special place in hell though
Except for obvious typos
wate
BASED. What is the name of this AI? I want to use this.
coral by cohere
no wait, it’s perplexity, I remember the logo.
you can try their labs version, which gives you access to the latest and beefiest models like llama3.1 70b
OH HEY EVERYONE, EVERYONE, THIS GUY LIKES JSON
Fuck you and your unstructured garbage.
RSS/ATOM has to be the best thing to come out of XML
Listen we all know deep down the solution is to try to parse it with regex
a wate of time
I mean, it’s not wrong…
Disagree. I prefer XML for config files where the efficiency of disk size doesn’t matter at all. Layers of XML are much easier to read than layers of Json. Json is generally better where efficiency matters.
TOML or bust
yes.
Aren’t most XML parsers faster than JSON parsers anyway?
Wishful thinking