JSONata: A Language for Data

Photo by Melanie Kreutz / Unsplash

We all know JSON. But you might not know that it stands for JavaScript Object Notation. JSON.org says it best: "It is easy for humans to read and write. It is easy for machines to parse and generate."

{
  "foo": "bar"
}
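The "easy for machines" half of that quote is visible in JavaScript itself, where the built-in JSON object handles both directions. Here is a minimal sketch using only the standard JSON.parse and JSON.stringify (the variable names are just for illustration):

```javascript
// Text in, object out: JSON.parse turns a JSON string into a JavaScript value
const jsonText = '{ "foo": "bar" }';
const parsed = JSON.parse(jsonText);
console.log(parsed.foo); // bar

// Object in, text out: JSON.stringify produces compact JSON with no extra whitespace
const generated = JSON.stringify(parsed);
console.log(generated); // {"foo":"bar"}

// The optional third argument pretty-prints with the given indentation
const pretty = JSON.stringify(parsed, null, 2);
console.log(pretty);
```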

It truly is easy. Nearly every language supports JSON, either directly or through popular packages and libraries. The list on JSON.org includes C, Java, C#, Rust, Lisp, and many more, but also more obscure languages like 8th and Visual FoxPro (neither of which I'd heard of), and JSON even shows up within Photoshop.

There are few universally adopted things when it comes to programming. We can't even agree on whether to use semicolons, where to put brackets, or whether to use brackets at all, yet JSON is everywhere, with few true competitors.

This wasn't always the case. Even when I began my career, alternatives like XML were quite popular (and still are in certain niches). JSON was born around the turn of the millennium and was based on concepts found in the JavaScript Programming Language Standard ECMA-262 3rd Edition. It took a while longer for JSON to become part of the language itself: native JSON.parse and JSON.stringify arrived with ECMAScript 5 in 2009.

You can read the full 161-page spec here: https://www-archive.mozilla.org/js/language/E262-3.pdf

Literals

This ECMAScript specification (ECMAScript is the standardized language that JavaScript implements) defines several key syntax elements that JSON relies on:

  • The null value
  • Boolean literals true, false
  • Number literals 42, 3.14
  • String literals with double quotes "hello"
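Each of those literal forms is valid JSON text on its own. A quick check in plain JavaScript (Node.js or a browser console is assumed) also shows one difference from the language: JavaScript allows single-quoted strings, but JSON.parse rejects them.

```javascript
// Every literal form listed above parses as a complete JSON document
const literalTexts = ["null", "true", "false", "42", "3.14", '"hello"'];
const literalValues = literalTexts.map((t) => JSON.parse(t));
console.log(JSON.stringify(literalValues)); // [null,true,false,42,3.14,"hello"]

// JavaScript accepts 'hello', but JSON requires double quotes
let singleQuoteError = null;
try {
  JSON.parse("'hello'");
} catch (err) {
  singleQuoteError = err; // JSON.parse throws a SyntaxError here
}
console.log(singleQuoteError instanceof SyntaxError); // true
```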

Array Literals

An array initialiser is an expression describing the initialisation of an Array object, written in a form of a literal. It is a list of zero or more expressions, each of which represents an array element, enclosed in square brackets. The elements need not be literals; they are evaluated each time the array initialiser is evaluated.
Array elements may be elided at the beginning, middle or end of the element list. Whenever a comma in the element list is not preceded by an AssignmentExpression (i.e., a comma at the beginning or after another comma), the missing array element contributes to the length of the Array and increases the index of subsequent elements. Elided array elements are not defined.

That makes a great interview question: "What's the length of the array [1,,2,3]? And what does [1,,2,3][1] return?"
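For the curious, here is the answer checked in plain JavaScript. The elided slot counts toward the length but holds no element, so reading it gives undefined. JSON has no way to express such a hole, so JSON.stringify writes null in its place.

```javascript
const sparse = [1, , 2, 3];

console.log(sparse.length); // 4: the hole still counts toward the length
console.log(sparse[1]);     // undefined: nothing is stored at index 1
console.log(1 in sparse);   // false: the index is truly missing, not set to undefined

// JSON cannot represent a hole, so serialization fills it with null
console.log(JSON.stringify(sparse)); // [1,null,2,3]
```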

Arrays give us the next level of structure in a JSON file, above the raw values or literals. They're great for storing data where items don't need unique names or where the length is unpredictable. A good example is HTML's class attribute, where the number of classes is unknown and each value stands on its own rather than belonging to another named thing.

Non-Literal Values

With Array Literals (and as you'll soon see, Object Literals), rather than only using literals for values, you can use things that evaluate to literals. Literally. Here's an example:

const x = 5;
const obj = {
    calculatedValue: x * 2,  // expression that's evaluated
    method: function() {},   // can include functions
    reference: x             // can reference variables
};

This is completely valid JavaScript, but it isn't valid JSON. After all, how would you serialize it? You'd have to ship the entire language with every JSON file or assume the reader could evaluate it, and at that point you're just sending JavaScript files, hardly a solution for transferring data across environments and languages.

JavaScript is smart about this. By the time you serialize the object above, its expressions have already been evaluated, so calculatedValue is 10 and reference is 5. A function, however, has no JSON representation, so JSON.stringify simply drops method. This is one area where JavaScript != JSON: JSON is not a programming language.
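Running the obj example through JSON.stringify shows this directly. The snippet repeats the object so it runs on its own:

```javascript
const x = 5;
const obj = {
  calculatedValue: x * 2, // already evaluated to 10 when the object is created
  method: function () {}, // functions have no JSON representation
  reference: x,           // already holds the value 5
};

// JSON.stringify silently omits function-valued properties
const serialized = JSON.stringify(obj);
console.log(serialized); // {"calculatedValue":10,"reference":5}

// Parsing it back yields plain data; `method` is gone for good
const roundTripped = JSON.parse(serialized);
console.log("method" in roundTripped); // false
```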

Object Literals

An object initialiser is an expression describing the initialisation of an Object, written in a form resembling a literal. It is a list of zero or more pairs of property names and associated values, enclosed in curly braces. The values need not be literals; they are evaluated each time the object initialiser is evaluated.

These name and value lists form JavaScript objects. For example:

const foo = { "PropertyName": "Value", "PropertyName2": 2 };

This forms the foundation of what we know as JSON: data stored within curly braces as names and values. Those values can be literals, arrays, or other objects containing even more values.
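A small illustration of that nesting, using a hypothetical user record. It also shows one place where JSON is stricter than a JavaScript object literal: in JSON text, every property name must be a double-quoted string.

```javascript
// Hypothetical record: an object holding a string, an array, and another object
const user = {
  name: "Ada",
  roles: ["admin", "editor"],
  address: { city: "London", zip: null },
};

const userJson = JSON.stringify(user);
console.log(userJson);
// {"name":"Ada","roles":["admin","editor"],"address":{"city":"London","zip":null}}

// Nested values survive the round trip and are reached by chaining names
const userCopy = JSON.parse(userJson);
console.log(userCopy.address.city); // London

// Unquoted names work in an object literal, but not in JSON text
let unquotedError = null;
try {
  JSON.parse('{ name: "Ada" }');
} catch (err) {
  unquotedError = err;
}
console.log(unquotedError instanceof SyntaxError); // true
```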

Name Value vs Key Value

If you were paying attention, you might have noticed that the spec says names and values rather than keys and values. I found this really interesting, and I may be reading into it a little too deeply, but I think the difference in terminology mainly reflects each language's origins and design philosophy:

  • JavaScript uses "name/value pairs" because it emerged from a document-oriented world where properties were conceptually named attributes of objects, much like the id attribute in <div id="div_1"></div>. This terminology emphasizes the descriptive nature of the identifier.
  • Other technologies use "key/value pairs" because they evolved from data structure and algorithm traditions where the emphasis was on lookup operations and accessing data efficiently through keys (like in dictionaries and hash tables).

This nuance quietly reflects JavaScript's origin as a language for manipulating document elements, versus other languages' roots in data structure theory and algorithms. JavaScript was originally a DSL (domain-specific language), which simply means it was created to solve a limited set of problems within a specific "domain" (or area of work). That domain was, of course, manipulating the DOM and adding interactivity to web pages. JavaScript has long since outgrown those humble beginnings to become one of the top languages in the world, running on everything from smartphone apps (React Native) to spacecraft (SpaceX and JWST).

Why JSONata

Now we're finally getting to what this post is about: JSONata. JSONata is another DSL, but rather than a browser scripting language, it is a query and transformation language for JavaScript Object Notation (JSON). If you're keeping track, we're several layers deep now (JavaScript > JSON > JSONata).

JSONata lets you query your JSON and return results. For example, this expression adds up the total price in a user's cart: $sum(Account.Order.Product.(Price*Quantity)). Beyond that, you can also do joins, sorts, grouping, and more. Sure, you could do all of this in JS, but JSONata makes it much easier and more readable.
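To make the comparison concrete, here is a rough plain-JavaScript equivalent of the $sum expression above. The sample data is hypothetical, shaped to match the Account/Order/Product path, and the function name cartTotal is made up for this sketch.

```javascript
// Hypothetical sample data shaped like the Account.Order.Product path
const data = {
  Account: {
    Order: [
      { Product: [{ Price: 5, Quantity: 2 }, { Price: 3, Quantity: 1 }] },
      { Product: [{ Price: 4, Quantity: 3 }] },
    ],
  },
};

// Hand-written equivalent of $sum(Account.Order.Product.(Price*Quantity))
function cartTotal(doc) {
  return doc.Account.Order
    .flatMap((order) => order.Product)             // gather every product across all orders
    .map((p) => p.Price * p.Quantity)              // per-product subtotal
    .reduce((sum, subtotal) => sum + subtotal, 0); // add the subtotals up
}

console.log(cartTotal(data)); // 25
```

Notice that the JavaScript version has to know that Order and Product are arrays and flatten them explicitly. JSONata's path expressions flatten those levels automatically, and they also cope with a level that holds a single object instead of an array, which is much of the readability win. To run the JSONata version itself, the jsonata npm package exposes jsonata(expression).evaluate(data); in the 2.x releases evaluate returns a Promise, so check the version you install.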

JSONata is a Turing-complete functional programming language. It supports the ternary operator (my favorite feature), variables, functions, date/time processing, and more. Here's an extreme example showcasing how complicated you can make things if you want to:

(
  $Y := λ($f) { λ($x) { $x($x) }( λ($g) { $f( (λ($a) {$g($g)($a)}))})};
  [1,2,3,4,5,6,7,8,9] . $Y(λ($f) { λ($n) { $n <= 1 ? $n : $f($n-1) + $f($n-2) } }) ($)
)

/* results in */
/* [ 1, 1, 2, 3, 5, 8, 13, 21, 34 ] */
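If that JSONata is hard to read, here is a rough JavaScript translation. Strictly speaking, the $Y defined there is the applicative-order variant often called the Z combinator: the extra λ($a) wrapper delays the self-application so an eagerly evaluated language doesn't recurse forever. This is a sketch for illustration only, not how you'd compute Fibonacci numbers in practice.

```javascript
// Z combinator: mirrors the JSONata $Y definition term by term
const Y = (f) => ((x) => x(x))((g) => f((a) => g(g)(a)));

// Fibonacci without referring to itself by name; Y supplies `self`
const fib = Y((self) => (n) => (n <= 1 ? n : self(n - 1) + self(n - 2)));

const fibs = [1, 2, 3, 4, 5, 6, 7, 8, 9].map((n) => fib(n));
console.log(JSON.stringify(fibs)); // [1,1,2,3,5,8,13,21,34]
```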

The language also has a built-in library of commonly used functions. You'll probably never need to write your own function unless you're doing something unusual, like computing a Fibonacci sequence with a Y combinator. You'll find:

  • String functions like uppercase and trim
  • Numeric functions like rounding and power
  • Min, max, sum, average
  • Shuffle, sort, zip, distinct, count
  • Conversion between Unix epoch milliseconds and ISO 8601 timestamps
  • Map, single, filter, reduce, sift
  • Plus more...

Is It Really Needed?

JavaScript is great at manipulating and working with JSON. Plus, as mentioned earlier, nearly every language supports JSON in some way or another. So why not just read JSON into a general-purpose language?

The answer, or at least my answer, is that it's often overkill. JSON is used almost everywhere, but a dedicated way to query and transform it has been an open opportunity. JSONata fills that need.

JSONata could use broader support (the reference implementation is JavaScript, running in browsers and Node.js), but overall I think there are a lot of use cases, especially if you let users write their own queries in dashboards or low/no-code platforms, or for filtering data. You don't necessarily want users writing and running their own JavaScript: that's a security headache and a challenge for the users themselves. Writing what feels like an Excel formula, on the other hand, is manageable. In use cases like this, JSONata shines. Rather than inventing your own query syntax, you can use JSONata and get a batteries-included, full programming language.

I hope you've enjoyed this deep dive into JavaScript, JSON, and JSONata. It has been fun to write. I encourage you to look deeper into how the languages you use work and into their history. With that added context, you'll most likely become a better developer, knowing more not only about your language but also about how languages work in general.

Try It

Here are two sites that let you try out JSONata for yourself: