JSON Library
All messages exchanged between the OKC and the OKE use valid JSON notation.
The CWL includes a hardened JSON library, derived from the mjson open source project, that makes it easy to properly generate and parse JSON.
The parser is very memory efficient. The loop, task, or thread running the CWL and JSON parser will generally not need more than 1-2 kB of stack space, depending on application requirements and platform specifics. The parser has two compile-time constants that help prevent stack overflow:
MJSON_MAX_DEPTH, by default 5, prevents the parser from recursing more than this many levels when parsing a JSON message.
MJSON_MAX_LENGTH, by default 1024, prevents the parser from inspecting more than that many characters when parsing.
Nevertheless, it is recommended to keep a healthy margin of unused stack allocated to avoid stack overflow.
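As a rough illustration of what the two limits mean for a message, the sketch below measures the nesting depth of a JSON string and compares it, together with the length, against the documented defaults. The names demo_json_depth, demo_within_limits, DEMO_MAX_DEPTH, and DEMO_MAX_LENGTH are hypothetical and not part of the library, and the way the library itself counts nesting levels is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirror the documented defaults; the real values come from the library build. */
#define DEMO_MAX_DEPTH  5
#define DEMO_MAX_LENGTH 1024

/* Returns the deepest level of {} / [] nesting in s, ignoring brackets
 * that appear inside quoted strings. Hypothetical helper, not library API. */
int demo_json_depth(const char *s, size_t len)
{
    int depth = 0;
    int max_depth = 0;
    bool in_string = false;

    assert(s != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (in_string) {
            if (c == '\\' && i + 1 < len) {
                i++;                    /* skip the escaped character */
            } else if (c == '"') {
                in_string = false;
            }
        } else if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            if (++depth > max_depth) {
                max_depth = depth;
            }
        } else if (c == '}' || c == ']') {
            depth--;
        }
    }
    return max_depth;
}

/* True when a message stays within both demo limits. */
bool demo_within_limits(const char *s, size_t len)
{
    return len <= DEMO_MAX_LENGTH && demo_json_depth(s, len) <= DEMO_MAX_DEPTH;
}
```

A check of this kind can be useful when designing equipment-specific messages, so that they stay comfortably inside the configured limits.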
Also, the parser does not directly make dynamic memory (heap) allocations, although implementors are permitted to allocate dynamic memory and use those buffers when calling into the parser API.
This section describes how to use the library.
- 1 Online Resources
- 2 String Escaping
- 3 Library Limitations
- 4 Unit Test
- 5 Generation
- 5.1 mb parameter
- 5.2 fmt parameter
- 5.2.1 %Q – Quoted and Escaped String
- 5.2.2 %.*Q – Length Limited Quoted and Escaped String
- 5.2.3 %d and %ld – Signed Integer
- 5.2.4 %u and %lu – Unsigned Integer
- 5.2.5 %B - Boolean
- 5.2.6 %s – Literal String
- 5.2.7 %.*s – Length Limited Literal String
- 5.2.8 %g – Shortest Floating Point Number Representation
- 5.2.9 %f – Floating Point Number
- 5.2.10 %.[0-9]f – Floating Point Number with Specific Precision
- 5.2.11 %V and %v – Quoted and Unquoted Base64 Encoding
- 5.2.12 %H and %h – Quoted and Unquoted ASCII Hexadecimal Encoding
- 5.2.13 %M – Generic Printer Function
- 5.2.14 %A – Array Printer Function
- 5.2.15 %O – Object Printer Function
- 5.3 Decoding
- 5.3.1 s, len, path - Common Arguments
- 5.3.2 MJIsValid()
- 5.3.3 MJFind()
- 5.3.4 MJGetBase64()
- 5.3.5 MJGetDouble()
- 5.3.6 MJGetInteger()
- 5.3.7 MJGetUInteger()
- 5.3.8 MJGetBool()
- 5.3.9 MJGetString()
- 5.3.10 MJGetHex()
- 5.3.11 MJGetArrayElement()
- 5.3.12 MJParse()
Online Resources
https://www.json.org/json-en.htm– - Formal description of JSON syntax.
https://jsonlint.com/ - An excellent online tool that can check JSON strings for syntax errors.
https://cryptii.com/pipes/base64-to-binar– - Very useful tool that can convert between binary data and Base64 encodings.
https://github.com/cesanta/mjso– - Original mjson source project; Please refer to the documentation in this guide regarding the parser since many of the interfaces have been enhanced or modified.
String Escaping
The C language requires that certain characters are escaped in literal strings. For example, a double quote " contained within a literal string must be preceded with a backslash: "\"hello world\"" in memory would be "hello world". When interpreting the C language examples in the following sections, implementors should keep C escaping rules in mind.
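The difference between the source text and the bytes in memory can be checked directly. The following is a minimal sketch; demo_greeting and demo_greeting_length are illustrative names only.

```c
#include <assert.h>
#include <string.h>

/* The source text contains backslashes; the bytes in memory do not. */
const char demo_greeting[] = "\"hello world\"";

/* Number of characters actually stored, excluding the terminator. */
size_t demo_greeting_length(void)
{
    /* sizeof counts the 13 stored characters plus the terminator. */
    assert(sizeof demo_greeting == 14);
    return strlen(demo_greeting);
}
```

The first and last stored characters are the double quotes themselves; no backslash is stored.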
Likewise, certain characters within JSON encoded literal strings must be escaped. However, the rules for escaping JSON literal strings are entirely different and distinct from those of the C language. The JSON library provides encoding and decoding functions that properly handle JSON literal string escape sequences. Implementors using the facilities provided by the JSON library will not need to know the details of how JSON escape sequences work.
Another common source of syntax errors is Microsoft Word. By default, Microsoft Word uses "Smart Quotes". These quotes do not have the same binary encoding as the standard ASCII double quote and will cause JSON syntax errors. This document was written with Smart Quotes disabled. Implementors should consider disabling Smart Quotes under the Auto Correct options, especially when using Microsoft Word to document equipment-specific JSON elements.
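A JSON string pasted from a word processor can be screened for curly quotes before it is used. The sketch below assumes the text is UTF-8 encoded, in which the curly quotes U+2018, U+2019, U+201C, and U+201D are three-byte sequences; other encodings would need different byte patterns. demo_has_smart_quote is a hypothetical helper, not part of the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s contains a UTF-8 encoded curly quote
 * (U+2018, U+2019, U+201C, or U+201D). Hypothetical helper. */
bool demo_has_smart_quote(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    for (size_t i = 0; i + 2 < len; i++) {
        if (p[i] == 0xE2 && p[i + 1] == 0x80 &&
            (p[i + 2] == 0x98 || p[i + 2] == 0x99 ||
             p[i + 2] == 0x9C || p[i + 2] == 0x9D)) {
            return true;
        }
    }
    return false;
}
```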
Library Limitations
The JSON library has several limitations as enumerated in this section.
Unicode character sets are not supported. The library will neither encode nor decode Unicode characters. If an application requires Unicode data, the data should be encoded as a Base64 or ASCII hexadecimal string. The library does, however, encode and decode certain single-byte characters as Unicode escape sequences to allow for proper string escaping.
The library allows for specifying hierarchical element paths delimited by a period symbol. If an element name includes a period, which is technically permitted under JSON syntax rules, then the path description will not be able to find the requested element. Implementors should not use the period symbol in their JSON element names.
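One way to enforce this rule during development is to check candidate element names before they are used. demo_name_is_path_safe below is a hypothetical helper and only tests for the period symbol described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when name contains no period and can therefore be addressed
 * by a period-delimited path. Hypothetical helper. */
bool demo_name_is_path_safe(const char *name)
{
    assert(name != NULL);
    return strchr(name, '.') == NULL;
}
```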
Unit Test
After porting the JSON library to the target platform it is strongly recommended that MJSON_UNIT_TEST be defined and the MJUnitTest() function be invoked to verify that the library is behaving as expected. It is not necessary to compile in and invoke the unit test in production code.
The unit test also provides implementors with useful examples of how to properly use the JSON library.
Generation
MJPrintf() is the primary function used to generate JSON. It behaves in a similar way to the standard library function snprintf().
int32_t MJPrintf(MJBuffer_t *mb, const int8_t *fmt, ...);
The return value is always the number of bytes written to mb.
mb parameter
mb contains a pointer to a buffer where the JSON is printed. The structure also contains the size of the buffer, and the current length of the JSON string.
typedef struct MJBuffer
{
    int8_t* ptr;
    int32_t size;
    int32_t len;
} MJBuffer_t;
The JSON library will never overrun the buffer, and it always ensures NULL termination of the JSON string.
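The contract described above (no overrun, always terminated) can be pictured with a small stand-in. The sketch below is not the library's implementation: DemoBuffer_t only mirrors the three fields of the structure shown above, demo_append is a hypothetical helper, and truncating on overflow is an assumption about how such a guarantee might be met.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same three fields as the structure shown above. */
typedef struct DemoBuffer {
    int8_t *ptr;    /* caller-supplied storage */
    int32_t size;   /* total bytes available at ptr */
    int32_t len;    /* bytes currently used, excluding the terminator */
} DemoBuffer_t;

/* Appends up to n bytes, truncating so that one byte always remains for
 * the terminator. Returns the number of bytes actually appended. */
int32_t demo_append(DemoBuffer_t *b, const char *src, int32_t n)
{
    int32_t room;

    assert(b != NULL && b->ptr != NULL && b->size > 0 && src != NULL);
    room = b->size - 1 - b->len;        /* space left before the terminator */
    if (n > room) {
        n = room;
    }
    if (n < 0) {
        n = 0;
    }
    memcpy(b->ptr + b->len, src, (size_t)n);
    b->len += n;
    b->ptr[b->len] = '\0';
    return n;
}
```

Because len is tracked separately from size, resetting len to 0 is enough to reuse the same storage for the next message.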
fmt parameter
fmt is just like the format specifier in printf(), except there is a different set of allowed formatters.
When using format specifiers, it is critical that implementors supply parameters in the correct sequence and with the correct types; otherwise a program is very likely to crash.
These types of errors are likely to be the number one source of problems in applications using the CWL, so implementors beware!
%Q – Quoted and Escaped String
%Q takes a matching int8_t * in the argument list. It prints a quoted and escaped string to mb.
Generally, when printing strings implementors should use %Q instead of %s. The reason is that %Q will escape certain characters that can appear within a string that would otherwise cause the JSON to be invalid.
uint8_t buf[100];
MJBuffer_t mb;
mb.ptr = buf;
mb.size = sizeof(buf);
mb.len = 0;
MJPrintf(&mb, "{%Q:%Q}", "name", "value");
/* mb.ptr == {"name":"value"}\0 */
/* mb.len == 16 */
mb.len = 0;
MJPrintf(&mb, "{%Q:%Q}", "name", "B\x12""C");
/* mb.ptr == {"name":"B\u0012C"}\0 */
/* mb.len == 19 */
mb.len = 0;
MJPrintf(&mb, "{%Q:%Q}", "name", "B\"C");
/* mb.ptr == {"name":"B\"C"}\0 */
/* mb.len == 15 */
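To make the escaping in the examples above concrete, the following sketch reproduces the same three outputs with a standalone function. It is not the library's code: demo_quote is a hypothetical helper, and the rule it applies (backslash before a double quote or backslash, \u00XX for any byte below 0x20) is an assumption that matches the examples but may be narrower than the library's own escaping table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes s as a quoted, escaped JSON string into out (capacity cap).
 * Returns the number of characters written, or -1 if out is too small.
 * Hypothetical helper; the library's own escaping table may differ. */
int demo_quote(char *out, size_t cap, const char *s)
{
    size_t n = 0;

    assert(out != NULL && s != NULL);
    if (cap < 3) {
        return -1;                      /* no room for "" plus terminator */
    }
    out[n++] = '"';
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (n + 8 > cap) {              /* worst case: 6 bytes, quote, NUL */
            return -1;
        }
        if (c == '"' || c == '\\') {
            out[n++] = '\\';
            out[n++] = (char)c;
        } else if (c < 0x20) {
            n += (size_t)snprintf(out + n, cap - n, "\\u%04x", (unsigned)c);
        } else {
            out[n++] = (char)c;
        }
    }
    out[n++] = '"';
    out[n] = '\0';
    return (int)n;
}
```

The space check is deliberately conservative, so a buffer that is only just large enough may still be rejected.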
%.*Q – Length Limited Quoted and Escaped String
%.*Q takes a matching int and int8_t * in the argument list. It behaves like %Q except that it limits the length of the string to the value of the passed int.
%d and %ld – Signed Integer
%d takes a matching int in the argument list. It behaves like the standard printf() %d formatter but allows only numbers in an int16_t range to be printed. Unlike %d used in printf(), the behavior of %d will not differ between 16 and 32-bit platforms. When using %d, the argument must either be of type int or be cast as such; otherwise the variable argument list will be pushed improperly, leading to invalid behavior or a crash.
%ld takes a matching int32_t in the argument list. It behaves like the standard printf() %ld formatter and allows only numbers in an int32_t range to be printed. The behavior of %ld will not differ between 16 and 32-bit platforms. When using %ld, the argument must either be of type int32_t or be cast as such; otherwise the variable argument list will be pushed improperly, leading to invalid behavior or a crash.
Unlike the standard printf() %d and %ld, they do not permit any decorators such as flags, width, precision, or length.
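Since %d is limited to the int16_t range, a value held in a wider variable can be range-checked before it is cast to int. The helpers below are hypothetical names used for illustration only; what the library does with an out-of-range %d value is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v lies in the int16_t range that %d is documented to accept. */
bool demo_fits_percent_d(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Narrows v to the int argument type that %d expects. The caller is
 * expected to have checked the range first. */
int demo_percent_d_arg(int32_t v)
{
    assert(demo_fits_percent_d(v));
    return (int)v;
}
```

When the check fails, %ld with an argument of type int32_t is the documented alternative.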
%u and %lu – Unsigned Integer
%u takes a matching unsigned int in the argument list. It behaves like the standard printf() %u formatter but allows only numbers in a uint16_t range to be printed. Unlike %u used in printf(), the behavior of %u will not differ between 16 and 32-bit platforms. When using %u, the argument must either be of type unsigned int or be cast as such; otherwise the variable argument list will be pushed improperly, leading to invalid behavior or a crash.
%lu takes a matching uint32_t in the argument list. It behaves like the standard printf() %lu formatter and allows only numbers in a uint32_t range to be printed. The behavior of %lu will not differ between 16 and 32-bit platforms. When using %lu, the argument must either be of type uint32_t or be cast as such; otherwise the variable argument list will be pushed improperly, leading to invalid behavior or a crash.
Unlike the standard printf() %u and %lu, they do not permit any decorators such as flags, width, precision, or length.
%B - Boolean
%B takes a matching Boolean expression as an int in the argument list. It will print true or false to the buffer.
%s – Literal String
%s takes a matching int8_t * in the argument list. It behaves like the standard printf() %s formatter and prints a string to the output buffer.
Unlike the standard printf() %s, it does not permit any decorators such as flags, width, precision, or length.
Care must be taken when using %s since it will not properly escape control characters that would otherwise cause the JSON to be invalid.
%.*s – Length Limited Literal String
%.*s takes a matching int and int8_t * in the argument list. It behaves like the standard printf() %s formatter and prints a string to the output buffer where the int limits the total length of the string.
Unlike the standard printf() %s, it does not permit any decorators such as flags, width, precision, or length.
Care must be taken when using %.*s since it will not properly escape control characters that would otherwise cause the JSON to be invalid.
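The argument order for a length-limited string (the int first, then the pointer) is the same as in the standard library. The sketch below uses the standard snprintf() rather than the JSON library to show that order; demo_limited_copy is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies at most max_chars characters of src into out using "%.*s".
 * The int precision comes first in the argument list, then the pointer. */
int demo_limited_copy(char *out, size_t cap, const char *src, int max_chars)
{
    assert(out != NULL && src != NULL && max_chars >= 0);
    return snprintf(out, cap, "%.*s", max_chars, src);
}
```

Swapping the two arguments compiles on many toolchains but reads a pointer as a length, which is the kind of mismatch the warnings in this section refer to.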
%g – Shortest Floating Point Number Representation
%g takes a matching double in the argument list. It behaves like the standard printf() %g formatter and prints the shortest representation of the double to the buffer.
Unlike the standard printf() %g, it does not permit any decorators such as flags, width, precision, or length.
%f – Floating Point Number
%f takes a matching double in the argument list. It behaves like the standard printf() %f formatter and prints the double value to the buffer.
Unlike the standard printf() %f, it does not permit any decorators such as flags, width, precision, or length.
%.[0-9]f – Floating Point Number with Specific Precision
%.[0-9]f takes a matching double in the argument list. It behaves like the standard printf() %f formatter and prints the double value to the buffer with a precision between 0 and 9.
Unlike the standard printf() %f, it does not permit any other decorators such as flags, width, or length.
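Because these formatters are described as behaving like their standard printf() counterparts, the standard library can be used to preview the three styles side by side. This is only a sketch using snprintf(); demo_format_double is a hypothetical name, and the library's output is assumed, not verified, to match in every detail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats v three ways so the differences can be compared side by side.
 * Each output buffer must hold at least cap bytes. */
void demo_format_double(double v, size_t cap, char *g, char *f, char *f2)
{
    assert(g != NULL && f != NULL && f2 != NULL);
    snprintf(g,  cap, "%g",   v);   /* shortest representation */
    snprintf(f,  cap, "%f",   v);   /* default precision of six digits */
    snprintf(f2, cap, "%.2f", v);   /* explicit precision of two digits */
}
```

For a value of 0.5 the three buffers hold 0.5, 0.500000, and 0.50 respectively.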
%V and %v – Quoted and Unquoted Base64 Encoding
%V takes a matching int and uint8_t * expression in the argument list. It will print the Base64 encoded representation of the number of bytes specified by the int from the binary data specified by the uint8_t * in a quoted string.
%v performs the same encoding as %V but will not enclose the encoding with double quotes.
Base64 encoding is an efficient way to represent binary data as plain text that can be easily encoded in a JSON string. Typically, this is used for binary data that is voluminous or does not contain any obvious meaning, such as firmware images or proprietary binary configuration data.
The CWL substitutes the pipe character | for the standard forward slash character / when encoding and decoding Base64 strings. This is done to eliminate the required JSON escaping of the forward slash within the JSON string. This makes the length of the Base64 encoding more predictable and results in less memory consumption.
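The substitution can be pictured as a one-character change to the standard Base64 alphabet. The encoder below is a minimal sketch, not the CWL implementation: demo_base64_encode is a hypothetical name, and the use of '=' padding is an assumption carried over from standard Base64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard Base64 alphabet with '|' in the slot normally held by '/'. */
static const char demo_b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+|";

/* Encodes n bytes from src into out (capacity cap, including the NUL).
 * Returns the encoded length, or -1 if out is too small. */
int demo_base64_encode(const uint8_t *src, size_t n, char *out, size_t cap)
{
    size_t need = ((n + 2) / 3) * 4;
    size_t o = 0;

    assert(src != NULL && out != NULL);
    if (cap < need + 1) {
        return -1;
    }
    for (size_t i = 0; i < n; i += 3) {
        uint32_t v = (uint32_t)src[i] << 16;
        if (i + 1 < n) {
            v |= (uint32_t)src[i + 1] << 8;
        }
        if (i + 2 < n) {
            v |= src[i + 2];
        }
        out[o++] = demo_b64[(v >> 18) & 0x3F];
        out[o++] = demo_b64[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < n) ? demo_b64[(v >> 6) & 0x3F] : '=';   /* assumed padding */
        out[o++] = (i + 2 < n) ? demo_b64[v & 0x3F] : '=';
    }
    out[o] = '\0';
    return (int)o;
}
```

Three bytes of 0xFF, which standard Base64 encodes as "////", come out as "||||" and therefore need no JSON escaping.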
%H and %h – Quoted and Unquoted ASCII Hexadecimal Encoding
%H takes a matching int and uint8_t * expression in the argument list. It will print the ASCII hexadecimal encoded representation of the number of bytes specified by the int from the binary data specified by the uint8_t * in a quoted string.
%h performs the same encoding as %H but will not enclose the encoding with double quotes.
ASCII hexadecimal encoding is a way to represent binary data as plain text that can be easily encoded in a JSON string. However, it is not as compact an encoding as Base64.
Typically, this is used for binary data that is not voluminous and contains an obvious meaning, such as a MAC address or a status bitmask.
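For comparison with Base64, the sketch below encodes a buffer such as a MAC address as two hexadecimal characters per byte. demo_hex_encode is a hypothetical helper, and the choice of lowercase digits is an assumption; the library may emit a different case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encodes n bytes from src as 2*n hex characters into out (capacity cap,
 * including the NUL). Returns the encoded length, or -1 if out is too small. */
int demo_hex_encode(const uint8_t *src, size_t n, char *out, size_t cap)
{
    static const char digits[] = "0123456789abcdef";   /* case is an assumption */

    assert(src != NULL && out != NULL);
    if (cap < 2 * n + 1) {
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = digits[src[i] >> 4];
        out[2 * i + 1] = digits[src[i] & 0x0F];
    }
    out[2 * n] = '\0';
    return (int)(2 * n);
}
```

A six-byte MAC address takes 12 characters this way, compared with 8 in Base64, which illustrates the difference in compactness.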
%M – Generic Printer Function
%M takes a matching MJPrintFn_t * and zero or more additional expressions in the argument list. This function will be invoked to print a complex value to the buffer.
For each optional expression after the MJPrintFn_t * the function must call va_arg() to pull the additional arguments off the list using the correct types and order. Failure to do so will typically cause a program to crash since the format specifier and variable argument list will become unsynchronized.
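The synchronization requirement can be demonstrated without the library. In the sketch below, every type and function name (DemoPrintFn_t, demo_print_pair, demo_emit) is hypothetical and the callback signature is an assumption, not the actual MJPrintFn_t definition. It shows a callback that consumes exactly the arguments intended for it, so that the argument following them is still read correctly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives the output buffer and a pointer to
 * the caller's argument list so that consumed arguments stay consumed. */
typedef int (*DemoPrintFn_t)(char *out, size_t cap, va_list *ap);

/* Example callback: expects exactly an int followed by a const char *. */
int demo_print_pair(char *out, size_t cap, va_list *ap)
{
    int id = va_arg(*ap, int);                      /* first extra argument */
    const char *name = va_arg(*ap, const char *);   /* second extra argument */
    return snprintf(out, cap, "{\"id\":%d,\"name\":\"%s\"}", id, name);
}

/* Minimal driver: calls fn, then reads one trailing int that must still
 * line up after the callback has consumed its own arguments. */
int demo_emit(char *out, size_t cap, int *trailer, DemoPrintFn_t fn, ...)
{
    va_list ap;
    int n;

    assert(out != NULL && trailer != NULL && fn != NULL);
    va_start(ap, fn);
    n = fn(out, cap, &ap);
    *trailer = va_arg(ap, int);
    va_end(ap);
    return n;
}
```

If demo_print_pair pulled only one argument, or pulled them with the wrong types, the trailing int would be read from the wrong position, which is the failure mode described above.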
%A – Array Printer Function
%A works exactly as %M except that it will encapsulate the output of the invoked function between opening and closing square array brackets.
%O – Object Printer Function
%O works exactly as %M except that it will encapsulate the output of the invoked function between opening and closing curly object brackets.
Decoding
The JSON library has an extensive set of functions to properly parse JSON messages.
Homebrew, ad-hoc parsing should not be attempted, since such code rarely accounts for all of the subtleties of JSON syntax and is likely to lead to bugs and improper operation.
s, len, path - Common Arguments
There are several arguments common to the decoding functions. These arguments are defined here.
s is a pointer to the JSON string to decode.
len is the length of s in bytes. The parser will not go beyond len bytes of data when decoding.
path is a string that specifies the element of interest. A path must always start with the $ (dollar sign) character since it denotes the root of the JSON string. The names of other elements may be appended to the $ in a hierarchical manner by adding a . (period) character and the name of the element. Elements of an array are zero-indexed using square brackets like a C language array.
For example, take the JSON string:
{"a":{"DD":-42,"EE":"hello"},"d":["e","f"]}
The value of $ is the object {"a":{"DD":-42,"EE":"hello"},"d":["e","f"]}
The value of $.a is the object {"DD":-42,"EE":"hello"}
The value of $.a.DD is the number -42.
The value of $.d is the array ["e","f"]
The value of $.d[1] is the string "f".
While any of the functions that accept a path will permit array notation, it is recommended to use the MJGetArrayElement() function since it simplifies application code when accessing array elements.
MJIsValid()
bool MJIsValid(const int8_t* s, int32_t len);
This function returns true when s is a valid JSON string, else it returns false.
MJFind()
MJToken_t MJFind(const int8_t* s, int32_t len, const int8_t* path, const int8_t** tokptr, int32_t* toklen);
This function finds the element declared in path and returns the data type of the value. When tokptr and toklen are not NULL then the function will also return the starting position of the value in s and its length.
Usually tokptr and toklen are passed as NULL since other functions are better at safely and correctly extracting the value into a usable C language variable.
Possible return values are:
MJSON_TOK_INVALID /* Element not found */
MJSON_TOK_STRING /* String, ASCII hexadecimal, or Base64 */
MJSON_TOK_NUMBER /* Number, double or integer */
MJSON_TOK_TRUE /* Boolean true */
MJSON_TOK_FALSE /* Boolean false */
MJSON_TOK_NULL /* null */
MJSON_TOK_ARRAY /* Array [] */
MJSON_TOK_OBJECT /* Object {} */
MJGetBase64()
int32_t MJGetBase64(const int8_t *s, int32_t len, const int8_t *path, int8_t *to, int32_t n);
This function decodes up to n bytes of a quoted Base64 string into to. On success it returns the actual number of data bytes written into to. On failure, usually because the to buffer is too small, it returns a negative number; in that case to will not contain a valid decoding and must not be used by the caller. This function writes binary data, not a string, into to; NULL termination is therefore not ensured unless the NULL termination character is part of the Base64 encoded string.
MJGetDouble()
int32_t MJGetDouble(const int8_t* s, int32_t len, const int8_t* path, double* v);
This function converts a floating-point number into a double and stores it to the memory pointed to by v. It returns sizeof(double) on success. Any other return value indicates an unsuccessful conversion, in which case the memory pointed to by v will not be updated and must not be used by the caller.
MJGetInteger()
This function converts the number in the JSON string element into a 32-bit signed integer. If the type of the element is not known to contain a 32-bit signed integer then MJGetDouble() should be used instead because it can represent a larger set of numbers. On success, the function will return sizeof(int32_t), otherwise the value pointed to by v must not be used by the caller.
MJGetUInteger()
This function converts the number in the JSON string element into a 32-bit unsigned integer. If the type of the element is not known to contain a 32-bit unsigned integer then MJGetDouble() should be used instead because it can represent a larger set of numbers. On success, the function will return sizeof(uint32_t), otherwise the value pointed to by v must not be used by the caller.
MJGetBool()
This function converts a JSON Boolean into a C language Boolean. On success it returns sizeof(bool) and the memory pointed to by v will contain the Boolean value of the element. Any other return value is a failure, and the value pointed to by v must not be used by the caller.
MJGetString()
This function returns an unescaped NULL terminated C language string into to up to n bytes in length. On success it returns the actual length of the string not including the NULL terminator. An empty string "" will return a length of zero. On failure, typically when to is not large enough to hold the unescaped string and NULL termination, it returns a negative number and the memory pointed to by to must not be used by the caller.
MJGetHex()
This function decodes an ASCII hexadecimal string into binary data and stores up to n bytes into to. The binary data may be returned in reverse order from the string encoding by passing true for the reverse parameter. On success the function returns the actual number of binary data bytes decoded. On failure, typically when to is not large enough to hold the decoded binary value of the string, a negative number is returned, and the caller must not use the data pointed to by to.
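The effect of the reverse parameter is easiest to see on a short value. The sketch below is a standalone illustration, not the library function: demo_hex_decode and demo_nibble are hypothetical names, and rejecting odd-length input is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int demo_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes hexlen hex characters into to (capacity n bytes). When reverse
 * is true the last encoded byte is stored first. Returns the byte count,
 * or -1 on malformed input or insufficient space. */
int demo_hex_decode(const char *hex, size_t hexlen, uint8_t *to, size_t n,
                    bool reverse)
{
    size_t count = hexlen / 2;

    assert(hex != NULL && to != NULL);
    if ((hexlen % 2) != 0 || count > n) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        int hi = demo_nibble(hex[2 * i]);
        int lo = demo_nibble(hex[2 * i + 1]);
        if (hi < 0 || lo < 0) {
            return -1;                  /* to may be partially written */
        }
        to[reverse ? count - 1 - i : i] = (uint8_t)((hi << 4) | lo);
    }
    return (int)count;
}
```

Reversal is typically useful when the string is written most significant byte first but the target stores the value least significant byte first.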
MJGetArrayElement()
This function makes it easier to extract the values of array elements even when they are multidimensionally nested. type declares the data type of the passed data buffer pointer that has a size of n, and is one of the following enumerated values:
MJSON_DATATYPE_HEX, MJSON_DATATYPE_HEX_REV, MJSON_DATATYPE_STRING, MJSON_DATATYPE_DOUBLE, MJSON_DATATYPE_INTEGER, MJSON_DATATYPE_UINTEGER, MJSON_DATATYPE_BASE64
After the n parameter is a list of zero or more numbers of type int that are the associated zero-indexed array indices specified in path. To indicate a variable index value in path use a %d format specifier.
The return value is equivalent to directly calling the associated primitive function: MJGetBase64(), MJGetDouble(), MJGetInteger(), MJGetUInteger(), MJGetBool(), MJGetString(), or MJGetHex().
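To show what the %d placeholders in a path stand for, the sketch below expands a two-dimensional index into a complete path string with the standard snprintf(). It does not call the library; demo_make_path and the element name used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "$.<array>[row][col]" into out. Returns the path length, or -1
 * if out is too small. Hypothetical helper. */
int demo_make_path(char *out, size_t cap, const char *array, int row, int col)
{
    int n;

    assert(out != NULL && array != NULL && row >= 0 && col >= 0);
    n = snprintf(out, cap, "$.%s[%d][%d]", array, row, col);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For an element named grid, row 2 and column 0 expand to $.grid[2][0]. A path built this way could be passed to any decoding function that accepts a path, whereas MJGetArrayElement() takes the indices directly.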
MJParse()
This function provides access to the low level JSON parser that is used internally by the functions previously described. It is primarily used when the name of an element can vary which makes specifying a path parameter difficult or impossible.
Generally, implementors will not need to use this function.