Automated type inference for JSON decoder


I'm using JSONEncoder to write data instances to JSON and read them back. I'm wondering if it's possible to create a version of the jsonToData() function where the type information is constructed automatically from the JSON data itself.

At the moment the type info must be passed in as a parameter, but I have some situations where it would be useful to infer the type from the JSON itself. Is this a sensible thing to attempt? And if so, could you give me a pointer or two on how to start?


Hi Noah,

I think you can code it the way you want, but I'm not sure it's desirable: you'll run into some problems that may not be worth it. Consider the following JSON strings:

Json1: {"name": "Roberto", "age": 32}
Json2: {"name": "Roberto", "age": 32.5}
Json3: {"name": "Roberto"}
Json4: {"age": 32, "name": "Roberto"}

Because JavaScript is dynamically typed, it allows JSON documents to be flexibly structured. Dana, on the other hand, is statically typed, meaning that if "age" is declared as an int you won't be able to assign 32.5 to it (not without a cast, even an implicit one).
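To make the problem concrete, here's what naive inference does to those four strings. This is a Python sketch standing in for Dana code (the type names "int", "dec", "char[]" are just illustrative labels, not a real Dana API); the point is that each string yields a different inferred type, including Json4, which differs only in field order.

```python
import json

def infer_type(value):
    """Recursively build a type descriptor from a parsed JSON value."""
    if isinstance(value, bool):       # check bool before int (bool is a subtype of int in Python)
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "dec"
    if isinstance(value, str):
        return "char[]"
    if isinstance(value, list):
        return [infer_type(v) for v in value]
    if isinstance(value, dict):
        # json.loads preserves field order, so differently ordered JSON
        # yields a differently ordered (and thus different) record type
        return {k: infer_type(v) for k, v in value.items()}
    return "none"

examples = [
    '{"name": "Roberto", "age": 32}',
    '{"name": "Roberto", "age": 32.5}',
    '{"name": "Roberto"}',
    '{"age": 32, "name": "Roberto"}',
]
types = [infer_type(json.loads(s)) for s in examples]
# all four inferred "types" differ, even though a human would say these
# strings describe the same kind of record
```

Json1 infers age as int, Json2 as dec, Json3 drops the field entirely, and Json4 swaps the field order.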

So, when we wrote JSONEncoder, we made sure it can convert all four example JSON strings into the same Dana data type. That's important because it means whoever is using JSONEncoder knows exactly which type jsonToData() will produce, and that makes JSONEncoder safe to use for whatever JSON string you pass as a parameter.
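Here's a rough sketch of that schema-driven behavior, again in Python rather than Dana (the SCHEMA dict and decode_with_schema are illustrative, not JSONEncoder's actual internals): because the declared type fixes the field set, the field order, and the numeric types, all four strings decode to the same shape.

```python
import json

# the declared target type plays the role Dana's Type parameter plays
SCHEMA = {"name": str, "age": int}

def decode_with_schema(json_str, schema):
    """Decode JSON against a declared schema: fixed field order,
    missing fields become None, numbers cast to the declared type."""
    raw = json.loads(json_str)
    record = {}
    for field, ftype in schema.items():
        value = raw.get(field)            # missing field -> None
        if value is not None and not isinstance(value, ftype):
            value = ftype(value)          # e.g. 32.5 cast down to int 32
        record[field] = value
    return record

examples = [
    '{"name": "Roberto", "age": 32}',
    '{"name": "Roberto", "age": 32.5}',
    '{"name": "Roberto"}',
    '{"age": 32, "name": "Roberto"}',
]
records = [decode_with_schema(s, SCHEMA) for s in examples]
# every record now has the same fields, in the same order
```

The schema, not the JSON, decides the type; that's what makes the result predictable.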

If you implement jsonToData() the way you want, each of the four example JSON strings will produce a completely different Dana data type. Every variation in how a JSON string is formatted will yield yet another type. This makes jsonToData() unpredictable and difficult to use, unless you craft your JSON strings very carefully (minding field order, field names, and never omitting values even when they are null). That means you won't be able to receive JSON strings created by a different system with any confidence it'll work every time. I'm not sure that's worth it.

But if you still want to try, have a look at the "reflection" section of the Dana guide: https://www.projectdana.com/dana/guide/reflection



Thanks for the awesome, detailed explanation. I hadn't thought much about field ordering; I see how that could yield different and incompatible types without a schema to work to. I also didn't realize JSONEncoder normalized field ordering automatically, even when different records order their fields differently. I understand now that it doesn't really make sense to try to interpret arbitrary JSON data without a schema. My other scenario was saving data to files from transfer state and reading it back in from the file.

I suppose if I know the data is only ever written by JSONEncoder in the first place, I can make some assumptions about the coherence of the data and not bother saving the schema separately. Based on your thoughts here, I think the tidiest design would be a utility function along the lines of Type inferType(char json[]) that "guesses" the schema from the JSON, throwing an exception if no sensible inference is possible, and then feeds that Type into jsonToData().
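Roughly what I have in mind, sketched in Python since I don't have the Dana reflection API in front of me (infer_schema here is just a placeholder analogue of the hypothetical inferType): infer a flat schema from a JSON object and throw when inference can't be done sensibly, e.g. for a null field whose type is unknowable.

```python
import json

def infer_schema(json_str):
    """Placeholder analogue of Type inferType(char json[]): guess a flat
    schema from a JSON object string, raising when no sensible
    inference is possible."""
    obj = json.loads(json_str)
    if not isinstance(obj, dict):
        raise ValueError("top-level value is not an object")
    schema = {}
    for key, value in obj.items():
        if value is None:
            # a null carries no type information, so refuse to guess
            raise ValueError("cannot infer a type for null field: " + key)
        schema[key] = type(value)
    return schema

# usage: infer the schema, then hand it to the decoder
schema = infer_schema('{"name": "Roberto", "age": 32}')
```

Since the data was written by JSONEncoder, fields should always be present and consistently typed, which is exactly the coherence assumption that makes this safe-ish.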

I'll think about this some more. Thanks again for the answer.