Why shouldn't you use YAML to store eye tracking data? /s

qaz@lemmy.world · edit-2 1 month ago

Why shouldn't you use YAML to store eye tracking data? /s

Nate@piefed.alphapuggle.dev · 1 month ago

This isn’t YAML, this is just sparkling JSON

ZoteTheMighty@lemmy.zip · 1 month ago

All yaml is just sparkling JSON.

Caveman@lemmy.world · 1 month ago

Always has been

DataElemental@programming.dev · 1 month ago

How do you figure?

TomasEkeli@programming.dev · 1 month ago

Valid JSON is valid YAML.

So valid YAML can contain JSON

DataElemental@programming.dev · 1 month ago

That’s a long way from all or most YAML being JSON-compatible. I wonder if more YAML files in the wild parse as Markdown than JSON.

TomasEkeli@programming.dev · 1 month ago

JSON is valid YAML does not imply that YAML is valid JSON

DataElemental@programming.dev · edit-2 30 days ago

That was my point, yes, but I was replying to @ZoteTheMighty@midwest.social’s suggestion that “all yaml is just sparkling json”.

MonkderVierte@lemmy.zip · 1 month ago

Maybe use a real database for that? I’m a fan of simple tools (e.g. plaintext) for simple usecases but please use appropriate tools.

deegeese@sopuli.xyz · 1 month ago

If you’re using a library to handle deserialization , the ugliness of the serial format doesn’t matter that much.

Just call yaml.load() and forget about it.

BodilessGaze@sh.itjust.works · 1 month ago

That works until you realize your calculations are all wrong due to floating point inaccuracies. YAML doesn’t require any level of precision for floats, so different parsers on a document may give you different results.

deegeese@sopuli.xyz · 1 month ago

What text based serialization formats do enforce numeric precision?

AFAIK it’s always left up to the writer (serializer)

BodilessGaze@sh.itjust.works · edit-2 1 month ago

Cuelang: https://cuelang.org/docs/reference/spec/#numeric-values

Implementation restriction: although numeric values have arbitrary precision in the language, implementations may implement them using an internal representation with limited precision. That said, every implementation must:

Represent integer values with at least 256 bits.

Represent floating-point values with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.

Give an error if unable to represent an integer value precisely.

Give an error if unable to represent a floating-point value due to overflow.

Round to the nearest representable value if unable to represent a floating-point value due to limits on precision. These requirements apply to the result of any expression except for builtin functions, for which an unusual loss of precision must be explicitly documented.

deegeese@sopuli.xyz · edit-2 1 month ago

Thanks for teaching me something, but the obscurity of your answer just illustrates how rare that requirement is in human readable formats, and mostly limited to data formats designed for numeric precision, like HDF5, FITS or protobuf.

BodilessGaze@sh.itjust.works · edit-2 1 month ago

I don’t think having well-defined precision is a rare requirement, it’s more that most devs don’t understand (and/or care) about the pitfalls of inaccuracy, because they usually aren’t obvious. Also, languages like JavaScript/PHP make it hard to do things the right way. When I was working on an old PHP codebase, I ran into a popular currency library (Zend_Currency) that used floats for handling money, which I’m sure works fine up until the point the accountants call you up asking why they can’t balance the books. The “right way” was to use the bcmath extension, which was a huge pain.

squaresinger@lemmy.world · 1 month ago

Technically, JSON enforces a specific numeric precision by enforcing that numbers are stored as JS-compatible floating point numbers with its associated precision.

Other than that, the best way to go if you want to have a specific precision is to cast to string before serialisation.

fibojoly@sh.itjust.works · 1 month ago

I’m amazed at developers who don’t grasp that you don’t need to have absolutely everything under the sun in a human readable file format. This is such a textbook case…

chaospatterns@lemmy.world · 1 month ago

Yeah this isn’t even human readable even when it’s in YAML. What am I going to do? Read the floats and understand that the person looked left?

squaresinger@lemmy.world · 1 month ago

It’s human-readable enough for debugging. You might not be able to read whether a person look left, but you can read which field is null or missing or wildly out of range. You can also read if a value is duplicated when it shouldn’t be.

Human-readable is primarily about the structure and less about the data being human readable.

marcos@lemmy.world · 1 month ago

Even if you want it to be human readable, you don’t need to include the name into every field and use balanced separators.

Any CSV variant would be an improvement already.

fibojoly@sh.itjust.works · 1 month ago

Even using C#'s decimal type (128bit) would be an improvement! I count 22 characters per numbers here. So a minimum of 176bit.

Dultas@lemmy.world · 1 month ago

That’s it everyone, back to copybooks.

FuckBigTech347@lemmygrad.ml · 1 month ago

Exactly. All modern CPUs are so standardized that there is little reason to store all the data in ASCII text. It’s so much faster and less complicated to just keep the raw binary on disk.

slackness@lemmy.ml · 1 month ago

Fuck yaml. I’m not parsing data structured with spaces and newlines with my eyes. Use visible characters.

DataElemental@programming.dev · 1 month ago

Does your viewer/editor not show space chars or indent levels?

lime!@feddit.nu · edit-2 1 month ago

i mean, json is valid yaml

raman_klogius@ani.social · edit-2 1 month ago

Why you shouldn’t use YAML

Damage@feddit.it · 1 month ago

The best approach would be to never use yaml for anything

BodilessGaze@sh.itjust.works · 1 month ago

YAML doesn’t require any level of accuracy for floating point numbers, and that doc appears to have numbers large enough to run into problems for single-precision floats (maybe double too). That means different parsers could give you different results.

BestBouclettes@jlai.lu · 1 month ago

I really like YAML but way too many people use it beyond its purpose… I work with Gitlabci and seeing complex bash scripts inline in YAML files makes me want to hurt people.

disco@lemdro.id · 1 month ago

This is nasty to look at

wise_pancake@lemmy.ca · 1 month ago

I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.

In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.

qaz@lemmy.world · edit-2 1 month ago

Thanks for the advice, but this is just the format of some eyetracking software I had to use not something I develop myself

wise_pancake@lemmy.ca · 1 month ago

Ah, well, such is software dependencies.

Supercrunchy@programming.dev · 1 month ago

Also let’s represent all numbers in scientific notation, I’m sure that’s going to make it easier to read…