> Introducing the ‘packed’ data format, a binary format that allows using data as it is, without the need for a deserialisation step. A notable perk of this format is that traversals on packed trees are provably faster than on ‘unpacked’ trees: as the fields of data structures are inlined, there are no pointer jumps, thus making the most of the L1 cache.
That is, a "memory dump -> zero-copy memory read" of a subgraph of Haskell objects, allowing such trees / subgraphs to be passed directly over a network. Slightly reminiscent of Cap'n Proto.
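As a rough sketch of the idea (not the library's actual API; a plain `Int` list stands in for the byte buffer here): a tree packed in preorder can be consumed in a single left-to-right scan, with no pointers to chase.

```haskell
-- Hypothetical sketch of a "packed" tree: the structure is flattened
-- into a preorder buffer, so a traversal is one linear scan.
data Tree = Leaf Int | Node Tree Tree

-- Tag 0 = Leaf (followed by its value), tag 1 = Node.
pack :: Tree -> [Int]
pack (Leaf n)   = [0, n]
pack (Node l r) = 1 : pack l ++ pack r

-- Sum every leaf by scanning the buffer once, tracking how many
-- subtrees remain to be read.
sumPacked :: [Int] -> Int
sumPacked = go 1 0
  where
    go 0 acc _            = acc                     -- nothing left to read
    go k acc (0 : n : xs) = go (k - 1) (acc + n) xs -- consume a leaf
    go k acc (1 : xs)     = go (k + 1) acc xs       -- a node adds two subtrees, closes one
    go _ acc _            = acc                     -- malformed buffer
```

For example, `pack (Node (Leaf 1) (Leaf 2))` is `[1,0,1,0,2]`, and `sumPacked` of that buffer returns `3` without ever rebuilding the tree.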
Zolomon 60 days ago
They mention this in the article.
spockz 60 days ago
It reminds me more of FlatBuffers, though. Does protobuf also have zero allocation (beyond initial ingestion) and no pointer jumps?
cstrahan 58 days ago
No; one example of why not: protobuf uses variable-sized (varint) integer encoding. See https://protobuf.dev/programming-guides/encoding/
One thing that sometimes gets tricky in these things is handling subterm sharing. I wonder how they implemented it.
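To illustrate the varint point: protobuf encodes integers LEB128-style, seven payload bits per byte with the high bit as a continuation flag, so a field's width isn't known until it has been read. A minimal decoder sketch (not tied to any protobuf library):

```haskell
import Data.Bits (shiftL, (.&.), (.|.), testBit)
import Data.Word (Word8)

-- Decode one protobuf-style varint from the front of a byte stream,
-- returning the value and the remaining bytes. Each byte contributes
-- 7 bits, least-significant group first; bit 7 set means "more follows".
decodeVarint :: [Word8] -> (Int, [Word8])
decodeVarint = go 0 0
  where
    go shift acc (b : bs)
      | testBit b 7 = go (shift + 7) acc' bs   -- continuation bit set
      | otherwise   = (acc', bs)               -- final byte of this varint
      where acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)
    go _ acc [] = (acc, [])
```

For example, `decodeVarint [0xAC, 0x02]` yields `(300, [])` — and because the width varies per value, you cannot jump to the next field without decoding this one, unlike a fixed packed layout.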
90s_dev 60 days ago
We are always reinventing wheels. If we didn't, they'd all still be made of wood.
tlb 59 days ago
> the serialised version of the data is usually bigger than its in-memory representation
I don’t think this is common. Perhaps for arrays of floats serialized as JSON or something. But I can’t think of a case where binary serialization is bigger. Data types like maps are necessarily larger in memory to support fast lookup and mutability.
IsTom 59 days ago
If you use a lot of sharing in immutable data it can grow a lot when serializing. A simple pathological example would be a tree that has all left subtrees same as the right ones. It takes O(height) space in memory, but O(2^height) when serialized.
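A minimal sketch of that pathological case (the names here are illustrative, not from any library): the tree lives in O(height) heap cells thanks to sharing, but a naive serialiser that follows both children writes out the full exponential-size tree.

```haskell
data Tree = Leaf | Node Tree Tree

-- O(n) heap cells: at every level, the left and right child are
-- the very same heap object.
shared :: Int -> Tree
shared 0 = Leaf
shared n = let t = shared (n - 1) in Node t t

-- Nodes a naive serialiser would emit: sharing is lost, so the
-- output covers all 2^(n+1) - 1 logical nodes.
serialisedNodes :: Tree -> Integer
serialisedNodes Leaf       = 1
serialisedNodes (Node l r) = 1 + serialisedNodes l + serialisedNodes r
```

So `shared 20` occupies only about 21 heap cells, while `serialisedNodes (shared 20)` is 2097151 — the exponential blow-up IsTom describes.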
nine_k 59 days ago
I suppose all self-describing formats, like protobuf, or Thrift, or, well, JSON, are bigger than the efficient machine representation, because they carry the schema in every message, one way or another.
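The effect is easiest to see with JSON, where the field names travel with every message. A toy comparison, assuming a hypothetical record of three 32-bit fields:

```haskell
-- Raw packed layout: three 32-bit fields are 12 bytes, and the
-- field offsets come from the schema, not the message.
rawBytes :: Int
rawBytes = 3 * 4

-- Self-describing JSON: the names "x", "y", "z" (and the syntax
-- around them) are repeated in every single message.
jsonMsg :: String
jsonMsg = "{\"x\":1,\"y\":2,\"z\":3}"
```

Here the JSON message is 19 bytes for what the raw layout fits in 12, and the gap only widens as the numeric values grow or the field names get longer.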
lordleft 59 days ago
This was very well written. Excellent article!
NetOpWibby 59 days ago
Is this like MessagePack for Haskell?
gitroom 59 days ago
honestly i wish more stuff worked this way - fewer hops in memory always makes me happy