Apologies if this has been answered already; I searched but couldn’t find anything.
How, specifically, does Scrapscript actually hash each scrap?
Which hashing algorithm is used?
What canonicalization (if any) takes place before computing the hash?
This is a fascinating project, and I’m excited to see how it evolves. I found your homepage whilst implementing my own interpretation of a content-addressing solution for lambda functions. Scrapscript looks awesome and I’d use it in production today if I could!
Right now, you can specify which hash you are using with the following syntax:
$sha1'3efce6ae1ebf7fef7c7bdd8c270d76da5b079438
Note that SHA1 is somewhat problematic (practical collisions have been demonstrated), which is why I used it as an example.
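To make the syntax concrete, here is a minimal Python sketch of how a reference like the one above could be split into an algorithm name and a digest, then checked against some bytes. The function names `parse_ref` and `matches` are made up for illustration, and this is not how Scrapscript itself does it.

```python
import hashlib


def parse_ref(ref: str) -> tuple[str, bytes]:
    """Split a reference such as $sha1'3efc... into (algorithm, digest bytes)."""
    if not ref.startswith("$") or "'" not in ref:
        raise ValueError(f"not a hash reference: {ref!r}")
    algo, hex_digest = ref[1:].split("'", 1)
    # bytes.fromhex raises ValueError if the digest is not valid hex.
    return algo, bytes.fromhex(hex_digest)


def matches(ref: str, content: bytes) -> bool:
    """Return True if `content` hashes to the digest named in `ref`."""
    algo, digest = parse_ref(ref)
    # hashlib.new raises ValueError for algorithm names it does not know.
    return hashlib.new(algo, content).digest() == digest
```

Because the algorithm travels with the digest, a weak hash can be swapped for a stronger one later without changing the shape of the reference.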
Some folks (I forgot where) made very good points about the hash type declaration not really being that important. In theory, the scrapyard could hash the content using CRC32, SHA1, MD5, etc. and try to match against all at the same time in the same big KV store. The address space is really, really big!
The canonical format of a scrap is its “flat scrap”, which is a binary representation of a scrap (something like msgpack). I’m still working through the details on this, but I expect to canonicalize variable names in functions, so that b -> b and a -> a flatten to exactly the same content. If possible, I’d like to keep “hydrated” variable names elsewhere, so that you can restore the original variable names and other metadata in an editor. I won’t do any other normalization, so () -> 1 + 2 and () -> 2 + 1 will remain different.
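One common way to get that behavior is to replace bound variable names with positional indices (de Bruijn indices) before serializing. The Python sketch below does this for a tiny tuple-based AST and hashes the result. Everything here is an assumption for illustration: the AST shape, the names `canonicalize` and `scrap_hash`, JSON standing in for the binary flat format, SHA1 as the hash, and a lambda with an ignored parameter standing in for `() -> ...`.

```python
import hashlib
import json


def canonicalize(term, env=()):
    """Replace bound variable names with indices; leave everything else alone."""
    kind = term[0]
    if kind == "int":
        return term
    if kind == "var":
        name = term[1]
        if name in env:
            # Distance to the binder: 0 is the innermost enclosing lambda.
            return ("bound", env.index(name))
        return ("free", name)
    if kind == "lam":
        _, param, body = term
        # The parameter name is dropped; only the structure is kept.
        return ("lam", canonicalize(body, (param,) + env))
    if kind == "add":
        _, left, right = term
        # Operand order is preserved, so 1 + 2 and 2 + 1 stay distinct.
        return ("add", canonicalize(left, env), canonicalize(right, env))
    raise ValueError(f"unknown term: {term!r}")


def scrap_hash(term) -> str:
    """Hash the canonical form (JSON here is a stand-in for a binary encoding)."""
    flat = json.dumps(canonicalize(term)).encode("utf-8")
    return hashlib.sha1(flat).hexdigest()
```

With this sketch, `("lam", "a", ("var", "a"))` and `("lam", "b", ("var", "b"))` hash identically, while a function adding 1 + 2 and one adding 2 + 1 do not. The original names would have to be stored separately to be restored later.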