I've been doing some performance tuning in Go lately to really squeeze out performance, and ended up with a very similar arena design, except using byte slices for buf and chunks instead of unsafe pointers. I think I tried that too and it wasn't any faster and a whole lot uglier, but I'll have to double check before saying that with 100% confidence.
A couple of other easy wins:
If you start with a small slice and find that some payloads append large amounts, write your own append that bumps cap more aggressively up front before calling the builtin append.
unsafe.String is rather new and great for passing strings out of byte slices without allocating. Just read the warnings carefully and understand what you're doing.
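A rough sketch of both tips, in case it helps. growAppend and bytesToString are made-up names, and the "double what is needed" growth policy is only an assumption for illustration; tune it to your payloads. unsafe.String and unsafe.SliceData need Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// growAppend is a hypothetical helper: it appends src to dst, but when the
// result would not fit, it grows the capacity to twice the needed length in
// one step instead of leaving the growth policy to the builtin append.
func growAppend(dst, src []byte) []byte {
	need := len(dst) + len(src)
	if need > cap(dst) {
		grown := make([]byte, len(dst), 2*need)
		copy(grown, dst)
		dst = grown
	}
	// Capacity is now sufficient, so this append never reallocates.
	return append(dst, src...)
}

// bytesToString is a hypothetical helper that returns a string sharing b's
// memory without copying. The caller must not modify b afterwards, since
// strings are assumed to be immutable.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	buf := make([]byte, 0, 8)
	buf = growAppend(buf, []byte("hello, "))
	buf = growAppend(buf, []byte("arena world"))
	fmt.Println(bytesToString(buf), cap(buf))
}
```

The zero-copy string is only safe if nothing writes to the backing slice while the string is still in use.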
PaulKeeble 1 hours ago [-]
The append(slice, slice2...) code is all well and good, but it's going to hit expansion quite often. When you know the second append is going to be large, it's often faster to allocate a new slice with the right capacity and no elements and then append both slices to it. Then there are no expansion costs, the values just get copied in, and it also produces less garbage to be collected.
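Something like this, as a minimal sketch (concat is a made-up name; I believe slices.Concat in Go 1.22+ does much the same thing if you are on a recent toolchain):

```go
package main

import "fmt"

// concat is a hypothetical helper that joins a and b with exactly one
// allocation: the destination is created with the final capacity up front,
// so neither append below has to grow it.
func concat[T any](a, b []T) []T {
	out := make([]T, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}

func main() {
	small := []int{1, 2}
	large := []int{3, 4, 5, 6, 7, 8}
	joined := concat(small, large)
	fmt.Println(joined, len(joined), cap(joined))
}
```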
In the past I have also built slice-like types that held two slices and used a function to map indices across them as if they had been appended. It costs a bit on access, but saves the initial allocation if you don't intend to iterate through the entire thing, or only do so once.
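Roughly this shape, as an illustrative sketch rather than the exact code I used (pairView and its methods are made-up names):

```go
package main

import "fmt"

// pairView is a hypothetical read-only view over two slices that behaves as
// if second had been appended to first, without allocating or copying.
type pairView[T any] struct {
	first, second []T
}

// Len reports the combined length of both underlying slices.
func (v pairView[T]) Len() int {
	return len(v.first) + len(v.second)
}

// At maps a combined index onto whichever slice holds that element.
// The extra branch is the per-access cost of skipping the allocation.
func (v pairView[T]) At(i int) T {
	if i < len(v.first) {
		return v.first[i]
	}
	return v.second[i-len(v.first)]
}

func main() {
	v := pairView[string]{
		first:  []string{"a", "b"},
		second: []string{"c", "d", "e"},
	}
	for i := 0; i < v.Len(); i++ {
		fmt.Print(v.At(i), " ")
	}
	fmt.Println()
}
```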
The standard library in Go does not do much to optimise this sort of thing; it's not a dominant operation in most applications, so I can see why we don't have more advanced data structures and algorithms. You have to need quite different performance characteristics before you can outperform the builtins with custom code or a library. Go's push for simplicity seems to assume people don't need anything other than array lists and hash maps.
kristianp 42 minutes ago [-]
Just a quick meta note. This article is really lengthy, I don't have time to read this level of detail for the background. For example the "Mark and Sweep" section takes up more than 4 pages on my laptop screen. That section starts more than 5 pages into the article. Is this the result of having AI help to write sections, and as a result making it too comprehensive? It's easy to generate content, but the editing decisions to keep the important parts haven't been made. I just want to know the part about the Arena allocator, I don't need a tutorial on garbage collection as well.
foundry27 14 minutes ago [-]
tl;dr for anyone who may be put off by the article length:
OP built an arena allocator in Go using unsafe to speed up allocation, especially for cases when you're allocating a bunch of stuff that you know lives and dies together. The main issue they ran into is that Go's GC needs to know the layout of your data (specifically, where pointers are) to work correctly, and if you just allocate raw bytes with unsafe.Pointer, the GC might mistakenly free things pointed to from your arena because it can't see those pointers properly. But to make it work even with pointers (as long as they point to other stuff in the same arena), you keep the whole arena alive if any part of it is still referenced. That means (1) keeping a slice (chunks) pointing to all the big memory blocks the arena got from the system, and (2) using reflect.StructOf to create new types for these blocks that include an extra pointer field at the end (pointing back to the Arena). So if the GC finds any pointer into a chunk, it'll also find the back-pointer, therefore mark the arena as alive, and therefore keep the chunks slice alive. Then they get into a bunch of really interesting optimizations to remove various internal checks and write barriers, using funky techniques you might not have seen before.
mholt 2 hours ago [-]
Off topic, but I love the minimap on the side -- for pages where I might be jumping around the content (long, technical articles, to refer back to something I read earlier but forgot) -- how can I get that on my site? Way cool.
But it uses a canvas and redraws.
The post's website, by contrast, renders a copy of the page in a <div> and scrolls it, as you can check by inspecting the div.