WinterMute wrote: Snarky comments aren't really that helpful tbh. I had this discussion several times during commercial dev work as well and we never managed to find a situation where a custom allocator actually had a significant benefit. I used a custom allocator once to avoid the overhead of newlib in a GBA-based multiboot project & it turned out that I'd significantly overestimated the overhead of using malloc.
This problem has been around ever since mem2 support was added and it's taken this long to even get acknowledged, let alone fixed (and I'm sure this will be followed up by the usual "patches are welcome" BS. Sure they are, because it gives you an excuse to ask for donations without actually doing any work).
To say that a custom allocator can't do a better job than an implementation that potentially discards nearly all of MEM1 is ridiculous (plus you obviously never worked on an embedded project where dlmalloc's >20KB code size was prohibitive; that's low-hanging fruit right there).
Obviously it would be much better if MEM1 weren't effectively locked out once sbrk traverses regions, and if sbrk took account of trim, but as several people have found out when attempting to address the issue, it's really not as simple as it sounds. Of course, having said that, once it's figured out no doubt it will look simple.
It is simple:
- discard the broken sbrk implementation
- create an mspace with base region of unused MEM1 (end of program data to top of MEM1), unlocked so it can be expanded
- make malloc() and friends use this mspace
- create an mmap implementation that handles 16KB pages of MEM2, using a bit array for tracking (~48MB / 16KB = 3072 pages = only 384 bytes).
- configure dlmalloc to use mmap for getting more core memory
- fix iosCreateHeap to use mmap (and add all the missing code to free heaps properly)
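To make the bit-array step concrete, here is a minimal sketch of a first-fit 16KB page tracker over ~48MB of MEM2, assuming the sizes quoted in the list (3072 pages tracked in a 384-byte bitmap). The names `pages_init`, `pages_map`, `pages_unmap` and `PAGES_FAIL` are hypothetical, not part of libogc or dlmalloc, and the sketch runs on a host by treating the region base as a plain address. Hooking it into dlmalloc's configurable mmap macros (and translating `PAGES_FAIL` into whatever failure value dlmalloc expects) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes from the post: ~48MB of usable MEM2 split into 16KB pages. */
#define PAGE_SIZE    (16u * 1024u)
#define REGION_SIZE  (48u * 1024u * 1024u)
#define NUM_PAGES    (REGION_SIZE / PAGE_SIZE)   /* 3072 pages */
#define BITMAP_BYTES (NUM_PAGES / 8u)            /* 384 bytes  */
#define PAGES_FAIL   UINTPTR_MAX                 /* hypothetical failure sentinel */

static uint8_t page_bitmap[BITMAP_BYTES];  /* one bit per page, 1 = in use */
static uintptr_t region_base;              /* start of the managed range */

static int page_used(size_t i) {
    assert(i < NUM_PAGES);
    return (page_bitmap[i / 8] >> (i % 8)) & 1;
}

static void page_set(size_t i, int used) {
    assert(i < NUM_PAGES);
    if (used)
        page_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        page_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Mark every page free and remember where the region starts. */
void pages_init(uintptr_t base) {
    region_base = base;
    memset(page_bitmap, 0, sizeof page_bitmap);
}

/* First-fit search for enough contiguous free pages to hold len bytes. */
uintptr_t pages_map(size_t len) {
    if (len == 0 || len > REGION_SIZE)
        return PAGES_FAIL;
    size_t need = (len + PAGE_SIZE - 1) / PAGE_SIZE;
    size_t run = 0;
    for (size_t i = 0; i < NUM_PAGES; i++) {
        run = page_used(i) ? 0 : run + 1;
        if (run == need) {
            size_t first = i + 1 - need;
            for (size_t j = first; j <= i; j++)
                page_set(j, 1);
            return region_base + first * PAGE_SIZE;
        }
    }
    return PAGES_FAIL;
}

/* Release the pages covering [addr, addr + len); returns 0 on success. */
int pages_unmap(uintptr_t addr, size_t len) {
    if (addr < region_base || (addr - region_base) % PAGE_SIZE != 0)
        return -1;
    size_t first = (addr - region_base) / PAGE_SIZE;
    size_t count = (len + PAGE_SIZE - 1) / PAGE_SIZE;
    if (first >= NUM_PAGES || count > NUM_PAGES - first)
        return -1;
    for (size_t j = first; j < first + count; j++)
        page_set(j, 0);
    return 0;
}
```

First-fit over a bitmap keeps the bookkeeping tiny and makes freeing trivial, which fits the goal of being able to hand whole 16KB pages back (e.g. when an IOS heap is destroyed). The trade-off is a linear scan per allocation, which is cheap at 3072 pages.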
The Starlet code is running in MEM2, which does make MEM2 access from the PowerPC measurably slower. It's faster if Starlet is restricted to its own exclusive RAM, much like the DS. It's also an ARM9.
I'd like to see those measurements. 99.9% of the time starlet/IOS is sitting in an idle loop, not touching the bus at all.
Starlet doesn't actually have any "exclusive RAM" (I assume you mean the 128KB SRAM); it sits on the same bus as MEM2 and can be directly accessed by the PowerPC when the appropriate bit is set.
If I remember right, 64-byte fetches were enabled by default on the Wii.
You remember wrong. It's actually advisable to not enable it unless you really know what you're doing, since all SDK and homebrew apps contain startup code to configure L2 access for 32 bytes and you can't switch down without a hard reset (courtesy of IOS via a title relaunch). So if your app launches another app, calling L2Enhance() will not end well.