Aligned thread-locals initialized to incorrect value

support for the ARM toolchain
Posts: 1
Joined: Sat Jun 11, 2022 1:25 am

Post by ian-h-chamberlain » Wed Jun 15, 2022 2:10 pm

Hello! I have been debugging an issue where, under certain circumstances, __thread variables have the wrong initial value (building for the 3DS). The following code reproduces the issue. Notably, making either of these changes seems to resolve it:
  • using ALIGN(8) or less for BUF_16
  • building with -O1 or higher

Code: Select all

#include <3ds.h>
#include <stdio.h>
#include <string.h>

typedef ALIGN(4) struct {
    u8 inner[3];
} Align4;

typedef ALIGN(16) struct {
    u8 inner[3];
} Align16;

static __thread Align4 BUF_4 = {.inner = {2, 2, 2}};
static __thread Align16 BUF_16 = {.inner = {1, 1, 1}};

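int main(int argc, char** argv) {
    // Set up graphics and a text console on the top screen
    gfxInitDefault();
    consoleInit(GFX_TOP, NULL);

    // Write to the 16-byte-aligned thread-local
    BUF_16.inner[0] = 0;

    bool reproduced = false;

    // BUF_4 should still hold its initializer {2, 2, 2}
    for (int i = 0; i < 3; i++) {
        if (BUF_4.inner[i] != 2) {
            reproduced = true;
        }
        printf("%d, ", BUF_4.inner[i]);
    }
    printf("\n");

    if (reproduced) {
        printf("reproduced\n");
    } else {
        printf("not reproduced\n");
    }

    // Main loop
    while (aptMainLoop()) {
        gspWaitForVBlank();
        hidScanInput();

        u32 kDown = hidKeysDown();
        if (kDown & KEY_START)
            break;  // break in order to return to hbmenu

        // Flush and swap framebuffers
        gfxFlushBuffers();
        gfxSwapBuffers();
    }

    gfxExit();
    return 0;
}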
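
For comparison, here is a minimal host-side sketch of the same check that does not need libctru. It assumes ALIGN(m) behaves like GCC's __attribute__((aligned(m))) and uses uint8_t in place of u8; the helper names all_equal and check_thread_locals are made up for illustration. On a toolchain that lays out TLS correctly it should report no failures.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: libctru's ALIGN(m) behaves like this GCC attribute. */
#define ALIGN(m) __attribute__((aligned(m)))

typedef ALIGN(4) struct {
    uint8_t inner[3];
} Align4;

typedef ALIGN(16) struct {
    uint8_t inner[3];
} Align16;

static __thread Align4 BUF_4 = {.inner = {2, 2, 2}};
static __thread Align16 BUF_16 = {.inner = {1, 1, 1}};

/* Returns 1 if every byte in buf[0..len) equals expected, else 0. */
static int all_equal(const uint8_t *buf, int len, uint8_t expected) {
    assert(buf != 0 && len > 0);
    for (int i = 0; i < len; i++) {
        if (buf[i] != expected) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: returns 0 if both thread-locals still hold their
 * initializers and BUF_16 is 16-byte aligned, otherwise a bitmask:
 *   bit 0: BUF_4 is wrong, bit 1: BUF_16 is wrong, bit 2: BUF_16 misaligned. */
static int check_thread_locals(void) {
    int failures = 0;
    if (!all_equal(BUF_4.inner, 3, 2)) {
        failures |= 1;
    }
    if (!all_equal(BUF_16.inner, 3, 1)) {
        failures |= 2;
    }
    if (((uintptr_t)&BUF_16 % 16u) != 0) {
        failures |= 4;
    }
    return failures;
}
```

Calling check_thread_locals() before anything writes to the buffers gives a quick way to tell a TLS layout problem apart from a bug in the program itself.
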
From looking at objdump output, what I believe is happening (although I am no expert) is that the linker is generating incorrect offsets for the thread-local variables. An example use of the offsets, followed by the constant pool at the end of main, looks like this:

Code: Select all

  100d2c:	eb0013c3 	bl	105c40 <__aeabi_read_tp>
  100d30:	e1a03000 	mov	r3, r0
  100d34:	e59f2114 	ldr	r2, [pc, #276]	; 100e50 <main+0x148> ; example use of the offsets, in this case BUF_16
  100d38:	e3a01000 	mov	r1, #0
  100d3c:	e7c31002 	strb	r1, [r3, r2] ; BUF_16.inner[0] = 0
  100e4c:	e8bd8800 	pop	{fp, pc}
  100e50:	00000024 	.word	0x00000024 ; offset for BUF_16
  100e54:	00000014 	.word	0x00000014 ; offset for BUF_4
  100e58:	00121000 	.word	0x00121000
  100e5c:	00121008 	.word	0x00121008
  100e60:	0012100c 	.word	0x0012100c
  100e64:	00121018 	.word	0x00121018
The thread-local initializer data, on the other hand, looks like the dump below. If I understand correctly, it indicates that the offsets should be 0x1C and 0xC, respectively (including the 0x8 ARM thread-local offset). Hex-editing the offsets in the binary to 0x1C and 0xC results in the expected behavior.

Code: Select all

Contents of section .tdata:
 12af0c 00000000 02020200 00000000 00000000  ................
 12af1c 00000000 01010100                    ........        
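
To spell out the arithmetic: in the dump, the BUF_4 initializer (02 02 02) sits 0x4 bytes into .tdata and the BUF_16 initializer (01 01 01) sits 0x14 bytes in. The sketch below compares the offsets I expected with one hypothesis that happens to match the linker's numbers, namely that the 0x8 thread-local offset is rounded up to the TLS alignment before being added. That rounding rule is my assumption, not something I have confirmed, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The 0x8 ARM thread-local offset mentioned above. */
#define ARM_TCB_SIZE 0x8u

/* Round value up to the next multiple of align (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align) {
    assert(align != 0 && (align & (align - 1u)) == 0);
    return (value + align - 1u) & ~(align - 1u);
}

/* What I expected: fixed 0x8 offset plus the position within .tdata. */
static uint32_t expected_tp_offset(uint32_t tdata_offset) {
    return ARM_TCB_SIZE + tdata_offset;
}

/* Hypothesis only: the 0x8 offset is padded up to the TLS alignment first. */
static uint32_t padded_tp_offset(uint32_t tdata_offset, uint32_t tls_align) {
    return align_up(ARM_TCB_SIZE, tls_align) + tdata_offset;
}
```

With these, expected_tp_offset(0x4) and expected_tp_offset(0x14) give 0xC and 0x1C, while padded_tp_offset(0x4, 16) and padded_tp_offset(0x14, 16) give 0x14 and 0x24, the values in the constant pool. With an alignment of 8 the two calculations agree, which would fit the observation that ALIGN(8) or less avoids the problem.
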
The reason I suspect the linker is that the object file itself has all-zero offsets (I presume these get filled in when the object file's relocations are applied during linking):

Code: Select all

 140:	e24bd004 	sub	sp, fp, #4
 144:	e8bd8800 	pop	{fp, pc}
 148:	00000000 	.word	0x00000000 ; BUF_16
 14c:	00000000 	.word	0x00000000 ; BUF_4
 150:	00000000 	.word	0x00000000
 154:	00000008 	.word	0x00000008
 158:	0000000c 	.word	0x0000000c
 15c:	00000018 	.word	0x00000018
I'm hoping for any ideas about what might be causing this (am I accidentally creating UB or something?), or other workarounds to emit the proper offsets for thread-locals with alignment like this. I tried changing some ALIGN() directives in the 3dsx.ld linker script, but got inconsistent results and I'm a bit out of my depth when it comes to writing linker scripts, so I'm hoping someone here can point me in the right direction.

Site Admin
Posts: 1989
Joined: Tue Aug 09, 2005 3:21 am
Location: UK

Re: Aligned thread-locals initialized to incorrect value

Post by WinterMute » Thu Apr 20, 2023 3:00 pm

Apologies for not approving this post sooner.

This issue was fixed with

Thanks for the PR