
Timing problems with IRQs and graphics

Posted: Thu Sep 08, 2011 6:04 pm
by Dirbaio
I'm trying to code an MP3 player for the DS, Moonshell-like: you get a nice UI with smooth scrolling lists and such. MP3 streaming is working and the basic UI is done (a navigable, scrolling file list).

But I have a problem: calls to the MP3 player take much more than 1/60th of a second, so I can't animate the UI properly. I've tried modifying the MP3 decoder to decode smaller chunks at a time so it can keep up with the UI, but no luck, probably because doing things in smaller chunks makes the cache less effective.

So I thought of leaving the MP3 code in main() and updating the UI in the vblank IRQ handler. Here's what my code looks like right now (showing only the relevant stuff):

Code:

int inputKeysDown;
int inputKeysHeld;
int inputKeysRepeat;

touchPosition inputTouch;

stack<Scene*> scenes;

void vBlank()
{
	if(scenes.size() == 0)
	{
		fprintf(stderr, "[IRQ] No scene found!\n");
		return;
	}

	timerStart(2, ClockDivider_256, 0, NULL);
	
	fprintf(stderr, "[IRQ] entering\n");

	bgUpdate();

	scanKeys();
	touchRead(&inputTouch);
	inputKeysDown = keysDown();
	inputKeysHeld = keysHeld();
	inputKeysRepeat = keysDownRepeat();
	
	Scene* sc = scenes.top();
	sc->tick();
	sc->render();

	//Draw BG image
	setTexture(0);
	glPolyFmt(POLY_ALPHA(31) | POLY_CULL_NONE);
	GFX_COLOR = RGB15(31, 31, 31);
	glBegin(GL_QUAD);
	GFX_TEX_COORD = (TEXTURE_PACK(inttot16(0), inttot16(0)));
	glVertex3v16(0, 0, -600 );
	GFX_TEX_COORD = (TEXTURE_PACK(inttot16(256), inttot16(0)));
	glVertex3v16(256, 0, -600 );
	GFX_TEX_COORD = (TEXTURE_PACK(inttot16(256), inttot16(192)));
	glVertex3v16(256, 192, -600 );
	GFX_TEX_COORD = (TEXTURE_PACK(inttot16(0), inttot16(192)));
	glVertex3v16(0, 192, -600 );

	glEnd();

	glFlush(GL_TRANS_MANUALSORT);

	//2181 = TIMER_HZ / 256 / 60 = Number of time ticks occurring in 1/60th second
	int dd = timerTick(2)*100/2181; 
	printf("\x1b[11;1HCPU_IRQ:  %d %%        ", dd);

	printf("\x1b[13;1H3D Scanline Buffer: %d        ", (*(vu32*) 0x04000320) );
	printf("\x1b[14;1HVertex RAM: %d        ", GFX_VERTEX_RAM_USAGE);
	printf("\x1b[15;1HPolygon RAM: %d        ", GFX_POLYGON_RAM_USAGE);

	struct mallinfo inf = mallinfo();
	printf("\x1b[8;1HRAM  %d       ", inf.uordblks);
		
	fprintf(stderr, "[IRQ] exiting\n");

	fprintf(stderr, "[IRQ] Total time %d\n", dd);
	timerStop(2);
}

int main(int argc, char *argv[])
{
	(...)

	irqSet(IRQ_VBLANK, vBlank);
	irqEnable(IRQ_VBLANK);
	
	while(true)
	{
		if(globalPlayer)
			globalPlayer->update();
		(...)
		swiWaitForVBlank();
	}
	
	return 0;
}
I've kept the update() and render() functions as small as possible: only new items that come on screen are rendered. I've also checked that the (tiny) IRQ-mode stack does not overflow. It works flawlessly.

The CPU figure is measured relative to the 60 Hz screen refresh rate: 100% would mean the handler takes exactly 1/60th of a second. Usually it sits at 2%. When you scroll the list and it has to render some files, it goes up to 30% or so.
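The percentage described above is plain timer arithmetic. A minimal sketch of it follows; the names `kTicksPerFrame` and `framePercent` are made up for illustration, and the 33,513,982 Hz bus clock is an assumption (it is the value libnds defines as `BUS_CLOCK`), not something stated in the post.

```cpp
#include <cstdint>

// Assumption: DS hardware timers count at the 33.513982 MHz bus clock.
constexpr uint32_t kBusClockHz = 33513982;
constexpr uint32_t kDivider = 256;          // matches ClockDivider_256
constexpr uint32_t kFramesPerSecond = 60;   // nominal refresh rate

// Timer ticks that elapse in one nominal 1/60 s frame.
constexpr uint32_t kTicksPerFrame = kBusClockHz / kDivider / kFramesPerSecond;
static_assert(kTicksPerFrame == 2181, "same constant as in the posted code");

// Convert elapsed timer ticks into a percentage of one frame.
// The hardware timers are 16-bit, so ticks * 100 cannot overflow 32 bits.
constexpr uint32_t framePercent(uint32_t ticks) {
    return ticks * 100 / kTicksPerFrame;
}
```

With these numbers, a handler that runs for 654 ticks reports 29%, and one that runs for 2181 ticks reports 100%. Since the timer is 16-bit, the counter wraps after roughly 30 frames at this divider, so the reading is only meaningful for handlers that finish well before that.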

BUT there's ONE issue: sometimes the CPU usage *can* reach 100% (for example, when scrolling the list really quickly). Then everything stops working.

The IRQ handler is still called every frame, but it now always takes about 95% CPU, even if the list is not being scrolled. It "locks up" at 95% CPU. Also, the touch screen stops working for some reason. I can still navigate the list using the arrow keys, though. And of course, since IRQ mode now takes up 95% CPU, there's no CPU time left for the MP3 decoder and it hangs too :(

This bug is present both in DeSmuME and the real DS.

I think I know why it happens: I'm calling glFlush. According to GBATEK, glFlush locks up until the next vblank. If I remove the glFlush call, it doesn't lock up. Probably, if the CPU usage goes too high, glFlush is called too late and waits for the *next* vblank. Then the pending vblank IRQ is handled late too, which makes the next glFlush wait for the *next* vblank again :(

So, how can I prevent this from happening? Is there a reliable way to do this?

Re: Timing problems with IRQs and graphics

Posted: Thu Sep 08, 2011 7:22 pm
by elhobbs
I am not sure that using the 3d engine in a vblank is such a good idea. glFlush swaps the geometry buffers during vblank.

Are you sure that the decoder is the issue? I would think that loading the MP3 from disk would be a bigger issue. How are you doing this? Are you decoding with the whole file already loaded into main memory, or are you reading from disk as needed?

Re: Timing problems with IRQs and graphics

Posted: Thu Sep 08, 2011 7:58 pm
by zeromus
flush doesn't wait for the next vblank. But after flush is called, any subsequent 3D operation will freeze the system until the next vblank. Since your 3D code shouldn't run again until the next vblank anyway, this shouldn't be what freezes it up, unless your next vblank callback, which is stacked up, fires immediately after returning from the old one. Then you would get this kind of trouble.

Your approach is rickety and should be discontinued. But, if you insist, then I suggest trying to run it every other frame. Since you can't nail 60fps reliably, then give up and try to nail 30fps reliably. Just immediately return from every other vblank.
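The "return from every other vblank" suggestion can be sketched roughly as follows. This is a host-buildable illustration, not the poster's code: `runUiFrame` is a hypothetical stand-in for the real tick/render/glFlush work, and on the DS the handler would be the function passed to `irqSet(IRQ_VBLANK, ...)`.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-frame UI work (tick, render, glFlush).
static uint32_t g_uiFramesRun = 0;
static void runUiFrame() { ++g_uiFramesRun; }

// Counts every vblank the handler sees, skipped or not.
static uint32_t g_vblankCount = 0;

// Runs the UI on even-numbered vblanks only, giving a steady 30 fps
// and a budget of up to two frame periods per UI update.
void vBlankHalfRate() {
    const bool skip = (g_vblankCount & 1u) != 0;
    ++g_vblankCount;
    if (skip) {
        return;  // odd vblank: leave immediately, main() gets the CPU
    }
    runUiFrame();
}
```

The counter is incremented before the early return so that a skipped vblank still advances the phase; otherwise the handler would skip forever after the first odd count.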

Your drawing code should not take that long, though. I suggest you improve it, and then you won't have these problems. If you're using std::string for your drawing, consider how unsafe it is to allocate from the heap in an interrupt handler, which could have interrupted the user program at any point, perhaps during another heap allocation (I doubt the libnds allocators are thread safe). This is one of many reasons you should not do this kind of logic from an interrupt.

I think your explanation about smaller chunks using the cache less effectively is not very good. If your UI is as slow as you've described (which is your real problem), then the MP3 decoder won't have time to keep up, no matter what size chunks you use.
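One way to sidestep the heap concern is to keep text in fixed-capacity storage. The sketch below is only an illustration: `FixedLabel` is a hypothetical type, and nothing in the thread confirms that the UI code actually uses std::string.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity label. The characters live inside the object,
// so filling it never calls malloc/new and is safe with respect to the heap.
template <std::size_t N>
struct FixedLabel {
    static_assert(N > 0, "need room for the terminating NUL");

    char text[N];
    std::size_t length;

    FixedLabel() : length(0) { text[0] = '\0'; }

    // Copies at most N-1 characters from src and always NUL-terminates.
    // Longer input is silently truncated.
    void assign(const char* src) {
        length = 0;
        while (src[length] != '\0' && length + 1 < N) {
            text[length] = src[length];
            ++length;
        }
        text[length] = '\0';
    }
};
```

A `FixedLabel<8>` given "song.mp3" keeps "song.mp" (seven characters plus the terminator). Truncation is the trade-off for never touching the allocator; a file list would pick N to match what fits on one screen row.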

Re: Timing problems with IRQs and graphics

Posted: Thu Sep 08, 2011 8:23 pm
by Dirbaio
elhobbs wrote:I am not sure that using the 3d engine in a vblank is such a good idea. glFlush swaps the geometry buffers during vblank.

Are you sure that the decoder is the issue? I would think that loading the MP3 from disk would be a bigger issue. How are you doing this? Are you decoding with the whole file already loaded into main memory, or are you reading from disk as needed?
No, this is not the issue, because the freeze happens even if nothing is being played at all.
(I'm using libmpg123, which internally streams and decodes from file as you request chunks of decoded data. It even has ARM optimizations!)

zeromus wrote:flush doesn't wait for the next vblank. But after flush is called, any subsequent 3D operation will freeze the system until the next vblank. Since your 3D code shouldn't run again until the next vblank anyway, this shouldn't be what freezes it up, unless your next vblank callback, which is stacked up, fires immediately after returning from the old one. Then you would get this kind of trouble.
Yes, I think that's what's happening. It looks like libnds doesn't allow nested interrupts: the vblank IRQ handler is called with IME = 0. I think the hardware "delays" any interrupts until IME is 1 again.

zeromus wrote:Your approach is rickety and should be discontinued. But, if you insist, then I suggest trying to run it every other frame. Since you can't nail 60fps reliably, then give up and try to nail 30fps reliably. Just immediately return from every other vblank.

Your drawing code should not take that long, though. I suggest you improve it, and then you won't have these problems. If you're using std::string for your drawing, consider how unsafe it is to allocate from the heap in an interrupt handler, which could have interrupted the user program at any point, perhaps during another heap allocation (I doubt the libnds allocators are thread safe). This is one of many reasons you should not do this kind of logic from an interrupt.
Yes, you're right, it's rickety. I knew it from the moment I coded it. But is there any non-rickety, good way of doing this?

zeromus wrote:I think your explanation about smaller chunks using the cache less effectively is not very good. If your UI is as slow as you've described (which is your real problem), then the MP3 decoder won't have time to keep up, no matter what size chunks you use.
Yes it can. The UI is not that slow, because it doesn't re-render all the text every frame; it only draws new text. So it can exceed 100% CPU for a single frame once in a while, when doing things like switching folders or scrolling really fast. The MP3 engine does not have any problems because it uses a big buffer.

Re: Timing problems with IRQs and graphics

Posted: Fri Sep 09, 2011 11:52 am
by WinterMute
libnds allows nested interrupts just fine, you just need to set REG_IME=1 during any interrupt where you want to allow nesting. Defaulting to always allow nesting is problematic - amongst other things it makes it impossible to prevent nesting on interrupts where you might want to prevent this happening.

*printf isn't terribly fast and stdio in general isn't re-entrant; using this stuff in an interrupt context pretty much guarantees your code will die ... somewhere ... eventually.

Running input and output rendering in vblank isn't an awful idea, but you'll have to be extremely careful with the code that runs in the interrupt. Basically this means no stdio, no heap allocations and nothing that waits around too long for a response. 95% CPU is *way* over budget for a vblank handler; it needs to be more in the region of 30-40% tops.
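A common way to follow the "no stdio in the interrupt" rule is to have the handler store plain numbers and let the main loop do the formatting. The sketch below assumes that pattern; `publishStatsFromIrq` and `formatStatsInMainLoop` are hypothetical names, and the code is written to build on a host rather than taken from libnds.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Shared between interrupt and main loop: plain integers only.
static volatile uint32_t g_cpuPercent = 0;
static volatile uint32_t g_frame = 0;
static volatile bool g_statsDirty = false;

// Interrupt side: store the numbers and raise a flag. No stdio, no heap.
void publishStatsFromIrq(uint32_t cpuPercent) {
    g_cpuPercent = cpuPercent;
    g_frame = g_frame + 1;
    g_statsDirty = true;
}

// Main-loop side: format only when something changed.
// Returns true if text was written into out.
bool formatStatsInMainLoop(char* out, std::size_t outSize) {
    if (!g_statsDirty) {
        return false;
    }
    g_statsDirty = false;  // clear first, so a later update is not lost
    // The two reads below can straddle an interrupt; for a debug readout
    // that is cosmetic, but real code may want interrupts masked here.
    const unsigned cpu = static_cast<unsigned>(g_cpuPercent);
    const unsigned frame = static_cast<unsigned>(g_frame);
    std::snprintf(out, outSize, "CPU_IRQ: %u %% (frame %u)", cpu, frame);
    return true;
}
```

In the posted program, the main loop would call the formatting function once per iteration and print the result, so the slow console output competes with the MP3 decoder instead of eating into the vblank budget.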

Re: Timing problems with IRQs and graphics

Posted: Fri Sep 09, 2011 1:28 pm
by elhobbs
Am I missing something in regards to nested interrupts? Why would you want a vblank handler to be called again before it is completed? Wouldn't that be a problem?

Re: Timing problems with IRQs and graphics

Posted: Fri Sep 09, 2011 1:36 pm
by WinterMute
Nested interrupts mean that an interrupt handler can be interrupted, i.e. other pending interrupts can be serviced during the handler. Having a vblank handler interrupted by a vblank IRQ is obviously a bad idea and this needs to be avoided. Even if nesting isn't enabled, a vblank handler that takes more than a frame to complete will essentially consume 100% of CPU anyway.
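Putting the two points together, a handler that enables nesting can protect itself against its own interrupt with a busy flag. The following is a sketch under stated assumptions: `REG_IME_SIM` is a plain variable standing in for the real `REG_IME` register so the code builds off-target, `g_duringWork` is a hypothetical hook used only to simulate a nested vblank, and restoring the previous IME value on exit is a defensive choice rather than documented libnds behaviour.

```cpp
#include <cstdint>

// Stand-in for the hardware register; on the DS this would be REG_IME.
static volatile uint32_t REG_IME_SIM = 0;

static volatile bool g_inVBlank = false;
static uint32_t g_framesDone = 0;
static uint32_t g_framesDropped = 0;

// Hypothetical hook representing the UI work; a test can point it at
// something that fires a nested vblank in the middle of the handler.
static void (*g_duringWork)() = nullptr;

void vBlankGuarded() {
    if (g_inVBlank) {
        // The previous frame's handler is still running: drop this frame
        // instead of re-entering the UI code.
        ++g_framesDropped;
        return;
    }
    g_inVBlank = true;

    const uint32_t savedIme = REG_IME_SIM;
    REG_IME_SIM = 1;  // let other interrupts (timers, FIFO, ...) nest

    if (g_duringWork) {
        g_duringWork();  // tick/render would go here
    }
    ++g_framesDone;

    REG_IME_SIM = savedIme;
    g_inVBlank = false;
}
```

The guard does not make an over-budget handler cheaper; it only turns "vblank re-enters vblank" into a dropped frame, which matches the point that a handler longer than one frame eats the whole CPU regardless.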