Heavy calculations on nds original

roelforg
Posts: 13
Joined: Thu Sep 22, 2011 5:47 pm


Post by roelforg » Thu Sep 22, 2011 6:03 pm

Hello,
I'm new here and I'm not sure where to post :oops: ... (Move the topic if I made a mistake.)

I once made a Mandelbrot viewer for my PC and wanted to port it to my NDS.
I have already made some simple NDS programs and have years of C++ experience.
My problem is the time it takes to calculate on the NDS.
I was thinking and I thought:
"Well... the ARM9 takes a lot of time, but how about the ARM7? It sits mostly idle."
So, is it possible (without overheating the ARM7) to make an adapted ARM7 binary with a copy of the calculation algorithm?
I've read it's possible to use the FIFO to communicate between the two.
And I think it should be possible for the ARM7 to do half of the calculations to speed things up; the faster ARM9 then has more time to put things on screen and to sync them.
Can I do this without breaking/overheating the ARM7?

The scheme below gives a symbolic (no programming knowledge required) representation of the idea:

Scheme:

Code:

arm9:
--point1
now at pixel (x,y).
send(x,y)->arm7
x+1
y+1
--point 2
now at pixel(x,y)
calculate(x,y)
plot x,y to screen
x+1
y+1
check for reply from arm7, yes: plot to screen and goto point 1
else: goto point 2

arm7:
--point 1
message from arm9?
yes:
 -retrieve x,y
 -calculate
 -send x,y,value
do arm7 stuff
goto point 1
Help please.
We all ask nooby questions every once in a while.
Nobody's perfect, we all need help every once in a while.
Our questions might sound stupid to people but then, so do theirs to us.
So please, think before calling someone you don't know a noob.

mtheall
Posts: 211
Joined: Thu Feb 03, 2011 10:47 pm

Re: Heavy calculations on nds original

Post by mtheall » Thu Sep 22, 2011 6:44 pm

Splitting your algorithm across the two CPUs in this manner will probably be slower than just doing it directly on the ARM9, due to the overhead of inter-processor communication. I would first look into optimizing your code. There are two main things I would do first: compile in ARM mode (instead of Thumb), and set optimization to level 3 (-O3). So your Makefile would have the following changes:

Code:

ARCH  :=  -mthumb -mthumb-interwork -march=armv5te -mtune=arm946e-s
becomes

Code:

ARCH  :=  -marm -mthumb-interwork -march=armv5te -mtune=arm946e-s

Code:

CFLAGS  :=  -g -Wall -O2
becomes

Code:

CFLAGS  :=  -g -Wall -O3
These two things can give significant performance boosts. An alternative to the first Makefile change is to rename your source files from XXX.cpp to XXX.arm.cpp. This will cause only those .cpp files to be compiled as ARM code.

After this, you might be interested in finding ways to improve your algorithm. It is generally not faster to split the workload across the CPUs; I don't think there's really a native way to enforce synchronization.

zeromus
Posts: 212
Joined: Wed Mar 31, 2010 6:05 pm

Re: Heavy calculations on nds original

Post by zeromus » Thu Sep 22, 2011 7:45 pm

The time to calculate one Mandelbrot pixel may be more than the FIFO synchronization time, especially if you use floating-point maths. But since you haven't even made your Mandelbrot viewer on the ARM9 yet, it seems, you have no idea how fast it is running, making this a premature optimization. "Putting things on screen and syncing it" will probably take a trivial amount of time compared to the computations.

At the very least, you should issue larger blocks than 1px to the ARM7.

You're not going to overheat it.

mtheall
Posts: 211
Joined: Thu Feb 03, 2011 10:47 pm

Re: Heavy calculations on nds original

Post by mtheall » Thu Sep 22, 2011 8:05 pm

I think he has already run the code on NDS and has seen that it has room for improvement. I think that using -O3 and -marm are good first steps to seeing if the results are more acceptable, and I was suggesting to make optimizations to the algorithm after having tested using these options.

If it really came down to it and you really, really insisted on using the ARM7, I would send larger workloads than 1 pixel (maybe a workload of several lines, or hell, send it the whole workload you expect it to do). Additionally, I would keep in mind that the ARM9 runs at twice the clock speed of the ARM7, so it may make more sense to give 2/3 of the work to the ARM9 and 1/3 to the ARM7. Of course, this also means that you have to write two separate binaries (look at the arm7 and user fifo examples). Also keep in mind that the ARM7 code may optimize differently than the ARM9 code.
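The 2/3 to 1/3 split above can be turned into row counts. This is only a sketch: it assumes every row costs the same and that speed scales with the clock ratio, neither of which is strictly true for a Mandelbrot image. `splitRows` and the constant are hypothetical names, not part of libnds.

```cpp
#include <cassert>

constexpr int kScreenRows = 192;  // height of one DS screen in pixels

// Rows to hand to the faster core so both cores finish at roughly the same
// time. fastWeight and slowWeight are relative speeds (2 and 1 for a core
// that runs at twice the clock of the other). The result is rounded to the
// nearest whole row; the slower core gets the remainder.
inline int splitRows(int totalRows, int fastWeight, int slowWeight) {
    assert(totalRows >= 0 && fastWeight > 0 && slowWeight > 0);
    const int weightSum = fastWeight + slowWeight;
    return (totalRows * fastWeight + weightSum / 2) / weightSum;
}
```

With a 2:1 ratio, `splitRows(kScreenRows, 2, 1)` gives 128 rows to the ARM9 and leaves 64 for the ARM7. Rows near the set itself take longer than rows that escape quickly, so the weights would probably need tuning by measurement.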

WinterMute
Site Admin
Posts: 1986
Joined: Tue Aug 09, 2005 3:21 am
Location: UK

Re: Heavy calculations on nds original

Post by WinterMute » Thu Sep 22, 2011 11:28 pm

The first step in speeding up Mandelbrot on the DS is using fixed point and possibly the hardware divider. Floating point on the DS is done by software emulation, so it's not particularly fast. Building ARM code rather than Thumb will help a lot for this kind of thing.

Your algorithm for having the ARM7 handle calculations is likely to slow things down: working pixel by pixel and waiting for results isn't good. The algorithm would need to be parallelised so the ARM7 can work on part of the screen while the ARM9 handles the rest, but bear in mind that the ARM7 only has 96K available for code, data and stack space. This sort of thing is fairly advanced and I'd prefer to see code that works on the ARM9 before going down this road.
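To make the fixed-point suggestion concrete, here is a minimal sketch of an escape-time loop using integers only. The 20.12 format (12 fractional bits) and all names are assumptions chosen for illustration; a real viewer would need more fractional bits to zoom in any distance.

```cpp
#include <cstdint>

// Assumed 20.12 fixed-point format: 12 fractional bits in a 32-bit integer.
using fixed = std::int32_t;
constexpr int kFracBits = 12;
constexpr fixed kOne = 1 << kFracBits;

// Convert a double to fixed point (setup code only, not for the inner loop).
constexpr fixed toFixed(double v) {
    return static_cast<fixed>(v * kOne);
}

// Multiply two fixed-point values through a 64-bit intermediate.
// Relies on >> being an arithmetic shift for negative values, as on GCC.
inline fixed fixedMul(fixed a, fixed b) {
    return static_cast<fixed>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}

// Escape-time iteration count for c = cx + i*cy.
// Returns maxIter when the point has not escaped within maxIter steps.
inline int escapeTime(fixed cx, fixed cy, int maxIter) {
    const fixed four = 4 * kOne;
    fixed x = 0, y = 0;
    for (int i = 0; i < maxIter; ++i) {
        const fixed xx = fixedMul(x, x);
        const fixed yy = fixedMul(y, y);
        if (xx + yy > four) return i;  // |z|^2 > 4: the point has escaped
        const fixed xy = fixedMul(x, y);
        x = xx - yy + cx;              // real part of z^2 + c
        y = 2 * xy + cy;               // imaginary part of z^2 + c
    }
    return maxIter;
}
```

The inner loop contains no division and no floating point, so nothing here falls back to software float emulation.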

Nothing you do in code is going to overheat or break the arm7.
Help keep devkitPro toolchains free, Donate today

Personal Blog

zeromus
Posts: 212
Joined: Wed Mar 31, 2010 6:05 pm

Re: Heavy calculations on nds original

Post by zeromus » Fri Sep 23, 2011 12:05 am

If it runs fast enough that offloading a share of it to the ARM7 even makes sense as an optimization, then you're close to the goal anyway. Just optimize it by a factor of 2 and you'll have nailed it.

roelforg
Posts: 13
Joined: Thu Sep 22, 2011 5:47 pm

Re: Heavy calculations on nds original

Post by roelforg » Fri Sep 23, 2011 5:58 am

I don't have much time so:
I use the escape-time algorithm, so it has to be done pixel by pixel.
I'm already using -O3.
And the ARM9 doesn't wait for the ARM7.
I just wanna know if this can block the IRQs and other stuff the ARM7 does.

WinterMute
Site Admin
Posts: 1986
Joined: Tue Aug 09, 2005 3:21 am
Location: UK

Re: Heavy calculations on nds original

Post by WinterMute » Fri Sep 23, 2011 2:26 pm

I don't think you quite understand what we're saying here.

This procedure you outlined in your first post describes offloading *all* the calculations to the arm7 and sending co-ordinates for each pixel, leaving the arm9 to just send/receive data and plot pixels.

Code:

arm9:
--point1
now at pixel (x,y).
send(x,y)->arm7
x+1
y+1
--point 2
now at pixel(x,y)
calculate(x,y)
plot x,y to screen
x+1
y+1
check for reply from arm7, yes: plot to screen and goto point 1
else: goto point 2

arm7:
--point 1
message from arm9?
yes:
 -retrieve x,y
 -calculate
 -send x,y,value
do arm7 stuff
goto point 1
In effect what you're doing here is just adding extra steps for each pixel and performing the calculations on a slower processor. Logically this will perform at less than half the speed of what you're doing now, since the ARM7 is clocked at half the speed of the ARM9. Plotting pixels is a minuscule part of the time taken for this code, so it makes no sense to leave the ARM9 almost completely idle while the ARM7 does all the heavy lifting.

After looking up the escape time algorithm I don't see anything that indicates a dependence on other calculated values so there's no reason to send anything for every single pixel. It should be relatively straightforward to break this down into a procedure where each processor takes half the screen so all you really need to tell the arm7 is the start co-ordinate and let it send color data for each pixel back to the arm9. The libnds FIFO API will let you install a callback handler on the arm9 which should just plot a pixel, leaving the arm9 free to get on with calculating the colors for the other half of the screen.
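One way to picture the per-pixel callback described above is to pack each result into a single 32-bit message. This is a hedged sketch that builds on a PC. The bit layout and names are assumptions, and the handler only mirrors the "value plus user pointer" shape that a libnds 32-bit FIFO value handler is expected to have; no libnds calls are made here.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kWidth = 256;   // DS screen width in pixels
constexpr int kHeight = 192;  // DS screen height in pixels

// Sender side: pack one finished pixel into a 32-bit message.
// Assumed layout: bits 31..24 = x, bits 23..16 = y, bits 15..0 = colour.
inline std::uint32_t packPixel(int x, int y, std::uint16_t colour) {
    assert(x >= 0 && x < kWidth && y >= 0 && y < kHeight);
    return (static_cast<std::uint32_t>(x) << 24) |
           (static_cast<std::uint32_t>(y) << 16) | colour;
}

// Receiver side: unpack the message and plot the pixel. userdata is assumed
// to point at a kWidth * kHeight framebuffer of 16-bit pixels.
inline void onPixelMessage(std::uint32_t value, void* userdata) {
    auto* framebuffer = static_cast<std::uint16_t*>(userdata);
    const int x = static_cast<int>(value >> 24);           // always 0..255
    const int y = static_cast<int>((value >> 16) & 0xFF);
    if (y >= kHeight) return;  // ignore malformed messages
    framebuffer[y * kWidth + x] = static_cast<std::uint16_t>(value & 0xFFFF);
}
```

Because the handler does nothing but a store, the receiving CPU spends almost all of its time on its own half of the image. One message per pixel is still a lot of FIFO traffic, so sending a whole row per message may be worth trying.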

I'm really not sure how much more detail I can get into without just writing the code for you which kind of defeats the object of the exercise.

mtheall
Posts: 211
Joined: Thu Feb 03, 2011 10:47 pm

Re: Heavy calculations on nds original

Post by mtheall » Fri Sep 23, 2011 3:42 pm

WinterMute, I don't think you read his code correctly. In the ARM9 section: Point 1 is sent to the ARM7, and then Point 2 is calculated on the ARM9. Afterward, it checks for a response from the ARM7. Then, it plots both points.

roelforg: Just because the algorithm goes pixel-by-pixel doesn't mean you can't describe a larger job for the ARM7 to do. Instead of sending just a pixel to work on, you can send a bulk of pixels (e.g. half of the screen like WinterMute suggested) to be worked on and have the ARM7 dump the results into a buffer, which the ARM9 can later read from and plot onto the screen. Or go the FIFO callback route.
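The "bulk job plus buffer" idea can be sketched as a job descriptor and a worker loop. `RowJob` and `computeRows` are hypothetical names. On real hardware the buffer would have to live in memory both CPUs can reach, and the notification and cache handling are not shown.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kWidth = 256;  // DS screen width in pixels

// Hypothetical job descriptor: one contiguous band of screen rows.
struct RowJob {
    int firstRow;
    int rowCount;
};

// Worker side (the ARM7 in the scheme above): evaluate every pixel in the
// band and store the results row by row in out, which must hold at least
// job.rowCount * kWidth bytes. perPixel is any callable (x, y) -> value,
// for example an escape-time function; results are truncated to 8 bits.
template <typename PerPixel>
void computeRows(const RowJob& job, std::uint8_t* out, PerPixel perPixel) {
    for (int row = 0; row < job.rowCount; ++row) {
        const int y = job.firstRow + row;
        for (int x = 0; x < kWidth; ++x) {
            out[static_cast<std::size_t>(row) * kWidth + x] =
                static_cast<std::uint8_t>(perPixel(x, y));
        }
    }
}
```

The ARM9 would send one `RowJob`, compute its own rows in the meantime, and read the whole buffer once the worker signals completion. A 64-row band at one byte per pixel is 16 KB, which matters given how little memory the ARM7 has to itself.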

roelforg
Posts: 13
Joined: Thu Sep 22, 2011 5:47 pm

Re: Heavy calculations on nds original

Post by roelforg » Fri Sep 23, 2011 6:28 pm

Mtheall is half right.
He's right about the ARM9 doing processing too,
but it doesn't wait on the ARM7: no message from the ARM7 will just cause the ARM9 to go on calculating and check again next time.

It's a slow calc anyway.
On my 3 GHz quad-core PC it takes 30 s to calc a 256x256 surface, so I wanted to make use of the NDS's second CPU's ability to be controlled separately.
