When looking at CV1000 Blitter performance, one thing I noticed was that behavior differs heavily between games, and some games slow down heavily even though their Blitter operation lists are not long.
The main offender here is Akai Katana, which slows down very heavily in Spirit mode while reflecting bullets. For some PCB footage, here’s an old video I made.
This slowdown happens even though few draws are taking place, which indicates that it’s related to the CPU. I went ahead and verified this.
When there’s no slowdown and the game runs at full speed, we can see the following behavior.
For the captures below, the signals from top to bottom are:
- SH-3 BUS Clock (CKIO)
- IRQ2 (VSYNC)
- BREQ (Blitter requests operations)
- CS3 (RAM CS)
- CS4 (U2, EEPROM, Audio CS)
- CS6 (Blitter command CS)
- VRAM CS
First here’s a non-slowdown section.
When a VSYNC pulse triggers, Blitter operations kick off. These finish well ahead of the next VSYNC. When looking at CS3 pulses, we can see that these become much less frequent a while before VSYNC. This is when the CPU has finished executing game logic and is waiting for the next frame. Since logic finished before a new pulse, things are fine here.
Now let’s look at that chunky Akai Katana slowdown.
The main difference here is that the CPU is actively doing work for longer than a single frame, as can be seen from CS3 activity continuing past the next IRQ2 pulse. This means that the CPU will ignore that interrupt, and then start spin-waiting for a new one once it is done. This causes one frame of slowdown.
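To make the arithmetic concrete, here is a minimal sketch of that overrun behavior. It assumes the game only starts a new logic frame on a VSYNC, as the captures suggest. The function names and the example timings are made up for illustration and are not measurements from the board.

```python
import math

def vsyncs_consumed(logic_time_ms: float, frame_period_ms: float) -> int:
    """VSYNC periods used by one game-logic frame.

    If the logic overruns a VSYNC, that interrupt is missed and the CPU
    spin-waits until the next one, so the cost rounds up to whole periods.
    """
    return max(1, math.ceil(logic_time_ms / frame_period_ms))

def effective_speed(logic_time_ms: float, frame_period_ms: float) -> float:
    """Game speed relative to full speed (1.0 means no slowdown)."""
    return 1.0 / vsyncs_consumed(logic_time_ms, frame_period_ms)

# Hypothetical numbers: a 16 ms frame period, logic taking 10 ms or 20 ms.
print(vsyncs_consumed(10.0, 16.0))  # logic fits in one frame
print(vsyncs_consumed(20.0, 16.0))  # logic overruns into a second frame
print(effective_speed(20.0, 16.0))
```

Even a small overrun costs a whole frame under this model, which is why the slowdown shows up as the game dropping to half speed rather than running slightly slow.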
So what does that mean?
For some games like Akai Katana, getting the Blitter timing accurate is not going to do anything for accuracy in emulation.
What is instead needed is work on the SH-3 emulation.
The current MAME emulation of the CPU does not support the wait states introduced by RAM accesses that are either uncached or result in cache misses. The actual cache behavior is described in depth in the SH-3 datasheet and should be rather simple to implement, but getting the wait-state handling in place seems pretty hard (especially with dynamic recompilation enabled).
There are also wait states for accesses to things like U2, Audio, EEPROM, …, but compared to RAM these don’t seem to be a big deal.
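As a rough illustration of what such wait-state accounting could look like, here is a small set-associative cache model with LRU replacement that returns extra cycles per access. This is only a sketch and not MAME code. The `CacheModel` class, its geometry defaults, and the `miss_penalty` value are placeholders I picked for the example, and the real numbers and replacement policy need to come from the SH-3 datasheet and the board's bus configuration. It also charges uncached accesses and cache misses the same penalty, which is a simplification.

```python
from collections import OrderedDict

class CacheModel:
    """Toy set-associative cache that reports wait cycles per access."""

    def __init__(self, num_sets=128, ways=4, line_bytes=16, miss_penalty=8):
        # All defaults are placeholder values, not verified SH-3 figures.
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.miss_penalty = miss_penalty
        # One ordered dict per set; insertion order tracks LRU state.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr: int, cacheable: bool = True) -> int:
        """Return the extra wait cycles caused by one memory access."""
        if not cacheable:
            return self.miss_penalty  # uncached access always goes to RAM
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # hit: mark as most recently used
            return 0
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return self.miss_penalty

cache = CacheModel()
stall_cycles = sum(cache.access(a) for a in range(0, 64, 4))
print(stall_cycles)  # total extra cycles for a short sequential read
```

In an interpreter these cycles could simply be added to the instruction's cost. With a recompiler the penalty depends on cache state that is only known at run time, which is where it gets hard.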
For some other CV1000 games the Blitter timing does matter, though, so both parts need to be right to get something that truly behaves like the actual boards.