Some more fun Akai Katana slowdown pics

As an extra footnote to the CPU Slowdown post about Akai Katana, here are two more fun pictures showing other things that can happen.

First, there are actually some occasions where Blitter-induced slowdown matters. This happens very rarely in Akai Katana, but sometimes when there are a lot of draws going on, the Blitter processing takes more than a frame, and we can see the CPU repeatedly doing ready checks by pulsing CS6.

As mentioned earlier, this is very infrequent in this game, and almost all slowdown is just CPU being slow.

The other fun thing is that the CPU often has so much work to do that it can’t even finish within two frames, which means several frames of slowdown. The example below shows CPU processing and waits causing two extra frames of slowdown, but more can happen too.

It really feels like this was rushed out without much thought given to performance.


CV1000 CPU Slowdown investigated

When looking at CV1000 Blitter performance, one thing I noticed was that behavior seemed to differ heavily between games, and on some games, despite not seeing long Blitter operation lists, there is heavy slowdown.

The main offender here is Akai Katana, which will have very heavy slowdown when in Spirit mode, reflecting bullets. For some PCB footage, here’s an old video I made.

This slowdown occurs even though not many draws are happening, which indicates that it’s related to the CPU. I went ahead and verified this.

When there’s no slowdown and the game runs at full speed, we can see the following behavior.

For captures below, the signals from top to bottom are:

  • SH-3 BUS Clock (CKIO)
  • IRQ2 (VSYNC)
  • BREQ (Blitter requests operations)
  • CS3 (RAM CS)
  • CS4 (U2, EEPROM, Audio CS)
  • CS6 (Blitter command CS)

First, here’s a non-slowdown section.

No slowdown.

When a VSYNC pulse triggers, Blitter operations kick off. These finish well ahead of the next VSYNC. When looking at CS3 pulses, we can see that these become much less frequent a while before VSYNC. This is when the CPU has finished executing game logic and is waiting for the next frame. Since logic finished before a new pulse, things are fine here.

Now let’s look at that chunky Akai Katana slowdown.


The main difference here is that the CPU is actively doing work longer than a single frame, as can be seen by looking at CS3 working past the next IRQ2 pulse. This means that the CPU will ignore the interrupt, and then start spin-waiting for a new one. This causes one frame of slowdown.
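To make the frame arithmetic concrete, here is a minimal sketch of the behavior described above: game logic that runs past a VSYNC has to wait for the following one, so its duration is effectively rounded up to whole frames. The function names and the roughly 60 Hz frame period are assumptions for illustration, not values measured on the board.

```python
import math

# Assumed frame period in microseconds (~60 Hz). This is a placeholder,
# not a figure measured from a CV1000 PCB.
FRAME_US = 16_667


def frames_consumed(cpu_work_us, frame_us=FRAME_US):
    """Number of VSYNC periods one iteration of game logic occupies.

    If the CPU is still busy when a VSYNC arrives, that interrupt is missed
    and the CPU spin-waits for the next one, so the work is rounded up to a
    whole number of frames.
    """
    return max(1, math.ceil(cpu_work_us / frame_us))


def slowdown_frames(cpu_work_us, frame_us=FRAME_US):
    """Extra frames spent beyond the single frame a full-speed iteration takes."""
    return frames_consumed(cpu_work_us, frame_us) - 1
```

With these assumed numbers, 20 ms of CPU work costs one extra frame and 40 ms costs two.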

So what does that mean?

For some games like Akai Katana, getting the Blitter timing accurate is not going to do anything for accuracy in emulation.

What is instead needed is work on the SH-3 emulation.

Current MAME emulation of the CPU does not support the wait states introduced by RAM accesses that are either uncached or result in cache misses. The actual cache behavior is described in depth in the SH-3 datasheet and should be rather simple to implement, but getting the wait-state handling in place seems pretty hard (especially with dynamic recompilation enabled).

There are also wait states for stuff like U2, Audio, EEPROM, … accesses, but compared to RAM these don’t seem to be a big deal.
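As a rough illustration of why missing wait states matter, here is a toy cost model for RAM accesses. The cycle counts are placeholders picked for the example and are not taken from the SH-3 datasheet; the only point is that uncached accesses and cache misses pay an extra penalty that an emulator without wait states never charges.

```python
# Placeholder costs in bus cycles. These are NOT datasheet values; they only
# illustrate the shape of the accounting.
HIT_CYCLES = 1
RAM_WAIT_CYCLES = 8


def access_cost(cacheable, cache_hit):
    """Cycles charged for one RAM access in this toy model."""
    if cacheable and cache_hit:
        return HIT_CYCLES
    # Uncached accesses and cache misses both go out to RAM and stall.
    return HIT_CYCLES + RAM_WAIT_CYCLES


def total_cycles(accesses):
    """Sum the cost of a sequence of (cacheable, cache_hit) accesses."""
    return sum(access_cost(cacheable, hit) for cacheable, hit in accesses)
```

An emulator that ignores wait states effectively charges `HIT_CYCLES` for every access, so game logic appears to finish earlier than it would on the PCB.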

For some other CV1000 games, the Blitter timing does matter though, so getting both parts right is needed to get something that is truly behaving similar to the actual boards.


Research into CV1000 Blitter performance and behavior

I’ve spent some time in December looking into CV1000 Blitter behavior to figure out how it performs in terms of slowdown. I feel I have a good understanding of how it works now, and have put together a doc describing it.

View/Download it here: CV1000_Blitter_Research_by_buffi.pdf

Why do this?

The current simulation of this Blitter in MAME is quite impressive as a high-level reproduction, but there doesn’t seem to have been much time spent researching the timing of operations.

This document aims to describe how the behavior and timing of the Blitter actually work, so that people can use it to make something that’s mostly accurate.

Also, it is very fun to attach a Logic Analyzer to a PCB and figure out how it works.

Preemptively Answered Questions

Q: But what about tuning Blitter Delay in MAME?
A: Trying to tune the existing Blitter Delay slider in MAME doesn’t really make any sense, since the slowdown introduced from it doesn’t have anything to do with how it works on real hardware. It’s still arguably better than no slowdown at all, which used to be the other option, but that’s about it.

Q: Will this make CV1000 emulation run with proper slowdown?
A: Probably not really. While this should make it possible to have the Blitter part of emulation more accurate, there’s still no emulation of SH-3 Wait States, which means that slowdown caused by the CPU stalling on those waits and not finishing its processing before VBLANK will still not be accurate. I have no idea how much this matters for most games.

Q: How much work is it to implement this?
A: It should be very simple. And the simplest thing to do would be:

  • Rip out all the existing Blitter delay logic.
  • When sending a Command to start Blitter Operations, estimate the time they will take to compute.
  • Don’t return “Ready” for the Ready Requests until that time has passed.

This still doesn’t reflect how it performs on real hardware (where Operations are running concurrently with the CPU, and requesting new Operations when the existing ones are done executing), but in practice I don’t think that should really matter in terms of experienced gameplay performance.
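The three steps in the list above can be sketched roughly like this. It is a simplified model rather than MAME code: the class name, the time units, and the `estimate_op_time` callback are all hypothetical, and the actual per-operation estimate would have to come from the timings in the research document.

```python
class BlitterTimingSketch:
    """Toy model: the Blitter reports "not ready" until an estimated finish time."""

    def __init__(self, estimate_op_time):
        # Hypothetical callback: takes one operation, returns its estimated
        # duration in the same time units as `now`.
        self._estimate_op_time = estimate_op_time
        self._busy_until = 0

    def start(self, now, operations):
        """Called when the CPU sends the command that kicks off the operation list."""
        total = sum(self._estimate_op_time(op) for op in operations)
        self._busy_until = now + total

    def is_ready(self, now):
        """Answer to a Ready Request from the CPU."""
        return now >= self._busy_until
```

A CPU polling `is_ready` would then keep spinning until the estimated time has elapsed, which is what produces the slowdown when the operation list takes longer than a frame.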


If you have feedback on the document, or suggestions for further work, please reach out to me on the Arcade-Projects forums, GitHub, or in the comments on this blog.