Bought a fancy Tigard JTAG adapter instead of the USB Blaster clone I’ve been using.
Need to figure out how to bump performance further, but even at default low speeds in urjtag, it seems like a good improvement.
When I converted this Mushi to Pink Sweets Suicide Club, I fucked up a bit and used bad blocks on U2, which is causing minor sprite glitches now. Going to see if I can fix it easily with JTAG so I don’t have to swap the IC again.
Disclaimer: This is mostly written for myself. It will be hard to follow. Maybe someone will find it interesting anyways.
Following up on the earlier posts on CV1000 research, I’ve been looking at making a patch for MAME implementing this, but got a nice report from a tester that the current implementation I did had weird amounts of delay after the second midboss in Pink Sweets. This was mostly due to a silly mistake in my code, which was good to find, but when looking at the actual delays from a PCB running Pink Sweets, I still seemed to be off by quite a bit.
This caused me to spend another week or so figuring out what was up, and I first made some sort of interesting observations regarding clipping:
Sprites that would be drawn entirely outside of the visible area when clipping is enabled will not cause any memory copies in the Blitter (this I had already expected, from looking at the operations of Espgaluda 2)
Since the Blitter still needs to read these operations into its operations FIFO, this means that if many “invisible draws” are read in a sequence, the Blitter will be idle, causing delay.
Draws that have some visible section, but cross the boundaries of the visible area, will happily write outside of the visible area.
Additionally, I realized that the simplified calculations described in my PDF writeup were faulty. While the general thinking and memory layout were described correctly, write alignments may also cause additional delays. This will be described further below as well.
Sprites fully drawn outside of the visible area
Pink Sweets uses an interesting drawing method in sections with scrolling backgrounds featuring “wavy patterns”, where it sets up parts of the background in a separate area in VRAM before copying it to the visible buffer. These copies are done as many 1×324 pixel draws, with offsets varying by up to 4 pixels to generate the waves.
These draws will also be performed outside of the visible clipping area, and by instrumenting MAME I could see this type of fully invisible draw happening 80 times per frame in a single sequence.
This will look pretty interesting in a logic analyzer attached to a PCB. In the image below, the “BACK” pulses on the bottom row are the Blitter reading operations from SRAM. Each pulse will read 64 bytes of data, which means that a total of 1600 bytes are read. Since each Draw operation is 20 bytes, this maps exactly to the 80 “invisible” Operations.
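As a quick sanity check of the numbers above, here is a tiny Python sketch. It only uses the figures from the text (64 bytes per read pulse, 20 bytes per Draw operation, 1600 bytes in total). The pulse count is derived, and assumes every pulse reads a full 64 bytes.

```python
BYTES_PER_READ = 64      # bytes fetched per "BACK" pulse, per the capture
BYTES_PER_DRAW_OP = 20   # size of a single Draw operation
TOTAL_BYTES = 1600       # total bytes read in the sequence

# Number of read pulses, assuming each pulse reads a full 64 bytes.
read_pulses = TOTAL_BYTES // BYTES_PER_READ
# Number of Draw operations that fit in the data read.
draw_ops = TOTAL_BYTES // BYTES_PER_DRAW_OP

print(read_pulses, draw_ops)  # prints "25 80"
```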
This means two things in terms of delay simulation:
Draws fully outside of the visible area should not be calculated.
… but if a Main RAM access contains nothing to draw, the bus access time will need to be added to calculations (in this case about 17.5us after subtracting the Horizontal line read), since the Blitter will still be considered busy when waiting for things to Draw.
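The two points above could be sketched roughly like this in Python. This is not the MAME code, just an illustration. The names `Rect`, `is_fully_outside` and `fetch_delay_us` are made up for this example, the clip rectangle is whatever the game has configured, and `draw_cost_us` is a placeholder for a real per-draw estimate. The 17.5us figure is the one measured for this particular case.

```python
from typing import Callable, NamedTuple, Sequence


class Rect(NamedTuple):
    """Hypothetical rectangle type: top-left corner plus size, in pixels."""
    x: int
    y: int
    w: int
    h: int


def is_fully_outside(draw: Rect, clip: Rect) -> bool:
    """True if no pixel of the draw lands inside the clip area."""
    return (draw.x + draw.w <= clip.x or draw.x >= clip.x + clip.w or
            draw.y + draw.h <= clip.y or draw.y >= clip.y + clip.h)


def fetch_delay_us(draws: Sequence[Rect], clip: Rect,
                   draw_cost_us: Callable[[Rect], float],
                   idle_fetch_us: float = 17.5) -> float:
    """Delay for the draws read in one Main RAM access.

    Fully invisible draws cost nothing, but if the whole access had
    nothing to draw, the bus access time itself is counted instead.
    """
    visible = [d for d in draws if not is_fully_outside(d, clip)]
    if not visible:
        return idle_fetch_us
    return sum(draw_cost_us(d) for d in visible)
```

Partially visible draws are deliberately treated as visible here, since they still result in VRAM writes.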
Sprites partially drawn outside of the visible area
The waves in the Pink Sweets background are generated by modifying the start offset along a curve with amplitude 4. Most of these draws will start slightly below the visible area.
As an example, a draw may be (note that X/Y are rotated, due to TATE):
This means that the write to (X=414,Y=128) and (X=415,Y=128) will not be in the visible area. Looking at a logic analyzer output of this write, it is however still written. The image below has some glitches in the signals, but is annotated for clarity.
1: Read 32 pixels of source data, nicely aligned in the same VRAM row.
2: Read 2 pixels of destination for the VRAM row of the invisible data.
3: Write the 2 invisible pixels to that VRAM row.
4: Read the remaining 30 pixels of destination data.
5: Write the 30 pixels of destination data.
(In practice, this will actually read 4 and 32 pixels instead, since all operations work at 4 pixels per VRAM CLK, but that doesn’t matter much).
Since each additional VRAM row access has significant overhead, this means that there’s some wasted work going on here… and it gets worse!
Turns out my earlier simplified calculations were wrong. Oops.
In my large writeup earlier, I had the following calculation for Draw operations.
This has two major issues. First it assumes that the number of pixels being read from source data, and the number of pixels read+written to destination are equal. As described in the section “Writes are always done four pixels at the time, to offsets evenly divisible by four“, this is not true. There can be four pixels of overhead per line, depending on alignment.
Secondly, this assumes that each VRAM row written to, will only be written to once. Depending on alignment, this may also be false. If drawing a sprite with X_SIZE=64, Y_SIZE=1 to position X=16,Y=0, the following sequence will happen:
32 bytes are read from source
16 bytes are read and written to VRAM=0
16 bytes are read and written to VRAM=1
32 bytes are read from source
16 bytes are read and written to VRAM=1
16 bytes are read and written to VRAM=2
Every read+write to a destination VRAM row will add 35 CLK of overhead, so even though only 3 rows are written to here, the total access overhead will be for 4 VRAM row accesses. The actual calculation should be something like:
Calculating the number of VRAM accesses can be done iteratively, but there’s probably a nice formula for it as well, that I’m too tired to figure out right now.
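For reference, the iterative approach could look something like the Python sketch below. It only covers a single line (Y_SIZE=1), and it assumes what the example above implies: source data is read in chunks of 32, and a destination VRAM row is 32 wide in X. The function name and parameters are made up for this example, and only the 35 CLK overhead per access is modelled, not the actual transfer time.

```python
ROW_ACCESS_OVERHEAD_CLK = 35  # overhead per destination VRAM row read+write


def count_dest_row_accesses(dst_x: int, x_size: int,
                            chunk: int = 32, row_width: int = 32) -> int:
    """Count destination VRAM row accesses for one line of a draw.

    Each source chunk is written out separately, so a VRAM row that is
    straddled by two chunks gets accessed twice.
    """
    accesses = 0
    offset = 0
    while offset < x_size:
        n = min(chunk, x_size - offset)
        first_row = (dst_x + offset) // row_width
        last_row = (dst_x + offset + n - 1) // row_width
        accesses += last_row - first_row + 1
        offset += n
    return accesses


# The X_SIZE=64 draw to X=16 from the example: 4 accesses over 3 rows.
accesses = count_dest_row_accesses(dst_x=16, x_size=64)
print(accesses, accesses * ROW_ACCESS_OVERHEAD_CLK)  # prints "4 140"
```

With the destination aligned to X=0 instead, the same draw only needs 2 accesses.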
The difference alignments can make…
Finally, to show how big the difference in Draw latency can be depending on how the destination is aligned in VRAM rows, both of the pictures above show the first 32 bytes of data being written for a 324×1 pixel part of the background. Image one is aligned well with destination VRAM rows, while the second image is off by 2 pixels.
As an extra footnote to the CPU Slowdown post about Akai Katana, here are two more fun pictures, showing other stuff that can happen.
First, there are actually some rare occasions where Blitter induced slowdown can matter. This happens very rarely in Akai Katana, but sometimes when there’s a lot of draws going on, the Blitter processing takes more than a frame, and we can see the CPU repeatedly doing ready checks by pulsing CS6.
As mentioned earlier, this is very infrequent in this game, and almost all slowdown is just CPU being slow.
The other fun thing is that the CPU often does enough work that it doesn’t even have time to process it in two frames, which means several frames of slowdown. The example below shows CPU processing and waits causing two extra frames of slowdown, but more can happen too.
It really feels like this was rushed out without much thought given to performance.
When looking at CV1000 Blitter performance, one thing I noticed was that behavior seemed to differ heavily between games, and on some games, despite not seeing long Blitter operation lists, there is heavy slowdown.
The main offender here is Akai Katana, which will have very heavy slowdown when in Spirit mode, reflecting bullets. For some PCB footage, here’s an old video I made.
This slowdown comes despite not a lot of draws happening, which indicates that it’s related to CPU. I went ahead and verified this.
When there’s no slowdown and the game runs at full speed, we can see the following behavior.
For captures below, the signals from top to bottom are:
SH-3 BUS Clock (CKIO)
BREQ (Blitter requests operations)
CS3 (RAM CS)
CS4 (U2, EEPROM, Audio CS)
CS6 (Blitter command CS)
First here’s a non-slowdown section.
When a VSYNC pulse triggers, Blitter operations kick off. These finish well ahead of the next VSYNC. When looking at CS3 pulses, we can see that these become much less frequent a while before VSYNC. This is when the CPU has finished executing game logic and is waiting for the next frame. Since logic finished before a new pulse, things are fine here.
Now let’s look at that chunky Akai Katana slowdown.
The main difference here is that the CPU is actively doing work longer than a single frame, as can be seen by looking at CS3 working past the next IRQ2 pulse. This means that the CPU will ignore the interrupt, and then start spin-waiting for a new one. This causes one frame of slowdown.
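Put differently, the game logic for one update always eats a whole number of frames. A minimal Python sketch of that relationship is below. The function name is made up, and the frame time used in the example is a nominal 60 Hz figure, which is an assumption rather than the measured CV1000 refresh rate.

```python
import math


def frames_consumed(cpu_work_us: float, frame_us: float) -> int:
    """Frames used by one game update, given how long the CPU works.

    If the work runs past an interrupt, that interrupt is missed and the
    CPU spin-waits for the next one, so the result is rounded up.
    """
    return max(1, math.ceil(cpu_work_us / frame_us))


FRAME_US = 1_000_000 / 60  # assumed nominal frame time, not measured

for work_us in (10_000, 20_000, 40_000):
    used = frames_consumed(work_us, FRAME_US)
    print(work_us, used, used - 1)  # work, frames used, frames of slowdown
```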
So what does that mean?
For some games like Akai Katana, getting the Blitter timing accurate is not going to do anything for accuracy in emulation.
What is instead needed is work on the SH-3 emulation.
Current MAME emulation of the CPU does not support the wait states introduced by RAM accesses that either are uncached, or result in cache misses. The actual cache behavior is described in-depth in the SH-3 datasheet and should be rather simple to implement, but getting the wait-state handling in place seems pretty hard (especially with dynamic recompilation enabled).
There’s also wait states for stuff like U2, Audio, EEPROM, … accesses, but compared to RAM this doesn’t seem to be a big deal.
For some other CV1000 games, the Blitter timing does matter though, so getting both parts right is needed to get something that is truly behaving similar to the actual boards.
I’ve spent some time in December looking into CV1000 Blitter behavior to figure out how it performs in terms of slowdown. I feel I have a good understanding of how it works now, and have put together a doc describing it.
The current simulation of this Blitter in MAME is quite impressive as a high-level reproduction, but there doesn’t seem to have been much time spent researching the timing of operations.
This document aims to document how the behavior and timing of the Blitter actually works, and people can utilize this to make something that’s mostly accurate.
Also it is very fun to attach a Logic Analyzer to a PCB and figure out how it works.
Preemptively Answered Questions
Q: But what about tuning Blitter Delay in MAME?
A: Trying to tune the existing Blitter Delay slider in MAME doesn’t really make any sense, since the slowdown introduced from it doesn’t have anything to do with how it works on real hardware. It’s still arguably better than no slowdown at all, which used to be the other option, but that’s about it.
Q: Will this make CV1000 emulation run with proper slowdown?
A: Probably not really. While this should make it possible to have the Blitter part of emulation more accurate, there’s still no emulation of SH-3 Wait States, which means that slowdown caused by the CPU not having time to finish processing before VBLANK will still not be accurate. I have no idea how much this matters for most games.
Q: How much work is it to implement this?
A: It should be very simple. The simplest thing to do would be:
Rip out all the existing Blitter delay logic.
When sending a Command to start Blitter Operations, estimate the time they will take to compute.
Don’t return “Ready” for the Ready Requests until that time has passed.
This still doesn’t reflect how it performs on real hardware (where Operations are running concurrently with the CPU, and requesting new Operations when the existing ones are done executing), but in practice I don’t think that should really matter in terms of experienced gameplay performance.
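The simple approach listed above could be sketched like this. It is written in Python just to show the idea, not as MAME code. The class and method names are made up, and `estimate_us` stands in for whatever per-Operation timing estimate ends up being used.

```python
from typing import Callable, Iterable


class BlitterDelayModel:
    """Hypothetical model: Ready checks fail until the estimated time passes."""

    def __init__(self) -> None:
        self.busy_until_us = 0.0

    def start_operations(self, now_us: float, operations: Iterable[object],
                         estimate_us: Callable[[object], float]) -> None:
        """Called when the CPU sends the Command that starts the Operations."""
        total = sum(estimate_us(op) for op in operations)
        self.busy_until_us = now_us + total

    def is_ready(self, now_us: float) -> bool:
        """Called for each Ready Request from the CPU."""
        return now_us >= self.busy_until_us


# Usage: three Operations estimated at 100us each, started at t=0.
blitter = BlitterDelayModel()
blitter.start_operations(0.0, ["op1", "op2", "op3"], lambda op: 100.0)
print(blitter.is_ready(250.0), blitter.is_ready(300.0))  # prints "False True"
```

Everything is estimated up front in one go here, which matches the simplification described above rather than the concurrent behavior of the real hardware.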
If you have feedback on the document, or suggestions for further work, please reach out to me on Arcade-Project forums, Github or in comments on this blog.
Some Cave game difficulty rankings have been going around the Twitterverse recently, so thought I’d post my opinions and list, with some motivations behind the entries.
This is how I’d rank the difficulty of most Cave games in one credit. Videos of me clearing them are attached for all titles.
This does not cover the more difficult modes of the games, so no Mushi Ultra, SDOJ EX or similar.
This is for 1-ALL for multi-loop games
Entries are grouped into five difficulty groups.
Entries within groups are pretty similar in difficulty, but somewhat ordered by hardest (top) to easiest (bottom). This is much more vague than the overall grouping though, since the difficulty will be similar-ish.
Some comments on why they are placed where they are are attached.
Dangun Feveron (B-Roll or C-Roll) Very hectic game that’s easy to mess up in. Don’t go for TLB, and time out last boss (it takes a while). Still hard.
Dodonpachi Saidaioujou (A-Shot) Overall high difficulty. Last stage is really hard, and it’s best to try and save bombs for it. Ideally no-miss to S4 midboss extend, which requires pretty strict routing.
Ketsui (Type A) You get a lot of extends and bombs in this, but even considering this, a lot of routing is needed for the last three stages. The bosses are quite hard. The lock-on system rewards quick lock ons for survival which means strict routing.
Ibara (Bond Type C) The harder bosses can be skipped by saving up Hadou charges, but you still need to deal with the last two stages, which are hard unless you manage the rank. This means intentionally dying a lot, and correctly. Requires a good amount of routing, and Garegga-esque rank management.
Dodonpachi Saidaioujou Exa Label (D-Shot) Overall easier than regular SDOJ if keeping rank low, but still hard. Some boss/midboss patterns are significantly easier.
Guwange (Any character, Gensuke is easiest) Most of the difficulty comes after full life refill in last stage. Getting to it requires some work, but is easier than the rest of the game. Later part of last stage is hard to no-miss, and you want to have some resources for last boss. Final boss pattern is real random and can mess you up unless you have some life to tank it.
Guwange Blue Label (Any character, Gensuke is easiest) Overall easier stages due to bullet slowdown on Shikigami, but some patterns are harder (especially phase 1 of final boss). About the same difficulty to clear.
ESP Ra.De. (Yusuke or J.B.) Last two stages require good routing. S4 boss is a little hard. Final boss has a lot of phases and is tricky.
Ibara Kuro (Dio Type D) No need to control rank like in regular Ibara. Some easy graze safespots for setting up score for extends. Still need to deal with stage parts of last two stages, but still an easier clear than vanilla.
Mushihimesama Futari 1.0 Original (N Palm) Plays similar to 1.5, but Ab Palm is garbage, and N Palm is a better option. S4 boss has a pattern with broken safespots that needs bombing. Last boss is also harder. Overall a much harder clear.
Akai Katana Exa Label (Type C) If keeping rank low, the patterns aren’t too bad. Just like in Shin, you can’t just reflect bullets on hard patterns, so you need to learn them, and bomb the trickier parts. Type C’s Spirit attack is extremely powerful and can melt hard boss phases like S6 boss phase 2.
Akai Katana Slash/Shin (Type B) Much harder than regular Akai Katana, due to Spirit form not reflecting bullets, and additional stage. Still not too bad. Worst patterns can be bombed, and the sword attack does good damage. Type B is real strong in this.
Progear no Arashi (Bolt + Nail) Recommend going for 11M extend. That gives a good amount of resources, which helps with the clear. Not much to say otherwise. Medium difficulty patterns, where the hardest can be bombed.
Espgaluda 2 (Asagi survival route) Tateha/Ageha is very hard, but Asagi can “cheat” due to her Kakusei being so strong. Save gems for bosses, then just melt them. Allows basically skipping last boss. Still needs routing and resource control though.
Muchi Muchi Pork (Any character, but Rafute is easiest) Some hard patterns, but you get 10+ lives pretty easily, and the bomb is strong, and can be used often. This allows for many mistakes while still clearing.
Pink Sweets (Kasumi B, infinite lives glitch) Without infinite lives, this game is extremely hard. Setting up infinite lives requires strict routing and careful play, especially on S3 midboss. I’d argue it’s similar in difficulty to many other clears.
Dodonpachi DOJ Black Label (B-L) Needs some routing, but you get enough resources in terms of hyper and bombs to skip a lot of the harder parts. Some early game scoring is sufficient to hit all score extends.
Picked up a Batsugun PCB, as partial trade for my Ibara Kuro. Looking forward to trying this out. My cab is currently in horizontal mode, so will have to rotate it first though. Might play some more horizontal games for now first.
One of my goals for this year is to sell games for more money than I spend, which means getting rid of some of the pricier items I own. This includes my Ibara Kuro PCB that I bought last year. This has gotten so expensive now that it’s hard to motivate keeping it, and I’ve finally sold it and shipped it away to its new owner today.
Ibara Black Label (also known as Ibara Kuro) is a remake of Ibara that plays nothing like the original game. Instead of playing like a typical Yagawa shmup, you have a dynamic rank system which increases with medal pickups, and resets on bombs and big cancels. In addition, there’s a multiplier that increments when grazing enemy bullets, which is not a system you’d typically see in Cave games.
Since the PCB is very rare and there’s so far no ports of this game on any non-arcade platform, the price of a PCB has shot up by a lot. While it’s a cool and unique game, I don’t think I’ll play it too much more (I prefer regular Ibara), and I’d rather spend that money elsewhere.
1CC Video and basic strategy
For character selection, I like Bond Type D for his speed and bomb. I’m not sure how much of a difference it makes in practice.
Stage 1: My execution of the initial scoring is pretty flawed, but start by setting up a Hadou Gun, and build a medal chain to increase your rank. Then later on, move to the top left corner and graze the big stream of bullets, to build a big counter for a cancel, to get some points towards an extend. There’s another nice stream of bullets you can set up later on as well (watch the video). No specific strats for the boss. Don’t be afraid to bomb. No missing this stage shouldn’t be a huge problem.
Stage 2: I tend to use 5 Way for most sections with small enemies, and rockets for the larger ones. I don’t do any impressive scoring at all on this stage, but just focus on keeping the medal chain alive. The trickiest section to route is the part before the two flame towers. I recommend just replicating the video for that. On the boss, start by staying above the first phase and taking it out that way, similar to regular Ibara. For phase two, follow a Hadou Gun up the right side of the screen to build rank from medals, and then park yourself in the safe spot right below the health bar. This will allow milking a lot of points towards an extend.
Stage 3: Rockets feel quite strong here. Try to keep medal chain going and start getting a lot of bombs. I’ll use a hadou gun or two towards the later trains, but like to reach the boss with full bomb meter. For the boss, I’ll hadou it once when it gets too unmanageable, and that should be enough.
Stage 4: For the extend ship, I start by damaging its right side with a hadou gun activation that I fire on the left side. Then I can sit and safe spot it above the right bullets while building multiplier for some free points towards extend. 5 Way is very strong in this stage, and should basically be used at all times until the big end ships, which Napalm works better against. Try to be at max bombs for the boss. The boss itself is pretty easy in phase 1, but will likely require a hadou gun for its second phase, which has very dense patterns.
Stage 5: For the first section, small movements with rockets are key. You can use the same trick with double hadou guns to get full bomb pickups from the tanks as in big Ibara. Just place them in a way so that they’ll destroy both sides of the tanks’ tracks, without hitting the tanks themselves. 5 Way is very strong at the later parts of the stage since there’s so many small enemies. For the boss, I quick kill phase 1 by placing a hadou gun shot at the edge of its sprites. It then has some pretty silly safe spots for phase 2, which trivializes that part of the fight (see video). You can milk this quite a bit if you want. The last phase is total bullshit and will need two hadou gun shots.
Stage 6: 5 Way is once again very strong. I trigger a pretty late third extend here in my video. Ideally I’d have it earlier, but three extends is what I’d typically end up with on my route. For the boss, I just rely on my hadou gun shots for anything that looks scary. In my clear, I get really really awkward hadou gun shots, since hitting the enemies that the boss spawns will do very little damage. This causes me to have to do some pretty silly dodges. If you can instead tag the boss in its last phase with two of them, it should be enough.
Pink Sweets is easily one of Cave’s hardest games, but on PCB there’s an infinite lives glitch that can be triggered, which trivializes the rest of the game.
This requires getting 4 extends without dying, which is not much easier than a typical Cave 1CC. I got this a while ago, but finally took the time to record a quick commentary on how to trigger the glitch, if others are curious.
Extends spawn when you destroy 2500 enemies or destructible bullets, and then kill an enemy. This means you want to try to destroy as many destructible bullets as possible, especially on midbosses and bosses. This requires pretty careful planning.
On a good Stage 1 run, you should be able to get the zan counter to about 2000, triggering the first extend early on stage 2. If you trigger the first extend later in Stage 2, that’s not a huge problem since you can make up some on the Stage 2 boss, where I don’t care much about going for zan in this run.
S3 midboss is the make or break section of the run, since it’s very easy to mess it up, and it will sometimes give you random movements which makes it hard to trigger the deaths of side capsules when you want it.
For MAME practice, you can use the following Lua script to display the zan counter when playing, which simplifies routing. This will obviously not be helpful when playing for real on a PCB later as in the video above.
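-- Put this in pinkswts.lua
cpu = manager:machine().devices[":maincpu"]
mem = cpu.spaces["program"]
s = manager:machine().screens[":screen"]
-- Redraw the counter every frame (assumes a MAME version that has emu.register_frame_done)
function draw_zan()
  cnt = string.format("CNT : %d", mem:read_i16(0x0c4a3ab6))
  s:draw_text(225, 3, cnt)
end
emu.register_frame_done(draw_zan, "frame")
-- And run game with mame64.exe pinkswts -autoboot_script pinkswts.lua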
For completion’s sake, here is my full 1CC using the glitch, but the rest of the run isn’t very exciting.
The game itself is pretty interesting. It plays sort of like Yagawa’s earlier games, but is much less forgiving, and rank control isn’t as doable since you no longer get point extends.
Going for a non-infinite lives 1CC is very hard, and seems easiest by completely skipping item pickups and playing very carefully. It seems unlikely that I’ll get back to doing that, since this isn’t really one of my favorite games.
Other than playing arcade games, I also enjoy messing around with the hardware, and I’ve finished up a few small CV1000 related projects recently.
U13 CPLD Replacement
I reverse engineered the behavior of the U13 CPLD, and wrote a compatible bitstream that can be programmed to EPM7032 CPLDs. This allows repairing boards where the internal flash has gone bad. An in-depth description of this is available at the project page: https://github.com/buffis/cv1k_research/tree/main/U13_Research
Ideally, it would be better to use the original bitstream of the CPLD, but it’s read protected, and no public dumps are available, so this is the second best thing.
JTAG of CV1000 PCBs
I figured out how to use the JTAG port of the PCBs to easily read/write the prog rom (U4) as well as the EEPROM. U2 reads/writes are also possible, but not recommended since they require some pretty sketchy bitbanging.
This allows for very simple dumping of U4, as well as upgrading to bugfix releases.