There are several approaches to figuring out encryption / compression schemes, all involve deep analysis of the data before and after the decoding step.
Unless the system is very simple you won't get far by just looking at the (usually very few) patterns available. In fact it's pretty easy to come up with some hypothesises that will turn out to be false later and will derail the whole research, taking up days to weeks to disprove. This is what accounts for most of the time, and all the failures tend to be quite discouraging. This is the pure analytical approach. Andreas Naive does statistical analysis, it's easier in the sense that it gives you some hints and directions of what to try. However, to even attempt that you need to have the knowledge, tools and experience - things that I lack :) You also need to have plenty of test data if the results are to have any meaning.
We got a bit lucky with M2/M3 protection. I still have no idea how Andreas managed to identify the encryption so fast and reverse it (though if I was to hazard a guess then I suppose the work he did on Atomiswave carts helped a bit - it's not the same system but bears resemblance). I explained what I know about NAOMI to him and then we had some brainstorming (which was mostly him shooting down my silly suggestions), and eventually came up with some ideas on how to generate the test patterns he needed. I modified my dumper tools to output the data.
I should explain that M3 turned out to be just a variant of M2 that uses cart's own RAM buffer. This is also the major weakness of this system that we exploited. See, it's easy to say "generate test data", but how exactly do you present your own patterns to the decoder if all it can do is access read-only memory? Well, one way would be to inject the pattern into EPROM chip that serves as game program carrier. It's possible but you're limited by available space (up to some 4MBs), while erasing and reprogramming takes a lot of time. Not to mention EPROMs have rather limited number of erase cycles they can go through (about 100) before errors start to show up. Obviously you'll also need all the necessary equipment to do the job in the first place :)
Another problem was that M2/M3 is actually position-dependant. In other words, the same value at address A and A+n will decode to different result. Data is decrypted in 16-bit words, so to fully test just one address location I'd have to reprogram the EPROM over 65 thousand times. Not really my idea of fun.
But, as I said, there is a RAM buffer on the cart. We figured it will help us crack M3 and then we'll worry about M2. Having the ability to actually upload data to the cart and read back the result reduced the hardware testing times to mere minutes. Obviously Andreas needed very specific patterns but that was just a few changes in the program code on my side. This also allowed the tests to be run on pretty much every cart out there (not just those in my possesion) without resorting to any hardware modifications. It's thanks to this feature that we were able to crack the encryption keys on most games so quickly.
This was M2/M3. M1 is completly different system. The complexity of the encryption is nowhere near that of M2/M3 so it's much weaker from cryptographic point of view - but it took much more work and many more months of head scratching to figure it out. Kinda funny, now that I think about it, that I identified and named the methods used on NAOMI carts as M1 through M3, and it was M3 to go down first and M1 last. Well, we have M4 now too :)
Anyway, M1 didn't work the same way M2/M3 did. I had only one cart, with only one encrypted sequence on it, so not exactly a lot of data to work on. Trying to "decrypt" other ROM areas only produced confusing results. What's worse, trying to decrypt data from the EPROM area gave completly different and incoherent results. The only way to inject custom patterns into decoder was to spoof a ROM chip.
Soon it became obvious why EPROM approach doesn't work - on this cart type ROMs are paired and read two at once, to produce 2*16 = 32 bits of data for each address location. EPROM output is only 16 bits wide - and while cart compensates for that during normal data transfers, decryption engine just treats the other 16 bits as pulled up. Thus completly changing any patterns we'd want to test. I had to create a device that would make up for the lack of any writable memory on the cart. That's where Altera Cyclone II FPGA comes in again :)
Just to sum this up, I needed 32 bits for data, tristatable and preferably bidirectional for logic analyzer. Add at least 16 bits of address to that, and some additional control signals (chip selects mostly). Tons of soldering. To add insult to injury the ROMs are 5V and FPGA inputs can handle 3V3 tops (some people abuse the protection resistors and claim 5V tolerance but it's a bit too pricey chip for me to fry just because I'm lazy). I had to add additional overvoltage protection circuitry, that would still allow me to have the bus bidirectional. The final result was this wire mess:
It allowed me to to test a few things and establish a working theory or two. I figured that the data is clearly XORed with some unknown pattern, to obfuscate what looked like a dictionary-based compression system. It's not like I didn't do any testing at all without the FPGA, but only now I could clearly verify my hypothesises. My initial guess was that the XOR mask is rather short, up to 8 bytes, perhaps different length on every cart using M1. I also suspected that some of that pattern actually "leaks out" to the ouput when you try to decode streams full of zeroes or ones. That later turned out be true but quite frankly it's not predictable enough to make use of this weakness. I started collecting M1 data from people who dumped carts with my tools and had some more stuff to look at. I also got GG2K cart from Roberto Malone, which he bought specifically to aid M1 cracking process. At some point I ecountered an output sequence that did not look cyclic - this scared me somewhat, for it was now possible that the XOR mask is generated on-the-fly by decoder with some pseudo-random sequencer. Now this would be a real show-stopper. I guess I need to thank SEGA for not doing that :)
Still, figuring out the mask sitting on top of compression was a very tedious task. I didn't know much about the compression itself at the time, and fighting on two fronts is never easy. Also, after M2/M3 success I was trying to think strategically - if I could simplify the discovery method than I could use other people and their carts to aid me in my research. So far I'm the only one who actively worked on NAOMI anti-piracy protection methods. Well, perhaps some other guys tried too, but kept it secret since their objectives were more... money related let's call it :) In the end I had a bright idea that worked out pretty nicely. Take a look at this photo:
What you see here is Actel FPGA (not to be confused with the one I was using) that does everything on the cart: addressing, reading and buffering, DMA and also M1 decryption. All that inside one, small black box. But it does have a weakness, namely it lacks necessary amount of memory for all that functionality. So, there are two smaller chips below it: On the left there's a 9-bit asynchronous FIFO queue, 256 entries long, that serves as ouput buffer for DMA mode. 9th bit is unused, and rather than have two of those chips (external cart data bus is 16-bit wide) it stores words as byte pairs. It's fast enough to do that, it's latency is less than 20ns. On the right there is a SRAM chip. It's pretty fast too, just 15ns, and has 32k x 8 capacity, though closer inspection will reveal that only A0-A6 address lines are used. That limits it to 128 usable cells, and once you know that the compression dictionary can have only up to 111 entries... rings any bells yet?
The bright idea was to tap into the address lines of the SRAM chip to monitor dictionary usage. This would greatly help with the compression part so that I could focus more on XOR mask. Since it's just 7 address lines, 8 data ones and two control signals, I decided to wire it up completly. It helps this chip is 3V3 so no need for the overvoltage protection, the transient snubbing resistors on my FGPA board will suffice. Imagine my surprise when I started up logic analyzer project and saw that the data stored in SRAM is already de-XORed... What a find :) It took just few minutes and a piece of paper to figure out the complete XOR mask for Oh! My Goddess cart and learn it's only 4-byte long. A bit more time passed and a prototype decoder was ready and working PC-side.
So I went ahead and soldered some wires to GG2K cart as well. This is pretty clean compared to OMG, wouldn't you say. And some screenshots from SRAM usage monitor for two different input patterns:
This time the XOR mask was also easy to calculate but then the decoder failed to produce expected result. It only got two first bytes (one output word) right. A moment later I discovered that there are two modes of operation, one direct and one based on substraction. OMG uses substraction mode and only dictionary data (and not even all of it), while GG2K uses direct mode and has some data bytes stored in the compression command stream. A few fixes and the code was working in all cases. Mystery solved. Time elapsed... eh, let's say months. Obviously not spent entierly on this problem but I kept poking it and thinking about it all the time.
It's possible that M1 would be broken sooner if I had GG2K instead of OMG to work on - but that is just another theory, one that is never going to be verified :) Fact is I was extremly lucky to get just 3 carts and happen to hit all protection methods (M1 for OMG, M2 for DOA2 and M3 for CVSSNK).
No NAOMI carts were harmed during research or making of this document.
And a small post scriptum, yes, I know that MVSC2 does not work well on T12/7 for some people. I even know why. I did say stick to T12/5 in case of problems. I also confirm that second player input will not work since I forgot to add one line of code. Happens. Stop bugging me about it.