They see me rollin'...

A decision has been made: there will be a short production run of GDEMU soon, as in a few weeks. After that, once I can tell how many people are actually going to pay for it, I will decide whether to continue or not. More details will follow once I get everything sorted out.

In other news, my testers told me about some recent Dreamcast dumps that appear to be broken on GDEMU, so I investigated.
It seems a group, or groups, started making GD-ROM dumps using PC drives with the so-called "swap method". I did that once too; apart from slightly different "audio offsets", nothing prevents such dumps from working, if done correctly. And that's the key word here. The group(s) in question decided they know better and changed the tracks' positions and lengths, to "fix" things I guess. The worst part is that they've shoved all their changes into the .gdi files, so the typical user has no way of knowing whether a dump is proper or not.

GD-ROM images made using the GD drive in a Dreamcast are not perfect either, but frankly no CD image format is, or ever will be, unless we store the raw EFM stream (assuming we can even read it glitch-free, which is debatable). But small flaws can be corrected if the method is good in principle and the changes are always consistent. For GDIs it means you need to adjust for the missing pregaps on data tracks (not really needed) and for the starting address of audio tracks. My guess is the new dumps were supposed to overcome that, but instead they are plain broken.

As an example, let's take a look at ChuChu Rocket!, the PAL version. The new .gdi file says this for tracks 3 and 4:

3 45000 4 2352 "ChuChu Rocket! (Europe) (En,Ja,Fr,De,Es) (Track 03).bin" 0
4 355602 0 2352 "ChuChu Rocket! (Europe) (En,Ja,Fr,De,Es) (Track 04).bin" 0

The track 3 file is 730535904 bytes long, which is 310602 raw sectors. Since it's missing the pregap I mentioned, the actual length of the track is 150+310602=310752.
Let's add this length to its starting address to see where it ends: 45000+310752=355752. See the problem? Track 4 starts at 355602, which is less than 355752. This creates an impossible situation where the tracks overlap.

But wait, what if we treated this new image just like the old dumps and applied the usual fix for audio tracks? Then it'd be 355602 plus the 150 fix, which equals 355752, and it checks out. Yay! Maybe this is the correct way after all. Let's investigate further:

18 505295 0 2352 "ChuChu Rocket! (Europe) (En,Ja,Fr,De,Es) (Track 18).bin" 0
19 505819 4 2352 "ChuChu Rocket! (Europe) (En,Ja,Fr,De,Es) (Track 19).bin" 0

Track 18 is audio, 1232448 bytes long, which makes it 524 "sectors". If we add the fix, it starts at 505295+150=505445 and ends at 505445+524=505969. Oops. Track 19 has already started, at 505819... Now the end of track 18 overlaps with the track 19 pregap.

The funny thing is, track 18 overlapping track 19 will still kinda work, since no code ever tries to access the pregap, so the collision will never be discovered. But this tells you a lot about the quality of this dump. Can it be fixed? Maybe. The question is, why break it and fix it rather than just use a known good dump?

I rest my case.

The Last of the Prototypes

Well, actually this is a test model. Though it was born as a v4 prototype, it ended up working so well that I've bumped its status, kinda like with Evangelions #00 and #01 :)

2014-02-22 GDEMU_v4_1

And these two photos are courtesy of one of the testers:

2014-02-22 GDEMU_v4_2

2014-02-22 GDEMU_v4_3

So, in short: does it work? It does, no problems so far, with proper CD Audio support and all that. Will I be selling it? Probably not, because it's hand-made and bloody expensive. Then again, I could use some extra money, so a limited run is possible.

Also, sorry for not responding to comments and private messages, but since I first showed the GDEMU PCBs I've been spammed with tons of stupid questions. I have neither the time nor the patience to answer them, so I now basically ignore the inbox.

Condition red

Spot differences between two pictures:

The first one was taken on a system with Catalyst 13.1 WHQL and the second one on 13.2 beta. A day wasted on fixing this particular "bug". And before someone points out "This is what you get for using beta drivers": there is a reason I need to be on 13.2 or newer.
  • Current Music
    Progressive rock

A little help

Dave Murphy from devkitPro is trying to raise some money for his project. Please go visit this web page and see if you can help.

devkitPro is a great dev platform for all Nintendo DS homebrew, and also for other consoles that I'm not really that familiar with: GBA, GP32, PSP, GC, Wii. It's a very nice set of free tools and libraries for Windows, and we should support it because there really aren't all that many of those.

In fact I'm using it myself for ARM-related stuff, including the GDEMU project. The VMS for NDS was built using devkitPro, for example. There is some work being done to add SH4 family support as well, so that should also be interesting to the Dreamcast homebrew community.
  • Current Music
    Blackmore's Night

GPU computing

Thanks to an article on SemiAccurate I learned about a new AMD gadget called, wait for it, Gizmo. You can see it here.
Gizmo was most likely inspired by the success of the Raspberry Pi, as dev boards like this existed before but were never this cheap, or even available to the general public. Let's compare it to the RPi and Intel products then :P

- It's a PC (even runs Windows 7 since that's how AMD guys measured the silicon temperatures under stress).

First, it needs a 3V "button" lithium battery, which is mandatory but apparently not part of the kit. In fact it has to be a battery with wires and a small plug, like in laptops, so forget about buying it at TESCO. Tsk.
Then you'll need a SATA hard drive or SSD, so again forget about cheap SD cards. I suppose a CF card with an IDE-to-SATA adapter might do the trick if you don't need performance.
Lastly, it will obviously need more power than a USB phone charger can provide, much more. The good news is it will accept anything from 9 to 24 volts, so it can be run on a 12V lead-acid (car) battery, for example.

So, compared to RPi it's not really that great for small projects. It's on par with some of the Intel N-series Atom ITX boards, like D945GSEJT or DN2800MT. Its form factor places it somewhere between RPi and ITX.

- It needs cooling (unless used in a lab environment).

While the board is passively cooled, the docs clearly state that this is only enough at 25C ambient temperature and without a case. If you want to put it in a case or use it at higher temperatures, you'll need to add a fan to the CPU heatsink. There is a fan connector on the PCB for that purpose, though I would've liked bigger heatsinks.
The CPU itself is rated at 6.4W, but there's also the companion chip (the "south bridge") to consider. The VRM section is also going to generate some heat, but I assume it can deal with it in most situations.

Again, a win for the RPi and possibly also for the two Atom boards I mentioned, since these will work in a case if there is enough convection. I've seen fanless cases for these Atom boards, so it can be done. Obviously though, it depends a lot on where that case will be put :) It might work in an air-conditioned room but not in summer heat otherwise. It's not black and white here.

- It's an APU.

And now we're talking. It's not that much smaller than an ITX board and possibly runs hotter, so does it have any good sides to it? Yup, the computing power available.
It's a dual-core, fully out-of-order AMD64 CPU clocked at 1GHz. That might not look very impressive compared to the 1.86GHz N2800 Atom, which is also 64-bit capable and dual-core, with Hyper-Threading to boot, but Atoms are an in-order architecture. It turns out it's difficult to write code that doesn't choke in-order CPUs. The compilers are to blame, although some code (semi-random branching, for example) is just not predictable enough to optimize properly.
The APU is not just a CPU though, there's also the GPU next to it: a Radeon HD 6250 in this particular case, with 80 shaders clocked at 280MHz.

So why exactly is a measly mobile GPU, the lowest AMD has to offer, such a win? Because its 80 shaders make up 1 compute unit (CU), and you can use it for other stuff than just driving the VGA output.

To make a point here I've run some tests. My code was trying to brute-force crack an M4-type encryption key from dumped NAOMI data. These keys are only 32 bits long and the encryption algorithm is not even that complicated once you see it. Again, thanks to Andreas Naive for making "obvious" things actually obvious to us mere mortals :)
I wrote a cracker in C that, given a key, decodes 8 bytes of data and compares them with a known pattern to check for a match. To scan the entire key space you need to run this code 4294967296 (2^32) times. A typical, simple approach would be to write a cracking procedure that takes a key value as an argument, and then a loop that calls this procedure 2^32 times, checking the result. Here's how long it takes:

* Intel Core2 Duo E6600
- 1 core @ 2400MHz (2nd core not used)
- full out-of-order architecture
- Windows 7 Professional 64-bit
- 64-bit code (MinGW64 4.5.3 -O2)
+ 415s

* AMD Athlon XP processor 1700+
- 1 core @ 1466.909MHz
- full out-of-order architecture
- Debian Linux, 2.6.32 kernel
- 32-bit code (gcc 4.4.5 -O2)
+ 937s

* Intel Atom N270
- 1 core @ 1596.095MHz (HT not used)
- in-order architecture
- Debian Linux, 2.6.32 kernel
- 32-bit code (gcc 4.4.5 -O2)
+ 2799s

* Raspberry Pi ARM11
- 1 core @ 900MHz (O/C, core @ 450MHz, SDRAM @ 450MHz)
- ARMv6 architecture
- Raspbian Linux, 3.2.27 kernel
- 32-bit code (gcc 4.6.3 -O2)
+ 3378s

As you can see it takes some time, and the in-order Atom and the RPi ARM are especially bad at it. And my RPi is overclocked: the typical values are 700MHz for the CPU, 250MHz for the core and 400MHz for SDRAM, so in reality it's even worse. Obviously you don't want to run crackers on your small dev board, but what if this was face/shape recognition based on images from a small camera on a robot? That does seem like a plausible use case.

Now there's this thing called OpenCL, which lets you distribute computation-heavy tasks over multiple CPU cores and also GPU compute units. I used the same cracker, except the main loop was thrown out and replaced by the OpenCL framework. Here's how it went:

* Intel Core2 Duo E6600
- 2 cores @ 2400MHz
- full out-of-order architecture
- Windows 7 Professional 64-bit
- OpenCL code (AMD APP 2.6)
+ 105s

* AMD/ATI Radeon HD 5770
- 10 compute units @ 850MHz
- GPU architecture
- Windows 7 Professional 64-bit
- OpenCL code (AMD APP 2.6)
+ 6s

Yeah, that's a whole 6 seconds. Not all code gets that much of a boost on a GPU; this one was integer-based with some logic operations and didn't have many branches in it. Even the CPU version was about twice as fast per core as the simple C code (105s on 2 cores vs. 415s on 1), most likely due to aggressive compiler optimizations: most loops had just 4 passes, so they're a great place to unroll and use SSE2 vectorization.

Now, my 5770 has 10 CUs clocked at 850MHz, so 8500PU ("power units") in total. It ran for 6 seconds, so it needed 51000PUs to complete the task. The 6250 has only 1 CU at 280MHz, so 280PU total: 51000/280=182 seconds. In reality probably a bit more due to slower data transfers. Compare that to the Atom results and you'll see why having that GPU is important :)
With a dual-core CPU you can easily run a lot of data processing and offload the really heavy stuff to the GPU, so it looks like a great dev board for more advanced projects.

Now why did I bother with this long-winded explanation? Well, it looks like AMD has got all three next-gen consoles in the bag. We've had a lot of "insider leaks" lately, and most of it is wishful thinking taken as gospel, especially by fanboys. Silly people. It's not about raw power anymore. Consoles will not be able to beat PCs on the numbers, not unless you want them to draw 1kW of power and cost as much as a rack full of servers. It's about being smart with the limited resources you have. One can argue that's always been the case, but this generation will show it even more. A typical PC that can run games in 1080p in 3D at 60fps would need some 300-400 Watts of power. Next-gen consoles are promising the same level of fidelity (well, we shall see about that I guess) at half that power. This is what I find most interesting. I couldn't care less whether the CPUs are 1.8 or 3.2GHz and how many gigabytes of RAM there are inside.

BTW, I've made some additional calculations. My RPi runs on 5V and draws 0.5A, so it used 5V * 0.5A * 3378s = 8445Ws to get the calculations done. My Radeon 5770 has a 108W TDP, so let's assume I actually hit that, and that the rest of my PC drew 150W, which is a VERY safe assumption as the CPU was idle and so were the HDDs: (108W + 150W) * 6s = 1548Ws. So not only was it faster, it also used less power :) Nice things, these compute units. With 16 thousand 128-bit wide registers each, it's no wonder they take so much silicon space.

End of the world

I might not believe that the world will end this December, but two of my PCs decided not to wait and committed suicide.

My netbook died first, about a month ago: one day it simply didn't turn on, and that was it. No amount of messing with its internals would help. It was an old hand-me-down with an Atom N270 that I got for free because of a failed HDD. I replaced the drive, reinstalled the OS and kept using it for a year or so. It had Win XP, a 1024x600 matte screen, 1GB of RAM, and the battery would hold for about 2 hours, which was good enough for my needs. Hell, it flew with me around Europe a few times. I wasn't using it much at home, so I don't need to replace it right away, but I sure miss it.

Yesterday another N270 gave up the ghost, this time my Linux system that I keep running 24/7 for various purposes: router for my private LAN, WiFi AP, FTP/NFS server, and most importantly my dev machine for Dreamcast and NAOMI, since I keep my cross-compiler tools there. I liked this board too; it was passively cooled and required only 12V input from a brick-type PSU, so there were no fans at all. I think the BGA balls cracked, because I'd been getting random reboots lately, and last week the system would not boot up until it had cooled down to room temperature. Eventually even that stopped working, and now it reboots randomly within 10 seconds of powering up, cold or not. So right now half of my flat has no Internet and I need to fix that ASAP.

I ordered a new board. It's another Atom (N2800 this time), since I really want to keep the energy usage down to a bare minimum and I don't need a lot of CPU power. Even the N270 could easily deal with 100Mb/s traffic on both NICs while streaming from HDD, and it was rated at 2.5W. Yeah, I know, that doesn't include the north bridge, which was doing most of the job connecting all the system components together :) So the N2800 might be 6.5W, but I expect the NM10 to have improved over the old 945 (and the GPU is now part of the CPU as well). I was also interested in the AMD Brazos family, but those chips are much more powerful and require active cooling, and I don't need a Radeon HD in a headless PC. The good news is the new board will also be powered by a single 12V input, so at least I get to keep the PSU, hopefully. I already had to buy a new memory stick (DDR3 now instead of DDR2), a new low-profile NIC (no PCI slot, just one PCI-E x1), and a new N-capable WiFi card (miniPCI-E). Well, at least my netbook's HDD is going to be reused :P

There is one more old PC that I have, plus obviously my main one, which is not very old but has its years. I swear, if one of them dies in the next few weeks I'm buying a replacement and calling it Apollo 13. In the meantime I've started doing more frequent backups.

Anyway, so what's up with the GDEMU project? Well, there is progress, but I've hit some problems, as usual. I came up with new logic for the FPGA and it works perfectly (so far) between the MCU and the FPGA, but fails on the GD bus. And I have no idea why. I've tried pretty much everything by now except adding some pull-ups to the control lines, but I don't expect that to help much. It doesn't look like an electrical problem.
The prototype works when the FPGA is clocked within a very specific frequency range, but not otherwise. The BIOS loads the game, I get to see the first screen or so, and then it dies because DMA goes completely out of sync: I still have tons of data in the buffer, but the console already expects the end-of-DMA interrupt. So obviously I'm missing a lot of read requests, but I don't know why. Must be another race condition that I can't figure out. So why not let it run at the frequency where it works? Because the problem is still there, just less obvious. It's not stable either way, and you wouldn't want your game to freeze 3 hours in, with who knows how long since the last save, right?

To combat that I finally gave in and bought a USB-based JTAG programmer for Altera FPGAs. Those things are costly, but I found a cheap clone that should work nicely. I expect it to arrive in a few days. With a live JTAG uplink I will be able to transfer new settings directly rather than swapping SD cards as I do now, and more importantly I'll be able to run a logic analyzer to see what is going on.

The world will most likely not end but thanks to all those troubles (and GoG discounts :) my bank account balance just might.

EDIT: Looks like it could be an electrical issue after all. Well, I'm going to rip the Dreamcast apart now and solder some proper wires for the ground return path. Let's see what that does.

Oh, and here's a photo:

2012-12-19 GD-EMU proto test

As you can see I got the JTAG unit today and I'm fresh out of USB ports on the hub :)

EDIT 2: Apparently one year was not enough to add proper idle support for Cedar Trail Atoms to the Linux kernel. Not even the bleeding-edge 3.7.1. If you run dmesg | grep intel_idle you'll see this:

intel_idle: does not run on family 6 model 54

So, if you're in this situation as well and you don't mind compiling the kernel from sources, try this hack:

1) Locate drivers/idle/intel_idle.c in the source tree
2) Make a backup copy just in case :)
3) Edit the file, find "intel_idle_ids" table
4) Add "ICPU(0x36, idle_cpu_atom)," line to it, but keep it sorted by model code

Compile and install the modules and kernel. Reboot. Enjoy.

Now, I'm not saying this is the proper way of doing it, but 20mA less current draw from the PSU (at 12.2V) says it's working. I haven't seen any nasty side effects yet.
  • Current Music
    Delerium - Poem I

Genesis contd.

And behold, it was very good...

Okay folks, since there are so many questions about the GD-EMU project and no one can be bothered to read the answers from the time I showed you my first iteration of the idea, here it all is again:

1) Ready when?
No idea. I would not be making a custom PCB, ordering new parts and working on it if I didn't believe it could be done, but at the same time I cannot (and will not) make any promises about delivery dates. Obviously though, if I can't make it work as I'd like in the next few months, it's going to be shelved again.

2) How much?
Again, no idea. In fact it's not even decided that I will be selling them. If it doesn't look like I can turn a profit without investing all my free time into it, I'll just stop at the prototype phase. While I understand that this would upset many of you, I'm not a charity worker. It's one thing to code a free application and share it with the world, and quite another to manufacture a hardware device for sale.

All I can say right now is that the prototype is pretty expensive (compared to the price of a working, pre-owned Dreamcast). But that is true for all prototypes. Things get considerably cheaper when mass-produced. Then again, it's quite possible the first batches will still be priced higher because of low sales volume; I'm sure as hell not going to invest my own money into this.

3) Kickstarter? Preorders?
While Kickstarter seems like a good option, it's a no-no because I'm not a US resident. End of story right there. I will also not take any kind of preorders (or other money offers) until I'm certain the device will work and can be manufactured in suitable quantities. Things get serious when money is involved, and I'm a rather cautious person.

4) Features?
It will be a 100% compatible replacement for the GD-ROM drive, except using SD cards. It might offer better loading times but otherwise will function in the same way. It's meant to provide a backup solution for the laser and other mechanical parts of the drive, which are no longer in production and fail after so many years of use. While many of you will interpret this last sentence as "it will play game rips", I'd like to point out that I never condoned software piracy. I think I made my point clear when I refused to fix any bugs in Makaron that were related to CDI rips of games (as opposed to proper GDI images). Many of these "bugs" were actually how the rips behaved on a real console, although they could have been somewhat helped if I wanted to. But I didn't. So, if you are/were a Dreamcast user, then you should be familiar with region locks, video cable restrictions, bootable (or not) homebrew, etc. Using GD-EMU will not remove or help with any of these. You might try image patching, sure, but I will not give any support for those modifications if there are any problems.

As for the user interface, I like simple things that work as expected. I've seen too many projects that looked nice but didn't deliver what was promised in the first place. My goals are perfect compatibility and stability. Anything else is extra. I think 2 buttons are enough to select which game on the card should be "inserted".
If that's not enough for you, code a good Dreamcast app that will select games from the card. It can be put as the first image on the card, which will boot by default. Then we can talk about how to make the hardware do what the app/user wants.

5) USB link to PC?
That's in plans, but no work has been done yet. I'm not even sure the USB port on the prototype works properly :) So, eventually yes, but probably not from the start. USB host support (as in USB HDDs and FLASH drives) is probably not going to happen. Did I mention I like simple solutions?

6) Other features?
Well, if it ever happens that I make tons of profit on these things, which I doubt, I might reconsider my stance on UI, USB host, and other things. But that would have to be a considerable amount of money to motivate me :)

7) Open source?
Highly unlikely. If only because some people could just take all my work and start selling their own devices. While I'm not stopping anyone from creating a different/better project, they'd better be prepared to spend as much time on it as I have. I've already helped many people by sharing important bits and pieces of info, and even programs I made. There is goodwill and there is stupidity, and I have to say that more often than not I've come to regret my decisions. Once burned...

8) Pics or it didn't happen.
There are photos of my all-FPGA approach on this blog, and even some short videos on YT of it working (with minor issues) if you know where to look. I will post pictures of the V2 prototype connected once it actually works. I'm redoing much of my FPGA code and this might take some time, as I want to try another approach.


Let there be light:
2012-08-30 GD-EMU proto V2 #1

And there was light:
2012-08-30 GD-EMU proto V2 #2

More pictures to follow soon :)

Status so far:
Voltage regulators - check
MCU starts - check
Bootloader operational using 3V3 UART - check
MCU JTAG - check
C runtime stub + simple exception handling - check
Status LEDs - check
UART 115200 8N1 console - check
Interrupts - check (need to investigate if registers are really properly saved though)
External RAM - check (problem found, should be fixed now)
High speed SD interface - in progress

Random fun fact: many SD/SDHC cards exhibit various little quirks in SPI mode, so the code needs to be aware of those to work properly in every case. One would think the native SD protocol is so tightly standardized that there should be no such surprises. Well, I just found a bunch of 2GB Kingston SDs that respond to ACMD41 with a bad CRC7...

EDIT: Turns out the R3 response is the only one not protected by CRC7; that field is marked as reserved and just filled with all ones. I'm still not getting the busy bit within a reasonable time on these Kingstons, but I suppose reading the docs a few more times might teach me something new again.

Anyway, here's the actual thing:
2012-08-30 GD-EMU proto V2 #3

Now it's a proper prototype, with all these wires and blinking LEDs. A few things are still missing on the PCB, but right now I need to get the SD protocol working so I can fetch the FPGA configuration image and test it.


High speed SD interface - check
DMA on SD i/f - check
Basic FAT support - check
FPGA - in progress

I'm using my own FAT library, which has no write support but was designed to be fast while consuming as little RAM as possible. In fact, current SD cards are so fast that sector buffering is impractical, since the lookups and LRU queues kill any gains with additional overhead. I suppose it'd be different if the CPU were clocked above some 400MHz and had a fast L1 cache.
Right now I get an average of ~10MB/s in a test that seeks to a random part of a 1.2GB file and reads 1-3500 consecutive 2352-byte chunks. This is to simulate RAW image reads for GD-ROM. Pretty good, I'd say, and a nice boost compared to the 2.5MB/s I got over SPI.

The native SD interface required a pretty much complete rewrite of some code, so I'm not 100% sure it's stable, but it seems to work for hours without problems so far.

With a piece of wire

SDR in 5 easy steps:

1) Design your radio

2012-06-23 DR2A PCB

2) Build it

2012-06-23 DR2A working

3) Get control software

2012-06-23 HDSDR @ 7880kHz

4) Get decoder software

2012-06-23 MULTIPSK HF FAX

5) Amaze your friends

2012-06-23 Weather Fax 7880kHz

Considering I got this far with a 2m piece of wire hung across my window, I'd say it's a success :)
  • Current Music
    Russian radio :P