Deunan
Having fun with my ARM 
11th-Mar-2011 10:20 pm
I have an ARM7 dev board (AT91SAM7S256 MCU from Atmel) that I've used in a few projects, though none of them was ever completed. It's a kind of testbed for developing code that will eventually run on a different CPU. One of the things I used this MCU for was SPI-based SD card support, for a simple FAT16/32 library. And that in turn was a part of the GD emulator project. Once I was happy with the code I moved all the software to a Nios2 soft-CPU running on an FPGA.

Recently though I decided to pursue the original idea, that is, interfacing the FPGA with a separate MCU. In this case the FPGA would only handle the ATA/ATAPI stack and the I2S protocol, plus data buffering through an external SRAM chip. This could probably be done with a CPLD device, but I already have an FPGA and the entry-level devices are cheap enough should I decide to stick with the Cyclone II series. This way I can have a logic analyzer (for debugging) in the project as well. So, I fetched the ARM box from the shelf, unpacked the kit, connected the power supply, and realized I can't do anything with it :)

As you know I'm using Windows 7 Pro x64 now, and I've had it for a few months already. I quite like it, and the only thing that irks me is the lack of low-level parallel port support. I have software written for Windows 9x that requires direct I/O access, something that was quickly fixed with giveio.sys on 2000/XP but will no longer work on a 64-bit OS. While I do realize that every piece of hardware and software will eventually become obsolete, and direct I/O access belongs to the DOS era, I really miss that functionality. There are some solutions available but none of them is compatible with giveio. So I ended up with a non-functional chip programmer (an old piece of junk, but it works and has some unique features that I use) and it also turned out that my Wiggler-clone JTAG adapter is now unusable as well.

Without a JTAG adapter I can't update the ARM7 program. Sure, there is the SAM-BA bootloader, but I don't like it. Running the code from RAM is faster, but there is 256k of FLASH on this MCU and only 64k of RAM, and I need as much RAM as I can get for the buffers and structures involved in accessing the filesystem on the SD card. Also, since I run the part at 48MHz, I need to add one wait cycle for FLASH memory access (it can go full-speed only up to 30MHz), so I want to be running from FLASH to have an accurate representation of how fast the final code will be. Why not downclock the MCU to 30MHz? Well, internal SRAM can still work full-speed at 48MHz and this is important for SPI DMA, and I'm also planning on having a USB uplink with a PC, so I need the clock rate to be 48MHz. At that rate I can clock the SPI bus at 24MHz (the limit for SD cards is 25), which is important too.
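
The clock trade-off above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the SPI clock is the master clock (MCK) divided by a whole number, and it reduces the FLASH rule to "no wait states up to 30MHz, one above that". The function names are made up for the example.

```python
# Assumptions (not from the post): SPI clock = MCK / integer divider,
# and FLASH needs 0 wait states up to 30 MHz, 1 wait state above that.

SD_SPI_LIMIT_HZ = 25_000_000      # SD card SPI limit quoted in the post
FLASH_FULL_SPEED_HZ = 30_000_000  # FLASH full-speed limit quoted in the post


def best_spi_clock(mck_hz, limit_hz=SD_SPI_LIMIT_HZ):
    """Return (divider, spi_hz): the fastest SPI clock not above limit_hz."""
    divider = max(1, -(-mck_hz // limit_hz))  # ceiling division
    return divider, mck_hz / divider


def flash_wait_states(mck_hz):
    """Wait states needed for FLASH reads under the simplified rule above."""
    return 0 if mck_hz <= FLASH_FULL_SPEED_HZ else 1


for mck in (30_000_000, 48_000_000):
    divider, spi_hz = best_spi_clock(mck)
    print(f"MCK {mck / 1e6:.0f} MHz: FLASH wait states = {flash_wait_states(mck)}, "
          f"SPI = MCK/{divider} = {spi_hz / 1e6:.0f} MHz")
```

Under these assumptions a 30MHz master clock saves the wait state but tops out at 15MHz on the SPI bus, while 48MHz costs one wait state and gets 24MHz, just under the card limit.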

Anyway, it turns out there are some cheap DIY USB JTAG adapters out there, based on FT2232 chips. I gotta say, FTDI did a good job there; there are other USB serial port chips, but theirs are surely the most popular. Cheap and easy to get, too. This time though I was too lazy to solder everything myself, so I figured I'd buy a clone of a well-known JTAG adapter and have it delivered. While putting together your own tools and hardware is fun, I really wanted to have a go at that ARM, and quite frankly I wouldn't be able to make my own adapter for less than some 25 Euro - which is how much I paid. It's an Amontec JTAGkey clone. It does the voltage level shifting too, though the range is a bit more limited, but 2.7-5V is enough for my needs.

The first problem was that the FT2232 in the adapter sports a PID different from the default one, so the FTDI drivers will not recognize it. Amontec hasn't prepared a 64-bit driver, it seems, and I didn't want to use theirs anyway - who knows how old it is and what bugs it has. Instead I modified the INF files in the newest FTDI driver package and added all the necessary entries, along with correct product names. Obviously the modified files no longer match the driver's signature, so Windows complained - but all I had to do was confirm and it installed properly. I've heard about some lengthy procedures for installing unsigned drivers, involving switching 64-bit Windows to some test mode and what not, but I didn't need that. With this approach I get to use one driver for all connected FTDI devices, I can always update it, and most importantly - it works perfectly.

The second problem was OpenOCD. While it's free to use - something I really like in my software - it's no longer possible to find a precompiled binary built for the FTDI drivers. It seems that sometime in 2009 they figured that linking with a 3rd party closed source library, even dynamically, violates their GPLv2 license. Picture me doing a facepalm here. There are GPL replacements for the USB libs but - figures - these don't work well with 64-bit Windows OSes. I know that license stuff is important, but why shoot yourself in the foot like that... One could argue it's Microsoft's fault for requiring the drivers to be signed and disliking open-source software in general. That's just one side of the coin, though. The signing process is not exactly extremely difficult or expensive, and the authors could have just made an exception for the FTDI libs. It's their code after all. To me this looks more like a case of open-source people not liking Microsoft :)
Well, I can always compile the damn thing myself. My personal build can be linked to the FTDI libs without violating the GPL. It's just... the sources are missing Makefiles and the configuration script needs to be run in a POSIX-like environment. Picture me doing another facepalm here. I'm kinda allergic to Cygwin, and the various bits and pieces of information I found suggested that OOCD can be built natively on Windows, so I got creative. I installed MinGW64 - well, unpacked really. Once you have several compilers in the system, each targeting a different CPU, you need to be careful about what ends up in the PATH variable :) For ARM development I use devkitPro, which is also the first choice for Nintendo DS homebrew as it comes with nice system libs, and it sports an MSYS environment. I married that MSYS with the MinGW paths and finally got OpenOCD to configure and compile. And hey, it works too :) I only wish this method was at least mentioned somewhere in the docs, because README and INSTALL are pretty much targeted at *nix platforms only - again, maybe it's just me, but it sure looks like someone doesn't like Microsoft.

The third problem was that my ancient OOCD configuration scripts didn't work with version 0.4.0 and I had to rewrite that stuff. I strongly recommend you read the docs here; it's much easier once you realize you can reuse the ready-made scripts and example files. At first I copy-pasted some of the stuff I googled but ended up with a pretty big script I didn't understand much. Then I found the TCL folder in the OOCD sources and most of the configuration data I needed was already there :) Long story short, my script is now working and I can program the ARM7 FLASH memory with my code.

The whole purpose of this endeavour was to try out a few ideas I had, one of them being running the MCU in Thumb mode rather than in ARM mode all the time, which should improve code execution speed. Another one was to try multiple block reads. Single reads are easier, but each time you issue a request the card will waste about 1-2ms preparing itself for the transfer. Multi-read obviously works only for contiguous memory areas, but there is next to no delay between sectors, so unless the files are very fragmented this will result in a significant speedup. There is a small issue of having to properly stop the multi-read sequence if the address you want is not the next one (and it will happen every now and then, since FAT structures need to be read and processed as well) but I got that worked out. The end result for a sequence of 1000 sectors:

- single block reads: 1100kB/s
- multiple block reads: 2500kB/s
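
The stop-and-restart bookkeeping for multi-read can be sketched roughly like this. It is an illustrative model, not my MCU code: FakeCard and SectorReader are made-up names, and the "card" only logs commands instead of talking SPI. The command numbers are the standard SD ones (CMD18 = READ_MULTIPLE_BLOCK, CMD12 = STOP_TRANSMISSION).

```python
# Illustrative model only: FakeCard and SectorReader are hypothetical names.
# The sector number is passed straight through as the command argument,
# which glosses over byte vs. block addressing on different card types.

class FakeCard:
    def __init__(self):
        self.log = []                         # (command, argument) tuples

    def command(self, cmd, arg=0):
        self.log.append((cmd, arg))

    def receive_block(self, sector):
        return bytes([sector & 0xFF]) * 512   # dummy 512-byte payload


class SectorReader:
    """Keeps a multi-block read open while requests stay sequential."""

    def __init__(self, card):
        self.card = card
        self.next_sector = None   # sector the open stream delivers next

    def read(self, sector):
        if self.next_sector != sector:
            self.stop()                       # end the old stream, if any
            self.card.command(18, sector)     # CMD18: start a new stream
        data = self.card.receive_block(sector)
        self.next_sector = sector + 1
        return data

    def stop(self):
        if self.next_sector is not None:
            self.card.command(12)             # CMD12: stop transmission
            self.next_sector = None


card = FakeCard()
reader = SectorReader(card)
for s in [100, 101, 102, 40, 103, 104]:   # 40 stands in for a FAT sector
    reader.read(s)
reader.stop()
print(card.log)
```

In this model only the jump to sector 40 and the jump back cost an extra stop/start pair; the sequential sectors ride on the stream that is already open.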

Effective file read speed (12MB file on FAT16) was well over 1800kB/s. With these figures I am finally back on target for the x4-x12 CD read speed that a real GD-ROM drive can do.
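
For reference, here is how those figures translate into CD speed multiples. The only outside number is the x1 CD data rate of 150kB/s (75 sectors of 2048 bytes per second); the sketch assumes the kB in the measurements above means the same unit.

```python
CD_1X_KB_S = 150  # x1 CD data rate: 75 sectors/s * 2048 bytes

measured_kb_s = {
    "single block reads": 1100,
    "multiple block reads": 2500,
    "effective file read": 1800,
}


def cd_speed(kb_per_s):
    """Convert a transfer rate in kB/s to a CD speed multiple."""
    return kb_per_s / CD_1X_KB_S


for name, rate in measured_kb_s.items():
    print(f"{name}: {rate} kB/s = x{cd_speed(rate):.1f}")
```

So 1800kB/s is right at x12, and single block reads alone would land a bit above x7.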
Comments 
15th-Mar-2011 03:41 am (UTC) - parallel port
Anonymous
Does your desktop have a parallel port? If you do, why don't you just install Virtual PC, run WinXP 32-bit and flash it that way? That's what I do when I need to flash the old-fashioned way, since I'm using 7 64-bit too.
15th-Mar-2011 10:45 am (UTC) - Re: parallel port
I haven't tried Virtual PC. I'm using VirtualBox and it doesn't support parallel ports at all. I guess I could try it, though I know that the programmer software is very picky and likely won't work "just because".
The plan right now is to add USB support to the hardware using an AVR MCU. This would allow me to use it with my netbook too, and not worry about changing the motherboard in the future. Obviously I would need to write my own PC-side software - but I can do that, no problem.
16th-Mar-2011 04:03 am (UTC)
Anonymous
Or you can get a modest 32-bit PC and use it every time you need 32-bit Windows.
16th-Mar-2011 11:02 am (UTC)
Yeah, I do have one, but it's not in a very convenient location. I use it when I have no other choice but I'd rather be able to connect the programmer to my own PC/netbook.

I have the protocol mostly figured out (everything I need to get it working with my code) and I bought some AT90USB16 for experiments. What I need is some free time to actually sit and code (I'll need a PCB too, and if I want it cheap I'll have to wait 3 weeks for it to be manufactured and sent to me).
16th-Mar-2011 11:00 am (UTC)
Anonymous
Thank you for the hard work, Deunan.
It sounds great!
I'm the one who asked you about the GD-ROM emulator on 20th-Jan-2011.
So can you emulate the Dreamcast GD-ROM drive with FPGA + ARM7?
If you can, will you show us the code and circuit?
Cheers,
16th-Mar-2011 11:20 am (UTC)
That's the plan, but the emulator is not functional yet. I'm running some experiments to check the transfer speeds and seek latency on FAT16/32 and various SD cards. Also, the ATA/ATAPI part on the FPGA needs rewriting because I want the final version to use cheap and available parts. Something that can be soldered without using a heat gun/IR lamp (so no BGA chips).

In the last 2 days I modified and corrected my FAT handling procedures. These can only do read access, but they're pretty fast and don't use much memory - 512 bytes for the FAT buffer, 512 for each opened file, plus some variables. I only need one file at a time for this project.

The only thing I can't do fast is seek to a file position before the current one (this requires the allocation table to be scanned again), but I don't do that much in my GD code. Typical access is read or streaming, with raw sector data rejection, and I get 2.3MB/s on that now. Seek to the end of a 1.2GB file on FAT32 takes about 220ms, which is more or less the latency of a GD drive. I'd say it looks good so far.
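
Why backward seeks are the slow case is easy to show with a toy model of a FAT cluster chain. This is a sketch with hypothetical names, not my library code: the FAT is a plain dict here, whereas real code pulls 512-byte FAT sectors from the card as needed.

```python
# Toy model: each FAT entry only points to the NEXT cluster of a file,
# so there is no way to step backwards along the chain.

END_OF_CHAIN = None


class FatFile:
    def __init__(self, fat, first_cluster, cluster_size):
        self.fat = fat                    # cluster -> next cluster
        self.first_cluster = first_cluster
        self.cluster_size = cluster_size
        self.cluster = first_cluster      # cluster holding the current position
        self.index = 0                    # links followed from the first cluster
        self.fat_lookups = 0              # table accesses, for illustration

    def seek(self, offset):
        target = offset // self.cluster_size
        if target < self.index:
            # Going backwards: restart the walk from the first cluster.
            self.cluster, self.index = self.first_cluster, 0
        while self.index < target:
            nxt = self.fat[self.cluster]
            self.fat_lookups += 1
            if nxt is END_OF_CHAIN:
                raise ValueError("seek past end of file")
            self.cluster = nxt
            self.index += 1
        return self.cluster


fat = {2: 3, 3: 4, 4: 9, 9: 10, 10: END_OF_CHAIN}
f = FatFile(fat, first_cluster=2, cluster_size=4096)
print(f.seek(3 * 4096), f.fat_lookups)   # forward: walks 3 links
print(f.seek(4 * 4096), f.fat_lookups)   # forward: 1 more link
print(f.seek(1 * 4096), f.fat_lookups)   # backward: restart, then 1 link
```

Forward seeks continue from the current cluster, so streaming costs one lookup per cluster boundary, while any backward seek pays for the walk from the start of the file again.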

Now I have to come up with a good method of pushing data between the ARM and the FPGA (I'd probably be limited to an 8-bit bus for that) and a way to connect some 128-256k of static RAM to the FPGA as an independent data buffer. Ideally it'd be dual-port RAM but that's expensive, so I think I can use a fast (up to 20ns) part and just interleave access.
31st-Mar-2011 05:35 pm (UTC)
Anonymous
Many thanks for the continued work on Dreamcast whereas many others have stopped altogether. Since Makaron lacks a dedicated forum for tech support, I've been wondering why it runs slower on Core i3 processors than it did on C2Ds. Even on my old 2800+ and GeForce 6800 I got faster speeds than on the newer i3. I've used the same settings as before, but it seems there's severe audio stuttering. Alas, a buffer option does not exist. Disabling audio yielded no frame benefits (still <50). Maybe I should wait for an update? (BTW I'm using a standalone Radeon 5670 card and tried all affinity options.) Any help would be greatly appreciated, as this problem affects both DC and Naomi.
31st-Mar-2011 10:52 pm (UTC)
What i3 is that, exactly? And what Makaron version are you using? See, with the older versions the most common problem was mistaking the cause and effect - which was the imperfect sound processing causing the slowdown, not the other way around. This is also why I dropped DirectSound entirely (it's next to non-functional on Vista and Windows 7) in favour of XAudio2.

And yeah, the Dreamcast version is way behind NAOMI now, sorry 'bout that. Got plenty of work on my hands and little time to spare.
1st-Apr-2011 01:43 am (UTC)
Anonymous
Thanks for the prompt response. The emulator I'm using is T12/7, on an i3 370M, which should be enough for Naomi emulation? Does Realtek HD Audio support XAudio2 as implemented in Makaron? I find this problem affects the older DC Makaron equally, regardless of setting. I certainly hope it's not a Catalyst thing.
1st-Apr-2011 02:22 am (UTC)
Anonymous
Sorry for the double-post (please remove the first), but I managed to resolve this issue by forcing Vsync in Catalyst Control Center. Hopefully this will be of help to others if they query the same problem via Google or other search engines. Any chance of including a Vsync option in a future Makaron revision? :)
1st-Apr-2011 09:04 am (UTC)
There was a way, once, to change this via the INI file. There is no perfect setting; sometimes it's better with vsync, sometimes without. And most importantly, graphics drivers nowadays can just ignore it and use an override. Which is good if the user knows about it and controls it, but sometimes the driver will just mess with you for the fun of it.

And there is me moving slowly to DirectX 11 - which works a bit differently. I want to see how it works first, then I'll add options to mess with it :)
20th-Apr-2011 07:18 pm (UTC) - Sega Model 3
Anonymous
Hey. Just wanted to share with you that a Sega Model 3 emu is out.
http://www.trzy.org/Supermodel/WhatsNew.html :)

20th-Apr-2011 10:37 pm (UTC) - Question
Anonymous
The Capcom I/O would be a perfect tool for a PC to JAMMA adapter, except the USB on the NAOMI isn't recognized by a PC. How hard would it be to write a program for the PC so the NAOMI USB would be recognized and MAME could handle it? Is that even possible?

Wouldn't the Capcom I/O be a perfect and semi-cheap PC to JAMMA interface? Am I missing something?

Thanks!
-Long time reader.

21st-Apr-2011 09:33 pm (UTC) - Re: Question
NAOMI doesn't use JAMMA, it uses JVS - which is a serial bus based on RS485. It has nothing to do with USB, except the connectors and cables.
28th-Apr-2011 01:02 pm (UTC) - Dreamcast
Anonymous
All you have done for the world that likes this stuff is so excellent. Made me play Rez almost like I had a Dreamcast.

Thanks!
3rd-May-2011 07:55 am (UTC)
Anonymous
Dude, if you make that GD-ROM emulator and produce a few, I will be the first one to buy it!
12th-May-2011 02:32 pm (UTC) - vsync
Anonymous
Amazing job on this emulator! By far the best Dreamcast emu out there. Is there any possibility you can add the vsync option back to the INI file? I can't seem to force it in Catalyst Control Center. The screen tearing is awful and I was just hoping to eliminate it. Thanks and great job!