Deunan
Cranking old engines 
2nd-Mar-2012 11:31 pm
Lookit what goodies came in the mail today:



There is a reason why I'm showing you Lady Phantom - this is one of the somewhat problematic CD-ROMs for PCE. First, the TOC:

-> Track 01  (Audio, 00:47:65, LBA: 0 / 00:02:00)
-> Track 02  (Mode 1, LBA: 3590 / 00:49:65)
-> Track 03  (Audio, 02:34:63, LBA: 5157 / 01:10:57)
-> Track 04  (Audio, 02:23:59, LBA: 16770 / 03:45:45)
-> Track 05  (Audio, 02:48:09, LBA: 27554 / 06:09:29)
-> Track 06  (Audio, 02:30:08, LBA: 40163 / 08:57:38)
-> Track 07  (Audio, 02:48:02, LBA: 51421 / 11:27:46)
-> Track 08  (Audio, 02:27:37, LBA: 64023 / 14:15:48)
-> Track 09  (Audio, 00:33:26, LBA: 75085 / 16:43:10)
-> Track 10  (Audio, 00:09:50, LBA: 77586 / 17:16:36)
-> Track 11  (Audio, 00:33:54, LBA: 78311 / 17:26:11)
-> Track 12  (Audio, 02:13:59, LBA: 80840 / 17:59:65)
-> Track 13  (Mode 1, LBA: 90874 / 20:13:49)
-> Track 14  (Audio, 02:14:25, LBA: 91550 / 20:22:50)
-> Track 15  (Mode 1, LBA: 101625 / 22:37:00)
-> Track 16  (Audio, 02:55:64, LBA: 102325 / 22:46:25)
-> Track 17  (Mode 1, LBA: 115514 / 25:42:14)
-> Track 18  (Audio, 02:05:73, LBA: 116214 / 25:51:39)
-> Track 19  (Mode 1, LBA: 125662 / 27:57:37)
-> Track 20  (Audio, 01:54:28, LBA: 126384 / 28:07:09)
-> Track 21  (Mode 1, LBA: 134962 / 30:01:37)
-> Track 22  (Audio, 01:43:38, LBA: 135678 / 30:11:03)
-> Track 23  (Mode 1, LBA: 143441 / 31:54:41)
-> Track 24  (Audio, 02:30:03, LBA: 144149 / 32:03:74)
-> Track 25  (Mode 1, LBA: 155402 / 34:34:02)
-> Track 26  (Audio, 01:52:62, LBA: 156088 / 34:43:13)
-> Track 27  (Mode 1, LBA: 164550 / 36:36:00)
-> Track 28  (Audio, 02:09:10, LBA: 165266 / 36:45:41)
-> Track 29  (Mode 1, LBA: 174951 / 38:54:51)
-> Track 30  (Audio, 00:37:56, LBA: 175575 / 39:03:00)
-> Track 31  (Mode 1, LBA: 178406 / 39:40:56)
-> Track 32  (Audio, 01:21:72, LBA: 179070 / 39:49:45)
-> Track 33  (Mode 1, LBA: 185217 / 41:11:42)
-> Track 34  (Audio, 01:43:24, LBA: 185889 / 41:20:39)
-> Track 35  (Mode 1, LBA: 193638 / 43:03:63)
-> Track 36  (Audio, 01:11:69, LBA: 194338 / 43:13:13)
-> Track 37  (Mode 1, LBA: 199732 / 44:25:07)
-> Track 38  (Audio, 01:31:39, LBA: 200368 / 44:33:43)
-> Track 39  (Mode 1, LBA: 207232 / 46:05:07)
-> Track 40  (Audio, 03:42:69, LBA: 208632 / 46:23:57)
-> Track 41  (Audio, 01:01:60, LBA: 225351 / 50:06:51)
-> Track 42  (Mode 1, LBA: 229986 / 51:08:36)
-> Track 43  (Audio, 04:11:57, LBA: 230654 / 51:17:29)
-> Track 44  (Mode 1, LBA: 249536 / 55:29:11)
-> LeadOut  (LBA: 250814 / 55:46:14)

It's very rare for any disc to have so many data and audio tracks mixed together. PCE doesn't use ISO9660 so the devs just split the game data as they saw fit.
If this was a well-mastered Yellow Book compliant CD-ROM then we'd just rip the tracks, ignore pre/postgaps and it'd be a complete, working dump. It's not so easy in this case. Let's analyze the transition from track 14 to track 15.

-> Track 14  (Audio, 02:14:25, LBA: 91550 / 20:22:50)
-> Track 15  (Mode 1, LBA: 101625 / 22:37:00)

The last sections of track 14 are silent; this is normal except for cases where you don't want gaps in audio. Since the next track is data it's a good idea to do this properly. Except, you know, the very last sectors are filled with junk here :)

LBA  :  101472

0000 :  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................  <- silence
SUBC :  01 14 01 02 12 22 00 22  34 72 00 00 00 00 00 00  ....."."4r......

LBA  :  101473

0000 :  6A 82 AF 21 BC 18 71 CA  A4 57 3B 7E 93 60 6D E8  j..!..q..W;~.`m.  <- junk
SUBC :  01 14 01 02 12 23 00 22  34 73 00 00 00 00 00 00  .....#."4s......

LBA  :  101474

0000 :  6A 82 AF 21 BC 18 71 CA  A4 57 3B 7E 93 60 6D E8  j..!..q..W;~.`m.  <- more junk
SUBC :  01 14 01 02 12 24 00 22  34 74 00 00 00 00 00 00  .....$."4t......

Notice how Q sub properly identifies this as audio track 14, index 1, time 22:34:74 - which is (22*60 + 34)*75 + 74 = 101624. With the usual TOC offset of 150 we get LBA 101474. Figures.
Also, TOC says track 14 should be 2:14:25 long and sure enough this is the last audio section, 2:12:24, according to Q (these counters start at zero).
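The address arithmetic above is easy to double-check. Here is a minimal Python sketch; the function names are made up for illustration and the only inputs are the TOC values listed earlier.

```python
SECTIONS_PER_SECOND = 75   # the "frames" field of MM:SS:FF counts 1/75 s sections
TOC_OFFSET = 150           # LBA 0 sits at absolute time 00:02:00

def msf_to_lba(m, s, f):
    """Absolute MM:SS:FF (plain decimal values) to LBA."""
    return (m * 60 + s) * SECTIONS_PER_SECOND + f - TOC_OFFSET

def lba_to_msf(lba):
    """LBA back to absolute (m, s, f)."""
    total = lba + TOC_OFFSET
    return total // 4500, (total // 75) % 60, total % 75

def length_to_msf(sections):
    """A length in sections as (m, s, f), without the 150-section TOC offset."""
    return sections // 4500, (sections // 75) % 60, sections % 75

last_audio_lba = msf_to_lba(22, 34, 74)      # Q address of the last audio section
track15_start = lba_to_msf(101625)           # TOC entry for track 15
track14_total = 101625 - 91550               # TOC start of 15 minus TOC start of 14
track14_index1 = track14_total - 2 * 75      # minus the 2-second pregap of track 15

print(last_audio_lba)                 # 101474
print(track15_start)                  # (22, 37, 0)
print(length_to_msf(track14_total))   # (2, 14, 25)
print(length_to_msf(track14_index1))  # (2, 12, 25), so the last relative address is 02:12:24
```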
Now, there should be a 2-second pregap for a data track that follows an audio track, and the data track starts at LBA 101625, so the first Mode 1 sector should be at LBA 101625 - 2*75 = 101475:

LBA  :  101475

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 35 00 01  ............"5..  <- Mode 1 header, first sector of a pregap
SUBC :  01 14 01 02 12 23 00 22  34 73 00 00 00 00 00 00  .....#."4s......  <- Q address goes back!

And we have a Mode 1 header! Except Q sub still says it's audio track #14-1 :) Also, data header address is 22:35:00 and Q has 22:34:73.
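To make the mismatch explicit, here is a small Python sketch that decodes the two lines of the dump for LBA 101475. It assumes the SUBC line is a plain position-mode Q frame (control/ADR, track, index, relative MSF, a zero byte, absolute MSF, all BCD), which is how the dumps in this post read; the function names are hypothetical.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])   # 12-byte data sector sync pattern

def bcd(b):
    """Decode one packed BCD byte, e.g. 0x34 -> 34."""
    return (b >> 4) * 10 + (b & 0x0F)

def parse_q(q):
    """Decode the first 10 bytes of a position-mode Q frame."""
    control = q[0] >> 4
    return {
        "is_data": bool(control & 0x4),           # control bit 2 set = data track
        "adr": q[0] & 0x0F,
        "track": bcd(q[1]),
        "index": bcd(q[2]),
        "rel": tuple(bcd(x) for x in q[3:6]),     # time within the track
        "abs": tuple(bcd(x) for x in q[7:10]),    # absolute disc time
    }

def parse_header(sector):
    """Return (absolute MSF, mode) from a descrambled data sector."""
    if sector[:12] != SYNC:
        raise ValueError("no sync pattern at start of sector")
    return tuple(bcd(x) for x in sector[12:15]), sector[15]

# Bytes copied from the LBA 101475 dump above.
q_info = parse_q(bytes.fromhex("01140102122300223473"))
header_info = parse_header(bytes.fromhex("00FFFFFFFFFFFFFFFFFFFF0022350001"))

print(q_info)        # audio, track 14, index 1, abs (22, 34, 73)
print(header_info)   # ((22, 35, 0), 1)
```

The same parse_q on the 41 15 00 ... lines further down reports a data track with index 0.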
It doesn't take a genius to notice that 22:34:73 was also the first audio sector with "junk" instead of proper silence. Gee, I wonder what would happen if we extracted that junk and ran it through a descrambler. Yup. It is, in fact, the same sector we are looking at now and the header is 00 FF FF FF FF FF FF FF FF FF FF 00 22 35 00 01.
So... the last two sections of track 14 and the first two sectors of track 15 overlap. Depending on which track you're reading, the drive will return either raw bytes for audio or descrambled sectors for data. In fact, it's even worse:

0000 :  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
0010 :  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
0020 :  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
...
0440 :  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................
0450 :  00 00 00 00 00 00 00 00  00 00 00 00 00 FF FF FF  ................
0460 :  FF FF FF FF FF FF FF 00  23 B5 00 61 00 28 00 1E  ........#..a.(..
0470 :  80 08 60 06 A8 02 FE 81  80 60 60 28 28 1E 9E 88  ..`......``((...
0480 :  68 66 AE AA FC 7F 01 E0  00 48 00 36 80 16 E0 0E  hf.......H.6....
0490 :  C8 04 56 83 7E E1 E0 48  48 36 B6 96 F6 EE C6 CC  ..V.~..HH6......

The data sector 22:35:00 starts in the middle of audio section 22:34:72. Which is perfectly OK. Except it cannot be properly ripped when the drive already cooks the raw data into sectors - since you can sync data to headers but there is no way to sync the audio.
And this is why I said it's not always possible to rip a CD-ROM track by track, even using subcodes as a reference. It only works for discs where audio sections match data sectors perfectly and even then it's down to drive logic to figure out how to join subchannel data with post-CIRC stream. So, claiming that any dump is a perfect copy of the master image is just... silly.
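For anyone who wants to try the descrambling step on the dump above, here is a minimal Python sketch. It assumes the ECMA-130 scrambler (a 15-bit shift register with polynomial x^15 + x + 1, preset to 1, applied to bytes 12 to 2351 of a sector); the function names are mine. The input is the 16 bytes that follow the sync pattern at offset 0x468 in the raw audio dump.

```python
def build_scramble_table():
    """Generate the 2340-byte XOR sequence applied to sector bytes 12..2351."""
    table = bytearray(2340)
    reg = 0x0001                              # 15-bit register, preset to 1
    for i in range(len(table)):
        byte = 0
        for bit in range(8):
            byte |= (reg & 1) << bit          # output is taken LSB first
            feedback = (reg ^ (reg >> 1)) & 1 # taps for x^15 + x + 1
            reg = (reg >> 1) | (feedback << 14)
        table[i] = byte
    return bytes(table)

def descramble(after_sync, table):
    """XOR the bytes that follow the 12-byte sync with the table (same op scrambles)."""
    return bytes(a ^ b for a, b in zip(after_sync, table))

TABLE = build_scramble_table()

# Raw bytes from the dump, starting right after the sync (offsets 0x468..0x477).
raw = bytes.fromhex("23B500610028001E80086006A802FE81")
plain = descramble(raw, TABLE)

print(TABLE[:8].hex())   # 018000600028001e
print(plain.hex())       # 22350001 followed by zeros: header 22:35:00, Mode 1
```

The rest of the "junk" in the dump is just this table showing through, because the pregap payload is all zeros.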

Let's see what happens next with our dump:

LBA  :  101476

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 35 01 01  ............"5..
SUBC :  01 14 01 02 12 24 00 22  34 74 00 00 00 00 00 00  .....$."4t......  <- note that Q still says track type is audio

LBA  :  101477

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 35 02 01  ............"5..  <- M1 address: 22:35:02
SUBC :  41 15 00 00 01 74 00 22  35 00 00 00 00 00 00 00  A....t."5.......  <- Q address: 22:35:00 (index 0, so a pregap)

LBA  :  101478

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 35 03 01  ............"5..
0920 :  00 00 00 00 00 00 00 00  00 00 AD 90 29 C9 83 16  ............)...
SUBC :  41 15 00 00 01 73 00 22  35 01 00 00 00 00 00 00  A....s."5.......

LBA  :  101625

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 37 00 01  ............"7..  <- First sector with actual data
SUBC :  41 15 00 00 00 01 00 22  36 73 00 00 00 00 00 00  A......"6s......  <- Q still says pregap

LBA  :  101626

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 37 01 01  ............"7..
SUBC :  41 15 00 00 00 00 00 22  36 74 00 00 00 00 00 00  A......"6t......

LBA  :  101627

0000 :  00 FF FF FF FF FF FF FF  FF FF FF 00 22 37 02 01  ............"7..
SUBC :  41 15 01 00 00 00 00 22  37 00 00 00 00 00 00 00  A......"7.......  <- Q finally switches to index 1

Q never matches the header address and in fact even the track type is wrong for some sectors. But don't assume that this 2-section offset is constant for all the audio/data tracks on the disc. While it could be the case, it's not guaranteed. The spec allows up to +/- 1 second offset so the difference could be anywhere from zero to 150 and, depending on what the drive logic thinks is the start of audio data, the header position might end up in a different part of the audio section. In fact it's allowed for the header to start well inside a frame, as long as it's on a 4-byte boundary.
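If you want to check a whole dump rather than eyeball a few sectors, the comparison can be scripted. The Python sketch below uses only the seven sectors shown above, typed in as decimal MSF values; a real check would walk every data sector of every track, and the names here are just for illustration.

```python
def to_sections(m, s, f):
    """Absolute MM:SS:FF as a count of 1/75 s sections."""
    return (m * 60 + s) * 75 + f

# (LBA, Mode 1 header address, Q absolute address) from the dumps above.
observed = [
    (101475, (22, 35, 0), (22, 34, 73)),
    (101476, (22, 35, 1), (22, 34, 74)),
    (101477, (22, 35, 2), (22, 35, 0)),
    (101478, (22, 35, 3), (22, 35, 1)),
    (101625, (22, 37, 0), (22, 36, 73)),
    (101626, (22, 37, 1), (22, 36, 74)),
    (101627, (22, 37, 2), (22, 37, 0)),
]

deltas = {lba: to_sections(*hdr) - to_sections(*q) for lba, hdr, q in observed}
headers_match_lba = all(to_sections(*hdr) - 150 == lba for lba, hdr, _ in observed)
within_one_second = all(abs(d) <= 75 for d in deltas.values())

print(deltas)              # every delta is 2 for this transition
print(headers_match_lba)   # True: the headers agree with the LBA, Q does not
print(within_one_second)   # True
```

A constant delta across this one transition says nothing about the other data tracks, which is the whole point: the set of deltas would have to be collected per track.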

So... all these frames, sections, offsets, sectors, tracks and gaps start to confuse you? Well, keep in mind the CD, while digital in nature, was meant to carry audio. Data sectors were added later and kinda slapped onto existing audio layer. This let the manufacturers reuse their designs by just adding header detection and descrambling hardware (along with some RAM for data buffering). But thanks to that we have this mess on our hands now :)
Comments 
2nd-Mar-2012 11:36 pm (UTC) - zyrobs
Anonymous
Looks like you didn't take offset corrections into account (CD drive *and* disc offset), and that's why your drive is reading later sectors instead of audio. Pretty sure all of this has already been figured out by now though.
3rd-Mar-2012 12:10 am (UTC) - Re: zyrobs
At the very least read ECMA-130 please. Get to know the low-level structures and basics of CIRC. Then you can come back here and explain, in proper terms, which particular "offset" you mean.

Also, what part of "not constant" did you not get?
3rd-Mar-2012 02:39 am (UTC) - Re: zyrobs
Anonymous
So it's impossible?
And what can be done about it?
Do you plan to release your own pc engine emu?
3rd-Mar-2012 09:32 am (UTC) - Re: zyrobs
Well, for discs where the overlap is almost zero, or constant, you can come up with a set of "magic" numbers that will "fix" the dump. But what if the offset you speak of is, say, 50 sections - which is still well inside the +/-75 limit. Will you reconstruct up to 50 sectors on each track for that dump and still claim it's a "proper rip"? It will work, sure, but it won't be a clean dump.

No, I don't plan on doing my own PCE emulator. I just want a better dumping format; the low-level frame rip I explained in my previous post would be a good candidate. Or, I hear MESS wants direct pickup stream rips. Kinda overkill but that might also be a good idea for a new format.
4th-Mar-2012 09:27 pm (UTC) - Re: zyrobs
Anonymous
How friendly.

I'm not talking about CIRC and its relation to subcodes. I'm talking about why the audio and data tracks overlap each other.
You know exactly what audio offset correction is: you talked about it in your last post. On that PCECD game, the audio track has a negative offset, a particularly huge one (judging by your numbers: -1484 samples, if I interpreted everything correctly). So when your drive is supposed to read LBA 101474, it ends up reading LBA 101474 + 1484*4 bytes, which is already in the data track part. And since it is reading it as audio, the user data is scrambled.

Note that the audio offset is caused by two things, once by your drive firmware, and once on the disc itself (in layman terms, think of it as the cd manufacturing plant not having correct read offset when replicating the master disc). Drive offset can be measured per-drive, factory one is per-disc (or per manufacturing plant, but you'd need to be very obsessed to gather numbers for each one of those - not impossible though).

As for why subcodes mismatch the LBA in data sectors, I have no explanation for that. But I'm fairly sure that it's not even required here, to rip user data track to track. Unless you want to rip subcode data track to track as well, but that's not something I've seen done before, and ultimately pointless when you are making full CD dumps anyway. (I have seen apps that can rip proper subcodes, without the offset mismatch on data tracks, for the full disc).
4th-Mar-2012 11:19 pm (UTC) - Re: zyrobs
The whole "audio offset correction" is just a gimmick to work around different implementation of the Q sub and post-CIRC data syncing on each drive. Nothing more. It's useful for ripping plain CDDA discs and that's about it.

You apparently did not read the document I suggested so I see no reason to argue with you. See, I do know why Q sub does not match Mode 1 headers and it's a perfectly normal situation. While the disc could have been mastered better, this is still well within tolerance. Hint: the pre/postgaps? These exist for a reason.

This "disc offset" you speak of is one way to counter this problem, but it works only because so far all CDs had the address difference constant for all tracks. This should be the case but a really lousy mastering could have made it variable. What are you going to do then, come up with some kind of "track offset"?
Not to mention this fixes the dump but it's no longer a 1:1 rip of the disc. Basically you're re-mastering it with perfect sync of audio sections and data sectors. This is a hack and not how proper preservation is done.
5th-Mar-2012 01:09 am (UTC) - Re: zyrobs
Anonymous
Well of course it's a workaround, how else would you counter the problem? Build your own "better" cd drive that syncs samples to subs perfectly? I know that you probably could, but I don't think you have every videogame ever to dump with a perfect drive - you'd have to mass manufacture it on some level, and send it to other people who can dump their games.

I've read the Yellow book (a long time ago), but I'm not talking about the subcodes here at all, but user data. You know why Q sub does not match, that's great, and I'm not even arguing that part.

The "disc offset" I mentioned is something you should consider as metadata. The dump is still a 1:1 rip, insomuch that it contains all the information of the original CD. You can also burn it with the specified disc offset value (combined with the write offset of your burner, of course), and you get the SAME data as on the original disc - because the junk data is just the preceding/following data sector, scrambled. (and if it's not, that can be saved as well...)

The upside is that you don't have to save the same disc HUNDREDS of times if it happens to be a popular title that was printed in the millions, in multiple cd plants, and had many offset variations on the same user data.
The downside is that I'm not aware of any burning app that can burn mixed mode discs with write offsets applied (and backtracking the offset with scrambled data sectors). But hey, Dreamcast GDI files cannot be burned either, so as far as preservation goes, the data is all there.
5th-Mar-2012 10:55 am (UTC) - Re: zyrobs
We could dump the disc at a lower level than sectors. Frames, or even the EFM stream from the pickup. A dump like that could always be converted to tracks if need be.

I agree that with "disc offset" (let's say it is considered metadata and hopefully written somewhere) you do get a working dump, which contains all the disc data. That is true. But it's not a 1:1 rip. We could argue whether it's enough that it works or not, but let me just point out that there might be consequences of "fixing" the dump. Some games are broken by really small timing issues for example, and PCE is using hard-coded track numbers and perhaps even sector addresses. It's simply better not to mess with it - we have to emulate the system with all the imperfections it has.

Again, let me say that we do have low-level floppy rippers now - and really, it's 2012, and perfect dumps have only started appearing a few years ago. So I say dumping CDs at a lower level is well worth investigating, even if we end up redumping those again. At the very least the problematic ones.

Dreamcast GDs are sometimes problematic as well but those have one advantage over standard CDs: namely, there is only one type of drive that reads them. So, a dump made with a Dreamcast is going to be functional even if it's not a 1:1 perfect rip. We could, and should, work on this in future (who says a low-level dumper can't be made to support GDs) but at the very least there are no emulation-related problems because of that. And I would know :) It's also thanks to the GD structure, which allows data tracks only as the first and last in the HD area, so the audio tracks in between (if any) are not mixed with data. The only place where a problem like on this PCE disc could happen is the last two tracks, and as long as there is no junk that would cause glitches on the last audio track it can be safely ignored. There are also tracks 1 and 2 in the SD area but those are not used by the game. Should still be dumped properly but as I just said, it can wait. PCE needs attention now.
5th-Mar-2012 10:13 pm (UTC) - Re: zyrobs
Anonymous
Good luck building a reader that can save the straight 0s and 1s from a disc... Remember that all the EFM, CIRC, multiple levels of parity, and even pregaps were added there so the discs can withstand huge amounts of abuse and still work. Even if you do create such a reader, what are the chances that you can actually read all the low-level data? Getting Q-subs to have no errors is rarely possible even on pristine condition discs.

As for PCE CDs... I believe it has been 100% figured out by now.
6th-Mar-2012 01:07 am (UTC) - Re: zyrobs
I'm perfectly aware of all the low-level stuff that will need to be handled in software. CIRC is a bit complicated but the rest is pretty straightforward. In fact I'm pretty sure equipment like this already exists, except nobody is going to give it to me for free so I need to make my own. Bootleg games do come from somewhere you know :)

Reading low-level data means there will be bit glitches from time to time, EFM is not perfect after all, but multiple passes can help with that. It may even take several discs to get one complete, glitch-free dump in case some have scratches for example. We can have intermediate dumps that will work (after all drives deal with this stuff on the fly) on emulators for such cases, until we can obtain a perfect one. IF we can get a perfect one. Still, it's better than what we have now.
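As a rough illustration of the multiple-pass idea, here is a Python sketch that merges several reads of the same region by per-byte majority vote and reports the positions where no value wins outright. This is only one possible strategy and not something the dumper described here is known to do; merge_passes is a hypothetical helper.

```python
from collections import Counter

def merge_passes(passes):
    """Merge equal-length reads; return (merged bytes, positions without a majority)."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("all passes must have the same length")
    merged = bytearray()
    unresolved = []
    for pos, column in enumerate(zip(*passes)):
        (value, votes), = Counter(column).most_common(1)
        if votes * 2 <= len(passes):      # no strict majority for this byte
            unresolved.append(pos)
        merged.append(value)
    return bytes(merged), unresolved

# Three passes over the same 16 bytes, two of them with a single bit glitch.
clean = bytes(range(16))
pass_a = bytearray(clean); pass_a[3] ^= 0x10
pass_b = bytearray(clean); pass_b[9] ^= 0x01
merged, unresolved = merge_passes([bytes(pass_a), bytes(pass_b), clean])

print(merged == clean)   # True
print(unresolved)        # []
```

Positions left in the unresolved list are the ones that would need another pass, or another copy of the disc.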
5th-Mar-2012 07:03 am (UTC)
Anonymous
I suppose you could ignore the toc... Make a dummy data disc filled with nonsense/zero/whatever data, have the drive read this disc for its toc, swap it with the non YB compliant disc and dump it in raw mode with the sub-channels or not.

It's a hacky way, technically wrong, not as simple as it sounds and there's no guarantee that the sub-channel data will be valid or even that the drive will be able to read audio tracks as data (the opposite is possible on all drives by design but you have to read, modify and inject the initially skipped data in the beginning and erase data in the end, again more complicated than it sounds but I suppose you know the deal already anyway).



On the bright side, at least it's not a Sega Saturn disc with some data placed thousands of "sectors" away from the previous place with data, making the drive lose its way and not knowing where it reads from. :p


~Psy
5th-Mar-2012 10:29 am (UTC)
Using a full-audio trap disc and swapping it for the PCE one could work, but it's also a long way from perfect. As long as you let the drive come up with sectors it's always going to be the internal logic messing with things.

We have low-level floppy disc dumpers now, for the very same reason really - the internal drive controllers were unable to read the media exactly as it is. What I want to do is something like that for CD-ROMs.

I haven't really worked with Saturn CDs and I wonder if it's a mastering error. The spec says that data sectors inside a track should "immediately follow" each other - but it's not exactly clear what that means. In theory there could be "audio" gap between sector N end and sector N+1 header. This is not a proper audio though since the track type will be set to data, it just makes the drive nervous since it needs to read a lot to find the next sector header.
Problems like this would also be taken care of with low-level dump but obviously the emulator code would now have to do all the work, including handling such cases properly. But then again, if the original hardware can do it then the emulator should be able as well :)
10th-Mar-2012 10:40 pm (UTC)
Anonymous
What do you mean? Saturn discs are either with single data track or with 2 data tracks, but in this case the gaps between them are also in Mode1/Mode2, so no issues there. Or are you talking about the rings?
5th-Mar-2012 08:00 pm (UTC)
Anonymous
Ah, so the plan is to create a low level optical disc dumper. It makes sense since you're dealing with media, software and hardware limitations and design side-effects here.

Kinda makes you happy that technology goes back into using memory chips for storing massive amounts of data once again.


In the end, there's no way to make 1:1 copies of optical or magnetic media using out of the box consumer hardware. Legal issues also prevent the massive creation of such tools making the whole idea even harder to implement.

I hope you are prepared to face legal charges if some greedy publisher gets in the way.


In any case I wish you good luck with the project.
~Psy
6th-Mar-2012 12:50 am (UTC)
Well, DVDs are a bit saner than CDs since the idea was to store data on them, as in sectors and not streams. Most consumer drives don't allow ripping anything more than the user content, but that also means games and apps couldn't do anything tricky with it, so no need to bother with raw dumps.

As for FLASH-based devices, I like SD cards because the SPI protocol is free :) Too bad anything else needs dedicated hardware (CRC for each data line for example is not easily computed on CPUs, or not fast enough, and it's mandatory). The hardware approach does however mean you get the license cost with the price of the chip. Not many MCUs have stuff like this though...

Also, truth be told, if any "big player" decides to send lawyers my way I'm probably going to drop the project. Or at the very least officially :) EU laws seriously screw us over in this regard... This is also why I'm very hesitant on finishing my GD emulator. Chances are I'm gonna get sued (but that won't stop the Chinese from copying my project :P)
10th-Mar-2012 10:47 pm (UTC)
Anonymous
IIRC, it's a CD-ROM^2-only issue, TOC vs. sub desync. Theoretically, TOC is in priority; on another hand, .sub was probably mastered before the TOC and should be more precise. Actually, you can dump everything starting from lead-in and finishing with lead-out, even without unscrambling the data, but you need a Plextor drive for this.
10th-Mar-2012 11:03 pm (UTC)
Anonymous
My bad about the previous comment, it's not the TOC vs. sub sync issue. This case is a bit different. Yes, you can get a good dump, but only reading these areas _backwards_ and only in 0xD8 (reading all the sectors as audio) mode or after swapping with a CDDA disc (if you don't have a Plextor drive). I'll try to look at this disc closer, though (have it). Hello from Redump.org, btw :-)
11th-Mar-2012 01:11 am (UTC)
Well, the main problem is the audio doesn't have sectors :) It's a stream of data, so it's up to the drive logic to come up with "sectors" when you rip audio in digital form.
This is exactly why post/pre-gaps exist: when they added a sector-based data storage system on top of a binary stream there had to be some margins for positioning accuracy. These are quite generous but I suppose back in the late '80s it was necessary to have some freedom, as the mastering equipment wasn't well developed yet.

Since on this disc the offset is about 2.5 data sectors in length, and also negative, it causes problems for us. But a properly made drive can easily deal with this; after all it will seek well before the requested LBA and then start reading and scanning for the data sync marker. So, even with dumps as they are now, a proper emulator should be doing the same (rather than assume sectors in data tracks start exactly on the first byte).
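The "scan for the sync marker" approach is simple enough to sketch in Python. The buffer below is synthetic: three 2352-byte sections of zeros with a sync pattern planted at byte 0x45C of each, mimicking the raw audio dump in the post (where the section read at LBA 101472 contains the sync of the sector whose header says 22:35:00, i.e. LBA 101475). Treating that hex view as the section at LBA 101472 is my reading of the post, and the names are hypothetical.

```python
SYNC = bytes([0x00] + [0xFF] * 10 + [0x00])
SECTION_BYTES = 2352
BYTES_PER_SAMPLE = 4       # 16-bit stereo

def find_syncs(stream):
    """Return every byte position where the 12-byte data sync pattern starts."""
    positions = []
    pos = stream.find(SYNC)
    while pos != -1:
        positions.append(pos)
        pos = stream.find(SYNC, pos + 1)
    return positions

# Synthetic stand-in for three raw audio sections read from LBA 101472 onwards.
read_lba = 101472
stream = bytearray(3 * SECTION_BYTES)
for start in range(0x45C, len(stream), SECTION_BYTES):
    stream[start:start + 12] = SYNC

sync_positions = find_syncs(stream)

# The first sync belongs to the sector whose header address maps to LBA 101475.
header_lba = 101475
byte_offset = (read_lba * SECTION_BYTES + sync_positions[0]) - header_lba * SECTION_BYTES
sample_offset = byte_offset // BYTES_PER_SAMPLE
sector_offset = byte_offset / SECTION_BYTES

print(sync_positions)   # [1116, 3468, 5820]
print(byte_offset)      # -5940
print(sample_offset)    # -1485
print(round(sector_offset, 2))   # -2.53
```

An emulator working from a raw stream would do the same search, then descramble 2352 bytes starting at each sync position instead of trusting section boundaries.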
11th-Mar-2012 08:30 pm (UTC)
Anonymous
http://redump.org/disc/2414/ (was already dumped before, now verified). Also, there's no junk, when reading the whole CD with 0xD8 command (btw, most likely those were leftovers from unscrambling the nearby data track, since the offset is around 2 sectors and negative, 2 sectors before the data track can be problematic and show "junk" and wrong .subs). Will verify on other drives, though. http://www.mediafire.com/?nwmbp51pb17gv3b -- the .sub, it's good with no missing/doubled sectors (these areas, as I've already said, should be read sector by sector backwards), every sector was properly read, not generated.
12th-Mar-2012 12:12 am (UTC)
Obviously the Q sub "going back" is just a side effect of what I do and not really present in the stream. But both that, and the last audio sectors being filled with Mode 1 data, tell us that this is how it's stored on the disc.

I'm not quite sure what D8 command does on Plextor but if you acknowledge there is an offset then the audio postgap, when dumped, must be filled with "junk". Or else you're regenerating these parts.

Again, this is a perfectly normal situation. There is a reason why the last 1s of audio is considered post-gap and the first 2s of a data track is pre-gap. It's for handling situations like these, especially when you consider that allowed values for the data LBA to Q sub offset are +/- 75. I really wish I could get my hands on a poorly mastered disc where this offset is close to 50 or so, that should trip even that D8 command masking.
12th-Mar-2012 03:50 am (UTC)
Anonymous
"Junk" isn't a part of the data, it's a part of the scrambled data track, which goes into audio when reading the data using regular commands. See, the drive automatically fixes the offset for the data parts, when descrambling (since it reads all the missing data from the next/previous sector, when the needed bytes are missing and shifts that data automatically), but doesn't shift the audio data. As a result, you may see the audio data with parts of the data track unshifted. In short, due to the firmware issues, you get the same piece of the data track twice - once as a part of the data track, fully descrambled and aligned and once - as a part of the audio track, in scrambled form, due to an error. 0xD8 command reads everything as audio, so, no shifting and no additional errors there. You can try to insert an audio CD (better with a single dummy 700MB track), then swap it with your game CD without using the "Eject" button (using the pin hole or using the partially disassembled drive). As a result, you will be able to read all the data in raw mode, without descrambling.
12th-Mar-2012 01:36 pm (UTC)
Tell you what, I do have a PX-760A now so I will experiment more. I'm still not sure what D8 command does but it can't possibly account for any arbitrary offset so if the first data sector N starts at audio frame with Q address N-2 then it's gotta show in the dump. Period.

Dumped data will always align itself properly since the drive knows exactly where the header is, but the point here is the sync marker can very well be inside an audio section, and not exactly at the beginning. Or even have a bigger offset, as is in this case. We can contest if it's the data that's poorly positioned or is it the drive not figuring out properly where the audio sections start/end - but that's not the issue, the fact is there is an offset. So it should show in the dump. And we don't want that because it makes the dump problematic to work with.

If we kept all dumps as pre-descrambled audio that wouldn't be much of a problem, right? Same with low-level dump. Except we don't have to worry about offsets anymore, there are none. More work for the emulators though.

I'm going to prepare and burn a few tricky images when I have some more free time, maybe I can come up with something that will convince you.
12th-Mar-2012 03:55 pm (UTC)
Anonymous
Yes, if we kept the solid images, we could remove the unneeded bytes from the very beginning, so the data would start exactly from the first byte of the first sector, without any shifts/offsets. With a PX drive it's also possible to dump the sectors -150 to -1, which contain the first sector's pregap (which is not always empty, especially on audio CDs) and the lead-out area (but since it contains the same amount of bytes repeated many times, it's worth keeping only a part of that data). In this case, you could store the whole CD image in a single 2448 bytes/sector format and no cues would be needed, since all the needed track type/address data is in the subcode area. That's what would be a real 1:1 dump (or the closest possible without special/custom standalone devices).
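A short Python sketch of what reading such a 2448 bytes/sector image could look like. It assumes each record is 2352 main-channel bytes followed by 96 subchannel bytes in raw interleaved form (one byte per channel-bit position, Q in bit 6); tools that store the channels already separated would need a different extraction step. All function names are made up.

```python
MAIN_BYTES = 2352
SUB_BYTES = 96

def split_record(record):
    """Split one 2448-byte record into (main channel, raw subchannel)."""
    if len(record) != MAIN_BYTES + SUB_BYTES:
        raise ValueError("expected a 2448-byte record")
    return record[:MAIN_BYTES], record[MAIN_BYTES:]

def extract_q(raw_sub):
    """Collect bit 6 of each of the 96 bytes into a 12-byte Q frame, MSB first."""
    q = bytearray(12)
    for i, b in enumerate(raw_sub):
        if b & 0x40:
            q[i // 8] |= 0x80 >> (i % 8)
    return bytes(q)

def pack_q(q):
    """Inverse of extract_q, used here only to build a test record."""
    raw = bytearray(SUB_BYTES)
    for i in range(SUB_BYTES):
        if q[i // 8] & (0x80 >> (i % 8)):
            raw[i] |= 0x40
    return bytes(raw)

# Q frame from the LBA 101627 dump in the post, with the CRC bytes left as zero.
q_in = bytes.fromhex("411501000000002237000000")
record = bytes(MAIN_BYTES) + pack_q(q_in)

main, sub = split_record(record)
q_out = extract_q(sub)
print(q_out.hex())   # 411501000000002237000000
```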
12th-Mar-2012 03:46 pm (UTC)
Anonymous
Guess, I'll explain everything from scratch, in case someone is following the discussion :) As you've already said, the CD format wasn't designed for storing the computer data, only the sound data, 2352 bytes per sector + 96 subchannel bytes. You're not really correct when you say, that "the audio doesn't have sectors" - the audio doesn't have numbered sectors (no sector #0, sector #1, etc.), but logically it is a series of small 2352-bytes long pieces. Each "piece" (sector) has an additional 96-bytes long part, called subchannel or subcode. It is needed to navigate, because when you choose the needed track to play, the CD player needs to find it somehow, so the subcode part of the tracks keeps the information about the position of this sector on a CD. Also, there are 2 areas called lead-in and lead-out, lead-out is a dummy area of the disc, which indicates there's no more data in this session; lead-in area is for storing the TOC, which keeps the information about the total number of tracks on a CD and a Minutes:Seconds:Frames(MSF) format address for the beginnings (INDEX 01) of each of the tracks, so it could be easily located by the reading/playing device. Many discs also have a pregap area (INDEX 00), which belongs to the same track, usually it is empty, but can hold the very beginning of the track or the very end of the previous track. It was made for the earliest CD players, so they could start playing the track "somewhere" inside the pregap (so it's like a runway for the plane).
Now, about the data. Later it was decided to store data on CDs as well. Since the data needs additional error correction, Mode 1 sectors carry extra EDC/ECC bytes, and the sector contents are scrambled before being burned to the disc. Each 2352-byte data sector starts with a 12-byte sync pattern and a 4-byte header; everything after the sync, header included, is stored in scrambled form. The header keeps its own MSF address, which is supposed to match the sub part.
About the offsets. Since the positioning of the laser isn't precise, when you burn the data on a disc or when you read from the disc, the data is always shifted by a constant amount. So, reading the data from sector #0 (AMSF 00:02:00) won't give you the same sector #0 as it was in the master image for burning, because the data was moved closer to the beginning or closer to the end of the CD - once during burning (the amount by which the data is shifted during burning is called the Write Offset and is specific to each CD writer) and once during reading (the amount by which the data is shifted during reading is called the Read Offset and is also specific to each CD reader; read and write offsets for the same drive usually differ). Write+Read offset is called the Combined offset and this is exactly the value we need to know to compensate for both offsets when reading audio tracks in EAC with offset correction.
About the "junk". Of course, the data sectors aren't the same as the audio sectors, you can't play them "as is". Imagine that the write offset of the CD burner was +100 (so all the data was shifted 100*4=400 bytes closer to the lead-out [1 sample = 4 bytes]) and the read offset of the CD reader is -80 (so it shifts the data 80*4=320 bytes closer to the lead-in). The combined offset is +100+(-80)=+20. It means that when we read sector #0 (2352 bytes), we get +20*4=80 bytes from the previous sector, then comes the 16-byte header of sector #0 (AMSF 00:02:00), then 2352-16-80 bytes of the sector itself; the last 80 bytes of the sector are missing, since they were shifted into the next sector. The drive's firmware contains a descrambler, which decodes the data after reading. Since it can't decode an incomplete sector, it reads the next sector and takes the missing 80 bytes from there, then descrambles the whole sector and you see the contents of data sector #0 in your reading program. When it reads sector #1, it also automatically reads the next sector to get the missing bytes for proper descrambling.
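The numbers in that example can be laid out in a few lines of Python. This is only a sketch of the arithmetic as described in this comment, limited to small positive combined offsets; describe_read and its field names are hypothetical.

```python
SECTOR_BYTES = 2352
BYTES_PER_SAMPLE = 4       # one stereo 16-bit sample
SYNC_AND_HEADER = 16       # 12-byte sync + 4-byte header

def describe_read(write_offset, read_offset):
    """Byte layout of one raw 2352-byte read for a small positive combined offset."""
    combined = write_offset + read_offset            # in samples
    shift = combined * BYTES_PER_SAMPLE              # in bytes
    if not 0 <= shift < SECTOR_BYTES - SYNC_AND_HEADER:
        raise ValueError("sketch only covers small positive combined offsets")
    return {
        "combined_samples": combined,
        "bytes_from_previous_sector": shift,
        "sync_starts_at": shift,
        "payload_bytes_present": SECTOR_BYTES - SYNC_AND_HEADER - shift,
        "bytes_borrowed_from_next_read": shift,
    }

layout = describe_read(write_offset=100, read_offset=-80)
print(layout)
# combined +20 samples: 80 foreign bytes, sync at byte 80,
# 2256 payload bytes present, 80 bytes to fetch from the next read
```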
13th-Mar-2012 03:18 pm (UTC)
"logically it is a series of small 2352-bytes long pieces" - yes, but the keyword here is "logically".

The actual frames (as in 33-byte F3 frames) contain shuffled data, so even if you read them one by one these will not represent a single logical piece. The control bytes however, where subs are stored, are calculated prior to shuffling and added as-is, so these are continuous. In other words, subs are stored along with user data in frames, but are not tied together.

When the drive does CIRC processing it restores the original byte ordering, and now it has to figure out how to rejoin subs and data. And there are no rules on how to do that, so there isn't a "proper" way, really. Hence all the audio offsets.

So, when you look at audio it's just a stream of data and a separate stream of subchannels, stored alongside each other but otherwise not tied very closely. Just enough to more-or-less figure out the addressing. And this is why digital data sectors, as in proper sectors, come with sync markers, headers, their own addressing, and don't have to be placed perfectly, since all these things allow us to find each sector anyway.
12th-Mar-2012 03:46 pm (UTC)
Anonymous
OK, let's imagine our disc has 2 tracks: the 1st track is data and the 2nd track is audio. You read the data sector by sector as described above. The drive reads the very last sector of the data track, takes the missing bytes from the next sector (audio), descrambles the data, and reads the next sector as audio. Since the next sector is audio (the first sector of the 2nd track), it doesn't have any header, so the drive just reads it "as is". But since it contains the 80 shifted bytes from the previous data sector, you will see them as "junk" or "garbage" bytes at the very beginning of the track. So your image now has these bytes in 2 copies: as part of the data track and as part of the audio track. If the offset is +588 samples or larger, you can even take the last 588 samples (2352 bytes, one full sector) of the junk and manually unscramble them: you will get an exact copy of the last sector of the data track. That's why offset correction is needed: to shift all the data back properly (I'd say un-shift). For this, you need to read any data sector without descrambling and count the number of bytes between the beginning of the physical sector and the first byte of the sync of the corresponding header. You need to either read the data with a Plextor and its 0xD8 command (reads the data as audio), or insert an audio CD and swap it with the data one without using the eject button, so the drive will "think" the disc inside is audio and no descrambling is needed (the drive reads the track types from the TOC only once, when you load the tray).
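Here is a minimal Python sketch of that measurement, on a synthetic sector rather than a real dump. It assumes the standard CD-ROM scrambler (a 15-bit shift register with polynomial x^15 + x + 1, preset to 1, applied to bytes 12..2351 of each sector) and a raw buffer whose byte 0 is where the sector was expected to start. All function names are made up for illustration.

```python
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"   # 12-byte sync, never scrambled
SECTOR_BYTES = 2352
BYTES_PER_SAMPLE = 4


def build_scramble_table() -> bytes:
    """2340-byte XOR table from the x^15 + x + 1 register, preset to 1."""
    table = bytearray(SECTOR_BYTES - 12)
    reg = 1
    for i in range(len(table)):
        byte = 0
        for bit in range(8):
            byte |= (reg & 1) << bit
            feedback = (reg ^ (reg >> 1)) & 1
            reg = (reg >> 1) | (feedback << 14)
        table[i] = byte
    return bytes(table)


SCRAMBLE = build_scramble_table()


def scramble(sector: bytes) -> bytes:
    """XOR bytes 12..2351 with the table; applying it twice undoes it."""
    if len(sector) != SECTOR_BYTES:
        raise ValueError("expected one 2352-byte sector")
    body = bytes(a ^ b for a, b in zip(sector[12:], SCRAMBLE))
    return sector[:12] + body


def offset_from_raw(raw: bytes) -> int:
    """Distance from the buffer start to the first sync, in samples."""
    pos = raw.find(SYNC)
    if pos < 0:
        raise ValueError("no sync found")
    if pos % BYTES_PER_SAMPLE:
        raise ValueError("sync is not sample-aligned")
    return pos // BYTES_PER_SAMPLE


# Synthetic Mode 1 sector at 00:02:00 (BCD address), shifted by 80 bytes.
header = bytes([0x00, 0x02, 0x00, 0x01])
plain = SYNC + header + bytes(i % 251 for i in range(SECTOR_BYTES - 16))
raw = b"\x55" * 80 + scramble(plain) + b"\x55" * 80

offset = offset_from_raw(raw)
start = offset * BYTES_PER_SAMPLE
recovered = scramble(raw[start:start + SECTOR_BYTES])
print(offset, recovered[12:16].hex())
```

On a real dump you would also check the descrambled header address against the sector you asked for, since a shift larger than one sector makes the first sync you find belong to a different sector.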
PCE problem. The example above demonstrated the case where the combined offset is positive. But in your case it was negative, and all the data was shifted closer to the lead-in. So the very end of each audio track located before a data track now contains a number of bytes shifted in from the following data track. Such sectors (marked as audio, but containing data) can confuse certain drive models, so they may return wrong subchannel data, incorrect sector data, or similar errors. So, if we're talking about proper dumps, you need to read everything without descrambling.
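To illustrate the negative case, here is a toy Python model that treats the disc as one flat byte stream and applies a shift to it. The names and the -20 sample offset are assumptions chosen for the example, not values measured from any disc.

```python
BYTES_PER_SAMPLE = 4
SECTOR_BYTES = 2352
SYNC = b"\x00" + b"\xff" * 10 + b"\x00"


def shift_stream(stream: bytes, offset_samples: int) -> bytes:
    """Shift data toward lead-out (positive) or lead-in (negative).

    The length is kept and the vacated bytes are zero-filled.
    """
    n = offset_samples * BYTES_PER_SAMPLE
    if n >= 0:
        return (bytes(n) + stream)[:len(stream)]
    return stream[-n:] + bytes(-n)


def unshift_stream(stream: bytes, offset_samples: int) -> bytes:
    """Offset correction: apply the opposite shift."""
    return shift_stream(stream, -offset_samples)


# Two sectors of "audio" followed by one "data" sector starting with a sync.
audio = b"\xaa" * (2 * SECTOR_BYTES)
data = SYNC + b"\x11" * (SECTOR_BYTES - len(SYNC))
disc = audio + data

read = shift_stream(disc, -20)          # combined offset of -20 samples
audio_as_read = read[:len(audio)]

# The tail of the audio track now holds the first 80 bytes of the data track.
print(audio_as_read[-80:] == data[:80])
print(audio_as_read[-80:-68] == SYNC)

# Un-shifting puts everything back except the 80 bytes that fell off the front.
restored = unshift_stream(read, -20)
print(restored[80:] == disc[80:])
```

In this toy model the bytes pushed past the start are simply lost; on a real disc they would have to be read from the area preceding the track.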
29th-May-2012 04:52 pm (UTC) - emu
Anonymous
And your emulator?
When will there be a new version?