Archived from groups: comp.sys.cbm,comp.emulators.cbm,rec.games.video.classic
On Sun, 24 Jul 2005 22:41:33 -0400, Pete Rittwage <peter@rittwage.com>
wrote:
>
>Anyone that has any insight into Activision's later protections (beyond
>the Pirateslayer years), please contact me. It looks like maybe density
>changes within the tracks or something.
I was thinking about this several days ago, and came up with an idea
that is not trivial, but I believe would be the be-all-end-all way to
address intratrack density changes, as well as other very difficult
protections such as timed sync lengths, missing short syncs, sync-less
tracks, etc.
Addressing with G64 just a single game such as you mention would
involve the format supporting density changes in-track, assuming the
program reads only certain parts at certain densities, and figuring
out which parts to sample at which density. Such a process I think
will have to have manual intervention per program, at least as tedious
as hacking it, and hard to be quite confident it is right.
The problem we have with even current mnib+emulator format is the 1541
is doing too much processing, so we do not have pure enough
information. By the time we pull a byte from the shift register, the
info on the read signal itself has already been lost, going through
the edge sensing to generate 1s, density-dependent generation of 0s,
and sync sensing. The ideas here would instead require the upfront work
of a new, even lower-level emulator format, a hardware change ranging
from trivial to a small circuit, and mnib updates, but would offer very
high confidence of really solving the problem for arbitrary games.
Background
Info is based on 1541 Repair and Maintenance Manual and Inside
Commodore DOS. Any references to chips assume the original 1541 'long
board', not the 1541 II 'short board' which coalesced some chips into
PLDs.
Read signal from the 1541 is amplified and converted to TTL, available
on pin 7 of UH4 (LM311 voltage comparator). This is then used in
conjunction with UG2 (74LS86 XOR), UG3 (9602 one-shot), and UF6
(74LS74 D flip-flop) to detect read signal transition. The result is
that UG3 pin 10 outputs a 1 pulse to indicate a read signal transition,
0 for no transition. Thus far the operation is density independent.
UE7 (74LS193 counter) divides the 16Mhz clock by a value from 13-16,
depending on the density setting in the VIA PB6:5. This then
determines the rate at which 0's will be clocked out to the shift
register if no transition occurs. At this point we're already
dependent on what density the program expects, so there is no need to
look any further up the chain at the counter that resyncs on each
transition, the shift register, or sync detection.
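As a quick check of the numbers used in the rest of this post, here is
a minimal Python sketch of the bit cell time at each divider setting.
It assumes the count-to-4 stage after the divider that is described
further down; the function name is mine, not anything from mnib or the
1541 ROM.

```python
CLOCK_HZ = 16_000_000  # 1541 master clock

def bit_cell_ns(divider):
    """Nominal bit cell time in ns for a UE7 divider of 13..16."""
    clocks_per_cell = divider * 4  # divider output feeds a count-to-4 stage
    return clocks_per_cell * 1e9 / CLOCK_HZ

for divider in (13, 14, 15, 16):
    # divider, 16Mhz clocks per bit cell, bit cell time in ns
    print(divider, divider * 4, bit_cell_ns(divider))
```

The shortest cell (divider 13) is 52 clocks, or 3.25us, and the longest
(divider 16) is 64 clocks, or 4us.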
1) Synchronous read signal sampling
Sampling the read signal itself takes any density, sync, etc. out of
the picture. Given that density determines a 13-16 divider of a 16Mhz
clock, sampling in a density independent fashion requires sampling at
16Mhz. In theory, just hook one of those parallel port input pins to
UH4 pin 7, and sample it at >16Mhz sampling rate. This would provide
raw data capable of reading anything a 1541 disk track could have used
for protection. (in theory, a read signal glitch of sub-62.5ns could be
missed by this but still have been caught by the 1541 transition
detection, but this would have been too short to have been reliably
written by drives or possibly even represented on the media).
In practice, 16Mhz sampling may be no problem for modern logic
analyzers, but no standard PC hardware is going to be capable of
achieving it. Latencies of FSB->PCI or proprietary NB/SB link->ISA
likely mean an IN opcode will take 100s of ns to a couple us.
Additional hardware would be needed to make true 16Mhz sampling
feasible.
a) Shift register(s)
To reduce the necessary sampling rate by a factor of 8, add a small
circuit with shift register such as 74164 or 74595, where UH4 pin 7 is
clocked into the shift register on each transition of the 1541 16Mhz
clock (TBD - can such a load be placed on the oscillator?). A counter
such as 74193 to count to 8 clocks and latch the shift register would
also be needed, as well as a T flip-flop which would be toggled after
each 8 clocks (delay flip-flop transition slightly from latching shift
reg). Parallel output of the shift register goes to parallel data,
flip-flop output goes to a handshake. Mnib16Mhz polls the parallel
handshake, and when it has transitioned, samples the 8 bits of data.
This is much less intensive than direct 16Mhz sampling, but for
reliable operation still requires guaranteeing 2 IN opcodes in <500ns
- not very likely on a parallel port on the other side of an ISA
bridge.
An alternative which could make that sampling feasible though is to
not use parallel port for sampling. IDE also operates at TTL levels,
is 16 bits wide, one hop closer to the CPU, and was intended for
higher speed data. Cascade 2 shift registers together. Capture 15
bits at a time, use 1 bit for the toggle flip-flop (optionally use 14
bits and also add 1 bit for an index hole sensor - might as well solve
track synchronization protection for good too). Dedicate an IDE
channel to this since it will definitely not be a real IDE device and
would interfere with anything else on the channel. Setup the channel
for PIO mode 4 (via the BIOS or chipset-specific registers) to get
shortest cycle times. Mnib16Mhz polls IDE data register (0x01f0 or
0x0170) for T flip-flop transition, on transition stores the 14 or 15
bits. 14 bit sampling in this manner would require that a single IN
from the data port take < 1000/(16/14) = 875ns. PIO mode 4 has a 120ns
cycle time on the IDE bus, plus FSB, PCI, etc. delays. I believe most
modern hardware can achieve a fast enough IN for this.
This all assumes no other sources of significant delay. A high speed
CPU with instructions cached running with ints disabled would be
necessary. Data cache misses would not be critical - misses would be
stores, which will likely be buffered on modern CPUs and not delay
execution. No other memory delays could be tolerated though. No XMS
copies, no BIOS calls for EMS, no hard or even soft protected mode
page faults. The only >1M memory access methods under DOS that may be
fast enough would be RealMem, non-paged protected mode if you can get
it, or possibly EMS without BIOS usage.
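The latency budgets quoted in this subsection all come from the same
arithmetic: a latched word stays valid for as many 16Mhz sample periods
as it holds data bits. A small Python sketch, with a helper name of my
own choosing:

```python
SAMPLE_NS = 1e9 / 16_000_000  # 62.5 ns per 16Mhz sample

def word_window_ns(data_bits):
    """Time a latched word of data_bits samples stays valid before
    the next word overwrites it."""
    return data_bits * SAMPLE_NS

for bits in (8, 14, 15):
    print(bits, word_window_ns(bits))
```

For the 8 bit parallel port case the 500ns window has to cover both the
handshake IN and the data IN. For IDE the toggle bit arrives in the
same word, so a single IN has the whole 875ns (14 bit) or 937.5ns
(15 bit) window.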
b) Counter
Since we know the 16Mhz rate is much higher than real bit rate of any
density used, we expect to generally have runs of 13*4, 14*4, 15*4, or
16*4 of the same read signal (this amount of data could later be
trivially RLEed). The assumption here is that we'll always have runs
of such sizes (a safe assumption unless a disk used sub-bit-time
glitches), so why not reduce necessary sampling rate by a hardware
counter to encode runs.
Would need an 8-bit counter clocked off 16Mhz, a way to buffer/latch
counter output (TBD), and a read signal transition detection (UG3 pin
10 probably usable) to latch the counter value and toggle a flip-flop
to indicate new data ready. A D flip flop would be used as well to
latch the read signal level. To deal with counter overflow on very
long runs, another counter could trigger at 64 to also latch counter
output and toggle the T flip flop (but not change the D flip flop).
This would provide a 6 bit run counter, run level bit, and bit which
transitions on each new count ready. Assuming no sub-bit-time
glitches, sampling just parallel port data in this fashion would be
safe at an IN latency of <~3us. Using handshakes for level and T flip
flop could get an 8 bit run but require another IN, moving to IDE
instead of parallel could allow 8 bit run and other 2 bits in a single
IN.
Same concerns about no other meaningful delays such as memory paging
would still apply.
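To make the counter idea concrete, here is a rough software model in
Python of what the circuit would hand to the PC. It is only a sketch of
the behaviour described above (latch on each transition, plus a forced
latch when the run hits 64), not a description of any existing
hardware; the function name and record layout are my own.

```python
def encode_runs(samples, max_run=64):
    """Turn a list of 0/1 read-signal samples into (level, run) records.

    A record is emitted on every level change, and also whenever the
    run reaches max_run, mimicking the overflow latch.
    """
    records = []
    if not samples:
        return records
    level, run = samples[0], 0
    for s in samples:
        if s != level:
            if run:  # run is 0 if the overflow latch just fired
                records.append((level, run))
            level, run = s, 0
        run += 1
        if run == max_run:
            records.append((level, run))
            run = 0
    if run:
        records.append((level, run))
    return records

# One bit cell high at divider 13, then a long low stretch
print(encode_runs([1] * 52 + [0] * 130))
```

A long run simply shows up as several consecutive records with the
same level, which post processing can add back together.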
2) Asynchronous read-signal sampling
Just sampling (and time stamping) as fast as we can *might* still be
good enough as long as we can significantly beat 3.25us. The trick is
to achieve >4Mhz sampling (since in the 1541 a counter counts to 4 and
resyncs on each read signal transition, so this could buy us a factor
of 4 breathing room). This is not as guaranteed as 16Mhz sampling and
requires slightly more complex software, but if this fast of sampling
is possible, there is no need for circuitry to provide a hardware
assist.
Connect read-signal UH4 pin 7 to the fastest reasonable TTL input on
the PC - ie. a dedicated IDE channel operating in PIO mode 4. Poll
IDE data as fast as possible. Along with the read signal bit level,
store a timestamp. The legacy 8254 takes at least 2 8 bit ISA reads
to read, so it is right out. The only timestamp source fast enough is
the free-running clock counter present on CPUs since Pentium,
accessible via RDTSC opcode (assumes 32 bit mode, this gets timestamp
to edx:eax). The timer has oddities, and may be read out of order
with other opcodes, will not advance when halted, etc. As long as it
advances when blocked waiting for an IN (I believe it does), the rest
can be dealt with though.
Post processing would then need to construct a simulated 16Mhz sampled
track based on the samples, timestamps, and time for a track rotation
(easier with that index hole sensor). (RDTSC values would need to be
converted into normal time using the CPU clock frequency, which can be
tough to calculate accurately.) As long as IN sampling is <250ns, no
transitions would be missed, though the time of transition may be off
from reality by some sub-bit-time. This timing requirement is very
difficult, but IDE on the right hardware might make it feasible. It
would be worth a shot to check if 4Mhz sampling can be achieved before
considering extra hardware for true 16Mhz sampling.
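The post processing step could look something like the following
Python sketch. It assumes the capture is a list of (tsc, level) pairs
and that the CPU clock frequency is already known; it holds each level
until the next poll, so a transition lands on the poll that first saw
it. The names are mine and purely illustrative.

```python
def resample_to_16mhz(samples, cpu_hz, sample_hz=16_000_000):
    """samples: list of (tsc, level) pairs in capture order.

    Returns a list of 0/1 values on a fixed sample_hz grid. The final
    poll has no known duration and is dropped.
    """
    if not samples:
        return []
    t0 = samples[0][0]
    out = []
    for (tsc, level), (next_tsc, _) in zip(samples, samples[1:]):
        start = round((tsc - t0) * sample_hz / cpu_hz)
        end = round((next_tsc - t0) * sample_hz / cpu_hz)
        out.extend([level] * (end - start))
    return out

# 1.6GHz CPU: 100 TSC ticks per 16Mhz sample
capture = [(1000, 1), (1400, 1), (6200, 0), (7000, 0)]
stream = resample_to_16mhz(capture, cpu_hz=1_600_000_000)
print(len(stream), stream.count(1), stream.count(0))
```

Any error in cpu_hz stretches or shrinks the whole track, which is
another reason an index hole sensor would be handy for calibration.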
Reading a raw disk
If the arduous timing requirements can be met, the rest is easy by
comparison. Raw disk data would be stored out as quickly as possible
in >1M memory, storing multiple rotations. Checking/trimming track
match would have to be fuzzy by at least +/-1 count of a given read
signal run to account for reading it the second time through sampling
starting at a slightly different point, and variations in rotation
speed. Index hole sensing could narrow down point to start looking
for track repeat.
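One way the fuzzy match might be done, sketched in Python: work on the
list of run lengths rather than raw samples, and look for the smallest
number of runs after which the capture repeats to within +/-1 count.
This is just an illustration under those assumptions. Real tracks have
long repetitive stretches, so a practical version would want the index
hole (or a sensible min_period) to avoid locking onto a short repeat.

```python
def find_track_period(runs, tolerance=1, min_period=1):
    """Return the smallest period (in runs) at which the run-length
    list repeats within +/- tolerance, or None if there is none."""
    n = len(runs)
    for period in range(min_period, n // 2 + 1):
        if all(abs(runs[i] - runs[i + period]) <= tolerance
               for i in range(n - period)):
            return period
    return None

first = [52, 104, 52, 156, 52, 52, 104]
second = [53, 104, 51, 156, 52, 53, 103]  # same track, slight jitter
runs = first + second + first[:3]
period = find_track_period(runs)
print(period, sum(runs[:period]))  # runs per rotation, samples per rotation
```

Comparing run by run keeps small speed variations from accumulating
the way they would in a sample-by-sample comparison.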
Raw .16mhz emulator disk
This lower level data would mean yet another disk format, though a
conceptually simple one. For a track, it would contain # of bits
(need not be multiple of 8), and store raw bit data for signal level.
Size would vary based on how many samples were picked up when reading
(ie vary by drive RPM), but at 300RPM (5 rotations/sec) would be
16000000/5 = 3.2M bits = 400KB/track, so a 16MB image for a 40 track disk. Of
course this could be trivially compressed with RLE. A simple fixed
1bit value/7bit run would get down to <1-2MB, slightly smarter storage
methods could still improve significantly on that.
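For illustration, a minimal Python sketch of the fixed 1bit value/7bit
run scheme, together with the raw size arithmetic. The byte layout (top
bit = level, low 7 bits = run of 1..127) is my own assumption of how
such a format might look, not a defined standard.

```python
RAW_BITS_PER_TRACK = 16_000_000 // 5         # 300RPM = 5 rotations/sec
RAW_BYTES_PER_TRACK = RAW_BITS_PER_TRACK // 8
RAW_BYTES_PER_DISK = RAW_BYTES_PER_TRACK * 40

def rle_encode(samples):
    """Pack 0/1 samples into bytes: bit 7 = level, bits 0-6 = run length."""
    out = bytearray()
    i, n = 0, len(samples)
    while i < n:
        level = samples[i]
        run = 1
        while i + run < n and samples[i + run] == level and run < 127:
            run += 1
        out.append((level << 7) | run)
        i += run
    return bytes(out)

def rle_decode(data):
    """Inverse of rle_encode."""
    samples = []
    for b in data:
        samples.extend([b >> 7] * (b & 0x7F))
    return samples

track = [1] * 52 + [0] * 200 + [1] * 3
packed = rle_encode(track)
print(RAW_BYTES_PER_TRACK, RAW_BYTES_PER_DISK, len(track), len(packed))
```

Runs longer than 127 samples just spill into another byte with the
same level bit.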
Emulator support would need to be added for this. I do not know how
complex or pluggable the disk image model is in VICE or CCS64, but again
this is at least conceptually straightforward. A track pointer would
advance through the signal level at a fixed rate simulating 300RPM,
and transition detection, count-to-4/resynch on transition, and
clocking 0 bits based on current density settings would need to be emulated.
I'm assuming emulation of byte-ready/sync/shift register port are
already there, the new module would just be the means to supply their
values.
The format could likewise be translated to G64 by running through the
tracks at given densities.
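Both the emulator module and a G64 conversion need the same core step:
turning the stored signal level into bits at a chosen density. Here is
a deliberately simplified Python sketch of that step. It assumes each
transition clocks a 1 half a bit cell later and that a 0 is clocked
for every further full cell without a transition; it does not model
sync detection, byte-ready, or the exact counter wiring, and the
function name is hypothetical.

```python
def decode_bits(samples, divider):
    """Simplified decode of 16Mhz level samples into bits.

    divider is the UE7 setting (13..16); one bit cell is divider*4
    samples. Data before the first transition is ignored.
    """
    cell = divider * 4
    transitions = [i for i in range(1, len(samples))
                   if samples[i] != samples[i - 1]]
    bits = []
    for k, start in enumerate(transitions):
        end = transitions[k + 1] if k + 1 < len(transitions) else len(samples)
        gap = end - start
        # bits clocked at 0.5, 1.5, 2.5... cells after the transition
        cells = max(1, (2 * gap + cell) // (2 * cell))
        bits.extend([1] + [0] * (cells - 1))
    return bits

signal = [0] * 10 + [1] * 52 + [0] * 104 + [1] * 50 + [0] * 150 + [1] * 52
print(decode_bits(signal, 13))
print(decode_bits(signal, 16))
```

The same stored signal yields a different bit stream at divider 13
than at divider 16 (the 150 sample gap reads as three cells in one
case and two in the other), which is exactly the information a G64
image throws away.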
Effectiveness
It's a dangerous statement to make, but if 1541 disks were sampled and
emulated in this manner, I can't think of a disk track based
protection that could not be handled. Sync lengths would be
faithfully reproduced, short syncs wouldn't cause problems, syncless
tracks wouldn't matter. Custom formats and changing density within a
track would be non-issues.
I've heard rumors of protections using >2 0 (non-transition) bits in a
row and exploiting some bug where a false 1 bit could be introduced,
so the protection is to reread the track multiple times and see that
the 1 bit positions don't always match. I cannot confirm the validity
of that rumor, but even if true, with the read signal level stored
this could be emulated by randomly introducing false transitions to
the emulator during a long non-transition run.
The only disk protection I know of which couldn't be addressed by this
is relying on inter-track synchronization, but having a defined track
start point, emulating track position+time for stepper movement, and
reading from the index hole would address that.
Development
This idea is pretty involved, requiring high speed sampling, probably
extra hardware, and a new emulator disk format. There may be smarter
ways to achieve the kind of low-level emulation I'm talking about, but
I don't believe there is a higher level alternative you could be
confident in applying to an arbitrary protected disk.
If you want to discuss this or have other ideas and want to bounce
them off me, that would be great. If you decide to implement
something here, I could help with logic design of the circuits, but am
not the one to help design in discretes such as resistors or
capacitors if needed, let alone touch a soldering iron (though this
*might* be breadboardable). I'd be up for tackling VICE usage of a
raw format too (I'm assuming CCS64 is not open source).
Aaron