Tuesday, August 12, 2014

PXL 2000 video signal format

UPDATE: For the experimental signal decoder (and source code), check out http://sevkeifert.blogspot.com/2014/09/pxl-2000-decoder.html

A decade ago I was cleaning out some stuff and ran across my (broken) PXL2000 and a box full of old cassette videos.

I don't know if you remember these, but the PXL2000 is a handheld camcorder which was unique in that it recorded onto standard audio cassette tapes.

I thought it would be nice if there were some software that would convert or decode the analog signal on the tapes to a modern movie file. Since I couldn't find any, I decided to put a little research into making some. I could just restore my PXL 2000... but I'm also curious how the PXL works. :)

First I spent a couple hours reverse-engineering the raw analog signal format on my cassette tapes.  Fortunately I still have a working cassette player.   I just plugged this into my computer and digitized a section of tape.

I sampled the PXL 2000 video data at 192 kHz and viewed it in a wave editor. (44.1 kHz resolution will not work.) The signal was VERY high pitched.

(I boosted the video channel as much as possible without clipping.) Here are screenshots of the wave in a wave editor:

(wide zoom)

(medium zoom)

(close zoom)

(192 kHz sample points in relation to signal size)

I was a little bit familiar with NTSC (which I expected), but the signal didn't look like anything I had seen before. It looked like the PXL used its own proprietary video signal.

Summary of signal format:

1. It looks like one of the stereo channels is used for video and the other is used for audio (standard analog wav format).

2. Looking at a wide zoom, it looks like amplitude is used to store the video data. The entire video signal is at a roughly constant frequency. (On my sample, there are occasionally small DC offsets in the AM... maybe because the tape is so old and the audio channel is bleeding over?)

3. A long pulse every 92 packets probably demarcates an image frame (roughly every .5 sec at regular tape speed). This matches what is known about the frame rate. If the video is running at about 15 fps, the data for a 92x110 video frame must be squeezed into roughly 9/15 seconds of tape at regular playback speed (the tape runs about 9x faster in the camcorder)... there is just not enough room for any fancy encoding. Note that the long pulse is equal in length to two AM packets. Also note that the sync signals are proportional to the surrounding amplitude (they seem to be exactly 5x larger than the regular signal... they could carry usable video data, but I think that's unlikely).

4. Looking at the medium zoom, the small pulse signal probably demarcates a row of pixels (there look to be 110 oscillations in between). The amplitude modulation between these sync signals probably describes the brightness/darkness of 110 pixels. Likely brightness(i) = posterize(amplitude(i), 8). These are probably all painted/recorded in real time, as opposed to buffering the pixels for a single time-slice. If you look closely, there are occasionally sharp changes in the signal from one row to the next.

5. It is possible the rows are interlaced (note that some pixel rows appear to repeat a pattern ... the halves sometimes look like they could align). The 110-oscillation packet could be split in the middle, with each half describing even and odd rows. Although: the image frames transition into the next very smoothly, which would suggest interlacing (perhaps an s-shaped path down and up?).
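
To make the brightness(i) = posterize(amplitude(i), 8) guess from item 4 concrete, here is a rough Python sketch. The function names, the 0..7 output range, and the max_amplitude parameter are my own assumptions for illustration, not anything read off the tape. It also says nothing about polarity; a later update in this post finds that a high signal is actually black.

```python
def posterize(amplitude, levels=8, max_amplitude=1.0):
    """Quantize an amplitude in [0, max_amplitude] to an integer step 0..levels-1."""
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    # Clamp first so clipped or noisy peaks cannot fall outside the range.
    normalized = min(max(amplitude / max_amplitude, 0.0), 1.0)
    # Scale to the number of steps; the top edge maps to the last step.
    return min(int(normalized * levels), levels - 1)


def row_levels(peak_amplitudes, max_amplitude, levels=8):
    """Posterize one row of peak amplitudes (e.g. 110 values per packet)."""
    return [posterize(a, levels, max_amplitude) for a in peak_amplitudes]
```

For example, row_levels([0.0, 0.5, 1.0], 1.0) gives [0, 4, 7]. The same max_amplitude should be used for every row of a frame, otherwise each row would be normalized on its own and the picture would band.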

The video signal does not look exactly like NTSC to me...although it seems similar. The signal looks roughly like:

[long pulse signal about 230 oscillations long]
 [AM signal 110 oscillations] [5 small pulses] 
 [AM signal 110 oscillations] [5 small pulses] 
 ... 92 total AM packets... 
[long pulse signal about 230 oscillations long] 
 [AM signal 110 oscillations] [5 small pulses] 
 [AM signal 110 oscillations] [5 small pulses] 
 ... 92 total AM packets... 

So the video signal probably maps to:

[image frame sync signal]
 [row of 110 pixels] [sync signal] 
 [row of 110 pixels] [sync signal]
 ... 92 rows total ... 
[next image frame sync signal] 
...and so on...
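
As a sanity check on the layout above, here is a small Python sketch that turns those counts into oscillation offsets within one frame. The constants are just my counts from the wave editor (the 230 is approximate), and the names are made up for this example.

```python
FRAME_SYNC_OSC = 230   # long pulse, "about 230 oscillations" (approximate count)
ROW_DATA_OSC = 110     # AM oscillations per packet (one row of pixels)
ROW_SYNC_OSC = 5       # small sync pulses after each packet
ROWS_PER_FRAME = 92    # packets between two long pulses


def oscillations_per_frame():
    """Total oscillations from one long pulse to the next, per the counts above."""
    return FRAME_SYNC_OSC + ROWS_PER_FRAME * (ROW_DATA_OSC + ROW_SYNC_OSC)


def row_data_span(row_index):
    """Return (start, end) oscillation offsets of a row's data, counted from frame start."""
    if not 0 <= row_index < ROWS_PER_FRAME:
        raise IndexError("row_index out of range")
    start = FRAME_SYNC_OSC + row_index * (ROW_DATA_OSC + ROW_SYNC_OSC)
    return start, start + ROW_DATA_OSC
```

With these counts a frame is 230 + 92 * 115 = 10810 oscillations, and the first row's data occupies oscillations 230 to 340. On a real tape the offsets will drift, so this is only useful as an expected shape, not as a way to locate rows.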

I was puzzled, though, why there are only 110 oscillations, as several people have reported 90x120 video. If that is true, I'd expect 90 packets of 120 oscillations -- unless I just can't count :).

Also, I looked closer at the sampled wave (@ 44.1 kHz) and noticed an odd pattern. The first two packets and the last packet of the frame have a regular wave pattern (which is more easily seen at the lower sample rate).

(close up...regular patterns unlikely to hold interesting data.  Or black due to frame edge bleed.)

If this is significant, it only leaves 89 regular packets for data. This is odd, since it would be hard to explain where the 90th pixel's data is stored (if there are 90 pixels).

It looks like the long pulse might hold data, but that would be kinda silly (in my opinion). Maybe what is happening is that the tape speed changes slightly as the circuit prepares to ramp up for the large signal. Or, an edge of the image may always be dark, due to the camera.

In some cases the signal is so weak that no peaks are present (perhaps my recording is just bad, but I've tried to boost the signal as much as possible). So, only the large sync peaks can be reliably detected. It will be necessary to keep an average time between sync peaks for when the signal vanishes, or to always divide the packet into 110 parts. See Figure:
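
Both fallbacks are easy to sketch in Python. This is only an illustration under my own assumptions: sync_positions is a hypothetical list of sample indexes where row sync peaks were found, and the signal is assumed to be centered on zero so that abs() gives a usable magnitude.

```python
def average_sync_spacing(sync_positions):
    """Mean number of samples between consecutive sync peaks."""
    gaps = [b - a for a, b in zip(sync_positions, sync_positions[1:])]
    if not gaps:
        raise ValueError("need at least two sync positions")
    return sum(gaps) / len(gaps)


def split_row(samples, row_start, row_end, pixels=110):
    """Cut samples[row_start:row_end] into `pixels` nearly equal slices
    and return the largest magnitude found in each slice."""
    length = row_end - row_start
    if length < pixels:
        raise ValueError("row is shorter than the number of pixels")
    values = []
    for i in range(pixels):
        # Integer arithmetic keeps the slices contiguous and non-empty.
        lo = row_start + (i * length) // pixels
        hi = row_start + ((i + 1) * length) // pixels
        values.append(max(abs(s) for s in samples[lo:hi]))
    return values
```

When a sync peak goes missing, the average spacing can be used to guess where the next row should start, and split_row still produces 110 values for that row even if no individual peaks can be found in it.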

It appears that some signal is bleeding over from the other tracks. Here is a clip of the audio and video data. You can see the video appears to have bled over into the audio track and vice-versa. Plus, as a tape sits for a long time, each layer is sandwiched in the roll between other layers, which may transfer a magnetic signal to the next loop. This makes me think the DC offset can be ignored... there doesn't seem to be any pattern to it.

Generally the audio/video signal makes sense, though oddly the data part seems slightly smaller than it should be.   However, the pixels on a TV aren't square, and it would be difficult to count them on a TV (as it is also hard to count them on a tape signal).
Building a decoder:
1. The hardest part of building a software converter would be parsing data from the slightly damaged analog signal. The parser would need to be able to:
    a. detect relative peaks (primary AM signal)
    b. detect relative sync regions (regions louder than relative data)
    c. extract wave audio on second track
    d. handle damaged audio/video signal (missing signal, dc offset, clipping, etc)
Though, once the peaks/inflection points are extracted, I'd expect putting those back into an image would be much more straightforward.
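
Steps (a) and (b) could look roughly like this in Python. It is a sketch under assumptions, not a tested decoder: the window size and the 3x threshold are guesses (the sync pulses looked about 5x larger than the data on my tape), and the samples are assumed to have any DC offset removed already.

```python
from statistics import median


def find_peaks(samples):
    """Return (index, magnitude) for each local maximum of the rectified signal."""
    rectified = [abs(s) for s in samples]
    peaks = []
    for i in range(1, len(rectified) - 1):
        if rectified[i - 1] < rectified[i] >= rectified[i + 1]:
            peaks.append((i, rectified[i]))
    return peaks


def flag_sync_peaks(peaks, window=50, ratio=3.0):
    """Flag a peak as sync when it is at least `ratio` times the median
    magnitude of the peaks around it (a relative, not absolute, test)."""
    flags = []
    for k, (_, magnitude) in enumerate(peaks):
        lo = max(0, k - window)
        hi = min(len(peaks), k + window + 1)
        local = median([m for _, m in peaks[lo:hi]])
        flags.append(local > 0 and magnitude >= ratio * local)
    return flags
```

Comparing against a local median rather than a fixed level is meant to cope with the signal fading in and out along the tape; a fixed threshold would miss syncs in quiet stretches.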
I did test out the Java Sound API a while back, but didn't think it was stable enough to build an analog parser with (at the time).
Update 2014-09-08
I ran a quick test using Java to decode the video, testing with a few random (sequential) frames.  This was a bit easier than I expected ...  I think I see my old drum set (the drums had clear heads with o-rings.   I think the dark spot is the 'tone control' or whatever it's called).  :)
This seems to confirm the basic video format, though needs quite a bit of tuning to clean up the sync:

I used the high and low points of the wave to construct the row, effectively doubling the number of pixels across each row.  So, to fix the aspect ratio, it displays each row twice.   The signal was *not* interlaced (it was just coincidental that my first batch of wave samples were symmetrical).
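
The idea, sketched in Python rather than the Java I actually used (so treat the names and details as illustrative assumptions): take the magnitude of every high and low point in a row, then repeat each row so the picture isn't squashed.

```python
def row_from_extrema(samples):
    """Return the magnitude of every local maximum and minimum, in order.
    Assumes the samples are centered on zero (no DC offset)."""
    values = []
    for i in range(1, len(samples) - 1):
        prev, cur, nxt = samples[i - 1], samples[i], samples[i + 1]
        is_high = prev < cur >= nxt
        is_low = prev > cur <= nxt
        if is_high or is_low:
            values.append(abs(cur))
    return values


def double_rows(rows):
    """Repeat each row once so the doubled width keeps a sane aspect ratio."""
    doubled = []
    for row in rows:
        doubled.append(list(row))
        doubled.append(list(row))
    return doubled
```

For example, row_from_extrema([0, 2, 0, -3, 0, 1, 0]) gives [2, 3, 1]: one value for each high and each low, which is where the doubled pixel count comes from.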
Update 2014-09-10
I decoded a small sample of signal, and stitched the frames back together with avconv:
        avconv -r 15 -i frame_%05d.png movie.flv
It is definitely my old drum set:

The black/white values are inverted from what I initially thought.  A high signal is black, a low signal is white.   Which I suppose makes more sense from a storage perspective... you generally won't film the sun; filming a black or dark image is more common.
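
In code, the inverted mapping is a one-liner. Here is a Python sketch, assuming an 8-bit grayscale output and a max_amplitude picked by whoever calls it (both are my choices for illustration):

```python
def amplitude_to_gray(amplitude, max_amplitude):
    """Map a peak amplitude to 0..255 gray: high signal is black, low is white."""
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    # Clamp so clipped peaks still land on a valid gray value.
    normalized = min(max(amplitude / max_amplitude, 0.0), 1.0)
    return round(255 * (1.0 - normalized))
```

So a silent stretch comes out white (255) and a full-strength peak comes out black (0).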
Aside from tuning, now the decoder needs to parse audio (left track) and merge it with data (right track) at 15 frames/second.
Update 2014-09-29
As suggested by T.Ishimuni, using the first derivative of the AM signal looks better than using the straight AM signal.  The straight AM signal looks a bit grainy to me, and I think is likely more distorted by DC offset.
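
One simple way to approximate a first derivative on sampled data is to take the difference between neighboring values. The Python sketch below shows that idea only; it is an assumption about how one might do it, not necessarily how the decoder does.

```python
def first_difference(amplitudes):
    """Discrete first derivative: the change between consecutive values.
    The result has one fewer element than the input."""
    return [b - a for a, b in zip(amplitudes, amplitudes[1:])]
```

A constant DC offset drops out of the differences entirely (first_difference([1, 3, 6]) and first_difference([11, 13, 16]) both give [2, 3]), which may be part of why the result looks cleaner.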


T.Ishimuni said...

Terribly interesting.
I started writing a decoder some years ago, but gave up for lack of data (and time). I got horizontal sync pulses to line up OK, but the image was garbage, more or less.

From the U.S. patent filing, 4875107, it looks like the data is actually FM encoded.

The little waveform data I was actually able to get my hands on looked a bit different from yours. I think 44kHz should be enough to sample the tapes at, since they run the tape at ~8X normal speed in order to get ~90kHz. Dividing that by 8 should make it easy to recover signal at 1X.

I think that sampling at 192kHz, you might even be seeing the tape bias itself! (again keeping in mind that it's recorded at ~8X). So the tape's signal is AM encoded, but apparently the video is FM encoded. Go figure.

In the little data that I have, and having just read a bit of the patent filing, I looked at the spectrum of a couple of sample frames, and I seem to see what could be the FM carrier around 14kHz.

If you're interested in bandying about a bit of raw waveform data and C code in the interest of making a software decoder, let me know.


sevkeifert said...

Ah, you are right... according to the patent it should be FM encoding. That is very odd. Thank you for the information. :)

Initially I tried 44kHz sampling, but if I recall, the signal looked like a zig/zag line.

One thing I also wondered about is whether the final product may deviate from the patent or specs. For example, if a prototype is patented, and then the engineers are forced to refine the product. Or if there are multiple versions of the camcorder.

I was wanting to write an open source decoder, though wasn't sure about the best way to extract AM peaks. If you have C code, I could check it out. I haven't done much more research because of lack of time as well. :)

T.Ishimuni said...

Good point about deviation - I don't doubt it. Although the patent doesn't even specify a few key parameters. No sweat.

I'd be happy to exchange code - maybe offline until there is something worth trumpeting. What I have is highly experimental and does only a single frame, without demodulating it properly.

I'd also love to see a few frames' worth of your raw waveform data. Ping me if you like, out of band at tim dot my last name at gmail.


David Sutherland said...

Would love to see this signal decoded. I suspect you'd get a lot more interest from the greater PXL 2000 community if they heard about what you were doing.