Tuesday, August 12, 2014

PXL 2000 video signal format

In this post, we will look at the analog audio/video signals from the PXL 2000 camcorder, reverse engineer the signal format, and build a working decoder.

Decoding the PXL 2000 Audio/Video Signal

I don't know if you remember these, but the PXL 2000 is a handheld camcorder that was unique in that it recorded onto standard audio cassette tapes.



My camcorder no longer works, so I thought it would be nice if there were some software that could convert or decode the analog signal on the tapes into a modern movie file. Granted, I could just fix my PXL 2000 camcorder, but I was curious how the PXL worked. :)

Since I couldn't find any existing software, I decided to put a little research into creating a decoder.

The first step is to reverse-engineer the raw analog signal format on my cassette tapes.

Fortunately, I still have a working cassette player. I just plugged it into my computer and digitized a section of tape.

I sampled the PXL 2000 video data at 192 kHz and viewed it in a wave editor (44.1 kHz resolution will not work). The signal was VERY high pitched.

I boosted the video channel as much as possible without clipping. Here are screenshots of the wave in a wave editor:



(wide zoom)


(medium zoom)




(close zoom)





(192 kHz sample points in relation to signal size)
This is good. There is a definite repeating pattern in the signal. I was a little familiar with NTSC (which is what I expected to find), but the signal didn't look like anything I had seen before. It looked like the PXL used its own proprietary video signal.

Summary of signal format (rough guesses):

1. One of the stereo channels is used for video, and the other is used for audio (which looks like a standard analog audio waveform).

2. Looking at a wide zoom, it looks like amplitude is used to store the video data. The entire video signal stays at a roughly constant frequency. (On my sample, there are occasionally small DC offsets in the AM signal... maybe because the tape is so old and the audio channel is bleeding over?)

3. A long pulse every 92 packets probably demarcates an image frame (roughly every 0.6 sec at regular tape speed). This matches what is known about the frame rate: if the video runs at about 15 fps, and the tape runs about 9x faster in the camcorder, then the data for a 92x110 video frame must be compressed into roughly 9/15 seconds of normal-speed tape... just not enough room for any fancy encoding. Note the long pulse is equal in length to two AM packets. Also note the sync signals are proportional to the surrounding amplitude (they seem exactly 5x larger than the regular signal... they may hold usable video data, but I'd think that's unlikely).

4. Looking at the medium zoom, the small pulse signal probably demarcates a row of pixels (there look to be 110 oscillations in between). The amplitude modulation between these sync signals probably describes the brightness/darkness of 110 pixels, likely brightness(i) = posterize(amplitude(i), 8) (a small sketch of this mapping follows the list). The rows are probably all painted/recorded in real time, as opposed to buffering the pixels for a single time slice. If you notice, occasionally there are sharp changes in the signal from one row to the next.

5. It is possible the rows are interlaced (note that some pixel rows appear to repeat a pattern... the halves sometimes look like they could align). The 110-oscillation packet could be split in the middle, with each half describing even and odd rows. Also, the image frames transition into one another very smoothly, which would suggest interlacing (perhaps an s-shaped path down and up?).
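For what it's worth, here is a rough sketch of what the posterize mapping from item 4 might look like. This is purely illustrative; the names and the normalization are my own guesses, with rowAmplitudes assumed to hold the 110 normalized peak amplitudes between two sync pulses:

// Hypothetical sketch: quantize one row's peak amplitudes into
// posterized pixel brightness. 'levels' = 8 per the guess above
// (assumes levels >= 2, and amplitudes normalized to [0.0, 1.0]).
int[] decodeRow(double[] rowAmplitudes, int levels) {
    int[] pixels = new int[rowAmplitudes.length]; // expect 110 entries
    for (int i = 0; i < rowAmplitudes.length; i++) {
        // quantize the amplitude to 'levels' brightness steps
        int step = (int) (rowAmplitudes[i] * levels);
        if (step >= levels) step = levels - 1;
        // spread the steps back over the 0-255 grayscale range
        pixels[i] = step * 255 / (levels - 1);
    }
    return pixels;
}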


The video signal does not look exactly like NTSC to me... although it seems similar. The signal looks roughly like:









[long pulse signal about 230 oscillations long]
[
 [AM signal 110 oscillations] [5 small pulses] 
 [AM signal 110 oscillations] [5 small pulses] 
 ... 92 total AM packets... 
]
[long pulse signal about 230 oscillations long] 
[
 [AM signal 110 oscillations] [5 small pulses] 
 [AM signal 110 oscillations] [5 small pulses] 
 ... 92 total AM packets... 
]
...repeats....

So the video signal probably maps to:



[image frame sync signal]
[
 [row of 110 pixels] [sync signal] 
 [row of 110 pixels] [sync signal]
 ... 92 rows total ... 
]
[next image frame sync signal] 
...and so on...
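If that mapping holds, the decoder's outer loop would look roughly like the sketch below. The abstract methods are hypothetical placeholders for the real peak/sync parsing:

// Sketch of the decode loop implied by the mapping above.
abstract class PxlFrameReader {
    static final int ROWS = 92, COLS = 110;

    abstract boolean moreSignal();
    abstract void skipFrameSync();      // long pulse, ~230 oscillations
    abstract void skipRowSync();        // the 5 small pulses
    abstract int[] readRowAmplitudes(); // one AM packet -> COLS pixel values
    abstract void emitFrame(int[][] frame);

    void decode() {
        while (moreSignal()) {
            skipFrameSync();
            int[][] frame = new int[ROWS][COLS];
            for (int row = 0; row < ROWS; row++) {
                frame[row] = readRowAmplitudes();
                skipRowSync();
            }
            emitFrame(frame);
        }
    }
}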

I was puzzled, though, why there are only 110 oscillations, as several people have reported 90x120 video. If that were true, I'd expect 90 packets of 120 oscillations -- unless I just can't count :).


Also, I looked closer at the sampled wave (at 44.1 kHz) and noticed an odd pattern. The first two packets and the last packet of the frame have a regular wave pattern (which is more easily seen at the lower sample rate).




(close up... regular patterns unlikely to hold interesting data. Or black due to frame edge bleed.)

If this is significant, it only leaves 89 regular packets for data. This is odd, since it would be hard to explain where the 90th row's data is stored (if there are 90 rows).

It looks like the long pulse might hold data, but that would be kind of silly (in my opinion). Maybe what is happening is that the tape speed changes slightly as the circuit ramps up for the large signal. Or an edge of the image may always be dark, due to the camera.




In some cases the signal is so weak that no peaks are present (perhaps my recording is just bad, but I've tried to boost the signal as much as possible). So only the large sync peaks can be reliably detected. It will be necessary to keep an average time between sync peaks for when the signal vanishes (a sketch of this follows below), or to always divide the packet into 110 parts. See figure:

 
It appears some signal is bleeding over from the other tracks. Here is a clip of the audio and video data. You can see the video appears to have bled over into the audio track, and vice versa. Plus, as a tape sits for a long time, each loop is sandwiched in the roll and may transfer a magnetic signal to the next loop. This makes me think the DC offset can be ignored... there doesn't seem to be any pattern to it.
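To handle the dropouts mentioned above, the decoder could keep a running average of the sync spacing and fall back to it whenever the signal vanishes. A minimal sketch (all names and constants are illustrative, not from the final code):

// Sketch: track the average samples-between-syncs so rows can still
// be sliced up when the AM signal drops out.
class SyncTracker {
    // rough guess: ~0.6 s per frame / 92 rows, at 192 kHz sampling
    private double avgInterval = 1250;
    private static final double ALPHA = 0.1; // smoothing factor

    // call whenever a sync peak is reliably detected
    void onSyncDetected(double measuredInterval) {
        avgInterval = (1 - ALPHA) * avgInterval + ALPHA * measuredInterval;
    }

    // when no peak is found, predict where the next sync should land
    double predictNextSync(double lastSyncPosition) {
        return lastSyncPosition + avgInterval;
    }
}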

Generally the audio/video signal makes sense, though oddly the data part seems slightly smaller than it should be. However, the pixels on a TV aren't square, and it would be difficult to count them on a TV (as it is also hard to count them on the tape signal).
Building a decoder:
1. The hardest part of building a software converter would be parsing data from the slightly damaged analog signal. The parser would need to be able to:
    a. detect relative peaks (the primary AM signal; see the sketch below)
    b. detect relative sync regions (regions louder than the surrounding data)
    c. extract the wave audio on the second track
    d. handle a damaged audio/video signal (missing signal, DC offset, clipping, etc.)
Though once the peaks/inflection points are extracted, I'd expect putting those back into an image to be much more straightforward.
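For (a), a simple local-maximum scan would be the starting point. A minimal sketch (names are mine):

import java.util.ArrayList;
import java.util.List;

// Sketch: find the indexes of local maxima (relative peaks) in raw samples.
class PeakFinder {
    static List<Integer> findPeaks(short[] samples, int minHeight) {
        List<Integer> peaks = new ArrayList<Integer>();
        for (int i = 1; i < samples.length - 1; i++) {
            if (samples[i] > samples[i - 1]
                    && samples[i] >= samples[i + 1]
                    && samples[i] > minHeight) {
                peaks.add(i);
            }
        }
        return peaks;
    }
}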
I did test out the Java Sound API a while back, but didn't think it was stable enough (at the time) to build an analog parser with.
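For reference, reading the raw 16-bit stereo samples with the Java Sound API would look roughly like this. A sketch only; it assumes the 192 kHz stereo little-endian wav described earlier:

import java.io.File;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

// Sketch: read 16-bit stereo little-endian wav samples with Java Sound.
class WavReader {
    public static void main(String[] args) throws Exception {
        AudioInputStream in = AudioSystem.getAudioInputStream(new File(args[0]));
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) > 0) {
            // 4 bytes per stereo frame: left lo/hi byte, then right lo/hi byte
            for (int i = 0; i + 3 < n; i += 4) {
                short left  = (short) ((buf[i + 1] << 8) | (buf[i] & 0xff));
                short right = (short) ((buf[i + 3] << 8) | (buf[i + 2] & 0xff));
                // left = audio track, right = video track (per this post)
            }
        }
        in.close();
    }
}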
--
Update 2014-09-08
I ran a quick test using Java to decode the video, testing with a few random (sequential) frames. This was a bit easier than I expected... I think I see my old drum set (the drums had clear heads with o-rings; I think the dark spot is the 'tone control' or whatever it's called). :)
This seems to confirm the basic video format, though it needs quite a bit of tuning to clean up the sync:




 
I used the high and low points of the wave to construct each row, effectively doubling the number of pixels per row. So, to fix the aspect ratio, the decoder displays each row twice. The signal was *not* interlaced (it was just coincidental that my first batch of wave samples looked symmetrical).
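In other words, each high or low point of the carrier contributes one pixel, roughly like this (names are illustrative only):

// Sketch: build one row of pixels from both the highs and lows of the
// carrier, doubling the horizontal resolution.
int[] rowFromExtrema(short[] packet, int[] extremaIndexes) {
    int[] row = new int[extremaIndexes.length];
    for (int i = 0; i < extremaIndexes.length; i++) {
        // the magnitude of each high/low point becomes one pixel
        row[i] = Math.abs(packet[extremaIndexes[i]]);
    }
    return row;
}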
--

Sample Decoded Video

Update 2014-09-10
 
I decoded a small sample of the signal and stitched the frames back together with avconv:
    
        avconv -r 15 -i frame_%05d.png movie.flv
It is definitely my old drum set:



The black/white values are inverted from what I initially thought: a high signal is black, a low signal is white. I suppose that makes more sense from a storage perspective... you generally won't film the sun; filming a black or dark image is more common.
Aside from tuning, the decoder now needs to parse the audio (left track) and merge it with the video data (right track) at 15 frames/second.
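Assuming the resampled audio lands in a wav file, the merge should just be one more input to avconv, something like (an untested sketch; audio.wav is a placeholder name):

        avconv -r 15 -i frame_%05d.png -i audio.wav movie.flv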
--
Update 2014-09-29
As suggested by T.Ishimuni, using the first derivative of the AM signal looks better than using the straight AM signal. The straight AM signal looks a bit grainy to me, and I think it is likely more distorted by DC offset. I included a patch so that the decoder can use either the first derivative (default) or the direct AM signal.
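For reference, the first derivative here just means differencing adjacent samples, e.g.:

// Sketch: first derivative of the sampled signal as a simple difference.
// Differencing adjacent samples also cancels out any slow DC offset.
short[] derivative(short[] samples) {
    short[] d = new short[samples.length - 1];
    for (int i = 0; i < d.length; i++) {
        d[i] = (short) (samples[i + 1] - samples[i]);
    }
    return d;
}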
--

PXL 2000 Decoder Software

I published all the code on github (GPL open source). Code and documentation are here:

https://github.com/sevkeifert/pxl-2000-decoder
This decoder can convert a PXL 2000 video signal from either a wav file or line-in into digital video. In theory, you may be able to recover the signal from tapes that no longer play in a PXL 2000 camcorder (with proper boost/compression).

Screenshot:



Features:

  • can decode from line-in or wav file
  • shows preview of decoded video
  • brightness/contrast control
  • speed control
  • sync controls tab (allows fine-grained tuning for your specific signal)
  • converts video signal to png frames
  • resamples audio to normal speed
  • creates a sample avconv script (with calculated fps) that will create a video file
  • saves time code of each frame
  • offers both GUI and command line modes

Requirements:

  • Java JDK 6+ to compile
  • You'll need something like avconv or ffmpeg to merge the decoded PNGs and audio into a video format.
  • If you use a wav file, the decoder is currently tuned for stereo 16-bit audio sampled at 192 kHz.
The stable code is all in the default "master" branch. Any other branch should be functional but is more experimental.
--
Update 2018-10-03

Michael Turvey started a new project on github, using an FFT for sync detection... a great idea! The project's goal is to get the highest quality image from the analog signals. Project details are here:

https://github.com/mwturvey/pxl2000_Magic



17 comments:

T.Ishimuni said...

Terribly interesting.
I started writing a decoder some years ago, but gave up for lack of data (and time). I got horizontal sync pulses to line up OK, but the image was garbage, more or less.

From the U.S. patent filing, 4875107, it looks like the data is actually FM encoded.

The little waveform data I was actually able to get my hands on looked a bit different from yours. I think 44kHz should be enough to sample the tapes at, since they run the tape at ~8X normal speed in order to get ~90kHz. Dividing that by 8 should make it easy to recover signal at 1X.

I think that sampling at 192kHz, you might even be seeing the tape bias itself! (again keeping in mind that it's recorded at ~8X). So the tape's signal is AM encoded, but apparently the video is FM encoded. Go figure.

In the little data that I have, and having just read a bit of the patent filing, I looked at the spectrum of a couple of sample frames, and I seem to see what could be the FM carrier around 14kHz.

If you're interested in bandying about a bit of raw waveform data and C code in the interest of making a software decoder, let me know.

Regards,
Tim

Kevin said...

Ah, you are right... according to the patent it should be FM encoding. That is very odd. Thank you for the information. :)

Initially I tried 44kHz sampling, but if I recall, the signal looked like a zig-zag line.

One thing I also wondered about is whether the final product deviates from the patent or specs. For example, if a prototype is patented, and then the engineers are forced to refine the product. Or if there are multiple versions of the camcorder.

I wanted to write an open source decoder, though I wasn't sure about the best way to extract AM peaks. If you have C code, I could check it out. I haven't done much more research, because of lack of time as well. :)

T.Ishimuni said...

Good point about deviation - I don't doubt it. Although the patent doesn't even specify a few key parameters. No sweat.

I'd be happy to exchange code - maybe offline until there is something worth trumpeting. What I have is highly experimental and does only a single frame, without demodulating it properly.

I'd also love to see a few frames worth of your raw waveform data. Ping me if you like, out of band at tim dot my last name at gmail.

Cheers,
Tim

David Sutherland said...

Would love to see this signal decoded. I suspect you'd get a lot more interest from the greater PXL 2000 community if they heard about what you were doing.

Unknown said...

Hello David & Tim!
I have the PXL 2000 & tapes. I am interested in getting them converted to DVD or simply to PC. Have you guys figured this out yet? I know nothing about this stuff & am looking for someone or a way to do it myself.

Thanks
Nickol

David Sutherland said...

Nickol - perhaps you could record a very clean input signal with your camera to tape and then sample the tape output and upload it as a sample for other coders to improve the decode algorithm.

If you are interested in selling your camera please let me know. I'd like to make sample signal sets to help improve the decoding software.

Mike said...

I just stumbled across this a few days ago. I've got a number of old tapes that I've been trying to digitize for a while now. My biggest issue has been getting something that can take the RF output signal of the PXL2000 camcorder. NTSC just isn't so common anymore, and the solutions I've tried don't seem to like the PXL2000's signal very much. Running across this, it seems like a much better way to go for digitizing the recordings than capturing the output of the camcorder's playback. I just ordered a cassette player with a line out, so I should be able to digitize some more samples soon. However, after playing with the included WAV files in Audacity, it occurs to me that these files might still be pretty high quality. If I understand properly, one of the biggest issues looks to be determining where the line sync boundaries are. If you view the spectrogram of the single-frame file, the frame syncs light up. Knowing that the frames are evenly spaced should also help in generating a better image.

Mike said...

Looking at the signal some more, including some fresh recordings, the video format is becoming more clear. The pixels are AM modulated on a 20 kHz signal (assuming a standard cassette playback speed here), but the sync signals are not AM encoded. Instead, the sync signals are a ~14.5 kHz signal. Knowing this, I think it's possible to come up with a much better algorithm to detect the sync pulses. Looking at some of my old tapes, I am also seeing a lot of places where the 20 kHz pixel signal is not picked up. I'm not sure if this is because of poor equipment, tape degradation, or (more likely) both. The frame sync pulse is almost exactly 2500 samples at a 192 kHz sampling rate, and the line syncs are approx. 70 samples at a 192 kHz sampling rate.

David Sutherland said...

Mike, it's great to have someone with your technical insight looking at the problem. Keep it up!

Do you think you can fork the code and add to it? Perhaps port it to a language you are more familiar with?

Maybe we can find @sevkeifert to update his code, or someone else who could work with you on updating it if you aren't into coding.

https://github.com/sevkeifert/pxl-2000-decoder

David Sutherland said...

Or are you the "Mike" already credited?:

Kevin Seifert (Java code, AM analysis)
Mike Leslie (C code, FM analysis)

Kevin said...

Interesting point :-) That idea could improve the sync detection.

I noticed the frequency change on the sync pulse too. At first I wasn't sure if the recorder was designed to do this, or if some kind of battery drain might happen when writing a stronger signal (maybe the sync circuit is affecting power to the motor?). Though it seems unlikely to be just a random power issue, since in that case I'd expect the recorded frequency to speed up, not slow down.

So it could look for a sharp spike in amplitude combined with a drop in average frequency. The only tricky part is how to handle cases where the signal goes to zero. Maybe it stops tracking frequency if the signal is too low (below a fixed threshold).

Kevin said...

Actually, it's been a while since I looked at the code, but it is already factoring in frequency changes (using the algorithm described previously).

Here are the relevant lines:

// note: sync signal slows down slightly.
// scale the value up by the larger tick counts between peaks
// (using Math.pow, since a bare '^' in Java would be XOR)
int ticks = peakTickData[i];
int svalue = (int) (peakDeltaData[i] * Math.pow(ticks - minTick, freqScale));
int absvalue = Math.abs(svalue);

So in other words, rather than looking for a "minimax" between two values (amplitude and frequency), it multiplies the two values (amplitude * distance between signal peaks) and looks for a simple max in the combined product. The freqScale is just a weighting exponent that gives the frequency component more (or less) influence. So possibly, tuning freqScale could improve sync detection as well.

Mike said...

Interesting, I'll have to play with those values to see if I can tweak it so it will read my tapes. Thanks for chiming in with the pointers.

I was thinking I might want to try an FFT to filter out the different signals, at least to pick out the sync signals. I'd assume the original hardware does something similar with a bandpass filter.

I was also thinking that if an entire frame is captured and processed at once, there's a lot of redundant information that can be used to make the signal less fuzzy. I.e., I'm hoping that each scan line has a very constant duration. If that holds, then the locations of all of the horizontal scan lines can be "combined" in a sense to get a very stable idea of where each of them should be, and hopefully generate a much cleaner picture, with much less risk of shifting any line even one pixel to the left or right. Crossing my fingers on this one.

But talk is cheap. I forked the github a few days ago and started looking at the code. Hopefully I'll have some time to try some code this weekend.

David-- That was a different Mike. I can't take any credit for the great work that's been done on this project to date.

Mike said...

As it turns out, using an FFT for determining the location of the sync pulses works quite well. As I got deeper into the problem space, I ended up starting a new project that you can find here: https://github.com/mwturvey/pxl2000_Magic/

Instead of treating the source as a stream, it makes multiple passes across the whole dataset to do the decoding, creating intermediate datasets with each pass (generate FFT data, find syncs, find frames, etc.). And it's really slow because of the FFT. But the approach seems very robust to even poor quality recordings, given what I've tested. Since it knows the sync boundaries of a single line, it can gracefully handle the cases where the AM signal basically flattens out to a line. To get the pixel data, the algorithm I'm using just runs a sliding window across the scanline and finds the value of the steepest slope in that sliding window. That becomes the pixel value. The number of steps the sliding window takes to move across the scanline is the number of horizontal pixels.

Kevin said...

Very cool! I'll check out the source. :-)

Kevin said...

I checked out the code, it's very good. I'm still analyzing it. Thanks for working on this! :-)

If I understand your overall approach, the FFT is used to explicitly find signals that are ~15 kHz? (Or whatever the sync region frequency is exactly.)

Maybe I could improve the streaming algorithm without doing a full FFT. If I design a function that spikes when 15 kHz is encountered, it could act as an amplifier for the sync regions.

For example, I'd just need a function like:

amplifier(peak1, peak2) = 100, when the signal is around 15 kHz
amplifier(peak1, peak2) = 1, otherwise

Or more specifically:

amplified_sync_signal = raw_signal_amplitude * constant1 / (constant2 + |current_peak_time - previous_peak_time - delta_15khz|)

"constant1" and "constant2" just control the range of the function.
"delta_15khtz" is the expected distance between peaks when frequency is 15khtz.

So then, when the signal approaches 15 kHz, the denominator approaches zero, causing the entire multiplier to spike. That could greatly amplify the regions of the signal where the frequency drops to 15 kHz, and help pick out sync peaks in real time.
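Or, in rough code form (the constants would need tuning):

// Rough sketch of the amplifier above; constants are placeholders.
// peakDelta = measured samples between successive peaks at 192 kHz.
double amplifiedSyncSignal(double amplitude, double peakDelta) {
    final double C1 = 100.0, C2 = 1.0;             // control the spike's range
    final double DELTA_15KHZ = 192000.0 / 15000.0; // expected peak spacing
    return amplitude * C1 / (C2 + Math.abs(peakDelta - DELTA_15KHZ));
}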

I won't have time to look at this for a few weeks, though.

Kevin said...

Well, I tested a couple of quick patches for tracking frequency, though I'm getting less accurate results than with my initial algorithm. :-)

I haven't tried merging in a full FFT scan though.

For me, the current sync detection -- max(frequency * amplitude) -- is getting around 0% sync misses, at least with the sound samples I was working with.

Really, 192khz resolution is almost a bit too low for recording the signal. Since there are only 4-5 samples between data segment peaks, and 6-7 samples between the sync region peaks. Those are pretty low numbers for accuracy.