Decoding the PXL 2000 Audio/Video Signal
I don't know if you remember these, but the PXL 2000 is a handheld camcorder that was unique in that it recorded onto standard audio cassette tapes. My camcorder no longer works, so I thought it would be nice if there were some software that could convert or decode the analog signal on the tapes into a modern movie file. Granted, I could just fix my PXL 2000 camcorder, but I was curious how the PXL worked. :)
Since I couldn't find any existing software, I decided to put a little research into creating a decoder.
The first step is to reverse-engineer the raw analog signal format on my cassette tapes.
Fortunately I still have a working cassette player. I just plugged this into my computer and digitized a section of tape.
I sampled the PXL 2000 video data at 192 kHz and viewed it in a wave editor (44.1 kHz resolution will not work). The signal was VERY high pitched.
I boosted the video channel as much as possible without clipping. Here are screen shots of the wave in a wave editor:
(wide zoom)
(medium zoom)
(close zoom)
(192 kHz sample points in relation to signal size)
Summary of signal format (rough guesses):
1. One of the stereo channels is used for video; the other is used for audio (which looks like a standard analog waveform).
2. A long pulse every 92 packets probably demarcates an image frame (roughly every .5 sec at regular tape speed). This matches what is known about the frame rate: if the video runs at about 15 fps, the data for a 92x110 video frame must be compressed into roughly 9/15 of a second (the tape runs about 9x faster in the camcorder)... just not enough room for any fancy encoding. Note that the long pulse is equal in length to two AM packets. Also note that the sync signals are proportional to the surrounding amplitude (they seem exactly 5x larger than the regular signal, so they may carry usable video data, though I'd think that's unlikely).
3. Looking at the medium zoom, the small pulse signal probably demarcates a row of pixels (there look to be 110 oscillations in between). The amplitude modulation between these sync signals probably describes the brightness/darkness of 110 pixels, likely brightness(i) = posterize(amplitude(i), 8). These are probably all painted/recorded in real time, as opposed to buffering the pixels for a single time slice. If you notice, occasionally there are sharp changes in the signal from one row to the next.
4. It is possible the rows are interlaced (note that some pixel rows appear to repeat a pattern... the halves sometimes look like they could align). The 110-oscillation packet could be split in the middle, each half describing the even and odd rows. Also, the image frames transition into one another very smoothly, which would suggest interlacing (perhaps an s-shaped path down and up?).
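If the posterize guess above is right, the brightness mapping could be sketched like this. This is a hypothetical helper (names and normalization are mine, not from any actual decoder), assuming the demodulated amplitude is normalized to 0..1 and quantized to 8 levels:

```java
// Hypothetical sketch: map a demodulated amplitude sample to a pixel
// brightness, quantized ("posterized") to a small number of levels.
public class Posterize {
    // amplitude assumed normalized to [0.0, 1.0]; returns gray 0..255
    public static int brightness(double amplitude, int levels) {
        double clamped = Math.max(0.0, Math.min(1.0, amplitude));
        int level = (int) Math.min(levels - 1, Math.floor(clamped * levels));
        // spread the quantized levels back across the 0..255 gray range
        return (int) Math.round(level * 255.0 / (levels - 1));
    }

    public static void main(String[] args) {
        System.out.println(brightness(0.0, 8)); // darkest -> 0
        System.out.println(brightness(1.0, 8)); // brightest -> 255
    }
}
```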
The video signal does not look exactly like NTSC to me...although it seems similar. The signal looks roughly like:
[long pulse signal about 230 oscillations long] [ [AM signal 110 oscillations] [5 small pulses] [AM signal 110 oscillations] [5 small pulses] ... 92 total AM packets... ] [long pulse signal about 230 oscillations long] [ [AM signal 110 oscillations] [5 small pulses] [AM signal 110 oscillations] [5 small pulses] ... 92 total AM packets... ] ...repeats....
So the video signal probably maps to:
[image frame sync signal] [ [row of 110 pixels] [sync signal] [row of 110 pixels] [sync signal] ... 92 rows total ... ] [next image frame sync signal] ...and so on...
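That layout could be sketched minimally as follows, assuming the guessed 92x110 geometry and rows of equal length (the even-division strategy and all names here are my own illustration, not the actual decoder):

```java
// Rough sketch of the guessed frame layout: 92 rows of 110 pixels between
// frame syncs. Given the demodulated amplitudes of one frame (row syncs
// already stripped), slice them into a 92x110 grid by even division.
public class FrameLayout {
    public static final int ROWS = 92;
    public static final int PIXELS_PER_ROW = 110;

    public static double[][] sliceFrame(double[] samples) {
        double[][] frame = new double[ROWS][PIXELS_PER_ROW];
        for (int r = 0; r < ROWS; r++) {
            // each row occupies an equal share of the frame's samples
            int start = (int) ((long) r * samples.length / ROWS);
            int end = (int) ((long) (r + 1) * samples.length / ROWS);
            for (int p = 0; p < PIXELS_PER_ROW; p++) {
                // nearest sample within this row's span
                int idx = start + (int) ((long) p * (end - start) / PIXELS_PER_ROW);
                frame[r][p] = samples[idx];
            }
        }
        return frame;
    }
}
```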
I was puzzled, though, why there are only 110 oscillations, as several people have reported 90x120 video. If that is true, I'd expect 90 packets of 120 oscillations -- unless I just can't count :).
Also, I looked closer at the sampled wave (@ 44.1 kHz) and noticed an odd pattern. The first two packets and the last packet of the frame have a regular wave pattern (which is more easily seen at the lower sample rate).
(close up... the regular patterns are unlikely to hold interesting data, or may be black due to frame-edge bleed)
It looks like the long pulse might hold data, but that would be kinda silly (in my opinion). Maybe what is happening is that the tape speed changes slightly as the circuit prepares to ramp up for the large signal. Or, an edge of the image may always be dark, due to the camera.
In some cases the signal is so weak that no peaks are present (perhaps my recording is just bad, but I've tried to boost the signal as much as possible). So, only the large sync peaks can be reliably detected. It will be necessary to keep an average time between sync peaks for when the signal vanishes, or to always divide the packet into 110 parts. See figure:
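The fallback described here could be sketched roughly like this. The threshold, minimum peak gap, and the 0.9/0.1 averaging weights are all illustrative guesses, not values from the real decoder:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: detect the big sync peaks by amplitude threshold, but keep a
// running average of the sync spacing so a sync can be synthesized when
// the signal drops out entirely.
public class SyncTracker {
    public static List<Integer> findSyncs(double[] signal, double threshold, int minGap) {
        List<Integer> syncs = new ArrayList<>();
        double avgInterval = 0;
        int last = -1;
        for (int i = 0; i < signal.length; i++) {
            boolean farEnough = last < 0 || i - last >= minGap;
            if (Math.abs(signal[i]) >= threshold && farEnough) {
                if (last >= 0) {
                    // exponential moving average of sync-to-sync spacing
                    double interval = i - last;
                    avgInterval = avgInterval == 0 ? interval
                            : 0.9 * avgInterval + 0.1 * interval;
                }
                syncs.add(i);
                last = i;
            } else if (last >= 0 && avgInterval > 0 && i - last > 1.5 * avgInterval) {
                // signal dropped out: synthesize a sync at the predicted spot
                int predicted = last + (int) Math.round(avgInterval);
                syncs.add(predicted);
                last = predicted;
            }
        }
        return syncs;
    }
}
```

With peaks at samples 0, 100, and 200 followed by silence, this keeps emitting predicted syncs every ~100 samples through the dropout.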
Sample Decoded Video
PXL 2000 Decoder Software
I published all the code on GitHub (GPL open source). Code and documentation are here:
Screenshot:
Features:
- can decode from line-in or wav file
- shows preview of decoded video
- brightness/contrast control
- speed control
- sync controls tab (allows fine-grained tuning for your specific signal)
- converts video signal to png frames
- resamples audio to normal speed
- creates a sample avconv script (with calculated fps) that will create a video file
- saves time code of each frame
- offers both GUI and command line modes
Requirements:
- Java JDK 6+ to compile, and
- You'll need something like avconv or ffmpeg to merge the decoded PNGs and audio into a video format.
- If you use a wav file, the decoder is currently tuned for stereo 16-bit audio sampled at 192 kHz.
Update 10/3/2018
Michael Turvey started a new project on GitHub, using an FFT for sync detection... a great idea! The project goal is to get the highest quality image from the analog signals. Project details are here:
https://github.com/mwturvey/pxl2000_Magic
17 comments:
Terribly interesting.
I started writing a decoder some years ago, but gave up for lack of data (and time). I got horizontal sync pulses to line up OK, but the image was garbage, more or less.
From the U.S. patent filing, 4875107, it looks like the data is actually FM encoded.
The little waveform data I was actually able to get my hands on looked a bit different from yours. I think 44kHz should be enough to sample the tapes at, since they run the tape at ~8X normal speed in order to get ~90kHz. Dividing that by 8 should make it easy to recover signal at 1X.
I think that sampling at 192kHz, you might even be seeing the tape bias itself! (again keeping in mind that it's recorded at ~8X). So the tape's signal is AM encoded, but apparently the video is FM encoded. Go figure.
In the little data that I have, and having just read a bit of the patent filing, I looked at the spectrum of a couple of sample frames, and I seem to see what could be the FM carrier around 14kHz.
If you're interested in bandying about a bit of raw waveform data and C code in the interest of making a software decoder, let me know.
Regards,
Tim
Ah, you are right... according to the patent it should be FM encoding. That is very odd. Thank you for the information. :)
Initially I tried 44 kHz sampling, but if I recall, the signal looked like a zigzag line.
One thing I also wondered about is whether the final product may deviate from the patent or specs. For example, if a prototype is patented and the engineers are then forced to refine the product. Or if there are multiple versions of the camcorder.
I was wanting to write an open source decoder, though wasn't sure about the best way to extract AM peaks. If you have C code, I could check it out. I haven't done much more research because of lack of time as well. :)
Good point about deviation - I don't doubt it. Although the patent doesn't even specify a few key parameters. No sweat.
I'd be happy to exchange code - maybe offline until there is something worth trumpeting. What I have is highly experimental and does only a single frame, without demodulating it properly.
I'd also love to see a few frames worth of your raw waveform data. Ping me if you like, out of band at tim dot my last name at gmail.
Cheers,
Tim
Would love to see this signal decoded. I suspect you'd get a lot more interest from the greater PXL 2000 community if they heard about what you were doing.
Hello David & Tim!
I have the PXL 2000 & tapes. I am interested in getting them converted to DVD or simply to PC. Have you guys figured this out yet? I know nothing about this stuff & am looking for someone or a way to do it myself.
Thanks
Nickol
Nickol - perhaps you could record a very clean input signal with your camera to tape and then sample the tape output and upload it as a sample for other coders to improve the decode algorithm.
If you are interested in selling your camera please let me know. I'd like to make sample signal sets to help improve the decoding software.
I just stumbled across this a few days ago. I've got a number of old tapes that I've been trying to digitize for a while now. My biggest issue has been getting something that can take the RF output signal of the PXL2000 camcorder. NTSC just isn't so common any more, and the solutions I've tried don't seem to like the PXL2000's signal very much. Running across this, it seems like a much better way to go for digitizing the recordings than capturing the output of the camcorder's playback. I just ordered a cassette player with a line out, so I should be able to digitize some more samples soon.
However, after playing with the included WAV files in Audacity, it occurs to me that these files might still be pretty high quality. If I understand properly, one of the biggest issues looks to be determining where the line sync boundaries are. If you view the spectrogram of the single-frame file, the frame syncs light up. Knowing that the frames are evenly spaced should also help in generating a better image.
Looking at the signal some more, including some fresh recordings, the video format is becoming more clear. The pixels are AM modulated on a 20 kHz signal (assuming a standard cassette playback speed here), but the sync signals are not AM encoded. Instead, the sync signals are a ~14.5 kHz signal. Knowing this, I think it's possible to come up with a much better algorithm to detect the sync pulses. Looking at some of my old tapes, I also am seeing a lot of places where the 20 kHz pixel signal is not picked up. I'm not sure if this is because of poor equipment, tape degradation, or (more likely) both. The frame sync pulse is almost exactly 2500 samples at a 192 kHz sampling rate, and the line syncs are approx. 70 samples at a 192 kHz sampling rate.
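As a quick sanity check on those figures (my arithmetic, not Mike's): at a 192 kHz sample rate, a ~14.5 kHz sync tone gives about 13 samples per full cycle, a 20 kHz pixel carrier gives 9.6, and a 2500-sample frame sync lasts about 13 ms:

```java
// Quick arithmetic check on the figures quoted above (assumptions:
// 192 kHz sampling, 14.5 kHz sync tone, 20 kHz pixel carrier).
public class SignalMath {
    public static void main(String[] args) {
        double fs = 192_000.0;
        System.out.println(fs / 14_500.0);        // samples per sync cycle, ~13.2
        System.out.println(fs / 20_000.0);        // samples per pixel-carrier cycle, 9.6
        System.out.println(2_500.0 / fs * 1000);  // frame sync length in ms, ~13.0
    }
}
```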
Mike, it's great to have someone with your technical insight looking at the problem. Keep it up!
Do you think you can fork the code and add to it? Perhaps port it to a language you are more familiar with?
maybe we can find @sevkeifert to update his code, or someone else who could work with you on updating his if you aren't into coding.
https://github.com/sevkeifert/pxl-2000-decoder
Or are you the "Mike" already credited?:
Kevin Seifert (Java code, AM analysis)
Mike Leslie (C code, FM analysis)
Interesting point :-) That idea could improve the sync detection.
I noticed the frequency change on the sync pulse too. At first I wasn't sure if the recorder was designed to do this, or if some kind of battery drain might happen when writing a stronger signal (maybe the sync circuit is affecting power to the motor?). Though it seems unlikely to be just a random power issue, since I'd expect the recorded frequency to speed up, not slow down, in that case.
So it could look for a sharp spike in amplitude combined with a drop in average frequency. The only tricky part is how to handle cases where the tape signal goes to zero. Maybe it could stop tracking frequency if the signal is too low (below a fixed threshold).
Actually, it's been a while since I looked at the code, but it is factoring in frequency changes (using the previous algorithm described).
Here are the relevant lines:
// note: the sync signal slows down slightly.
// scale the value up by the larger tick counts between peaks
int ticks = peakTickData[i];
int svalue = peakDeltaData[i] * (int) Math.pow(ticks - minTick, freqScale);
int absvalue = Math.abs(svalue);
So in other words, rather than looking for a "minimax" between two values (amplitude and frequency), it multiplies two values (amplitude * distance between signal peaks) and looks for a simple max in the combined product. The freqScale is just a multiplier that gives the frequency component more (or less) weight. So tuning freqScale could possibly improve sync detection as well.
Interesting, I'll have to play with those values to see if I can tweak it so it will read my tapes. Thanks for chiming in with the pointers.
I was thinking I might want to try an FFT to filter out the different signals, at least to pick out the sync signals. I'd assume that the original hardware does something similar with a bandpass filter.
I was also thinking that if an entire frame is captured and processed at once, there's a lot of redundant information that can be used to make the signal less fuzzy. I.e. I'm hoping that each scan line is a very constant duration. If that holds, then the locations of all of the horizontal scan lines can be "combined" in a sense to get a very stable idea of where each of them should be, and hopefully generate a much cleaner picture, with much less risk of shifting any line even one pixel to the left or right. Crossing my fingers on this one.
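The "combine the line positions" idea might be sketched like this, assuming a constant line duration. This is a deliberately crude fit of my own (a real version might least-squares fit all the detected positions):

```java
// Sketch: if every scan line has (nearly) constant duration, replace the
// noisy detected sync positions with an idealized, evenly spaced grid
// fitted to them (here: spacing = total span / interval count).
public class LineGrid {
    public static int[] snap(int[] detected) {
        int n = detected.length;
        if (n < 2) return detected.clone();
        double spacing = (double) (detected[n - 1] - detected[0]) / (n - 1);
        int[] snapped = new int[n];
        for (int i = 0; i < n; i++) {
            snapped[i] = detected[0] + (int) Math.round(i * spacing);
        }
        return snapped;
    }
}
```

For example, noisy syncs at {0, 98, 205, 300} snap to {0, 100, 200, 300}, so no line shifts even one pixel left or right.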
But talk is cheap. I forked the github a few days ago and started looking at the code. Hopefully I'll have some time to try some code this weekend.
David-- That was a different Mike. I can't take any credit for the great work that's been done on this project to date.
As it turns out, using an FFT for determining the location of the sync pulses works quite well. As I got deeper into the problem space, I ended up starting a new project that you can find here: https://github.com/mwturvey/pxl2000_Magic/
Instead of treating the source as a stream, it makes multiple passes across the whole dataset to do the decoding, creating intermediate datasets with each pass (generate FFT data, find syncs, find frames, etc.). And it's really slow because of the FFT. But the approach seems very robust even on poor quality recordings, given what I've tested. Since it knows the sync boundaries of a single line, it can gracefully handle the cases where the AM signal basically flattens out to a line. To get the pixel data, the algorithm I'm using just runs a sliding window across the scanline and finds the value of the steepest slope in that sliding window. That becomes the pixel value. The number of steps the sliding window takes to move across the scanline is the number of horizontal pixels.
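My reading of that sliding-window scheme, as a sketch (the window size, step placement, and all names are my guesses, not Michael's actual code):

```java
// Sketch: slide a window across one scanline's samples; the steepest
// sample-to-sample slope inside each window becomes that pixel's value.
public class SlopePixels {
    public static double[] decodeLine(double[] samples, int pixels, int window) {
        double[] out = new double[pixels];
        for (int p = 0; p < pixels; p++) {
            // window start positions evenly spaced across the scanline
            int start = (int) ((long) p * (samples.length - window)
                    / Math.max(1, pixels - 1));
            double steepest = 0;
            for (int i = start; i < start + window - 1; i++) {
                double slope = Math.abs(samples[i + 1] - samples[i]);
                if (slope > steepest) steepest = slope;
            }
            out[p] = steepest;
        }
        return out;
    }
}
```

The steepest slope tracks the local AM envelope, so a flattened-out stretch of signal naturally decodes to a dark pixel instead of garbage.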
Very cool! I'll check out the source. :-)
I checked out the code, it's very good. I'm still analyzing it. Thanks for working on this! :-)
If I understand your overall approach, the FFT is used to explicitly find signals that are ~15 kHz? (Or whatever the sync region frequency is, exactly.)
Maybe I could improve the streaming algorithm without doing a full FFT. If I design a function that spikes when ~15 kHz is encountered, it could act as an amplifier for the sync regions.
For example, I'd just need a function like:
amplifier(peak1,peak2) = 100, when the signal is around 15 kHz
amplifier(peak1,peak2) = 1, otherwise
Or more specifically:
amplified_sync_signal = raw_signal_amplitude * constant1 / (constant2 + |current_peak_time - previous_peak_time - delta_15khz|)
"constant1" and "constant2" just control the range of the function.
"delta_15khz" is the expected distance between peaks when the frequency is 15 kHz.
So then, as the signal approaches 15 kHz, the denominator approaches zero, causing the entire multiplier to spike. That could greatly amplify the regions of the signal where the frequency goes to 15 kHz, and help pick out sync peaks in real time.
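That amplifier could look roughly like this in code. All the constants here are illustrative; in practice the expected spacing would be tuned to the observed sample counts between peaks at 192 kHz:

```java
// Sketch of the amplifier idea above: the multiplier spikes as the observed
// peak-to-peak spacing approaches the spacing of the sync frequency.
public class SyncAmplifier {
    public static double amplify(double amplitude, int peakGap,
                                 double syncGap, double c1, double c2) {
        // denominator shrinks as the observed spacing nears the sync
        // spacing, so sync regions get strongly amplified
        return amplitude * c1 / (c2 + Math.abs(peakGap - syncGap));
    }

    public static void main(String[] args) {
        // assume ~6 samples between peaks in the sync region,
        // ~4 samples between peaks in the data region (guesses)
        double atSync = amplify(1.0, 6, 6.0, 100.0, 1.0);
        double offSync = amplify(1.0, 4, 6.0, 100.0, 1.0);
        System.out.println(atSync + " vs " + offSync);
    }
}
```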
I won't have time to look at this for a few weeks, though.
Well, I tested a couple of quick patches for tracking frequency, though I'm getting less accurate results than with my initial algorithm. :-)
I haven't tried merging in a full FFT scan though.
For me, the current sync detection -- max(frequency * amplitude) -- is getting close to 0% sync misses. At least with the sound samples I was working with.
Really, 192 kHz resolution is almost a bit too low for recording the signal, since there are only 4-5 samples between data-segment peaks and 6-7 samples between sync-region peaks. Those are pretty low numbers for accuracy.