The sound actually sounds very similar. Apparently, the average sound sounds as if all the tracks are playing simultaneously. I must add, however, that averaging the audio tracks makes the total volume drop a lot. It's simple math: an average can never be higher than the largest of the numbers you are averaging over. Since the sound in a TV series isn't at max volume all the time (there are also quiet parts), the average audio volume will be lower. I cheated a bit here and normalized the audio afterwards, so it becomes easier to hear and you don't need to turn your speakers/headphones up to max volume (and blow out your ears later if you forget to turn the volume back down)! The more tracks you average, the lower the volume is likely to become.
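A minimal sketch of the averaging and normalization steps, assuming each episode's audio has already been decoded into an equally long mono NumPy array of float samples in [-1, 1]. The decoding step is left out, and the function names `average_tracks` and `peak_normalize` are hypothetical, not the code used for this project. Peak normalization here simply rescales the mix so its loudest sample reaches a chosen target. The toy input is random noise; because audio samples swing above and below zero, averaging independent tracks also lets them partly cancel, so this only illustrates the volume drop rather than reproducing it for real episodes.

```python
import numpy as np

def average_tracks(tracks):
    """Average equally long mono tracks sample by sample (hypothetical helper)."""
    stacked = np.stack(tracks)          # shape: (n_tracks, n_samples)
    return stacked.mean(axis=0)

def peak_normalize(signal, target_peak=1.0):
    """Rescale so the loudest sample reaches target_peak (hypothetical helper)."""
    peak = np.max(np.abs(signal))
    if peak == 0:
        return signal.copy()            # pure silence: nothing to scale
    return signal * (target_peak / peak)

# Toy stand-in for 24 episodes: one second of random "audio" each at 44.1 kHz
rng = np.random.default_rng(0)
tracks = [rng.uniform(-1.0, 1.0, 44_100) for _ in range(24)]

mixed = average_tracks(tracks)
louder = peak_normalize(mixed, target_peak=0.9)

print(np.max(np.abs(mixed)))   # well below 1: the mix is much quieter
print(np.max(np.abs(louder)))  # ~0.9 after normalization
```

Peak normalization is the simplest choice; loudness-based normalization (for example ffmpeg's `loudnorm` filter) would be an alternative if the goal is perceived volume rather than the loudest sample.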
The thing with average frames is that you're looking at a mix: most of the pixel values aren't actually present in any of the original frames… it's something new. After reading about 100 Special Moments by Jason Salavon, I wanted to try out the median too. The median is a different statistic: it takes the middle value from a sorted list of values. So (almost) every pixel value in the median output also appears in one of the input frames. The same goes for audio: at every timestep, one audio sample is picked from the 24 audio tracks. So the Median Friends Season 1 Episode is composed of tiny samples from all 24 episodes, each being the most 'neutral' representation at that moment.
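A sketch of the element-wise median with NumPy, assuming the frames (or audio tracks) are equally shaped arrays; `median_stack` and `lower_median_stack` are hypothetical helpers, not necessarily how this project computed it. One detail explains the "(almost)": `np.median` averages the two middle values when the number of inputs is even, as with 24 episodes, so its output can fall between two inputs. The "lower median" variant instead always returns one of the input values.

```python
import numpy as np

def median_stack(items):
    """Element-wise median across equally shaped arrays,
    e.g. frames of shape (H, W, 3) or audio tracks of shape (n,)."""
    return np.median(np.stack(items), axis=0)

def lower_median_stack(items):
    """Element-wise 'lower median': always one of the input values."""
    ordered = np.sort(np.stack(items), axis=0)   # sort along the stack axis
    return ordered[(len(items) - 1) // 2]

# Odd count: every output value is one of the inputs
odd = [np.array([10, 200]), np.array([50, 0]), np.array([30, 90])]
print(median_stack(odd))          # [30. 90.]

# Even count: np.median averages the two middle values (20 and 30)
even = [np.array([10]), np.array([50]), np.array([30]), np.array([20])]
print(median_stack(even))         # [25.]
print(lower_median_stack(even))   # [20]
```

For full episodes you would likely compute this one frame index (or one chunk of audio) at a time rather than stacking all 24 episodes in memory at once.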
This looks and sounds a lot different than the average episode! Much more contrasty and vibrant.
Again, I normalized the audio afterwards. To me it sounds a bit like a broken radio!
This is the code used: