Like the infinite monkeys typing Shakespeare, but with audio instead.
If there were a program that generated a series of sounds at random intervals, with random pitches, amplitudes, etc., how long would it take to produce output that sounds like music, some sort of recognisable recording (e.g. a bell ringing, a dog barking), or perhaps even a human voice?


I think it is safe to say that OP’s question was layman’s terms for “what is the mean time to get to a result”. Other than that, I don’t think you actually addressed the question.
Let me try to get it started:
Randomly generating music might be akin to password cracking. Cracking short or simple passwords can be very fast, while cracking long or complex passwords can take a very long time. The rate of password guessing also affects the time to get a result.
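To make the password analogy concrete, here is a rough Python sketch. It assumes (my simplification, not something OP specified) that a "sound" is a sequence of `length` symbols, each drawn uniformly from `alphabet_size` possible values (say, pitches), and that only one exact target sequence counts as a hit. Under those assumptions, random guessing needs on average `alphabet_size ** length` tries (a geometric distribution with p = 1/N), while a password-cracker-style exhaustive search without repeats needs (N + 1) / 2. All function names here are hypothetical, made up for illustration.

```python
import random


def mean_guesses_random(search_space: int) -> float:
    # Independent random guesses: each hits the target with p = 1/N,
    # so the number of guesses is geometric with mean 1/p = N.
    return float(search_space)


def mean_guesses_exhaustive(search_space: int) -> float:
    # Brute-force enumeration without repeats: the target is equally
    # likely to be at any of the N positions, so the mean is (N + 1) / 2.
    return (search_space + 1) / 2


def mean_time_seconds(alphabet_size: int, length: int,
                      guesses_per_second: float, random_guessing: bool = True) -> float:
    # Assumption: one target sequence of `length` symbols from `alphabet_size` values.
    n = alphabet_size ** length
    guesses = mean_guesses_random(n) if random_guessing else mean_guesses_exhaustive(n)
    return guesses / guesses_per_second


def simulate_mean_guesses(alphabet_size: int, length: int, trials: int, seed: int = 0) -> float:
    # Monte Carlo check of the random-guessing formula for small cases.
    rng = random.Random(seed)
    target = tuple(rng.randrange(alphabet_size) for _ in range(length))
    total = 0
    for _ in range(trials):
        guesses = 0
        while True:
            guesses += 1
            attempt = tuple(rng.randrange(alphabet_size) for _ in range(length))
            if attempt == target:
                break
        total += guesses
    return total / trials


if __name__ == "__main__":
    # Hypothetical example: 12 pitches, an 8-note phrase, 1000 attempts per second.
    print(f"{mean_time_seconds(12, 8, 1000) / 86400:.1f} days")         # prints 5.0 days
    print(f"{mean_time_seconds(12, 8, 1000, False) / 86400:.1f} days")  # prints 2.5 days
```

Every extra note multiplies the search space by 12, which is the same reason long passwords take so much longer to crack. Real audio is continuous rather than a handful of discrete pitches, and many different outputs would "sound like music", so treat this as an illustration of how the numbers scale rather than an answer.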
To calculate an answer, we need the following information:
You might be able to take a genre of music and decompose its songs to get some of these answers… I don’t have the time for that. Does anyone want to take a stab at estimating the calculation?