It has been really frustrating trying to find detailed info on voice frame size in VoIP applications.
Audio is encoded in increments called frames, with a typical frame size being 10 ms (the exact size depends on the codec). The packet size, expressed as a duration or as a number of frames per packet, is therefore a measure of how much audio is sent in each IP packet. Experience has shown that a 20 ms packet is a good compromise between audio quality and bandwidth consumption. Reducing to 10 ms doubles the number of packets put onto the network, but only 10 ms of audio is lost when a packet fails to reach its destination or arrives out of order. Going beyond 20 ms reduces the number of packets put onto the network, but each lost packet takes more audio with it, so voice quality suffers more when network loss is high.
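To put rough numbers on that trade-off, here is a minimal sketch. It assumes a 64 kbit/s codec (G.711 is the usual example) and 40 bytes of IPv4 + UDP + RTP headers per packet, and it ignores layer-2 overhead. The function name `packet_stats` and the list of packet sizes are mine, purely for illustration.

```python
# Assumption: IPv4 (20) + UDP (8) + RTP (12) header bytes per packet,
# with no Ethernet or other layer-2 overhead counted.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def packet_stats(ptime_ms, codec_bitrate_bps=64_000):
    """Return (packets per second, payload bytes, total bits per second).

    ptime_ms is the amount of audio carried in each packet, in milliseconds.
    codec_bitrate_bps defaults to 64 kbit/s (assumed G.711-style codec).
    """
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bitrate_bps * ptime_ms / 1000 / 8
    total_bps = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    return packets_per_second, payload_bytes, total_bps


if __name__ == "__main__":
    for ptime in (10, 20, 30, 40):
        pps, payload, bps = packet_stats(ptime)
        print(
            f"{ptime} ms: {pps:.1f} packets/s, {payload:.0f} B payload, "
            f"{bps / 1000:.1f} kbit/s, {ptime} ms of audio lost per dropped packet"
        )
```

Under these assumptions, 10 ms packets cost 96 kbit/s at 100 packets per second, while 20 ms packets cost 80 kbit/s at 50 packets per second. Smaller packets lose less audio per drop but spend more bandwidth on headers.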