Abstract: A system capable of speech gap modulation is configured to: receive at least one composite speech portion, which comprises at least one speech portion and at least one dynamic-gap portion, wherein the speech portion(s) comprising at least one variable-value speech portion, wherein the dynamic-gap portion(s) associated with a pause in speech; receive at least one synchronization point, wherein synchronization point(s) is associating a point in time in the composite speech portion(s) and a point in time in other media portion(s); and modulate dynamic-gap portion(s), based at least partially on the at variable-value speech portion(s), and on the point(s), thereby generating at least one modulated composite speech portion. This facilitates improved synchronization of the modulated composite speech portion(s) and the other media portion(s) at the synchronization point(s), when combining the other media portion(s) and the audio-format modulated composite speech portion(s) into a synchronized multimedia output.