Typically, such translations are processed in two stages. First, the audio is transcribed into text, sometimes with timecodes to create subtitles, though most often it is just the text of an interview or presentation. Second, the text itself is translated into the target language.
We took several videos from our archive of completed translations (here) and compared the length of each video with the volume of the transcribed text. In the end, I kept three different audio files in the study: a presentation of the company's plant, an interview in German with a professor, and the script for a TV game show (yes, such texts get translated too).
We got the following averaged results:
For the business text, where the footage shows general views, the production line, and (silently) working people, the speech rate turned out to be low: 60 words per minute.
For the interview, the pace was higher: 120 wpm. Out of curiosity, I looked at similar orders in Korean, Japanese, and French and concluded that, with continuous speaking, this is a common speech rate.
For the script, the pace was higher still, at 145 wpm, but this style of delivery is definitely not the norm.
What is the practical use of such research?
If we take the average speech rate to be 100 words per minute, we can estimate the volume of the future transcript in advance and, based on that, instruct the translator to skip transcribing the text in the source language and write down the translation in the target language right away.
It is quite possible that the ratio we found, 100 words per minute, which corresponds to roughly 720 characters of transcript, will help translators estimate the amount of work more accurately.
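The estimate above is easy to automate. Here is a minimal sketch that projects transcript volume from video length using the averaged rates found in the study (100 words and ~720 characters per minute); the function and variable names are illustrative, not from any real tool.

```python
# Assumed averaged rates from the study described above.
WORDS_PER_MINUTE = 100      # average speech rate across the three files
CHARS_PER_MINUTE = 720      # ~7.2 characters per word in the transcript

def estimate_transcript(duration_minutes: float) -> dict:
    """Estimate transcript volume for a video of the given length."""
    return {
        "words": round(duration_minutes * WORDS_PER_MINUTE),
        "characters": round(duration_minutes * CHARS_PER_MINUTE),
    }

# For example, a 25-minute interview:
print(estimate_transcript(25))  # {'words': 2500, 'characters': 18000}
```

With a figure like this in hand, a project manager can quote the job before any transcription is done, which is the practical point of the ratio.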