At RealHearing we aim to provide the best possible speaker diarisation¹. Current models work only for asynchronous transcription, but we are working on improving real-time transcription.
Several factors affect the accuracy of speaker diarisation models. Talk time is one of them. If a speaker talks for less than 15 seconds, the model may struggle to identify them accurately. As a general rule, each speaker should talk for more than 30 seconds to be identified reliably.
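The talk-time guideline above can be checked with a short script once a recording has been diarised. The following is a minimal sketch, not part of RealHearing: it assumes the diarisation output is available as a list of `(speaker, start, end)` tuples in seconds, and the function names and rating labels are hypothetical. Only the 15-second and 30-second thresholds come from the text.

```python
from collections import defaultdict

# Thresholds taken from the guideline above.
MIN_TALK_SECONDS = 15.0          # below this, identification is difficult
RECOMMENDED_TALK_SECONDS = 30.0  # above this, identification is expected to be accurate


def talk_time_per_speaker(segments):
    """Sum the talk time in seconds for each speaker tag.

    `segments` is an assumed format: an iterable of (speaker, start, end) tuples.
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def rate_speakers(segments):
    """Label each speaker by how much audio the model had to work with."""
    ratings = {}
    for speaker, seconds in talk_time_per_speaker(segments).items():
        if seconds < MIN_TALK_SECONDS:
            ratings[speaker] = "unreliable"
        elif seconds <= RECOMMENDED_TALK_SECONDS:
            ratings[speaker] = "borderline"
        else:
            ratings[speaker] = "ok"
    return ratings


if __name__ == "__main__":
    # Hypothetical segments for illustration only.
    example = [
        ("A", 0.0, 20.0),
        ("B", 20.0, 28.0),
        ("A", 28.0, 45.0),
        ("C", 45.0, 65.0),
    ]
    print(rate_speakers(example))
```

In this example, speaker A totals 37 seconds and is rated "ok", speaker B totals 8 seconds and is rated "unreliable", and speaker C totals 20 seconds and is rated "borderline".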
The pace and type of communication also affect accuracy. If the conversation is clear and orderly and there is little background noise, the model is more likely to tag each speaker correctly. For best results, speakers should therefore speak clearly, keep a steady pace, and avoid interrupting one another, and participants who are not actively speaking should mute their microphones.
¹ Diarisation: the process of identifying and tagging speakers in an audio file or voice recording. The goal of diarisation is to separate an audio recording into individual segments for each speaker and assign a unique tag to each speaker.