The development of voice-to-text conversion software spans several decades. Until the late 1990s, software and hardware limitations required slow, unnatural discrete dictation, in which each word was spoken in isolation, analysed and matched against a dictionary database. This was a sacrifice the majority of radiologists were unwilling to make.

Most, if not all, modern speech engines combine two methods. Acoustic conversion turns phonemes (the distinct sounds of a language) into text. Language model software tracks historical word and phrase connections. Together, these two methods produce the textual radiology report. With such advances the user can, and should, dictate naturally and continuously. However, this quantum leap has brought its own limitations and difficulties.
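To make the division of labour concrete, here is a toy Python sketch of one way an acoustic score and a language-model score might be combined. Everything in it is an assumption for illustration. The candidate words, the probabilities and the `decode` helper are invented. Real engines use far larger models and beam search rather than exhaustive enumeration. The sketch shows why context matters. Judged on sound alone, "know" narrowly beats "no". The language model's knowledge of common word sequences, however, tips the whole phrase to "no acute fracture".

```python
import math
from itertools import product

FLOOR = math.log(1e-4)  # log-prob given to word pairs the toy model has never seen

# Hypothetical acoustic-model output: candidate words per spoken segment,
# with log P(sound | word). All numbers are invented for illustration.
ACOUSTIC = [
    {"no": math.log(0.45), "know": math.log(0.55)},
    {"acute": math.log(0.7), "a cute": math.log(0.3)},
    {"fracture": math.log(0.6), "fraction": math.log(0.4)},
]

# Hypothetical bigram language model: log P(word | previous word),
# standing in for "historical word and phrase connections".
BIGRAM = {
    ("<s>", "no"): math.log(0.2),
    ("<s>", "know"): math.log(0.01),
    ("no", "acute"): math.log(0.3),
    ("acute", "fracture"): math.log(0.4),
}

def lm_score(words):
    """Sum bigram log-probabilities, starting from the sentence-start symbol."""
    prev, total = "<s>", 0.0
    for w in words:
        total += BIGRAM.get((prev, w), FLOOR)
        prev = w
    return total

def decode(acoustic, lm_weight=1.0):
    """Return the word sequence maximising acoustic + weighted LM log-scores.

    Exhaustive search is only feasible for a toy lattice like this one.
    """
    best, best_score = None, -math.inf
    for words in product(*[slot.keys() for slot in acoustic]):
        ac = sum(slot[w] for slot, w in zip(acoustic, words))
        score = ac + lm_weight * lm_score(words)
        if score > best_score:
            best, best_score = list(words), score
    return best

def decode_isolated(acoustic):
    """Acoustic evidence only: pick each word independently, ignoring context."""
    return [max(slot, key=slot.get) for slot in acoustic]

print(" ".join(decode_isolated(ACOUSTIC)))  # know acute fracture
print(" ".join(decode(ACOUSTIC)))           # no acute fracture
```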

SR debate

Hospital administrators regard speech recognition as a modern muse that offers the dual advantages of markedly improved turnaround time and tremendous cost savings. But on the clinical side it is viewed by some as ‘the siren song of Parthenope’, luring unsuspecting radiologists to the rocks of decreased productivity and distraction during image interpretation. Moreover, in most healthcare systems the advantages of SR accrue to the radiologist only indirectly, whereas the disadvantages fall squarely on their shoulders.

Anecdotally, radiologist resistance is less prevalent in Europe than in the US. Wherever it arises, however, this misalignment of incentives must be addressed for successful SR implementation. Steps can be taken to minimise the disadvantages of SR for radiologists, significantly improving user acceptance of a system with a mixed reputation.

Speech engine accuracy

Clearly, improved recognition accuracy should be everyone’s ultimate goal. Vendors are developing software improvements incrementally, but much accuracy improvement can be achieved by the user. Basic knowledge of the SR software is a prerequisite for good accuracy. Modern speech engines perform best when given contextual phrases for analysis rather than individual words. Thus, a continuous dictation style will yield much better results than a choppy interrupted one.

This remains true for corrections as well. Rather than attempting to correct an individual word, a phrase should be selected to give the speech engine a better chance at success. The software retains historical corrections by individual users. Active correction and training of problem words and phrases in the first months of use will go a long way towards long-term improved accuracy and user acceptance.
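The advice to correct whole phrases can be illustrated with a small hypothetical sketch. Assume, purely for illustration, that a user profile learns from corrections by counting the word pairs in each corrected phrase. Vendors do not necessarily work this way, and `UserProfile` and its methods are invented names. Under that assumption, correcting a single word teaches the profile nothing about context. Correcting the phrase records the word pairs that a language model could later favour.

```python
import math
from collections import Counter

class UserProfile:
    """Hypothetical per-user memory of corrections (an assumed design, not a vendor's).

    Each corrected phrase is stored, and its adjacent word pairs are counted so
    that a decoder could later add a small bonus for pairs this user prefers.
    """

    def __init__(self) -> None:
        self.corrections = []         # (recognised, corrected) history
        self.pair_counts = Counter()  # (previous word, word) -> times corrected

    def record_correction(self, recognised: str, corrected: str) -> None:
        self.corrections.append((recognised, corrected))
        words = corrected.split()
        for pair in zip(words, words[1:]):  # a single word yields no pairs
            self.pair_counts[pair] += 1

    def bonus(self, prev: str, word: str, weight: float = 0.5) -> float:
        """Log-domain bonus that grows slowly with repeated corrections."""
        return weight * math.log1p(self.pair_counts[(prev, word)])

profile = UserProfile()
profile.record_correction("know", "no")  # word-level correction
print(len(profile.pair_counts))          # 0: no context learned
profile.record_correction("know acute fracture", "no acute fracture")
print(sorted(profile.pair_counts))       # [('acute', 'fracture'), ('no', 'acute')]
```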

Faulty microphone position is a frequent cause of transcription error. The last thing a radiologist is thinking about is holding a hand-held microphone in the correct position as the head is turned and the eyes scan images for subtle abnormalities across multiple monitors. For this reason, many have found the use of a headset microphone to be beneficial. Most of these microphones have inherent noise-cancelling features.

Attention should be paid to the acoustic environment. Sudden background noises such as a slamming door or a loud overhead page often degrade accuracy. The effect can be subtle and go unnoticed by the inexperienced user. It may show itself only as delayed recognition, which forces the user to slow down or even pause dictation and voice commands.

An unsuspecting user may continue to dictate at the same speed, causing chaotic software behaviour. Acoustic ceiling tiles, carpeting and noise-dampening wall panels help. In extreme cases, inexpensive white noise generators or room-sized noise-cancelling devices have proven beneficial.

Those who have populated the sidelines waiting only for speech engine advances are ill advised. Even 100% accuracy would not be sufficient for successful SR. There are other features of speech recognition that must be evaluated and understood.

Macros and templates

Optimisation of SR requires heavy use of macros and templates. These are stored textual reports or report fragments that are instantiated by the user in lieu of full dictation. For a normal study this reduces dictation time from up to several minutes to near zero. Even the dictation of a study with one or several abnormalities can start with a normal template and then be modified using speech commands and dictation. An additional advantage of template use is that proofreading time is markedly decreased.

Sophisticated users often have hundreds of different templates. Consequently, these should be named in a systematic manner consistent across all modalities to minimise confusion in choosing a template by voice command. The more advanced modern SR systems will determine the exact template from RIS data and automatically present it to the user for any needed modification.
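A systematic naming scheme also makes automatic selection straightforward. The sketch below assumes a hypothetical convention of modality, body region and variant (for example "CT chest normal"), and a simplified RIS exam record. `RisExam`, `select_template` and `match_voice_command` are illustrative names, not any vendor's API, and the template wording is placeholder text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical template library, named "<modality> <body region> <variant>"
# in lower case so every modality follows the same pattern.
TEMPLATES = {
    "ct chest normal": "The lungs are clear. There is no pleural effusion.",
    "ct abdomen normal": "The liver, spleen and pancreas are unremarkable.",
    "mr brain normal": "No intracranial abnormality is identified.",
}

@dataclass
class RisExam:
    accession: str
    modality: str     # e.g. "CT", "MR", "US"
    body_region: str  # e.g. "chest"

def template_name(modality: str, region: str, variant: str = "normal") -> str:
    return f"{modality} {region} {variant}".lower()

def select_template(exam: RisExam) -> Optional[str]:
    """Pick the default template for an exam from its RIS data, if one exists."""
    name = template_name(exam.modality, exam.body_region)
    return name if name in TEMPLATES else None

def match_voice_command(spoken: str) -> Optional[str]:
    """Normalise case and spacing of a spoken template name and look it up."""
    name = " ".join(spoken.lower().split())
    return name if name in TEMPLATES else None

exam = RisExam("ACC123", "CT", "Chest")
print(select_template(exam))                    # ct chest normal
print(match_voice_command("MR  brain Normal"))  # mr brain normal
```

Because the same pattern holds across modalities, a spoken name and an RIS-derived name resolve to the same key. A missing template returns `None` rather than a near match.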

Templates should be created with this type of modification in mind. Each organ system or concept should be a separate sentence or paragraph that can be easily selected and modified by the user using voice command only. Templates need not be limited to full reports. Any frequently used sentence or even a problem word should be recorded and saved for future use.
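Structuring a template as one section per organ system means a single voice command can replace one finding without touching the rest. The following Python sketch assumes a hypothetical command grammar of the form "replace <section> with <text>". The section names, template wording and helper functions are all illustrative.

```python
import re

# Hypothetical normal CT abdomen template: one sentence per organ system,
# so each can be selected and replaced independently.
NORMAL_CT_ABDOMEN = {
    "Liver": "The liver is normal in size and attenuation.",
    "Gallbladder": "The gallbladder is unremarkable.",
    "Pancreas": "The pancreas is unremarkable.",
    "Kidneys": "The kidneys enhance symmetrically without hydronephrosis.",
}

# Assumed command grammar: "replace <section> with <new text>"
COMMAND = re.compile(r"^replace (\w+) with (.+)$", re.IGNORECASE)

def instantiate(template: dict) -> dict:
    """Copy the template so edits never alter the stored original."""
    return dict(template)

def apply_command(report: dict, utterance: str) -> bool:
    """Replace one section's text; return False if the command is not understood."""
    m = COMMAND.match(utterance.strip())
    if not m:
        return False
    spoken_section, text = m.groups()
    for section in report:
        if section.lower() == spoken_section.lower():
            report[section] = text
            return True
    return False

def render(report: dict) -> str:
    return "\n".join(f"{section}: {text}" for section, text in report.items())

report = instantiate(NORMAL_CT_ABDOMEN)
apply_command(report, "replace kidneys with There is a 4 mm nonobstructing left renal calculus.")
print(render(report))
```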

Numerical data in a report such as a fetal ultrasound can be easily dictated or typed into a template. This could even be performed by the technologist or a clerical helper presenting the radiologist with a nearly complete report awaiting any necessary modification. Future interoperability may allow automatic insertion of numerical data from the modality or the PACS to the SR report without human intervention.
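Filling numerical fields can be as simple as placeholder substitution. In this sketch the placeholders ($bpd, $hc, $ac, $fl) and the `fill` and `missing_fields` helpers are hypothetical. The measurement dictionary stands in for values typed by a technologist or, in future, supplied by the modality. Python's `string.Template.safe_substitute` leaves any unfilled field visible rather than raising an error, so gaps are obvious before sign-off.

```python
import re
from string import Template

# Hypothetical fetal biometry template; field names are illustrative.
FETAL_US = Template(
    "Biparietal diameter: $bpd mm. Head circumference: $hc mm. "
    "Abdominal circumference: $ac mm. Femur length: $fl mm."
)

def fill(template: Template, measurements: dict) -> str:
    """Insert measurements (in mm) formatted to one decimal place.

    safe_substitute leaves any field without a value as its $placeholder.
    """
    values = {name: f"{mm:.1f}" for name, mm in measurements.items()}
    return template.safe_substitute(values)

def missing_fields(text: str) -> list:
    """List placeholders still present, so the radiologist can complete them."""
    return re.findall(r"\$(\w+)", text)

draft = fill(FETAL_US, {"bpd": 45.2, "hc": 170.0, "ac": 150.0})
print(draft)
print(missing_fields(draft))  # ['fl']
```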

PACS navigation

Navigation and control of PACS is an ongoing issue for radiologists. A recent survey revealed navigation to be one of the top three areas of user dissatisfaction with image interpretation systems. Adding SR to this mix introduces an entirely new dimension. Most radiologists create a report while viewing and interpreting images, so any combined navigation device must control PACS and SR at the same time. These new and more complex requirements have rendered the conventional mouse and keyboard combination anachronistic.

The QWERTY keyboard is a vestige of the mechanical typewriter invented in the latter half of the 19th century. The mouse was invented in the 1960s, although it did not come into general use until the 1980s. While the point-and-draw capabilities of the mouse remain useful for 2D images, newer devices will ultimately be needed for advanced visualisation work.

The keyboard has not fared so well. The absolute requirement for visual continuity during image interpretation, together with the need for one hand to control the mouse, calls for replacing the keyboard entirely with a device that the other hand alone can operate. A number of radiologists have done this successfully using third-party mechanical devices linked to PACS and dictation software.

The ultimate goal is to view images, create and finalise a report and navigate a work list without the user’s eyes ever leaving the images. The use of visual icons, pull-down menus or any mechanical device requiring visual search is unacceptable under these circumstances. In the Geisinger practice, a multifunction mouse in one hand and a programmable mechanical device with haptic control and scrolling capability in the other have been used for years. The mechanical devices can be used mainly to control PACS, while voice commands can and should be used for the necessary navigation of the SR system.

Voice commands are being investigated for control of PACS as well. They may prove of limited use, for the same reason that voice commands are not used to drive cars on motorways. Keeping track of whether one is dictating, commanding SR or commanding PACS may prove too complex and distracting for the average radiologist. Ultimately, any navigation advances must treat report creation as an integral part of the PACS software and hardware user interface.

Systems interoperability

The terms interface and integration have been used inconsistently in informatics for a number of years. Of late, interoperability has gained traction. Whatever the terminology, sharing information across different systems is a requirement for efficient use of SR.

Bidirectional interoperability between SR, the radiology information system (RIS) and PACS is a necessity and is reasonably mature in most installations. When a specific exam accession number is created within the RIS, that data is sent to the SR system, and a work list of undictated cases is created awaiting radiologist input. When the user opens an unread case in PACS, a software call passes the accession number to SR. The dictation shell is then available for text input via dictation or template.

When the radiology report is finalised, the text is sent to the RIS archive. The RIS is typically responsible for the distribution of the report to the electronic medical record (EMR) and hence the ordering healthcare providers. A report finalisation message is also sent to PACS stating the case should be closed and marked read. Many PACS at this point have the capability of automatically opening the next available case on the work list and repeating the cycle. Thus, a nice rhythm can be created for the user whereby images are viewed, the study is dictated and no other navigation or input is necessary.
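The cycle described in the two paragraphs above can be modelled as a small in-memory simulation. This is a hypothetical sketch. The `Workflow` class and its method names stand in for the messages real systems exchange, which are commonly HL7 orders and results, although the exact interfaces vary by vendor. The sketch ignores failures, multiple users and security.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Workflow:
    """Hypothetical model of the RIS -> SR -> PACS reporting cycle."""
    worklist: deque = field(default_factory=deque)   # SR work list of undictated accessions
    shells: dict = field(default_factory=dict)       # accession -> draft report text
    ris_archive: dict = field(default_factory=dict)  # finalised reports held by the RIS
    pacs_read: set = field(default_factory=set)      # accessions PACS has marked read
    current: Optional[str] = None                    # case currently open in PACS and SR

    def ris_order_created(self, accession: str) -> None:
        # RIS creates the accession and SR adds it to the work list
        self.worklist.append(accession)

    def pacs_open(self, accession: str) -> None:
        # Opening a case in PACS tells SR which accession needs a dictation shell
        self.shells.setdefault(accession, "")
        self.current = accession

    def dictate(self, text: str) -> None:
        if self.current is None:
            raise RuntimeError("no case is open")
        self.shells[self.current] += text

    def finalise(self) -> str:
        acc = self.current
        if acc is None:
            raise RuntimeError("no case is open")
        self.ris_archive[acc] = self.shells.pop(acc)  # RIS then distributes to the EMR
        self.pacs_read.add(acc)                       # PACS closes the case as read
        self.worklist.remove(acc)
        self.current = None
        if self.worklist:                             # PACS opens the next case automatically
            self.pacs_open(self.worklist[0])
        return acc

wf = Workflow()
for acc in ("ACC1", "ACC2"):
    wf.ris_order_created(acc)
wf.pacs_open("ACC1")
wf.dictate("No acute abnormality.")
wf.finalise()
print(wf.current)  # ACC2
```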

With this workflow the need for paper requisitions or any type of data entry by the radiologist within the reading room is virtually eliminated. Interoperability between these three often disparate information systems creates improved radiologist efficiency.

If only it were that simple. Complexities multiply when more than one accession number must be covered by a single dictation (for example, chest, abdomen and pelvis CT studies). Resident workflow and addenda must be accommodated as well. These are the issues that separate an excellent SR system from a merely adequate one.
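One way a system might handle a combined chest, abdomen and pelvis CT is to link the accession numbers so that a single dictated report is filed under each of them. Later changes are then appended as addenda rather than edits. The sketch below is hypothetical. The grouping rule (same patient, modality and date) and all names are assumptions, and real products apply their own, more careful linking logic.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Order:
    accession: str
    patient_id: str
    modality: str
    study_date: str   # ISO date, e.g. "2024-05-01"
    body_region: str

@dataclass
class Report:
    accessions: list  # every accession number this report covers
    body: str = ""
    final: bool = False
    addenda: list = field(default_factory=list)

    def add_addendum(self, text: str) -> None:
        # Finalised text is left unchanged; later changes are appended as addenda
        if not self.final:
            raise ValueError("report is not final; edit the body instead")
        self.addenda.append(text)

def group_orders(orders):
    """Assumed rule: same patient, modality and date -> one combined report."""
    groups = defaultdict(list)
    for o in orders:
        groups[(o.patient_id, o.modality, o.study_date)].append(o.accession)
    return [Report(accessions=accs) for accs in groups.values()]

def finalise(report: Report) -> dict:
    """Mark the report final and file the same text under every linked accession."""
    report.final = True
    return {acc: report.body for acc in report.accessions}

orders = [
    Order("A1", "P1", "CT", "2024-05-01", "chest"),
    Order("A2", "P1", "CT", "2024-05-01", "abdomen"),
    Order("A3", "P1", "CT", "2024-05-01", "pelvis"),
    Order("B1", "P2", "MR", "2024-05-01", "brain"),
]
reports = group_orders(orders)
ctap = reports[0]
ctap.body = "CT chest, abdomen and pelvis: no acute abnormality."
filed = finalise(ctap)
print(sorted(filed))  # ['A1', 'A2', 'A3']
```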

Workflow advances

The implementation of SR need not be a net negative for the radiologist. Some vendor software includes associated workflow benefits that may offset otherwise compromised productivity. SR systems can include critical results communication packages that streamline the frequent pages and phone calls to ordering providers, reducing the time and distraction they cost the radiologist.

Decision support software can be included within speech recognition. This can facilitate instant access to a favourite textbook, online information, interactive atlas or a third-party decision-support vendor.

Newer systems are beginning to include more sophisticated speech engines with natural language processing, allowing easier manipulation of macros and templates. Ultimately, these advances may reverse the time penalty that many SR systems impose on radiologist workflow.

The conversion to SR is one that carries potentially negative consequences for end-user productivity and accuracy. Administrators wishing to implement this software must be cognisant of this and take steps to minimise disadvantages of SR for the radiologist. This should include a quiet and comfortable environment, adequate training and support, and perhaps incentives – financial or otherwise – for SR use.

Purchase of associated software packages such as decision support and critical results communication must be considered as well. Ultimately, the success or failure of such a project lies with the acceptance and support of the radiologist.