SYSTEM AND METHOD FOR AUTOMATIC PROVISION AND CREATION OF SPEECH STIMULI FOR TREATMENT OF SPEECH DISORDERS
A method and system for automatically generating speech stimuli for treatment of speech disorders. The method comprises receiving, from a fluency practice device, a feedback respective of a fluency practice session; generating at least one recommendation based in part on the feedback; configuring a fluency practice exercise based on the at least one recommendation; retrieving at least one stimulus based on the configuration; and providing the retrieved at least one stimulus for practicing during the fluency practice exercise.
This application claims the benefit of U.S. provisional application No. 62/098,355 filed on Dec. 31, 2014, the contents of which are hereby incorporated by reference.
TECHNICAL FIELD
This disclosure generally relates to the field of speech teaching solutions, and more particularly to a system and methods for remotely training persons with speech disorders to speak fluently.
BACKGROUND
Speech disorders are among the most prevalent disabilities in the world. Generally, speech disorders are classified as fluency disorders, voice disorders, motor speech disorders, and speech sound disorders. As one example, stuttering is classified as a fluency disorder in the rhythm of speech in which a person knows precisely what to say, but is unable to communicate or speak in accordance with his or her intent.
Many clinical therapy techniques for speech disorders are disclosed in the related art. Conventional techniques for treating speech disorders and, in particular, anti-stuttering techniques, are commonly based on regulating the breath and controlling the rate of speech. To this end, speech therapists train their patients to improve their fluency. Such conventional techniques have been found effective in the short term, as a speech disorder is predominantly a result of poorly coordinated speech production muscles.
In more detail, one common stuttering therapy technique is fluency shaping, in which a therapist trains a person (a stuttering patient) to improve his or her speech fluency by altering various motor skills. Such skills include the abilities to control breathing; to gently increase vocal volume and laryngeal vibration at the beginning of each phrase; to speak slower and with prolonged vowel sounds; to enable continuous phonation; and to reduce articulatory pressure.
The speech motor skills are taught in the clinic while the therapist models the behavior and provides verbal feedback as the person learns to perform the motor skill. As the person develops speech motor control, the person increases rate and prosody of his or her speech until it sounds normal. During the final stage of the therapy, when the speech is fluent and sounds normal in the clinic, the person is trained to practice the acquired speech motor skills in his or her everyday life activities.
When fluency shaping therapy is successful, the stuttering is significantly improved or even eliminated. However, this therapy requires continuous training and practice in order to maintain effective speech fluency. As a result, the conventional techniques for practicing fluency shaping therapy are not effective for people suffering from stuttering. This is mainly because not all persons are capable of developing the target speech motor skills in the clinic, and even if such skills are developed, such skills are not easily transferable into everyday conversations. In other words, a patient can learn to speak fluently in the clinic, but will likely revert to stuttering outside of the clinic.
Therefore, the continuous practicing of speech motor skills is key to successful fluency shaping therapy. Consequently, the dependency on therapists and on frequent visits to clinics reduces the success rate of the fluency-shaping therapy. For example, a patient who waits a few days or weeks between therapy sessions may be more prone to stuttering than patients who more frequently attend therapy. Lack of regular practice between therapy sessions further deteriorates the effectiveness of the therapy.
In the related art, various electronic devices are designed to improve the outcome of the anti-stuttering therapies, including fluency-shaping therapy. Such devices are primarily used to reduce the fear and anxiety associated with stuttering, to allow immediate speech fluency, to alter speech muscle activities by altering vocal perception (motoric audition devices), and to develop awareness and control of speech motor skills (biofeedback devices).
A primary disadvantage of existing devices for reducing stuttering is that such devices cannot be used to train patients remotely and, specifically, to remotely train the speech motor skills that are essential for the success of a fluency shaping therapy. For example, one electronic device used to reduce stuttering is an electromyography (EMG) device that displays the activity of individual muscles. Using the EMG device outside of the clinic does not provide a real-time indication to the therapist of how the patient performs. Thus, the therapist cannot provide guidelines or modify the therapy session as the patient practices.
The conventional solutions for therapy outside of clinics are very limited in their functionality. Such solutions are typically based on a server computing a speech therapy assessment based on voice data received from a remote device of a person. The speech therapy assessment is performed respective of a specified clinical moderation. Then, a speech therapy technique is suggested to the patient. The conventional solutions for therapy outside of clinics provide basic means for suggesting a therapy. However, existing solutions face challenges in assessing the best therapy method merely by analyzing speech features, as there are no known symptoms for such disorders.
Furthermore, the conventional solutions cannot efficiently implement procedures for fluency shaping therapy. For example, such solutions fail to provide any means for closely monitoring the patient practicing speech motor skills, providing real-time feedback, and overseeing the treatment. As another example, a patient having difficulty performing one of the exercises may become frustrated, thereby increasing the fear and anxiety associated with stuttering. This would achieve the opposite of the desired outcome.
It would therefore be advantageous to provide an efficient solution for remote speech disorders therapy.
SUMMARY
A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all embodiments. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term "some embodiments" may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.
Certain embodiments disclosed herein include a method for automatically generating speech stimuli for treatment of speech disorders. The method comprises receiving, from a fluency practice device, a feedback respective of a fluency practice session; generating at least one recommendation based in part on the feedback; configuring a fluency practice exercise based on the at least one recommendation; retrieving at least one stimulus based on the configuration; and providing the retrieved at least one stimulus for practicing during the fluency practice exercise.
Certain embodiments disclosed herein also include a system for automatically generating speech stimuli for treatment of speech disorders. The system comprises a processing unit; and a memory, the memory containing instructions that, when executed by the processing unit, configure the system to: receive, from a fluency practice device, a feedback respective of a fluency practice session; generate at least one recommendation based in part on the feedback; configure a fluency practice exercise based on the at least one recommendation; retrieve at least one stimulus based on the configuration; and provide the retrieved at least one stimulus for practicing during the fluency practice exercise.
The subject matter disclosed herein is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of the disclosed embodiments will be apparent from the following detailed description taken in conjunction with the accompanying drawings.
It is important to note that the embodiments disclosed herein are only examples of the many advantageous uses of the innovative techniques herein. In general, statements made in the specification of the present application do not necessarily limit any of the various claimed embodiments. Moreover, some statements may apply to some inventive features but not to others. In general, unless otherwise indicated, singular elements may be in plural and vice versa with no loss of generality. In the drawings, like numerals refer to like parts throughout the several views.
The network 110 may be the Internet, the world-wide-web (WWW), a local area network (LAN), a wide area network (WAN), or any other network configured to enable communication between the elements of the system 100. Each user device 120 may be a personal computer (PC), a personal digital assistant (PDA), a mobile phone, a smart phone, a tablet computer, a wearable computer device, a game console, and the like.
In a non-limiting example, the user device 120-1 is utilized by a person (e.g., a stuttering patient) and will be referred to hereinafter as the “patient device” 120-1, and the user device 120-n is utilized by a speech therapist and will be referred to hereinafter as the “therapist device” 120-n. It should be noted that one or more patient devices can communicate with a single therapist device, and multiple therapist devices can communicate with one or more patient devices. For the sake of simplicity of the discussion, only one patient device and one therapist device are shown in the figure.
Each of the devices 120 is configured to communicate with the server 130. The server 130, according to the disclosed embodiments, is configured to monitor, execute, and control a speech therapy session between the patient device 120-1 and the therapist device 120-n. The interface between the devices 120 and the server 130 may be realized through, for example, a web interface, an application installed on the devices 120, a script executed on each of the devices 120, and the like. In an embodiment, each user device 120 is installed with an agent 125 configured to perform the disclosed techniques. In certain configurations, the agent 125 can operate and be implemented as a stand-alone program and/or can communicate and be integrated with other programs or applications executed in the user device 120. Examples for a stand-alone program may include a web application, a mobile application, and the like.
In an embodiment, an audio/video communication channel may be established between the therapist device 120-n and the patient device 120-1. This enables, for example, the therapist to view and listen to the patient, and to demonstrate to the patient the correct way to perform an exercise. The audio/video communication channel can be a peer-to-peer connection between the devices 120-1 and 120-n or through the server 130. To this end, an audio/video channel is established between the devices 120-1 and 120-n to allow direct communication between the patient and the therapist. In an embodiment, the audio/video channel can be established before or during a therapy session. The channel, in one embodiment, is established over HTTP. In an embodiment, an agent 125 of each respective device 120 is configured to stream video streams from one device to another over the established channel.
It should be noted that the patient using the device 120-1 can practice without the therapist being connected through the device 120-n. The agent 125, in part under the control of the server 130, may be configured to provide immediate feedback on the patient's performance respective of the preset target specification.
Specifically, as will be discussed in greater detail below, the agent 125 is configured to conduct a fluency shaping therapy. As noted above, such therapy requires exact and specific execution by the patient. To this end, the agent 125 is configured to capture sound samples from the patient device 120-1, to analyze the sound samples, to provide an immediate visual feedback to the patient device 120-1 and preferably also to the therapist device 120-n, and to check whether the patient's performance meets a predefined target template.
Each agent 125 ensures that the speech production is timed carefully, continued for a pre-determined amount of time, and produced in a very specific manner with a great deal of control. The visual feedbacks rendered by an agent 125 and displayed on the respective user device 120 guarantee that the feedback is based only on the patient's performance. The objective feedbacks allow the patient to speak with the required precision. In an embodiment, the objective feedbacks are realized through visual cues used to define the amount of time to prolong the syllable or word. In an embodiment, colors may be used to illustrate the various elements of voice production. These elements help the patient focus on producing speech that is more exact and, therefore, more correct.
According to some embodiments, the therapy of a person is structured as a course. During the course, the patient learns techniques for improving speech fluency using the system 100. Specifically, the server 130 is configured to authenticate a patient using the patient device 120-1 who wishes to initiate a therapy session. The server 130 retrieves, from the database 140, exercises that should be performed during the session, and sets the agent 125-1 (operable in the patient device 120-1) with the information related to the exercises. If a therapist is also part of the session, the server 130 is configured to also send this information to an agent 125-n (operable in the therapist device 120-n). In this case, the server 130 is further configured to establish a peer-to-peer channel (e.g., over HTTP) between the devices 120-1 and 120-n.
The agent 125-1 is configured to analyze the user's performance relative to the target template. A target template predefines the specifications for performing the expected vocal productions of an exercise. The agent 125-1 is configured to render a visual feedback respective of the user's performance, the target template, and the comparison results. The visual feedback can be rendered by the agent 125-n for display on the therapist device 120-n (if connected). In this embodiment, the processing is performed by the agent 125-1, which communicates the results of the processing to the agent 125-n. The agent 125-n renders visual feedback respective of the processing results. In an embodiment, a progress report is generated at the end of each session detailing the patient's performance.
The main purpose of the course is to ease the process and to improve the effectiveness of learning a new manner of speaking which, in turn, leads to more fluent speech patterns. In addition, the server 130 is configured to adjust, in real time, to the patient's progress. The server 130 is further configured to determine, based on the progress reports, progress indicators such as, but not limited to, the patient's current progress level, previous successes, difficulties, and errors. Based on the determined progress indicators, the server 130 is configured to create individualized stimuli for each practice session, thereby personalizing the experience for each user. Therefore, it should be appreciated that the structured, graduated and interactive course would allow a patient to produce fluent speech at a regulated speech rate in different spontaneous speaking situations.
The various embodiments will now be discussed in more detail. Each agent 125 may implement a feedback generator (not shown).
In yet another embodiment, the agent 125-1 is configured to generate a breathing indicator displayed on the patient device 120-1. The breathing indicator, once displayed on the patient device 120-1, provides a visual indication of the timing of inhalation or exhalation within a pre-determined time period. Identification and analysis of the use of breathing while practicing fluency shaping techniques helps to improve speech fluency.
In yet another embodiment, each agent 125 (e.g., the agent 125-1) is configured to perform an analysis of fluency shaping and to generate progress reports respective thereof. The analysis is of stimuli production in comparison to a known template of a fluency shaping technique, both during the practice session (analysis of the stimulus) and at the end of the practice session (analysis of all the stimuli in total). It should be noted that an efficient analysis in real time of the speech technique (based on the template) using the outer envelope of the speech signal (a superficial measure) provides the patient with a deeper understanding of his/her speech characteristics. The generated reports can be saved in the database 140 communicatively connected to the server 130.
In yet another embodiment, an agent 125 (e.g., an agent 125-1) is configured to track the patient activity and to report such activity to the server 130. The activity may be tracked with respect to the fluency shaping techniques as practiced by the patient and may include statistical data generated based on the tracked activity. Such data includes, but is not limited to, time spent practicing on a daily, weekly, and/or monthly basis; error statistics; breathing statistics; statistics on the practice chats conducted with others; cumulative perfect productions of patterns achieved; and the like. The generated statistical data is saved in the database 140.
In an embodiment, tracking all such data enables personalization of the therapy course for each patient, generation of an alert if the patient does not progress, modifications to the therapy course, and/or recommendation of a course or training session that is appropriate to the level of the patient. It should be appreciated that, by tracking the patient activity, the patient is encouraged to continue his/her practice and to achieve higher scores/ranks.
The therapist using the device 120-n can set a speech therapy course for each patient. Such a course is composed of multiple training sessions to be practiced. Each such session is composed of a set of exercises for practicing and improving the speech motor skills of the patient. In an embodiment, the set of exercises is designed to practice fluency shaping techniques, such as speech rate, phonation, gentle onset, and breathing. For each exercise, the therapist may define target specifications or templates visually presented to the patient via, e.g., the patient device 120. The settings for the course may be saved in the database 140 and can be modified at any time by the therapist.
In an embodiment, the therapist, upon accessing the server 130, may be provided, via the device 120-n, with an interface for setting the course, i.e., the training sessions and their exercises. An exemplary and non-limiting graphical interface 200 for setting various parameters of an exercise template as part of a treatment plan (or course) is depicted in the accompanying drawing.
To start a therapy session, the patient logs in to the server 130 using the device 120-1. The server 130, upon authenticating the patient, retrieves the current therapy session to be practiced from the database 140. The therapy session is displayed, on the patient device 120-1, through an interface (e.g., a web-page) showing the various exercises that the person needs to practice during the therapy session.
When an exercise 310 is selected, the server 130 is configured to render a visual target template 320 respective of the selected exercise and the level set for the user of the patient device 120-1. In an embodiment, the displayed visual target template 320 is timed based on voice production by the patient. In an exemplary implementation, the visual target template is displayed as a shadow graph.
It should be appreciated that display of the target template, e.g., as a shadow graph, allows the patient to produce a voice in an attempt to match the target template, thereby improving the efficiency of the exercise by allowing the patient to see the difference between the target performance and the current performance.
The produced voice is captured, sampled, analyzed, and compared to the target template. If the comparison results in an error (e.g., if the patient's vocal production is not properly captured, if the produced voice is below a threshold, and so on), an error indication is presented to the patient at the location where the error occurs; otherwise, a positive feedback is displayed to the patient. The various embodiments for capturing, sampling, analyzing, and comparing the produced voice to the target template are discussed below.
According to one embodiment, the produced voice may be visually demonstrated to provide an immediate visual feedback about the patient's performance. In an embodiment, the visual demonstration may include voice coloring that is achieved by two different colors differentiating between the “softness” and “hardness” of the patient's voice. A visual demonstration may include any color in the color schema, a pattern, an image, and the like. This allows the patient to better understand how the vocal cords are pressed. The voice coloring and the comparisons to the target templates are demonstrated in the exemplary drawings.
In another embodiment, a breathing indicator (not shown) is displayed to the patient showing the duration of time that the user needs to breathe before trying another target template. The breathing time may be set according to the exercise being performed and the level of the patient. The duration of time can be set by the therapist or automatically by the server 130 or the agent 125-1. Training a patient through breathing (inhaling) in a relaxed manner reduces stuttering. Thus, properly breathing before voice production improves the patient's performance with respect to the target template. In an embodiment, the breathing indicator is displayed as the agent 125-1 identifies that the patient ends voice production.
An exemplary and non-limiting screenshot 500 illustrates the breathing indicator 510.
In yet another embodiment, the agent 125-1 is configured to measure the rate of fluent speech and to provide a visual speed monitor (not shown) on the display. This allows implementation of the fluency shaping techniques in spontaneous speech at different speech rates, e.g., a controlled, fast, and slow speech rate. The visual feedback includes both a “colored” display of the produced voice as discussed above and a rate-meter showing a current speech rate of the patient. The speech rate is measured and displayed as the patient produces sounds. The speech rate may be measured as syllables per second.
It should be appreciated that the speech rate monitor 610 aids in the maintenance of a regular or predetermined speech rate during a practice and, in turn, helps the patient to maintain speech fluency over time. The speech rate monitor 610 gives feedback about the expected rate as well as about the deviations from that expected rate. This monitor can help a patient in transferring the fluency shaping techniques learned to spontaneous speech using a slow-normal rate of speech (standardized/regulated).
It should be noted that some or all of the embodiments described above with respect to the agent 125 can equally be performed by the server 130. For example, the server 130 may receive voice samples, process the samples, and generate the visual feedbacks to the user devices 120. As another example, the server 130 may receive voice samples, process the samples, and send the processing results to the agents for rendering of the visual feedbacks.
In some implementations, each of the user devices 120 and the server 130 typically includes a processing system (not shown) connected to a memory (not shown). The memory contains a plurality of instructions that are executed by the processing system. Specifically, the memory may include machine-readable media for storing software. Software shall be construed broadly to mean any type of instructions, whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Instructions may include code (e.g., in source code format, binary code format, executable code format, or any other suitable format of code). The instructions, when executed by the one or more processors, cause the processing system to perform the various functions described herein.
The processing system may comprise or be a component of a larger processing system implemented with one or more processors. The one or more processors may be implemented with any combination of general-purpose microprocessors, microcontrollers, digital signal processors (DSPs), field programmable gate arrays (FPGAs), programmable logic devices (PLDs), controllers, state machines, gated logic, discrete hardware components, dedicated hardware finite state machines, or any other suitable entities that can perform calculations or other manipulations of information.
It should be understood that the embodiments disclosed herein are not limited to the specific architecture illustrated in the drawings.
The process begins with audio sampling of the voice produced by a user of the system. The voice, as captured by a microphone 705, is sampled by an analog-to-digital (A/D) converter 710. The microphone 705 may be, e.g., a microphone installed on a user device (e.g., the patient device 120-1). The sampling may be performed at a predefined rate. As a non-limiting example, the sampling rate is 800 Hz.
The voice samples produced during a predefined time interval are buffered into a buffer 720 to create voice chunks out of the samples. The duration of a single voice chunk is greater than the duration of a single sample. In an embodiment, the size of each voice chunk may depend on a configuration of the buffer. The voice chunks may be output from the buffer at a predefined rate, for example, 10 Hz. The output voice chunks are then filtered by a low pass filter (LPF) 730 to remove or reduce any noise. In certain configurations, the LPF 730 can be applied prior to chunking of the voice samples, i.e., before the buffer 720.
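As an illustration only, the buffering and filtering stages might look like the following Python sketch. The 100 ms chunk size matches the 10 Hz output rate mentioned above; the 300 Hz cutoff and the function names (chunk_samples, lowpass) are assumptions, not part of the disclosure.

```python
import numpy as np
from scipy.signal import butter, lfilter

SAMPLE_RATE_HZ = 800        # example sampling rate from the text
CHUNK_DURATION_SEC = 0.1    # 100 ms chunks -> 10 Hz chunk output rate

def chunk_samples(samples: np.ndarray) -> list:
    """Buffer raw voice samples into fixed-duration voice chunks."""
    chunk_len = int(SAMPLE_RATE_HZ * CHUNK_DURATION_SEC)
    n_chunks = len(samples) // chunk_len
    return [samples[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

def lowpass(chunk: np.ndarray, cutoff_hz: float = 300.0) -> np.ndarray:
    """Attenuate high-frequency noise before spectral analysis (the LPF 730)."""
    nyquist = SAMPLE_RATE_HZ / 2.0
    b, a = butter(4, cutoff_hz / nyquist, btype="low")
    return lfilter(b, a, chunk)
```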
The voice chunks are converted from the time domain to the frequency domain using a fast Fourier transform (FFT) module 740. Having the signals (voice chunks) in the frequency domain allows for extraction of spectrum features by a spectrum analyzer 750. Analysis of the spectrum features may be utilized to determine the quality and correctness of the voice production.
In an embodiment, the spectrum analyzer 750 extracts spectrum features that are valuable for the processing of the voice production. To this end, the zero edge frequencies may be removed and dominant frequencies may be maintained. In an embodiment, dominant frequencies are frequencies in the spectrum having an absolute amplitude level higher than a predefined threshold. In another embodiment, dominant frequencies are frequencies in the spectrum having an absolute frequency level higher than a predefined threshold. In yet another embodiment, two sets of dominant frequencies are output based on the frequencies and on the amplitudes.
The spectrum analyzer 750 computes the energy level of the dominant frequencies to output an energy level for each voice chunk. The energy may be computed as the average over the dominant frequencies. The computed energy level is represented as an integrated number. In an embodiment, the energy level can be factored by a predefined power. An exemplary energy computation may be seen in Equation 1, below:
$E_f(\omega_1, \omega_R) = \beta \int_{\omega_1}^{\omega_R} |F(\omega)|^{k}\, d\omega$ (Equation 1)

where $\omega_i$ ($i = 1, \ldots, R$) are the $R$ dominant frequencies in the spectrum, $F(\omega)$ is the spectrum of the voice chunk, the factor $\beta$ is a predefined number, and the power $k$ may be equal to or greater than 2. The computed energy level $E_f$ is of a single voice chunk and is input to a feedback generator 760, an error generator 770, and a rate-meter generator 780.
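A minimal Python sketch of the spectrum analysis and the Equation 1 energy computation follows, assuming dominant frequencies are selected by an amplitude threshold and the integral is approximated by a discrete sum (the text also allows an average over the dominant frequencies); the threshold, β, and k values are illustrative.

```python
import numpy as np

def chunk_energy(chunk: np.ndarray, sample_rate: float = 800.0,
                 amp_threshold: float = 0.01, beta: float = 1.0,
                 k: int = 2) -> float:
    """Approximate E_f of Equation 1 for a single voice chunk."""
    spectrum = np.fft.rfft(chunk)                  # time -> frequency domain
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / sample_rate)
    amps = np.abs(spectrum)                        # |F(w)|
    # Dominant frequencies: drop the zero-edge (DC) bin and keep bins whose
    # absolute amplitude exceeds the predefined threshold.
    dominant = (freqs > 0) & (amps > amp_threshold)
    if not dominant.any():
        return 0.0
    dw = freqs[1] - freqs[0]                       # frequency bin width
    # Discrete approximation of beta * integral(|F(w)|^k dw) over dominant bins
    return float(beta * np.sum(amps[dominant] ** k) * dw)
```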
The feedback generator 760 plots the visual feedback respective of the voice production. The energy of each chunk is a point in the graph illustrating the voice production.
The volume threshold may be determined during a calibration process as a function of the energy measured during silence (Es) and/or during normal speaking of the user (En). The function can be an average or weighted average of the Es and En values. One non-limiting example for performing the calibration process will be described in detail below.
In a further embodiment, the feedback generator 760 dynamically sets the boundaries of the target template (shadow graph) to visually indicate to the patient when to start and end the voice production. To this end, the feedback generator 760 compares the energy level Ef to the silence energy (Es). When the energy level Ef is greater than the silence energy (Es), the beginning of a voice production may be determined, and the start and finish indicators as well as the shadow graph may be rendered and displayed on the patient device. The finish indicator may be set to be displayed a predefined time interval after the start indicator. An exemplary shadow graph with start and end indicators is shown in the accompanying drawings.
The feedback generator 760 is further configured to display a breathing indicator as the voice production ends. To this end, the feedback generator 760 compares the energy level Ef to the normal production energy (En). When Ef is lower than En, the end of a voice production may be determined, and the breathing indicator may be rendered and displayed on the patient device. An exemplary breathing indicator is illustrated in the accompanying drawings.
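Read as a simple state machine, the boundary logic above might be sketched as follows; the streaming shape and event names are assumptions about one possible implementation, with Es and En coming from the calibration step described later.

```python
from enum import Enum

class Phase(Enum):
    SILENCE = 0
    PRODUCING = 1

def boundary_events(energies, e_silence: float, e_normal: float):
    """Yield ('start', i) / ('end', i) events from a stream of chunk energies."""
    phase = Phase.SILENCE
    for i, e_f in enumerate(energies):
        if phase is Phase.SILENCE and e_f > e_silence:
            phase = Phase.PRODUCING
            yield ("start", i)   # display start indicator and shadow graph
        elif phase is Phase.PRODUCING and e_f < e_normal:
            phase = Phase.SILENCE
            yield ("end", i)     # display breathing indicator

# Example: energy rises above Es, then falls below En
list(boundary_events([0.0, 0.5, 0.9, 0.8, 0.1], e_silence=0.2, e_normal=0.3))
# -> [('start', 1), ('end', 4)]
```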
The error generator 770 is configured to compare a voice production (between start and finish) to a respective target template. The comparison is for the entire voice production such that all computed energy levels Ef of the voice chunks are buffered and analyzed to detect an error related to the production of the voice. Specifically, the detected errors are related to the patient's performance with respect to various fluency shaping exercises.
Following are non-limiting examples of errors that can be detected: a gentle onset, a soft peak, a gentle offset, a volume control, a pattern usage, a missed subsequent voice production, a symmetry of the voice production, a short inhale, a too slow voice production, a too fast voice production, a too short voice production, a long voice production, and an intense peak voice production.
As an example, a “too soft” error indicates that the air-flow between syllables is too low. The detected errors provide the user with immediate feedback on how he/she may improve his/her voice production. It should be noted that, if no error is detected, a positive feedback may be provided to the user. Various examples for displaying the errors are shown in the drawings.
In one embodiment, the analysis of the voice production respective of the target pattern is not a one-to-one comparison, but rather a check of whether the computed energy levels match the target pattern in amplitude and/or direction. In another embodiment, the analysis of the voice production respective of the target pattern is a one-to-one comparison, where matching to the target template (graph) is required. In yet another embodiment, both comparison approaches can be utilized.
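The looser, non-one-to-one comparison might be sketched as below: the production envelope is resampled to the template's length and checked for amplitude closeness and matching slope direction. The tolerance values are assumptions, not values from the disclosure.

```python
import numpy as np

def matches_template(production: np.ndarray, template: np.ndarray,
                     amp_tol: float = 0.2, direction_tol: float = 0.8) -> bool:
    """Loose match: amplitudes within a tolerance band around the template,
    and the envelope moving in the same direction most of the time."""
    # Resample the production envelope to the template's length.
    x = np.linspace(0.0, 1.0, len(template))
    prod = np.interp(x, np.linspace(0.0, 1.0, len(production)), production)
    amp_ok = np.abs(prod - template) <= amp_tol * np.max(template)
    same_dir = np.sign(np.diff(prod)) == np.sign(np.diff(template))
    return bool(amp_ok.all() and same_dir.mean() >= direction_tol)
```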
The rate-meter generator 780 is configured to measure the number of syllables per second in the voice production and to render a speech rate monitor. In an embodiment, the rate-meter generator 780 operates in three ranges: controlled, slow, and normal. In order to measure the speech rate, the number of peaks of energy levels (Ef) in the voice production is counted, where each such peak represents a syllable. When measuring the speech rate, the duration of a voice chunk can be shortened relative to other exercises. For example, the voice chunk duration can be changed from 100 msec to 20 msec. An exemplary graphical representation of the speech rate monitor generated by the rate-meter generator is shown in the accompanying drawings.
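A sketch of the peak-counting rate estimate, using the 20 ms chunk duration from the example above; the use of SciPy's find_peaks and the minimum peak height are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def speech_rate(energies: np.ndarray, chunk_duration_sec: float = 0.02,
                min_peak_height: float = 0.1) -> float:
    """Estimate syllables per second: each E_f peak counts as one syllable."""
    peaks, _ = find_peaks(energies, height=min_peak_height)
    total_time = len(energies) * chunk_duration_sec
    return len(peaks) / total_time if total_time > 0 else 0.0
```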
In certain embodiments, the rate-meter generator 780 and/or the error generator 770 can be utilized as a monitor when the patient (or user) is not in a traditional therapy session. For example, if the agent 125-1 is operable in a smart phone of a user, the agent 125-1, by means of the rate-meter generator 780, can be activated to monitor the speech rate of the user during a conversation with another person (e.g., a telephone conversation). When the rate does not meet a threshold of a predefined speech rate (e.g., too slow or too fast), a notification may be provided to the user. The notification may take any form known in the art (e.g., a text message, an audio message, an image, etc.).
As another non-limiting example, if the agent 125-1 is operable in a tablet computer of a user, the agent 125-1, by means of the error generator 770, can be activated to monitor the speech respective of any fluency shaping technique previously practiced by the user. The agent acting as a monitor can detect errors during a conversation of the user with another person (e.g., a telephone conversation). The user can be notified during the conversation about these errors. The different types of errors are discussed above. In an embodiment, such errors are presented as instructive indications (e.g., the indications 440).
In an embodiment, when the conversation ends, the agent 125-1 may be configured to invite the user to practice an exercise or exercises respective of the detected errors. In certain non-limiting implementations, spectrograms 790 can be utilized to analyze the voice productions. Specifically, the spectrograms 790 can be used to identify spoken words phonetically. In a particular embodiment, the spectrograms 790 can be used to identify vowels and consonants in the voice production and to compare the identified vowels and consonants to known vowels and consonants. In an embodiment, the identified vowels and consonants can be utilized in an analysis of at least one stimulus production in comparison to a known template.
At S910, a network communication channel is established between a patient device and a therapist device. The network communication channel can be established as a peer-to-peer connection. In an embodiment, the communication channel is established after authenticating the patient and optionally also the therapist.
At S920, the parameters of a current therapy session are set on the patient device. Such parameters may be retrieved from a database and include at least exercises to be practiced and their respective target templates. The exercises may further include difficulty settings. Each difficulty setting may be associated with an exercise. In an embodiment, the parameters include customized content to be uploaded by the patient and/or the therapist. As an example, the customized content may include text to be read by the patient. The customized content can be uploaded before, during, and/or after the current session. It should be noted that the ability to practice customized content allows the patient to conduct therapy sessions at his/her convenience.
At S930, the patient device is calibrated. In an embodiment, the energy level (Es) during a silence period (during which the patient is prompted to remain quiet) is measured or otherwise computed. The energy level (En) during a normal speaking period (during which the patient is prompted to talk) is measured or otherwise computed. The measurement or computation of an energy level is discussed above. Finally, a calibration energy level (ECAL) is computed as a function of the En and Es. For example, the function can be an average, a weighted average, and so on. In certain embodiments, a calibration factor received from a different device in the proximity of the patient device can be utilized in determining the ECAL.
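A minimal sketch of the S930 computation, assuming per-chunk energies collected during the silence and normal-speaking prompts and a weighted average; the 0.5 weight is an assumption.

```python
def calibrate(silence_energies, speech_energies, weight: float = 0.5) -> float:
    """Compute E_CAL as a weighted average of E_s and E_n."""
    e_s = sum(silence_energies) / len(silence_energies)  # energy while quiet
    e_n = sum(speech_energies) / len(speech_energies)    # energy while talking
    return weight * e_s + (1.0 - weight) * e_n
```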
At S940, as the patient performs each selected exercise, a visual representation of the patient's performance is generated and displayed on the patient and therapist devices. As discussed in detail above, the visual representation includes coloring the voice production, displaying the voice production with respect to a target template, displaying the boundaries of when to start and finish a voice production, displaying error and instructive indications, displaying breathing indicators, and/or displaying a speech rate-meter.
Optionally, at S950, a video chat is established between the patient and therapist devices. During the video chat, the therapist can demonstrate or instruct the patient how to correctly perform an exercise. Alternatively or collectively, an instructive video clip can be displayed to the user. It should be noted that the therapist can demonstrate or instruct the patient how to correctly perform an exercise using any means of digital content. This includes, for example, text files, images, audio clips, and so on. As noted above, the therapist can further change the difficulty level of the exercises to make them easier or harder.
At S960, the patient performance during the therapy session is logged and sent to a database (e.g., database 140). As a non-limiting example, this allows off-line processing with regard to past performance, determining the progress of the patient, modifying current exercises for the user, adding new exercises, and/or determining the frequency that the user is practicing and the length of each practice session.
It should be appreciated that the qualitative analysis of the patient's performance of the various exercises allows determination of the types of errors and difficulties that the patient repeatedly has. This determination allows for creation of a personalized treatment program that would encourage review of content as needed and match the stimuli in the exercise to the specific difficulties the user is experiencing.
In an embodiment, the visual feedback disclosed herein can be provided through the use of electronic games through which acoustical energy is transformed to visual output (through the analysis of intensity and frequency), by referring to different parameters that are related to speech fluency shaping. Learning and practice using electronic games encourages motivation and cooperation among, for example, children, and provides them with access to the various important elements required to produce fluent speech, which thereby allows for better learning and assimilation.
The various disclosed embodiments have been discussed with reference to providing visual feedbacks in response to a patient's performance. It should be noted that feedbacks generated in response to a patient's performance can also take the form of an auditory feedback, a haptic feedback, and the like.
Furthermore, the steps of the method 900 are shown in a specific order merely for simplicity purposes and without limitation on the disclosed embodiments. The method steps can be performed in different sequences without departing from the scope of the disclosure. Any or all of the steps of the method 900 may be repeated, preferably in response to user inputs indicating a desire to revisit one or more of the steps.
In S1210, a feedback is received respective of a voice production of a patient. The feedback may be received, in real-time, during a speech therapy session, or may be received respective of a previous speech therapy session. The feedback may be based on an analysis of the voice production including processing of the voice production to evaluate a correct execution of the voice production. The feedback may be utilized to update a user profile of the patient.
In S1220, progress indicators of the patient are determined. The progress indicators may be included in the user profile of the patient and may indicate a current fluency of the patient and/or particular areas of difficulty for the patient. The progress indicators may include, but are not limited to, the patient's current progress level, previous successes, difficulties, errors, and so on.
In S1230, recommendations for stimuli to be practiced during a speech therapy session are generated. The recommendations may indicate practice areas for maintaining the patient's fluency. The recommendations may be generated based on the progress indicators. The recommendations may include suggestions for stimuli including, but not limited to, syllable counts of words; particular letters, syllables, or words; patterns of letters or syllables; transitions between words (e.g., plosives, continuants, clusters, etc.); and so on. The determination may include, but is not limited to, identifying errors respective of the feedback, identifying an area of difficulty respective of the user profile, and so on. In an embodiment, the recommendations may be sent to a therapist via, for example, a therapist device (e.g., the therapist device 160).
In S1240, one or more fluency practice exercise parameters are configured respective of the recommendations. The configuration may include setting parameters for stimuli to be practiced during the exercises. In an embodiment, the fluency practice exercises may be configured automatically based on the recommendations. In another embodiment, the fluency practice exercises may be configured manually by, e.g., a therapist as described further herein above.
In S1250, based on the configuration, one or more stimuli to be practiced during the fluency practice exercises are retrieved. The stimuli may be retrieved from a database (e.g., the database 140) populated with stimuli. In an embodiment, any of the retrieved stimuli may be prepopulated in the database automatically as described further herein below.
In S1260, it is checked whether additional feedback has been received and, if so, execution continues with S1210; otherwise, execution terminates. Checking for additional feedback allows for providing new speech stimuli in real-time responsive to current difficulties of the patient as indicated by the additional feedback.
As a non-limiting example, a patient begins a speech therapy session via a patient device. During the speech therapy session, the patient device is configured to analyze voice productions of the patient and to generate a feedback respective thereof. The generated feedback is received in real-time and used to update a user profile of the patient. Progress indicators in the updated user profile indicating that the patient is having difficulty with longer words including the vowel “U” are identified. Based on the identified progress indicators, recommendations of stimuli to be practiced by the patient are generated. The recommendations include providing stimuli via words having 3 to 4 syllables and focusing exclusively on the vowel “U.” Fluency practice exercise parameters are configured to include the recommended provisions. Based on the configuration, stimuli including words such as “superlative,” “united,” “beautiful,” and “insurance” are retrieved from a database. The fluency practice exercises provided to the patient are monitored for additional feedback. Based on the additional feedback, new speech stimuli may be generated to modify the speech stimuli provided to the patient.
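The example can be made concrete with a short sketch: the recommendation becomes retrieval parameters, and a stimulus table is filtered by syllable count and focus vowel. The word records and field names below are illustrative stand-ins for the database (e.g., the database 140) described above.

```python
# Hypothetical stimulus records; in the disclosed system these would come
# from the stimulus database rather than an in-memory list.
STIMULI = [
    {"word": "superlative", "syllables": 4, "vowels": {"u", "e", "a", "i"}},
    {"word": "united",      "syllables": 3, "vowels": {"u", "i", "e"}},
    {"word": "beautiful",   "syllables": 3, "vowels": {"e", "a", "u", "i"}},
    {"word": "insurance",   "syllables": 3, "vowels": {"i", "u", "a", "e"}},
    {"word": "cat",         "syllables": 1, "vowels": {"a"}},
]

def recommend(min_syllables: int, max_syllables: int, focus_vowel: str) -> dict:
    """S1230/S1240: encode the recommendation as retrieval parameters."""
    return {"min": min_syllables, "max": max_syllables, "vowel": focus_vowel}

def retrieve(config: dict) -> list:
    """S1250: filter the stimulus table by the configured parameters."""
    return [s["word"] for s in STIMULI
            if config["min"] <= s["syllables"] <= config["max"]
            and config["vowel"] in s["vowels"]]

print(retrieve(recommend(3, 4, "u")))
# -> ['superlative', 'united', 'beautiful', 'insurance']
```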
The phonetic identifiers may be sent to a stimulus generator 1350. Based on the phonetic identifiers and one or more predetermined filtering rules, the stimulus generator 1350 filters the audio and text samples into one or more stimuli for use during fluency practice exercises. The filtering rules may indicate phonetic identifiers of audio and/or text samples to be filtered out. The filtering rules may be set, e.g., automatically based on feedback, or manually via a user interface (for example, the user interface described herein above).
In a further embodiment, the stimulus generator 1350 may further match the phonetic identifiers received from the speech recognition unit 1320 to the phonetic identifiers received from the phonetics recognition unit 1340 to verify an accuracy of the samples. As an example, if one audio sample and one text sample are received, the phonetic identifiers for the audio sample indicate that the audio sample contains 3 syllables, and the phonetic identifiers for the text sample indicate that the text sample contains 4 syllables, the audio and text samples may be identified as unverified. In a further embodiment, unverified samples may be filtered out during filtering.
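A minimal sketch of that verification, assuming the phonetic identifiers reduce to per-sample syllable counts; the dictionary shape is an assumption.

```python
def verify_pair(audio_ids: dict, text_ids: dict) -> bool:
    """A sample pair is verified only if its phonetic identifiers agree
    (reduced here to comparing syllable counts)."""
    return audio_ids.get("syllables") == text_ids.get("syllables")

# The example from the text: 3 syllables heard vs. 4 syllables read -> unverified
print(verify_pair({"syllables": 3}, {"syllables": 4}))  # False
```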
It should be noted that the embodiments disclosed herein are described respective of a patient and speech therapy sessions merely for simplicity purposes and without limitation on the disclosed embodiments. The disclosed embodiments may be utilized by any person who is practicing his or her fluency during any fluency practice sessions without departing from the scope of the disclosure.
The various embodiments disclosed herein can be implemented as hardware, firmware, software or any combination thereof. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage unit, a non-transitory computer readable medium, or a non-transitory machine-readable storage medium that can be in a form of a digital circuit, an analog circuit, a magnetic medium, or combination thereof. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (“CPUs”), a memory, and input/output interfaces. The computer platform may also include an operating system and microinstruction code. The various processes and functions described herein may be either part of the microinstruction code or part of the application program, or any combination thereof, which may be executed by a CPU, whether or not such computer or processor is explicitly shown. In addition, various other peripheral units may be connected to the computer platform such as an additional data storage unit and a printing unit. Furthermore, a non-transitory computer readable medium is any computer readable medium except for a transitory propagating signal.
While the disclosed embodiments have been described at some length and with some particularity with respect to the several described embodiments, it is not intended that it should be limited to any such particulars or embodiments or any particular embodiment, but it is to be construed with references to the appended claims so as to provide the broadest possible interpretation of such claims in view of the prior art and, therefore, to effectively encompass the intended scope of the disclosure. Furthermore, the foregoing describes the disclosure in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the disclosed embodiments, not presently foreseen, may nonetheless represent equivalents thereto.
Claims
1. A method for automatically generating speech stimuli for treatment of speech disorders, comprising:
- receiving, from a fluency practice device, a feedback respective of a fluency practice session;
- generating at least one recommendation based in part on the feedback;
- configuring a fluency practice exercise based on the at least one recommendation;
- retrieving at least one stimulus based on the configuration; and
- providing the retrieved at least one stimulus for practicing during the fluency practice exercise.
2. The method of claim 1, wherein the fluency practice device is configured to generate the feedback based on the fluency practice session, wherein generating the feedback includes receiving at least one sound sample respective of the fluency practice session and causing, by a feedback generator system, an analysis of the at least one sound sample.
3. The method of claim 2, wherein the analysis of the at least one sound sample includes processing a voice production of the at least one sound sample to evaluate a correct execution of the voice production.
4. The method of claim 1, further comprising:
- determining progress indicators based at least in part on the feedback.
5. The method of claim 4, further comprising:
- monitoring, respective of the speech stimuli, the fluency practice session to detect a change in the progress indicators; and
- upon detecting the change in the progress indicators, generating at least one modified speech stimulus based on the changed progress indicators.
6. The method of claim 4, wherein each of the progress indicators is any of: a current progress level, previous successes, difficulties, and errors.
7. The method of claim 1, wherein each recommendation is for a stimulus defining at least one parameter, wherein each parameter is any of: a number of syllables per word, a letter, a syllable, a word, a pattern of syllables, a pattern of letters, and a transition between words.
8. The method of claim 1, wherein the at least one stimulus is retrieved from a database.
9. The method of claim 1, wherein the at least one stimulus is part of at least one of: a word, and a sentence.
10. The method of claim 1, wherein the fluency practice exercise is configured in real-time as a user practices.
11. The method of claim 1, wherein the at least one stimulus is based at least in part on at least one audio sample and at least one text sample.
12. The method of claim 11, wherein the at least one stimulus is further based on at least one filtering rule, wherein the at least one filtering rule is applied to each of the at least one audio sample and each of the at least one text sample based on phonetic identifiers in the samples.
13. The method of claim 12, wherein the phonetic identifiers in the at least one audio sample are matched to the phonetic identifiers in the at least one text sample to verify accuracies of the samples.
14. A non-transitory computer readable medium having stored thereon instructions for executing the method according to claim 1.
15. A system for automatically generating speech stimuli for treatment of speech disorders, comprising:
- a processing unit; and
- a memory, the memory containing instructions that, when executed by the processing unit, configure the system to:
- receive, from a fluency practice device, a feedback respective of a fluency practice session;
- generate at least one recommendation based in part on the feedback;
- configure a fluency practice exercise based on the at least one recommendation;
- retrieve at least one stimulus based on the configuration; and
- provide the retrieved at least one stimulus for practicing during the fluency practice exercise.
16. The system of claim 15, wherein the fluency practice device is configured to generate the feedback based on the fluency practice session, wherein generating the feedback includes receiving at least one sound sample respective of the fluency practice session and causing, by a feedback generator system, an analysis of the at least one sound sample.
17. The system of claim 16, wherein the analysis of the at least one sound sample includes processing a voice production of the at least one sound sample to evaluate a correct execution of the voice production.
18. The system of claim 15, wherein the system is further configured to:
- determine progress indicators based at least in part on the feedback.
19. The system of claim 18, wherein the system is further configured to:
- monitor, respective of the speech stimuli, the fluency practice session to detect a change in the progress indicators; and
- upon detecting the change in the progress indicators, generate at least one modified speech stimulus based on the changed progress indicators.
20. The system of claim 18, wherein each of the progress indicators is any of: a current progress level, previous successes, difficulties, and errors.
21. The system of claim 15, wherein each recommendation is for a stimulus defining at least one parameter, wherein each parameter is any of: a number of syllables per word, a letter, a syllable, a word, a pattern of syllables, a pattern of letters, and a transition between words.
22. The system of claim 15, wherein the at least one stimulus is retrieved from a database.
23. The system of claim 15, wherein the at least one stimulus is part of at least one of: a word, and a sentence.
24. The system of claim 15, wherein the fluency practice exercise is configured in real-time as a user practices.
25. The system of claim 15, wherein the at least one stimulus is based at least in part on at least one audio sample and at least one text sample.
26. The system of claim 25, wherein the at least one stimulus is further based on at least one filtering rule, wherein the at least one filtering rule is applied to each of the at least one audio sample and each of the at least one text sample based on phonetic identifiers in the samples.
27. The system of claim 26, wherein the phonetic identifiers in the at least one audio sample are matched to the phonetic identifiers in the at least one text sample to verify accuracies of the samples.
Type: Application
Filed: Dec 28, 2015
Publication Date: Jun 30, 2016
Applicant: Novotalk, Ltd. (Givat Shmuel)
Inventors: Moshe Rot (Ramat Gan), Lilach Rothschild (Tel Aviv), Smadar Lerner (Kefar Saba)
Application Number: 14/981,072