Miscellaneous Analysis Or Detection Of Speech Characteristics (EPO) Patents (Class 704/E11.001)
-
Publication number: 20090216538
Abstract: A computer implemented method facilitates a user interaction via a speech-based user interface. The method acquires spoken input from a user in the form of a phrase of one or more words. It further determines, using a plurality of different domains, whether the phrase is a query or a command. If the phrase is a query, the method retrieves and presents relevant items from a plurality of databases. If the phrase is a command, the method performs an operation.
Type: Application
Filed: February 25, 2008
Publication date: August 27, 2009
Inventors: Garrett Weinberg, Bhiksha Ramakrishnan, Bent Schmidt-Nielsen, Bret A. Harsham
-
Publication number: 20090204395
Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
Type: Application
Filed: January 22, 2008
Publication date: August 13, 2009
Inventors: Yumiko Kato, Takahiro Kamai
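The periodic amplitude fluctuation that the abstract above describes can be sketched as a simple raised-cosine modulator. This is only an illustrative reading of the claim: the modulation frequency, depth, and function name below are assumptions, not values from the patent.

```python
import math

def strained_rough(samples, sample_rate, mod_freq=80.0, depth=0.4):
    """Apply periodic amplitude fluctuation to a span of speech samples.

    mod_freq (Hz) and depth (0..1) are illustrative defaults, not
    values taken from the patent text."""
    out = []
    for n, s in enumerate(samples):
        # raised-cosine gain that dips by `depth` once per modulation cycle
        gain = 1.0 - depth * 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_freq * n / sample_rate))
        out.append(s * gain)
    return out
```

In the patent's scheme such modulation would be applied only to the phonemes picked out by the strained phoneme position designation unit, not to the whole waveform.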
-
Publication number: 20090199101
Abstract: A system (20) for inputting graphical data into a graphical input field includes a graphical input device (22) for inputting the graphical data into the graphical input field, and a processor-executable voice-form module (28) responsive to an initial presentation of graphical data to the graphical input device. The voice-form module (28) causes a determination of whether the inputting of the graphical data into the graphical input field is complete. A method for inputting graphical data into a graphical input field includes initiating an input of graphical data via a graphical input device into the graphical input field, and actuating a voice-form module in response to initiating the input of graphical data into the graphical input field.
Type: Application
Filed: January 30, 2009
Publication date: August 6, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Charles W. Cross, Jr., David Jaramillo, Marc White
-
Publication number: 20090192798
Abstract: A method for task execution improvement, the method includes: generating a baseline model for executing a task; recording a user executing the task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences between the user's execution and the baseline model.
Type: Application
Filed: January 25, 2008
Publication date: July 30, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sara H. Basson, Dimitri Kanevsky, Edward E. Kelley, Bhuvana Ramabhadran
-
Publication number: 20090192799
Abstract: Speech enhancement in a breathing apparatus is provided using a primary sensor mounted near a breathing mask user's mouth, at least one reference sensor mounted near a noise source, and a processor that combines the signals from these sensors to produce an output signal with an enhanced speech component. The reference sensor signal may be filtered and the result may be subtracted from the primary sensor signal to produce the output signal with an enhanced speech component. A method for detecting the exclusive presence of a low air alarm noise may be used to determine when to update the filter. A triple filter adaptive noise cancellation method may provide improved performance through reduction of filter maladaptation. The speech enhancement techniques may be employed as part of a communication system or a speech recognition system.
Type: Application
Filed: January 29, 2008
Publication date: July 30, 2009
Applicant: Digital Voice Systems, Inc.
Inventors: Daniel W. Griffin, John C. Hardwick
-
Publication number: 20090187399
Abstract: The invention allows phonetic text input without any knowledge of phonetics. As an assist to the user of computer text entry systems, the invention makes possible an alternative method of Chinese character entry by entering a Chinese character assumed by the user to be a homophone of the character the user desires to enter. Entry methods for such homophone alternative entry include non-phonetic entry of Chinese characters using keyboard stroke input and single stroke, cursive and semi-cursive entry on an electronic surface. Direct correction of some misspellings of Chinese characters during phonetic entry also is made possible. The invention is not only helpful for entry of difficult Chinese characters but also provides an approach to the use of supplementing input methods for most if not all written languages.
Type: Application
Filed: January 22, 2008
Publication date: July 23, 2009
Inventor: Robert B. O'Dell
-
Publication number: 20090164220
Abstract: A sound recording and playback apparatus and associated method, comprising: an audio storage medium; a microphone; a speaker; and a plurality of direct message access buttons, each direct message access button simultaneously associated both with a particular pre-recorded sound sequence stored in the storage medium, and with a particular new sound sequence capable of being recorded into the storage medium; wherein: when a particular direct message access button is depressed in a manner which respectively designates pre-recorded playback, new sound sequence recording, or new sound sequence playback, the particular pre-recorded sound sequence associated with the particular direct message access button is respectively audibly played over the speaker or recorded into the storage medium, as appropriate.
Type: Application
Filed: February 23, 2009
Publication date: June 25, 2009
Inventor: Howard M. Katz
-
Publication number: 20090144131
Abstract: In an apparatus that creates and distributes voice advertisements to end users, the apparatus includes a voice advertising portal and a network coupled to the voice advertising portal. A voice application server determines which portions of a voice advertisement will be cached locally at the voice advertising portal for subsequent local retrieval during user interaction with the voice advertising portal.
Type: Application
Filed: July 28, 2008
Publication date: June 4, 2009
Inventors: Leo Chiu, Donald R. Steul, Arumugam Appadurai
-
Publication number: 20090138269
Abstract: A method for enabling voice driven interactions among multiple interactive voice response (IVR) systems begins by receiving a telephone call from a user of a first IVR system to begin a transaction, and automatically contacting, by the first IVR system, at least one additional IVR system. Specifically, the contacting of the additional IVR system includes assigning tasks to the additional IVR system. The tasks require input from the user, and the additional IVR system is secure and separate from the first IVR system. Moreover, the tasks can include a transfer of currency and a transfer of local information.
Type: Application
Filed: November 28, 2007
Publication date: May 28, 2009
Inventors: Sheetal K. Agarwal, Dipanjan Chakraborty, Arun Kumar, Amit A. Nanavati, Nitendra Rajput
-
Publication number: 20090132256
Abstract: A first communication path for receiving a communication is established. The communication includes speech, which is processed. A speech pattern is identified as including a voice-command. A portion of the speech pattern is determined as including the voice-command. That portion of the speech pattern is separated from the speech pattern and compared with a second speech pattern. If the two speech patterns match or resemble each other, the portion of the speech pattern is accepted as the voice-command. An operation corresponding to the voice-command is determined and performed. The operation may perform an operation on a remote device, forward the voice-command to a remote device, or notify a user. The operation may create a second communication path that may allow a headset to join in a communication between another headset and a communication device, several headsets to communicate with each other, or a headset to communicate with several communication devices.
Type: Application
Filed: January 25, 2008
Publication date: May 21, 2009
Applicant: Embarq Holdings Company, LLC
Inventors: Erik Geldbach, Kelsyn D. Rooks, Sr., Shane M. Smith, Mark Wilmoth
-
Publication number: 20090119092
Abstract: A language package system that prevents undesirable behaviors resulting from an incompatibility between a core package of a software product and its language packages is provided. The language package system executes when a user starts the execution of the core package on a computing device. The language package system retrieves a language package version number from the core package that indicates the version number of compatible language packages and an indication of the preferred language of the user. The language package system then determines whether the computing device has a compatible language package that is available. When the computing device has a compatible language package, the software product uses that language package. When the computing device has no compatible language package, the language package system then performs processing that factors in the unavailability of a compatible language package.
Type: Application
Filed: November 1, 2007
Publication date: May 7, 2009
Applicant: Microsoft Corporation
Inventors: Balaji Balasubramanyan, Dmitri Davydok
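The selection step described above reduces to matching installed language-package versions against the version number the core package declares. A minimal sketch, assuming exact-match compatibility and a simple language-to-version map (both assumptions; function and field names are hypothetical):

```python
def pick_language_package(required_version, preferred_lang, installed):
    """Return the language whose installed package version matches the
    core package's required version, preferring the user's language;
    return None when no compatible package exists (the fallback path).

    `installed` maps language code -> installed package version."""
    # prefer the user's language when its package is compatible
    if installed.get(preferred_lang) == required_version:
        return preferred_lang
    # otherwise fall back to any compatible language (sorted for determinism)
    for lang, version in sorted(installed.items()):
        if version == required_version:
            return lang
    return None
```

A `None` result corresponds to the patent's "processing that factors in the unavailability of a compatible language package".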
-
Publication number: 20090112578
Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software that is operable to disambiguate compound text input. The device is able to assemble language objects in the memory to generate compound language solutions. The device is able to prioritize compound language solutions according to various criteria, including the degree of completeness of the text components of a compound language solution.
Type: Application
Filed: December 30, 2008
Publication date: April 30, 2009
Inventors: Vadim Fux, Michael G. Elizarov
-
Publication number: 20090112601
Abstract: A human interface device for assisting the verbally challenged to record custom messages and play back the custom and pre-recorded messages through a sequence of simple finger movements. A data glove containing Hall effect and bend resistor sensors is worn by the user and connected to the Voice Module. The data glove is designed to capture and translate the sequence of finger movements into actions, then transmits the actions to the Voice Module. When a pause is sensed in the actions, the Voice Module links these actions to the custom or pre-recorded messages. These messages are then played on the Voice Module, allowing people in close proximity to hear. A Remote Voice Monitor may also be wirelessly connected to the Voice Module to allow remote monitoring.
Type: Application
Filed: October 25, 2007
Publication date: April 30, 2009
Inventor: Larry Don Fullmer
-
Publication number: 20090099848
Abstract: The present invention is an innovative system and method for passive diagnosis of dementias. The disclosed invention enables early diagnosis of, and assessments of the efficacy of medications for, neural disorders which are characterized by progressive linguistic decline and circadian speech-rhythm disturbances. Clinical and psychometric indicators of dementias are automatically identified by longitudinal statistical measurements that track the nature of language change and/or changes in patient audio features using mathematical methods. According to embodiments of the present invention, the disclosed system and method include multi-layer processing units wherein initial processing of the recorded audio data is performed in a local unit. Processed and required raw data is also transferred to a central unit which performs in-depth analysis of the audio data.
Type: Application
Filed: October 16, 2007
Publication date: April 16, 2009
Inventors: Moshe Lerner, Ofer Bahar
-
Publication number: 20090089066
Abstract: A system and method for automatic user training in speech-to-speech translation includes integrating an automatic user response system configured to be responsive to a plurality of training items and selecting a training item from the plurality of training items. For the selected training item, in response to an utterance in a first language, the utterance is translated into a second language, and a response to the utterance in the second language is generated. A simulated action corresponding with the response in accordance with a user speaking the second language is also generated. The response and simulated action are output as a learning exercise for learning operations of the automatic user response system.
Type: Application
Filed: October 2, 2007
Publication date: April 2, 2009
Inventors: Yuqing Gao, Liang Gu, Wei Zhang
-
Publication number: 20090089062
Abstract: A public speaking self-evaluation tool that helps a user practice public speaking in terms of avoiding undesirable words or sounds, maintaining a desirable speech rhythm, and ensuring that the user is regularly glancing at the audience. The system provides a user interface through which the user is able to define the undesirable words or sounds that are to be avoided, as well as a maximum frequency of occurrence threshold to be used for providing warning signals based on detection of such filler or undesirable words or sounds. The user interface allows a user to define a speech rhythm, e.g. in terms of spoken syllables per minute, that is another maximum threshold for providing a visual warning indication. The disclosed system also provides a visual indication when the user fails to glance at the audience at least as often as defined by a predefined minimum threshold.
Type: Application
Filed: October 1, 2007
Publication date: April 2, 2009
Inventor: Fang Lu
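The filler-word warning described above is a frequency-of-occurrence threshold check. A minimal sketch, assuming a word-level transcript and a per-minute rate limit (the function name and signature are hypothetical):

```python
def filler_rate_exceeded(words, fillers, max_per_minute, duration_minutes):
    """Return True when the rate of user-defined filler words exceeds
    the user-defined maximum frequency threshold.

    words: transcript tokens; fillers: set of undesirable words;
    max_per_minute: threshold; duration_minutes: speech length."""
    count = sum(1 for w in words if w.lower() in fillers)
    return count / duration_minutes > max_per_minute
```

The syllables-per-minute rhythm warning in the same patent would follow the identical pattern, with syllable count in place of filler count.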
-
Publication number: 20090089050
Abstract: A device and a method for frame loss concealment are disclosed. A pitch period of a current lost frame is obtained on the basis of a pitch period of the last good frame before the current lost frame. An excitation signal of the current lost frame is recovered on the basis of the pitch period of the current lost frame and an excitation signal of the last good frame before the lost frame. Thereby, the hearing contrast of a receiver is reduced, and the quality of speech is improved. Further, in the present invention, a pitch period of continual lost frames is adjusted on the basis of the change trend of the pitch period of the last good frame before the lost frame. Therefore, a buzz effect produced by the continual lost frames is avoided, and the quality of speech is further improved.
Type: Application
Filed: December 8, 2008
Publication date: April 2, 2009
Applicant: Huawei Technologies Co., Ltd.
Inventors: Yunneng Mo, Yulong Li, Fanrong Tang
-
Publication number: 20090083029
Abstract: A word coinciding with a key word input by speech and a word related to the word are set as retrieval candidate words based on a word dictionary in which words representing formal names and aliases of the formal names are registered in association with a family attribute indicating a familial relation among the words. Content related to any one of retrieval words selected out of the retrieval candidate words, and a word related to the retrieval word, is retrieved.
Type: Application
Filed: February 29, 2008
Publication date: March 26, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Miwako Doi, Kaoru Suzuki, Toshiyuki Koga, Koichi Yamamoto
-
Publication number: 20090076827
Abstract: A system for controlling or operating a plurality of target systems via spoken commands is provided. The system includes a first plurality of target systems, a second plurality of controllers for controlling or operating target systems via spoken commands, and a speech recognition system that stores interface information that is specific to a target system or a group of target systems that are to be controlled or operated. A first controller in the second plurality of controllers includes a microphone for picking up audible signals in the vicinity of the first controller and a device for transmitting the audible signals to a speech recognition system. The speech recognition system is operable to analyze the interface information to recognize spoken commands issued for controlling or operating said target system.
Type: Application
Filed: September 11, 2008
Publication date: March 19, 2009
Inventors: Clemens Bulitta, Robert Kagermeier, Dietmar Sierk
-
Publication number: 20090076806
Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
Type: Application
Filed: October 28, 2008
Publication date: March 19, 2009
Inventors: Andrew E. Vandali, Graeme M. Clark
-
Publication number: 20090076808
Abstract: A method for performing frame erasure concealment for a higher-band signal involves calculating a periodic intensity of the higher-band signal with respect to pitch period information of a lower-band signal, and comparing the periodic intensity to a preconfigured threshold. If the periodic intensity is greater than or equal to the preconfigured threshold, the frame erasure concealment is performed with a pitch period repetition based method; if the periodic intensity is less than the preconfigured threshold, the frame erasure concealment is performed with a previous frame data repetition based method. A device for performing a frame erasure concealment includes a periodic intensity calculation module, a pitch period repetition module, and a previous frame data repetition module.
Type: Application
Filed: November 18, 2008
Publication date: March 19, 2009
Applicant: Huawei Technologies Co., Ltd.
Inventors: Jianfeng Xu, Lei Miao, Chen Hu, Qing Zhang, Lijing Xu, Wei Li, Zhengzhong Du, Yi Yang, Fengyan Qi, Wuzhou Zhan, Dongqi Wang
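The decision rule above can be sketched directly. The patent does not fix the periodicity measure or the threshold, so the normalized autocorrelation and the 0.5 threshold below are assumptions chosen for illustration:

```python
def periodic_intensity(signal, pitch_lag):
    """Normalized autocorrelation of the signal at the lower-band pitch
    lag — one plausible periodicity measure (an assumption)."""
    n = len(signal) - pitch_lag
    num = sum(signal[i] * signal[i + pitch_lag] for i in range(n))
    den = sum(x * x for x in signal[pitch_lag:]) or 1e-9
    return num / den

def conceal_frame(history, pitch_lag, frame_len, threshold=0.5):
    """Choose between pitch-period repetition and previous-frame
    repetition, as the abstract describes; 0.5 is a placeholder."""
    if periodic_intensity(history, pitch_lag) >= threshold:
        cycle = history[-pitch_lag:]                   # last pitch cycle
        return [cycle[i % pitch_lag] for i in range(frame_len)]
    return history[-frame_len:]                        # repeat previous frame
```

For a strongly periodic higher band, repeating the last pitch cycle preserves the harmonic structure; for a noise-like band, repeating the previous frame avoids imposing a false pitch.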
-
Publication number: 20090063139
Abstract: For determining a long-term-prediction delay parameter characterizing a long term prediction in a technique using signal modification for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, a feature of the sound signal is located in a previous frame, a corresponding feature of the sound signal is located in a current frame, and the long-term-prediction delay parameter is determined for the current frame while mapping, with the long term prediction, the signal feature of the previous frame with the corresponding signal feature of the current frame. In a signal modification method for implementation into a technique for digitally encoding a sound signal, the sound signal is divided into a series of successive frames, each frame of the sound signal is partitioned into a plurality of signal segments, and at least a part of the signal segments of the frame are warped while constraining the warped signal segments inside the frame.
Type: Application
Filed: October 21, 2008
Publication date: March 5, 2009
Inventors: Mikko Tammi, Milan Jelinek, Claude LaFlamme, Vesa Ruoppila
-
Publication number: 20090058611
Abstract: A wearable device is worn by a person participating in an event in which a plurality of other people are participating and wearing other wearable devices. The wearable device includes a request unit for transmitting a request signal to other wearable devices that are in a predetermined range, and receiving a response to the request signal from each of the other wearable devices, and a communication unit for determining, with use of the received responses, one or more of the other wearable devices to be a communication partner, and performing data communication with the determined one or more other wearable devices. The data received in the communication is data collected by the one or more other wearable devices determined to be communication partners, and the data is used as a profile component when creating a profile of the event.
Type: Application
Filed: February 21, 2007
Publication date: March 5, 2009
Inventors: Takashi Kawamura, Masayuki Misaki, Ryouichi Kawanishi, Masaki Yamauchi
-
Publication number: 20090063138
Abstract: Methods, digital systems, and computer readable media are provided for determining a predominant fundamental frequency of a frame of an audio signal by finding a maximum absolute signal value in history data for the frame, determining a number of bits for downshifting based on the maximum absolute signal value, computing autocorrelations for the frame using signal values downshifted by the number of bits, and determining the predominant fundamental frequency using the computed autocorrelations.
Type: Application
Filed: August 4, 2008
Publication date: March 5, 2009
Inventors: Atsuhiro Sakurai, Steven David Trautmann
-
Publication number: 20090063125
Abstract: Techniques are provided for globalizing handling of service management items. The techniques include obtaining a service management item in a language convenient to a first of two or more actors, translating the service management item into a language-neutral format to obtain a language-neutral service management item, applying one or more annotators to the service management item, translating the language-neutral service management item into a language convenient to a second of two or more actors acting on the service management item, and routing the translated service management item to the second of two or more actors. Techniques are also provided for generating a database of service management items in a language-neutral format.
Type: Application
Filed: August 28, 2007
Publication date: March 5, 2009
Applicant: International Business Machines Corporation
Inventors: Alexander Faisman, Genady Grabarnik, Jonathan Lenchner, Larisa Shwartz
-
Publication number: 20090055168
Abstract: Methods, systems, and apparatus, including computer program products, are provided in which data from web documents are partitioned into a training corpus and a development corpus. First word probabilities for words are determined for the training corpus, and second word probabilities for the words are determined for the development corpus. Uncertainty values based on the word probabilities for the training corpus and the development corpus are compared, and new words are identified based on the comparison.
Type: Application
Filed: August 23, 2007
Publication date: February 26, 2009
Applicant: GOOGLE INC.
Inventors: Jun Wu, Tang Xi Liu, Feng Hong, Yonggang Wang, Bo Yang, Lei Zhang
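One way to read the comparison above is as a log-probability gap between the two corpora: a word whose development-corpus probability is far above its training-corpus probability is a new-word candidate. The criterion, threshold, and smoothing below are assumptions for illustration, not the patent's actual uncertainty measure:

```python
import math

def new_word_candidates(train_counts, dev_counts, threshold=1.0):
    """Flag words whose development-corpus probability exceeds their
    training-corpus probability by more than `threshold` in log-space.
    The smoothing constant 0.5 for unseen words is an assumption."""
    t_total = sum(train_counts.values()) or 1
    d_total = sum(dev_counts.values()) or 1
    flagged = []
    for word, dev_count in dev_counts.items():
        p_dev = dev_count / d_total
        p_train = train_counts.get(word, 0.5) / t_total  # smoothed
        if math.log(p_dev / p_train) > threshold:
            flagged.append(word)
    return flagged
```

Words common to both corpora score near zero; words frequent only in the newer development corpus stand out sharply.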
-
Publication number: 20090043569
Abstract: There is provided a pitch lag predictor for use by a speech decoder to generate a predicted pitch lag parameter. The pitch lag predictor comprises a summation calculator configured to generate a first summation based on a plurality of previous pitch lag parameters, and a second summation based on a plurality of previous pitch lag parameters and a position of each of the plurality of previous pitch lag parameters with respect to the predicted pitch lag parameter; a coefficient calculator configured to generate a first coefficient using a first equation based on the first summation and the second summation, and a second coefficient using a second equation based on the first summation and the second summation, wherein the first equation is different than the second equation; and a predictor configured to generate the predicted pitch lag parameter based on the first coefficient and the second coefficient.
Type: Application
Filed: October 8, 2008
Publication date: February 12, 2009
Inventor: Yang Gao
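The two summations and two coefficients described above match the shape of a least-squares line fit over the previous pitch lags, extrapolated one position ahead. That reading is an assumption; the patent's actual equations may differ:

```python
def predict_pitch_lag(prev_lags):
    """Fit a line to the previous pitch lags (ordered oldest to newest,
    at positions 1..n) and extrapolate to position n+1 — one plausible
    realization of the two-summation / two-coefficient scheme."""
    n = len(prev_lags)
    positions = range(1, n + 1)
    s1 = sum(prev_lags)                                        # first summation
    s2 = sum(p * lag for p, lag in zip(positions, prev_lags))  # second summation
    sp = n * (n + 1) // 2                  # sum of positions
    spp = n * (n + 1) * (2 * n + 1) // 6   # sum of squared positions
    # slope and intercept from the normal equations (two different formulas)
    slope = (n * s2 - sp * s1) / (n * spp - sp * sp)
    intercept = (s1 - slope * sp) / n
    return intercept + slope * (n + 1)
```

A steadily gliding pitch extrapolates along its trend, while a constant pitch predicts itself, which is the behavior a decoder wants when a lag parameter is missing.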
-
Publication number: 20090043568
Abstract: An accent type is determined by outputting mora synchronized signals; extracting a pitch pattern, which is a variation pattern of voice height (fundamental frequency), from a speech signal entered by a user; generating a mora synchronized pattern from the pitch pattern and the mora synchronized signals; storing typical patterns for the respective accent types; collating the mora synchronized pattern with a reference accent pattern; calculating the matching of the mora synchronized pattern with respect to the respective accent types; and determining the accent type by referring to the matching.
Type: Application
Filed: February 20, 2008
Publication date: February 12, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Takehiko Kagoshima
-
Publication number: 20090030694
Abstract: A gift card is provided which integrally combines a voice storage/playback unit with a stored value card on two separate portions of a base substrate, with the two portions separated by a releasable connection portion. Upon purchasing the gift card, a gift giver records a personal voice message and provides the gift card to the recipient. Upon receiving the gift card, a recipient may immediately play back the recorded personal voice message through simple manipulation of the card. Thereafter, the two parts of the card may be separated by manipulation of the connection portion, and the stored value portion may be used in the manner of conventional stored value cards. The voice storage/playback portion may be stored for safekeeping and played back by the gift recipient at will.
Type: Application
Filed: July 10, 2008
Publication date: January 29, 2009
Applicant: Voice Express Corporation
Inventor: Geoffrey S. Stern
-
Publication number: 20090024394
Abstract: A CPU of a speech ECU acquires vehicle position information. If it is determined from the position information and map data stored in a memory that the vehicle has moved between areas where different languages are spoken as dialects or official languages, the CPU determines a language corresponding to the vehicle position information and transmits a request signal to a speech information center to transmit speech information in the language. By receiving the speech information from the speech information center, the CPU updates speech information pre-stored in the memory with the speech information transmitted from the speech information center.
Type: Application
Filed: June 30, 2008
Publication date: January 22, 2009
Applicant: DENSO CORPORATION
Inventors: Kazuhiro Nakashima, Toshio Shimomura, Kenichi Ogino, Kentaro Teshima
-
Publication number: 20090018841
Abstract: A method and apparatus allow for remotely playing back recorded personalized and non-personalized audio messages to a listener. A first recorded message, which has been personalized for an intended listener, is stored in a first memory of an audio player. A second recorded message, which has not been personalized for an intended listener, is stored in a second memory of the audio player. Responsive to receiving a control command from a remote control device, the audio player plays the messages according to a predetermined arrangement, such as in a predetermined order or at predetermined intervals (e.g., the personalized message may be played a number of times before the non-personalized message is played). In one embodiment, the personalized message contains audio intended to be a calming and/or instructional influence on a small child. In another embodiment, the non-personalized message may include information associated with a sponsoring business.
Type: Application
Filed: July 8, 2008
Publication date: January 15, 2009
Inventors: Marshall T. Leeds, Susan M. Camacho, Steven C. Jacobs
-
Publication number: 20090015567
Abstract: A standalone real-time device to process handwritten text for further applications. The system includes a means for making visible markings on a writing surface, accompanied by a motion detector for detecting the handwritten text. It also comprises a microprocessor for storing appropriate data and commands, an enhanced memory to provide storage space for information and data, and a power supply. The system further includes a display to provide visual feedback of processed data, an audio reproduction device to provide audio feedback, and wired or wireless communication means to transmit data to targeted devices via a transmission link in real time.
Type: Application
Filed: March 20, 2007
Publication date: January 15, 2009
Inventors: Marwa Abdelbaki, Firas Zeineddine
-
Publication number: 20090012794
Abstract: System for giving intelligibility feedback to a speaker (1) speaking to an audience (2), comprising a first microphone (3) at the speaker's side and a second microphone (4) at the audience's side. Both microphones are connected to processing means (5) arranged to compute an intelligibility value based on both microphones' signals. Signalling means (6), preferably at the side of the audience, are arranged to generate an intelligibility feedback signal depending on the calculated intelligibility value, in an optical form visible to the speaker concerned. Wireless connection means (19) may interconnect the microphones, the processing means and the signalling means.
Type: Application
Filed: February 8, 2007
Publication date: January 8, 2009
Applicant: Nederlandse Organisatie voor toegepast-natuurwetenschappelijk Onderzoek TNO
Inventors: Sander Jeroen van Wijngaarden, Jan Adrianus Verhave
-
Publication number: 20090012780
Abstract: In a speech signal decoding method, information containing at least a sound source signal, gain, and filter coefficients is decoded from a received bit stream. Voiced speech and unvoiced speech of a speech signal are identified using the decoded information. Smoothing processing based on the decoded information is performed for at least either one of the decoded gain and decoded filter coefficients in the unvoiced speech. The speech signal is decoded by driving a filter having the decoded filter coefficients by an excitation signal obtained by multiplying the decoded sound source signal by the decoded gain using the result of the smoothing processing. A speech signal decoding apparatus is also disclosed.
Type: Application
Filed: August 27, 2008
Publication date: January 8, 2009
Inventor: Atsushi Murashima
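The smoothing step above, applied only during unvoiced speech, can be sketched as first-order recursive smoothing of the decoded gains. This is one plausible realization under assumed parameters; the patent does not specify the filter form or the constant:

```python
def smooth_gains(gains, voiced_flags, alpha=0.8):
    """Smooth decoded frame gains only in unvoiced segments.

    alpha is an assumed smoothing constant; voiced frames pass their
    gain through unchanged, as the abstract restricts smoothing to
    unvoiced speech."""
    out, prev = [], gains[0]
    for g, voiced in zip(gains, voiced_flags):
        value = g if voiced else alpha * prev + (1.0 - alpha) * g
        out.append(value)
        prev = value
    return out
```

Smoothing unvoiced gains suppresses frame-to-frame gain jitter that would otherwise be audible as fluttering noise, while leaving voiced dynamics intact.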
-
Publication number: 20080319763
Abstract: A dialog manager and spoken dialog service having a dialog manager generated according to a method comprising selecting a top level flow controller based on application type, selecting available reusable subdialogs for each application part, developing a subdialog for each application part not having an available subdialog, and testing and deploying the spoken dialog service using the selected top level flow controller, selected reusable subdialogs and developed subdialogs. The dialog manager is capable of handling context shifts in a spoken dialog with a user. Application dependencies are established in the top level flow controller, thus enabling the subdialogs to be reusable and to be capable of managing context shifts and mixed initiative dialogs.
Type: Application
Filed: August 29, 2008
Publication date: December 25, 2008
Applicant: AT&T Corp.
Inventors: Giuseppe Di Fabbrizio, Charles Alfred Lewis
-
Publication number: 20080319762
Abstract: The present invention discloses a system and a method for creating and editing speech-enabled WIKIs. A WIKI editor can be served to client-side Web browsers so that end-users can utilize WIKI editor functions, which include functions to create and edit speech-enabled WIKI applications. A WIKI server can serve speech-enabled WIKI applications created via the WIKI editor. Each of the speech-enabled WIKI applications can include a link to at least one speech processing engine located in a speech processing system remote from the WIKI server. The speech processing engine can provide a speech processing capability for the speech-enabled WIKI application when served by the WIKI server. In one embodiment, the speech-enabled applications can include an introspection document, an entry collection of documents, and a resource collection of documents in accordance with standards specified by an ATOM PUBLISHING PROTOCOL (APP).
Type: Application
Filed: June 20, 2007
Publication date: December 25, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: William V. Da Palma, Victor S. Moore, Wendi L. Nusbickel
-
Publication number: 20080319757 Abstract: A system can include a client, a speech for Web 2.0 system, and a speech processing system. The client can access a speech-enabled application using at least one Web 2.0 communication protocol. For example, a standard browser on the client can use a standard protocol to communicate with the speech-enabled application executing on the speech for Web 2.0 system. The speech for Web 2.0 system can access a data store within which user-specific speech parameters are included, and the user of the client is able to configure the speech parameters of the data store. Suitable ones of these speech parameters are utilized whenever the user interacts with the Web 2.0 system. The speech processing system can include one or more speech processing engines and can interact with the speech for Web 2.0 system to handle speech processing tasks associated with the speech-enabled application. Type: Application Filed: June 20, 2007 Publication date: December 25, 2008 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: William V. Da Palma, Victor S. Moore, Wendi L. Nusbickel
-
Publication number: 20080319761 Abstract: The present invention discloses a method of performing speech processing operations through Web 2.0 interfaces to speech engines. The method can include a step of interfacing with a Web 2.0 server from a standard browser. A speech-enabled application served by the Web 2.0 server can be accessed. The browser can render markup of the speech-enabled application. Speech input can be received from a user of the browser. A RESTful protocol, such as the ATOM Publishing Protocol (APP), can be utilized to access a remotely located speech engine. The speech engine can accept GET, PUT, POST, and DELETE commands. The speech processing engine can process the speech input and can provide results to the Web 2.0 server. The Web 2.0 server can perform a programmatic action based upon the provided results, which results in different content being presented in the browser. Type: Application Filed: June 20, 2007 Publication date: December 25, 2008 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: William V. Da Palma, Victor S. Moore, Wendi L. Nusbickel
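The RESTful pattern this abstract describes, exposing a remote speech engine through the four HTTP verbs, can be sketched as follows. This is an illustrative toy model only: the class, resource names, and payloads are invented for this sketch and do not come from the patent, which describes a networked APP-based interface rather than an in-memory store.

```python
# Hypothetical sketch: mapping the four RESTful verbs the abstract mentions
# (GET, PUT, POST, DELETE) onto speech-engine resources. An in-memory dict
# stands in for the remotely located speech engine's resource store.

class SpeechEngineResource:
    """Toy stand-in for a speech engine accessed via RESTful verbs."""

    def __init__(self):
        self._resources = {}
        self._next_id = 1

    def post(self, payload):
        """POST: create a new resource (e.g. submit audio for recognition)."""
        rid = str(self._next_id)
        self._next_id += 1
        self._resources[rid] = payload
        return rid

    def get(self, rid):
        """GET: retrieve a resource (e.g. poll for a recognition result)."""
        return self._resources.get(rid)

    def put(self, rid, payload):
        """PUT: replace a resource (e.g. update a grammar)."""
        self._resources[rid] = payload

    def delete(self, rid):
        """DELETE: remove a resource when the session is finished."""
        self._resources.pop(rid, None)


engine = SpeechEngineResource()
job = engine.post({"audio": b"\x00\x01", "grammar": "digits"})
result = engine.get(job)
engine.delete(job)
```

In the patent's arrangement the Web 2.0 server would issue these verbs over the network to the speech processing system and act on the returned results.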
-
Publication number: 20080319758 Abstract: The present invention discloses a speech-enabled application that includes two or more linked markup documents that together form a speech-enabled application served by a Web 2.0 server. The linked markup documents can conform to an ATOM PUBLISHING PROTOCOL (APP)-based protocol. Additionally, the linked markup documents can include an entry collection of documents and a resource collection of documents. The resource collection can include at least one speech resource associated with a speech engine disposed in a speech processing system remotely located from the Web 2.0 server. The speech resource can add a speech processing capability to the speech-enabled application. In one embodiment, end users of the speech-enabled application can be permitted to introspect, customize, replace, add, re-order, and remove at least a portion of the linked markup documents. Type: Application Filed: June 20, 2007 Publication date: December 25, 2008 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: William V. Da Palma, Victor S. Moore, Wendi L. Nusbickel
-
Publication number: 20080312933 Abstract: A method for interfacing an application server with a resource can include the step of associating a plurality of Enterprise Java Beans (EJBs) with a plurality of resources, where a one-to-one correspondence exists between each EJB and its resource. An application server can receive an application request and can determine a resource for handling the request. The EJB associated with the determined resource can interface the application server to the determined resource, and the request can be handled by the determined resource. Type: Application Filed: August 28, 2008 Publication date: December 18, 2008 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: Thomas E. Creamer, Victor S. Moore, Wendi L. Nusbickel, Ricardo Dos Santos, James J. Sliwa
-
Publication number: 20080312936 Abstract: Provided are an apparatus and method for estimating a voice data value corresponding to a silent period produced in a key resynchronization process, using the sine waveform characteristic of voice, when encrypted digital voice data is transmitted in a one-way wireless communication environment. The apparatus includes a transmitter that generates and transmits a key resynchronization frame containing key resynchronization information and vector information on the voice data, and a receiver that receives the key resynchronization frame from the transmitter, extracts the vector information inserted in the key resynchronization frame, and estimates a voice data value corresponding to the key resynchronization period. Based on the change ratio between slopes calculated from received voice data, it is possible to estimate the voice data corresponding to a silent period, which improves communication quality. Type: Application Filed: March 14, 2008 Publication date: December 18, 2008 Inventors: Taek Jun NAM, Byeong-Ho AHN, Seok RYU, Sang-Yi YI
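The idea of extrapolating missing samples from the slopes of the received samples can be illustrated with a simplified sketch. This is not the patent's exact math: the function name, the use of only the last three samples, and the damping scheme are all assumptions made for the illustration; the patent additionally uses transmitted vector information.

```python
def estimate_gap(samples, gap_len):
    """Estimate voice samples lost during a key-resynchronization period.

    Simplified sketch: extrapolate from the slope of the last received
    samples, letting the slope evolve by the ratio of the two most recent
    slopes (a toy stand-in for the abstract's 'change ratio between
    slopes', which tracks the sine-like shape of voiced speech).
    """
    s1 = samples[-2] - samples[-3]   # slope before last
    s2 = samples[-1] - samples[-2]   # most recent slope
    ratio = s2 / s1 if s1 != 0 else 1.0
    est, last, slope = [], samples[-1], s2
    for _ in range(gap_len):
        slope *= ratio               # slope keeps changing at the same rate
        last += slope
        est.append(last)
    return est
```

For a linear ramp the slope ratio is 1 and the extrapolation simply continues the ramp; for a curving (sine-like) segment the slope shrinks or grows each step.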
-
Publication number: 20080306743 Abstract: A system and method are disclosed for switching contexts within a spoken dialog between a user and a spoken dialog system. The spoken dialog system utilizes modular subdialogs that are invoked by at least one flow controller that is a finite-state model and that is associated with a dialog manager. The spoken dialog system includes a dialog manager with a flow controller and a reusable subdialog module. The method includes, while the spoken dialog is being controlled by the subdialog module that was invoked by the flow controller, receiving context-changing input associated with speech from a user that changes a dialog context, and comparing the context-changing input to at least one context shift. If any of the context shifts are activated by the comparing step, control of the spoken dialog is passed to the flow controller with a context-shift message and a destination state. Type: Application Filed: August 19, 2008 Publication date: December 11, 2008 Applicant: AT&T Corp. Inventors: Giuseppe Di Fabbrizio, Charles Alfred Lewis
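The control hand-off this abstract describes can be sketched in a few lines. Everything here is illustrative: the state names, shift phrases, and the date-collection subdialog are invented, and a real system would match recognized speech rather than exact strings.

```python
# Hypothetical sketch: a flow controller invokes a reusable subdialog; when
# user input matches an active context shift, the subdialog returns a
# (context-shift message, destination state) pair instead of finishing.

CONTEXT_SHIFTS = {"check my balance": "balance_state",
                  "talk to an agent": "agent_state"}

def collect_date_subdialog(user_inputs):
    """Reusable subdialog: collect a date, but yield on any context shift."""
    for text in user_inputs:
        if text in CONTEXT_SHIFTS:            # compare input to context shifts
            return ("context_shift", CONTEXT_SHIFTS[text])
        if text.count("/") == 2:              # toy test for a date utterance
            return ("done", text)
    return ("no_input", None)

class FlowController:
    """Finite-state controller that owns the dialog's state transitions."""

    def __init__(self):
        self.state = "collect_date"

    def run(self, user_inputs):
        outcome, value = collect_date_subdialog(user_inputs)
        if outcome == "context_shift":
            self.state = value                # jump to the destination state
        return outcome, value
```

Because the subdialog only reports the shift and destination, and the flow controller performs the jump, the subdialog stays reusable across applications, which is the point of the design.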
-
Publication number: 20080306739 Abstract: A system capable of separating sound source signals with high precision while improving convergence rate and convergence precision. A process of updating a current separation matrix Wk to a next separation matrix Wk+1, such that the next value J(Wk+1) of a cost function is closer to the minimum value J(W0) than the current value J(Wk), is iteratively performed. The update amount ΔWk of the separation matrix is increased as the current value J(Wk) of the cost function increases, and is decreased as the current gradient ∂J(Wk)/∂W of the cost function becomes steeper. On the basis of input signals x from a plurality of microphones Mi and an optimal separation matrix W0, it is possible to separate sound source signals y (= W0·x) with high precision while improving convergence rate and convergence precision. Type: Application Filed: June 5, 2008 Publication date: December 11, 2008 Applicant: HONDA MOTOR CO., LTD. Inventors: Hirofumi Nakajima, Kazuhiro Nakadai, Yuji Hasegawa, Hiroshi Tsujino
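The adaptive update rule, a step that grows with the current cost and shrinks when the gradient is steep, can be demonstrated on a toy problem. A scalar quadratic cost stands in for the real separation-matrix cost function; the base step size and constants are invented for this sketch.

```python
# Toy illustration of the abstract's update heuristic: step size is
# proportional to the current cost J(w) and inversely proportional to the
# gradient magnitude, so large residual cost speeds convergence while a
# steep gradient keeps the step cautious.

def minimize(w, steps=50, base=0.1, eps=1e-8):
    history = []
    for _ in range(steps):
        cost = (w - 3.0) ** 2                    # J(w), minimized at w0 = 3
        grad = 2.0 * (w - 3.0)                   # dJ/dw
        step = base * cost / (abs(grad) + eps)   # bigger cost -> bigger step,
        w -= step * (1 if grad > 0 else -1)      # steeper gradient -> smaller
        history.append(cost)
    return w, history

w_final, hist = minimize(0.0)   # cost shrinks toward J(w0) = 0
```

For this quadratic the rule reduces the distance to the optimum by a fixed fraction per step; the patent applies the same idea to a full separation matrix driven by multi-microphone input.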
-
Publication number: 20080290987 Abstract: In one embodiment, a method includes receiving a signal from a communication device via a communication channel and determining, based on the signal, a parameter value used for identification of a product of interest. The duration of the receiving is modified when a threshold condition, evaluated against a probability value calculated from the parameter value, is unsatisfied. The probability value is associated with identification of the product of interest. Type: Application Filed: April 22, 2008 Publication date: November 27, 2008 Inventor: Lehmann Li
-
Publication number: 20080288260 Abstract: Provided are an input/output apparatus based on voice recognition, and a method thereof. An object of the apparatus is to improve the user interface by making pointing input and command execution, such as application program control, possible according to a user's voice command, based on voice recognition technology and without an individual pointing input device such as a mouse or a touch pad. The apparatus includes: a voice recognizer for recognizing a voice command inputted from outside; a pointing controller for calculating a pointing location on a screen which corresponds to a voice recognition result transmitted from the voice recognizer; a displayer for displaying a screen; and a command controller for processing diverse commands related to a current pointing location. Type: Application Filed: September 11, 2006 Publication date: November 20, 2008 Inventors: Kwan-Hyun Cho, Mun-Sung Han, Jun-Seok Park, Young-Giu Jung
-
Publication number: 20080288246 Abstract: There is provided a method of using processing circuitry to select a preferential pitch lag value from a plurality of pitch lag values, including a first pitch lag value and a second pitch lag value, for coding an input speech signal. The method comprises determining a first timing relationship between a previous pitch lag value and at least one of the plurality of pitch lag values; determining a second timing relationship between the first pitch lag value and the second pitch lag value; favoring one of the first and second pitch lag values, based on the first and second timing relationships, to select it as the preferential pitch lag value; and converting the input speech signal into encoded speech using the preferential pitch lag value. Type: Application Filed: July 23, 2008 Publication date: November 20, 2008 Applicant: Mindspeed Technologies, Inc. Inventors: Huan-Yu Su, Yang Gao
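One plausible reading of "timing relationship" is continuity with the previous frame's pitch lag, since pitch normally evolves slowly. The sketch below implements only that simplified preference and is not the patent's actual selection rule; the function name and tie-breaking choice are assumptions.

```python
def select_pitch_lag(prev_lag, lag1, lag2):
    """Pick a preferential pitch lag from two candidates.

    Simplified sketch: favor the candidate closer to the previous frame's
    lag (pitch evolves slowly between frames), breaking ties toward the
    first candidate. A stand-in for the patent's timing-relationship logic.
    """
    d1 = abs(lag1 - prev_lag)
    d2 = abs(lag2 - prev_lag)
    return lag1 if d1 <= d2 else lag2
```

Such a preference helps a coder avoid pitch-doubling or pitch-halving errors, where a candidate at twice or half the true lag scores almost as well on the correlation alone.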
-
Publication number: 20080281586 Abstract: A "speech onset detector" provides a variable-length frame buffer in combination with either variable transmission rate or temporal speech compression for buffered signal frames. The variable-length buffer holds frames that are not clearly identified as either speech or non-speech during an initial analysis. Buffering of signal frames continues until a current frame is identified as either speech or non-speech. If the current frame is identified as non-speech, the buffered frames are encoded as non-speech frames. However, if the current frame is identified as a speech frame, the buffered frames are searched for the actual onset point of the speech. Once that onset point is identified, the signal is either transmitted in a burst, or a time-scale modification of the buffered signal is applied to compress the buffered frames beginning with the frame in which the onset point is detected. The compressed frames are then encoded as one or more speech frames. Type: Application Filed: July 28, 2008 Publication date: November 13, 2008 Applicant: MICROSOFT CORPORATION Inventors: Dinei A. Florencio, Philip A. Chou
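The buffering logic can be sketched with a toy energy-based classifier. The thresholds and the onset criterion here are invented for illustration; the patent's classifier and onset search are more sophisticated, and this sketch omits the burst-transmission / time-scale-compression step.

```python
# Sketch of the variable-length buffering the abstract describes. Frames
# whose energy falls between the two thresholds are ambiguous and are held
# until a clearly speech or clearly non-speech frame arrives; on a speech
# decision, the buffer is scanned for the onset point. Thresholds invented.

SPEECH_T, SILENCE_T, ONSET_T = 10.0, 1.0, 3.0

def energy(frame):
    return sum(x * x for x in frame) / len(frame)

def process(frames):
    buffer, out = [], []
    for frame in frames:
        e = energy(frame)
        if SILENCE_T < e < SPEECH_T:        # ambiguous: keep buffering
            buffer.append(frame)
            continue
        if e >= SPEECH_T:                   # speech confirmed: find onset
            onset = next((i for i, f in enumerate(buffer)
                          if energy(f) >= ONSET_T), len(buffer))
            out += [("non-speech", f) for f in buffer[:onset]]
            out += [("speech", f) for f in buffer[onset:]]
            out.append(("speech", frame))
        else:                               # silence: flush as non-speech
            out += [("non-speech", f) for f in buffer]
            out.append(("non-speech", frame))
        buffer = []
    return out
```

Deferring the decision this way lets the encoder label the onset frames as speech even though, taken alone, they looked ambiguous when they arrived.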
-
Publication number: 20080281599 Abstract: A method of processing audio data including: obtaining (202) audio data; analysing (206) the audio data to determine at least one characteristic of the audio data; generating (206) data describing the at least one characteristic of the analysed audio data; and/or modifying (412) an audio recording process based on the at least one characteristic of the analysed audio data. Type: Application Filed: April 15, 2008 Publication date: November 13, 2008 Inventor: Paul Rocca
-
Publication number: 20080275695 Abstract: A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear, with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. A contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited to a maximum value. Type: Application Filed: April 25, 2008 Publication date: November 6, 2008 Inventors: Anssi Ramo, Jani Nurminen, Sakari Himanen, Ari Heikkinen
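A piecewise-linear approximation with a bounded deviation, as the abstract describes for the linear case, can be sketched with a greedy segmentation. The greedy strategy and the deviation bound value are assumptions for this sketch; the patent leaves the segmentation criteria open.

```python
def segment_contour(pitch, max_dev=2.0):
    """Greedy piecewise-linear approximation of a pitch contour.

    Each segment is represented only by its two end-point indices; a segment
    is grown while every pitch value inside it stays within max_dev of the
    straight line between the end points. Simplified sketch of the idea.
    """
    segments, start = [], 0
    for end in range(1, len(pitch)):
        ok = True
        for i in range(start + 1, end):  # check interior points vs the line
            t = (i - start) / (end - start)
            interp = pitch[start] + t * (pitch[end] - pitch[start])
            if abs(pitch[i] - interp) > max_dev:
                ok = False
                break
        if not ok:
            segments.append((start, end - 1))  # close at the previous point
            start = end - 1
    segments.append((start, len(pitch) - 1))
    return segments
```

Only the end points of each segment need to be transmitted, so a long, smoothly varying contour collapses to a handful of values, which is the source of the coding-efficiency gain.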
-
Publication number: 20080275697 Abstract: An audio processing apparatus for processing two sampled audio signals to detect the temporal position of one of the audio signals with respect to the other. The apparatus detects audio power characteristics of each signal in respect of successive continuous temporal portions of each of the two signals, the portions having identical lengths and each portion including at least two audio samples, and correlates the detected audio power characteristics of the two audio signals to establish a most likely temporal offset between them. Type: Application Filed: October 27, 2006 Publication date: November 6, 2008 Applicant: SONY UNITED KINGDOM LIMITED Inventors: William Edmund Cranstoun Kentish, Nicolas John Haynes
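The two stages this abstract describes, per-portion power measurement followed by correlation over candidate offsets, can be sketched directly. The function names, frame length, and brute-force search range are choices made for this sketch, not details from the patent.

```python
def frame_power(signal, frame_len):
    """Average power of successive equal-length portions of the signal."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def best_offset(sig_a, sig_b, frame_len, max_shift):
    """Most likely temporal offset (in frames) of sig_b relative to sig_a.

    Simplified sketch: brute-force correlate the two power profiles over
    candidate shifts and return the shift with the highest score.
    """
    pa, pb = frame_power(sig_a, frame_len), frame_power(sig_b, frame_len)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        score = sum(pa[i] * pb[i + shift]
                    for i in range(len(pa))
                    if 0 <= i + shift < len(pb))
        if score > best_score:
            best, best_score = shift, score
    return best
```

Correlating coarse power profiles instead of raw samples makes the alignment cheap and robust to sample-level differences (e.g. different codecs) between the two recordings of the same material.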