Patents by Inventor Jani Nurminen

Jani Nurminen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for modeling speech spectra

Patent number: 8489392

Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.

Type: Grant

Filed: September 13, 2007

Date of Patent: July 16, 2013

Assignee: Nokia Corporation

Inventors: Jani Nurminen, Sakari Himanen
Method and system for pitch contour quantization in audio coding

Patent number: 8380496

Abstract: A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. The contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.

Type: Grant

Filed: April 25, 2008

Date of Patent: February 19, 2013

Assignee: Nokia Corporation

Inventors: Anssi Rämö, Jani Nurminen, Sakari Himanen, Ari Heikkinen
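Illustrative sketch (not taken from the patent): the linear case described in the abstract above can be pictured as a greedy piecewise-linear fit. The code below is a minimal sketch under stated assumptions: the pitch contour is a list of per-frame values, segment end points are placed on actual pitch values, and each segment is grown greedily until the deviation bound would be violated. The names `segment_pitch_contour`, `reconstruct`, `max_dev`, and `max_len` are hypothetical.

```python
def _fits(pitch, a, b, max_dev):
    """True if the line from frame a to frame b stays within max_dev of the contour."""
    slope = (pitch[b] - pitch[a]) / (b - a)
    return all(abs(pitch[a] + slope * (i - a) - pitch[i]) <= max_dev
               for i in range(a + 1, b))


def segment_pitch_contour(pitch, max_dev, max_len=None):
    """Greedy piecewise-linear approximation (an assumed strategy).

    Returns the segment end points as (frame_index, pitch_value) pairs.
    max_len, if given, caps the number of frames a segment may span.
    """
    if not pitch:
        return []
    points = [(0, pitch[0])]
    start, n = 0, len(pitch)
    while start < n - 1:
        end = start + 1  # a segment between adjacent frames is always exact
        while end + 1 < n and (max_len is None or end + 1 - start <= max_len):
            if not _fits(pitch, start, end + 1, max_dev):
                break
            end += 1
        points.append((end, pitch[end]))
        start = end
    return points


def reconstruct(points):
    """Decoder side: rebuild the contour by linear interpolation between end points."""
    out = []
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1):
            out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    out.append(points[-1][1])
    return out


contour = [100, 102, 104, 106, 104, 102, 100]
end_points = segment_pitch_contour(contour, max_dev=0.5)
approx = reconstruct(end_points)
```

In this sketch only `end_points` would need to be sent to the decoder, which is the source of the coding gain the abstract describes. A non-greedy search or end points that do not lie on the contour are equally compatible with the abstract.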
Method, apparatus and computer program product for providing improved voice conversion

Patent number: 8131550

Abstract: An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.

Type: Grant

Filed: October 4, 2007

Date of Patent: March 6, 2012

Assignee: Nokia Corporation

Inventors: Jani Nurminen, Elina Helander
Dynamic quantizer structures for efficient compression

Patent number: 8086057

Abstract: A method and system are introduced that provide dynamic quantizer structures which are configurable during run time. A quantizer configuration and data are stored in a binary format. The dynamic quantizer data is represented as a bitstream, and the bitstream in turn is used as additional input during initialization (or re-initialization/re-configuration) of a speech coder. A configuration header fully specifies the structure and configuration of the dynamic quantizer for each quantized parameter, and the dynamic quantizer data and configurations are fully and dynamically allocated into the speech coder memory. This enables easy re-configuration of a codec associated with the quantizer structures for different scenarios. The use of dynamic quantizer structures in turn enhances compression efficiency of an input signal. The dynamic quantizer structures can also be applied to other compression applications that allow lossy compression.

Type: Grant

Filed: September 14, 2007

Date of Patent: December 27, 2011

Assignee: Nokia Corporation

Inventors: Jani Nurminen, Sakari Himanen
Method, Apparatus and Computer Program Product for Providing Improved Voice Conversion

Publication number: 20090094027

Abstract: An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.

Type: Application

Filed: October 4, 2007

Publication date: April 9, 2009

Inventors: Jani Nurminen, Elina Helander
Soft alignment based on a probability of time alignment

Patent number: 7505950

Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.

Type: Grant

Filed: April 26, 2006

Date of Patent: March 17, 2009

Assignee: Nokia Corporation

Inventors: Jilei Tian, Jani Nurminen, Victor Popa
Method and system for pitch contour quantization in audio coding

Publication number: 20080275695

Abstract: A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. The contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.

Type: Application

Filed: April 25, 2008

Publication date: November 6, 2008

Inventors: Anssi Rämö, Jani Nurminen, Sakari Himanen, Ari Heikkinen
MEMORY-EFFICIENT METHOD FOR HIGH-QUALITY CODEBOOK BASED VOICE CONVERSION

Publication number: 20080147385

Abstract: An improved system and method for enabling and implementing codebook-based voice conversion that both significantly reduces the memory footprint and improves the continuity of the output. In various embodiments, the paired source-target codebook is implemented as a multi-stage vector quantizer. During the conversion, N best candidates in a tree search are taken as the output from the quantizer. The N candidates for each vector to be converted are used in a dynamic programming-based approach that finds a smooth but accurate output sequence.

Type: Application

Filed: December 15, 2006

Publication date: June 19, 2008

Inventors: Jani Nurminen, Jilei Tian, Victor Popa
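Illustrative sketch (not taken from the application): the dynamic programming step in the abstract above can be pictured as a Viterbi-style search over the N candidates of each frame. The code below is a minimal sketch under stated assumptions: the N-best candidates and their quantization costs are already available (the multi-stage quantizer and its tree search are not modeled), smoothness is measured by squared Euclidean distance between consecutive outputs, and the trade-off parameter `weight` is hypothetical, as is the function name `smooth_path`.

```python
def smooth_path(candidates, costs, weight=1.0):
    """Pick one candidate per frame, trading accuracy against smoothness.

    candidates[t][k] is the k-th candidate target vector for frame t and
    costs[t][k] is its (assumed) quantization cost. The total cost minimized is
    the sum of the chosen costs plus weight times the squared distance between
    consecutive chosen vectors.
    """
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    if not candidates:
        return []
    best = list(costs[0])   # best accumulated cost ending in each candidate
    back = []               # back-pointers, one list per frame after the first
    for t in range(1, len(candidates)):
        new_best, ptr = [], []
        for k, cand in enumerate(candidates[t]):
            prev = candidates[t - 1]
            j = min(range(len(prev)),
                    key=lambda j: best[j] + weight * dist(prev[j], cand))
            new_best.append(best[j] + weight * dist(prev[j], cand) + costs[t][k])
            ptr.append(j)
        best = new_best
        back.append(ptr)

    # Trace the cheapest path back from the last frame.
    k = min(range(len(best)), key=best.__getitem__)
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]


frames = [[(0.0,), (5.0,)], [(0.2,), (5.1,)], [(4.9,), (0.1,)]]
frame_costs = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.1]]
smooth = smooth_path(frames, frame_costs, weight=1.0)
greedy = smooth_path(frames, frame_costs, weight=0.0)
```

With `weight=0.0` the search reduces to taking the cheapest candidate in every frame, which can jump between distant vectors. A positive weight accepts a slightly costlier first candidate here in exchange for a continuous sequence.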
DYNAMIC QUANTIZER STRUCTURES FOR EFFICIENT COMPRESSION

Publication number: 20080107348

Abstract: A method and system are introduced that provide dynamic quantizer structures which are configurable during run time. A quantizer configuration and data are stored in a binary format. The dynamic quantizer data is represented as a bitstream, and the bitstream in turn is used as additional input during initialization (or re-initialization/re-configuration) of a speech coder. A configuration header fully specifies the structure and configuration of the dynamic quantizer for each quantized parameter, and the dynamic quantizer data and configurations are fully and dynamically allocated into the speech coder memory. This enables easy re-configuration of a codec associated with the quantizer structures for different scenarios. The use of dynamic quantizer structures in turn enhances compression efficiency of an input signal. The dynamic quantizer structures can also be applied to other compression applications that allow lossy compression.

Type: Application

Filed: September 14, 2007

Publication date: May 8, 2008

Inventors: Jani Nurminen, Sakari Himanen
SYSTEM AND METHOD FOR MODELING SPEECH SPECTRA

Publication number: 20080109218

Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.

Type: Application

Filed: September 13, 2007

Publication date: May 8, 2008

Inventors: Jani Nurminen, Sakari Himanen
SOFT ALIGNMENT IN GAUSSIAN MIXTURE MODEL BASED TRANSFORMATION

Publication number: 20070256189

Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.

Type: Application

Filed: April 26, 2006

Publication date: November 1, 2007

Applicant: NOKIA CORPORATION

Inventors: Jilei Tian, Jani Nurminen, Victor Popa
Method, apparatus and computer program product for providing content dependent media content mixing

Publication number: 20070245375

Abstract: A method of providing content dependent media content mixing includes automatically determining an emotional property of a first media content input, determining a specification for a second media content in response to the determined emotional property, and producing the second media content in accordance with the specification.

Type: Application

Filed: March 21, 2006

Publication date: October 18, 2007

Inventors: Jilei Tian, Jani Nurminen
Method, apparatus, mobile terminal and computer program product for providing efficient evaluation of feature transformation

Publication number: 20070239634

Abstract: An apparatus for providing efficient evaluation of feature transformation includes a training module and a transformation module. The training module is configured to train a Gaussian mixture model (GMM) using training source data and training target data. The transformation module is in communication with the training module. The transformation module is configured to produce a conversion function in response to the training of the GMM. The training module is further configured to determine a quality of the conversion function prior to use of the conversion function by calculating a trace measurement of the GMM.

Type: Application

Filed: April 7, 2006

Publication date: October 11, 2007

Inventors: Jilei Tian, Jani Nurminen, Victor Popa
Method, apparatus, mobile terminal and computer program product for providing data clustering and mode selection

Publication number: 20070233625

Abstract: An apparatus for providing data clustering and mode selection includes a training element and a transformation element. The training element is configured to receive a first training data set, a second training data set and auxiliary data extracted from the same material as the first training data set. The training element is also configured to train a classifier to group the first training data set into M clusters based on the auxiliary data and the first training data set and train M processing schemes corresponding to the M clusters for transforming the first training data set into the second training data set. The transformation element is in communication with the training element and is configured to cluster the second training data set into M clusters based on features associated with the second training data set.

Type: Application

Filed: April 3, 2006

Publication date: October 4, 2007

Inventors: Jilei Tian, Jani Nurminen, Victor Popa
Compression and decompression of data vectors

Publication number: 20070094019

Abstract: For an enhanced sequential compression of data vectors in a respective compression pass, a current data vector is mapped to at least one current code vector of at least one codebook in at least one quantization stage. The at least one codebook is reordered taking account of at least one intermediate result from the current compression pass and at least one intermediate result from a preceding compression pass. At least one codebook index that is associated in the at least one reordered codebook to the at least one current code vector is then provided for further use. For a decompression of compressed data vectors represented by such codebook indices, at least one codebook index is mapped to at least one code vector of at least one equally reordered codebook.

Type: Application

Filed: October 21, 2005

Publication date: April 26, 2007

Inventor: Jani Nurminen
Correcting a pronunciation of a synthetically generated speech object

Publication number: 20070016421

Abstract: This invention relates to a method, a device and a software application product for correcting a pronunciation of a speech object. The speech object is synthetically generated from a text object in dependence on a segmented representation of the text object. It is determined if an initial pronunciation of the speech object, which initial pronunciation is associated with an initial segmented representation of the text object, is incorrect. Furthermore, in case it is determined that the initial pronunciation of the speech object is incorrect, a new segmented representation of the text object is determined, which new segmented representation of the text object is associated with a new pronunciation of the speech object.

Type: Application

Filed: July 12, 2005

Publication date: January 18, 2007

Inventors: Jani Nurminen, Hannu Mikkola, Jilei Tian
Supporting a concatenative text-to-speech synthesis

Publication number: 20070011009

Abstract: The invention relates to a support of a concatenative TTS synthesis. In order to generate a speech database as a basis for the TTS synthesis, first, a speech processing including a segmental parametric speech encoding of speech data based on a parametric modeling of speech is performed, which results in compressed parameterized speech segments. Then, the compressed parameterized speech segments are assembled in a speech database. In order to synthesize output speech, compressed parameterized speech segments are selected from the speech database based on an available text and decompressed to regain parameterized speech segments. The parameterized speech segments are then concatenated in a parameter domain. The output speech is synthesized based on these concatenated parametric speech segments.

Type: Application

Filed: July 8, 2005

Publication date: January 11, 2007

Inventors: Jani Nurminen, Sakari Himanen, Anssi Rämö, Janne Vainio
Framework for voice conversion

Publication number: 20060235685

Abstract: This invention relates to a framework for converting a source speech signal associated with a source voice into a target speech signal that is a representation of the source speech signal associated with a target voice. The source speech signal is encoded into samples of encoding parameters, wherein the encoding comprises the step of segmenting the source speech signal into segments based on characteristics of the source speech signal. The samples of the encoding parameters, or a converted representation of the samples of the encoding parameters are then decoded to obtain the target speech signal. Therein, in the encoding, the decoding or in a separate step, samples of parameters related to the source speech signal are converted into samples of parameters related to the target speech signal. Therein, at least one of the encoding and the converting depends on the segments of the source speech signal.

Type: Application

Filed: April 15, 2005

Publication date: October 19, 2006

Inventors: Jani Nurminen, Jilei Tian, Imre Kiss
Memory usage in a text-to-speech system

Publication number: 20060229877

Abstract: In the concatenative text-to-speech system, a high compression rate of duration data in the prosodic template is achieved by extracting statistical parameters describing the behavior of actual duration values of instances of each given syllable, phoneme, half-phoneme, diphone, triphone or any other basic speech unit employed, and storing only the extracted statistical parameters, instead of the original duration values. Entries of each given basic unit in the prosodic template are sorted and indexed in the order of increasing duration value. Consequently, the amount of duration data can be significantly reduced, while keeping the error statistically within an acceptable range.

Type: Application

Filed: April 6, 2005

Publication date: October 12, 2006

Inventors: Jilei Tian, Jani Nurminen
System and method for optimizing run-time memory usage for a lexicon

Publication number: 20060167680

Abstract: A system and method of extracting information from a lexicon and using the information with a computer software program. Lexicon data is arranged for a particular language using Unicode values or other uniquely defined code values for each character of word of the language. A location array is then created for the lexicon data arranged by Unicode value or other uniquely defined code value. Upon a request to search for a word, words that have the same initial character as the searched-for word are identified using the location array. The identified words are then searched for an identified word that matches the searched-for word. Therefore, the amount of data loaded into run-time memory is minimized, and searches for a given word are completed more quickly than in conventional systems.

Type: Application

Filed: January 25, 2005

Publication date: July 27, 2006

Inventors: Jilei Tian, Jani Nurminen
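Illustrative sketch (not taken from the application): the location array in the abstract above can be pictured as an index from each initial character to the range of lexicon entries that start with it. The code below is a minimal sketch under stated assumptions: the lexicon is an in-memory list of non-empty words sorted by Unicode code point, the location array is a dictionary, and the search within the identified range uses binary search, which the abstract does not specify. The names `build_location_array` and `lookup` are hypothetical.

```python
import bisect


def build_location_array(sorted_words):
    """Map each initial character to the (start, end) range of words beginning with it.

    Assumes sorted_words is sorted by code point, so words sharing an initial
    character are contiguous.
    """
    locations = {}
    for i, word in enumerate(sorted_words):
        start, _ = locations.get(word[0], (i, i))
        locations[word[0]] = (start, i + 1)
    return locations


def lookup(sorted_words, locations, word):
    """Return the index of word in the lexicon, or -1 if it is absent."""
    if not word or word[0] not in locations:
        return -1
    start, end = locations[word[0]]
    # Only the block of words with the same initial character is examined.
    i = bisect.bisect_left(sorted_words, word, start, end)
    return i if i < end and sorted_words[i] == word else -1


lexicon = sorted(["banana", "apple", "cherry", "avocado", "äpfel"])
locations = build_location_array(lexicon)
index = lookup(lexicon, locations, "avocado")
```

In a memory-constrained system the point would be to load only the identified block from storage. This sketch keeps the whole list in memory and shows the indexing idea only.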
