Patents by Inventor Jani Nurminen

Jani Nurminen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8489392
    Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: July 16, 2013
    Assignee: Nokia Corporation
    Inventors: Jani Nurminen, Sakari Himanen
  • Patent number: 8380496
    Abstract: A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear, with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. Each contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.
    Type: Grant
    Filed: April 25, 2008
    Date of Patent: February 19, 2013
    Assignee: Nokia Corporation
    Inventors: Anssi Rämö, Jani Nurminen, Sakari Himanen, Ari Heikkinen
  • Patent number: 8131550
    Abstract: An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: March 6, 2012
    Assignee: Nokia Corporation
    Inventors: Jani Nurminen, Elina Helander
  • Patent number: 8086057
    Abstract: A method and system are introduced that provide dynamic quantizer structures which are configurable during run time. A quantizer configuration and data are stored in a binary format. The dynamic quantizer data is represented as a bitstream, and the bitstream in turn is used as additional input during initialization (or re-initialization/re-configuration) of a speech coder. A configuration header fully specifies the structure and configuration of the dynamic quantizer for each quantized parameter, and the dynamic quantizer data and configurations are fully and dynamically allocated into the speech coder memory. This enables easy re-configuration of a codec associated with the quantizer structures for different scenarios. The use of dynamic quantizer structures in turn enhances compression efficiency of an input signal. The dynamic quantizer structures can also be applied to other compression applications that allow lossy compression.
    Type: Grant
    Filed: September 14, 2007
    Date of Patent: December 27, 2011
    Assignee: Nokia Corporation
    Inventors: Jani Nurminen, Sakari Himanen
  • Publication number: 20090094027
    Abstract: An apparatus for providing improved voice conversion includes a sub-feature generator and a transformation element. The sub-feature generator may be configured to define sub-feature units with respect to a feature of source speech. The transformation element may be configured to perform voice conversion of the source speech to target speech based on the conversion of the sub-feature units to corresponding target speech sub-feature units using a conversion model trained with respect to converting training source speech sub-feature units to training target speech sub-feature units.
    Type: Application
    Filed: October 4, 2007
    Publication date: April 9, 2009
    Inventors: Jani Nurminen, Elina Helander
  • Patent number: 7505950
    Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.
    Type: Grant
    Filed: April 26, 2006
    Date of Patent: March 17, 2009
    Assignee: Nokia Corporation
    Inventors: Jilei Tian, Jani Nurminen, Victor Popa
  • Publication number: 20080275695
    Abstract: A method and device for improving coding efficiency in audio coding. From the pitch values of a pitch contour of an audio signal, a plurality of simplified pitch contour segments are generated to approximate the pitch contour, based on one or more pre-selected criteria. The contour segments can be linear or non-linear, with each contour segment represented by a first end point and a second end point. If the contour segments are linear, then only the information regarding the end points, instead of the pitch values, is provided to a decoder for reconstructing the audio signal. Each contour segment can have a fixed maximum length or a variable length, but the deviation between a contour segment and the pitch values in that segment is limited by a maximum value.
    Type: Application
    Filed: April 25, 2008
    Publication date: November 6, 2008
    Inventors: Anssi Ramo, Jani Nurminen, Sakari Himanen, Ari Heikkinen
  • Publication number: 20080147385
    Abstract: An improved system and method for enabling and implementing codebook-based voice conversion that both significantly reduces the memory footprint and improves the continuity of the output. In various embodiments, the paired source-target codebook is implemented as a multi-stage vector quantizer. During the conversion, the N best candidates in a tree search are taken as the output from the quantizer. The N candidates for each vector to be converted are used in a dynamic programming-based approach that finds a smooth but accurate output sequence.
    Type: Application
    Filed: December 15, 2006
    Publication date: June 19, 2008
    Inventors: Jani Nurminen, Jilei Tian, Victor Popa
  • Publication number: 20080109218
    Abstract: A system and method for modeling speech in such a way that both voiced and unvoiced contributions can co-exist at certain frequencies. In various embodiments, three spectral bands (or bands of up to three different types) are used. In one embodiment, the lowest band or group of bands is completely voiced, the middle band or group of bands contains both voiced and unvoiced contributions, and the highest band or group of bands is completely unvoiced. The embodiments of the present invention may be used for speech coding and other speech processing applications.
    Type: Application
    Filed: September 13, 2007
    Publication date: May 8, 2008
    Inventors: Jani Nurminen, Sakari Himanen
  • Publication number: 20080107348
    Abstract: A method and system are introduced that provide dynamic quantizer structures which are configurable during run time. A quantizer configuration and data are stored in a binary format. The dynamic quantizer data is represented as a bitstream, and the bitstream in turn is used as additional input during initialization (or re-initialization/re-configuration) of a speech coder. A configuration header fully specifies the structure and configuration of the dynamic quantizer for each quantized parameter, and the dynamic quantizer data and configurations are fully and dynamically allocated into the speech coder memory. This enables easy re-configuration of a codec associated with the quantizer structures for different scenarios. The use of dynamic quantizer structures in turn enhances compression efficiency of an input signal. The dynamic quantizer structures can also be applied to other compression applications that allow lossy compression.
    Type: Application
    Filed: September 14, 2007
    Publication date: May 8, 2008
    Inventors: Jani Nurminen, Sakari Himanen
  • Publication number: 20070256189
    Abstract: Systems and methods are provided for performing soft alignment in Gaussian mixture model (GMM) based and other vector transformations. Soft alignment may assign alignment probabilities to source and target feature vector pairs. The vector pairs and associated probabilities may then be used to calculate a conversion function, for example, by computing GMM training parameters from the joint vectors and alignment probabilities to create a voice conversion function for converting speech sounds from a source speaker to a target speaker.
    Type: Application
    Filed: April 26, 2006
    Publication date: November 1, 2007
    Applicant: NOKIA CORPORATION
    Inventors: Jilei Tian, Jani Nurminen, Victor Popa
  • Publication number: 20070245375
    Abstract: A method of providing content-dependent media content mixing includes automatically determining an emotional property of a first media content input, determining a specification for a second media content in response to the determined emotional property, and producing the second media content in accordance with the specification.
    Type: Application
    Filed: March 21, 2006
    Publication date: October 18, 2007
    Inventors: Jilei Tian, Jani Nurminen
  • Publication number: 20070239634
    Abstract: An apparatus for providing efficient evaluation of feature transformation includes a training module and a transformation module. The training module is configured to train a Gaussian mixture model (GMM) using training source data and training target data. The transformation module is in communication with the training module. The transformation module is configured to produce a conversion function in response to the training of the GMM. The training module is further configured to determine a quality of the conversion function prior to use of the conversion function by calculating a trace measurement of the GMM.
    Type: Application
    Filed: April 7, 2006
    Publication date: October 11, 2007
    Inventors: Jilei Tian, Jani Nurminen, Victor Popa
  • Publication number: 20070233625
    Abstract: An apparatus for providing data clustering and mode selection includes a training element and a transformation element. The training element is configured to receive a first training data set, a second training data set and auxiliary data extracted from the same material as the first training data set. The training element is also configured to train a classifier to group the first training data set into M clusters based on the auxiliary data and the first training data set and train M processing schemes corresponding to the M clusters for transforming the first training data set into the second training data set. The transformation element is in communication with the training element and is configured to cluster the second training data set into M clusters based on features associated with the second training data set.
    Type: Application
    Filed: April 3, 2006
    Publication date: October 4, 2007
    Inventors: Jilei Tian, Jani Nurminen, Victor Popa
  • Publication number: 20070094019
    Abstract: For an enhanced sequential compression of data vectors in a respective compression pass, a current data vector is mapped to at least one current code vector of at least one codebook in at least one quantization stage. The at least one codebook is reordered taking account of at least one intermediate result from the current compression pass and at least one intermediate result from a preceding compression pass. At least one codebook index that is associated in the at least one reordered codebook to the at least one current code vector is then provided for further use. For a decompression of compressed data vectors represented by such codebook indices, at least one codebook index is mapped to at least one code vector of at least one equally reordered codebook.
    Type: Application
    Filed: October 21, 2005
    Publication date: April 26, 2007
    Inventor: Jani Nurminen
  • Publication number: 20070016421
    Abstract: This invention relates to a method, a device and a software application product for correcting a pronunciation of a speech object. The speech object is synthetically generated from a text object in dependence on a segmented representation of the text object. It is determined if an initial pronunciation of the speech object, which initial pronunciation is associated with an initial segmented representation of the text object, is incorrect. Furthermore, in case it is determined that the initial pronunciation of the speech object is incorrect, a new segmented representation of the text object is determined, which new segmented representation of the text object is associated with a new pronunciation of the speech object.
    Type: Application
    Filed: July 12, 2005
    Publication date: January 18, 2007
    Inventors: Jani Nurminen, Hannu Mikkola, Jilei Tian
  • Publication number: 20070011009
    Abstract: The invention relates to support for concatenative TTS synthesis. To generate a speech database as the basis for the TTS synthesis, speech data are first processed with a segmental parametric speech encoding based on a parametric model of speech, which yields compressed parameterized speech segments. The compressed parameterized speech segments are then assembled in a speech database. To synthesize output speech, compressed parameterized speech segments are selected from the speech database based on an available text and decompressed to regain parameterized speech segments. The parameterized speech segments are then concatenated in the parameter domain, and the output speech is synthesized from these concatenated parameterized speech segments.
    Type: Application
    Filed: July 8, 2005
    Publication date: January 11, 2007
    Inventors: Jani Nurminen, Sakari Himanen, Anssi Ramo, Janne Vainio
  • Publication number: 20060235685
    Abstract: This invention relates to a framework for converting a source speech signal associated with a source voice into a target speech signal that is a representation of the source speech signal associated with a target voice. The source speech signal is encoded into samples of encoding parameters, where the encoding comprises segmenting the source speech signal into segments based on characteristics of the source speech signal. The samples of the encoding parameters, or a converted representation of them, are then decoded to obtain the target speech signal. In the encoding, in the decoding, or in a separate step, samples of parameters related to the source speech signal are converted into samples of parameters related to the target speech signal. At least one of the encoding and the converting depends on the segments of the source speech signal.
    Type: Application
    Filed: April 15, 2005
    Publication date: October 19, 2006
    Inventors: Jani Nurminen, Jilei Tian, Imre Kiss
  • Publication number: 20060229877
    Abstract: In a concatenative text-to-speech system, a high compression rate of the duration data in the prosodic template is achieved by extracting statistical parameters describing the behavior of the actual duration values of the instances of each given syllable, phoneme, half-phoneme, diphone, triphone or any other basic speech unit employed, and storing only the extracted statistical parameters instead of the original duration values. The entries of each given basic unit in the prosodic template are sorted and indexed in order of increasing duration value. Consequently, the amount of duration data can be significantly reduced while keeping the error statistically within an acceptable range.
    Type: Application
    Filed: April 6, 2005
    Publication date: October 12, 2006
    Inventors: Jilei Tian, Jani Nurminen
  • Publication number: 20060167680
    Abstract: A system and method of extracting information from a lexicon and using the information with a computer software program. Lexicon data is arranged for a particular language using Unicode values or other uniquely defined code values for each character of each word of the language. A location array is then created for the lexicon data arranged by Unicode value or other uniquely defined code value. Upon a request to search for a word, words that have the same initial character as the searched-for word are identified using the location array. The identified words are then searched for a word that matches the searched-for word. As a result, the amount of data loaded into run-time memory is minimized, and searches for a given word are completed more quickly than in conventional systems.
    Type: Application
    Filed: January 25, 2005
    Publication date: July 27, 2006
    Inventors: Jilei Tian, Jani Nurminen
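The pitch-contour abstract (publication 20080275695, granted as patent 8380496) describes replacing per-frame pitch values with linear segments whose end points alone are sent to the decoder, with each segment limited by a maximum length and a maximum deviation from the original pitch values. The following is a minimal Python sketch of that idea, assuming a greedy left-to-right segmentation; the function names, the greedy strategy and the parameters `max_dev` and `max_len` are illustrative assumptions, not the patented method.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (frame index, pitch value) of a segment end point


def _segment_fits(pitch: List[float], start: int, end: int, max_dev: float) -> bool:
    """True if the straight line between pitch[start] and pitch[end] stays
    within max_dev of every pitch value in between."""
    p0, p1 = pitch[start], pitch[end]
    span = end - start
    for i in range(start + 1, end):
        interp = p0 + (p1 - p0) * (i - start) / span
        if abs(interp - pitch[i]) > max_dev:
            return False
    return True


def segment_pitch_contour(pitch: List[float], max_dev: float, max_len: int) -> List[Point]:
    """Hypothetical encoder side: greedily approximate the contour with linear
    segments and return only their end points. Requires max_len >= 1."""
    if not pitch:
        return []
    points: List[Point] = [(0, pitch[0])]
    start = 0
    while start < len(pitch) - 1:
        best = start + 1  # a two-point segment always fits
        end = start + 2
        # Extend the segment while it respects both the length and deviation limits.
        while end < len(pitch) and end - start <= max_len and _segment_fits(pitch, start, end, max_dev):
            best = end
            end += 1
        points.append((best, pitch[best]))
        start = best
    return points


def reconstruct_contour(points: List[Point], n: int) -> List[float]:
    """Hypothetical decoder side: rebuild n pitch values by linear interpolation."""
    out = [points[0][1]] * n
    for (i0, v0), (i1, v1) in zip(points, points[1:]):
        for i in range(i0, i1 + 1):
            out[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return out
```

Tightening `max_dev` or shortening `max_len` produces more end points and a smaller reconstruction error; loosening them compresses the contour further.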
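The lexicon abstract (publication 20060167680) arranges words by Unicode value and uses a location array to restrict a lookup to the words that share the searched-for word's initial character. The following is a minimal Python sketch of that lookup; it assumes a dictionary keyed by code point stands in for the location array, and it uses a binary search within the selected range, which is an illustrative choice rather than something the abstract specifies. The class and method names are hypothetical.

```python
import bisect
from typing import Dict, Iterable, Tuple


class Lexicon:
    """Word list sorted by Unicode code point, plus a location table that maps
    each initial character's code point to the range of words starting with it."""

    def __init__(self, words: Iterable[str]) -> None:
        # Python compares str by code point, so sorting groups words by first character.
        self.words = sorted({w for w in words if w})
        self.location: Dict[int, Tuple[int, int]] = {}
        for idx, word in enumerate(self.words):
            key = ord(word[0])
            start = self.location.get(key, (idx, idx))[0]
            self.location[key] = (start, idx + 1)  # half-open range [start, end)

    def contains(self, word: str) -> bool:
        if not word:
            return False
        rng = self.location.get(ord(word[0]))
        if rng is None:
            return False  # no entry shares the first character
        lo, hi = rng
        # Search only the words that share the first character.
        i = bisect.bisect_left(self.words, word, lo, hi)
        return i < hi and self.words[i] == word
```

For example, `Lexicon(["kissa", "koira", "auto", "äiti"]).contains("koira")` only inspects the two entries beginning with "k".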
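The duration-compression abstract (publication 20060229877) stores only statistical parameters per basic speech unit and keeps each unit's entries sorted by increasing duration, so an entry's position implies its rank. The following is a minimal Python sketch of one way to recover a duration from that rank, assuming the durations of each unit follow a normal distribution; the Gaussian model, the quantile formula and the function names are illustrative assumptions, not the patented method.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Dict, List, Tuple

Params = Tuple[float, float, int]  # (mean, standard deviation, instance count)


def build_duration_template(instances: Dict[str, List[float]]) -> Dict[str, Params]:
    """Keep only summary statistics per basic unit instead of every duration.
    Assumes a unit's instances are stored sorted by increasing duration, so an
    instance's index within its unit doubles as its rank."""
    return {unit: (fmean(d), pstdev(d), len(d)) for unit, d in instances.items()}


def estimate_duration(params: Params, rank: int) -> float:
    """Approximate the duration of the instance at 0-based `rank` by the
    matching quantile of a normal distribution (an assumed model)."""
    mu, sigma, n = params
    if sigma == 0.0:
        return mu  # all instances had the same duration
    return NormalDist(mu, sigma).inv_cdf((rank + 0.5) / n)
```

Three numbers per unit replace the full list of durations; the cost is an approximation error that grows where the real distribution departs from the assumed one, typically at the extreme ranks.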
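The codebook-based voice conversion abstract (publication 20080147385) takes the N best candidates from a tree search in a multi-stage vector quantizer and then uses dynamic programming to pick an output sequence that is both smooth and accurate. The following minimal Python sketch covers only that final step, as a Viterbi-style search; it assumes the N-best candidates and a per-candidate cost are already available, and the cost definitions and the `weight` parameter are illustrative assumptions, not the patented formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def smooth_nbest(candidates: List[List[Vector]], costs: List[List[float]], weight: float) -> List[int]:
    """Viterbi-style search over per-frame N-best candidates.

    candidates[t][j] is the j-th candidate output vector for frame t and
    costs[t][j] its (hypothetical) quantization cost. The transition cost is
    `weight` times the Euclidean distance between consecutive outputs, so a
    larger weight favors smoother sequences. Returns the chosen index per frame."""
    if not candidates:
        return []
    acc = list(costs[0])  # best accumulated cost ending in each candidate
    back: List[List[int]] = []  # back-pointers for frames 1..T-1
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_acc, ptr = [], []
        for j, cand in enumerate(candidates[t]):
            trans = [acc[k] + weight * math.dist(prev[k], cand) for k in range(len(prev))]
            k_best = min(range(len(prev)), key=trans.__getitem__)
            new_acc.append(trans[k_best] + costs[t][j])
            ptr.append(k_best)
        acc = new_acc
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(acc)), key=acc.__getitem__)
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return path[::-1]
```

With `weight` set to 0 the search simply picks the cheapest candidate in each frame; raising it suppresses isolated jumps between distant candidates.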