Abstract: Sample data and metadata related to spatial regions in images may be received from a coded video signal. It is determined whether specific spatial regions in the images correspond to a specific region of luminance levels. In response to determining the specific spatial regions correspond to the specific region of luminance levels, signal processing and video compression operations are performed on sets of samples in the specific spatial regions. The signal processing and video compression operations are at least partially dependent on the specific region of luminance levels.
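A minimal sketch of the idea in the abstract above: making a compression parameter depend on which region of luminance levels a set of samples falls into. The thresholds and step values are invented placeholders, not taken from the patent:

```python
def region_quant_step(samples, dark=0.1, bright=0.7):
    """Pick a quantization step for a spatial region depending on which
    luminance-level region its mean falls into (hypothetical values:
    finer steps in the darks, where coding artifacts are most visible)."""
    mean = sum(samples) / len(samples)
    if mean < dark:
        return 1   # dark region: finest quantization
    if mean > bright:
        return 4   # highlight region: coarser quantization acceptable
    return 2       # midtones
```

The point is only that the downstream operation (here a quantization step) is selected per region of luminance levels rather than globally.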
Abstract: Input audio data, including first microphone audio signals and second microphone audio signals output by a pair of coincident, vertically-stacked directional microphones, may be received. An azimuthal angle corresponding to a sound source location may be determined, based at least in part on an intensity difference between the first microphone audio signals and the second microphone audio signals. An elevation angle corresponding to the sound source location may be determined, based at least in part on a temporal difference between the first microphone audio signals and the second microphone audio signals. Output audio data, including at least one audio object corresponding to a sound source, may be generated. The audio object may include audio object signals and associated audio object metadata. The audio object metadata may include at least audio object location data corresponding to the sound source location.
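The two estimates in the abstract above can be sketched with textbook formulas: elevation from the cross-correlation lag between the stacked capsules (sin(elevation) = c·τ/d for a far-field source), and azimuth from the level ratio of the coincident directional capsules. The capsule spacing and the ratio-to-angle mapping are assumptions for illustration, not the patented method:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
MIC_SPACING = 0.02      # hypothetical vertical capsule spacing in metres

def best_lag(a, b, max_lag):
    """Lag (in samples) at which a best matches b: a[n] ~ b[n - lag]."""
    def corr(lag):
        return sum(a[n] * b[n - lag] for n in range(max_lag, len(a) - max_lag))
    return max(range(-max_lag, max_lag + 1), key=corr)

def estimate_elevation(top, bottom, sample_rate, max_lag=8):
    """Elevation from the inter-capsule arrival-time difference."""
    tau = best_lag(top, bottom, max_lag) / sample_rate
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / MIC_SPACING))
    return math.degrees(math.asin(s))

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def estimate_azimuth(a, b):
    """Map the intensity (level) difference of two coincident directional
    capsules to an angle; equal levels map to 0 degrees (on-axis)."""
    return math.degrees(math.atan2(rms(b), rms(a))) - 45.0
```

A source on-axis between the capsules gives equal levels, hence 0° azimuth; a delay of the top signal relative to the bottom one yields a positive elevation under this sign convention.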
Abstract: The present document describes a method (700) for encoding a multi-channel input signal (201). The method (700) comprises determining (701) a plurality of downmix channel signals (203) from the multi-channel input signal (201) and performing (702) energy compaction of the plurality of downmix channel signals (203) to provide a plurality of compacted channel signals (404). Furthermore, the method (700) comprises determining (703) joint coding metadata (205) based on the plurality of compacted channel signals (404) and based on the multi-channel input signal (201), wherein the joint coding metadata (205) is such that it allows upmixing of the plurality of compacted channel signals (404) to an approximation of the multi-channel input signal (201). In addition, the method (700) comprises encoding (704) the plurality of compacted channel signals (404) and the joint coding metadata (205).
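For the two-channel case, "energy compaction" of the downmix channels can be illustrated with the classic 2×2 Karhunen-Loève rotation: the angle that decorrelates the pair concentrates energy in the first output channel, and that angle is exactly the kind of parameter that could travel as joint coding metadata, since the inverse rotation upmixes back to the original pair. This closed-form sketch is an illustration under those assumptions, not the method of the patent:

```python
import math

def compaction_angle(ch0, ch1):
    """Rotation angle that diagonalizes the 2x2 channel covariance,
    putting the larger eigenvalue (most energy) in the first channel."""
    c00 = sum(x * x for x in ch0)
    c11 = sum(x * x for x in ch1)
    c01 = sum(x * y for x, y in zip(ch0, ch1))
    return 0.5 * math.atan2(2.0 * c01, c00 - c11)

def rotate(ch0, ch1, theta):
    """Apply the rotation; rotate(..., -theta) is the exact upmix."""
    c, s = math.cos(theta), math.sin(theta)
    out0 = [c * x + s * y for x, y in zip(ch0, ch1)]
    out1 = [-s * x + c * y for x, y in zip(ch0, ch1)]
    return out0, out1
```

When the second channel is a scaled copy of the first, the rotated second channel collapses to (numerically) zero: all energy is compacted into one signal, and the angle alone suffices to reconstruct the input.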
July 2, 2019
June 3, 2021
DOLBY INTERNATIONAL AB, DOLBY LABORATORIES LICENSING CORPORATION
David S. MCGRATH, Michael ECKERT, Heiko PURNHAGEN, Stefan BRUHN
Abstract: In a method to improve backwards compatibility when decoding high-dynamic range images coded in a wide color gamut (WCG) space which may not be compatible with legacy color spaces, hue and/or saturation values of images in an image database are computed for both a legacy color space (say, YCbCr-gamma) and a preferred WCG color space (say, IPT-PQ). Based on a cost function, a reshaped color space is computed so that the distance between the hue values in the legacy color space and rotated hue values in the preferred color space is minimized. HDR images are coded in the reshaped color space. Legacy devices can still decode standard dynamic range images assuming they are coded in the legacy color space, while updated devices can use color reshaping information to decode HDR images in the preferred color space at full dynamic range.
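One heavily reduced reading of the cost-function step above: if the reshaping is modeled as a single hue rotation, the least-squares angle is simply the wrapped mean hue difference over the database. The scalar model is an assumption for illustration only:

```python
def best_hue_rotation(legacy_hues, preferred_hues):
    """Least-squares rotation (degrees) minimizing the distance between
    legacy-space hues and rotated preferred-space hues.  Each difference
    is wrapped into [-180, 180) before averaging."""
    diffs = [((l - p + 180.0) % 360.0) - 180.0
             for l, p in zip(legacy_hues, preferred_hues)]
    return sum(diffs) / len(diffs)
```

The wrapping matters near the 0°/360° seam: a legacy hue of 359° against a preferred hue of 1° should contribute a difference of -2°, not +358°.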
Abstract: A method of processing a sequence of video frames from a camera capturing a writing surface for subsequent transmission to at least one of a remote videoconferencing client and a remote videoconferencing server. The method comprises receiving the sequence of video frames from the camera; and selecting an image area of interest in the video frames, comprising selecting one of a sub-area of the video frames and an entire area of the video frames. The method also comprises, for each current video frame of the sequence of video frames, generating a pen stroke mask by applying adaptive thresholding to the image area of interest. The method also comprises generating an output video frame using the pen stroke mask. Corresponding systems and computer readable media are disclosed.
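The adaptive-thresholding step can be sketched with the standard local-mean rule: a pixel is marked as a pen stroke when it is darker than the mean of its neighbourhood by more than a fixed offset. Window size and offset here are hypothetical parameters, and the image is a plain list-of-lists grayscale array:

```python
def pen_stroke_mask(gray, win=7, offset=10):
    """Adaptive threshold against the local mean of a win x win window,
    clamped at image borders: 1 where a pixel is at least `offset` darker
    than its surroundings (dark ink on a light writing surface)."""
    h, w = len(gray), len(gray[0])
    r = win // 2
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            mask[y][x] = 1 if gray[y][x] < mean - offset else 0
    return mask
```

Unlike a global threshold, this keeps working under uneven lighting across the writing surface, since each pixel is compared only against its own neighbourhood.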
Abstract: Overlapped block disparity estimation and compensation is described. Compensating for images with overlapped block disparity compensation (OBDC) involves determining if OBDC is enabled in a video bit stream, and determining if OBDC is enabled for one or more macroblocks that neighbor a first macroblock within the video bit stream. The neighboring macroblocks may be transform coded. If OBDC is enabled in the video bit stream and for the one or more neighboring macroblocks, predictions may be made for a region of the first macroblock that has an edge adjacent with the neighboring macroblocks. OBDC can be causally applied. Disparity compensation parameters or modes may be shared amongst views or layers. A variety of predictions may be used with causally-applied OBDC.
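The core blending idea of overlapped block compensation can be shown on a single row of a block: near the shared edge, the prediction made with the neighbouring block's disparity parameters is mixed with the block's own prediction, with a weight that decays away from the edge. The linear weight ramp and overlap width are illustrative choices, not the codec's actual weights:

```python
def obdc_blend_row(own, neighbor, width=4):
    """Blend `own` prediction samples with `neighbor`-parameter prediction
    samples over `width` columns adjacent to the shared (left) edge."""
    out = list(own)
    for c in range(width):
        w = (width - c) / (width + 1.0)   # neighbor weight decays from the edge
        out[c] = w * neighbor[c] + (1.0 - w) * own[c]
    return out
```

Columns beyond the overlap region keep the block's own prediction untouched, so the smoothing is confined to the region adjacent to the neighbouring macroblocks, as the abstract describes.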
Abstract: A method for creating an output soundfield signal from an input soundfield signal, the method including the steps of: (a) forming at least one delayed signal from the input soundfield signal, (b) for each of the delayed signals, creating an acoustically transformed delayed signal, by an acoustic transformation process, and (c) combining together the acoustically transformed delayed signals and the input soundfield signal to produce the output soundfield signal.
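Steps (a)-(c) above can be sketched as a delay-transform-sum chain. Here the "acoustic transformation process" is reduced to a simple gain per delayed copy, which is an assumption purely to keep the sketch short; the signal is a mono sample list standing in for a soundfield signal:

```python
def delay_transform_combine(signal, delays, gains):
    """(a) form delayed copies of the input, (b) transform each (here: a
    gain), (c) sum the transformed copies with the input itself."""
    out = list(signal)                       # the input soundfield signal
    for d, g in zip(delays, gains):
        for n in range(d, len(signal)):
            out[n] += g * signal[n - d]      # transformed delayed signal
    return out
```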
Abstract: Methods for encoding and decoding high-dynamic range signals are presented. The signals are encoded at a high frame rate and are accompanied by frame-rate conversion metadata defining a preferred set of frame-rate down-conversion parameters, which are determined according to the maximum luminance of a target display, display playback priority modes, or judder control modes. A decoder uses the frame-rate conversion metadata to apply frame-rate down-conversion to the input high-frame-rate signal according to at least the maximum luminance of the target display and/or the characteristics of the signal itself. Frame-based and pixel-based frame-rate conversions, and judder models for judder control via metadata are also discussed.
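The decoder-side down-conversion can be sketched with the two simplest frame-based strategies a metadata flag might select between: dropping frames or blending them. Frames are reduced to scalar "pixel" values here, and the mode names are placeholders rather than the signalled metadata fields:

```python
def downconvert(frames, factor, mode="blend"):
    """Frame-rate down-conversion by an integer factor: keep the first
    frame of each group ("drop") or average the group ("blend"), a choice
    the frame-rate conversion metadata could express."""
    out = []
    for i in range(0, len(frames) - factor + 1, factor):
        group = frames[i:i + factor]
        out.append(group[0] if mode == "drop" else sum(group) / len(group))
    return out
```

Dropping preserves sharpness but can increase judder; blending trades sharpness for smoother motion, which is why the preferred choice can depend on display luminance and judder-control modes.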
Abstract: Example embodiments disclosed herein relate to audio signal processing. A method of indicating a presence of a nuisance in an audio signal is disclosed. The method includes: determining a probability of the presence of the nuisance in a frame of the audio signal based on a feature of the audio signal, the nuisance representing an unwanted sound made by a user; in response to the probability of the presence of the nuisance exceeding a threshold, tracking the audio signal based on a metric over a plurality of frames following the frame; determining, based on the tracking, that the presence of the nuisance is to be indicated to the user; and, in response to the determination, presenting to the user a notification of the presence of the nuisance. A corresponding system and computer program product are also disclosed.
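A toy version of that decision flow, with the probability threshold, tracking-window length, and metric threshold all invented placeholders:

```python
def nuisance_indicated(probs, metric, p_thresh=0.8, track_frames=5, m_thresh=3.0):
    """Once the per-frame nuisance probability exceeds p_thresh, track a
    metric over the next track_frames frames; indicate the nuisance only
    if the accumulated metric crosses m_thresh."""
    for i, p in enumerate(probs):
        if p > p_thresh:
            window = metric[i + 1 : i + 1 + track_frames]
            if sum(window) > m_thresh:
                return True
    return False
```

The two-stage structure (probability trigger, then metric-based tracking) avoids notifying the user on a single spurious frame.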
Abstract: Media input audio data corresponding to a media stream and microphone input audio data from at least one microphone may be received. A first level of at least one of a plurality of frequency bands of the media input audio data, as well as a second level of at least one of a plurality of frequency bands of the microphone input audio data, may be determined. Media output audio data and microphone output audio data may be produced by adjusting levels of one or more of the first and second pluralities of frequency bands based on the perceived loudness of the microphone input audio data, the microphone output audio data, the media output audio data and the media input audio data. One or more processes may be modified upon receipt of a mode-switching indication.
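A single-band sketch of the kind of level adjustment described above: when the microphone band level indicates audible external sound, the corresponding media band is attenuated so the external sound stays perceivable. The threshold and reduction amounts are purely illustrative:

```python
def adjust_media_band(media_db, mic_db, mic_thresh_db=-40.0, reduction_db=10.0):
    """Per-band rule: attenuate the media level in a band when the
    microphone level in that band exceeds a (hypothetical) threshold."""
    return media_db - reduction_db if mic_db > mic_thresh_db else media_db
```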
Abstract: Multiple virtual source locations may be defined for a volume within which audio objects can move. A set-up process for rendering audio data may involve receiving reproduction speaker location data and pre-computing gain values for each of the virtual sources according to the reproduction speaker location data and each virtual source location. The gain values may be stored and used during “run time,” during which audio reproduction data are rendered for the speakers of the reproduction environment. During run time, for each audio object, contributions from virtual source locations within an area or volume defined by the audio object position data and the audio object size data may be computed. A set of gain values for each output channel of the reproduction environment may be computed based, at least in part, on the computed contributions. Each output channel may correspond to at least one reproduction speaker of the reproduction environment.
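The set-up/run-time split above can be sketched in one dimension with two speakers: gains are precomputed per virtual source at set-up, and at run time only the virtual sources inside the object's extent contribute. The linear panning law, the 1-D layout, and the power-normalization are assumptions made to keep the sketch small:

```python
import math

SPEAKERS = [0.0, 1.0]                       # hypothetical speaker positions
VIRTUAL_SOURCES = [i / 10.0 for i in range(11)]

def precompute_gains():
    """Set-up stage: one gain per (virtual source, speaker), here from a
    simple linear distance-based panning law."""
    return [[max(0.0, 1.0 - abs(v - s)) for s in SPEAKERS]
            for v in VIRTUAL_SOURCES]

def render_gains(table, obj_pos, obj_size):
    """Run time: accumulate (power-domain) contributions of virtual
    sources within the object extent, then normalize per output channel."""
    acc = [0.0 for _ in SPEAKERS]
    n = 0
    for v, gains in zip(VIRTUAL_SOURCES, table):
        if abs(v - obj_pos) <= obj_size:
            n += 1
            for i, g in enumerate(gains):
                acc[i] += g * g
    if n == 0:
        return [0.0 for _ in SPEAKERS]
    return [math.sqrt(a / n) for a in acc]
```

The expensive part (the gain table) depends only on speaker layout and virtual-source positions, so it is computed once; per-object work at run time is just a weighted sum over the covered virtual sources.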
May 7, 2020
Date of Patent:
May 25, 2021
Dolby Laboratories Licensing Corporation, Dolby International AB
Antonio Mateos Sole, Nicolas R. Tsingos
Abstract: Methods and systems for mapping images from a first dynamic range to a second dynamic range using a set of reference color-graded images and neural networks are described. Given a first and a second image representing the same scene but at a different dynamic range, a neural network (NN) model is selected from a variety of NN models to determine an output image which approximates the second image based on the first image and the second image. The parameters of the selected NN model are derived according to an optimizing criterion, the first image and the second image, wherein the parameters include node weights and/or node biases for nodes in the layers of the selected NN model. Example HDR to SDR mappings using global-mapping and local-mapping representations are provided.
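The model-selection step above can be reduced to its skeleton: among candidate mappings, pick the one minimizing an optimizing criterion over image pairs. Here the "NN models" are stand-in functions and the criterion is mean squared error, both assumptions for illustration:

```python
def select_model(pairs, models):
    """Choose, from candidate mapping functions, the one with the lowest
    mean squared error over (input, reference-output) sample pairs."""
    def mse(f):
        return sum((f(x) - y) ** 2 for x, y in pairs) / len(pairs)
    return min(models, key=mse)
```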
Abstract: Smaller halftone tiles are implemented on a first modulator of a dual modulation projection system. This technique uses multiple halftones per frame in the pre-modulator, synchronized with a modified bit sequence in the primary modulator, to effectively increase the number of levels provided by a given tile size in the halftone modulator. It addresses the issue of reduced contrast ratio at low light levels for small tile sizes, allows the use of smaller PSFs that reduce halo artifacts in the projected image, and may be utilized in 3D projection and viewing.
Abstract: Embodiments are described for a high-frequency waveguide that improves the performance of large-scale surround sound and immersive audio environments. A horn waveguide is configured to be asymmetric about one of a vertical axis and horizontal axis of the waveguide to form an asymmetric horn waveguide. A spherical enclosure surrounds the asymmetric horn waveguide to form a horn speaker, and a three-axis mounting system is configured to fix the horn speaker to one of a wall or ceiling surface of the venue, wherein the mounting system facilitates rotating the horn speaker to a location that provides maximum coverage of the venue within the passband of the asymmetric horn waveguide.
Abstract: A standard dynamic range (SDR) image is received. Composer metadata is generated for mapping the SDR image to an enhanced dynamic range (EDR) image. The composer metadata specifies a backward reshaping mapping that is generated from SDR-EDR image pairs in a training database. The SDR-EDR image pairs comprise SDR images that do not include the SDR image and EDR images that correspond to the SDR images. The SDR image and the composer metadata are encoded in an output SDR video signal. An EDR display operating with a receiver of the output SDR video signal is caused to render an EDR display image. The EDR display image is derived from a composed EDR image composed from the SDR image based on the composer metadata.
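A toy version of building backward-reshaping metadata from training pairs: a lookup table that stores, per SDR bin, the average EDR value observed over the training set, and a composer that applies it. Binning scheme and sample values are placeholders:

```python
def build_composer_lut(pairs, levels=4):
    """Backward-reshaping sketch: average EDR value per SDR bin, derived
    from a training set of (sdr, edr) sample pairs in [0, 1]."""
    sums = [0.0] * levels
    counts = [0] * levels
    for s, e in pairs:
        b = min(levels - 1, int(s * levels))
        sums[b] += e
        counts[b] += 1
    return [sums[b] / counts[b] if counts[b] else 0.0 for b in range(levels)]

def compose_edr(sdr, lut):
    """Receiver side: compose an EDR value from an SDR value and the LUT."""
    return lut[min(len(lut) - 1, int(sdr * len(lut)))]
```

Consistent with the abstract, the mapping is derived entirely from training pairs that need not contain the image being composed.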
Abstract: Systems and methods to transmit data over multiple communication channels in parallel with forward error correction. An optimized number of packets is determined for partitioning a data segment of a given size into that number of equal-size original packets, by reducing the cost of transmitting dummy data added to the original packets due to the partition, of the data fields added to communication packets to support decoding, and of redundant packets that are expected to be transmitted via a plurality of parallel channels before the termination of the transmission, as well as the computation cost that increases as a function of the number of original packets. Copies of packets are generated by distributing the original packets to the copies as initial packets and generating each subsequent channel-encoded packet by rejecting useless channel-encoded packets in view of packets assumed to have been received prior to the transmission of the subsequent channel-encoded packet.
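The trade-off in the abstract above (headers and computation grow with the packet count, while the byte cost of each redundant packet shrinks) can be shown with an invented cost model; every constant here is a placeholder, not a value from the patent:

```python
import math

def cost(segment_size, n, header=16, loss=0.1, comp=0.5):
    """Hypothetical cost of splitting a segment into n equal packets:
    per-packet headers, dummy padding from the split, expected redundant
    packets (each costing a full packet), and a computation term in n."""
    payload = math.ceil(segment_size / n)
    padding = payload * n - segment_size           # dummy bytes added
    redundant = math.ceil(loss * n)                # expected extra FEC packets
    return n * header + padding + redundant * (payload + header) + comp * n

def optimal_packet_count(segment_size, n_max=64):
    """Pick the packet count minimizing the model cost."""
    return min(range(1, n_max + 1), key=lambda n: cost(segment_size, n))
```

With these numbers the optimum lies strictly between the extremes: one huge packet makes each redundant packet expensive, while many tiny packets drown in headers.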
Abstract: An existing metadata set that is specific to a color volume transformation model is transformed to a metadata set that is specific to a distinctly different color volume transformation model. For example, source content metadata for a first color volume transformation model is received. This source metadata determines a specific color volume transformation, such as a sigmoidal tone map curve. The specific color volume transformation is mapped to a color volume transformation of a second color volume transformation model, e.g., a Bézier tone map curve. Mapping can be a best fit curve, or a reasonable approximation. Mapping results in metadata values used for the second color volume transformation model (e.g., one or more Bézier curve knee points and anchors). Thus, devices configured for the second color volume transformation model can reasonably render source content according to received source content metadata of the first color volume transformation model.
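Mapping one tone-map model onto another, as described above, can be sketched as a least-squares fit: sample the source model's curve, then solve for the parameter of the target model that best approximates it. Here the source is a sigmoidal curve and the target a quadratic Bézier with its endpoints pinned to the curve ends, so only the middle control point is fitted; both models are simplified stand-ins:

```python
import math

def sigmoid_tone(x, gain=8.0):
    """Source model: a sigmoidal tone curve on [0, 1]."""
    return 1.0 / (1.0 + math.exp(-gain * (x - 0.5)))

def fit_bezier_control(samples):
    """Least-squares middle control point of a quadratic Bezier fitted to
    (t, y) samples, endpoints fixed at the first and last sample."""
    t0, y0 = samples[0]
    t2, y2 = samples[-1]
    num = den = 0.0
    for t, y in samples:
        u = (t - t0) / (t2 - t0)
        w = 2.0 * u * (1.0 - u)                     # basis of the control point
        r = y - ((1.0 - u) ** 2 * y0 + u ** 2 * y2) # residual vs endpoints
        num += w * r
        den += w * w
    return num / den if den else 0.5 * (y0 + y2)

def bezier(u, y0, p1, y2):
    return (1 - u) ** 2 * y0 + 2 * u * (1 - u) * p1 + u ** 2 * y2
```

The fitted control point is the "metadata value for the second model": a receiver that only understands Bézier curves can reproduce (approximately) the tone mapping the sigmoidal metadata described.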
Abstract: An optical filter to increase contrast of an image generated with a spatial light modulator includes a lens for spatially Fourier transforming modulated light from the spatial light modulator, and an optical filter mask positioned at a Fourier plane of the lens to filter the modulated light. The modulated light has a plurality of diffraction orders, and the optical filter mask transmits at least one of the diffraction orders of the modulated light and blocks a remaining portion of the modulated light. A method that improves contrast of an image generated with a spatial light modulator includes spatially Fourier transforming modulated light from the spatial light modulator onto a Fourier plane, and filtering the modulated light by transmitting at least one diffraction order of the modulated light at the Fourier plane and blocking a remaining portion of the modulated light at the Fourier plane.
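Numerically, the lens-mask-lens chain above behaves like transform, zero out unwanted bins, inverse transform. This 1-D DFT stand-in (a discrete model, not the optics) shows a binary grating whose energy sits in two diffraction orders; masking all but the zeroth order leaves only the DC field:

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def fourier_plane_filter(field, keep_orders):
    """Lens = forward transform; mask = pass only the listed diffraction
    orders (negative orders allowed); propagation back = inverse."""
    X = dft(field)
    n = len(X)
    kept = [X[k] if (k in keep_orders or k - n in keep_orders) else 0
            for k in range(n)]
    return idft(kept)
```

Passing every populated order reconstructs the input field exactly, which is a useful sanity check on the transform pair.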
Abstract: There are two representations for Higher Order Ambisonics denoted HOA: spatial domain and coefficient domain. The invention generates from a coefficient domain representation a mixed spatial/coefficient domain representation, wherein the number of said HOA signals can be variable. An aspect of the invention further relates to methods and apparatus decoding multiplexed and perceptually encoded HOA signals, including transforming a vector of PCM encoded spatial domain signals of the HOA representation to a corresponding vector of coefficient domain signals by multiplying the vector of PCM encoded spatial domain signals with a transform matrix, and de-normalizing the vector of PCM encoded and normalized coefficient domain signals.
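The transform step described above is, mechanically, a matrix-vector multiply followed by per-coefficient scaling. The matrix and the de-normalization factors below are placeholders, not the actual HOA transform or the standardized normalization:

```python
def mat_vec(matrix, vec):
    """One coefficient-domain sample vector from one spatial-domain
    sample vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def spatial_to_coefficient(spatial, transform, denorm):
    """Multiply the vector of spatial-domain PCM samples with the
    transform matrix, then de-normalize each coefficient signal by its
    scale factor."""
    return [c * f for c, f in zip(mat_vec(transform, spatial), denorm)]
```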
Lucas E. Saule, Vincent Voron, Peter Michaelian, Guangyu Jin, Kevin J. Kilpatrick, Branko Lukic, Steven Ryutaro Takayama, Grayson H. Byrd, Adam Scott Koniak, Ariel Lauren Fischer, Robert Edward Borchers