Abstract: Techniques are provided to encode and decode image data comprising a tone mapped (TM) image with HDR reconstruction data in the form of luminance ratios and color residual values. In an example embodiment, luminance ratio values and residual values in color channels of a color space are generated on an individual pixel basis based on a high dynamic range (HDR) image and a derivative tone-mapped (TM) image that comprises one or more color alterations that would not be recoverable from the TM image with a luminance ratio image. The TM image with HDR reconstruction data derived from the luminance ratio values and the color-channel residual values may be outputted in an image file to a downstream device, for example, for decoding, rendering, and/or storing. The image file may be decoded to generate a restored HDR image free of the color alterations.
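The encode/decode round trip described above can be sketched in a few lines. This is an illustrative reconstruction, not the patented method: the luma weighting (Rec. 709 coefficients), the per-channel residual definition, and the function names are all assumptions made for the example.

```python
def hdr_reconstruction_data(hdr_px, tm_px, eps=1e-6):
    """Per-pixel HDR reconstruction data: a luminance ratio plus color residuals.

    Sketch only. Assumes Rec. 709 luma weights; the residuals capture what the
    ratio alone cannot recover, e.g. color alterations in the TM image.
    """
    def luma(px):
        r, g, b = px
        return 0.2126 * r + 0.7152 * g + 0.0722 * b

    ratio = luma(hdr_px) / max(luma(tm_px), eps)
    # Residual per color channel: HDR value minus the ratio-scaled TM value.
    residuals = tuple(h - ratio * t for h, t in zip(hdr_px, tm_px))
    return ratio, residuals

def restore_hdr(tm_px, ratio, residuals):
    # Decoder side: invert the encoding to recover the HDR pixel,
    # free of the color alterations present in the TM image.
    return tuple(ratio * t + r for t, r in zip(tm_px, residuals))
```

With this formulation the restored pixel matches the original HDR pixel exactly, since the residual absorbs any per-channel deviation the scalar ratio cannot represent.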
Abstract: Example embodiments disclosed herein relate to impulsive noise suppression. A method of impulsive noise suppression in an audio signal is disclosed. The method includes determining an impulsive noise related feature from a current frame of the audio signal. The method also includes detecting an impulsive noise in the current frame based on the impulsive noise related feature, and in response to detecting the impulsive noise in the current frame, applying a suppression gain to the current frame to suppress the impulsive noise. A corresponding system and computer program product for impulsive noise suppression in an audio signal are also disclosed.
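The detect-then-attenuate loop above can be illustrated with a minimal frame-wise sketch. The abstract does not specify the feature, the threshold, or the gain value, so the choices below (energy ratio against a tracked background level, a fixed suppression gain) are assumptions for illustration only.

```python
def suppress_impulses(frames, threshold=8.0, gain=0.1, alpha=0.95):
    """Frame-wise impulsive noise suppression sketch.

    Assumed feature: ratio of current-frame energy to a slowly tracked
    background energy. A frame whose ratio exceeds `threshold` is treated
    as impulsive and scaled by the suppression `gain`.
    """
    out, background = [], None
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        if background is None:
            background = energy                    # initialize from first frame
        ratio = energy / max(background, 1e-12)    # impulsive-noise related feature
        if ratio > threshold:                      # impulse detected in this frame
            frame = [x * gain for x in frame]      # apply suppression gain
        else:                                      # track only non-impulsive frames
            background = alpha * background + (1 - alpha) * energy
        out.append(frame)
    return out
```

Updating the background estimate only on non-impulsive frames keeps a single loud transient from inflating the reference level and masking subsequent impulses.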
Abstract: M audio input channels, each associated with a spatial direction, are translated to N audio output channels, each associated with a spatial direction, wherein M and N are positive integers, M is three or more, and N is three or more, by deriving the N audio output channels from the M audio input channels, wherein one or more of the M audio input channels is associated with a spatial direction other than a spatial direction with which any of the N audio output channels is associated, and at least one of the one or more of the M audio input channels is mapped to a respective set of at least three of the N output channels. At least three output channels of a set may be associated with contiguous spatial directions.
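One way to picture the mapping is amplitude panning of an input direction over the nearest output directions. This is a hypothetical sketch: the gain law (inverse angular distance, power-normalized) and the function names are not from the abstract, which only requires that an input direction absent from the output layout feed a set of at least three output channels.

```python
import math

def map_channel(input_angle_deg, output_angles_deg, spread=3):
    """Map one input channel direction to the `spread` nearest output channels.

    Hypothetical gain law: weights fall off with angular distance and are
    normalized to preserve signal power across the set.
    """
    def ang_dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)          # shortest angular distance on the circle

    # The `spread` closest output directions (contiguous for a ring layout).
    nearest = sorted(output_angles_deg,
                     key=lambda a: ang_dist(a, input_angle_deg))[:spread]
    weights = [1.0 / (1.0 + ang_dist(a, input_angle_deg)) for a in nearest]
    norm = math.sqrt(sum(w * w for w in weights))
    return {a: w / norm for a, w in zip(nearest, weights)}
```

For example, an input channel at 110° mapped onto a 7-channel ring lands on the contiguous outputs at 30°, 90°, and 150°, with gains summing to unit power.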
Abstract: Systems and methods are described for automatically framing participants in a video conference using a single camera of a video conferencing system. A camera of a video conferencing system may capture video images of a conference room. A processor of the video conferencing system may identify a potential region of interest within a video image of the captured video images, the potential region of interest including an identified participant. Feature detection may be executed on the potential region of interest, and a region of interest may be computed based on the executed feature detection. The processor may then automatically frame the identified participant within the computed region of interest, the automatic framing including at least one of cropping the video image to match the computed region of interest and rescaling the video image to a desired resolution.
Abstract: Some implementations involve controlling a jitter buffer size during a teleconference according to a jitter buffer size estimation algorithm based, at least in part, on a cumulative distribution function (CDF). The CDF may be based, at least in part, on a network jitter parameter. The CDF may be initialized according to a parametric model. At least one parameter of the parametric model may be based, at least in part, on legacy network jitter information.
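A minimal sketch of the CDF-driven sizing follows. The abstract does not name a parametric model, so the exponential distribution, the percentile target, and the use of a legacy mean-jitter value as the model parameter are all assumptions for illustration.

```python
import math

def init_cdf_exponential(mean_jitter_ms, max_ms=200, step=5):
    """Initialize a jitter CDF from a parametric model (assumed exponential).

    `mean_jitter_ms` stands in for a parameter derived from legacy network
    jitter information; keys are candidate buffer depths in milliseconds.
    """
    lam = 1.0 / mean_jitter_ms
    return {j: 1.0 - math.exp(-lam * j) for j in range(0, max_ms + 1, step)}

def buffer_size_from_cdf(cdf, target=0.95):
    # Smallest jitter value whose cumulative probability reaches the target:
    # buffering that long covers `target` of packets under the model.
    for j in sorted(cdf):
        if cdf[j] >= target:
            return j
    return max(cdf)
```

In practice the initialized CDF would then be updated from observed jitter during the teleconference, with the buffer size re-derived from the same percentile query.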
Abstract: Described are techniques in video coding and/or decoding that allow for selectively breaking prediction and/or in loop filtering across segment boundaries between different segments of a video picture. A high layer syntax element, such as a parameter set or a slice header, may contain one or more indications signalling to an encoder and/or decoder whether an associated prediction or loop filtering tool may be applied across the segment boundary. In response to such one or more indications, the encoder and/or decoder may then control the prediction or loop filtering tool accordingly.
Abstract: In some embodiments, a method for processing an audio signal in an audio processing apparatus is disclosed. The method includes receiving an audio signal and a parameter, the parameter indicating a location of an auditory event boundary. An audio portion between consecutive auditory event boundaries constitutes an auditory event. The method further includes applying a modification to the audio signal based in part on an occurrence of the auditory event. The parameter may be generated by monitoring a characteristic of the audio signal and identifying a change in the characteristic.
Abstract: For an efficient encoding of subband configuration data, the first, penultimate and last subband groups are treated differently than the other subband groups. Further, subband group bandwidth difference values are used in the encoding. The number of subband groups NSB is coded using a fixed number of bits representing NSB−1. The bandwidth value BSB of the first subband group is coded using a unary code representing BSB−1. No bandwidth value BSB[g] is coded for the last subband group g=NSB. For subband groups g=2, …, NSB−2, bandwidth difference values ΔBSB[g]=BSB[g]−BSB[g−1] are coded using a unary code, and the bandwidth difference value ΔBSB[NSB−1] for subband group g=NSB−1 is coded using a fixed number of bits.
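The encoding rules above can be traced in a short sketch. The unary code convention (n ones followed by a zero), the fixed field width, and the assumption that difference values are non-negative are not specified in the abstract and are chosen here only so the example runs.

```python
def unary(n):
    # Assumed unary convention: n >= 0 coded as n ones followed by a zero.
    return "1" * n + "0"

def encode_subband_config(bandwidths, nsb_bits=5):
    """Encode subband group bandwidths BSB[1..NSB] per the scheme sketched above.

    Assumptions: NSB-1 fits in `nsb_bits` fixed bits, difference values are
    non-negative, and the last entry of `bandwidths` is implicit (not coded).
    """
    nsb = len(bandwidths)
    bits = format(nsb - 1, f"0{nsb_bits}b")        # NSB coded as NSB-1, fixed bits
    bits += unary(bandwidths[0] - 1)               # first group: unary of BSB-1
    for g in range(1, nsb - 2):                    # groups 2..NSB-2: unary diffs
        bits += unary(bandwidths[g] - bandwidths[g - 1])
    if nsb >= 3:                                   # penultimate group g=NSB-1:
        diff = bandwidths[nsb - 2] - bandwidths[nsb - 3]
        bits += format(diff, f"0{nsb_bits}b")      # diff coded with fixed bits
    # last group (g = NSB): no bandwidth value coded
    return bits
```

Treating the first group absolutely, the middle groups differentially with unary codes, and the last group implicitly keeps the bitstream short when neighboring groups have similar bandwidths.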
Abstract: A spatial direction of a wearable device that represents an actual viewing direction of the wearable device is determined. The spatial direction of the wearable device is used to select, from a multi-view image comprising single-view images, a set of single-view images. A display image is caused to be rendered on a device display of the wearable device. The display image represents a single-view image as viewed from the actual viewing direction of the wearable device. The display image is constructed based on the spatial direction of the wearable device and the set of single-view images.
Abstract: Dual- and multi-modulator projector display systems and techniques are disclosed. In one embodiment, a projector display system comprises a light source; a controller; a first modulator that receives light from the light source and renders a halftone image of an input image; a blurring optical system that blurs the halftone image with a Point Spread Function (PSF); and a second modulator that receives the blurred halftone image and renders a pulse width modulated image, which may be projected to form the desired screen image. Systems and techniques for forming a binary halftone image from an input image, correcting for misalignment between the first and second modulators, and calibrating the projector system, e.g. over time, for continuous image improvement are also disclosed.
Abstract: A projector system comprising a laser light source, a collimating lens, a fly's-eye lens, an integrating rod, and a first modulator is disclosed. Light from the laser light source/fiber illuminates the collimating lens, which substantially collimates the light before it is transmitted through the fly's-eye lens. The fly's-eye lens provides a desired angular/spatial light distribution for further processing to a first modulator of the projector system.
Abstract: A wearable device comprises a left view optical stack for a viewer to view left view cinema display images rendered on a cinema display and a right view optical stack for the viewer to view right view cinema display images rendered on the cinema display. The left view cinema display images and the right view cinema display images form stereoscopic cinema images. The wearable device further comprises a left view imager that renders left view device display images, to the viewer, on a device display, and a right view imager that renders right view device display images, to the viewer, on the device display. The left view device display images and the right view device display images form stereoscopic device images complementary to the stereoscopic cinema images.
Abstract: Projection systems and/or methods comprising a blurring element are disclosed. In one embodiment, a blurring element may comprise a first plate having a pattern on a first surface and a second plate. The first plate and the second plate may comprise materials having a slight difference in their respective indices of refraction. In another embodiment, a blurring element may comprise a first plate having a pattern thereon and a second immersing material. The blurring element may be placed in between two modulators in a dual- or multi-modulator projector system. The blurring element may be configured to give a desired shape to the light transmitted from a first modulator to a second modulator.
Abstract: At a first time point, a first light capturing device at a first spatial location in a three-dimensional (3D) space captures first light rays from light sources located at designated spatial locations on a viewer device in the 3D space. At the first time point, a second light capturing device at a second spatial location in the 3D space captures second light rays from the light sources located at the designated spatial locations on the viewer device in the 3D space. Based on the first light rays captured by the first light capturing device and the second light rays captured by the second light capturing device, at least one of a spatial position and a spatial direction, at the first time point, of the viewer device is determined.
Abstract: Diffuse or spatially large audio objects may be identified for special processing. A decorrelation process may be performed on audio signals corresponding to the large audio objects to produce decorrelated large audio object audio signals. These decorrelated large audio object audio signals may be associated with object locations, which may be stationary or time-varying locations. For example, the decorrelated large audio object audio signals may be rendered to virtual or actual speaker locations. The output of such a rendering process may be input to a scene simplification process. The decorrelation, associating and/or scene simplification processes may be performed prior to a process of encoding the audio data.
June 14, 2018
October 11, 2018
DOLBY LABORATORIES LICENSING CORPORATION, DOLBY INTERNATIONAL AB
Dirk Jeroen BREEBAART, Lie LU, Nicolas R. TSINGOS, Antonio MATEOS SOLE
Abstract: A computer implemented system for rendering captured audio soundfields to a listener comprises apparatus to deliver the audio soundfields to the listener. The delivery apparatus delivers the audio soundfields to the listener with first and second audio elements perceived by the listener as emanating from first and second virtual source locations, respectively, and with the first audio element and/or the second audio element delivered to the listener from a third virtual source location. The first virtual source location and the second virtual source location are perceived by the listener as being located to the front of the listener, and the third virtual source location is located to the rear or the side of the listener.
Abstract: Embodiments are described for a soundfield system that receives a transmitting soundfield, wherein the transmitting soundfield includes a sound source at a location in the transmitting soundfield. The system determines a rotation angle for rotating the transmitting soundfield based on a desired location for the sound source. The transmitting soundfield is rotated by the determined angle and the system obtains a listener's soundfield based on the rotated transmitting soundfield. The listener's soundfield is transmitted for rendering to a listener.
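The rotation step can be illustrated for one concrete soundfield representation. The abstract does not fix a representation, so first-order ambisonics (B-format W, X, Y, Z) and a yaw-only rotation are assumptions made for this sketch.

```python
import math

def rotate_foa(w, x, y, z, angle_deg):
    """Rotate a first-order ambisonic (W, X, Y, Z) soundfield about the
    vertical axis by `angle_deg`. Hypothetical illustration only."""
    a = math.radians(angle_deg)
    xr = math.cos(a) * x - math.sin(a) * y
    yr = math.sin(a) * x + math.cos(a) * y
    return w, xr, yr, z   # W (omni) and Z (height) are invariant under yaw
```

Rotating by the determined angle moves a source's directional energy to the desired azimuth; a listener's soundfield can then be derived from the rotated signals before rendering.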
Abstract: Teleconference audio data including a plurality of individual uplink data packet streams may be received during a teleconference. Each uplink data packet stream may correspond to a telephone endpoint used by one or more teleconference participants. The teleconference audio data may be analyzed to determine a plurality of suppressive gain coefficients, which may be applied to first instances of the teleconference audio data during the teleconference, to produce first gain-suppressed audio data provided to the telephone endpoints during the teleconference. Second instances of the teleconference audio data, as well as gain coefficient data corresponding to the plurality of suppressive gain coefficients, may be sent to a memory system as individual uplink data packet streams. The second instances of the teleconference audio data may be less gain-suppressed than the first gain-suppressed audio data.
Abstract: Embodiments of the present invention relate to signal processing. Methods for enhancing intelligibility of speech content in an audio signal are disclosed. One of the methods comprises obtaining reference loudness of the audio signal. The method further comprises enhancing the intelligibility of the speech content by adjusting partial loudness of the audio signal based on the reference loudness and a degree of the intelligibility. Corresponding systems and computer program products are also disclosed.