Hybrid Permanent/Reversible Dynamic Range Control System

Info

Publication number: 20100286988
Type: Application
Filed: May 6, 2010
Publication Date: Nov 11, 2010
Inventors: Tim J. Carroll (Lancaster, PA), Leif Claesson (Lancaster, PA)
Application Number: 12/775,319

Abstract

A technique for controlling audio dynamic range in a manner that can be permanent, reversible, or anywhere in between, and can accomplish this goal in the baseband PCM or encoded domains.

Description

This application claims priority to U.S. Provisional Application No. 61/175,853, filed May 6, 2009, the disclosure of which is incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

This patent application describes a novel technique for controlling audio dynamic range in a manner that can be permanent, reversible, or anywhere in between, and can accomplish this goal in the baseband PCM or encoded domains.

Modern distribution of audio signals to consumers necessarily involves the use of data rate reduction or audio compression techniques to lower the amount of data required to deliver these audio signals while causing minimal impact to the original audio quality. Systems including AC-3, DTS, MPEG-2 AAC and HE AAC are examples of common audio data reduction techniques. For the purposes of this invention, only the AC-3 system will be used as an example, but the invention is applicable to any coding system and is applicable to television, radio, internet, or any other means of program distribution or transmission.

Audio metadata, also known as data about the audio data, is also included with these systems to describe the encoded audio. This data is multiplexed in with the compressed audio data and delivered to consumers where it is extracted and applied to the audio in a user-adjustable manner.

One such metadata parameter is called dialnorm and is intended to control the average loudness of a program.

Other parameters such as dynrng and compr, collectively referred to as DRC, are intended to control program dynamic range.

Programs are in many cases produced with loudness and dynamic range that varies to convey emotion or the level of excitement in a given scene, while interstitial or commercial material is very often produced to convey a message and may be at a constant loudness.

In some cases these program and commercial elements can differ substantially in average loudness and dynamic range and many consumer environments are not conducive to large changes in loudness or dynamic range.

Artistic intent, while perhaps appropriate in more carefully controlled situations, can cause audibility problems and result in viewer or listener complaints. This is commonly referred to as the “loud commercial problem” but can be caused as much by excessive dynamic range as by mismatched loudness.

An additional complicating factor is the desire and sometimes the legal requirement for maintaining the integrity of the original audio, as some viewers and even regulatory bodies may require that the program audio not be changed in any way. Because of this, processes applied to the audio should be reversible.

Prior art has described two general types of systems capable of controlling audio dynamic range. The first type comprises AGC-type systems that detect and adjust the level of applied audio signals in a permanent and non-reversible manner, effectively controlling loudness shifts and dynamic range to a degree acceptable to most consumers. An example of this type of system is a standard transmission processor commonly found in analog broadcast facilities, the details of which are common knowledge to those skilled in the art.

The second type comprises systems that use side-chain data or metadata to allow the original audio to be carried to consumers and be modified by the metadata to match the requirements of individual consumers, allowing a reasonable degree of control to be applied to the reproduced audio signal, or allowing the audio signal to be reproduced in its original form with no control applied. An example of the latter type of system can be found in the AC-3 system.

The current invention offers a hybrid of the two approaches, allowing a continuously variable choice of which method is being applied from permanent to reversible.

SUMMARY OF THE INVENTION

The invention described in this application provides a method whereby the dynamic range of an input audio signal can be modified in a permanent or reversible manner, or in an infinitely adjustable hybrid between permanent and reversible.

In one embodiment, the invention discloses a method for controlling the dynamic range of an audio signal in a hybrid permanent/reversible manner, the method comprising:

applying original audio to a detector and generating a control signal;

applying the same original audio to a first gain control element;

producing a permanently controlled output signal by varying this first gain control element with the control signal to raise or lower the level of the signal so that the loudest and quietest parts are brought closer to a target level;

applying the same control signal to a block formatter to match the capabilities of an audio encoder;

creating an inverse of this block formatted gain control signal;

passing this inverse block formatted signal through a control element to allow all, some, or none of the inverse block formatted signal to pass;

producing “remainder audio” by applying the permanently controlled output signal to a second gain control element to “un-apply” the actions of the original gain control within the boundaries of the block formatted signal;

applying this remainder audio to an audio encoder;

delaying the non-inverse block based control signal; and

using this delayed version of the non-inverse block based control signal as part of the encoding process representing one or more metadata elements;

so that, when delivered to a corresponding decoder along with the remainder audio, reversible gain control can be applied fully, applied partially, or not applied at all.

Other prior work has described methods where the dynamic range of an applied audio signal can be directly and permanently adjusted by detecting the level of the audio signal and generating a control signal that is used to adjust the gain of the audio higher if it is lower than some reference or to adjust the gain of the audio signal lower if it is higher than some reference, a process commonly known as Automatic Gain Control (AGC).

Still other prior work has described methods where the dynamic range of an applied audio signal can be indirectly and reversibly adjusted by detecting the level of the audio signal and generating a control signal that is passed as metadata along with the original audio to some receiving or decoding device where the control signal can be applied directly to adjust the gain of the audio higher if it is lower than some reference or to adjust the gain of the audio signal lower if it is higher than some reference. This control signal can also be scaled before application to produce less or more control of the audio signal, or the control signal can be ignored thus resulting in no change to the original audio. One use of this process is described in ATSC Standard A/52: Digital Audio Compression (AC-3).

The current invention is fundamentally different from other prior work in that it is a hybrid between permanent change to applied audio and change that is reversible and allows selection of any combination of the two approaches thus providing a minimum and maximum dynamic range on a continuously adjustable basis.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example systems, methods, and so on, that illustrate various example embodiments of aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that one element may be designed as multiple elements or that multiple elements may be designed as one element. An element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates an example traditional AGC System.

FIG. 2 illustrates an example metadata-based AGC system.

FIG. 3 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 4 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 5 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 6 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 7 illustrates a block diagram of an example system for controlling dynamic range in a hybrid permanent/reversible manner.

FIG. 8 illustrates a block diagram of an example multiband AGC system.

DETAILED DESCRIPTION

FIG. 1 depicts a traditional AGC system where input audio (1) is passed to a detector (2) and to a variable gain element (3), and the detector creates a control signal (4) which is fed to the control input of the variable gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference therefore producing an output (5) where the lowest and highest levels are closer to each other, thus lowering the dynamic range. It should be noted that this type of AGC is commonly known as a feed-forward AGC, and that an alternate version where the control signal is detected after the gain element and fed back to the gain control element is commonly known as a feed-back AGC. Either of these methods should be seen as systems that permanently change the audio.
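
By way of illustration only, the feed-forward arrangement of FIG. 1 can be sketched in Python as follows. The function name, the RMS detector, the reference level, the ratio, and the window length are all assumptions made for this sketch and are not taken from the disclosure; practical AGC systems also smooth the control signal with attack and release time constants, which are omitted here.

```python
import math

def feed_forward_agc(samples, reference_db=-20.0, ratio=2.0, window=4):
    """Hypothetical feed-forward AGC: detect level, derive a control signal, apply gain."""
    out, control_db = [], []
    for i in range(len(samples)):
        # Detector: RMS level over a short trailing window of the INPUT signal
        frame = samples[max(0, i - window + 1): i + 1]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(max(rms, 1e-9))
        # Control signal: negative gain above the reference, positive gain below it
        gain_db = (reference_db - level_db) * (1.0 - 1.0 / ratio)
        control_db.append(gain_db)
        # Variable gain element: the change to the audio is permanent
        out.append(samples[i] * 10.0 ** (gain_db / 20.0))
    return out, control_db
```

With these assumed settings, a steady input at 0 dB is lowered and a steady input at -40 dB is raised, so the two outputs end up closer together in level than the two inputs were.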

FIG. 2 depicts a simplified metadata-based AGC system where input audio (1) is detected (2) and the resulting control data (3) is multiplexed (4) with the audio data. This composite data stream (5) is sent to a demultiplexer (6), which outputs the audio data and control data. The multiplexer and demultiplexer are generally known to be parts of systems such as digital television encoders and decoders. The control data can then be selectively used to vary a gain element (7) to adjust the level of the audio signal and control the dynamic range. This control signal can also be scaled to apply more or less control, or can be ignored completely (8), allowing the original audio to be reproduced unmodified; this method can therefore be considered one that is reversible.
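
The receiving side of such a metadata-based system can be sketched as below. This is a minimal illustration under stated assumptions: the function name and the per-sample gain list in dB are hypothetical, and an actual decoder would receive gain words per block rather than per sample.

```python
def decode_with_metadata(audio, gain_db, scale=1.0):
    """Apply gain metadata (in dB) to untouched audio, scaled by the listener's choice.

    scale=1.0 applies the full control, values in between apply partial control,
    and scale=0.0 ignores the metadata so the original audio is reproduced.
    """
    return [x * 10.0 ** (scale * g / 20.0) for x, g in zip(audio, gain_db)]
```

Because the audio itself is carried unmodified and only the metadata describes the gain change, setting the assumed `scale` parameter to zero returns exactly the samples that were sent.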

FIG. 3 depicts one embodiment of the current invention. Input audio (1), which can be a single channel, stereo, or as shown 5.1 channels is applied to an AGC means (2). The AGC means operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference therefore moving the lowest and highest levels closer to each other and thus outputting audio (3) with an adjusted dynamic range to a variable gain element (4). The control signal developed for the AGC process is also output (5) and applied to a block formatter (6) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (7) and applies them to the control input of the gain element after passing through a control element (8) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (9) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. 
This audio is applied to an encoder (10), such as one described in ATSC A/52 and the block-based gain control signal (11) is first delayed (12) and then is multiplexed (13) into the encoded bitstream as gain control words such as compr, dynrng and/or dialnorm as described in ATSC A/52.
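
The signal flow of FIG. 3 up to the encoder input can be sketched as follows. This is an illustrative sketch only: the function names, the block size of four samples, the 0.25 dB quantization step, and the use of a block average are assumptions chosen for readability, the AGC control signal is supplied as an input rather than detected, and the delay and multiplexing stages are omitted.

```python
def db_to_lin(db):
    return 10.0 ** (db / 20.0)

def block_format(control_db, block_size=4, step_db=0.25):
    """One quantized gain value per block (block size and step are assumed values)."""
    blocks = []
    for start in range(0, len(control_db), block_size):
        chunk = control_db[start:start + block_size]
        mean = sum(chunk) / len(chunk)
        blocks.append(round(mean / step_db) * step_db)
    return blocks

def hybrid_encode_side(original, control_db, amount, block_size=4):
    """amount models the control element: 0.0 passes none of the inverse, 1.0 passes all."""
    # AGC output: audio permanently changed by the sample-accurate control signal
    processed = [x * db_to_lin(g) for x, g in zip(original, control_db)]
    # Block formatter: gain values at the granularity the encoder can carry
    block_db = block_format(control_db, block_size)
    remainder = []
    for i, x in enumerate(processed):
        inverse_db = -block_db[i // block_size]          # inverse of the block gain
        remainder.append(x * db_to_lin(amount * inverse_db))  # second gain element
    return remainder, block_db
```

With `amount` at 1.0 the remainder audio equals the original audio except for whatever part of the control signal the block values could not represent, which is the "within the accuracy of the block formatting process" qualification in the text. With `amount` at 0.0 the remainder audio is simply the processed audio.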

FIG. 4 depicts another embodiment of the current invention. Input audio (1), which can be a single channel, stereo, or as shown 5.1 channels is applied to an AGC means (2) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference therefore moving the lowest and highest levels closer to each other and thus outputting audio (3) with an adjusted dynamic range to a variable gain element (4). The control signal developed for the AGC process is also output (5) and applied to a block formatter (6) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (7) and applies them to the control input of the gain element after passing through a control element (8) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (9) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. This audio is applied to an encoder (10), such as one described in ATSC A/52 and the block-based gain control signal (11) is first delayed (12) then input to the encoder as a metadata signal (13).

FIG. 5 depicts yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is first applied to an AC-3 decoder (2) to produce decoded PCM audio signals (3) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (4) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference therefore moving the lowest and highest levels closer to each other and thus outputting audio (5) with an adjusted dynamic range to a variable gain element (6). The control signal developed for the AGC process is also output (7) and applied to a block formatter (8) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (9) and applies them to the control input of the gain element after passing through a control element (10) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (11) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. 
This audio is applied to an encoder (12), such as one described in ATSC A/52 and the block-based gain control signal (13) is first delayed (14) and then is multiplexed (15) into the encoded bitstream as gain control words such as compr, dynrng and/or dialnorm as described in ATSC A/52.

FIG. 6 depicts yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is applied both to a delay means (2) and to an AC-3 decoder (3) to produce decoded PCM audio signals (4) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (5) which operates by detecting the input audio and generating a control signal that is used to vary a gain element to lower the level of the input signal if the level is higher than some reference, or to raise the level of the input signal if the level is lower than some reference therefore moving the lowest and highest levels closer to each other and thus outputting audio (6) with an adjusted dynamic range to a variable gain element (7). The control signal developed for the AGC process is also output (8) and applied to a block formatter (9) which will create gain control values on a block basis, matching the capabilities of the final encoder. These gain control blocks are then applied to a means that creates an inverse of these gain control blocks (10) and applies them to the control input of the gain element after passing through a control element (11) to allow all, some, or none of the block gain control signal to pass. It should be noted that the block formatting process can be applied as shown, as part of the variable gain element, or a combination of both. This inverse application of the block gain control signal by the gain element to audio that had already been changed by the non-inverse version of the original control signal results in the “un-application” of the control signal within the accuracy of the block formatting process. This so-called “remainder” audio (12) has the useful property of being able to be returned to its processed state or back to its unprocessed state within the boundaries of the block processing by applying all, some, or none of the block-based control signal. 
This audio is applied to an AC-3 encoder (13), such as one described in ATSC A/52 and the block-based gain control signal (14) is first delayed (15) and then is sent with the delayed original AC-3 input signal (16) and the newly created AC-3 signal (17) to the multiplexer (18). It is then possible to compare and modify the original encoded audio data blocks to more closely match the newly encoded data blocks to allow for a more accurate representation of the so-called remainder audio, essentially allowing audio modification without fully decoding and re-encoding.

FIG. 7 depicts still yet another embodiment of the current invention. Input audio (1) is in the AC-3 encoded form and is applied both to a delay means (2) and to an AC-3 decoder (3) to produce decoded PCM audio signals (4) which can be mono, stereo or 5.1 channels as shown. These audio signals are then applied to an AGC means (5) that detects the input audio and generates a control signal (6) that is applied to a block formatter (7) which will create gain control values on a block basis, matching the capabilities of the final encoder. This block formatted control (8) signal is applied with the delayed original AC-3 input signal (9) to the multiplexer (10) where existing compr, dynrng and/or dialnorm control words will be replaced. This method allows for insertion of gain control information into a previously encoded bitstream without the need to decode and re-encode the signal.
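
The replacement step of FIG. 7 can be illustrated with a deliberately simplified sketch. The in-memory record layout (a dictionary with `coded_audio` and `dynrng` keys) and the function name are hypothetical and stand in for real bitstream parsing; an actual AC-3 implementation would have to rewrite bit fields inside each frame and recompute the frame's CRC words.

```python
def replace_gain_words(blocks, new_dynrng):
    """Return copies of the block records with only the gain word replaced.

    The coded audio payload is carried through untouched, which models the
    absence of a decode and re-encode step.
    """
    if len(blocks) != len(new_dynrng):
        raise ValueError("one gain value is needed per block record")
    return [dict(block, dynrng=value) for block, value in zip(blocks, new_dynrng)]
```

The point of the sketch is that only the control words change between input and output, so the coded audio incurs no additional generation of coding loss.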

FIG. 8 depicts a more sophisticated AGC means where the input audio (1) is first adjusted in average level by Input AGC (2), then is split into a multiplicity of bands by crossovers (3), shown here as five bands although any number of bands can be used, and each band then has its own AGC (4) specifically optimized for the range of frequencies it is controlling. Each band of frequencies is then applied to its own limiter (5) and then the bands are summed (6) and applied to an overall peak limiter (7). Each of these sections (2), (4), (5), and (6), also outputs a control signal, all of which are summed into a final composite control signal (8). The functionality of this drawing can be inserted as the AGC means shown on any of the other drawings in the description of this invention.
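
One possible reading of the summation into a composite control signal is sketched below. It assumes that every stage reports its gain in decibels at the same sample positions, so that a plain sum corresponds to cascading the linear gains; the disclosure does not state the units or any weighting of the per-band signals, so both the function name and the unweighted sum are assumptions.

```python
def composite_control_db(*stage_gains_db):
    """Sum per-stage gain control signals (in dB) position by position."""
    return [sum(values) for values in zip(*stage_gains_db)]
```

For example, a stage contributing -3 dB and another contributing -2 dB at the same position would yield a composite value of -5 dB at that position under this assumption.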

It should be noted that the invention described here can work alone or in tandem with additional audio processing, and can operate in the baseband PCM or compressed domains such as AC-3, DTS, MPEG, and others via standard gain adjustments or metadata manipulation.

It should be noted that this process can operate in real time, faster than real time, or slower than real time, in a software, hardware, or hybrid software/hardware implementation.

It should be noted that unlike prior art, implementation of this invention allows for control of dynamic range in a reversible manner, in a permanent manner, or anywhere in between reversible and permanent. In the reversible manner, adjustments made to the audio are done via control data sent alongside the original audio in the form of metadata which can be applied fully, in a scaled manner, or not at all but where the original audio is delivered separately and intact. In the permanent manner, the audio is fully processed before encoding and control data sent alongside the original audio is fixed at a constant value such that there will be no difference between applying it fully or not applying it at all. In the hybrid case, part of the adjustment of the audio is done in a permanent manner, while the remaining part is done in a reversible manner allowing partial reversibility.
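
The three modes can be illustrated with a simplified end-to-end round trip. Block formatting and delay are omitted, all names are hypothetical, and scaling the transmitted metadata by the same `amount` as the inverse gain is an assumption inferred from the statement that the control data is held constant in the permanent manner.

```python
def round_trip(original, gain_db, amount, decoder_scale):
    """amount: 1.0 = fully reversible, 0.0 = fully permanent, in between = hybrid.

    decoder_scale: how much of the received metadata the decoder applies (0.0 to 1.0).
    """
    # Permanent AGC applied to the original audio
    processed = [x * 10 ** (g / 20) for x, g in zip(original, gain_db)]
    # "Un-apply" the chosen portion of the gain to form the remainder audio
    remainder = [x * 10 ** (-amount * g / 20) for x, g in zip(processed, gain_db)]
    # Assumed metadata: only the un-applied portion is sent for the decoder to restore
    metadata = [amount * g for g in gain_db]
    return [x * 10 ** (decoder_scale * m / 20) for x, m in zip(remainder, metadata)]
```

Under these assumptions, `amount=1.0` lets the decoder choose anywhere between the original and the fully processed audio, `amount=0.0` yields the processed audio regardless of the decoder setting, and an intermediate `amount` guarantees a minimum degree of control while leaving the rest to the decoder.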

While example systems, methods, and so on, have been illustrated by describing examples, and while the examples have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit scope to such detail. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the systems, methods, and so on, described herein. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention is not limited to the specific details, the representative apparatus, and illustrative examples shown and described. Thus, this application is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, the preceding description is not meant to limit the scope of the invention. Rather, the scope of the invention is to be determined by the appended claims and their equivalents.

To the extent that the term “includes” or “including” is employed in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term “comprising” as that term is interpreted when employed as a transitional word in a claim. Furthermore, to the extent that the term “or” is employed in the detailed description or claims (e.g., A or B) it is intended to mean “A or B or both”. When the applicants intend to indicate “only A or B but not both” then the term “only A or B but not both” will be employed. Thus, use of the term “or” herein is the inclusive, and not the exclusive use. See, Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d. Ed. 1995).

Claims

1. A method for controlling the dynamic range of an audio signal in a hybrid permanent/reversible manner, the method comprising:

applying original audio to a detector and generating a control signal;

applying the same original audio to a first gain control element;

producing a permanently controlled output signal by varying this first gain control element with the control signal to raise or lower the level of the signal so that the loudest and quietest parts are brought closer to a target level;

applying the same control signal to a block formatter to match the capabilities of an audio encoder;

creating an inverse of this block formatted gain control signal;

passing this inverse block formatted signal through a control element to allow all, some, or none of the inverse block formatted signal to pass;

producing “remainder audio” by applying the permanently controlled output signal to a second gain control element to “un-apply” the actions of the original gain control within the boundaries of the block formatted signal;

applying this remainder audio to an audio encoder;

delaying the non-inverse block based control signal; and

using this delayed version of the non-inverse block based control signal as part of the encoding process representing one or more metadata elements.