SYSTEM AND METHOD FOR CREATING CROSSTALK CANCELED ZONES IN AUDIO PLAYBACK

Info

Publication number: 20190110152
Type: Application
Filed: Oct 11, 2018
Publication Date: Apr 11, 2019
Patent Grant number: 10531218
Inventors: Wai-Shan Lam (Hong Kong), Daniel Weiss (Uster), Tiziano Leidi (Dino), Alberto Vancheri (Bellinzona)
Application Number: 16/157,330

Abstract

A system of crosstalk cancelled zone creation in audio playback comprising: main transducers emitting stereo soundwaves of an audio playback; and a local system comprising at least two close-proximity-transducers (CPTs), each arranged proximal to one of the left and right-side ear canals of a listener. Each of the CPTs comprises: a position tracking device for tracking the relative positions of the main transducers to the CPT and the other CPTs; and a control unit for receiving the relative position data from the position tracking device and generating a control signal according to the relative position data for the generation of crosstalk cancellation (XTC) soundwaves. Each of the CPTs is configured to generate XTC soundwaves corresponding to the stereo soundwaves arriving at the corresponding ear of the listener. The generated XTC soundwaves are synchronized with the audio playback and with respect to the relative positions.

Description

CROSS-REFERENCE WITH RELATED APPLICATION(S)

The present application claims priority to U.S. Provisional Application No. 62/571,234 filed Oct. 11, 2017, the disclosure of which is incorporated herein by reference in its entirety.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material, which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF THE INVENTION

This invention generally pertains to the field of reproduction of 3D realistic sound, and particularly to crosstalk cancellation (XTC) methods and systems.

BACKGROUND

Normal humans are able to hear and localize sounds coming from all directions and distances because the soundwaves reaching the left and right ears each on one side of a human head have time delays, which are known as Interaural Time Differences (ITDs), and/or volume differences, which are known as Interaural Level Differences (ILDs). The brain can interpret and determine the sound spatial origin with these auditory cues and perceive sound in three-dimensions (3D).
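To give a sense of the scale of these cues, the following sketch estimates the ITD of a distant source with the Woodworth spherical-head approximation. This approximation is not part of the present disclosure; the head radius, the speed of sound, and the function name are assumptions chosen only for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees Celsius
HEAD_RADIUS_M = 0.0875       # assumed: typical adult head radius


def woodworth_itd(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a far source at the given azimuth.

    Uses the Woodworth formula ITD = (a / c) * (theta + sin(theta)), which
    applies for azimuths from 0 (straight ahead) to 90 degrees (fully lateral).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be within [0, 90] degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = woodworth_itd(azimuth) * 1e6
        print(f"azimuth {azimuth:2d} deg: ITD {itd_us:.1f} microseconds")
```

Under these assumptions the ITD stays below one millisecond, which indicates how precisely the left and right ear signals must be kept apart for the cue to survive playback.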

Based on this concept, binaural recording of sound (also known as “dummy head recording”) uses two microphones arranged in a way that mimics a pair of normal human left and right ears to generate a sound recording embedded with 3D audio cues, with the intent to create a 3D audio experience for the listener of the playback of the sound recording. The problem, however, is in the playback or reproduction of the 3D audio recording using commonly available stereo transducers. Even when the recorded left and right audio channel signals are played back separately from the left and right transducers respectively, the soundwaves corresponding to the left audio channel signal cannot be assured to reach only the listener's left ear, and vice versa for the right audio channel signal. As the time delay and/or volume difference information recorded with the original sound cannot be reproduced perfectly at the listener's left and right ears, the listener cannot experience the 3D sound effect. This phenomenon is called crosstalk. FIG. 1 illustrates this crosstalk phenomenon.

A number of existing techniques have been proposed to cancel this crosstalk so as to reproduce an uncorrupted 3D audio experience for a listener. Crosstalk Cancellation (XTC) can be achieved by playing back binaural material over speakers (BAL) or headphones (BAH). Most of the BAL techniques involve effecting XTC by manipulating the time domain and/or audio frequency spectrum of the input audio signals, essentially creating an XTC filter. The audio frequency spectrum manipulation can be done by adjusting variables of the XTC filter to match the response of a sound reproduction system, which includes a pair of transducers, the room within which the reproduction is made, the location of the listener in the room, and in some cases even the size and shape of the listener's head. In some implementations, the adjustment is done automatically by first measuring the response of the sound reproduction system. Then, the inverse of this system response is convolved with the input audio signals to the transducers to remove the system response. FIG. 2 provides a simplified illustration of the working of the XTC filter in a sound reproduction system.
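The inversion described above can be sketched at a single frequency, where convolution in the time domain becomes multiplication. The sketch below assumes a symmetric free-field setup with point-source loudspeakers, no room reflections, and no head shadowing; the function names, distances, and the simple 1/d propagation model are illustrative assumptions, not the filter of any particular prior-art system.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def path_response(distance_m, freq_hz):
    """Free-field point-source response: 1/d attenuation plus propagation delay."""
    phase = -2.0 * math.pi * freq_hz * distance_m / SPEED_OF_SOUND_M_S
    return cmath.exp(1j * phase) / distance_m


def invert_2x2(h):
    """Invert a 2x2 complex matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ZeroDivisionError("system response is singular at this frequency")
    return ((d / det, -b / det), (-c / det, a / det))


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def xtc_filter(d_same_m, d_cross_m, freq_hz):
    """Return (system response H, XTC filter C = inverse of H) for a symmetric setup."""
    same = path_response(d_same_m, freq_hz)    # speaker to same-side ear
    cross = path_response(d_cross_m, freq_hz)  # speaker to opposite ear (crosstalk)
    h = ((same, cross), (cross, same))
    return h, invert_2x2(h)


if __name__ == "__main__":
    h, c = xtc_filter(2.0, 2.1, 1000.0)
    binaural = (1 + 0j, 0j)                        # signal intended for the left ear only
    unfiltered_ears = matvec(h, binaural)          # right ear hears crosstalk
    filtered_ears = matvec(h, matvec(c, binaural)) # crosstalk removed in this ideal model
    print("right ear without XTC:", abs(unfiltered_ears[1]))
    print("right ear with XTC:   ", abs(filtered_ears[1]))
```

A plain inverse is used here for brevity. Any mismatch between the modelled and the actual response, for example when the listener moves, leaves residual crosstalk.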

The biggest challenge with BAL is the influence of the listening room. Early reflections, and reflections in general, all deteriorate the level of crosstalk cancellation that an XTC algorithm can achieve in real life. One can try to mitigate the issue of reflections by either deadening the room with broadband absorbers, or using speakers with a narrow dispersion pattern (significant level drop-off off-axis). In many real-life implementations, neither solution is practical. Then there is the problem of a single sweet spot. Even though XTC can be used in combination with listener head-tracking, it is essentially still a single sweet spot, and the listener has little freedom of movement. Multiple XTC sweet spots are possible by using phased array or beamforming techniques, but the design becomes extremely complex and very costly to implement. Such a system may be able to provide a few sweet spots, but it is not feasible in an environment such as a movie theatre.

The BAH techniques involve a general or individualized Head Related Transfer Function (HRTF) being convolved with the audio signal in order to trick the human brain into perceiving sound in 3D. However, the 3D sound experience in BAH is still not as convincing as in BAL. Visual cues are often necessary as an aid to trick the brain into believing that the sound is in true 3D. The effect generated by BAH techniques ultimately lacks the ‘physicality’ of sound that one can experience with BAL. BAH is also extremely difficult to implement due to the highly individualized HRTF.

FIG. 3 illustrates an exemplary embodiment of a sound reproduction system with XTC filter. However, one common drawback of these XTC techniques in practice is that they require the listener to be at a single location that is unobstructed from the transducers (sweet-spot) and remain stationary, or the location of the listener must be known to or tracked by the system throughout the whole audio playback in order to achieve the ideal 3D audio experience.

SUMMARY OF THE INVENTION

The present invention provides a method and a system that provide one or more localized crosstalk-canceled zones for 3D audio reproduction. It is an objective of the present invention that such method and system can be applied to small audio reproduction environments such as a home, as well as large scale audio reproduction environments such as indoor and outdoor theatres, such that multiple audiences can experience the same ideal 3D sound effect in different locations of the theatre.

In accordance to one aspect, one or more transducers separate from the primary transducers are used to generate standalone XTC sound signals that are synchronized with the primary sound signals generated from the primary transducers when reaching the listener's ears.

In accordance to one embodiment of the present invention, provided is a realistic 3D sound reproduction using close-proximity-transducers (CPTs) associated to each listener that allows multiple crosstalk cancellation zones in a stereo sound reproduction environment. The CPTs are XTC soundwave-generating transducers that are specifically made compact transducers that the listener wears near or suspended over her ears (one transducer for each ear) and arranged in a way that does not impede the listener listening to the primary sound from the primary transducers in the stereo sound reproduction environment. In this stereo sound reproduction environment, listeners can receive the ipsilateral channel of a stereo signal freely, so as to experience a realistic 3D audio scene. Optionally, as the CPTs are worn on the listener, the listener's position can be tracked during playback. This way, the response of the system can be measured continuously and the XTC soundwaves can be adjusted accordingly. As such, the listener is not required to be fixed and stationary throughout the audio reproduction.

In accordance to one embodiment, provided is a system of crosstalk cancelled zone creation in audio playback that comprises two or more main transducers emitting stereo soundwaves of an audio playback; a local system comprising at least one or more CPTs configured proximal to both left and right-side ear canals of a listener, wherein each of the CPTs comprises: a position tracking device tracking the relative positions of the main transducers to the CPT and other CPTs; a control unit for receiving the relative position data from the position tracking device; wherein the control unit is configured to process the relative position data and cause the CPT to generate the XTC soundwaves corresponding to the stereo soundwaves arriving at the corresponding listener's ear; wherein the XTC soundwaves generated are synchronized with the audio playback and with respect to the relative position.

In accordance to one embodiment, the position tracking device further tracks the relative position of other local systems; that the position tracking device adopts one or more wireless communication technologies and standards including, but not limited to, Bluetooth and WiFi, and specifically the associated signal triangulation techniques in tracking the relative positions; that the control unit additionally causes the CPT to emit correction signals; and that the CPT set is installed or integrated in furniture.
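As an illustration of how relative positions might be recovered from wireless signals, the sketch below estimates a 2D position from range measurements to three fixed anchors. It uses range-based trilateration rather than angle-based triangulation, and how the ranges are derived from Bluetooth or WiFi signals is not specified in this disclosure; the function name, the anchor layout, and the noise-free ranges are all assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def trilaterate_2d(anchors: Sequence[Point], ranges_m: Sequence[float]) -> Point:
    """Estimate a 2D position from three anchor positions and measured ranges.

    Subtracting the first circle equation from the other two yields a 2x2
    linear system in (x, y), which is solved here with Cramer's rule.
    """
    if len(anchors) != 3 or len(ranges_m) != 3:
        raise ValueError("exactly three anchors and three ranges are required")
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges_m
    a11, a12 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    a21, a22 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is ambiguous")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


if __name__ == "__main__":
    # Hypothetical anchors, e.g. the two main transducers and one extra beacon.
    anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    true_position = (1.0, 1.0)
    ranges = [math.dist(anchor, true_position) for anchor in anchors]
    print(trilaterate_2d(anchors, ranges))
```

With noisy ranges, a practical system would more likely use additional anchors and a least-squares fit instead of this exact three-anchor solution.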

In accordance to an alternative embodiment, one or more of the CPTs is connected to a microphone that is placed near the corresponding listener's ear. The microphone is configured to receive and measure the soundwaves of the audio playback and generate the measurement data input signal for the CPT's control unit. This configuration may optionally replace the position tracking device and the use of the relative position data in the processing and generation of the XTC soundwaves.

BRIEF DESCRIPTION OF DRAWINGS

Embodiments of the invention are described in more detail hereinafter with reference to the drawings, in which:

FIG. 1 illustrates the condition of a listener listening to conventional stereo audio reproduced using two loudspeakers without XTC;

FIG. 2 illustrates the condition of a listener listening to conventional XTC audio reproduced using two loudspeakers;

FIG. 3 depicts an exemplary embodiment of a conventional audio system with XTC filter;

FIG. 4 illustrates the arrangement of a listener listening to an audio reproduction using two loudspeakers and two XTC transducers in accordance to one embodiment of the present invention;

FIG. 5 provides an illustration of the localized XTC zones; and

FIG. 6 provides a close-up view of the illustration of FIG. 5.

DETAILED DESCRIPTION

In the following description, systems and methods for creating crosstalk cancelled zones in audio playback and the like are set forth as preferred examples. It will be apparent to those skilled in the art that modifications, including additions and/or substitutions, may be made without departing from the scope and spirit of the invention. Specific details may be omitted so as not to obscure the invention; however, the disclosure is written to enable one skilled in the art to practice the teachings herein without undue experimentation.

The present invention provides a method and a system that provide one or more localized crosstalk-canceled zones (LXCZ) for 3D audio reproduction. It is an objective of the present invention that such method and system can be applied to small audio reproduction environments such as a home, as well as large scale audio reproduction environments such as indoor and outdoor theatres, such that multiple audiences can experience the same ideal 3D sound effect in different locations of the theatre.

In accordance to one aspect, one or more transducers separate from the primary transducers are used to generate standalone XTC sound signals that are synchronized with the primary sound signals generated from the primary transducers when reaching the listener's ears. FIG. 4 provides a simplified illustration of this concept.

In one embodiment, the XTC soundwave-generating transducers are specifically made compact transducers that the listener wears near or suspended over her ears (one transducer for each ear) and arranged in a way that does not impede the listener listening to the primary sound from the primary transducers. Optionally, as the XTC soundwave-generating transducers are worn on the listener, the listener's position can be tracked during playback using a position tracking device embedded in the XTC soundwave-generating transducer. This way, the response of the system can be measured continuously and the XTC soundwaves can be adjusted accordingly. As such, the listener is not required to be stationary throughout the audio reproduction.

In accordance to an alternative embodiment, one or more of the XTC soundwave-generating transducers is connected to a microphone that is placed near the corresponding listener's ear. The microphone is configured to receive and measure the primary sound and generate the measurement data input signal for the CPT's control unit. This configuration may optionally replace the position tracking device and the use of the position information of the listener in the processing and generation of the XTC soundwaves.

In the following, the various systems and methods of the present invention are described by mathematical formulae, which define the creation of ideal localized crosstalk cancellation zones and the relationships involved.

Fundamental Formulation of the System

Consider an acoustic environment $\Omega$ containing $n$ local systems $Q_j$, $1 \le j \le n$, and $m$ point acoustic sources $S_i$, $1 \le i \le m$, where both $i$ and $j$ are integers equal to or greater than 1.

The acoustic environment $\Omega$ can be either a closed room or an open space with different walling and environmental structures. Each local system $Q_j$ comprises: a set of receivers, wherein the position of the $k$-th receiver of the system $Q_j$ at time $t$ is denoted by $\vec{r}_{jk}^{(\mathrm{rec})}(t)$, and wherein examples of receivers include the listener's ears and microphones; and a set of close-proximity transducers (CPTs) that emit a local sound field, wherein the position of the $l$-th transducer of the system $Q_j$ at time $t$ is denoted by $\vec{r}_{jl}^{(\mathrm{tr})}(t)$, and wherein examples of transducers include over-ear, on-ear, and in-ear headphones, ear-buds, other types of wearable speakers, and fixed and portable loudspeakers.

All acoustic sources $S_i$, $1 \le i \le m$, produce an acoustic field $p(\vec{r}, t)$, $\vec{r} \in \Omega$. The acoustic pressure signal at the position of the $k$-th receiver of the system $Q_j$ is $p_{jk}(t) = p(\vec{r}_{jk}^{(\mathrm{rec})}(t), t)$. The acoustic pressure signals $p_{jk}(t)$ for the different values of $k$ determine the acoustic experience (in the case of a human user) reproduced by the system $Q_j$. The realistic 3D sound reproduction is defined as a set of target signals $\tilde{p}_{jk}(t)$ to be received by the receivers. The target signals $\tilde{p}_{jk}(t)$ can also be defined as the acoustic pressure signals received in a referential situation (e.g. a concert hall) that are emulated with the audio sources $S_i$. The target signals $\tilde{p}_{jk}(t)$ can represent a real acoustic environment (e.g. listening to a live orchestra in the concert hall), manipulated audio (e.g. real recordings with modified or added features), or completely artificial sound. Thus, the differences between the target signals $\tilde{p}_{jk}(t)$ and the acoustic pressure signals $p_{jk}(t)$ are the correction signals $\Delta p_{jk}(t)$, which are represented by:

$$\Delta p_{jk}(t) = \tilde{p}_{jk}(t) - p_{jk}(t)$$

The correction signals are obtained by means of the CPTs. The $l$-th CPT associated with the system $Q_j$ emits a signal $x_{jl}(t)$ such that the correction signal $\Delta p_{jk}(t)$ is received at the $k$-th receiver.
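The definition of the correction signal can be illustrated with sampled signals. In the sketch below, the target at the left ear is the left channel alone, while the measured pressure also contains an attenuated copy of the right channel; the sample values, the crosstalk gain of 0.5, and the function name are hypothetical and chosen so that the arithmetic is exact.

```python
from typing import List, Sequence


def correction_signal(target: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Return the sample-wise correction: target[n] - measured[n]."""
    if len(target) != len(measured):
        raise ValueError("target and measured signals must have the same length")
    return [t - m for t, m in zip(target, measured)]


if __name__ == "__main__":
    left_channel = [1.0, 0.0, -1.0]
    right_channel = [0.5, 0.5, 0.5]
    crosstalk_gain = 0.5  # hypothetical leakage of the right channel into the left ear

    target_left_ear = left_channel
    measured_left_ear = [l + crosstalk_gain * r
                         for l, r in zip(left_channel, right_channel)]

    # The correction is the negated crosstalk component at the left ear.
    print(correction_signal(target_left_ear, measured_left_ear))
```

In this toy case the correction equals the negated crosstalk term, which is the component the CPT would need to deliver at the receiver.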

Configuration Parameters

The signals $x_{jl}(t)$ emitted by the CPTs generally depend on the relative position, represented by $\vec{r}_{jk}^{(\mathrm{rec})}(t) - \vec{r}_{jl}^{(\mathrm{tr})}(t)$, of the receiver with respect to the transducers, and on the acoustic properties of the environment, including the positions of other systems and the component body of the current system. All quantities are time-dependent. For these reasons, each system $Q_j$ computes a vector $q_j(t)$ of the time-dependent internal variables in order to compute the signals $x_{jl}(t)$ to be emitted. These variables include: the degrees of freedom describing the spatial configuration of the body of the system $Q_j$; other internal parameters of the system, for example, in a time-independent framework for human users, the Head Related Transfer Function (HRTF); and environmental data that influence the propagation of sound from the audio sources $S_i$, such as, in a time-independent framework, the environmental transfer functions. These variables enable the reconstruction of at least the relative positions $\vec{r}_{jk}^{(\mathrm{rec})}(t) - \vec{r}_{jl}^{(\mathrm{tr})}(t)$ of the listener with respect to the transducers. The data collected by the sensors associated with the system enable the real-time computation of the vector $q_j(t)$.
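One possible in-memory representation of the spatial part of the parameter vector at a single instant is sketched below. The class name, its fields, and the choice to store Cartesian coordinates in metres are assumptions made for illustration; the HRTF and environmental data mentioned above are omitted.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class LocalSystemState:
    """Hypothetical snapshot of the spatial part of q_j(t) for one local system."""

    time_s: float
    receiver_positions: Tuple[Vec3, ...]    # positions of receivers k (e.g. ears)
    transducer_positions: Tuple[Vec3, ...]  # positions of transducers l (CPTs)

    def relative_position(self, k: int, l: int) -> Vec3:
        """Return the position of receiver k minus the position of transducer l."""
        rec = self.receiver_positions[k]
        tr = self.transducer_positions[l]
        return (rec[0] - tr[0], rec[1] - tr[1], rec[2] - tr[2])


if __name__ == "__main__":
    state = LocalSystemState(
        time_s=0.0,
        receiver_positions=((0.0, 0.0, 0.0), (0.5, 0.0, 0.0)),
        transducer_positions=((-0.25, 0.0, 0.0), (0.75, 0.0, 0.0)),
    )
    print(state.relative_position(0, 0))
```

A new snapshot would be built whenever the sensors report updated positions, which matches the time dependence of the parameters.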

Generation of the Correction Signals

Each local system $Q_j$ is associated with a multiple-input and multiple-output (MIMO) linear time-variant (LTV) system $L_j$ that computes the output signals $x_{jl}(t)$ of the corresponding transducers needed to obtain the desired correction signals $\Delta p_{jk}(t)$. Time variance is required as the system works in time-varying conditions. Hence, the input and output signals of the LTV system $L_j$ are the correction signals $\Delta p_{jk}(t)$ and the signals $x_{jl}(t)$ to be generated by the transducers, respectively. Here, the indexes $k$ and $l$ run over the set of receivers (the listener's ears) and the set of transducers, respectively, of a single system $Q_j$. If $\Delta p_j(t)$ denotes the multichannel signal with one channel for each receiver $k$ and $x_j(t)$ denotes the multichannel signal with one channel for each transducer $l$, the functional relation between input and output can be described as:

$$x_j(t) = L_j[\Delta p_j(t); q_j(t)]$$

where $q_j(t)$ is the vector of the time-dependent parameters defined above.
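A discrete-time, single-channel analogue of this relation is sketched below: the output is a linear combination of past inputs whose coefficients are recomputed from the parameters at every sample. The function names and the one-tap kernel that compensates 1/d spreading loss are hypothetical; a full MIMO system would hold one such kernel per receiver and transducer pair.

```python
from typing import Callable, List, Sequence


def ltv_apply(delta_p: Sequence[float],
              q: Sequence[float],
              kernel_for: Callable[[float], Sequence[float]]) -> List[float]:
    """Compute x[n] = sum_k h_k(q[n]) * delta_p[n - k].

    The kernel h is re-derived from the parameter q[n] at every sample, so the
    system is linear in delta_p but not time-invariant.
    """
    if len(delta_p) != len(q):
        raise ValueError("delta_p and q must have the same number of samples")
    out = []
    for n, params in enumerate(q):
        acc = 0.0
        for k, h_k in enumerate(kernel_for(params)):
            if n - k >= 0:
                acc += h_k * delta_p[n - k]
        out.append(acc)
    return out


def distance_gain_kernel(distance_m: float) -> List[float]:
    """Hypothetical one-tap kernel: gain d compensates a 1/d spreading loss."""
    return [distance_m]


if __name__ == "__main__":
    delta_p = [1.0, 1.0, 1.0, 1.0]
    distances = [0.5, 0.5, 1.0, 1.0]  # the transducer moves away mid-signal
    print(ltv_apply(delta_p, distances, distance_gain_kernel))
```

The same input sample produces a different output once the parameter changes, which is the time variance that the relation above requires.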

Locality of the Cancellation Process

The functional relation defined above, together with the restrictions on the parameters $q_j(t)$ described, implies that the process is local. This means that the target signal $\tilde{p}_{jk}(t)$ imposed on a local system disregards the crosstalk produced by the correction signals of other local systems. Here, the term local means that each local system $Q_j$ makes decisions about the cancellation signals to be sent independently from other local systems. This enables the design of an independent LTV system for each subsystem. Optionally, the LTV systems can include an additional system to detect inter-user disturbances when needed, which can then be attenuated.

In one embodiment, a set of sensors can be included in a local system $Q_j$. Examples include sensors for tracking the head movement for adjusting the HRTF, and sensors for tracking the surrounding environment, including the positions of other local systems that are approaching or moving away, such that preloaded inter-user disturbance attenuation can be applied in advance.

In accordance to one embodiment, a separate pair of transducers (close-proximity-transducers (CPTs)) is provided and located in close proximity to the listener. The primary acoustic source remains to be a pair of main external stereo loudspeakers in front of the listeners, with the CPTs providing the crosstalk-cancelling signals. The use of CPTs to perform XTC is to provide listeners with their individualized XTC zones/bubbles. FIG. 5 provides an illustration of the individualized XTC zones/bubbles, and FIG. 6 provides its close-up view.

The CPTs provide the XTC soundwaves to cancel the crosstalk coming from the main external speakers. This allows the listeners to have a much higher degree of freedom in terms of movement. Not only will each individual have freedom of movement, but since CPTs are individual based or localized, there can be many listeners sharing the same listening experience from the same set of main speakers.

The CPTs of a system could produce inter-user crosstalk towards other systems. This may happen when CPTs other than open headphones are used and users come too close to one another. The definition of the correction signal given above does not, in general, include such effects, which are not significant. Optionally, the CPTs may comprise additional functions to handle such inter-user disturbances.

Optionally, the XTC soundwaves generated by the CPTs include coloration reduction, equalization, and/or user presets of sound effects.

In accordance to another embodiment, the CPTs can be a pair of open-back headphones (where external sound can travel through reaching the listener's ears), or a pair of headphones like the Sony PFR-V1 or the Bose Soundwear. The CPTs, however, are not limited to wearables. For example, in a movie theater application, it may be possible to embed CPTs into the headrest of the chairs. The advantage of having CPTs as wearables is that the physical relationship between the CPT and the listener can be fixed, but it is also possible to embed CPTs into headrests, all subject to the tolerance level of the algorithm for computing the crosstalk-cancelling signals.

Although the present document describes the CPTs of the present invention as applied primarily to headphones, an ordinarily skilled person in the art will be able to adapt its various embodiments, without undue experimentation, to other types of proximity devices such as, without limitation, devices embeddable in stationary objects, for example a chair, a sofa, or a neck cushion.

The location of the listeners in relation to the main speakers has an impact on the level of XTC achieved. Various technologies can be implemented to determine the location of the listeners. For example, Bluetooth-based triangulation technology can be used to determine the location. Other wireless technologies can also provide very accurate positioning information. The positioning information can be used to calculate the delay required for the L and R channels of the CPTs.
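A minimal sketch of such a delay calculation follows. It assumes straight-line propagation at a fixed speed of sound and assumes that the CPT at one ear should be delayed by the travel time from the opposite main speaker to that ear, less the CPT's own travel time to the ear. This rule, the coordinates, and the function names are illustrative assumptions and are not stated in this disclosure.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def propagation_delay_s(source: Vec3, receiver: Vec3) -> float:
    """Straight-line travel time of sound from source to receiver."""
    return math.dist(source, receiver) / SPEED_OF_SOUND_M_S


def cpt_delay_s(opposite_speaker: Vec3, ear: Vec3, cpt: Vec3) -> float:
    """Delay to apply to a CPT so its wave reaches the ear with the crosstalk.

    Assumed rule: (opposite speaker to ear) minus (CPT to ear) travel time.
    """
    delay = propagation_delay_s(opposite_speaker, ear) - propagation_delay_s(cpt, ear)
    if delay < 0.0:
        raise ValueError("CPT is farther from the ear than the main speaker")
    return delay


if __name__ == "__main__":
    right_speaker = (3.43, 0.0, 0.0)   # hypothetical positions in metres
    left_ear = (0.0, 0.0, 0.0)
    left_cpt = (0.0, 0.0343, 0.0)
    print(f"left CPT delay: {cpt_delay_s(right_speaker, left_ear, left_cpt) * 1e3:.3f} ms")
```

Because the delay depends on the tracked positions, it would be recomputed whenever the listener moves.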

CPTs can be wired or wireless devices. The main goal here is to decouple the XTC zone from the main speakers, on which a traditional BAL setup relies. Instead, a local XTC zone is created for each individual.

The embodiments disclosed herein may be implemented using general purpose or specialized computing devices, mobile communication devices, computer processors, or electronic circuitries including but not limited to digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), and other programmable logic devices configured or programmed according to the teachings of the present disclosure. Computer instructions or software codes running in the general purpose or specialized computing devices, mobile communication devices, computer processors, or programmable logic devices can readily be prepared by practitioners skilled in the software or electronic art based on the teachings of the present disclosure.

In some embodiments, the present invention includes computer storage media having computer instructions or software codes stored therein which can be used to program computers or microprocessors to perform any of the processes of the present invention. The storage media can include, but are not limited to, floppy disks, optical discs, Blu-ray Disc, DVD, CD-ROMs, and magneto-optical disks, ROMs, RAMs, flash memory devices, or any type of media or devices suitable for storing instructions, codes, and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art.

The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims

1. A system of crosstalk cancelled zone creation in audio playback comprising:

one or more main transducers emitting stereo soundwaves of an audio playback;

a local system comprising at least two or more close-proximity-transducers (CPTs);

wherein each of the CPTs is arranged proximal to one of left and right-side ear canals of a listener;

wherein each of the CPTs comprises: a position tracking device for tracking the relative positions of the main transducers to the CPT and the other CPTs; a control unit for receiving the relative position data from the position tracking device and generating control signal according to the relative position data for the generation of XTC soundwaves;

wherein each of the CPTs is configured to generate XTC soundwaves corresponding to the stereo soundwaves arriving at the corresponding ear of the listener; and

wherein the generated XTC soundwaves are synchronized with the audio playback and with respect to the relative positions.

2. The system of claim 1, wherein the position tracking device further tracks the relative position of other local systems.

3. The system of claim 1, wherein the position tracking device includes wireless communication triangulation device for tracking the relative positions.

4. The system of claim 1, wherein the CPTs additionally emit one or more correction signals.

5. The system of claim 1, wherein the CPTs include one or more of over-ear, on-ear, and in-ear headphones, ear-buds, other types of wearable speakers, fixed and portable loudspeakers.

6. A system of crosstalk cancelled zone creation in audio playback comprising:

one or more main transducers emitting stereo soundwaves of an audio playback;

a local system comprising at least two or more close-proximity-transducers (CPTs) and one or more microphones;

wherein each of the CPTs is arranged proximal to one of left and right-side ear canals of the listener;

wherein each of the microphones is placed proximal to a listener's ears and configured to receive and measure the stereo soundwaves of the audio playback;

wherein each of the CPTs comprises: a control unit for receiving measurement data of the stereo soundwaves of the audio playback from the microphones and generating control signal according to the measurement data for the generation of XTC soundwaves;

wherein each of the CPTs is configured to generate XTC soundwaves corresponding to the stereo soundwaves arriving at the corresponding ear of the listener; and

wherein the generated XTC soundwaves are synchronized with the audio playback and with respect to the relative positions.

7. The system of claim 1, wherein the CPTs additionally emit one or more correction signals.

8. The system of claim 1, wherein the CPTs include one or more of over-ear, on-ear, and in-ear headphones, ear-buds, other types of wearable speakers, fixed and portable loudspeakers.