SOFTWARE BASED AUDIO TIMING AND SYNCHRONIZATION
Synchronization of plural outputs of data transported by a wireless network is facilitated by bandlimiting a sample clock signal controlling a rate at which data is processed by the network's devices and/or bandlimiting wall time data controlling the real time for presenting a datum.
This application is a continuation of U.S. Ser. No. 15/863,637, filed Jan. 5, 2018, which is a continuation-in-part of U.S. Ser. No. 15/660,800, filed Jul. 26, 2017, now abandoned, which is a continuation of U.S. Ser. No. 14/186,852, filed Feb. 21, 2014, now issued as U.S. Pat. No. 9,723,580 “Synchronization Of Audio Channel Timing”.
BACKGROUND OF THE INVENTION
In U.S. patent application Ser. No. 14/186,852, a hardware methodology for recovering and filtering audio timing was presented. The key circuit element in this approach, which is not available on most generic processor SoCs, was a low bandwidth phase locked loop (PLL). This PLL was used twice: to filter the beacon-based Time Synchronization Function (TSF) value and to generate the output audio clock.
For a soft implementation as taught and claimed herein, these two PLL hardware functions are converted to an Estimator 308 and a Sample Rate Converter (SRC) code 310. The present invention again relates to wireless data networks and, more particularly, to a system and method for synchronizing outputs at multiple endpoints in a network which includes a wireless communication link.
While audio and video equipment has historically been connected by analog or digital point-to-point, one-way connections, an increasing portion of multimedia content is distributed over networks. For example, video and uncompressed audio may be streamed from an audio/video source in a media room or closet to a display and multiple speakers of a surround sound system in a remote room or rooms in a residence. Since it is difficult to retrofit finished structures with cabling, in many cases data, including video and audio data, is transmitted from a source to a display, speakers or other output devices over a network that includes a wireless communication link(s) utilizing low cost radio technologies such as frequency modulation and spread spectrum modulation to transport packetized digital data.
Synchronization of outputs and minimization of system latency are critical requirements for high quality audio whether or not combined with video. The human ear is sensitive to phase delay or channel-to-channel latency, and multi-channel audio output with channel-to-channel latency greater than 50 microseconds (μs) is commonly described as disjointed or blurry sound. On the other hand, source-to-output delay or latency (“lip-sync”) greater than 50 milliseconds (ms) is commonly considered to be noticeable in audio-video systems. In a digital network, such as an audio/video system, a source of digital data transmits a stream of data packets to the network's end points where the data is presented. Typically, a pair of clocks at each node of the network controls the time at which a particular datum is presented and the rate at which data is processed, for example, the rate at which an analog signal is digitized or digital data is converted to an analog signal for presentation. The actual or real time that an activity, such as presentation of a video datum, is to occur is determined by “wall time,” the output of a “wall clock” at the node. A sample or media clock controls the rate at which data is processed, for example, the rate at which blocks of digital audio data are introduced to a digital-to-analog converter.
Audio video bridging (AVB) is the common name of a set of technical standards developed by the Institute of Electrical and Electronics Engineers (IEEE) and providing specifications directed to time-synchronized, low latency, streaming services over networks. The Precision Time Protocol (PTP) specified by “IEEE Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems,” IEEE Std. 1588-2008 and adopted in IEEE 802.1AS-2011—“IEEE Standard for Local and Metropolitan Area Networks—Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks” describes a system enabling distributed wall clocks to be synchronized within 1 μs over seven network hops.
A master clock to which the remaining distributed clocks, the slave clocks, are to be synchronized is selected either by a “best master clock” algorithm or manually. Periodically, the device comprising the master clock (the “master device”) and the device(s) comprising the slave clock(s) (the “slave device(s)”) exchange messages which include timestamps indicating the master clock's “wall time” when the respective message was either transmitted or received by the master device. The slave device notes the local wall times when the respective messages were received or transmitted by it and calculates the offset of the slave clock relative to the master clock and the network delay, the time required for the messages to traverse the network from the master device to the slave device. With repeated measurements, the frequency drift of the slave clock relative to the master clock can also be determined enabling the slave clock to be synchronized with the master clock by adjusting the slave clock's wall time for the offset and the network delay and adjusting the slave clock's frequency for any frequency drift relative to the master clock.
PTP can synchronize wall clocks of an extensive network or even plural networks, but the accuracy of PTP can be strongly influenced by the loading and exposure to interference of the wireless communication link(s). An alternative to PTP for synchronizing the wall time at plural devices of a wireless network is the Time Synchronization Function (TSF) specified in IEEE 802.11, “IEEE Standard for Information Technology—Telecommunications and Information Exchange Between Systems Local and Metropolitan Area Networks.” Every 802.11 compliant device in a network known as a basic service set (BSS) includes a TSF counter. Periodically, during a beacon interval, devices of the BSS transmit a beacon frame containing a timestamp indicating the local wall time at the transmitting device and other control information. A receiving node or slave device receiving the beacon frame synchronizes its local time by accepting the timing information in the beacon frame and setting its TSF counter to the value of the received timestamp if the timestamp indicates a wall time later than the node's TSF counter.
However, neither PTP nor TSF provides for synchronization of the media or sample clocks which control the rate at which application data is processed by the processing elements of the network's devices. The Audio Video Bridging Transport Protocol (AVBTP) of “IEEE 1722-2011: Layer 2 Transport Protocol for Time Sensitive Applications in a Bridged Local Area Network” provides that each network end point (a device that receives or transmits data) is to recover the sample clock from data in the packetized data stream transmitted by the data source. Each data packet comprises plural application data samples, for example, audio data samples, and a time stamp indicating the wall time at which presentation of the application data in the packet is to be initiated. At each network end point, for example, an audio speaker unit, a sample clock is generated which oscillates at a frequency that enables the plural application data samples in a data packet to be presented for processing within the time interval represented by successive timestamps.
While PTP, TSF and AVBTP provide means for synchronizing distributed clocks, not all packets transmitted by a network data source, particularly packets transmitted wirelessly, reach their destinations. As packets are lost, each network end point, for example, the plural speaker units of a surround sound audio system, receives a respective aliased subsample of the timestamps and over time the clocks of the respective network endpoints will not track. What is desired, therefore, are accurate consistently synchronized sample clocks at a plurality of related network endpoints.
Referring in detail to the drawings where similar parts are identified by like reference numerals, and, more particularly to
Synchronization of the various outputs and minimization of system latency are critical requirements of high quality audio/video systems. Source-to-output delay or latency (“lip-sync”) is important in audio/video systems, such as home theater systems, where a slight difference, on the order of 50 milliseconds (ms), between display of a video sequence and the output of the corresponding audio is noticeable. On the other hand, the human ear is even more sensitive to phase delay or channel-to-channel latency between the corresponding outputs of the different channels of multi-channel audio. Channel-to-channel latency greater than 1 microsecond (μs) can result in the perception of disjointed or blurry audio.
Audio video bridging (AVB) is the common name of a set of technical standards developed by the Institute of Electrical and Electronics Engineers (IEEE) and providing specifications for time-synchronized, low latency, streaming services over networks.
“IEEE 802.1AS-2011—IEEE Standard for Local and Metropolitan Area Networks—Timing and Synchronization for Time-Sensitive Applications in Bridged Local Area Networks” describes a system for synchronizing clocks distributed among the nodes of one or more networks of devices. Referring also to
Referring also to
When operation of a network is initiated, a master clock is selected either manually or by a “best master clock” algorithm. Afterward, messages are periodically exchanged between the device comprising the master clock (the “master device”) and the network devices comprising the slave clocks (the “slave devices”) enabling determination of the offset, the time by which a slave clock leads or lags the master clock, and the network delay, the time required for data packets to traverse the network. At defined intervals, by default two second intervals, the master device multicasts a Sync message 106 to the other network devices. The precise master clock wall time of the Sync message's transmission, t1, 108 is determined and included as a timestamp in either the Sync message or in a Follow-up message 110. The slave device determines the local wall time, t2, 112 at which the device received the Sync message. A Delay_Req message 114 is then sent by the slave device to the master device at time, t3, 116. The master clock's time of receipt, t4, 118 of the Delay_Req message 114 is determined and the master device responds with a Delay_Resp message 120 which includes a timestamp indicating t4, 118. The slave device determines the network delay and the slave clock's offset from the four times, t1, t2, t3 and t4:
Delay+Offset=t2−t1 (1)
Delay−Offset=t4−t3 (2)
Delay=((t2−t1)+(t4−t3))/2 (3)
Offset=((t2−t1)−(t4−t3))/2 (4)
Consecutive measurements of the offset also permit compensation for the slave clock's frequency drift. With the time and frequency drift determined, each slave clock is adjusted to match the wall time of the master clock by adding or subtracting the offset to or from the local wall time and adjusting the slave clock's frequency.
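By way of illustration only, equations (1) through (4) and the drift compensation described above can be sketched in Python. The function names, the microsecond units and the example timestamps are assumptions chosen for this sketch and do not appear in the cited standard.

```python
def ptp_delay_and_offset(t1, t2, t3, t4):
    """Return (delay, offset) from the four timestamps of equations (1)-(4).

    t1: master wall time when the Sync message was sent
    t2: slave wall time when the Sync message was received
    t3: slave wall time when the Delay_Req message was sent
    t4: master wall time when the Delay_Req message was received
    """
    forward = t2 - t1                 # Delay + Offset, equation (1)
    reverse = t4 - t3                 # Delay - Offset, equation (2)
    delay = (forward + reverse) / 2   # equation (3)
    offset = (forward - reverse) / 2  # equation (4)
    return delay, offset


def drift_estimate(offset_prev, offset_now, interval):
    """Fractional frequency drift of the slave clock from two consecutive offsets."""
    return (offset_now - offset_prev) / interval


# Hypothetical timestamps in microseconds: the slave clock leads the master
# by 40 us and the one-way network delay is 100 us.
delay, offset = ptp_delay_and_offset(
    t1=1_000_000, t2=1_000_140, t3=1_000_500, t4=1_000_560
)

# The slave corrects its wall time by subtracting the offset it leads by.
corrected_slave_time = 1_000_600 - offset
```

In this sketch a positive offset means the slave clock leads the master clock, consistent with equation (1).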
IEEE 802.11, “IEEE Standard for Information Technology—Telecommunications and Information Exchange Between Systems Local and Metropolitan Area Networks” provides media access control (MAC) and physical layer (PHY) specifications for implementing wireless local area networks (WLAN) referred to as basic service sets (BSS). The devices which are parts of a BSS are identified by a service set identification (SSID) which may be assigned or established by the device which starts the network. Each network device or station includes a local timing synchronization function (TSF) timer, the device's wall clock 56, 218, which is based on a 1 megahertz (MHz) clock which ticks in microseconds. During a beacon period all stations in an independent basic service set (IBSS) compete to transmit a beacon. Each station calculates a random delay interval and sets a delay timer scheduling transmission of a beacon when the timer expires. If a beacon arrives before the delay timer expires, the receiving station cancels its pending beacon transmission. The beacon comprises a beacon frame including a timestamp indicating the TSF timer value, the wall time, of the station that transmitted the beacon. Upon receiving a beacon, if the timestamp is later than the receiving station's TSF timer the receiving station sets its TSF timer, for example the wall clock 218, to the value of the timestamp thus synchronizing the TSF timers, the wall clocks, of the transmitting station and the receiving station.
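The timer adoption rule described above, under which a station accepts a beacon timestamp only when it is later than the local TSF timer, can be sketched as follows. The class and method names are hypothetical and model only the comparison rule, not the beacon contention or the frame format.

```python
class TsfTimer:
    """Hypothetical model of a station's TSF timer (1 MHz, counts microseconds)."""

    def __init__(self, value_us=0):
        self.value_us = value_us

    def tick(self, elapsed_us):
        """Advance the local timer by the elapsed number of microseconds."""
        self.value_us += elapsed_us

    def on_beacon(self, timestamp_us):
        """Adopt the beacon timestamp only if it is later than the local timer.

        Returns True when the local timer was updated.
        """
        if timestamp_us > self.value_us:
            self.value_us = timestamp_us
            return True
        return False


# A station whose timer lags adopts the later timestamp; an earlier one is ignored.
station = TsfTimer(value_us=1_000)
station.on_beacon(1_200)   # adopted, timer becomes 1200
station.on_beacon(1_100)   # ignored, timer stays at 1200
```

Because only later timestamps are adopted, the timers in this sketch converge toward the fastest clock among the stations.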
PTP and TSF are responsible for synchronizing the wall clocks of all nodes in the respective network to the same wall time but not for synchronizing the sample clocks controlling the processing of the various media transported by the network. The sample clocks are recovered from the data stream at each of the network's listeners, endpoints receiving the data stream, enabling different sample clocks for different media to be transported on the same network.
Referring to
Referring also to
In Ser. No. 14/186,852, the sample or media clock recovery provides, in essence, a distributed phase locked loop (PLL) for the network with identical sample clocks generated at each listener 54 processing a respective medium. As taught herein, rather than using PLL hardware, this application accomplishes these functions with an Estimator 308 and a Sample Rate Converter (SRC) code 310.
Ideally, AVB synchronizes the outputs of the network's listeners by delivering data to each end point's media interface, for example, a controller for a video display or the digital-to-analog converters (DAC) of plural wireless speakers, at the synchronized wall times specified in the timestamps and at a rate determined by synchronic sample clocks.
While ideally the sample clocks regulating the rendering of each medium and the wall clocks controlling presentation time are synchronized, packets are commonly lost in a wireless network and each receiver receives a respective aliased subsample of the data packets and accompanying timestamps conveyed in the data stream. Losses during packet transfer, clock jitter and resulting sample clock variation make it difficult to maintain the less than 50 μs channel-to-channel latency which is desired for high quality, multi-channel, surround sound audio. The inventor concluded that synchronicity in presenting related content at plural network endpoints would be promoted by introducing a frequency filtering function in the clock path at a sample clock recoverer enabling recovery of a band limited sample clock which is, in turn, copied to other listeners requiring the same sample clock, for example, plural surround speaker units of an audio/video system.
Referring also to
One of the network's endpoints is a talker 52 that receives application data, for example, audio and/or video data, from a source such as a digital video disk (DVD) player or a television set-top box, and transmits the data in a packetized serial data stream 154 to a plurality of listeners 54, for example, a wireless video display 156 and a plurality of wireless speakers 158, 160, 162A-162C for multichannel surround sound. Six channel surround sound audio systems, known as 5.1 (“five point one”) surround systems, utilize five full bandwidth channels: a front left channel, a front center channel, a front right channel and left and right surround channels, each reproduced by a corresponding speaker. In addition, the 5.1 surround sound system includes one low-frequency effects channel, the point one (0.1) channel, which is reproduced by a subwoofer.
Increasingly, manufacturers of home theater systems are adopting eight channel (7.1) surround sound and high end systems, such as an 11.1 surround sound system, are contemplated.
The talker 52 comprises a multiplexer/buffer (MUX/buffer) 162 which serially packetizes digitized analog audio/video data 174 output by a coder/decoder (codec) 164 or digital audio/video data 166 obtained from a digital data source. A clock divider 168, driven by phase locked loop (Type-II PLL) 170 and a crystal oscillator 172, outputs a sample clock 58, an alternating signal, which times the sampling of the analog audio/video data 174 by the codec 164. The sample clock 58 is also input to a timestamp generator 64 which, based on wall time 178 output by the talker's wall clock 56, produces a presentation timestamp 60 indicating the wall time for initiating presentation of application data in a data packet 62 and signals the MUX to insert the presentation timestamp into the header of the data packet. The sample clock 58 is also input to the MUX/buffer 162 to control the rate at which the MUX captures the data at its inputs and multiplexes the data to serial data packets containing plural application data samples, for example audio data samples. The serialized data packet is buffered and transmitted from the buffer to a radio transceiver/media access controller (MAC) 180. A bus interface clock signal 182 times the transmission of the packetized data from the MUX/buffer 162 to the radio transceiver and MAC. The media access controller (MAC) adds a media access address identifying the device that is to receive the data packet and the transceiver modulates the data packet with a carrier and transmits the radio frequency data stream 154 to appropriate network listeners, for example, a receiver of the video display 156 and the respective receivers of the surround sound audio speaker units 158, 160, 162A-162C.
AVB also provides for transmissions to a “bridge” 184 which may relay data transmitted by the talker 52 to a second network, including listener 186, and which acts as a slave clock to the talker network's grandmaster clock and as a master clock to the network, comprising listener 186, to which it retransmits the data.
Each of the network's listeners, for example, the speaker units 158, 160, 162A-162C of the surround sound system, receives packetized data 154 transmitted by the talker 52 to the listener's respective MAC address. However, particularly in a network comprising a wireless communication link, data packets may be lost so each listener may receive only an aliased subset of the transmission. In the network 150, one of the plural speaker units 158 is designated as the sample clock recoverer for the other speaker units 160, 162A-162C of the surround sound system. The radio transceiver and MAC unit 202 of the sample clock recoverer receives the stream of data packets addressed to speaker unit 158 and transmits them to a demultiplexer/buffer 204 where the data in the data packets are disassembled and buffered. The presentation timestamp 60 for each data packet is transmitted to a timestamp comparer 206.
The time interval 63 represented by successive time stamps is signaled to a Type-II PLL 208 by the timestamp comparer 206. In addition, the number of data blocks in a data packet, as specified in the data block count field in the packet header, is input to a counter 210 in a feedback loop of the PLL 208. Within the time interval represented by the difference between successive timestamps, the counter 210 in the feedback loop causes the PLL 208 to output an alternating signal, a raw recovered sample clock 79, with a respective clock edge 120 for each of the data blocks 66 included in the data packet. The raw recovered sample clock 79 is input to a low bandwidth Type-II PLL 211 which frequency filters the raw sample clock signal to eliminate jitter and produce a cleaner band limited recovered sample clock signal 80. Band limiting the recovered sample clock signal produces a signal with a frequency centered on the mean frequency of the raw signal, substantially reducing jitter in the sample clock signal so that the sample clocks of other listeners utilizing the recovered sample clock, for example, the other surround sound speaker units 160, 162A-162C, are more nearly identical to the recovered sample clock 80.
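The relationship described above, one clock edge per data block within the interval between successive timestamps followed by low bandwidth filtering, can be approximated numerically. The sketch below is not the PLL circuit itself: it substitutes a first-order low-pass filter for the low bandwidth Type-II PLL 211, and all names, rates and timestamps are assumptions made for illustration.

```python
import math


def raw_sample_rate(prev_timestamp_us, timestamp_us, block_count):
    """Raw recovered sample rate in Hz: data blocks per timestamp interval."""
    return block_count * 1e6 / (timestamp_us - prev_timestamp_us)


class LowPass:
    """First-order low-pass filter standing in for the low bandwidth PLL."""

    def __init__(self, bandwidth_hz, update_interval_s, initial):
        # Single-pole smoothing factor for the given cutoff and update period.
        self.alpha = 1.0 - math.exp(-2.0 * math.pi * bandwidth_hz * update_interval_s)
        self.state = initial

    def update(self, sample):
        """Move the filtered value a small step toward the new raw sample."""
        self.state += self.alpha * (sample - self.state)
        return self.state


# Hypothetical jittered presentation timestamps (us) for packets that each
# carry 48 data blocks, nominally 1 ms apart (a 48 kHz sample clock).
timestamps_us = [0, 980, 2010, 2995, 4000]
lowpass = LowPass(bandwidth_hz=1.25, update_interval_s=0.001, initial=48_000.0)
filtered = []
for prev, cur in zip(timestamps_us, timestamps_us[1:]):
    raw = raw_sample_rate(prev, cur, block_count=48)
    filtered.append(lowpass.update(raw))
```

With a bandwidth this low, each jittered raw value moves the filtered rate only slightly, so the filtered rate stays close to the mean of the raw rate.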
The bandlimited recovered sample clock 80 is also input to a clock divider 212 which outputs a bus clock signal 214 to the buffer/DEMUX 204. The recovered sample clock signal 80 is transmitted to the buffer/DEMUX 204 and to the digital-to-analog converter (DAC) 216 to control the processing rate for the audio data samples contained in the data packets. The timestamp comparer 206 compares the timestamp in the data packet to the wall time of the slave clock 218 of the speaker unit and appropriately outputs a signal to the DAC when the DAC is to initiate converting the respective digital audio data, at the rate established by the recovered sample clock 80, to an analog signal 86 which controls the operation of the speaker 220.
The bandlimited recovered sample clock 80 is transmitted to the transceiver and MAC 202 of the sample clock recoverer 158 where it is modulated with a carrier and transmitted to other surround sound speaker units 160, 162A-162C. Anticipating packet loss and satisfying the Nyquist sampling criterion, the rate at which the sample clock information is updated at the other speakers is set to at least twice the limiting bandwidth of the recovered sample clock. For example, if the timing at a speaker is updated every 100 ms and the peak packet error rate (PER) is 75%, the low bandwidth PLL 211 of the sample clock recoverer 158 is set to no more than 1.25 Hz. The modulated recovered sample clock signal is received by the transceiver and MAC 240 of the other speaker unit(s) where it is input to the respective buffer/DEMUX units 242 and transmitted to the DAC 244 to control the rate at which audio data samples in data packets addressed to the respective MAC by the talker 52 are processed. The timestamps 60 in the data stream are separated from the application data and compared to the synchronized local wall time 246 by the timestamp comparer 248 which signals the buffer/DEMUX to input the application data from the respective data packet to the DAC 244 for presentation by the speaker.
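The arithmetic behind the 1.25 Hz figure can be made explicit. An update every 100 ms is a nominal rate of 10 Hz; with 75% of packets lost, only a quarter of the updates arrive, leaving an effective rate of 2.5 Hz, and the Nyquist criterion limits the bandwidth to half of that. The function name below is hypothetical.

```python
def max_clock_bandwidth_hz(update_interval_s, peak_packet_error_rate):
    """Largest filter bandwidth that the surviving timing updates can support."""
    nominal_rate_hz = 1.0 / update_interval_s
    # Only the packets that survive the peak error rate carry timing updates.
    effective_rate_hz = nominal_rate_hz * (1.0 - peak_packet_error_rate)
    # Nyquist: the update rate must be at least twice the limiting bandwidth.
    return effective_rate_hz / 2.0


# The example from the text: 100 ms updates with a 75% peak packet error rate.
limit_hz = max_clock_bandwidth_hz(update_interval_s=0.100, peak_packet_error_rate=0.75)
```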
Alternatively, the bandlimited sample clock may be recovered without frequency filtering the raw sample clock output. Referring also to
Alternatively, as illustrated in
By introducing a frequency filtering function in the clock path with the low bandwidth PLL 211 and/or the low bandwidth PLL 219, jitter is removed from the sample clock and/or the wall clock, substantially reducing aliasing and improving the synchronization of the outputs of the network.
In an alternate embodiment shown in
From this Estimator, the Frequency and Delay Coefficients are generated which then can be applied to the Counter value at any time to generate the filtered TSF value, as shown in
This filtered TSF value is used at the audio talker to generate the Presentation Time of the block of audio data that is transmitted. The block size is set to be some multiple of the audio data interleaver length.
At the audio listener the Presentation Time and the beacon TSF values are estimated to the local audio clock by the same method used at the audio transmitter source. The audio is then resampled to the local audio clock rather than use a PLL to generate the clock, as shown in
The frequency and delay coefficients are combined (Frequency Coefficients are multiplied and Delay Coefficients are added) to make the Play Time of each block. The Play Time is used to generate the sample timing for the resampler. An example of this sample timing is shown in
At the start of the audio playback the sample spacing is assumed to be ideal, and at the beginning of each new block the Play Time is compared to the elapsed time to that point. The residual error from the previous block is then compensated for in the new block by dividing the previous error equally over all samples of the new block or alternatively future blocks recursively. This allows the audio playback to start immediately without having to wait for the next Play Time to determine the exact sample spacing.
In prior applications, a hardware methodology for recovering and filtering audio timing was presented. The key circuit element in the prior approach, which is not available on most generic processor SoCs, is a low bandwidth phase locked loop (PLL). This PLL was used twice: to filter the beacon-based Time Synchronization Function (TSF) value and to generate the output audio clock (see 208 and 210). For a soft implementation, these two hardware functions are converted to an Estimator (308) and a Sample Rate Converter (SRC) code (310).
In the present application, the TSF PLL is replaced with an Estimator (308) which measures and filters the TSF value against the internal Counter/Timer of the audio subsystem processor. The Estimator is presented with TSF and Counter pairs sampled at the same instant so that the frequency and delay relationship between the two can be determined.
An embodiment of this system is illustrated in
From this Estimator (308), the Frequency and Delay Coefficients are generated which then can be applied to the Counter value at any time to generate the filtered TSF value, as shown in
This filtered TSF value is used at the audio talker to generate the Presentation Time of the block of audio data that is transmitted. The block size is set to be some multiple of the audio data interleaver length.
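One way to picture the Estimator is as a straight-line fit through the (Counter, TSF) pairs, with the slope serving as the Frequency Coefficient and the intercept as the Delay Coefficient. The specification does not state which estimation method is used, so the least-squares fit below is an assumption, as are the function names and the sample values.

```python
def estimate_coefficients(pairs):
    """Least-squares fit tsf ~= frequency * counter + delay over (counter, tsf) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two (counter, tsf) pairs")
    mean_c = sum(c for c, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    var_c = sum((c - mean_c) ** 2 for c, _ in pairs)
    if var_c == 0:
        raise ValueError("counter values must not all be equal")
    cov = sum((c - mean_c) * (t - mean_t) for c, t in pairs)
    frequency = cov / var_c                  # Frequency Coefficient (slope)
    delay = mean_t - frequency * mean_c      # Delay Coefficient (intercept)
    return frequency, delay


def filtered_tsf(counter, frequency, delay):
    """Apply the coefficients to a Counter value to obtain the filtered TSF value."""
    return frequency * counter + delay


# Hypothetical pairs sampled at the same instants; the TSF readings carry jitter.
pairs = [(0, 500.0), (1000, 1500.2), (2000, 2499.9), (3000, 3500.1)]
frequency, delay = estimate_coefficients(pairs)
tsf_now = filtered_tsf(3500, frequency, delay)
```

Because the fit averages over many pairs, a single jittered or missing beacon shifts the coefficients only slightly, which is the filtering effect the hardware PLL previously provided.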
At the audio listener, the Presentation Time and the beacon TSF values are estimated to the local audio clock by the same method used at the audio transmitter source. The audio is then resampled to the local audio clock, rather than using a PLL to generate the clock.
As shown in
At the start of the audio playback, the sample spacing is assumed to be ideal, and at the beginning of each new block, the Play Time is compared to the elapsed time to that point. The residual error from the previous block is distributed over the next block or alternatively future blocks recursively. This allows the audio playback to start immediately without having to wait for the next Play Time to determine the exact sample spacing.
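The error distribution described above can be sketched as follows, covering only the variant in which the whole residual error is spread equally over the samples of the next block. The function name, the units and the example Play Times are assumptions for illustration.

```python
def block_sample_spacings(play_times, block_size, ideal_spacing):
    """Return one sample spacing per block, spreading residual error over the block.

    play_times[k] is the Play Time at which block k should start, in the same
    time units as ideal_spacing.
    """
    spacings = []
    elapsed = play_times[0]  # playback starts immediately at the first Play Time
    for play_time in play_times:
        error = play_time - elapsed              # residual error from earlier blocks
        spacing = ideal_spacing + error / block_size
        spacings.append(spacing)
        elapsed += spacing * block_size          # time consumed by this block
    return spacings


# Hypothetical blocks of 100 samples with an ideal spacing of 10 us, where the
# Play Times arrive 1001 us apart instead of the ideal 1000 us.
spacings = block_sample_spacings([0.0, 1001.0, 2002.0, 3003.0], 100, 10.0)
```

The first block uses the ideal spacing because no error has been measured yet; each later block stretches its spacing slightly to absorb the 1 us shortfall left by the block before it.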
One embodiment of this method of recovering audio timing in a network includes bandlimiting a raw wall time signal about a mean frequency of the raw wall time signal by estimating the frequency and delay of the raw wall time signal to a local transmitter audio clock; including a timing datum in a data packet about the mean frequency of the timing signal from the estimate of the frequency and delay of the timing signal and the audio play time at the transmitter; bandlimiting the raw wall time signal about the mean frequency of the raw wall time signal by estimating the frequency and delay of said raw wall time signal to a local receiver audio clock; combining both the wall time estimation and the received timing datum estimation from a packet to generate an audio play time at the receiver; and using the play time to generate coefficients for a polynomial interpolator that resamples the audio to the receiver audio clock.
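As a rough illustration of the final step, a polynomial interpolator can be realized with cubic Lagrange weights computed from the fractional position of each output sample. The specification does not name the polynomial order or form, so the cubic Lagrange choice, the function names and the `ratio` parameter (input samples advanced per output sample, assumed here to be derived from the play time) are all assumptions.

```python
def lagrange_coefficients(frac):
    """Cubic Lagrange weights for input samples at offsets -1, 0, 1, 2.

    frac is the fractional position between offsets 0 and 1, in [0, 1).
    """
    d = frac
    return (
        -d * (d - 1.0) * (d - 2.0) / 6.0,
        (d + 1.0) * (d - 1.0) * (d - 2.0) / 2.0,
        -(d + 1.0) * d * (d - 2.0) / 2.0,
        (d + 1.0) * d * (d - 1.0) / 6.0,
    )


def resample(samples, ratio):
    """Read the input at positions 1, 1 + ratio, 1 + 2 * ratio, ... by interpolation."""
    out = []
    pos = 1.0  # first position with a full four-sample neighbourhood
    while pos < len(samples) - 2:
        i = int(pos)
        weights = lagrange_coefficients(pos - i)
        out.append(sum(w * samples[i - 1 + k] for k, w in enumerate(weights)))
        pos += ratio
    return out


# Hypothetical use: a ramp resampled at half-sample steps.
ramp = [float(n) for n in range(10)]
half_steps = resample(ramp, 0.5)
```

A ratio slightly above or below one would correspond to a receiver audio clock that runs slightly slower or faster than the transmitter's clock.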
In another embodiment, an audio system is provided, where the audio system comprises a first listener having a wall clock maintaining a wall time for occurrence of an event, said wall time updated by data transmitted by a master clock and received by said first listener; a sample clock recoverer retrieving a sample clock from data received by said first listener from a talker, said sample clock regulating a processing rate for a datum included in said data received from said talker; and a frequency filter bandlimiting said sample clock output by said sample clock recoverer, wherein said frequency filter comprises an estimator connected to receive said sample clock output by said sample clock recoverer; and a transmitter transmitting said bandlimited sample clock; and the audio system further comprises a second listener arranged to receive said transmitted bandlimited sample clock and use said bandlimited sample clock to regulate processing data received from said talker. Such a system may include that first listener is a frequency filter attenuating a frequency of said updating data received from said master clock. Such a system may include that the updating data received from said master clock is a timing synchronization function datum or a precision time protocol datum.
The detailed description, above, sets forth numerous specific details to provide a thorough understanding of the present invention. However, those skilled in the art will appreciate that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuitry have not been described in detail to avoid obscuring the present invention.
All the references cited herein are incorporated by reference. The terms and expressions that have been employed in the foregoing specification are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims that follow.
Claims
1. A method of recovering a sample clock in a network, the method comprising the steps of:
- (a) generating a raw sample clock signal oscillating at a frequency determined by a datum included in a data packet; and
- (b) bandlimiting the raw sample clock signal about the mean frequency of the raw sample clock signal by estimating the raw sample clock signal.
2. A method of recovering a sample clock in a network, the method comprising the steps of:
- (a) bandlimiting a raw wall time signal about a mean frequency of said raw wall time signal by estimating the raw wall time signal; and
- (b) generating a sample clock signal oscillating at a frequency determined by a datum included in a data packet and the bandlimited raw wall time signal.
Type: Application
Filed: Mar 2, 2020
Publication Date: Jun 25, 2020
Applicant: SUMMIT WIRELESS TECHNOLOGIES, INC. (Beaverton, OR)
Inventor: Kenneth A. Boehlke (Portland, OR)
Application Number: 16/806,200