APPARATUS, SYSTEM AND METHOD FOR MERGING CODE LAYERS FOR AUDIO ENCODING AND DECODING
Apparatus, system and method for encoding and decoding ancillary code for digital audio, where multiple encoding layers are merged. The merging allows a greater number of ancillary codes to be embedded into the encoding space, and further introduces efficiencies in the encoding process.
The present disclosure relates to audio encoding and decoding for determining characteristics of media data. More specifically, the present disclosure relates to techniques for embedding data into audio for audience measurement purposes.
BACKGROUND INFORMATION
There has been considerable interest in monitoring the use of mobile terminals, such as smart phones, tablets, laptops, etc. for audience measurement and/or marketing purposes. In the area of media exposure monitoring, ancillary audio codes have shown themselves to be particularly effective in assisting media measurement entities to determine and establish media exposure data. One technique for encoding and detecting ancillary audio codes is based on Critical Band Encoding Technology (CBET), pioneered by Arbitron Inc., which is currently being used in conjunction with special-purpose Personal People Meters (PPM™) to detect codes via ambient encoded audio.
Conventional CBET encoding and decoding is based on multiple layers, where message code symbols are encoded into separate parallel encoding layers, resulting in tens of thousands of possible codes that may be used to identify and/or characterize media. While such configurations have proven to be advantageous, thousands of codes may not be sufficient to identify and/or characterize larger media collections, which may number in the millions or billions. Accordingly, techniques are needed to be able to include much larger amounts of code data within audio. Also, techniques are needed to be able to merge or “fold” encoding layers so that more efficient coding may be enabled.
BRIEF SUMMARY
Under one exemplary embodiment, a method is disclosed for encoding audio data with a message structure comprising a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data. The disclosed method comprises the steps of providing data defining the message symbols for the message structure, and encoding the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data, wherein the message structure as encoded is arranged within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
Under another exemplary embodiment, an encoder is disclosed for encoding audio data with a message structure, where the message structure comprises a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data. The disclosed encoder comprises a first encoder portion configured to provide data defining the message symbols for the message structure, and a second encoder portion configured to encode the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data, wherein the second encoder portion is configured to arrange the message structure within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
Under yet another exemplary embodiment, a computer program product is disclosed, comprising a tangible, non-transitory computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to encode audio data with a message structure, said message structure comprising a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data. The disclosed computer program product, when executed, comprises the steps of providing data defining the message symbols for the message structure, and encoding the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data, the message structure as encoded being arranged within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
A mobile terminal as used herein comprises at least one wireless communications transceiver. Non-limiting examples of the transceivers include a GSM (Global System for Mobile Communications) transceiver, a GPRS (General Packet Radio Service) transceiver, an EDGE (Enhanced Data rates for Global Evolution) transceiver, a UMTS (Universal Mobile Telecommunications System) transceiver, a WCDMA (wideband code division multiple access) transceiver, a PDC (Personal Digital Cellular) transceiver, a PHS (Personal Handy-phone System) transceiver, and a WLAN (Wireless LAN, wireless local area network) transceiver. The transceiver may be such that it is configured to co-operate with a predetermined communications network (infrastructure), such as the transceivers listed above. The network may further connect to other networks and provide versatile switching means for establishing circuit switched and/or packet switched connections between the two end points. Additionally, the device may include a wireless transceiver such as a Bluetooth adapter meant for peer-to-peer communication and piconet/scatternet use. Furthermore, the terminal may include interface(s) for wired connections and associated communication relative to external entities, such as a USB (Universal Serial Bus) interface or a Firewire interface.
Turning to
The audio data then enters encoder 121 from communications interface 120. In encoder 121, in one mode of operation the audio data is encoded with multiple messages that share substantially single-frequency components. In another, the audio data as received by encoder 121 has a message encoded therein and encoder 121 encodes one or more additional messages in the audio data. The encoded audio data is then communicated via a communication interface 122. The communication interface 122 can come in any of multiple forms such as radio broadcasts, television broadcasts, DVDs, MP3s, compact discs, streaming music, streaming video, network data, mini-discs, multimedia presentations, personal address systems or the like. Decoder 123 then receives the communicated encoded audio data. Decoder 123 may be embodied as part of a receiver, a personal people meter, a computer device, or portable processing device, discussed in further detail below.
Decoder 123 is configured to detect encoded messages. As a result of the ability to retrieve the encoded messages, decoder 123 can therefore possess a myriad of functionality such as the relaying of information, e.g. providing the performing artist's name or providing audience estimating information, or controlling access, e.g. an encryption key scheme, or data transport, e.g. using the encoded messages as an alternate communications channel. Decoder 123 can possess the ability to reproduce the audio data but this is not essential. For example, a decoder 123 used for gathering audience estimate data can receive the audio data in acoustic form, in electrical form or otherwise from a separate receiver. In the case of an encryption key scheme, the reproduction of the audio data for an encryption key holder is the objective.
Operation 132 may be configured to assign a plurality of substantially single-frequency code components to each of the message symbols. When the message is encoded, each symbol of the message is represented in the audio data by its corresponding plurality of substantially single-frequency code components. Each of such code components preferably occupies only a narrow frequency band so that it may be distinguished from other such components as well as noise with a sufficiently low probability of error. It is recognized that the ability of an encoder or decoder to establish or resolve data in the frequency domain is limited, so that the substantially single-frequency components are represented by data within some finite or narrow frequency band. Moreover, there are circumstances in which it is advantageous to regard data within a plurality of frequency bands as corresponding to a substantially single-frequency component. This technique is useful where, for example, the component may be found in any of several adjacent bands due to frequency drift, variations in the speed of a tape or disk drive, or even as the result of an incidental or intentional frequency variation inherent in the design of a system.
Once block 130 prepares symbols for encoding, they may be arranged as messages that may be separately or simultaneously embedded into audio using multiple layers. Some exemplary processes for embedding such messages are described in U.S. Pat. No. 6,845,360, titled “Encoding Multiple Messages In Audio Data and Detecting Same,” which is assigned to the assignee of the present application and is incorporated by reference in its entirety herein. In certain embodiments, several message parameters may be selected singly or in combination in order to ensure that the first and second messages can be separately decoded. Block 135 represents multiple operations which serve to determine parameters of the message to be encoded either to distinguish it from a message previously encoded in the audio data or from one or more further messages also being encoded therein at the same time. One such parameter is the symbol interval, selected in operation 138 of
Operation 137 of
In one example, decoder 213 may be configured as software tangibly embodied in memory 208, which may communicate with other software in memory 208 and CPU 201, as well as audio circuitry 206, and serves to decode ancillary data embedded in audio signals in order to detect exposure to media. Examples of techniques for encoding and decoding such ancillary data are disclosed in U.S. Pat. No. 6,871,180, titled “Decoding of Information in Audio Signals,” issued Mar. 22, 2005, which is incorporated by reference in its entirety herein. Other suitable techniques for encoding data in audio data are disclosed in U.S. Pat. No. 7,640,141 to Ronald S. Kolessar and U.S. Pat. No. 5,764,763 to James M. Jensen, et al., which are incorporated by reference in their entirety herein. Other appropriate encoding techniques are disclosed in U.S. Pat. No. 5,579,124 to Aijala, et al., U.S. Pat. Nos. 5,574,962, 5,581,800 and 5,787,334 to Fardeau, et al., and U.S. Pat. No. 5,450,490 to Jensen, et al., each of which is assigned to the assignee of the present application and all of which are incorporated herein by reference in their entirety.
An audio signal which may be encoded with a plurality of code symbols may be received via data communication through RF interface 205 via audio circuitry 206, or through any other data interface allowing for the receipt of audio/visual data in digital form. Audio signals may also be received via microphone 222. Furthermore, encoded audio signals may be reproduced on device 200 through digital files stored in memory 208 and executed through one or more applications (214) stored in memory 208 such as a media player that is linked to audio circuitry 206. From the following description in connection with the accompanying drawings, it will be appreciated that decoder 213 is capable of detecting codes in addition to those arranged in the formats disclosed hereinabove. Memory 208 may also include high-speed random access memory (RAM) and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to memory 208 by other components of the device 200, such as processor 203, decoder 213 and peripherals interface 204, may be controlled by the memory controller 202. Peripherals interface 204 couples the input and output peripherals of the device to the processor 203 and memory 208. The one or more processors 203 run or execute various software programs and/or sets of instructions stored in memory 208 to perform various functions for the device 200 and to process data. In some embodiments, the peripherals interface 204, processor(s) 203, decoder 213 and memory controller 202 may be implemented on a single chip, such as a chip 201. In some other embodiments, they may be implemented on separate chips.
The RF (radio frequency) circuitry 205 receives and sends RF signals, also known as electromagnetic signals. The RF circuitry 205 converts electrical signals to/from electromagnetic signals and communicates with communications networks and other communications devices via the electromagnetic signals. The RF circuitry 205 may include well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a subscriber identity module (SIM) card, memory, and so forth. RF circuitry 205 may communicate with networks, such as the Internet, also referred to as the World Wide Web (WWW), an intranet and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN), and other devices by wireless communication. The wireless communication may use any of a plurality of communications standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), high-speed downlink packet access (HSDPA), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), voice over Internet Protocol (VoIP), Wi-MAX, a protocol for email (e.g., Internet message access protocol (IMAP) and/or post office protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), and/or Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.
Audio circuitry 206, speaker 221, and microphone 222 provide an audio interface between a user and the device 200. Audio circuitry 206 receives audio data from the peripherals interface 204, converts the audio data to an electrical signal, and transmits the electrical signal to speaker 221. The speaker 221 converts the electrical signal to human-audible sound waves. Audio circuitry 206 also receives electrical signals converted by the microphone 222 from sound waves, which may include encoded audio, described above. The audio circuitry 206 converts the electrical signal to audio data and transmits the audio data to the peripherals interface 204 for processing. Audio data may be retrieved from and/or transmitted to memory 208 and/or the RF circuitry 205 by peripherals interface 204. In some embodiments, audio circuitry 206 also includes a headset jack for providing an interface between the audio circuitry 206 and removable audio input/output peripherals, such as output-only headphones or a headset with both output (e.g., a headphone for one or both ears) and input (e.g., a microphone).
I/O subsystem 211 couples input/output peripherals on the device 200, such as touch screen 215 and other input/control devices 217, to the peripherals interface 204. The I/O subsystem 211 may include a display controller 218 and one or more input controllers 220 for other input or control devices. The one or more input controllers 220 receive/send electrical signals from/to other input or control devices 217. The other input/control devices 217 may include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slider switches, joysticks, click wheels, and so forth. In some alternate embodiments, input controller(s) 220 may be coupled to any (or none) of the following: a keyboard, infrared port, USB port, and a pointer device such as a mouse, an up/down button for volume control of the speaker 221 and/or the microphone 222. Touch screen 215 may also be used to implement virtual or soft buttons and one or more soft keyboards.
Touch screen 215 provides an input interface and an output interface between the device and a user. The display controller 218 receives and/or sends electrical signals from/to the touch screen 215. Touch screen 215 displays visual output to the user. The visual output may include graphics, text, icons, video, and any combination thereof (collectively termed “graphics”). In some embodiments, some or all of the visual output may correspond to user-interface objects. Touch screen 215 has a touch-sensitive surface, sensor or set of sensors that accepts input from the user based on haptic and/or tactile contact. Touch screen 215 and display controller 218 (along with any associated modules and/or sets of instructions in memory 208) detect contact (and any movement or breaking of the contact) on the touch screen 215 and convert the detected contact into interaction with user-interface objects (e.g., one or more soft keys, icons, web pages or images) that are displayed on the touch screen. In an exemplary embodiment, a point of contact between the touch screen 215 and the user corresponds to a finger of the user. Touch screen 215 may use LCD (liquid crystal display) technology, or LPD (light emitting polymer display) technology, although other display technologies may be used in other embodiments. Touch screen 215 and display controller 218 may detect contact and any movement or breaking thereof using any of a plurality of touch sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with the touch screen 215.
Device 200 may also include one or more sensors 216 such as optical sensors that comprise charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) phototransistors. The optical sensor may capture still images or video, where the sensor is operated in conjunction with touch screen display 215. Device 200 may also include one or more accelerometers 207, which may be operatively coupled to peripherals interface 204. Alternately, the accelerometer 207 may be coupled to an input controller 220 in the I/O subsystem 211. The accelerometer is preferably configured to output accelerometer data in the x, y, and z axes.
In some embodiments, the software components stored in memory 208 may include an operating system 209, a communication module 210, a text/graphics module 211, a Global Positioning System (GPS) module 212, audio decoder 213 and applications 214. Operating system 209 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components. Communication module 210 facilitates communication with other devices over one or more external ports and also includes various software components for handling data received by the RF circuitry 205. An external port (e.g., Universal Serial Bus (USB), Firewire, etc.) may be provided and adapted for coupling directly to other devices or indirectly over a network (e.g., the Internet, wireless LAN, etc.).
Text/graphics module 211 includes various known software components for rendering and displaying graphics on the touch screen 215, including components for changing the intensity of graphics that are displayed. As used herein, the term “graphics” includes any object that can be displayed to a user, including without limitation text, web pages, icons (such as user-interface objects including soft keys), digital images, videos, animations and the like. Additionally, soft keyboards may be provided for entering text in various applications requiring text input. GPS module 212 determines the location of the device and provides this information for use in various applications. Applications 214 may include various modules, including address books/contact list, email, instant messaging, video conferencing, media player, widgets, camera/image management, and the like. Examples of other applications include word processing applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.
Returning briefly to the example of
Turning to
Media layer 303 may be configured to provide application layer 304 with audio, video, animation and graphics capabilities. As with the other layers comprising the stack of
Core services layer 302 comprises fundamental system services that all applications use, and also provides interfaces that use object-oriented abstractions for working with network protocols and for providing control over the protocol stack, and that simplify the use of lower-level constructs such as BSD sockets. Functions of core services layer 302 simplify tasks such as communicating with FTP and HTTP servers or resolving DNS hosts. Core OS layer 301 is the deepest layer of the architecture of
Turning to
During one exemplary mode of operation, which will be discussed in greater detail below, the audio portion of media played using media player 401 is stored and/or forwarded to decoder application 402. Using one or more techniques described herein below, decoder 402 processes the audio portion to detect if ancillary codes are present within the audio. If present, the ancillary codes are read, stored, and ultimately transmitted to a remote or central location (114) where the codes may be further processed to determine characteristics (e.g., identification, origin, etc.) of the media and further determine media exposure for a user associated with a device (200) for audience measurement purposes.
With regard to encoding/decoding audio,
When utilizing a multi-layered message, a plurality of layers may be present in an encoded data stream, and each layer may be used to convey different data. Turning to
The second layer 502 of message 500 is illustrated having a similar configuration to layer 501, where each symbol set includes two synchronization symbols 509, 511, a larger number of data symbols 510, 512, and time code symbols 513. The third layer 503 includes two synchronization symbols 514, 516, and a larger number of data symbols 515, 517. The data symbols in each symbol set for the layers (501-503) should preferably have a predefined order and be indexed (e.g., 1, 2, 3). The code components of each symbol in any of the symbol sets should preferably have selected frequencies that are different from the code components of every other symbol in the same symbol set. Under one embodiment, none of the code component frequencies used in representing the symbols of a message in one layer (e.g., Layer1 501) is used to represent any symbol of another layer (e.g., Layer2 502). In another embodiment, some of the code component frequencies used in representing symbols of messages in one layer (e.g., Layer3 503) may be used in representing symbols of messages in another layer (e.g., Layer1 501). However, in this embodiment, it is preferable that “shared” layers have differing formats (e.g., Layer3 503, Layer1 501) in order to assist the decoder in separately decoding the data contained therein.
Sequences of data symbols within a given layer are preferably configured so that each sequence is paired with the other and is separated by a predetermined offset. Thus, as an example, if data 505 contains code 1, 2, 3 having an offset of “2”, data 507 in layer 501 would be 3, 4, 5. Since the same information is represented by two different data symbols that are separated in time and have different frequency components (frequency content), the message may be diverse in both time and frequency. Such a configuration is particularly advantageous where interference would otherwise render data symbols undetectable. Under one embodiment, each of the symbols in a layer has a duration (e.g., 0.2-0.8 sec) that matches other layers (e.g., Layer1 501, Layer2 502). In another embodiment, the symbol duration may be different (e.g., Layer 2 502, Layer 3 503). During a decoding process, the decoder detects the layers and reports any predetermined segment that contains a code.
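The offset pairing described above can be sketched as follows. This is a minimal illustration only, assuming that symbols are represented as small integers and that the offset is applied by plain addition as in the 1, 2, 3 example; the function names `paired_sequence` and `recover_offset` are hypothetical and do not appear in the disclosure.

```python
def paired_sequence(data, offset):
    """Return the second sequence of a pair: every symbol shifted by a fixed offset."""
    return [symbol + offset for symbol in data]


def recover_offset(first, second):
    """Return the common offset between two paired sequences, or None if they disagree.

    A decoder could use such a consistency check to confirm that both
    sequences carry the same information (hypothetical helper).
    """
    if len(first) != len(second) or not first:
        return None
    offsets = {b - a for a, b in zip(first, second)}
    return offsets.pop() if len(offsets) == 1 else None


first = [1, 2, 3]
second = paired_sequence(first, 2)
print(second)                         # [3, 4, 5]
print(recover_offset(first, second))  # 2
```

Because the two sequences differ by a known constant, a decoder that loses one symbol to interference still has a second, differently placed representation of the same information.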
The merged layer may be thought of as a process for encoding different layers of information at different points in the total audio chain as a unified layer such that multiple different message elements can be distinguished through detection observations. In certain (non-merged) applications, different code layers for audio are encoded at different physical locations (e.g., national broadcaster, local broadcaster, commercial distribution center, etc.) at different times. Since the encoding sites/locations may be widely separated in both time and location, the encoding of the layers is inherently asynchronous: messages on different layers have no set time relationship. By merging or “folding” the layers, multiple layers of information are permitted to exist; instead of using multiple different layers encoded at different points in the total audio chain to convey different message attributes (e.g., station identification), multiple different layers of information are combined in a time synchronous manner to create a message attribute or unified information set in one layer. Also, since the merged layer provides a more diverse platform for inserting codes, the number of different codes that may be used expands from tens of thousands of codes to billions. Such a configuration is particularly advantageous for use in non-linear media measurement and “on-demand” media.
In one embodiment, the merged or “folded” layers may comprise one layer similar to Layer 3 503 of
- The ability to simultaneously encode/decode multiple layers of information uses the same input processes up through the computationally expensive FFTs. This makes the encoding process more efficient.
- The simultaneous encoding of multiple layers of information is less audible than serial encoding since prior layer artifacts may be totally removed.
- Marker redundancy can be reduced allowing more symbols to be dedicated to data and error correction.
- The total number of available identification or characteristic codes can be greatly increased.
- Error detection and correction across the folded layers is enabled, which improves the detection process through the reduction of false positives and erroneous detections.
In the simplified embodiment of
    (S4, S5, S6) = (S1, S2, S3) + offset1
and
    (S10, S11, S12) = (S7, S8, S9) + offset2
where
    M1 = marker 1 (covering 17 data symbols),
and
    M2 = marker 2 (covering 16 data symbols).
For encoding of checksums M3 and M4, the messages are structured such that
    Checksum1 = S1 + S2 + S3 (modulo 16)
and
    Checksum2 = S7 + S8 + S9 (modulo 16)
where
    M3 = Checksum1 + Checksum2 (modulo 16),
and
    M4 = Checksum1 − Checksum2 (modulo 16).
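The symbol and checksum relationships above can be worked through in a short sketch. It is offered only as an illustration under stated assumptions: symbols are taken to be integers in the range 0 to 15, and the offset additions are assumed to wrap modulo 16 so that the shifted symbols stay in that range (the text states the modulus only for the checksums). The names `build_merged_message` and `checksums_valid` are hypothetical.

```python
MODULUS = 16  # stated in the text for the checksums


def build_merged_message(s1_to_s3, s7_to_s9, offset1, offset2):
    """Derive S4-S6, S10-S12 and the checksum symbols M3, M4 from the base symbols."""
    first = list(s1_to_s3)
    second = list(s7_to_s9)
    # Assumption: offset sums wrap modulo 16 so every symbol stays in 0..15.
    first_shifted = [(s + offset1) % MODULUS for s in first]
    second_shifted = [(s + offset2) % MODULUS for s in second]
    checksum1 = sum(first) % MODULUS
    checksum2 = sum(second) % MODULUS
    return {
        "S1-S3": first,
        "S4-S6": first_shifted,
        "S7-S9": second,
        "S10-S12": second_shifted,
        "M3": (checksum1 + checksum2) % MODULUS,
        "M4": (checksum1 - checksum2) % MODULUS,
    }


def checksums_valid(message):
    """Recompute M3 and M4 from the detected data symbols and compare."""
    checksum1 = sum(message["S1-S3"]) % MODULUS
    checksum2 = sum(message["S7-S9"]) % MODULUS
    return (message["M3"] == (checksum1 + checksum2) % MODULUS
            and message["M4"] == (checksum1 - checksum2) % MODULUS)


message = build_merged_message((1, 2, 3), (4, 5, 6), offset1=2, offset2=3)
print(message["M3"], message["M4"])  # 5 7
print(checksums_valid(message))      # True
```

In this example Checksum1 is 6 and Checksum2 is 15, so M3 = 21 mod 16 = 5 and M4 = −9 mod 16 = 7. Because M3 and M4 each depend on the data symbols of both halves of the message, an erroneous detection in either half changes both checksum symbols.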
Turning to
For received audio signals in the time domain, decoder 350 transforms such signals to the frequency domain by means of function 356. Function 356 preferably is performed by a digital processor implementing a fast Fourier transform (FFT) although a discrete cosine transform, a chirp transform or a Winograd transform algorithm (WFTA) may be employed in the alternative. Any other time-to-frequency-domain transformation function providing the necessary resolution may be employed in place of these. It will be appreciated that in certain implementations, function 356 may also be carried out by filters, by an application specific integrated circuit, or any other suitable device or combination of devices. Function 356 may also be implemented by one or more devices which also implement one or more of the remaining functions illustrated in
The frequency domain-converted audio signals are processed in a symbol values derivation function 360, to produce a stream of symbol values for each code symbol included in the received audio signal. The produced symbol values may represent, for example, signal energy, power, sound pressure level, amplitude, etc., measured instantaneously or over a period of time, on an absolute or relative scale, and may be expressed as a single value or as multiple values. Where the symbols are encoded as groups of single frequency components each having a predetermined frequency, the symbol values preferably represent either single frequency component values or one or more values based on single frequency component values. Function 360 may be carried out by a digital processor, which advantageously carries out some or all of the other functions of decoder 350. However, the function 360 may also be carried out by an application specific integrated circuit, or by any other suitable device or combination of devices, and may be implemented by apparatus apart from the means which implement the remaining functions of the decoder 350.
The stream of symbol values produced by the function 360 is accumulated over time in an appropriate storage device on a symbol-by-symbol basis, as indicated by function 366. In particular, function 366 is advantageous for use in decoding encoded symbols which repeat periodically, by periodically accumulating symbol values for the various possible symbols. For example, if a given symbol is expected to recur every X seconds, the function 366 may serve to store a stream of symbol values for a period of nX seconds (n>1), and add to the stored values the values of one or more further symbol value streams of nX seconds duration, so that peak symbol values accumulate over time, improving the signal-to-noise ratio of the stored values. Function 366 may be carried out by a digital processor (or a DSP) which advantageously carries out some or all of the other functions of the decoder. However, the function 366 may also be carried out using a memory device separate from such a processor, or by an application specific integrated circuit, or by any other suitable device or combination of devices, and may be implemented by apparatus apart from the means which implements the remaining functions of the decoder 350.
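The accumulation performed by function 366 may be illustrated with a short sketch. It assumes, purely for illustration, that each stream is a list of symbol values of equal length covering one nX-second period; the helper names `accumulate_streams` and `peak_index` and the sample values are hypothetical.

```python
def accumulate_streams(streams):
    """Add equal-length streams of symbol values position by position."""
    if not streams:
        raise ValueError("at least one stream is required")
    length = len(streams[0])
    total = [0.0] * length
    for stream in streams:
        if len(stream) != length:
            raise ValueError("all streams must cover the same period")
        for position, value in enumerate(stream):
            total[position] += value
    return total


def peak_index(values):
    """Return the position of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical values: the repeating symbol sits at position 2 in every
# stream, while noise peaks land at a different position each time.
streams = [
    [1.0, 3.0, 2.0, 1.0],
    [2.0, 1.0, 2.5, 1.0],
    [1.0, 1.0, 2.5, 3.0],
]
accumulated = accumulate_streams(streams)
print(accumulated)              # [4.0, 5.0, 7.0, 5.0]
print(peak_index(accumulated))  # 2
```

In the first and third streams a noise value exceeds the symbol value, yet after accumulation the recurring position dominates, which is the signal-to-noise improvement the paragraph describes.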
The accumulated symbol values stored by the function 366 are then examined by the function 370 to detect the presence of an encoded message and output the detected message at an output 376. Function 370 can be carried out by matching the stored accumulated values or a processed version of such values, against stored patterns, whether by correlation or by another pattern matching technique. However, function 370 advantageously is carried out by examining peak accumulated symbol values, checksums and their relative timing, to reconstruct the encoded message from independent or merged layers. This function may be carried out after the first stream of symbol values has been stored by the function 366 and/or after each subsequent stream has been added thereto, so that the message is detected once the signal-to-noise ratios of the stored, accumulated streams of symbol values reveal a valid message pattern using the checksums.
In order to separate the various components, a processor on device 200 repeatedly carries out FFTs on audio signal samples falling within successive, predetermined intervals. The intervals may overlap, although this is not required. In an exemplary embodiment, ten overlapping FFT's are carried out during each second of decoder operation. Accordingly, the energy of each symbol period falls within five FFT periods. The FFT's are preferably windowed, although this may be omitted in order to simplify the decoder. The samples are stored and, when a sufficient number are thus available, a new FFT is performed, as indicated by steps 434 and 438.
In this embodiment, the frequency component values are produced on a relative basis. That is, each component value is represented as a signal-to-noise ratio (SNR), produced as follows. The energy within each frequency bin of the FFT in which a frequency component of any symbol can fall provides the numerator of each corresponding SNR. Its denominator is determined as an average of adjacent bin values. For example, the average of seven of the eight surrounding bin energy values may be used, the largest value of the eight being ignored in order to avoid the influence of a possible large bin energy value which could result, for example, from an audio signal component in the neighborhood of the code frequency component. Also, given that a large energy value could also appear in the code component bin, for example, due to noise or an audio signal component, the SNR is appropriately limited. In this embodiment, if SNR>6.0, then SNR is limited to 6.0, although a different maximum value may be selected. The ten SNR's of each FFT corresponding to each symbol which may be present are combined to form symbol SNR's, which are stored in a circular symbol SNR buffer, as indicated in step 442. In certain embodiments, the ten SNR's for a symbol are simply added, although other ways of combining the SNR's may be employed. The symbol SNR's for each of the twelve symbols, markers and checksums are stored in the symbol SNR buffer as separate sequences, one symbol SNR for each FFT for a sequence of FFT's. After the values produced in the FFT's have been stored in the symbol SNR buffer, new symbol SNR's are combined with the previously stored values, as described below.
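The bin SNR computation described above can be sketched as follows. This is only an illustration under stated assumptions: the eight surrounding bins are taken to be the four bins on each side of the code bin (the text does not say how the neighborhood is chosen), a neighborhood with zero energy is treated as yielding the capped value, and the names `bin_snr` and `symbol_snr` are hypothetical.

```python
def bin_snr(energies, k, max_snr=6.0):
    """SNR of FFT bin k relative to its eight surrounding bins.

    Assumption: the surrounding bins are the four on each side of k.
    """
    if k < 4 or k + 4 >= len(energies):
        raise ValueError("bin k needs four neighbours on each side")
    neighbours = energies[k - 4:k] + energies[k + 1:k + 5]
    # Ignore the largest of the eight neighbours, then average the other seven.
    noise = (sum(neighbours) - max(neighbours)) / 7.0
    if noise <= 0.0:
        return max_snr  # assumption: a silent neighbourhood gives the capped SNR
    # Limit the ratio to the maximum value (6.0 in the described embodiment).
    return min(energies[k] / noise, max_snr)


def symbol_snr(energies, component_bins):
    """Combine a symbol's component SNRs by simple addition."""
    return sum(bin_snr(energies, k) for k in component_bins)


# Hypothetical bin energies; the 50.0 neighbour mimics a strong audio component.
energies = [1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 50.0, 1.0, 1.0]
moderate = bin_snr(energies, 4)  # the 50.0 outlier is dropped from the average
energies[4] = 20.0
capped = bin_snr(energies, 4)    # the raw ratio of 20.0 is limited to 6.0
```

Dropping the largest neighbour keeps a single strong audio component near the code frequency from inflating the noise estimate, and the cap keeps a single strong code bin from dominating the symbol SNR.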
When the symbol SNR buffer is filled, this is detected in a step 446. In certain advantageous embodiments, the stored SNR's are adjusted to reduce the influence of noise in a step 452, although this step may be optional. In this optional step, a noise value is obtained for each symbol (row) in the buffer by obtaining the average of all stored symbol SNR's in the respective row each time the buffer is filled. Then, to compensate for the effects of noise, this average or “noise” value is subtracted from each of the stored symbol SNR values in the corresponding row. In this manner, a “symbol” appearing only briefly, and thus not a valid detection, may be averaged out over time.
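The optional noise adjustment of step 452 can be sketched as below, assuming the symbol SNR buffer is held as a list of rows, one row per symbol. The function name `subtract_row_noise` and the sample values are hypothetical.

```python
def subtract_row_noise(buffer):
    """Subtract each row's average (its "noise" value) from every entry in that row."""
    adjusted = []
    for row in buffer:
        noise = sum(row) / len(row)
        adjusted.append([value - noise for value in row])
    return adjusted


# Hypothetical buffer: row 0 holds one clear peak, row 1 holds only a flat noise floor.
symbol_snr_buffer = [
    [1.0, 1.0, 5.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
]
adjusted = subtract_row_noise(symbol_snr_buffer)
```

After the adjustment, the flat row collapses to zero while the peak in the first row still stands out against its own row average.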
After the symbol SNR's have been adjusted by subtracting the noise level, the decoder attempts to recover the message by examining the pattern of maximum SNR values in the buffer in a step 456. In certain embodiments, the maximum SNR values for each symbol are located in a process of successively combining groups of five adjacent SNR's, by weighting the values in the sequence in proportion to the sequential weighting (6 10 10 10 6) and then adding the weighted SNR's to produce a comparison SNR centered in the time period of the third SNR in the sequence. This process is carried out progressively throughout the FFT periods of each symbol. For example, a first group of five SNR's for a specific symbol in FFT time periods (e.g., 1-5) are weighted and added to produce a comparison SNR for a specific FFT period (e.g., 3). Then a further comparison SNR is produced using the SNR's from successive FFT periods (e.g., 2-6), and so on until comparison values have been obtained centered on all FFT periods. However, other means may be employed for recovering the message. For example, either more or less than five SNR's may be combined, they may be combined without weighting, or they may be combined in a non-linear fashion.
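The sliding, weighted combination of five adjacent SNR's can be sketched as follows. As a simplifying assumption, this sketch only produces comparison values for FFT periods that have two neighbours on each side, rather than wrapping around a circular buffer. The names `WEIGHTS` and `comparison_snrs` are hypothetical, and the indices are 0-based, unlike the 1-based FFT periods in the text.

```python
WEIGHTS = (6, 10, 10, 10, 6)


def comparison_snrs(snrs, weights=WEIGHTS):
    """Return {centre index: weighted sum of the window centred on that index}."""
    half = len(weights) // 2
    result = {}
    for centre in range(half, len(snrs) - half):
        window = snrs[centre - half:centre + half + 1]
        result[centre] = sum(w * s for w, s in zip(weights, window))
    return result


# Hypothetical symbol SNR sequence: the symbol's energy spans five FFT periods (indices 2-6).
snrs = [0, 0, 1, 1, 1, 1, 1, 0, 0]
comparison = comparison_snrs(snrs)
peak_centre = max(comparison, key=comparison.get)
```

The comparison value is largest where the window lines up with all five periods carrying the symbol's energy, so the peak marks the centre of the symbol interval.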
After the comparison SNR values have been obtained, the decoder algorithm examines the comparison SNR values for a message pattern. Under a preferred embodiment, the synchronization (“marker”) code symbols are located first. Once this information is obtained, the decoder attempts to detect the peaks of the data symbols. The use of a predetermined offset between each data symbol in the first segment and the corresponding data symbol in the second segment provides a check on the validity of the detected message. That is, if both markers are detected and the same offset is observed between each data symbol in the first segment and its corresponding data symbol in the second segment, it is highly likely that a valid message has been received. If this is the case, the message is logged, and the SNR buffer is cleared 466. It is understood by those skilled in the art that decoder operation may be modified depending on the structure of the message, its timing, its signal path, the mode of its detection, etc., without departing from the scope of the present invention. For example, in place of storing SNR's, FFT results may be stored directly for detecting a message.
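The validity check based on markers and a predetermined offset can be illustrated with a short sketch. It rests on assumptions that go beyond the text: symbols are represented as integer indices, and the "offset" is interpreted as the difference between corresponding symbol indices, taken modulo a hypothetical alphabet size. The function name and all parameters are illustrative only.

```python
def message_is_valid(both_markers_found, first_segment, second_segment,
                     expected_offset, alphabet_size):
    """Accept a message only if both markers were found and every data symbol
    pair shows the same predetermined offset (assumed: index difference modulo
    the alphabet size)."""
    if not both_markers_found:
        return False
    if len(first_segment) != len(second_segment):
        return False
    return all(
        (second - first) % alphabet_size == expected_offset
        for first, second in zip(first_segment, second_segment)
    )


# Hypothetical detections with an expected offset of 2 in a 12-symbol alphabet.
ok = message_is_valid(True, [1, 4, 7], [3, 6, 9], 2, 12)
bad_offset = message_is_valid(True, [1, 4, 7], [3, 6, 10], 2, 12)
no_marker = message_is_valid(False, [1, 4, 7], [3, 6, 9], 2, 12)
```

A decoder built along these lines would log the message and clear the SNR buffer only when the check succeeds.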
In another embodiment, the checksums and offsets described above may be used as “soft metrics” to decode merged messages and correct any existing errors. Specifically, a multi-step process is used to calculate the soft metric of each symbol. First, the bin SNR is calculated for a given period of time as described above. Next, the bin SNRs are added to form symbol SNR for a given period of time. Symbol SNRs are then added across multiple periods of time that correspond to a message symbol interval, with weighting to compensate for the effects of the FFT window, and noise subtraction for that symbol within other portions of the message. Each weighted symbol SNR is taken from the previous step in each message position, and divided by the sum of all other weighted symbol SNRs for that message position. These results are then preferably scaled (or optionally squared), resulting in a “ratio of ratios,” which represents a “score” or value of how strong each symbol is relative to its neighbors within the same message position. Applying these soft metrics, the decoder may find any cases that violate the encoded message structure in
Steps employed in the decoding process illustrated in
Since each five-symbol message repeats every 2½ seconds, each symbol repeats at intervals of 2½ seconds or every 25 FFT's. In order to compensate for the effects of burst errors and the like, the SNR's R1 through R150 are combined by adding corresponding values of the repeating messages to obtain 25 combined SNR values SNRn, n=1, 2 . . . 25, as follows:

$$\mathrm{SNR}_n = \sum_{i=0}^{5} R_{n+25i}, \qquad n = 1, 2, \ldots, 25$$
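Adding corresponding values of the six repetitions can be sketched as a simple folding operation. The sketch assumes the 150 stored SNR's are held in a flat list in time order, so that positions n, n+25, n+50 and so on belong to the same point in the repeating message. The function name `fold` and the sample data are hypothetical, and the list is 0-based, so `combined[n - 1]` corresponds to SNRn in the text.

```python
def fold(snrs, period=25):
    """Add values that are `period` positions apart, giving `period` combined SNRs."""
    if len(snrs) % period != 0:
        raise ValueError("length must be a whole number of message intervals")
    return [sum(snrs[n::period]) for n in range(period)]


# Hypothetical data: a marker peak at position 3 of each of six message intervals,
# with the third interval lost to a burst error.
snrs = [0.0] * 150
for interval in range(6):
    snrs[3 + 25 * interval] = 1.0
snrs[3 + 25 * 2] = 0.0  # burst error wipes out one interval

combined = fold(snrs)
peak_position = max(range(len(combined)), key=combined.__getitem__)
```

Even with one interval lost, the combined value at the marker position remains well above the rest, so the peak is still found in the same place.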
Accordingly, if a burst error should result in the loss of a signal interval i, only one of the six message intervals will have been lost, and the essential characteristics of the combined SNR values are likely to be unaffected by this event. Once the combined SNR values have been determined, the decoder detects the position of the marker symbol's peak as indicated by the combined SNR values and derives the data symbol sequence based on the marker's position and the peak values of the data symbols. Once the message has thus been formed, as indicated in steps 582 and 583, the message is logged. However, unlike the embodiment of
As in the decoder of
In a further variation which is especially useful in audience measurement applications, a relatively large number of message intervals are separately stored to permit a retrospective analysis of their contents to detect a media content change. In another embodiment, multiple buffers are employed, each accumulating data for a different number of intervals for use in the decoding method of
Turning to
In an alternate embodiment, multiple instances of the decoder may be initialized using different memory areas. In such a case, the decoder application would be responsible for keeping track of which memory pointers are used in subsequent calls to initialize and retrieve code from the proper decoder.
In the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.
Claims
1. A method of encoding audio data with a message structure comprising a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data, comprising:
- providing data defining the message symbols for the message structure; and
- encoding the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data,
- the message structure as encoded being arranged within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
2. The method of claim 1, wherein the message structure comprises at least one marker symbol existing in the first encoding layer, wherein the at least one marker symbol is configured to synchronize message symbols of the message structure with the second encoding layer.
3. The method of claim 2, wherein the message structure comprises at least one checksum symbol in the second encoding layer, wherein the at least one checksum symbol is configured to validate the synchronization of the message symbols of the message structure with the first encoding layer.
4. The method of claim 1, wherein the message structure comprises one of (i) a single message and (ii) a plurality of messages.
5. The method of claim 1, wherein at least one of the message symbols in one of the encoding layers shares a frequency bin with another message symbol in the other of the encoding layers.
6. The method of claim 1, wherein the audio data comprises one of radio broadcasts, television broadcasts, DVDs, MP3s, compact discs, streaming music, streaming video, network data, mini-discs, and multimedia presentations.
7. The method of claim 1, wherein the message symbols have at least one predetermined offset in the message structure.
8. An encoder for encoding audio data with a message structure, said message structure comprising a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data, comprising:
- a first encoder portion configured to provide data defining the message symbols for the message structure; and
- a second encoder portion configured to encode the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data, wherein the second encoder portion is configured to arrange the message structure within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
9. The encoder of claim 8, wherein the message structure comprises at least one marker symbol existing in the first encoding layer, wherein the at least one marker symbol is configured to synchronize message symbols of the message structure with the second encoding layer.
10. The encoder of claim 9, wherein the message structure comprises at least one checksum symbol in the second encoding layer, wherein the at least one checksum symbol is configured to validate the synchronization of the message symbols of the message structure with the first encoding layer.
11. The encoder of claim 8, wherein the message structure comprises one of (i) a single message and (ii) a plurality of messages.
12. The encoder of claim 8, wherein at least one of the message symbols in one of the encoding layers shares a frequency bin with another message symbol in the other of the encoding layers.
13. The encoder of claim 8, wherein the audio data comprises one of radio broadcasts, television broadcasts, DVDs, MP3s, compact discs, streaming music, streaming video, network data, mini-discs, and multimedia presentations.
14. The encoder of claim 8, wherein the encoder is configured such that the message symbols have at least one predetermined offset in the message structure.
15. A computer program product, comprising a tangible, non-transitory computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to encode audio data with a message structure, said message structure comprising a sequence of message symbols, the message symbols each comprising a combination of substantially single-frequency components having frequencies selected from a predefined set of substantially single-frequency values and a predefined symbol interval within a time base of the audio data, comprising:
- providing data defining the message symbols for the message structure; and
- encoding the audio data with the message symbols such that the message symbols coexist within two encoding layers along the time base of the audio data,
- the message structure as encoded being arranged within the time base of the audio data so that message symbols in a first of the two encoding layers are synchronized to message symbols in the second of the two encoding layers.
16. The computer program product of claim 15, wherein the message structure comprises at least one marker symbol existing in the first encoding layer, wherein the at least one marker symbol is configured to synchronize message symbols of the message structure with the second encoding layer.
17. The computer program product of claim 16, wherein the message structure comprises at least one checksum symbol in the second encoding layer, wherein the at least one checksum symbol is configured to validate the synchronization of the message symbols of the message structure with the first encoding layer.
18. The computer program product of claim 15, wherein the message structure comprises one of (i) a single message and (ii) a plurality of messages.
19. The computer program product of claim 15, wherein at least one of the message symbols in one of the encoding layers shares a frequency bin with another message symbol in the other of the encoding layers.
20. The computer program product of claim 15, wherein the audio data comprises one of radio broadcasts, television broadcasts, DVDs, MP3s, compact discs, streaming music, streaming video, network data, mini-discs, and multimedia presentations.
Type: Application
Filed: Sep 10, 2013
Publication Date: Feb 5, 2015
Inventors: Wendell Lynch (East Lansing, MI), John Stavropoulos (Edison, NJ), David Gish (Riverdale, NJ), Alan Neuhauser (Silver Spring, MD)
Application Number: 14/023,221
International Classification: G10L 19/00 (20060101);