Apparatus, System and Method for Reading Codes From Digital Audio on a Processing Device

Info

Publication number: 20150039321
Type: Application
Filed: Jul 31, 2013
Publication Date: Feb 5, 2015
Applicant: Arbitron Inc. (Columbia, MD)
Inventors: Alan Neuhauser (Silver Spring, MD), John Stavropoulos (Edison, NJ), William McKenna (Columbia, MD)
Application Number: 13/955,438

Abstract

Apparatus, system and method for reading ancillary code embedded into digital audio, where a processing device executes a decoder application that includes a decoder application interface that is communicatively coupled to a media player application within one or more frameworks of the processing device. As digital audio is received and sampled, the decoder application is configured to transform the digital audio from a time domain to a frequency domain and to process frequency characteristics to determine the presence of ancillary audio codes.

Description

TECHNICAL FIELD

The present disclosure generally relates to mobile devices and communications networks. In particular, the disclosure is directed to monitoring one or more mobile terminals using internal proxies and processing and distributing the related data through customized interfaces utilizing cumulative and centralized intelligence for data handling.

BACKGROUND INFORMATION

There has been considerable interest in monitoring the use of mobile terminals, such as smart phones, tablets, laptops, etc., for audience measurement and/or marketing purposes. In the area of media exposure monitoring, ancillary audio codes have shown themselves to be particularly effective in assisting media measurement entities to determine and establish media exposure data. One technique for encoding and detecting ancillary audio codes is based on Critical Band Encoding Technology (CBET), pioneered by Arbitron Inc., which is currently being used in conjunction with special-purpose Personal People Meters (PPM™) to detect codes via ambient encoded audio.

Challenges currently exist in configuring general-purpose “smart” devices to read ancillary code from digital audio that may be reproduced on the device itself and/or received and reproduced from a computer network, such as the Internet. For one, general-purpose devices typically do not come equipped with special-purpose digital signal processors (DSPs), and require special consideration when incorporating decoding algorithms together with the device's software application architecture. Additionally, since the process of decoding or reading ancillary audio codes is sensitive to noise, it may not be desirable to use a device's microphone to receive audio input.

Currently, techniques exist for collecting data relating to digital media received on mobile terminals, but they typically require the collection of metadata related to the digital media, which must be separately processed. One such example may be found in U.S. patent application Ser. No. 13/341,646, titled “Monitoring Streaming Media Content” to Ramaswamy et al., filed Dec. 30, 2011, which is incorporated by reference herein. In this example, metering data is extracted from media using a transport stream, where the metering data must be transcoded into a second format in order to be decoded. The transcoded metering data is then encoded into a metadata channel, where it is transmitted to a content presentation device for subsequent identification. While it may be effective, such a process is unnecessarily complex and requires multiple processing steps in order to obtain the metered data. It would be desirable to have a configuration where ancillary audio codes may be efficiently read out on a device without resorting to metadata, in order to provide a more simplified system for media audience measurement.

BRIEF SUMMARY

Under one exemplary embodiment, a processing device is disclosed, wherein the processing device comprises an audio input configured to receive digital audio from one of (i) a computer network and (ii) a digital media file; a memory, operatively coupled to the audio input; and a processing apparatus, communicatively coupled to the audio input, wherein the processing apparatus is configured to activate a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in the memory; wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio.

Under another exemplary embodiment, a processor-based method is disclosed for configuring a processing device to read ancillary codes embedded in digital audio, comprising the steps of receiving digital audio at an input of the processing device from one of (i) a computer network and (ii) a digital media file; and configuring a processor on the processing device to activate a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in a memory; wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio.

In yet another exemplary embodiment, a computer program product is disclosed, comprising a computer usable medium having a computer readable program code tangibly embodied therein, said computer readable program code adapted to be executed to implement a method for reading ancillary code from digital audio, said method comprising: sampling the digital audio from an input of the processing device, said input being configured to receive the digital audio from one of (i) a computer network and (ii) a digital media file; and activating a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in a memory; wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is an exemplary system diagram illustrating communication among mobile terminals to a computer network that is communicatively coupled to at least one server arrangement and external entities;

FIG. 2 is an exemplary mobile terminal portable computing device configured to provide monitoring capabilities on the device;

FIG. 3 is an exemplary software architecture for implementing a decoding application under one embodiment;

FIG. 4 is an exemplary framework for media reproduction and decoding under one embodiment;

FIG. 5 is an exemplary message structure for decoding ancillary codes and/or messages that may be suitable for obtaining supplemental information;

FIG. 6 illustrates an exemplary decoding process under one embodiment;

FIG. 7 is an exemplary flow chart illustrating a methodology for retrieving ancillary code from an encoded audio signal;

FIG. 8 is an exemplary flow chart illustrating another methodology for retrieving ancillary code from an encoded audio signal; and

FIG. 9 is an exemplary flow diagram for executing a decoder application on a processing device under one embodiment.

DETAILED DESCRIPTION

A mobile terminal as used herein comprises at least one wireless communications transceiver. Non-limiting examples of the transceivers include a GSM (Global System for Mobile Communications) transceiver, a GPRS (General Packet Radio Service) transceiver, an EDGE (Enhanced Data rates for Global Evolution) transceiver, a UMTS (Universal Mobile Telecommunications System) transceiver, a WCDMA (wideband code division multiple access) transceiver, a PDC (Personal Digital Cellular) transceiver, a PHS (Personal Handy-phone System) transceiver, and a WLAN (Wireless LAN, wireless local area network) transceiver. The transceiver may be such that it is configured to co-operate with a predetermined communications network (infrastructure), such as the transceivers listed above. The network may further connect to other networks and provide versatile switching means for establishing circuit switched and/or packet switched connections between the two end points. Additionally, the device may include a wireless transceiver such as a Bluetooth adapter meant for peer-to-peer communication and piconet/scatternet use. Furthermore, the terminal may include interface(s) for wired connections and associated communication relative to external entities, such as a USB (Universal Serial Bus) interface or a Firewire interface.

As will be explained in further detail below, mobile terminal events may be monitored, where the events may include, for example, substantially non-user-initiated incidents such as battery status change, not at least directly initiated by the user of the device. The actions may include substantially user-initiated intentional activities and incidents, for example use of the web browser, movements, reading a message, etc. Some incidents may be also considered to conveniently fit both the above incident classes.

Turning to FIG. 1, an exemplary system architecture is illustrated. The exemplary system comprises an audio monitoring part executed in one or more terminals, or portable computing devices 102, 104, 106 of respective users and a server arrangement part 112 comprising one or more server devices (112a, 112b) functionally arranged so as to establish a media server entity. Devices 102-106 are configured to monitor audio media exposure relating to their respective users in accordance with the principles set forth herein. Server 112 is typically connected to a communications network 110 whereto also the mobile terminals 102, 104, 106 are provided with access, e.g. via one or more access networks 108a, 108b, which may be cellular, wired or wireless local area networks, for instance. External entities 114 such as services/servers (114a, 114b) may be connected to the server arrangement 112 via the network 110 for obtaining, storing and processing audio code data received from devices 102-106 and related data derived therefrom and/or for providing supplementary data.

FIG. 2 is an exemplary embodiment of a portable computing device 200 which may function as a terminal (see references 102, 104 and 106 of FIG. 1), and may be a smart phone, tablet computer, laptop or the like. Device 200 may include a central processing unit (CPU) 201 (which may include one or more computer readable storage mediums), a memory controller 202, one or more processors 203, a peripherals interface 204, RF circuitry 205, audio circuitry 206, a speaker 220, a microphone 221, and an input/output (I/O) subsystem 211 having display controller 212, control circuitry for one or more sensors 213 and input device control 214. These components may communicate over one or more communication buses or signal lines in device 200. It should be appreciated that device 200 is only one example of a portable multifunction device 200, and that device 200 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of the components. The various components shown in FIG. 2 may be implemented in hardware or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.

In one example, decoder 213 may be configured as software tangibly embodied in memory 208, which may communicate with other software in memory 208 and CPU 201, as well as audio circuitry 206, and serves to decode ancillary data embedded in audio signals in order to detect exposure to media. Examples of techniques for encoding and decoding such ancillary data are disclosed in U.S. Pat. No. 6,871,180, titled “Decoding of Information in Audio Signals,” issued Mar. 22, 2005, which is incorporated by reference in its entirety herein. Other suitable techniques for encoding data in audio data are disclosed in U.S. Pat. No. 7,640,141 to Ronald S. Kolessar and U.S. Pat. No. 5,764,763 to James M. Jensen, et al., which are incorporated by reference in their entirety herein. Other appropriate encoding techniques are disclosed in U.S. Pat. No. 5,579,124 to Aijala, et al., U.S. Pat. Nos. 5,574,962, 5,581,800 and 5,787,334 to Fardeau, et al., and U.S. Pat. No. 5,450,490 to Jensen, et al., each of which is assigned to the assignee of the present application and all of which are incorporated herein by reference in their entirety.

An audio signal which may be encoded with a plurality of code symbols may be received via data communication through RF interface 205 via audio circuitry 206, or through any other data interface allowing for the receipt of audio/visual data in digital form. Also, encoded audio signals may be reproduced on device 200 through digital files stored in memory 208 and executed through one or more applications (214) stored in memory 208 such as a media player that is linked to audio circuitry 206. From the following description in connection with the accompanying drawings, it will be appreciated that decoder 213 is capable of detecting codes in addition to those arranged in the formats disclosed hereinabove. Memory 208 may also include high-speed random access memory (RAM) and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state memory devices. Access to memory 208 by other components of the device 200, such as processor 203, decoder 213 and peripherals interface 204, may be controlled by the memory controller 202. Peripherals interface 204 couples the input and output peripherals of the device to the processor 203 and memory 208. The one or more processors 203 run or execute various software programs and/or sets of instructions stored in memory 208 to perform various functions for the device 200 and to process data. In some embodiments, the peripherals interface 204, processor(s) 203, decoder 213 and memory controller 202 may be implemented on a single chip, such as a chip 201. In some other embodiments, they may be implemented on separate chips.

The RF (radio frequency) circuitry 205 receives and sends RF signals, also known as electromagnetic signals. The RF circuitry 205 converts electrical signals to/from electromagnetic signals and communicates with communications networks and other communications devices via the electromagnetic signals. The RF circuitry 205 may include well-known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chipset, a subscriber identity module (SIM) card, memory, and so forth. RF circuitry 205 may communicate with networks, such as the Internet, also referred to as the World Wide Web (WWW), an intranet and/or a wireless network, such as a cellular telephone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN), and other devices by wireless communication. The wireless communication may use any of a plurality of communications standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), high-speed downlink packet access (HSDPA), wideband code division multiple access (W-CDMA), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wireless Fidelity (Wi-Fi) (e.g., IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), voice over Internet Protocol (VoIP), Wi-MAX, a protocol for email (e.g., Internet message access protocol (IMAP) and/or post office protocol (POP)), instant messaging (e.g., extensible messaging and presence protocol (XMPP), Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions (SIMPLE), and/or Instant Messaging and Presence Service (IMPS)), and/or Short Message Service (SMS), or any other suitable communication protocol, including communication protocols not yet developed as of the filing date of this document.

Audio circuitry 206, speaker 220, and microphone 221 provide an audio interface between a user and the device 200. Audio circuitry 206 may receive audio data from the peripherals interface 204, convert the audio data to an electrical signal, and transmit the electrical signal to speaker 220. The speaker 220 converts the electrical signal to human-audible sound waves. Audio circuitry 206 also receives electrical signals converted by the microphone 221 from sound waves, which may include encoded audio, described above. The audio circuitry 206 converts the electrical signal to audio data and transmits the audio data to the peripherals interface 204 for processing. Audio data may be retrieved from and/or transmitted to memory 208 and/or the RF circuitry 205 by peripherals interface 204. In some embodiments, audio circuitry 206 also includes a headset jack for providing an interface between the audio circuitry 206 and removable audio input/output peripherals, such as output-only headphones or a headset with both output (e.g., a headphone for one or both ears) and input (e.g., a microphone).

I/O subsystem 211 couples input/output peripherals on the device 200, such as touch screen 215 and other input/control devices 217, to the peripherals interface 204. The I/O subsystem 211 may include a display controller 218 and one or more input controllers 220 for other input or control devices. The one or more input controllers 220 receive/send electrical signals from/to other input or control devices 217. The other input/control devices 217 may include physical buttons (e.g., push buttons, rocker buttons, etc.), dials, slider switches, joysticks, click wheels, and so forth. In some alternate embodiments, input controller(s) 220 may be coupled to any (or none) of the following: a keyboard, an infrared port, a USB port, a pointer device such as a mouse, and an up/down button for volume control of the speaker 220 and/or the microphone 221. Touch screen 215 may also be used to implement virtual or soft buttons and one or more soft keyboards.

Touch screen 215 provides an input interface and an output interface between the device and a user. The display controller 218 receives and/or sends electrical signals from/to the touch screen 215. Touch screen 215 displays visual output to the user. The visual output may include graphics, text, icons, video, and any combination thereof (collectively termed “graphics”). In some embodiments, some or all of the visual output may correspond to user-interface objects. Touch screen 215 has a touch-sensitive surface, sensor or set of sensors that accepts input from the user based on haptic and/or tactile contact. Touch screen 215 and display controller 218 (along with any associated modules and/or sets of instructions in memory 208) detect contact (and any movement or breaking of the contact) on the touch screen 215 and convert the detected contact into interaction with user-interface objects (e.g., one or more soft keys, icons, web pages or images) that are displayed on the touch screen. In an exemplary embodiment, a point of contact between a touch screen 215 and the user corresponds to a finger of the user. Touch screen 215 may use LCD (liquid crystal display) technology, or LPD (light emitting polymer display) technology, although other display technologies may be used in other embodiments. Touch screen 215 and display controller 218 may detect contact and any movement or breaking thereof using any of a plurality of touch sensing technologies now known or later developed, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with a touch screen 215.

Device 200 may also include one or more sensors 216 such as optical sensors that comprise charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) phototransistors. The optical sensor may capture still images or video, where the sensor is operated in conjunction with touch screen display 215. Device 200 may also include one or more accelerometers 207, which may be operatively coupled to peripherals interface 204. Alternately, the accelerometer 207 may be coupled to an input controller 214 in the I/O subsystem 211. The accelerometer is preferably configured to output accelerometer data in the x, y, and z axes.

In some embodiments, the software components stored in memory 208 may include an operating system 209, a communication module 210, a text/graphics module 211, a Global Positioning System (GPS) module 212, audio decoder 213 and applications 214. Operating system 209 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware and software components. Communication module 210 facilitates communication with other devices over one or more external ports and also includes various software components for handling data received by the RF circuitry 205. An external port (e.g., Universal Serial Bus (USB), Firewire, etc.) may be provided and adapted for coupling directly to other devices or indirectly over a network (e.g., the Internet, wireless LAN, etc.).

Text/graphics module 211 includes various known software components for rendering and displaying graphics on the touch screen 215, including components for changing the intensity of graphics that are displayed. As used herein, the term “graphics” includes any object that can be displayed to a user, including without limitation text, web pages, icons (such as user-interface objects including soft keys), digital images, videos, animations and the like. Additionally, soft keyboards may be provided for entering text in various applications requiring text input. GPS module 212 determines the location of the device and provides this information for use in various applications. Applications 214 may include various modules, including address books/contact list, email, instant messaging, video conferencing, media player, widgets, camera/image management, and the like. Examples of other applications include word processing applications, JAVA-enabled applications, encryption, digital rights management, voice recognition, and voice replication.

Returning briefly to the example of FIG. 1, user devices 102-106 may receive media from a media source 112, which preferably provides network-based media, such as streaming media or digital media files. Media source 112 may comprise one or more servers (112a, 112b) communicatively linked to network 110, which may provide media to devices 102-106 via wired, wireless (108b) and/or cellular (108a) communication. It is understood that other media formats are possible in this disclosure as well, including cable, satellite, distributed on storage media, or by any other means or technique that is humanly perceptible, without regard to the form or content of such data. With regard to devices 102-106, the example of FIG. 2 shows that any of devices 102-106 can be in the form of a device such as a personal computer, cell phone (or laptop, tablet, etc.). As will be explained in further detail below, device 200 receives encoded audio through a wired or wireless connection (e.g., 802.11g, 802.11n, Bluetooth, etc.). The encoded audio is natively decoded using decoding software 213. After the encoded audio is decoded, one or more messages are detected.

Turning to FIG. 3, an exemplary architecture is provided for software stored in memory 208. Preferably, each of software 209-214, and particularly audio decoder 213, is configured in the application layer 304, which sits at the top of the operating system stack and contains the frameworks that are most commonly used by the software. Application layer 304 is preferably configured under an Objective-C platform containing standard application programming interfaces (APIs) known in the art. Application layer 304 is configured to support multiple frameworks for allowing software to operate, including, but not limited to, a programming interface (e.g., Java, UIKit framework) for providing user interface management, application lifecycle management, application event handling, multitasking, data protection via encryption, data handling, inter-application integration, push notification, local notification, accessibility, and the like. Other frameworks known in the art may be utilized as well. Media layer 303 may be configured to provide application layer 304 with audio, video, animation and graphics capabilities. As with the other layers comprising the stack of FIG. 3, the media layer comprises a number of frameworks that may be supported. In addition to frameworks for graphic and video support, media layer 303 may be configured to support an audio framework (Objective-C, Java) configured to allow the playback and management of audio content. A core audio framework would be responsible for supporting various audio types and the playback of audio files and streams, and would also provide access to the built-in audio processing units of device 200. A media player framework in media layer 303 would advantageously support the playing of movies, music, audio podcast, audio book files, streaming media, stored media library files, etc. at a variety of compression standards, resolutions and frame rates.

Core services layer 302 comprises fundamental system services that all applications use, and also provides interfaces that use object-oriented abstractions for working with network protocols, for providing control over the protocol stack, and for simplifying the use of lower-level constructs such as BSD sockets. Functions of core services layer 302 simplify tasks such as communicating with FTP and HTTP servers or resolving DNS hosts. Core OS layer 301 is the deepest layer of the architecture of FIG. 3 and provides an interface between existing hardware and system frameworks. Core OS Layer 301 comprises the kernel environment, drivers, and basic interfaces of the operating system. Many functions, including the virtual memory system, threads, file system, network, and inter-process communication, are managed by the kernel. It should be understood by those skilled in the art that the embodiment of FIG. 3 describes a software architecture based on multiple abstraction layers (e.g., iOS). Other suitable architectures incorporating media players and audio reproduction are contemplated as well. As one example, the software architecture may be based on a Linux kernel comprising middleware, libraries and APIs written in C, and application software running on an application framework which includes Java-compatible libraries based on Apache Harmony and the like.

Turning to FIG. 4, an exemplary embodiment is provided of a media reproduction software architecture that may be utilized in any of the embodiments described above in connection with FIGS. 1-3. In this example, media player 401 and audio decoder 402 are preferably configured in an application layer (304) for device 200, in which each is communicatively coupled to each other and to lower layer modules 403-406 (shown separated by the dashed line in FIG. 4). Media player 401 may be configured to control playback of audio/visual (A/V) media locally using media framework 403, subject to audio classes 404 defined for the player (e.g., AVAudioPlayer). A device may also play A/V media via embedded web content classes (e.g. UIWebView, QT Web View) or play HTTP live streams by initializing an instance of a media player item (e.g., AVPlayerItem) using a URL. Primitive data structures for media framework 403, including time-related data structures and opaque objects to carry and describe media data, may be defined in core media framework 405. Supported audio types and the playback and recording of audio files and streams may be defined in core audio 406, which may also provide access to the device's built-in audio processing units.

During one exemplary mode of operation, which will be discussed in greater detail below, the audio portion of media played using media player 401 is stored and/or forwarded to decoder application 402. Using one or more techniques described herein below, decoder 402 processes the audio portion to detect if ancillary codes are present within the audio. If present, the ancillary codes are read, stored, and ultimately transmitted to a remote or central location (114) where the codes may be further processed to determine characteristics (e.g., identification, origin, etc.) of the media and further determine media exposure for a user associated with a device (200) for audience measurement purposes.
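By way of a non-limiting illustration, the forwarding, storing and transmitting steps of this mode of operation may be sketched as follows. The class name `DecoderApplication`, the callbacks `detect_codes` and `transmit`, and the method names are hypothetical and are introduced only for illustration; they are not part of any framework named in this disclosure, and the actual detection technique is left abstract.

```python
class DecoderApplication:
    """Hypothetical stand-in for decoder application 402."""

    def __init__(self, detect_codes, transmit):
        # detect_codes: callable taking a list of audio samples and
        #   returning a list of ancillary codes found (possibly empty).
        # transmit: callable that sends a list of codes to a remote or
        #   central location (114). Both are assumptions of this sketch.
        self._detect_codes = detect_codes
        self._transmit = transmit
        self._stored_codes = []

    def on_audio(self, samples):
        """Called with the audio portion forwarded by the media player."""
        codes = self._detect_codes(samples)
        # Codes that are present are read and stored for later reporting.
        self._stored_codes.extend(codes)
        return codes

    def flush(self):
        """Transmit stored codes, then clear the local store."""
        if not self._stored_codes:
            return 0
        sent = list(self._stored_codes)
        self._transmit(sent)
        self._stored_codes.clear()
        return len(sent)
```

In such a sketch the media player would call `on_audio` for each buffer it plays, and `flush` would be invoked whenever a network connection to the central location is available.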

With regard to encoding audio, FIG. 5 illustrates a message 500 that may be embedded/encoded into an audio signal. In this embodiment, message 500 includes three or more layers that are inserted by encoders in a parallel format. Suitable encoding techniques are disclosed in U.S. Pat. No. 6,871,180, titled “Decoding of Information in Audio Signals,” issued Mar. 22, 2005, which is assigned to the assignee of the present application, and is incorporated by reference in its entirety herein. Other suitable techniques for encoding data in audio data are disclosed in U.S. Pat. No. 7,640,141 to Ronald S. Kolessar and U.S. Pat. No. 5,764,763 to James M. Jensen, et al., which are also assigned to the assignee of the present application, and which are incorporated by reference in their entirety herein. Other appropriate encoding techniques are disclosed in U.S. Pat. No. 5,579,124 to Aijala, et al., U.S. Pat. Nos. 5,574,962, 5,581,800 and 5,787,334 to Fardeau, et al., and U.S. Pat. No. 5,450,490 to Jensen, et al., each of which is assigned to the assignee of the present application and all of which are incorporated herein by reference in their entirety.

When utilizing a multi-layered message, a plurality of layers may be present in an encoded data stream, and each layer may be used to convey different data. Turning to FIG. 5, message 500 includes a first layer 501 containing a message comprising multiple message symbols. During the encoding process, a predefined set of audio tones (e.g., ten) or single frequency code components are added to the audio signal during a time slot for a respective message symbol. At the end of each message symbol time slot, a new set of code components is added to the audio signal to represent a new message symbol in the next message symbol time slot. At the end of such new time slot another set of code components may be added to the audio signal to represent still another message symbol, and so on during portions of the audio signal that are able to psychoacoustically mask the code components so they are inaudible. Preferably, the symbols of each message layer are selected from a unique symbol set. In layer 501, each symbol set includes two synchronization symbols (also referred to as marker symbols) 504, 506, a larger number of data symbols 505, 507, and time code symbols 508. Time code symbols 508 and data symbols 505, 507 are preferably configured as multiple-symbol groups.
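As a simplified, non-limiting sketch of the slot-by-slot addition of code components described above, each message symbol may be represented as a sum of single-frequency tones that is added to the host audio during that symbol's time slot. The sample rate, slot length, tone frequencies, amplitude and the names `SYMBOL_TONES`, `synthesize_symbol` and `encode_message` are assumptions made for illustration only. The sketch uses three tones per symbol rather than ten, and it omits the psychoacoustic masking evaluation entirely, using a fixed low amplitude instead.

```python
import math

SAMPLE_RATE = 8000      # Hz; assumed for illustration
SLOT_SAMPLES = 3200     # samples per symbol time slot (0.4 s at 8 kHz); assumed

# Hypothetical symbol table: each symbol has its own set of tone frequencies (Hz).
SYMBOL_TONES = {
    "S1": [1000.0, 1500.0, 2000.0],
    "S2": [1100.0, 1600.0, 2100.0],
}


def synthesize_symbol(symbol, amplitude=0.01):
    """Return one time slot of code components for the given symbol."""
    tones = SYMBOL_TONES[symbol]
    return [
        amplitude * sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in tones)
        for i in range(SLOT_SAMPLES)
    ]


def encode_message(host, symbols):
    """Add each symbol's tones to the host audio in consecutive time slots.

    Returns a new list; symbols that fall past the end of the host are truncated.
    """
    out = list(host)
    for slot_index, symbol in enumerate(symbols):
        start = slot_index * SLOT_SAMPLES
        for i, value in enumerate(synthesize_symbol(symbol)):
            if start + i >= len(out):
                break
            out[start + i] += value
    return out
```

For example, `encode_message([0.0] * 6400, ["S1", "S2"])` would place the tones for `S1` in the first 3200 samples and those for `S2` in the next 3200.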

The second layer 502 of message 500 is illustrated having a similar configuration to layer 501, where each symbol set includes two synchronization symbols 509, 511, a larger number of data symbols 510, 512, and time code symbols 513. The third layer 503 includes two synchronization symbols 514, 516, and a larger number of data symbols 515, 517. The data symbols in each symbol set for the layers (501-503) should preferably have a predefined order and be indexed (e.g., 1, 2, 3). The code components of each symbol in any of the symbol sets should preferably have selected frequencies that are different from the code components of every other symbol in the same symbol set. Under one embodiment, none of the code component frequencies used in representing the symbols of a message in one layer (e.g., Layer1 501) is used to represent any symbol of another layer (e.g., Layer2 502). In another embodiment, some of the code component frequencies used in representing symbols of messages in one layer (e.g., Layer3 503) may be used in representing symbols of messages in another layer (e.g., Layer1 501). However, in this embodiment, it is preferable that “shared” layers have differing formats (e.g., Layer3 503, Layer1 501) in order to assist the decoder in separately decoding the data contained therein.
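The frequency constraints described above may be checked mechanically. The following is a minimal sketch, assuming a symbol set is represented as a mapping from a symbol name to a list of code component frequencies in Hz; this representation and the function names are hypothetical.

```python
def symbol_set_is_valid(symbol_set):
    """Return True if every code component frequency appears only once in the set.

    This reflects the preference that the code components of each symbol
    differ in frequency from those of every other symbol in the same set.
    """
    seen = set()
    for tones in symbol_set.values():
        for freq in tones:
            if freq in seen:
                return False
            seen.add(freq)
    return True


def layers_are_disjoint(layer_a, layer_b):
    """Return True if no frequency used in one layer is used in the other."""
    freqs_a = {f for tones in layer_a.values() for f in tones}
    freqs_b = {f for tones in layer_b.values() for f in tones}
    return freqs_a.isdisjoint(freqs_b)
```

Under the first embodiment above, `layers_are_disjoint` would be expected to hold for every pair of layers; under the second embodiment it may be false for layers that share frequencies but differ in format.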

Sequences of data symbols within a given layer are preferably configured so that each sequence is paired with the other and is separated by a predetermined offset. Thus, as an example, if data 505 contains code 1, 2, 3 and the offset is “2”, data 507 in layer 501 would be 3, 4, 5. Since the same information is represented by two different data symbols that are separated in time and have different frequency components (frequency content), the message may be diverse in both time and frequency. Such a configuration is particularly advantageous where interference would otherwise render data symbols undetectable. Under one embodiment, each of the symbols in a layer has a duration (e.g., 0.2-0.8 sec) that matches other layers (e.g., Layer1 501, Layer2 502). In another embodiment, the symbol duration may be different (e.g., Layer2 502, Layer3 503). During a decoding process, the decoder detects the layers and reports any predetermined segment that contains a code.
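The offset pairing can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch simply adds the offset to each symbol index; how an index that would exceed the symbol set is handled is not stated in the disclosure and is not modeled here.

```python
def paired_segment(first_segment, offset):
    """Return the second data segment: each symbol index shifted by the offset."""
    return [symbol + offset for symbol in first_segment]


def offsets_consistent(first_segment, second_segment, offset):
    """True when every symbol pair differs by exactly the predetermined offset."""
    return (
        len(first_segment) == len(second_segment)
        and all(b - a == offset for a, b in zip(first_segment, second_segment))
    )


# The example from the text: code 1, 2, 3 with an offset of 2.
second = paired_segment([1, 2, 3], 2)
```

A decoder-side check such as `offsets_consistent` is one way the redundancy might be used: a pair of segments whose symbols do not all differ by the same offset would be treated as suspect.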

FIG. 6 is a functional block diagram illustrating a decoding algorithm under one embodiment. An audio signal which may be encoded as described herein with a plurality of code symbols is received at a digital input 352. The received audio signal may be from streaming media, otherwise communicated signal, or a signal reproduced from storage in a device. It may be a direct-coupled or an acoustically coupled signal. From the following description in connection with the accompanying drawings, it will be appreciated that decoder 350 is capable of detecting codes in addition to those arranged in the formats disclosed herein.

For received audio signals in the time domain, decoder 350 transforms such signals to the frequency domain by means of function 356. Function 356 preferably is performed by a digital processor implementing a fast Fourier transform (FFT) although a discrete cosine transform, a chirp transform or a Winograd Fourier transform algorithm (WFTA) may be employed in the alternative. Any other time-to-frequency-domain transformation function providing the necessary resolution may be employed in place of these. It will be appreciated that in certain implementations, function 356 may also be carried out by filters, by an application specific integrated circuit, or any other suitable device or combination of devices. Function 356 may also be implemented by one or more devices which also implement one or more of the remaining functions illustrated in FIG. 6.

The frequency domain-converted audio signals are processed in a symbol values derivation function 360, to produce a stream of symbol values for each code symbol included in the received audio signal. The produced symbol values may represent, for example, signal energy, power, sound pressure level, amplitude, etc., measured instantaneously or over a period of time, on an absolute or relative scale, and may be expressed as a single value or as multiple values. Where the symbols are encoded as groups of single frequency components each having a predetermined frequency, the symbol values preferably represent either single frequency component values or one or more values based on single frequency component values. Function 360 may be carried out by a digital processor, which advantageously carries out some or all of the other functions of decoder 350. However, the function 360 may also be carried out by an application specific integrated circuit, or by any other suitable device or combination of devices, and may be implemented by apparatus apart from the means which implement the remaining functions of the decoder 350.

The stream of symbol values produced by the function 360 is accumulated over time in an appropriate storage device on a symbol-by-symbol basis, as indicated by function 366. In particular, function 366 is advantageous for use in decoding encoded symbols which repeat periodically, by periodically accumulating symbol values for the various possible symbols. For example, if a given symbol is expected to recur every X seconds, the function 366 may serve to store a stream of symbol values for a period of nX seconds (n>1), and add to the stored values the values of one or more symbol value streams of nX seconds duration, so that peak symbol values accumulate over time, improving the signal-to-noise ratio of the stored values. Function 366 may be carried out by a digital processor (or a DSP) which advantageously carries out some or all of the other functions of decoder 350. However, the function 366 may also be carried out using a memory device separate from such a processor, or by an application specific integrated circuit, or by any other suitable device or combination of devices, and may be implemented by apparatus apart from the means which implements the remaining functions of the decoder 350.

The accumulated symbol values stored by the function 366 are then examined by the function 370 to detect the presence of an encoded message and output the detected message at an output 376. Function 370 can be carried out by matching the stored accumulated values or a processed version of such values, against stored patterns, whether by correlation or by another pattern matching technique. However, function 370 advantageously is carried out by examining peak accumulated symbol values and their relative timing, to reconstruct their encoded message. This function may be carried out after the first stream of symbol values has been stored by the function 366 and/or after each subsequent stream has been added thereto, so that the message is detected once the signal-to-noise ratios of the stored, accumulated streams of symbol values reveal a valid message pattern.

FIG. 7 is a flow chart for a decoder application according to one advantageous embodiment. Step 430 is provided for those applications in which the encoded audio signal is received in analog form, for example, where it has been picked up by a microphone or an RF receiver. The decoder of FIG. 6 is particularly well adapted for detecting code symbols each of which includes a plurality of predetermined frequency components, e.g. ten components, within a frequency range of 1000 Hz to 3000 Hz. In this embodiment, the decoder is designed specifically to detect a message having a specific sequence wherein each symbol occupies a specified time interval (e.g., 0.5 sec). In this exemplary embodiment, it is assumed that the symbol set consists of twelve symbols, each having ten predetermined frequency components, none of which is shared with any other symbol of the symbol set. It will be appreciated that the FIG. 6 decoder may readily be modified to detect different numbers of code symbols, different numbers of components, different symbol sequences and symbol durations, as well as components arranged in different frequency bands.

In order to separate the various components, a processor on device 200 repeatedly carries out FFTs on audio signal samples falling within successive, predetermined intervals. The intervals may overlap, although this is not required. In an exemplary embodiment, ten overlapping FFT's are carried out during each second of decoder operation. Accordingly, the energy of each symbol period falls within five FFT periods. The FFT's are preferably windowed, although this may be omitted in order to simplify the decoder. The samples are stored and, when a sufficient number are thus available, a new FFT is performed, as indicated by steps 434 and 438.
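The framing arithmetic of this embodiment can be sketched in Python. The 8,000 samples-per-second rate and the 2048-sample frame length are borrowed from the FIG. 9 example purely for illustration, and the function name is hypothetical; the disclosure does not tie these values to the FIG. 7 decoder.

```python
FFTS_PER_SECOND = 10     # from the exemplary embodiment
SYMBOL_SECONDS = 0.5     # exemplary symbol interval


def frame_starts(num_samples, frame_len, hop):
    """Start indices of successive analysis frames that fit within the signal."""
    return list(range(0, num_samples - frame_len + 1, hop))


sample_rate = 8000                                   # assumed for illustration
frame_len = 2048                                     # assumed for illustration
hop = sample_rate // FFTS_PER_SECOND                 # samples between FFT starts
starts = frame_starts(2 * sample_rate, frame_len, hop)
ffts_per_symbol = round(SYMBOL_SECONDS * FFTS_PER_SECOND)
```

Because the hop is shorter than the frame length under these assumed values, successive frames overlap, and each 0.5 second symbol spans five FFT periods.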

In this embodiment, the frequency component values are produced on a relative basis. That is, each component value is represented as a signal-to-noise ratio (SNR), produced as follows. The energy within each frequency bin of the FFT in which a frequency component of any symbol can fall provides the numerator of each corresponding SNR. Its denominator is determined as an average of adjacent bin values. For example, the average of seven of the eight surrounding bin energy values may be used, the largest value of the eight being ignored in order to avoid the influence of a possible large bin energy value which could result, for example, from an audio signal component in the neighborhood of the code frequency component. Also, given that a large energy value could also appear in the code component bin, for example, due to noise or an audio signal component, the SNR is appropriately limited. In this embodiment, if SNR>6.0, then SNR is limited to 6.0, although a different maximum value may be selected. The ten SNR's of each FFT and corresponding to each symbol which may be present, are combined to form symbol SNR's which are stored in a circular symbol SNR buffer, as indicated in step 442. In certain embodiments, the ten SNR's for a symbol are simply added, although other ways of combining the SNR's may be employed. The symbol SNR's for each of the twelve symbols are stored in the symbol SNR buffer as separate sequences, one symbol SNR for each FFT for 50 FFT's. After the values produced in the 50 FFT's have been stored in the symbol SNR buffer, new symbol SNR's are combined with the previously stored values, as described below.
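The SNR computation described above can be sketched in Python as follows. The sketch assumes that the "eight surrounding bins" are the four bins on each side of the code component bin, which the disclosure does not specify; the function names and the handling of a zero noise estimate are likewise assumptions made for illustration.

```python
SNR_CAP = 6.0   # maximum SNR named in the text; a different cap may be chosen


def bin_snr(bin_energies, k, cap=SNR_CAP):
    """SNR of bin k: its energy over the mean of seven of the eight
    surrounding bins, the largest neighbour being discarded."""
    if k < 4 or k + 4 >= len(bin_energies):
        raise ValueError("bin k needs four neighbours on each side")
    # Assumption: four bins below and four bins above the code bin.
    neighbours = bin_energies[k - 4:k] + bin_energies[k + 1:k + 5]
    kept = sorted(neighbours)[:-1]        # drop the largest of the eight
    noise = sum(kept) / len(kept)
    if noise == 0:
        return cap                        # assumption: treat as a capped SNR
    return min(bin_energies[k] / noise, cap)


def symbol_snr(bin_energies, component_bins):
    """Combine a symbol's component SNRs by simple addition."""
    return sum(bin_snr(bin_energies, k) for k in component_bins)


# Flat noise floor, a code component at bin 10, loud audio nearby at bin 12.
energies = [1.0] * 20
energies[10] = 3.0
energies[12] = 50.0
snr_at_10 = bin_snr(energies, 10)
```

Discarding the largest neighbour keeps the loud component at bin 12 out of the noise estimate, so the SNR at bin 10 reflects the flat floor rather than the nearby audio energy.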

When the symbol SNR buffer is filled, this is detected in a step 446. In certain advantageous embodiments, the stored SNR's are adjusted to reduce the influence of noise in a step 452, although this step may be optional. In this optional step, a noise value is obtained for each symbol (row) in the buffer by obtaining the average of all stored symbol SNR's in the respective row each time the buffer is filled. Then, to compensate for the effects of noise, this average or “noise” value is subtracted from each of the stored symbol SNR values in the corresponding row. In this manner, a “symbol” appearing only briefly, and thus not a valid detection, is averaged out over time.
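The optional noise adjustment of step 452 amounts to subtracting each row's mean from that row. A minimal Python sketch, with a hypothetical function name and a buffer represented as a list of rows (one row per symbol), might read:

```python
def subtract_row_noise(buffer):
    """Subtract each row's mean ("noise" value) from every value in that row."""
    adjusted = []
    for row in buffer:
        noise = sum(row) / len(row)
        adjusted.append([value - noise for value in row])
    return adjusted


# Row 0 holds a clear peak; row 1 is uniform and carries no symbol.
buffer = [
    [1.0, 1.0, 5.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
]
adjusted = subtract_row_noise(buffer)
```

After the adjustment, the uniform row collapses to zero while the peak in the first row remains well above its neighbours.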

After the symbol SNR's have been adjusted by subtracting the noise level, the decoder attempts to recover the message by examining the pattern of maximum SNR values in the buffer in a step 456. In certain embodiments, the maximum SNR values for each symbol are located in a process of successively combining groups of five adjacent SNR's, by weighting the values in the sequence in proportion to the sequential weighting (6 10 10 10 6) and then adding the weighted SNR's to produce a comparison SNR centered in the time period of the third SNR in the sequence. This process is carried out progressively throughout the fifty FFT periods of each symbol. For example, a first group of five SNR's for a specific symbol in FFT time periods (e.g., 1-5) are weighted and added to produce a comparison SNR for a specific FFT period (e.g., 3). Then a further comparison SNR is produced using the SNR's from successive FFT periods (e.g., 2-6), and so on until comparison values have been obtained centered on all FFT periods. However, other means may be employed for recovering the message. For example, either more or fewer than five SNR's may be combined, they may be combined without weighting, or they may be combined in a non-linear fashion.
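The weighted combination of five adjacent SNR's can be sketched in Python. The function name is hypothetical, and the sketch produces a comparison value only where a full group of five is available; how the first and last two FFT periods are treated (for example, by wrapping around the circular buffer) is not stated in the disclosure.

```python
WEIGHTS = (6, 10, 10, 10, 6)   # sequential weighting from the text


def comparison_snrs(row, weights=WEIGHTS):
    """Weighted sums over each run of len(weights) adjacent SNRs.

    Element j of the result is centred on row[j + len(weights) // 2].
    """
    span = len(weights)
    return [
        sum(w * v for w, v in zip(weights, row[start:start + span]))
        for start in range(len(row) - span + 1)
    ]


# An isolated unit value seen through three successive windows.
values = comparison_snrs([0, 0, 1, 0, 0, 0, 0])
```

The smaller end weights mean that a value contributes less once it sits at the edge of a window, so the comparison SNR peaks where the symbol energy is centred.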

After the comparison SNR values have been obtained, the decoder algorithm examines the comparison SNR values for a message pattern. Under a preferred embodiment, the synchronization (“marker”) code symbols are located first. Once this information is obtained, the decoder attempts to detect the peaks of the data symbols. The use of a predetermined offset between each data symbol in the first segment and the corresponding data symbol in the second segment provides a check on the validity of the detected message. That is, if both markers are detected and the same offset is observed between each data symbol in the first segment and its corresponding data symbol in the second segment, it is highly likely that a valid message has been received. If this is the case, the message is logged, and the SNR buffer is cleared in a step 466. It is understood by those skilled in the art that decoder operation may be modified depending on the structure of the message, its timing, its signal path, the mode of its detection, etc., without departing from the scope of the present invention. For example, in place of storing SNR's, FFT results may be stored directly for detecting a message.

FIG. 8 is a flow chart for another decoder configuration according to a further advantageous embodiment likewise implemented by means of a processor controlled by a decoder application. The decoder application of FIG. 8 is especially adapted to detect a repeating sequence of code symbols (e.g., 5 code symbols) consisting of a marker symbol followed by a plurality (e.g., 4) of data symbols wherein each of the code symbols includes a plurality of predetermined frequency components and has a predetermined duration (e.g., 0.5 sec) in the message sequence. It is assumed in this example that each symbol is represented by ten unique frequency components and that the symbol set includes twelve different symbols. It is understood that this embodiment may readily be modified to detect any number of symbols, each represented by one or more frequency components.

Steps employed in the decoding process illustrated in FIG. 8 which correspond to those of FIG. 7 are indicated by the same reference numerals, and these steps consequently are not further described. The FIG. 8 embodiment uses a circular buffer which is twelve symbols wide by 150 FFT periods long. Once the buffer has been filled, new symbol SNRs each replace what are then the oldest symbol SNR values. In effect, the buffer stores a fifteen second window of symbol SNR values. As indicated in step 574, once the circular buffer is filled, its contents are examined in a step 578 to detect the presence of the message pattern. Once full, the buffer remains full continuously, so that the pattern search of step 578 may be carried out after every FFT.
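A circular buffer of this kind can be sketched in Python as follows. The class and method names are hypothetical, and the layout (one list per symbol, one column per FFT period) is only one of several possible arrangements.

```python
class SymbolSnrBuffer:
    """Circular buffer: one row per symbol, one column per FFT period."""

    def __init__(self, num_symbols=12, num_periods=150):
        self.rows = [[0.0] * num_periods for _ in range(num_symbols)]
        self.num_periods = num_periods
        self.next_col = 0    # column that the next FFT's values overwrite
        self.count = 0       # FFT periods written so far

    def push(self, symbol_snrs):
        """Store one FFT period's symbol SNRs, replacing the oldest column."""
        if len(symbol_snrs) != len(self.rows):
            raise ValueError("expected one SNR per symbol")
        for row, value in zip(self.rows, symbol_snrs):
            row[self.next_col] = value
        self.next_col = (self.next_col + 1) % self.num_periods
        self.count += 1

    def is_full(self):
        """True once every column has been written at least once."""
        return self.count >= self.num_periods

    def row_oldest_first(self, symbol):
        """One symbol's SNRs, oldest to newest (meaningful once full)."""
        row = self.rows[symbol]
        return row[self.next_col:] + row[:self.next_col]


# A tiny 2-symbol, 3-period buffer to show the wrap-around.
buf = SymbolSnrBuffer(num_symbols=2, num_periods=3)
for period in ([1, 10], [2, 20], [3, 30], [4, 40]):
    buf.push(period)
```

With the default sizes, 150 columns at ten FFT's per second correspond to the fifteen second window described above, and once the buffer is full a pattern search may follow every call to `push`.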

Since each five symbol message repeats every 2½ seconds, each symbol repeats at intervals of 2½ seconds or every 25 FFT's. In order to compensate for the effects of burst errors and the like, the SNR's R1 through R150 are combined by adding corresponding values of the repeating messages to obtain 25 combined SNR values SNRn, n=1,2 . . . 25, as follows:

${SNR}_{n} = \sum_{i = 0}^{5} R_{n + 25 i}$

Accordingly, if a burst error should result in the loss of a signal interval i, only one of the six message intervals will have been lost, and the essential characteristics of the combined SNR values are likely to be unaffected by this event. Once the combined SNR values have been determined, the decoder detects the position of the marker symbol's peak as indicated by the combined SNR values and derives the data symbol sequence based on the marker's position and the peak values of the data symbols. Once the message has thus been formed, as indicated in steps 582 and 583, the message is logged. However, unlike the embodiment of FIG. 7 the buffer is not cleared. Instead, the decoder loads a further set of SNR's in the buffer and continues to search for a message.
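The folding of 150 stored values into 25 combined values can be sketched in Python. The function name is hypothetical and the indices are zero-based, so element n of the result corresponds to SNR with index n+1 in the formula above.

```python
def combine_repeats(row, period=25, repeats=6):
    """Sum values that lie one message period apart: row[n + period * i]."""
    if len(row) != period * repeats:
        raise ValueError("row length must equal period * repeats")
    return [
        sum(row[n + period * i] for i in range(repeats))
        for n in range(period)
    ]


# A symbol peaking at position 7 of every interval, with the third
# interval lost to a burst error.
row = [0.0] * 150
for i in range(6):
    row[7 + 25 * i] = 1.0
row[7 + 25 * 2] = 0.0

combined = combine_repeats(row)
peak = max(range(len(combined)), key=combined.__getitem__)
```

Even with one of the six intervals lost, the combined value at the symbol's position remains the clear maximum, which is the robustness to burst errors noted above.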

As with the decoder of FIG. 7, it will be apparent from the foregoing that the decoder of FIG. 8 may be modified for different message structures, message timings, signal paths, detection modes, etc., without departing from the scope of the present invention. For example, the buffer of the FIG. 8 embodiment may be replaced by any other suitable storage device; the size of the buffer may be varied; the size of the SNR values windows may be varied, and/or the symbol repetition time may vary. Also, instead of calculating and storing signal SNR's to represent the respective symbol values, a measure of each symbol's value relative to the other possible symbols, for example, a ranking of each possible symbol's magnitude, is instead used in certain advantageous embodiments.

In a further variation which is especially useful in audience measurement applications, a relatively large number of message intervals are separately stored to permit a retrospective analysis of their contents to detect a media content change. In another embodiment, multiple buffers are employed, each accumulating data for a different number of intervals for use in the decoding method of FIG. 8. For example, one buffer could store a single message interval, another two accumulated intervals, a third four intervals and a fourth eight intervals. Separate detections based on the contents of each buffer are then used to detect a media content change.

Turning to FIG. 9, an exemplary decoder interface process is disclosed, where device (200) executes a decoder operation. The decoder in this example may be written in C, or any suitable code known in the art. At the beginning, a current version of the decoder is called and initialized in 430. At this point, use of the decoder may be dependent upon the satisfaction of an encryption key 431, which may be advantageous for limiting use of the decoder only to authorized users. The decoder interface security may comprise a required file containing encrypted decoder initialization parameters that may be used as an input for the decoder. The parameters may include pointer(s) to decoder memory, size of the decoder memory, pointer to encrypted decoder initialization parameters and pointer to an encryption key provided by the research entity, if not supplied to the application as a compile-time switch. Of course, if security is not an issue, the encryption steps may be omitted. Once any security/encryption is satisfied, the decoder loads initialization parameters that include allocating memory for audio decoding in step 432. Preferably, memory is allocated prior to executing other functions in the decoder. As audio is received in device 200, the audio is sampled 433 and transformed (e.g., FFT) in 434. As one example, the sampled audio may comprise 2048 16-bit monophonic audio samples obtained through an 8 k sample rate, while the transformation may result in 1024 FFT bin results. During the decoding process, the decoder application may use the initialized pointer to access decoder memory to obtain array(s) of transformed bin powers returned from the transformations, and utilize them to read code in 435. Once the code is read it may be stored in memory and transmitted to a remote location for audience measurement purposes.
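The relationship between 2048 time-domain samples and 1024 bin results can be sketched in Python. This is an illustration only and not the decoder interface itself: the function names are hypothetical, the recursive radix-2 FFT is a textbook routine standing in for whatever transform the decoder uses, and the stated 8 k sample rate is assumed to mean 8,000 samples per second.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def bin_powers(samples):
    """Squared magnitudes of the first len(samples) // 2 FFT bins."""
    spectrum = fft(samples)
    return [abs(c) ** 2 for c in spectrum[:len(samples) // 2]]


SAMPLE_RATE = 8000   # assumption: "8 k" read as 8,000 samples per second
FRAME_LEN = 2048

# Synthetic 16-bit tone placed exactly on bin 256
# (256 * 8000 / 2048 = 1000 Hz under the assumed rate).
tone_bin = 256
frame = [
    int(round(10000 * math.sin(2 * math.pi * tone_bin * i / FRAME_LEN)))
    for i in range(FRAME_LEN)
]
powers = bin_powers(frame)
strongest = max(range(len(powers)), key=powers.__getitem__)
```

Under the assumed rate, each bin spans 8000 / 2048, or roughly 3.9 Hz, and an array such as `powers` plays the role of the transformed bin powers from which code is read in 435.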

In an alternate embodiment, multiple instances of the decoder may be initialized using different memory areas. In such a case, the decoder application would be responsible for keeping track of which memory pointers are used in subsequent calls to initialize and retrieve code from the proper decoder.

In the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.

Claims

1. A processing device, comprising:

an audio input configured to receive digital audio from one of (i) a computer network and (ii) a digital media file;

a memory, operatively coupled to the audio input;

a processing apparatus, communicatively coupled to the audio input, wherein the processing apparatus is configured to activate a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in the memory;

wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio, wherein the ancillary codes identify at least one characteristic of the digital audio.

2. The processing device of claim 1, wherein the decoding application is configured to communicate with a media player application in the processing device.

3. The processing device of claim 2, wherein the media player application is configured to communicate with a core services layer in the processing device.

4. The processing device of claim 1, wherein the transformed digital audio comprises audio transformed via fast Fourier transform (FFT).

5. The processing device of claim 1, wherein the decoding application produces a plurality of symbol values for each code symbol of the ancillary codes, said symbol values representing one of digital audio signal energy, power and amplitude.

6. The processing device of claim 5, wherein the decoding application accumulates symbol values over time and processes the accumulated symbol values to determine the presence of ancillary codes encoded into the digital audio.

7. The processing device of claim 1, wherein the ancillary codes comprise single frequency code components.

8. A processor-based method for configuring a processing device to read ancillary codes embedded in digital audio, comprising:

receiving digital audio at an input of the processing device from one of (i) a computer network and (ii) a digital media file; and

configuring a processor on the processing device to activate a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in a memory;

wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio, wherein the ancillary codes identify at least one characteristic of the digital audio.

9. The method of claim 8, wherein the decoding application is configured to communicate with a media player application in the processing device.

10. The method of claim 9, wherein the media player application is configured to communicate with a core services layer in the processing device.

11. The method of claim 8, wherein the transformed digital audio comprises audio transformed via fast Fourier transform (FFT).

12. The method of claim 8, wherein the decoding application produces a plurality of symbol values for each code symbol of the ancillary codes, said symbol values representing one of digital audio signal energy, power and amplitude.

13. The method of claim 12, wherein the decoding application accumulates symbol values over time and processes the accumulated symbol values to determine the presence of ancillary codes encoded into the digital audio.

14. The method of claim 8, wherein the ancillary codes comprise single frequency code components.

15. A computer program product, comprising a computer usable medium having a computer readable program code tangibly embodied therein, said computer readable program code adapted to be executed to implement a method for reading ancillary code from digital audio, said method comprising:

sampling the digital audio from an input of the processing device, said input being configured to receive the digital audio from one of (i) a computer network and (ii) a digital media file; and

activating a decoder application on said processing device to transform the received digital audio from a time domain to a frequency domain and store the transformed digital audio in a memory;

wherein the decoding application is configured to process the stored digital audio using a decoder application interface to determine the presence of ancillary codes encoded into the digital audio, and wherein the ancillary codes identify at least one characteristic of the digital audio.

15. The computer program product of claim 14, wherein the decoding application is configured to communicate with a media player application in the processing device.

16. The computer program product of claim 15, wherein the media player application is configured to communicate with a core services layer in the processing device.

17. The computer program product of claim 14, wherein the transformed digital audio comprises audio transformed via fast Fourier transform (FFT).

18. The computer program product of claim 14, wherein the decoding application produces a plurality of symbol values for each code symbol of the ancillary codes, said symbol values representing one of digital audio signal energy, power and amplitude.

19. The computer program product of claim 18, wherein the decoding application accumulates symbol values over time and processes the accumulated symbol values to determine the presence of ancillary codes encoded into the digital audio.

20. The computer program product of claim 14, wherein the ancillary codes comprise single frequency code components.