MEDIA CONTENT ITEMS SEQUENCING
A media content item sequencing system determines a sequence for playback of selected media content items, such as media content items in a playlist. The system calculates similarities between all possible pairs of the media content items and determines a sequence of the media content items using the similarities. The sequence of media content items can be determined by modeling the track features of the media content items with a graph traversal problem and calculating a solution to the problem with various methods.
This application claims priority to U.S. Application Ser. No. 62/313,636 filed on Mar. 25, 2016 and entitled SYSTEM AND METHOD FOR AUTOMATIC AND SCALABLE PLAYLIST SEQUENCING AND TRANSITIONS, the disclosure of which is hereby incorporated by reference in its entirety.
BACKGROUND
Media content, such as audio content or video content, is widely consumed in various environments, such as daily, recreation, or fitness activities. Examples of audio content include songs, albums, podcasts, audiobooks, etc. Examples of video content include movies, music videos, television episodes, etc. Using a mobile phone or other media playback device, a person can access large catalogs of media content. For example, a user can access an almost limitless catalog of media content through various free and subscription-based streaming services. Additionally, a user can store a large catalog of media content on his or her mobile device.
This nearly limitless access to media content introduces new challenges for users. For example, it may be difficult to find or select the right media content that complements a particular moment such as running or other repetitive-motion activity. Further, it is desirable to play a series of media content items to create engaging, seamless, and cohesive listening experiences, which can be provided by professional music curators and DJs who carefully sort and mix tracks together. Average listeners typically lack the time and skill required to craft such an experience for their own personal enjoyment.
SUMMARY
In general terms, this disclosure is directed to systems and methods for managing a sequence between media content items. In one possible configuration and by non-limiting example, the systems and methods use a plurality of track features of media content items and determine a sequence of media content items based on similarities of the track features thereof. Various aspects are described in this disclosure, which include, but are not limited to, the following aspects.
One aspect is a method for playing media content items. The method includes determining a plurality of track features of each of the media content items; obtaining weighting data for the plurality of track features; generating a plurality of weighted track features for each of the media content items by applying the weighting data to the plurality of track features of each of the media content items; calculating aggregated track features for the media content items, respectively, based on the plurality of weighted track features; comparing the aggregated track features to determine similarities between the aggregated track features; and determining a sequence of the media content items based on the similarities.
Another aspect is a method for sequencing media content items. The method comprises determining a plurality of track features of each of the media content items; weighting the plurality of track features; mapping the plurality of weighted track features of each of the media content items to an aggregated feature vector; determining similarities among the aggregated feature vectors; and determining a sequence of the media content items based on the similarities.
Yet another aspect is a computer readable storage device storing data instructions that, when executed by a processing device, cause the processing device to: determine a plurality of track features of each of the media content items; weight the plurality of track features; map the plurality of weighted track features of each of the media content items to an aggregated feature vector; determine similarities among the aggregated feature vectors; and determine a sequence of the media content items based on the similarities.
Another aspect is a system comprising: at least one processing device; and at least one computer readable storage device storing data instructions, which when executed by the at least one processing device, cause the at least one processing device to: determine a plurality of track features of each of the media content items; weight the plurality of track features; map the plurality of weighted track features of each of the media content items to an aggregated feature vector; determine similarities among the aggregated feature vectors; and determine a sequence of the media content items based on the similarities.
Various embodiments will be described in detail with reference to the drawings, wherein like reference numerals represent like parts and assemblies throughout the several views. Reference to various embodiments does not limit the scope of the claims attached hereto. Additionally, any examples set forth in this specification are not intended to be limiting and merely set forth some of the many possible embodiments for the appended claims.
In general, the system of the present disclosure determines a sequence for playback of selected media content items, such as media content items in a playlist. For a given set of media content items (for example, in the form of a playlist), the system calculates similarities between all possible pairs of the media content items, and determines a sequence of the media content items using the similarities. Each of the similarities can be calculated by comparing track features of two media content items. In some embodiments, such track features can be represented as numerical values. In other embodiments, the track features can be represented as a vector. The sequence of media content items can be determined by modeling the track features of the media content items with a graph traversal problem and calculating a solution to the problem with various methods.
In certain examples, the system of the present disclosure is used to play back a plurality of media content items to continuously support a user's repetitive motion activity without distracting the user's cadence.
As such, the system provides a simple, efficient solution to sequencing of selected media content items with professional-level quality. In certain examples, the management process for sequencing between media content items is executed in a server computing device, rather than a user's media playback device. Accordingly, the media playback device can save its resources for playing back media content items in a desirable sequence, and the management process can be efficiently maintained and conveniently modified as appropriate without interacting with the media playback device.
The media playback device 102 operates to play media content items to produce media output 108. In some embodiments, the media content items are provided by the media delivery system 104 and transmitted to the media playback device 102 using the network 106. A media content item is an item of media content, including audio, video, or other types of media content, which may be stored in any format suitable for storing media content. Non-limiting examples of media content items include songs, albums, music videos, movies, television episodes, podcasts, other types of audio or video content, and portions or combinations thereof. In this document, the media content items can also be referred to as tracks.
The media delivery system 104 operates to provide media content items to the media playback device 102. In some embodiments, the media delivery system 104 is connectable to a plurality of media playback devices 102 and provides media content items to the media playback devices 102 independently or simultaneously.
The media content sequencing engine 110 operates to play media content items in a desirable sequence. In some embodiments, a sequence of the media content items is determined by the media delivery system 104 and the media playback device 102 merely operates to play back the media content items according to the sequence. In other embodiments, the media content sequencing engine 110 operates to determine such a sequence of the media content items, either independently or in cooperation with the media delivery system 104 including the media content sequence determination engine 112.
In some embodiments, as illustrated in
The media content sequence determination engine 112 operates to determine a sequence in which media content items are played. In some embodiments, a sequence of the media content items is determined by the media delivery system 104, either independently or in cooperation with the media playback device 102 including the media content sequencing engine 110. As described herein, in some embodiments, the media content sequence determination engine 112 operates to determine a sequence of media content items where a group of the media content items is given to be played on the media playback device 102. Such a group of media content items can be provided in the form of a playlist 114, which can be manually selected by the user and/or automatically populated for the user. In other embodiments, the sequencing can be determined for other media content items stored in either or both of the media playback device 102 and the media delivery system 104.
As described herein, the media playback device 102 operates to play media content items. In some embodiments, the media playback device 102 operates to play media content items that are provided (e.g., streamed, transmitted, etc.) by a system external to the media playback device such as the media delivery system 104, another system, or a peer device. Alternatively, in some embodiments, the media playback device 102 operates to play media content items stored locally on the media playback device 102. Further, in at least some embodiments, the media playback device 102 operates to play media content items that are stored locally as well as media content items provided by other systems.
In some embodiments, the media playback device 102 is a computing device, handheld entertainment device, smartphone, tablet, watch, wearable device, or any other type of device capable of playing media content. In yet other embodiments, the media playback device 102 is a laptop computer, desktop computer, television, gaming console, set-top box, network appliance, Blu-ray or DVD player, media player, stereo, or radio.
In at least some embodiments, the media playback device 102 includes a location-determining device 130, a touch screen 132, a processing device 134, a memory device 136, a content output device 138, and a network access device 140. Other embodiments may include additional, different, or fewer components. For example, some embodiments may include a recording device such as a microphone or camera that operates to record audio or video content. As another example, some embodiments do not include one or more of the location-determining device 130 and the touch screen 132.
The location-determining device 130 is a device that determines the location of the media playback device 102. In some embodiments, the location-determining device 130 uses one or more of the following technologies: Global Positioning System (GPS) technology which may receive GPS signals from satellites S, cellular triangulation technology, network-based location identification technology, Wi-Fi positioning systems technology, and combinations thereof.
The touch screen 132 operates to receive an input from a selector (e.g., a finger, stylus, etc.) controlled by the user U. In some embodiments, the touch screen 132 operates as both a display device and a user input device. In some embodiments, the touch screen 132 detects inputs based on one or both of touches and near-touches. In some embodiments, the touch screen 132 displays a user interface 144 for interacting with the media playback device 102. As noted above, some embodiments do not include a touch screen 132. Some embodiments include a display device and one or more separate user interface devices. Further, some embodiments do not include a display device.
In some embodiments, the processing device 134 comprises one or more central processing units (CPU). In other embodiments, the processing device 134 additionally or alternatively includes one or more digital signal processors, field-programmable gate arrays, or other electronic circuits.
The memory device 136 operates to store data and instructions. In some embodiments, the memory device 136 stores instructions for a media playback engine 146 that includes a media content selection engine 148 and the media content sequencing engine 110.
The memory device 136 typically includes at least some form of computer-readable media. Computer readable media include any available media that can be accessed by the media playback device 102. By way of example, computer-readable media include computer readable storage media and computer readable communication media.
Computer readable storage media includes volatile and nonvolatile, removable and non-removable media implemented in any device configured to store information such as computer readable instructions, data structures, program modules, or other data. Computer readable storage media includes, but is not limited to, random access memory, read only memory, electrically erasable programmable read only memory, flash memory and other memory technology, compact disc read only memory, Blu-ray discs, digital versatile discs or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the media playback device 102. In some embodiments, computer readable storage media is non-transitory computer readable storage media.
Computer readable communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, computer readable communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.
The content output device 138 operates to output media content. In some embodiments, the content output device 138 generates media output 108 (
The network access device 140 operates to communicate with other computing devices over one or more networks, such as the network 106. Examples of the network access device include wired network interfaces and wireless network interfaces. Wireless network interfaces include infrared, BLUETOOTH® wireless technology, 802.11a/b/g/n/ac, and cellular or other radio frequency interfaces in at least some possible embodiments.
The media playback engine 146 operates to play back one or more of the media content items (e.g., music) to the user U. When the user U is running while using the media playback device 102, the media playback engine 146 can operate to play media content items to encourage the running of the user U, as illustrated with respect to
The media content selection engine 148 operates to retrieve one or more media content items. In some embodiments, the media content selection engine 148 is configured to send a request to the media delivery system 104 for media content items and receive information about such media content items for playback. In some embodiments, media content items can be stored in the media delivery system 104. In other embodiments, media content items can be stored locally in the media playback device 102. In yet other embodiments, some media content items can be stored locally in the media playback device 102 and other media content items can be stored in the media delivery system 104.
The media content sequencing engine 110 is included in the media playback engine 146 in some embodiments. The media content sequencing engine 110, either independently or in cooperation with the media content sequence determination engine 112, can operate to arrange similar media content items closely so as to provide engaging, seamless and cohesive listening experiences which would otherwise be manually performed by music professionals, such as disc jockeys. Such sequencing can be performed by the media content sequence determination engine 112 of the media delivery system 104 alone. As described herein, such a sequence of media content items can also support a user's repetitive motion activity.
With still reference to
In some embodiments, the media delivery system 104 includes a media server application 150, a processing device 152, a memory device 154, and a network access device 156. The processing device 152, memory device 154, and network access device 156 may be similar to the processing device 134, memory device 136, and network access device 140 respectively, which have each been previously described.
In some embodiments, the media server application 150 operates to stream music or other audio, video, or other forms of media content. The media server application 150 includes a media stream service 160, a media data store 162, and a media application interface 164.
The media stream service 160 operates to buffer media content such as media content items 170 (including 170A, 170B, and 170Z) for streaming to one or more streams 172A, 172B, and 172Z.
The media application interface 164 can receive requests or other communication from media playback devices or other systems, to retrieve media content items from the media delivery system 104. For example, in
In some embodiments, the media data store 162 stores media content items 170, media content metadata 174, and playlists 176. The media data store 162 may comprise one or more databases and file systems. Other embodiments are possible as well. As noted above, the media content items 170 may be audio, video, or any other type of media content, which may be stored in any format for storing media content.
The media content metadata 174 operates to provide various pieces of information associated with the media content items 170. In some embodiments, the media content metadata 174 includes one or more of title, artist name, album name, length, genre, mood, era, etc.
In some embodiments, the media content metadata 174 includes acoustic metadata, cultural metadata, and explicit metadata. The acoustic metadata, which may be derived from analysis of the track, refers to a numerical or mathematical representation of the sound of a track. Acoustic metadata may include temporal information such as tempo, rhythm, beats, downbeats, tatums, patterns, sections, or other structures. Acoustic metadata may also include spectral information such as melody, pitch, harmony, timbre, chroma, loudness, vocalness, or other possible features. Acoustic metadata may take the form of one or more vectors, matrices, lists, tables, and other data structures. Acoustic metadata may be derived from analysis of the music signal. One form of acoustic metadata, commonly termed an acoustic fingerprint, may uniquely identify a specific track. Other forms of acoustic metadata may be formed by compressing the content of a track while retaining some or all of its musical characteristics.
The cultural metadata refers to text-based information describing listeners' reactions to a track or song, such as styles, genres, moods, themes, similar artists and/or songs, rankings, etc. Cultural metadata may be derived from expert opinion such as music reviews or classification of music into genres. Cultural metadata may be derived from listeners through websites, chatrooms, blogs, surveys, and the like. Cultural metadata may include sales data, shared collections, lists of favorite songs, and any text information that may be used to describe, rank, or interpret music. Cultural metadata may also be generated by a community of listeners and automatically retrieved from Internet sites, chat rooms, blogs, and the like. Cultural metadata may take the form of one or more vectors, matrices, lists, tables, and other data structures. A form of cultural metadata particularly useful for comparing music is a description vector. A description vector is a multi-dimensional vector associated with a track, album, or artist. Each term of the description vector indicates the probability that a corresponding word or phrase would be used to describe the associated track, album or artist.
The explicit metadata refers to factual or explicit information relating to music. Explicit metadata may include album and song titles, artist and composer names, other credits, album cover art, publisher name and product number, and other information. Explicit metadata is generally not derived from the music itself or from the reactions or opinions of listeners.
At least some of the metadata 174, such as explicit metadata (names, credits, product numbers, etc.) and cultural metadata (styles, genres, moods, themes, similar artists and/or songs, rankings, etc.), for a large library of songs or tracks can be evaluated and provided by one or more third party service providers. Acoustic and cultural metadata may take the form of parameters, lists, matrices, vectors, and other data structures. Acoustic and cultural metadata may be stored as XML files, for example, or any other appropriate file type. Explicit metadata may include numerical, text, pictorial, and other information. Explicit metadata may also be stored in an XML or other file. All or portions of the metadata may be stored in separate files associated with specific tracks. All or portions of the metadata, such as acoustic fingerprints and/or description vectors, may be stored in a searchable data structure, such as a k-D tree or other database format.
The playlists 176, which includes the playlist 114 (
In some embodiments, playlists can be manually created, modified, and managed by users. In other embodiments, playlists can be automatically created by the media delivery system 104, the media playback device 102, and any other computing devices and presented or recommended to the users.
Referring still to
In various embodiments, the network 106 includes various types of links. For example, the network 106 can include wired and/or wireless links, including Bluetooth, ultra-wideband (UWB), 802.11, ZigBee, cellular, and other types of wireless links. Furthermore, in various embodiments, the network 106 is implemented at various scales. For example, the network 106 can be implemented as one or more local area networks (LANs), metropolitan area networks, subnets, wide area networks (such as the Internet), or can be implemented at another scale. Further, in some embodiments, the network 106 includes multiple networks, which may be of the same type or of multiple different types.
Although
Within this description, the terms “automatically” and “automated” mean “without user intervention”. An automated task may be initiated by a user but an automated task, once initiated, proceeds to a conclusion without further user action.
Within this description, a “track” is a digital data file containing audio information. A track may be stored on a storage device such as a hard disc drive, and may be a component of a library of audio tracks. A track may be a recording of a song or a section, such as a movement, of a longer musical composition. A track may be stored in any known or future audio file format. A track may be stored in an uncompressed format, such as a WAV file, or a compressed format such as an MP3 file. In this document, however, a track is not limited to audio, and it is understood that a track can indicate a media content item of any suitable type.
The method 200 can begin at operation 202, in which the media delivery system 104 receives selection of media content items. In some embodiments, the media content items to be sequenced are identified in a playlist 114 (
At operation 204, the media delivery system 104 determines one or more track features of each media content item. Track features represent various characteristics of a media content item in various forms. In some embodiments, track features can be obtained from various sources, such as the media content metadata 174 including acoustic metadata, cultural metadata, and explicit metadata. In other embodiments, track features can be obtained by retrieving the media content metadata 174 and processing it to different formats. Example track features which can be used for sequencing are further described with reference to
At operation 206, the media delivery system 104 obtains weighting data. At operation 208, the media delivery system 104 then weights the track features based on the weighting data, thereby generating weighted track features for each media content item.
Track features and weighted track features can be represented in various formats. In some embodiments, track features and weighted track features can be represented by a numerical value or score, as illustrated in
The weighting data, such as weighting data 380 (
In some embodiments, the track features used for sequencing can be weighted in a way that is consistent with intended applications. By way of example, a generic playlist of media content items can be sequenced using only timbral descriptors, while tempo and key consistency may be the most important aspects in the case of a dance party playlist where the crossfade between media content items should preserve the rhythmic regularity and harmonic flow. As such, the track features can be weighted differently according to various factors which may determine the characteristics of the set (e.g., playlist) of media content items to be sequenced.
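The application-dependent weighting described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the preset names, the feature names, the numeric weights, and the assumption that each track feature is a single value on a comparable scale are all hypothetical.

```python
from typing import Dict

# Hypothetical weight presets; the names and numbers are illustrative only.
WEIGHT_PRESETS: Dict[str, Dict[str, float]] = {
    # A generic playlist sequenced using only timbral descriptors.
    "generic": {"timbre": 1.0, "tempo": 0.0, "key": 0.0},
    # A dance party playlist that emphasizes tempo and key consistency.
    "dance_party": {"timbre": 0.2, "tempo": 1.0, "key": 1.0},
}


def weight_track_features(
    features: Dict[str, float], weights: Dict[str, float]
) -> Dict[str, float]:
    """Scale each track feature by its weight.

    Features that have no entry in `weights` are left unscaled (weight 1.0),
    which is an assumption of this sketch.
    """
    return {name: value * weights.get(name, 1.0) for name, value in features.items()}


# Hypothetical, already-normalized track features for one media content item.
track = {"timbre": 0.6, "tempo": 0.8, "key": 0.5}
weighted = weight_track_features(track, WEIGHT_PRESETS["dance_party"])
```

Keeping the weights in a separate mapping means the same track features can be reweighted for a different kind of playlist without being recomputed.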
In some embodiments, weighting information included in the weighting data can be selected or adjusted manually by a user, as further illustrated in
At operation 210, the media delivery system 104 calculates an aggregated track feature for each media content item based on the weighted track features for that media content item. In some embodiments, the aggregated track feature for each media content item, such as an aggregated track feature 302 (
The aggregated track feature can be represented in various formats. In some embodiments, the aggregated track feature can be represented by a numerical value or score, as illustrated in
In some embodiments, the operation 210 can be repeated until the aggregated track features are obtained for all of the media content items to be sequenced.
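As a rough sketch of operation 210, the weighted track features of each media content item can be reduced either to a single score or to a vector. The text here does not specify the aggregation rule, so the summation and the fixed feature ordering below are assumptions, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_as_score(weighted: Dict[str, float]) -> float:
    """Aggregate weighted track features into one numerical value (assumed: a sum)."""
    return sum(weighted.values())


def aggregate_as_vector(weighted: Dict[str, float], order: Sequence[str]) -> List[float]:
    """Aggregate weighted track features into a vector with a fixed feature order."""
    return [weighted[name] for name in order]


# Hypothetical weighted track features for the media content items to be sequenced.
weighted_playlist = {
    "track_a": {"timbre": 0.12, "tempo": 0.80, "key": 0.50},
    "track_b": {"timbre": 0.10, "tempo": 0.75, "key": 0.50},
}
feature_order = ("timbre", "tempo", "key")

# Repeat the aggregation until every media content item has an aggregated feature.
aggregated_vectors = {
    track_id: aggregate_as_vector(features, feature_order)
    for track_id, features in weighted_playlist.items()
}
```

Using one shared feature order for every item keeps the resulting vectors comparable element by element.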
At operation 212, the media delivery system 104 compares the aggregated track features. At operation 214, the media delivery system 104 determines similarities between the media content items based on the comparison between the media content items' aggregated track features.
A similarity between media content items can be calculated in various ways. In some embodiments, where aggregated track features are represented as numerical values, a similarity between two media content items can be determined based on a difference between the aggregated track feature values of the two media content items. In other embodiments, where aggregated track features are represented as vectors, a similarity between two media content items can be determined by calculating the Euclidean distance between the vectors representative of the aggregated track features of the two media content items. In yet other embodiments, any other similarity or comparison measurement can be used to compare two media content items.
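The two comparison measures named above, a difference between numerical values and a Euclidean distance between vectors, can be sketched as follows. The function names and the sample data are hypothetical; the pairwise helper simply applies the distance to every pair of items.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple


def scalar_difference(a: float, b: float) -> float:
    """Absolute difference between two aggregated track feature values."""
    return abs(a - b)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two aggregated track feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pairwise_distances(
    vectors: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of media content items."""
    return {
        (a, b): euclidean_distance(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }


# Hypothetical aggregated feature vectors.
distances = pairwise_distances({"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [6.0, 8.0]})
```

In both measures a smaller value means the two media content items are more alike.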
A similarity can be represented in various formats. In some embodiments, a similarity result can be a value indicating the similarity between two media content items on a predetermined scale. For example, a similarity can be a score having a value between 0 and 1, 0 and 100, etc., with 0 indicating no similarity between two media content items and the maximum value indicating that two media content items are highly similar or identical. The similarity result may be expressed as a difference score, where zero may indicate no difference between two media content items and a higher value may indicate an increasing degree of difference. The similarity score may be quantized into levels, for example A/B/C/D/E, for reporting to the requester. The similarity score may be compared to a predetermined threshold and converted into a binary value, for example Yes/No, for reporting to the requester.
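The representations listed above can be illustrated with a short sketch. The mapping from a difference score to a 0-to-1 similarity, the equal-width level boundaries, and the default threshold are assumptions chosen for illustration; the text does not prescribe them.

```python
def distance_to_similarity(distance: float) -> float:
    """Map a difference score (0 = identical) to a similarity in (0, 1].

    The 1 / (1 + d) mapping is an assumption of this sketch.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


def quantize(similarity: float, levels: str = "ABCDE") -> str:
    """Quantize a 0-to-1 similarity into levels, with A the most similar.

    Equal-width bins are assumed.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    index = min(int((1.0 - similarity) * len(levels)), len(levels) - 1)
    return levels[index]


def is_similar(similarity: float, threshold: float = 0.5) -> str:
    """Convert a similarity score to a binary Yes/No against a threshold."""
    return "Yes" if similarity >= threshold else "No"
```

A requester that only needs a coarse answer can consume the level or the Yes/No value, while the sequencing step can keep working with the underlying score.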
At operation 216, the media delivery system 104 operates to sequence the media content items based on the similarities. In some embodiments, where the aggregated track features are represented by numerical values, a difference between any two of the aggregated track features can determine an order of the media content items. Such an order of the media content items can begin with a seed media content item, which is selected from the media content items and to be played first among the media content items. The seed media content item can be manually selected by the user, or automatically selected by the media delivery system 104 or the media playback device 102.
By way of example, when the seed media content item is given, the next media content item can be selected to be the media content item whose aggregated track feature value is more similar to the aggregated track feature value of the seed media content item than the aggregated track feature values of the other media content items are. As a simple example of sequencing three media content items, a first media content item is arranged prior to a second media content item and the second media content item is arranged prior to a third media content item when a difference between an aggregated track feature value of the first media content item and an aggregated track feature value of the second media content item is smaller than a difference between the aggregated track feature value of the first media content item and an aggregated track feature value of the third media content item.
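The seed-based ordering can be sketched as a greedy selection over numerical aggregated track feature values. The text states how the item following the seed is chosen; applying the same rule at every later step, relative to the most recently placed item, is an assumption of this sketch, and the function name and sample values are hypothetical.

```python
from typing import Dict, List


def sequence_from_seed(values: Dict[str, float], seed: str) -> List[str]:
    """Order media content items starting from a seed item.

    At each step, the remaining item whose aggregated track feature value is
    closest to that of the most recently placed item is placed next.
    """
    if seed not in values:
        raise KeyError(seed)
    remaining = dict(values)  # copy so the caller's mapping is not modified
    order = [seed]
    current_value = remaining.pop(seed)
    while remaining:
        next_id = min(remaining, key=lambda t: abs(remaining[t] - current_value))
        order.append(next_id)
        current_value = remaining.pop(next_id)
    return order


# Hypothetical aggregated track feature values for three media content items.
aggregated = {"first": 0.10, "second": 0.25, "third": 0.70}
playback_order = sequence_from_seed(aggregated, "first")
```

With these values, the second item follows the seed because its difference from the first item (0.15) is smaller than that of the third item (0.60).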
In some embodiments, the track features 230 are computed for each track in the media delivery system 104. In other embodiments, the track features 230 can be calculated using one or more software programs running on the media delivery system or one or more other computing devices.
The acoustic features 240 represent the sound of a media content item, such as timbre, melody, pitch, harmony, and other possible features. In some embodiments, the acoustic features 240 can be obtained from the acoustic metadata of the media content item.
In some embodiments, a timbre feature 250 is used as an example of the acoustic features. The timbre feature 250 is the character or quality of a sound or voice as distinct from its pitch and intensity. A timbre feature is a perceived sound quality of a musical note, sound, or tone that distinguishes different types of sound production, such as choir voices, and musical instruments, such as string instruments, wind instruments, and percussion instruments.
The key and mode information 242. The mode generally refers to a type of scale, coupled with a set of characteristic melodic behaviors. The key of a piece is a group of pitches or scale upon which a music composition is created. The group features a tonic note and its corresponding chords, also called a tonic or tonic chord, providing a subjective sense of arrival and rest and also has a unique relationship to the other pitches of the same group, their corresponding chords, and pitches and chords outside the group. Notes and chords other than the tonic in a piece create varying degrees of tension, resolved when the tonic note or chord returns. The key may be in the major or minor mode.
The tempo 244 indicates the speed or pace of a given piece or subsection thereof, how fast or slow. Tempo is related to meter and is usually measured by beats per minute, with the beats being a division of the measures, though tempo is often indicated by terms which have acquired standard ranges of beats per minute or assumed by convention without indication.
In other embodiments, any other features or aspects of a media content item can be additionally or alternatively used as the track features 230. The methods of using the track features 230 for sequencing described herein are also applicable to such other features and aspects used as the track features 230.
In this example, the method 270 is described as being performed in the media delivery system 104. However, in other embodiments, only some of the processes in the method 270 can be performed by the media delivery system 104. In other embodiments, all or some of the processes in the method 270 are performed by the media playback device 102. In yet other embodiments, all or some of the processes in the method 270 are performed by both of the media delivery system 104 and the media playback device 102 in cooperation. In yet other embodiments, the method 270 can be performed by other computing devices and provided to the media delivery system 104.
The method 270 is used to receive a manual selection of weights from a user. The weights can be determined based on the overall characteristics (such as styles, genres, moods, themes, similar artists and/or songs, and rankings) of the media content items in the playlist. The weights are used to scale a plurality of track features such that the media content items are sequenced and played to provide a smooth, continuous playback. The given media content items may have similar values in one or more particular track features, and thus the weights can be adjusted to emphasize such particular track features more than other track features which are not shared by all or a majority of the media content items. By way of example, where a playlist includes media content items generally suitable for a dance party, consistency in tempo and key may be important to preserve rhythmic regularity and harmonic flow between the media content items. In this case, the weights can be adjusted or set to give more weight to the tempo feature and the key feature.
The method 270 can begin at operation 272, in which the media delivery system 104 operates to provide a user interface for receiving a user input of weights on track features. The user interface enables a user to input or adjust weights for one or more track features 230. An example of the user interface is illustrated in
At operation 274, the media delivery system 104 operates to receive a user input of weights on one or more track features 230. In some embodiments, where the user inputs the weighting values through a user computing device, the media delivery system 104 receives such inputs from the user computing device. In other embodiments, the media delivery system 104 can directly receive the user input of weights from the user.
The method 290 is used to automatically determine the weights for scaling the track features 230. The method 290 can begin at operation 292, in which the media delivery system 104 obtains sequencing history data. The sequencing history data include information about a history of sequencing media content items in general. In some embodiments, the sequencing history data include a large volume of past sequencing events that have been performed by music professionals, such as professional music curators and disc jockeys. In other embodiments, the sequencing history data include a large volume of past sequencing events that have been performed by at least some users or listeners of the media content items provided by the media delivery system 104.
At operation 294, the media delivery system 104 operates to determine the sequencing history of the given media content items based on the sequencing history data. In some embodiments, the media delivery system 104 can identify a particular characteristic of the selected media content items to be sequenced. Given a set of media content items, the set of media content items (for example, in the form of a playlist) can be characterized to have a particular attribute in common, such as styles, genres, moods, themes, similar artists and/or songs, rankings, etc. The media delivery system 104 can then determine how the same media content items, or the media content items having a similar characteristic to the characteristic of the selected media content items, have been sequenced from the sequencing history data. The media delivery system 104 further determines a correlation between the sequencing history and the track features of the same media content items or the media content items having the similar characteristic. Such a correlation can be used to determine or predict how the track features 230 of the media content items to be sequenced should be weighted.
At operation 296, the media delivery system 104 can predict weights on the track features of the media content items to be sequenced, depending on the characteristic of the media content items.
In some embodiments, the aggregated track features 302 are obtained as a weighted sum of the track features 230, such as the timbre feature 250, the key and mode feature 242, and the tempo feature 244. In the example table 300, the track features 230 are weighted such that only the tempo feature 244 is considered, without the other track features (i.e., Timbre:Key/Mode:Tempo=0:0:1).
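As a minimal sketch of the weighted sum, the example below assumes that each track feature has already been reduced to a single number. The feature names, the sample values, and the `aggregate` function are hypothetical and serve only to show how the 0:0:1 weighting leaves the tempo as the aggregated value.

```python
def aggregate(features, weights):
    """Return the weighted sum of a track's scalar features.

    features: dict mapping a feature name to its numerical value.
    weights:  dict mapping the same feature names to their weights.
    """
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical scalar features for one track.
track = {"timbre": 0.4, "key_mode": 7.0, "tempo": 128.0}

# Timbre:Key/Mode:Tempo = 0:0:1, so only the tempo contributes.
tempo_only = {"timbre": 0, "key_mode": 0, "tempo": 1}
print(aggregate(track, tempo_only))
```

Changing the weights, for example to 1:1:2, would let the other features contribute while still emphasizing tempo.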
At least some of the operations in the method 330 are performed similarly to the corresponding operations in the method 200 as described with reference to
The operations 332, 334, 336, and 338 are performed similarly to the operations 202, 204, 206, and 208 in the method 200. For brevity, the description of the operations 332, 334, 336, and 338 is omitted.
At operation 340, the media delivery system 104 calculates feature vectors 376 (
At operation 344, the media delivery system 104 operates to compare the aggregated feature vectors 378 of each pair of the media content items 116. At operation 346, the media delivery system 104 then determines similarities between the aggregated feature vectors 378. At operation 348, the media delivery system 104 determines a sequence of the media content items 116 based on the determined similarities. An example of the operations 344, 346, and 348 is described in more detail with reference to
In some embodiments, the vector mapping engine 372 and the aggregation engine 374 are included in the media content sequence determination engine 112. In other embodiments, the vector mapping engine 372 and the aggregation engine 374 can be included in any other part of the media delivery system 104. In yet other embodiments, the vector mapping engine 372 and the aggregation engine 374 can be included in the media playback device 102 or any other computing devices.
The vector mapping engine 372 can refer to the track features 230 of each media content item 116 and associate them to corresponding feature vectors 376 in Euclidean spaces. In other embodiments, however, at least one of the feature vectors 376 can be generated from other data which are not directly related to corresponding track features.
Where the acoustic features 240 are concerned, in some embodiments, the vector mapping engine 372 can derive acoustic vectors from a convolutional neural network. A convolutional neural network is a type of feed-forward artificial neural network. One example of the convolutional neural network that can be utilized to obtain the acoustic vectors is described in Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep Content-Based Music Recommendation. In Advances in Neural Information Processing Systems, pages 2643-2651, 2013.
In some examples, one of the acoustic vectors can capture a timbral characteristic (such as the timbre feature 250) of a media content item. Using a convolutional neural network, a low-dimensional embedding (such as in an eight (8) dimensional space (R^8)) can be trained in a supervised setting to minimize the Euclidean distance between similar media content items based on metadata information. In other examples, other approaches can be used to generate one or more acoustic vectors.
Where the key and mode feature 242 is concerned, in some embodiments, the vector mapping engine 372 can map the key and mode information 242 into points 390 in a three (3) dimensional space (R^3) so that adjacent keys in the circle of fifths and relative major/minor keys are equidistant, as illustrated in
Where the tempo feature 244 is concerned, in some embodiments, the vector mapping engine 372 can map the tempo feature 244 in a binary logarithmic scale. For example, in certain applications, tempo-octave invariance is preserved, and tempo is represented as a unit vector whose polar angle is mapped into a tempo octave, as illustrated in
In some embodiments, the above-described mapping of the acoustic features 240, the key and mode feature 242, and the tempo feature 244 can be required for a particular purpose or application. In other embodiments, however, any number of dimensions and/or any type of mapping can be used.
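The tempo mapping described above can be sketched as follows. This is one possible reading, assuming that the fractional part of the binary logarithm of the tempo gives the position within a tempo octave and that this position is spread over a full turn of the unit circle; the function name and the choice of a two-dimensional vector are assumptions.

```python
import math


def tempo_to_unit_vector(bpm):
    """Map a tempo in BPM to a unit vector with tempo-octave invariance.

    Tempos that differ by a factor of two (e.g., 90 and 180 BPM) share the
    same fractional part of log2 and therefore map to the same angle.
    """
    octave_position = math.log2(bpm) % 1.0      # position within the octave
    angle = 2.0 * math.pi * octave_position     # polar angle of the vector
    return (math.cos(angle), math.sin(angle))


# 90 BPM and 180 BPM land on (numerically) the same point; 120 BPM does not.
print(tempo_to_unit_vector(90.0))
print(tempo_to_unit_vector(180.0))
print(tempo_to_unit_vector(120.0))
```

With such a mapping, the Euclidean distance between two tempo vectors is small both for nearly equal tempos and for tempos that are nearly a factor of two apart.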
Referring still to
At operation 412, the media delivery system 104 operates to calculate a distance (such as the Euclidean distance) between the aggregated feature vectors 378 of each pair from the media content items 116. Calculation of the distance between two aggregated feature vectors 378 is repeated for all possible pairs from the media content items 116 in the playlist. In other embodiments, any other distance measurement can be used to calculate a distance between two aggregated feature vectors.
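By way of illustration, the pairwise distance calculation of operation 412 might be sketched as below. The sketch assumes the aggregated feature vectors are already available as equal-length tuples of numbers; the `pairwise_distances` name and the sample vectors are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Compute the Euclidean distance for every pair of items.

    vectors: dict mapping an item name to its aggregated feature vector.
    Returns a dict keyed by (item_a, item_b) in both orders, since the
    Euclidean distance is symmetric.
    """
    distances = {}
    for a, b in combinations(vectors, 2):
        d = math.dist(vectors[a], vectors[b])
        distances[(a, b)] = d
        distances[(b, a)] = d
    return distances


# Hypothetical two-dimensional aggregated feature vectors.
vectors = {"x": (0.0, 0.0), "y": (3.0, 4.0), "z": (6.0, 8.0)}
for pair, d in pairwise_distances(vectors).items():
    print(pair, d)
```

Another distance measurement could be substituted by replacing the `math.dist` call.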
At operation 414, the media delivery system 104 determines a seed media content item 310 (such as shown in
At operation 416, the media delivery system 104 determines a sequence between the media content items in the playlist based on the distances calculated at the operation 412. The sequence begins from the seed media content item 310. In a simple example, a first media content item is arranged prior to a second media content item and the second media content item is arranged prior to a third media content item when a distance between an aggregated feature vector of the first media content item and an aggregated feature vector of the second media content item is smaller than a distance between the aggregated feature vector of the first media content item and an aggregated feature vector of the third media content item. Other example sequencing methods are further described with reference to
In this example, the sequence of media content items is modeled with a graph traversal problem and determined using a graph which represents the media content items to be sequenced and the similarities between the media content items.
At operation 432, the media delivery system 104 operates to generate a graph 450 (
In the graph 450, the vertices 452 correspond with the media content items 116 to be sequenced, respectively. Where the graph 450 is symmetrical, the positions of the media content items 116 with respect to the vertices 452 are irrelevant.
Each of the edges 454 connecting the vertices 452 (i.e., the media content items) can have a property representative of the similarity between two media content items connected via that edge. In some embodiments, each of the edges 454 represents a distance (e.g., Euclidean distance) between the aggregated feature vectors 378 of two media content items connected via that edge. In the illustrated example of
At operation 434, the media delivery system 104 identifies a seed vertex 456. The seed vertex 456 is a vertex associated with the seed media content item 310. When determining an optimal path in subsequent operations, the seed vertex 456 is used as a starting point.
At operation 436, the media delivery system 104 determines an optimal path 460 (dotted lines in
In some embodiments, the optimal path is found using the shortest Hamiltonian path approach. In other embodiments, the optimal path can be found using the shortest Hamiltonian cycle approach. As the Hamiltonian path problem and the Hamiltonian cycle problem are both NP-complete, approximation approaches are used to find the shortest paths at the operation 436. In one example, a straightforward greedy approximation can be used, which iteratively selects the closest non-visited vertex, starting from the seed vertex. In another example, an improvement to the straightforward greedy approximation can be made by selecting the closest non-visited vertex from either the tail or the head of the partial sequencing.
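The two greedy approximations can be sketched as follows. The sketch assumes each vertex carries an aggregated feature vector and that edge weights are Euclidean distances; the function names and sample points are hypothetical. In the double-ended variant as written here, vertices may be prepended, so the seed vertex can end up in the interior of the path.

```python
import math


def greedy_path(points, seed):
    """Start at the seed and repeatedly append the closest unvisited vertex."""
    unvisited = [v for v in points if v != seed]
    path = [seed]
    while unvisited:
        tail = path[-1]
        nxt = min(unvisited, key=lambda v: math.dist(points[tail], points[v]))
        path.append(nxt)
        unvisited.remove(nxt)
    return path


def double_ended_greedy_path(points, seed):
    """Extend the partial path at whichever end has the closer unvisited vertex."""
    unvisited = [v for v in points if v != seed]
    path = [seed]
    while unvisited:
        head, tail = path[0], path[-1]
        near_head = min(unvisited, key=lambda v: math.dist(points[head], points[v]))
        near_tail = min(unvisited, key=lambda v: math.dist(points[tail], points[v]))
        d_head = math.dist(points[head], points[near_head])
        d_tail = math.dist(points[tail], points[near_tail])
        if d_head < d_tail:
            path.insert(0, near_head)   # grow the path at its head
            unvisited.remove(near_head)
        else:
            path.append(near_tail)      # grow the path at its tail
            unvisited.remove(near_tail)
    return path


# Hypothetical one-dimensional feature vectors.
points = {"a": (0.0,), "b": (1.0,), "c": (-1.5,), "d": (3.0,)}
print(greedy_path(points, "a"))
print(double_ended_greedy_path(points, "a"))
```

For these sample points the single-ended path has a total edge length of 7.5, while the double-ended path has a total edge length of 4.5.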
As the edges are weighted by the Euclidean distance between the corresponding media content item features (e.g., the aggregated feature vectors) in constructing the graph 450, the total cost of sequencing can be a sum of all the weights of the edges in the path.
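As a minimal sketch of this cost, the example below sums the Euclidean edge weights along a given ordering of vertices. The `path_cost` name and the sample points are hypothetical.

```python
import math


def path_cost(points, path):
    """Sum the Euclidean distances between consecutive vertices in the path."""
    return sum(math.dist(points[u], points[v]) for u, v in zip(path, path[1:]))


# Hypothetical aggregated feature vectors for three media content items.
points = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (3.0, 0.0)}
print(path_cost(points, ["a", "b", "c"]))  # 5.0 + 4.0
print(path_cost(points, ["a", "c", "b"]))  # 3.0 + 4.0
```

Comparing this cost across candidate orderings gives one way to judge which approximation produced the smoother sequence.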
At operation 438, the media delivery system 104 determines a sequence of the media content items based on the optimal path 460. When the optimal path 460 is determined at the operation 436, the media content items can be arranged in the same order as the corresponding vertices 452 along the calculated optimal path 460.
As such, the operations 436 and 438 are configured to determine an optimal path that visits each vertex 452 exactly once. Accordingly, such an optimal path among the vertices 452 can give an optimal order of the media content items 116.
Referring now to
Users of media playback devices often consume media content while engaging in various activities, including repetitive motion activities. As noted above, examples of repetitive-motion activities may include swimming, biking, running, rowing, and other activities. Consuming media content may include one or more of listening to audio content, watching video content, or consuming other types of media content. For ease of explanation, the embodiments described in this application are presented using specific examples. For example, audio content (and in particular music) is described as an example of one form of media consumption. As another example, running is described as one example of a repetitive-motion activity. However, it should be understood that the same concepts are equally applicable to other forms of media consumption and to other forms of repetitive-motion activities, and at least some embodiments include other forms of media consumption and/or other forms of repetitive-motion activities.
The users may desire that the media content fits well with the particular repetitive activity. For example, a user who is running may desire to listen to music with a beat that corresponds to the user's cadence. Beneficially, by matching the beat of the music to the cadence, the user's performance or enjoyment of the repetitive-motion activity may be enhanced. This desire cannot be met with traditional media playback devices and media delivery systems.
In the system 1000, the media playback device 102 further includes a cadence-acquiring device 1114, as well as the media content sequencing engine 110. Also shown is a user U who is running. The user U's upcoming steps S are shown as well. A step represents a single strike of the runner's foot upon the ground.
The media playback device 102 can play media content for the user based on the user's cadence. In the example shown, the media output 108 includes music with a tempo that corresponds to the user's cadence. The tempo (or rhythm) of music refers to the frequency of the beat and is typically measured in beats per minute (BPM). The beat is the basic unit of rhythm in a musical composition (as determined by the time signature of the music). Accordingly, in the example shown, the user U's steps occur at the same frequency as the beat of the music.
For example, if the user U is running at a cadence of 180 steps per minute, the media playback device 102 may play a media content item having a tempo equal to or approximately equal to 180 BPM. In other embodiments, the media playback device 102 plays a media content item having a tempo equal to or approximately equal to the result of dividing the cadence by an integer, such as a tempo that is equal to or approximately equal to one-half (e.g., 90 BPM when the user is running at a cadence of 180 steps per minute), one-fourth, or one-eighth of the cadence. Alternatively, the media playback device 102 plays a media content item having a tempo that is equal to or approximately equal to an integer multiple (e.g., 2×, 4×, etc.) of the cadence. Further, in some embodiments, the media playback device 102 operates to play multiple media content items including one or more media content items having a tempo equal to or approximately equal to the cadence and one or more media content items having a tempo equal to or approximately equal to the result of multiplying or dividing the cadence by an integer. Various other combinations are possible as well.
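The relationship between a cadence and its candidate tempos can be sketched as below. The particular set of integer factors and the `candidate_tempos` name are assumptions chosen to match the examples in the text.

```python
def candidate_tempos(cadence, factors=(1, 2, 4, 8)):
    """Return target tempos obtained by dividing or multiplying the cadence
    by each integer factor, sorted from slowest to fastest."""
    divided = [cadence / f for f in factors]
    multiplied = [cadence * f for f in factors if f != 1]
    return sorted(set(divided + multiplied))


# For 180 steps per minute, the list includes 90 BPM (one-half),
# 180 BPM (the cadence itself), and 360 BPM (2x).
print(candidate_tempos(180))
```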
In some embodiments, the media playback device 102 operates to play music having a tempo that is within a predetermined range of a target tempo. In at least some embodiments, the predetermined range is plus or minus 2.5 BPM. For example, if the user U is running at a cadence of 180 steps per minute, the media playback device 102 operates to play music having a tempo of 177.5-182.5 BPM. Alternatively, in other embodiments, the predetermined range is itself in a range from 1 BPM to 10 BPM. Other ranges of a target tempo are also possible.
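A minimal sketch of the range check follows, assuming a symmetric tolerance around the target tempo. The function name, the default tolerance of 2.5 BPM, and the sample track tempos are illustrative only.

```python
def within_target_tempo(track_bpm, target_bpm, tolerance=2.5):
    """Return True when the track tempo lies within +/- tolerance of the target."""
    return abs(track_bpm - target_bpm) <= tolerance


# Hypothetical track tempos checked against a cadence of 180 steps per minute.
tracks = {"t1": 177.5, "t2": 181.0, "t3": 183.0}
playable = [name for name, bpm in tracks.items() if within_target_tempo(bpm, 180.0)]
print(playable)
```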
Further, in some embodiments, the media content items that are played back on the media playback device 102 have a tempo equal to or approximately equal to a user U's cadence after it is rounded. For example, the cadence may be rounded to the nearest multiple of 2.5, 5, or 10 and then the media playback device 102 plays music having a tempo equal to or approximately equal to the rounded cadence. In yet other embodiments, the media playback device 102 uses the cadence to select a predetermined tempo range of music for playback. For example, if the user U's cadence is 181 steps per minute, the media playback device 102 may operate to play music from a predetermined tempo range of 180-184.9 BPM; while if the user U's cadence is 178 steps per minute, the media playback device 102 may operate to play music from a predetermined tempo range of 175-179.9 BPM.
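The rounding and range selection described above can be sketched as follows. The sketch assumes half-up rounding to a multiple of a step and tempo ranges that are half-open intervals of a fixed width, so that 180-184.9 BPM is represented as the interval from 180 up to, but not including, 185; the function names are hypothetical.

```python
import math


def round_cadence(cadence, step=5.0):
    """Round the cadence to the nearest multiple of step (halves round up)."""
    return step * math.floor(cadence / step + 0.5)


def tempo_bucket(cadence, width=5.0):
    """Return (low, high) of the half-open tempo range [low, high)
    that contains the cadence."""
    low = width * math.floor(cadence / width)
    return (low, low + width)


print(round_cadence(181))        # nearest multiple of 5
print(round_cadence(178, 2.5))   # nearest multiple of 2.5
print(tempo_bucket(181))         # range containing 181
print(tempo_bucket(178))         # range containing 178
```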
Referring still to
In at least some embodiments, the media server 1200 and the repetitive-motion activity server 1202 are provided by separate computing devices. In other embodiments, the media server 1200 and the repetitive-motion activity server 1202 are provided by the same computing devices. Further, in some embodiments, one or both of the media server 1200 and the repetitive-motion activity server 1202 are provided by multiple computing devices. For example, the media server 1200 and the repetitive-motion activity server 1202 may be provided by multiple redundant servers located in multiple geographic locations.
The repetitive-motion activity server 1202 operates to provide repetitive-motion activity-specific information about media content items to media playback devices. In some embodiments, the repetitive-motion activity server 1202 includes a repetitive-motion activity server application 1220, a processing device 1222, a memory device 1224, and a network access device 1226. The processing device 1222, memory device 1224, and network access device 1226 may be similar to the processing device 152, memory device 154, and network access device 156 respectively, which have each been previously described.
In some embodiments, repetitive-motion activity server application 1220 operates to transmit information about the suitability of one or more media content items for playback during a particular repetitive-motion activity. The repetitive-motion activity server application 1220 includes a repetitive-motion activity interface 1228 and a repetitive-motion activity media metadata store 1230.
In some embodiments, the repetitive-motion activity server application 1220 may provide a list of media content items at a particular tempo to a media playback device in response to a request that includes a particular cadence value. Further, in some embodiments, the media content items included in the returned list will be particularly relevant for the repetitive motion activity in which the user is engaged (for example, if the user is running, the returned list of media content items may include only media content items that have been identified as being highly runnable).
The repetitive-motion activity interface 1228 operates to receive requests or other communication from media playback devices or other systems to retrieve information about media content items from the repetitive-motion activity server 1202. For example, in
In some embodiments, the repetitive-motion activity media metadata store 1230 stores repetitive-motion activity media metadata 1232. The repetitive-motion activity media metadata store 1230 may comprise one or more databases and file systems. Other embodiments are possible as well.
The repetitive-motion activity media metadata 1232 operates to provide various information associated with media content items, such as the media content items 170. In some embodiments, the repetitive-motion activity media metadata 1232 provides information that may be useful for selecting media content items for playback during a repetitive-motion activity. For example, in some embodiments, the repetitive-motion activity media metadata 1232 stores runnability scores for media content items that correspond to the suitability of particular media content items for playback during running. As another example, in some embodiments, the repetitive-motion activity media metadata 1232 stores timestamps (e.g., start and end points) that identify portions of a media content item that are particularly well-suited for playback during running (or another repetitive-motion activity).
Each of the media playback device 102 and the media delivery system 104 can include additional physical computer or hardware resources. In at least some embodiments, the media playback device 102 communicates with the media delivery system 104 via the network 106.
In at least some embodiments, the media delivery system 104 can be used to stream, progressively download, or otherwise communicate music, other audio, video, or other forms of media content items to the media playback device 102 based on a cadence acquired by the cadence-acquiring device 1114 of the media playback device 102. In accordance with an embodiment, a user U can direct the input to the user interface 144 to issue requests, for example, to playback media content corresponding to the cadence of a repetitive motion activity on the media playback device 102.
The media mix data generation engine 1240 operates to generate media mix data to be used for sequencing and/or crossfading cadence-based media content items. As described herein, such media mix data can be incorporated in repetitive-motion activity media metadata 1232.
In this example, the media content sequencing engine 110 operates to arrange selected media content items (such as ones in a playlist) in such an order that the media content items are played on the media playback device 102 to continuously support a user's repetitive motion activity without interruption or jarring effect.
In this document, for the purpose of determining track features or feature vectors, calculating an aggregated track feature or aggregated feature vector, or determining similarity between two media content items or tracks, a media content item or a track may indicate the entire media content item or the entire track, a portion of the media content item or a portion of the track, or a collection of media content items or a collection of tracks, such as an album or a playlist.
The various examples and teachings described above are provided by way of illustration only and should not be construed to limit the scope of the present disclosure. Those skilled in the art will readily recognize various modifications and changes that may be made without following the examples and applications illustrated and described herein, and without departing from the true spirit and scope of the present disclosure.
Claims
1. A method for playing media content items, the method comprising:
- determining a plurality of track features of each of the media content items;
- obtaining weighting data for the plurality of track features;
- generating a plurality of weighted track features for each of the media content items by applying the weighting data to the plurality of track features of each of the media content items;
- calculating aggregated track features for the media content items, respectively, based on the plurality of weighted track features;
- comparing the aggregated track features to determine similarities between the aggregated track features; and
- determining a sequence of the media content items based on the similarities.
2. The method of claim 1, further comprising:
- receiving a selection of the plurality of media content items.
3. The method of claim 1, further comprising:
- obtaining a playlist identifying the plurality of media content items.
4. The method of claim 1, wherein obtaining weighting data comprises:
- receiving a user input of weights on the plurality of track features.
5. The method of claim 1, wherein obtaining weighting data comprises:
- obtaining sequencing history data;
- determining a sequencing history of the media content items based on the sequencing history data; and
- predicting weights on the plurality of track features of the media content items.
6. The method of claim 1, wherein the plurality of track features includes acoustic features, key and mode information, and tempo.
7. The method of claim 1, wherein the aggregated track features are represented by numerical values.
8. The method of claim 1, wherein determining a sequence of the media content items comprises:
- arranging a first media content item prior to a second media content item, the first media content item and the second media content item being selected from the media content items, and the first media content item being played before the second media content item; and
- arranging the second media content item prior to a third media content item, the third media content item being selected from the media content item and played after the second media content item, a difference between a numerical value of an aggregated track feature of the first media content item and a numerical value of an aggregated track feature of the second media content item being smaller than a difference between the numerical value of the aggregated track feature of the first media content item and a numerical value of an aggregated track feature of the third media content item.
9. The method of claim 1, further comprising:
- identifying a seed media content item selected from the media content items, the seed media content item sequenced to be played first among the media content items.
10. The method of claim 8, wherein the first media content item is identified as a seed media content item, the seed media content item sequenced to be played first among the media content items.
11. A method for sequencing media content items, the method comprising:
- determining a plurality of track features of each of the media content items;
- weighting the plurality of track features;
- mapping the plurality of weighted track features of each of the media content items to an aggregated feature vector;
- determining similarities among the aggregated feature vectors; and
- determining a sequence of the media content items based on the similarities.
12. The method of claim 11, wherein determining similarities comprises:
- calculating distances between the aggregated feature vectors.
13. The method of claim 12, wherein determining a sequence of the media content items comprises:
- arranging a first media content item prior to a second media content item, the first media content item and the second media content item being selected from the media content items, and the first media content item being played before the second media content item; and
- arranging the second media content item prior to a third media content item, the third media content item being selected from the media content item and played after the second media content item, a distance between a feature vector of the first media content item and a feature vector of the second media content item being smaller than a distance between the feature vector of the first media content item and a feature vector of the third media content item.
14. The method of claim 11, wherein determining similarities comprises:
- generating a complete symmetric graph with vertices and edges, the vertices associated with the media content items, respectively, and connected via the edges, the edges having values representative of distances between the aggregated feature vectors of the media content items; and
- determining an optimal path crossing all of the vertices, the optimal path used to determine the sequence of the media content items.
15. The method of claim 14, further comprising:
- identifying a seed vertex from the vertices, the seed vertex associated with one of the media content items to be played first among the media content items.
16. The method of claim 14, wherein the optimal path includes a route defined by at least some of the edges and visiting all the vertices only once.
17. The method of claim 14, wherein the optimal path is calculated using the shortest Hamiltonian path.
18. The method of claim 11, further comprising:
- obtaining a playlist identifying the media content items.
19. The method of claim 11, further comprising:
- receiving a user input of weights on the plurality of track features.
20. A computer readable storage device storing data instructions that when executed by a processing device causes the processing device to:
- determine a plurality of track features of each of the media content items;
- weight the plurality of track features;
- map the plurality of weighted track features of each of the media content items to an aggregated feature vector;
- determine similarities among the aggregated feature vectors; and
- determine a sequence of the media content items based on the similarities.
21. A system comprising:
- at least one processing device; and
- at least one computer readable storage device storing data instructions, which when executed by the at least one processing device, cause the at least one processing device to: determine a plurality of track features of each of the media content items; weight the plurality of track features; map the plurality of weighted track features of each of the media content items to an aggregated feature vector; determine similarities among the aggregated feature vectors; and determine a sequence of the media content items based on the similarities.
Type: Application
Filed: Mar 24, 2017
Publication Date: Oct 19, 2017
Inventors: Tristan Jehan (Brooklyn, NY), Rachel Bittner (New York, NY), Nicola Montecchio (Berlin), Hunter McCurry (New York, NY), Minwei Gu (New York, NY), Gandalf Hernandez (New York, NY)
Application Number: 15/469,060