IMAGE AND AUDIO RECOGNITION AND SEARCH PLATFORM
The present disclosure relates to receiving video and audio from a plurality of devices, performing image recognition on the video and audio recognition on the audio, receiving an input image or input audio, and identifying video clips and audio clips containing a match to the input image or input audio.
This application is a continuation of U.S. patent application Ser. No. 16/429,756, filed on Jun. 3, 2019, issued as U.S. Pat. No. 11,100,953, and titled “Automatic Selection of Audio and Video Segments to Generate an Audio and Video Clip,” which is a continuation of U.S. patent application Ser. No. 14/844,471, filed on Sep. 3, 2015, issued as U.S. Pat. No. 10,347,288, and titled “Method and System for Capturing, Synchronizing, and Editing Video From a Primary Device and Devices in Proximity to the Primary Device,” which is a continuation of U.S. patent application Ser. No. 14/103,541, filed on Dec. 11, 2013, issued as U.S. Pat. No. 9,129,640, and titled “Collaborative Digital Video Platform That Enables Synchronized Capture, Curation and Editing of Multiple User-Generated Videos,” which claims priority from U.S. Patent Application Nos. 61/736,367 filed on Dec. 12, 2012, 61/760,129 filed on Feb. 3, 2013, and 61/790,066 filed on Mar. 15, 2013, each of which is incorporated herein by reference.
FIELD OF THE DISCLOSURE
The present disclosure relates to receiving video and audio from a plurality of devices, performing image recognition on the video and audio recognition on the audio, receiving an input image or input audio, and identifying video clips and audio clips containing a match to the input image or input audio.
BACKGROUND
With the global proliferation of video-enabled mobile devices, consumers capture and upload millions of videos each week. Often, numerous videos of events are posted, sometimes numbering in the hundreds for popular events such as concerts, sporting events, and other public occasions. These amateur videos often are of uneven quality and length, and with a large number of websites on which to post a video, it is hard for consumers to know where to find a video of interest on a certain topic or from a certain location.
In addition, in the prior art the labeling or tagging of a submitted video is left to the submitter and is not subject to any standards for grouping or searching. Sorting through this mass of video content is nearly impossible.
There is also no method to easily combine multiple videos that are captured of a particular event. Further, there is no simple way to edit those multiple videos into a single video or into a video of multiple best-of edits. Traditional film edit tools are expensive and hard to use. Further, the output is typically a single edited version based on the editor's determination of best edit. There is no consumer friendly way to create individual edits of a video, or to create and/or view an edit of a film that is a result of the wisdom of the crowd throughout the video. Other websites have created “black box” scene selectors to combine videos, but this typically results in videos of limited value, and fails to engage the crowd in the creation and edit process.
There is also no method available to consumers to enable the sharing and collaboration on video in a “private” environment that allows a limited subset of users (such as users who have been invited by an originator) to access videos and contribute videos. There is also no simple way for each individual to make his own video version or edit of an event that has been filmed by multiple cameras or smart phones during the event. There also is no method for synchronized capture and review of multiple angles from multiple locations to use for security review and entertainment. The wisdom of the “crowd” and the needs of the individual have been largely ignored in the various attempts to combine multiple amateur video submissions.
In addition, there is a need for a search platform to enable searching for video clips and audio clips that match an input image or input audio.
BRIEF SUMMARY OF THE INVENTION
The embodiments described herein utilize an application (“app”) known as “CROWDFLIK.” CROWDFLIK is preferably implemented as an app for mobile devices, such as smartphones and tablets. It also can be implemented as an app for desktops, notebooks, or any other device that can capture video and run a software application. The app works in conjunction with a server, accessible over the Internet, which together facilitate the synchronized capture, synchronized grouping, multiple user edit, crowd curation, group and individual viewing, multiple edits and sharing of video edits and clips.
The CROWDFLIK app allows each user to activate and/or accept a location/event confirmation, or check-in, in order to activate the capture and submit video function of the CROWDFLIK app which tags or marks each submitted video with location specific data, allowing proper grouping for synchronized review and edit. During the video capture and submission process, the CROWDFLIK mobile app activates a unique process of synchronized tagging, or cutting, of the video at synchronized Y second increments according to the CROWDFLIK app's master clock, where Y is the length, typically measured in seconds, of each sub-segment of submitted video. The captured videos are cut at synchronized Y second intervals. Typically, only full Y second segments are submitted to the CROWDFLIK app's Review/Edit platform. The segments are then grouped and synchronized on the CROWDFLIK Review/Edit platform for user combination, editing, review, sharing, tagging, re-editing, saving, and more based on the location/time tag.
The CROWDFLIK Review/Edit platform allows users to review all video submissions that have been combined and synchronized for each location/time (e.g., event). The CROWDFLIK app Review/Edit platform allows users to review and edit the multiple submissions to create unique video edits of the event. The CROWDFLIK app allows for a seamless resynchronization of the series of segments selected by the user resulting in his or her own personal edit. A user is permitted to select a subset of the entire event video in order to create and save shorter videos that are a subset of the overall video based on selecting submission for each successive time segment of Y second(s). The aggregate of the individual selections determines a ‘best of’ selection for each Y second(s) segment which in turn determines the crowd curated best-of edit based on the CROWDFLIK curation algorithm.
One benefit of these embodiments is that a user can generate a video of an event using segments that were captured from different devices at the event. Unless an event is designated as private or is otherwise restricted, any user with access to the CROWDFLIK app may review, create edits, share, and upload fliks regardless of whether they attended the original event.
Another benefit of these embodiments is the ability to search within a plurality of video clips and audio clips for a match with an input image or input audio.
In a preferred embodiment, CROWDFLIK is a mobile app for use on mobile devices, such as smartphones, tablets, and the like, that works in conjunction with an Internet platform that facilitates uploading, downloading, and encoding for a variety of device platform playback, as well as coding for a variety of security and privacy preferences. The mobile app and Internet platform also facilitate the synchronized capture, synchronized grouping, distributed reviewing, crowd curation, group and individual viewing, multiple edits and sharing of edited video clips.
In order to use the CROWDFLIK app 10 video capture function, the user will set up an account with server 30 (
As illustrated in step 130, and in
With reference again to
Once the event has been confirmed, the user can begin filming or recording the event (e.g.,
At step 150, the CROWDFLIK app 10 can begin capturing the video (see also
In another embodiment, an event is created based on the participants involved rather than on location information, which allows the selected or invited devices to be synchronized over a period of time regardless of location. In another embodiment, a first user may create a personal event where his or her device is the center of the event. As the first user moves, other users in proximity of the first user may join the event, thereby capturing and contributing synchronized video to the first user's event. This can be useful, for example, if the user is sightseeing or engaged in other physical movement.
In another embodiment, a first user may create a personal event where his or her device is the center of the event. As the first user moves, video/audio captured by other users within a certain geofence of the first user automatically are added to the event, thereby capturing and contributing synchronized video to the first user's event. This can be useful, for example, if the user is sightseeing or engaged in other physical movement. For example, if a user runs a marathon and films a portion of the marathon, the user will later have access to video/audio captured by other users who were within the geofence of that user as he or she moved. The user can then create a video/audio clip (flik) that contains video/audio from other users whom the user does not even know or interact with.
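The moving-geofence embodiment above can be illustrated with a minimal sketch. It assumes that each clip and the first user's position are tagged per time slot with latitude and longitude; the dictionary layout, the function names, and the use of the haversine great-circle distance are illustrative assumptions, not details taken from the disclosure.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clips_in_moving_geofence(center_track, clips, radius_m):
    """Return clips captured within radius_m of the first user in the same time slot.

    center_track: dict mapping slot start time -> (lat, lon) of the first user
    clips: list of dicts with keys "id", "slot", "lat", "lon" (hypothetical layout)
    """
    selected = []
    for clip in clips:
        center = center_track.get(clip["slot"])
        if center is None:
            continue  # the first user has no known position for this slot
        if haversine_m(center[0], center[1], clip["lat"], clip["lon"]) <= radius_m:
            selected.append(clip)
    return selected
```

Because the fence is evaluated per time slot against the first user's position in that slot, a clip is included only if it was captured near the first user at that moment, which matches the marathon example.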
The CROWDFLIK app 10 preferably synchronizes all devices at any given event that are using the CROWDFLIK app 10 to capture video and that have selected the specific event.
However, in case multiple instances of the event are separately created by multiple users, the CROWDFLIK app 10 and server 30 can synchronize these multiple instances at a later time.
In one embodiment, the CROWDFLIK app 10 can incorporate a clock algorithm that uses the Internet or other network functionality to connect with a known, reliable clock 40 such as the US Naval Atomic clock to determine the difference between the Atomic time and the time code of each individual device. The Naval Atomic clock can then serve as the CROWDFLIK master clock, and all time stamping and coding can be referenced to this master clock. The CROWDFLIK app 10 can then apply a “time-delta” to each device based on the difference between the clock of device 20 and the master clock. Preferably, the “time-delta” can be applied as Meta tags to each video segment captured and uploaded by the CROWDFLIK app 10 for future review, edit and sharing.
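As a rough illustration of the “time-delta” described above, the sketch below stores the difference between the master clock and a device clock as metadata on a segment and uses it to recover the master-clock start time. The `Segment` class and both function names are hypothetical, and the sketch ignores network latency when the master clock is read, which a real implementation would need to account for.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    device_id: str
    device_start: float      # start time in seconds, as read from the device clock
    time_delta: float = 0.0  # master time minus device time, kept as metadata


def compute_time_delta(master_now: float, device_now: float) -> float:
    """Offset to add to a device timestamp to express it on the master clock."""
    return master_now - device_now


def master_start(segment: Segment) -> float:
    """Segment start time referenced to the master clock."""
    return segment.device_start + segment.time_delta
```

For example, a device whose clock reads 997.5 when the master clock reads 1000.0 has a time-delta of 2.5 seconds, so a segment it stamps at 1200.0 starts at 1202.5 on the master clock.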
Alternatively, in a peer-to-peer time embodiment, the device of the creator or another user can serve as the master clock, and all other devices that join the same event will synchronize to that clock. Thereafter, each device that has synchronized to the master clock can serve as an additional master clock for new devices that join the event.
In another embodiment, instead of time synchronization, video streams can be synchronized based on sound or images. For example, if two different devices capture video streams of an event, even if they are not synchronized to a master clock or to each other, the captured video can still be synchronized based on image recognition performed on the video streams or based on sound detection performed on the audio streams associated with the video streams. This would be particularly accurate, for example, for synchronizing multiple captures of a speech, rock concert, sporting event, etc.
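The disclosure does not specify how sound-based synchronization is performed. One common technique that could serve this purpose is cross-correlation of the two audio tracks; the sketch below is a brute-force, unnormalized version on plain lists of samples, offered only as an assumption-laden illustration. The function name and the `max_lag` parameter are hypothetical.

```python
def estimate_offset(ref, other, max_lag):
    """Return the lag (in samples) at which other[i + lag] best lines up with ref[i].

    Uses an unnormalized cross-correlation score over lags in [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:  # keep the first lag that reaches the highest score
            best_lag, best_score = lag, score
    return best_lag
```

A positive result means the same sound appears later in `other` than in `ref`; dividing the lag by the sample rate gives the offset in seconds that would be applied before cutting the streams into segments.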
When video is captured, the CROWDFLIK app 10 determines time intervals of duration “Y” to cut and synchronize the captured video at step 155. Y can be, for example, 5 seconds, 10 seconds, or any desired duration. Each Y second(s) segment of the video is tagged and/or marked at step 160 with the location data determined from the check-in, and the time data as determined by the app's master clock. The captured video can be submitted to server 30 at step 165, and all video submissions can be tagged and/or marked with time/date/location for synchronized display and edit at the proper section of the Review/Edit platform in order for videos to be placed in a synchronized fashion with other videos from like events. In certain embodiments, the app may allow users to select different values for “Y” to review and edit video. Users may select shorter or longer lengths of the segments depending on the user's needs at the time they are creating fliks. Users may also select varied time segments for their creative purposes.
The CROWDFLIK app time tagging at pre-selected, uniform intervals of Y seconds is utilized to assure the seamless re-synchronization of users' preferred video segment for each time slot.
A video captured via the CROWDFLIK app is “cut” or marked at Y second intervals. Typically, the length of time that represents Y for each segment is predetermined by the app, and is applied on a consistent basis to all video segments captured and uploaded via the CROWDFLIK app at a given event. Preferably, only full Y second(s) segments are submitted. For example, if a user begins to capture video in between the Y second(s) segments, the first video segment prior to the start of the next Y second(s) segment may be incomplete, and may not be submitted to the Review/Edit platform depending on rules that the CROWDFLIK app is applying at that time to the video captured at that event. Alternatively, the incomplete segment may be padded at the beginning of the segment with blank video content to extend the segment to a full Y second clip. A similar process may take place when the video capture ends after Y second(s) but before a subsequent Y second(s) where only the full Y second(s) segments may be uploaded to the platform. In one embodiment, all of the user's video is uploaded to the server, and only segments that are Y seconds (i.e., full clips) are presented during the review and edit process. In another embodiment, the user can decide whether segments that are less than Y seconds in duration should be uploaded to the server.
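The rule that only full Y-second segments are submitted can be sketched as follows. The sketch assumes times are expressed in seconds on the master clock and that slot boundaries fall on multiples of Y; the function name is hypothetical, and padding of incomplete segments (the alternative mentioned above) is not modeled.

```python
import math


def full_segments(capture_start, capture_end, y):
    """Return (start, end) pairs for every complete Y-second slot inside a capture.

    Times are seconds on the master clock; slot boundaries fall on multiples of y.
    """
    if y <= 0:
        raise ValueError("y must be positive")
    # First slot boundary at or after the moment capture began.
    start = math.ceil(capture_start / y) * y
    segments = []
    while start + y <= capture_end:
        segments.append((start, start + y))
        start += y
    return segments
```

With Y equal to 10, a capture running from second 3 to second 27 yields only the slot from 10 to 20; the partial pieces from 3 to 10 and from 20 to 27 are dropped.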
The video captured by the device 20 running CROWDFLIK app 10 is saved on the device 20 in its entirety in the conventional manner that is determined by the device. In other words, as the video is processed and submitted by the CROWDFLIK app 10, it is simultaneously saved on device 20 in an unprocessed and uncut form according to the host device standards. The various video segments captured via the CROWDFLIK app 10 from the various users are then grouped and synchronized according to location/time/event tags on the CROWDFLIK Review/Edit platform for subsequent multiple user edit, review, sharing and saving. In the alternative, CROWDFLIK app 10 permits the user to opt out of saving all video to device 20 in the settings function if the user wants to upload only to CROWDFLIK app 10 and server 30.
A user can capture video of any length via the CROWDFLIK app 10 for submission to the CROWDFLIK Review/Edit platform. The CROWDFLIK app 10's unique location/time tagging functionality at the time of video capture allows for proper grouping and synchronization, and gives the user robust search functionality.
The CROWDFLIK app 10 allows multiple simultaneous videos from multiple devices to be captured and uploaded via the CROWDFLIK app 10 to be viewed and edited, and can be grouped according to the location/time tagging.
With reference to
The Review/Edit platform 500, which is illustrated in
Referring to
For example, if Y is 60 seconds and t is 5:00 p.m., segments 525, 530, and 535 would begin at 5:00 p.m., segments 526, 531, and 536 would begin at 5:01 p.m., and segments 527, 532, and 537 would begin at 5:02 p.m., etc. For illustration purposes, only three segments are shown for timeline 510 and video streams 515 and 520, but one of ordinary skill in the art will appreciate that segments of any number can be used.
A user can scroll vertically through the available segments within each time slot of Y duration, and can select segments from any available video, such as video 515 or 520 or any additional video uploaded to the platform, and place them in timeline 510 to create a user-customized video of the event. Each segment selected from video 515, 520, or other video can only be placed in timeline 510 at the time slot that corresponds to that segment's own start time. For example, if segment 526 begins at 5:01 p.m., only segments from video 515, 520, or other video that also begin at 5:01 p.m. can be placed in segment 526; in this example, those are segments 531 and 536. This ensures temporal synchronization across all video streams.
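The placement constraint in this example can be sketched as a small data structure. The `Timeline` class, its methods, and the dictionary layout of a segment are hypothetical; start times are written as minutes since midnight, so 5:01 p.m. is 1021.

```python
class Timeline:
    """User edit timeline with one slot per Y-second interval (hypothetical structure)."""

    def __init__(self, slot_starts):
        # Map each slot start time to the id of the segment placed there (None if empty).
        self.slots = {start: None for start in slot_starts}

    def place(self, slot_start, segment):
        """Place a segment into a slot only if the segment starts at that slot's time."""
        if slot_start not in self.slots:
            raise KeyError(f"no slot starting at {slot_start}")
        if segment["start"] != slot_start:
            raise ValueError("segment start time does not match the slot")
        self.slots[slot_start] = segment["id"]
```

Under this sketch, segments 531 and 536 (both starting at 1021) can fill the 5:01 p.m. slot, while an attempt to place segment 530 (starting at 1020) there is rejected.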
Review/Edit Platform 500 optionally can include input devices 591, 592, 593, and others to perform certain functions. For example, input device 591, when selected, can generate a screen shot capture of a shot within a video 510, 515, or 520 and allow the user to then download (for example, as a JPEG file) or view a still image of that shot. This allows an easy way for a user to obtain a photo from a video.
A variation of the embodiment of
In one embodiment, all video uploaded to server 30 is tagged and marked for future organization on the Review/Edit Platform 500 or 600. CROWDFLIK app 10 and/or server 30 can restrict access to any uploaded video or part thereof based on location and time coding. For example, if a performer decides not to allow his or her performance on the CROWDFLIK platform, then all video from that performance can be blocked by server 30 from future use. Also, content that is inappropriate for certain audiences can be blocked from those users by server 30. For example, video with foul language can be restricted to users who are above age 13, etc.
The CROWDFLIK app 10 can be set to allow Y second(s) to be a constant interval throughout a particular captured and edited event or can be set to alternate synchronized values. For example, for a wedding event that is captured and edited via CROWDFLIK, the Y value may be 10 seconds. In this wedding example, the video is cut into 10-second segments based on the app's master clock. Alternatively, for a sporting event, Y may be a repeating pattern of 10 seconds/5 seconds/10 seconds/5 seconds. In either case, the length of each segment (Y) is applied across all captured video presented on Review/Edit platform 500 for each and all of the multiple sources at that location/time event.
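A repeating pattern of Y values can be turned into slot boundaries with a short loop. The sketch below is illustrative only; the function name and the choice to drop a final slot that would run past the end of the event are assumptions.

```python
from itertools import cycle


def slot_boundaries(event_start, event_end, pattern):
    """Return (start, end) slots that follow a repeating pattern of segment lengths.

    pattern: non-empty list of positive lengths in seconds, e.g. [10, 5] or [10].
    A trailing slot that would extend past event_end is not emitted.
    """
    if not pattern or any(length <= 0 for length in pattern):
        raise ValueError("pattern must contain only positive lengths")
    slots = []
    start = event_start
    for length in cycle(pattern):
        if start + length > event_end:
            break
        slots.append((start, start + length))
        start += length
    return slots
```

For a 40-second span, the pattern `[10, 5]` produces slots of 10, 5, 10, 5, and 10 seconds, while the constant pattern `[10]` produces four 10-second slots. Applying the same boundaries to every source is what keeps segments from different devices interchangeable within a slot.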
The CROWDFLIK app “cuts” all video captured via the CROWDFLIK app synchronized by the app's master clock to Y second(s) per segment. At step 210 of
To facilitate the user friendly tap to select or drag and drop editing, the start and end of shorter videos will preferably correspond to a start and end of the Y second(s) segments respectively. An example of this could be selecting a user's edit of a single song from a concert event consisting of many songs or a single series of plays from an entire football game. This mini-edit may also be saved, shared, viewed and re-edited by users who are signed in to their accounts.
The CROWDFLIK app allows for time linear resynchronization or assembly of video segments as well as non-time linear video editing. For example, the user can be permitted to select any submitted video segment and place it anywhere in the video timeline of the user's edit. If a user chooses to drag a video segment into the timeline of his video edit he may do so in any order that he chooses. This allows for video creation and viewing of nonlinear video such as multiple segments from the same Y time slot.
The CROWDFLIK app 10 supports sponsored events via its unique business model whereby the app serves as the connector between fan captured, uploaded, edited, and shared video of an event and the message of the event sponsor venue, act, or other user directed commercial or community based message.
Optionally, the CROWDFLIK app 10's use provides for a unique crowd curated video of each event. As each signed-in user makes, creates and saves a personal edit using one of the systems described previously for
This aspect of the embodiments is shown in
In one embodiment, a user may review and edit video on the CROWDFLIK Review/Edit platform as a guest who has not signed in under an account USER ID. In order to encourage user registration, if a user views and edits as a guest (e.g., not signed in or registered), that user may be provided with limited functionality. For example, the user may be precluded from saving edits for future viewing or the edit selections may be ignored in the overall curation tally for best edit or crowd choice.
In one embodiment, the CROWDFLIK app can allow for the automated insertion of sponsor messages at a pre-determined time slot in between certain Y second(s) segments along the time-line of the edited versions of videos as well as the insertion of pre-roll advertising video or other message from a sponsor, advertiser, or other source.
When a user selects CROWDFLIK through which to capture video, the user is prompted to accept/confirm location and time/date. This is to assure that when the CROWDFLIK app submits the user's captured video, it is correctly submitted based on its time/location (e.g., event) characteristics. However, the entry of the time/location information can be performed either prior to, during, or after, video capture. For example, the CROWDFLIK app allows for registered users to capture video prior to confirming location or joining an event, and will prompt the user to select event/location after the capture of video via the CROWDFLIK app.
In one embodiment, the CROWDFLIK app includes algorithms to assess location and time sameness among various submissions. The CROWDFLIK app can also determine if other users are in the vicinity of a signed in user. In a further embodiment, the CROWDFLIK app can notify a user upon location confirmation of nearby friends and/or other CROWDFLIK users.
When the user captures and submits video via CROWDFLIK app 10, the video is also saved on the smart phone camera roll of device 20 just as it would be if it were not captured through CROWDFLIK app 10. The saved video is not cut or altered by CROWDFLIK app 10. CROWDFLIK app 10 allows a user to review each captured video segment and decide or confirm to upload to server 30.
Preferably, the CROWDFLIK app uploads a thumbnail of each video segment as well as the user id of the capturer for easier user identification and review. In one embodiment, the CROWDFLIK app uses the Refactor Video Upload service, or another upload service or protocol, to ensure that the user Id and event Id provided by a user represent real data, and limits the creation of a video record to only occur after a video file and thumbnail was uploaded.
When a user creates an edit in the Review/Edit function, the user is able to attach tags 809 to further define the edit or content for later search purposes, as shown in
CROWDFLIK is a unique video capture and edit mobile and Internet platform that allows multiple users to submit video, and to create unique edits from the aggregate of multiple submitted videos; the CROWDFLIK app achieves this via synchronized tagging with location/time/event stamp at the time of capture, which assures that all video posted to the CROWDFLIK Review/Edit platform for subsequent review and edit is video from matched events based on location, and is synchronized for edit based on the CROWDFLIK app master clock; the synchronized video is searchable based on the time/location tag. In one embodiment, users can select multiple events to be presented on the review/edit platform in order to create a collage-type video. This also allows users to combine multiple CROWDFLIK events of the same real-life event in order to have access to all video captured at that real-life event. If there are multiple CROWDFLIK events of the same real-life event, each of the event creators may agree via the CROWDFLIK app 10 to combine their events to reduce confusion. This might happen, for example, at an event that physically exceeds the established geofence. For example, the presidential inauguration often spans over a mile in physical distance. If the geofence is set for 100 yards, the server may allow the creation of multiple events corresponding to the single real-life event (the presidential inauguration).
The CROWDFLIK app 10 uses a variety of inputs and methods to determine the optimal length for segments of captured video to be cut into for synchronized review and edit at the CROWDFLIK Review/Edit platform. This length is referred to as “Y” for purposes of synchronized review/edit. The value Y may be in repeating patterns of unequal time segments, such as 10-5-10-5-10-5 seconds, etc., or in a single segment length throughout the capture, such as 15-15-15-15 seconds, etc. The CROWDFLIK method of cutting submitted video into Y second(s) pieces allows for a simple and powerful process to create, view, save and share multiple edits based on the selection of a preferred submission for each of the Y second(s) time slots, which then seamlessly re-synchronize back together to create a professional quality video consisting of multiple sequential clips of video pieces of lengths of Y second(s).
The CROWDFLIK app 10 tallies the aggregate of the multiple users' selections of segments to create their own edit, which results in a top ranked submission for each Y second(s) time slot. The aggregate of the most selected segments determines the best-of edit as curated by the crowd. To prevent and/or limit gaming, the CROWDFLIK app 10 applies certain methods and analysis to the curation process to determine the best-of edit. The vote tally may change as additional users create edits which will result in the best-of edit to change over time.
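The tally described above can be sketched as a per-slot vote count. The disclosure does not describe the curation algorithm or its anti-gaming measures, so the sketch below shows only a plain majority count; the function name, the representation of an edit as a mapping from slot start time to segment id, and the tie-breaking behavior are assumptions.

```python
from collections import Counter, defaultdict


def best_of_edit(edits):
    """Return the most-selected segment id for each time slot.

    edits: list of dicts, each mapping slot start time -> selected segment id.
    Ties go to the segment that was counted first (an arbitrary choice here).
    """
    votes = defaultdict(Counter)
    for edit in edits:
        for slot, segment_id in edit.items():
            votes[slot][segment_id] += 1
    return {slot: counter.most_common(1)[0][0] for slot, counter in sorted(votes.items())}
```

Because the result is recomputed from all saved edits, adding a new user's edit can change the winner for any slot, which is why the best-of edit may change over time.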
The CROWDFLIK app allows for unique sharing and posting of unique edits of videos created from submissions from multiple users and edited by a single user or multiple users.
Another aspect of the embodiments described herein is shown in
Notably, a user of the systems of
As shown in
Further, the exemplary processing arrangement 1002 can be provided with or include an input/output arrangement 1014, which can include, for example, a wired network, a wireless network, the internet, an intranet, a data collection probe, a sensor, etc. As shown in
The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements, and procedures which, although not explicitly shown or described herein, embody the principles of the disclosure and can be thus within the spirit and scope of the disclosure. Various different exemplary embodiments can be used together with one another, as well as interchangeably therewith, as should be understood by those having ordinary skill in the art. In addition, certain terms used in the present disclosure, including the specification, drawings and claims thereof, can be used synonymously in certain instances, including, but not limited to, for example, data and information. It should be understood that, while these words, and/or other words that can be synonymous to one another, can be used synonymously herein, that there can be instances when such words can be intended to not be used synonymously. Further, to the extent that the prior art knowledge has not been explicitly incorporated by reference herein above, it is explicitly incorporated herein in its entirety. All publications referenced are incorporated herein by reference in their entireties.
Claims
1. A method of video searching, comprising:
- obtaining a plurality of video clips from a plurality of devices, each video clip associated with one or more of time information indicating a time when the video clip was captured and location information indicating a location where the video clip was captured;
- performing image recognition on the plurality of video clips;
- receiving an input image; and
- identifying all clips in the plurality of video clips that match the input image.
2. The method of claim 1, wherein the image recognition comprises facial recognition and the input image comprises an image of a face.
3. The method of claim 2, further comprising indicating time information for all clips in the plurality of video clips that match the image of the face.
4. The method of claim 2, further comprising indicating location information for all clips in the plurality of video clips that match the image of the face.
5. The method of claim 2, further comprising:
- generating a video comprising all clips in the plurality of video clips that match the image of the face.
6. The method of claim 2, wherein the step of receiving an image of a face comprises receiving a photograph.
7. The method of claim 6, further comprising:
- generating a video comprising all clips in the plurality of video clips that match the image of the face.
8. The method of claim 6, further comprising indicating time information for all clips in the plurality of video clips that match the image of the face.
9. The method of claim 8, further comprising:
- generating a video comprising all clips in the plurality of video clips that match the image of the face.
10. The method of claim 6, further comprising indicating location information for all clips in the plurality of video clips that match the image of the face.
11. The method of claim 10, further comprising:
- generating a video comprising all clips in the plurality of video clips that match the image of the face.
12. A method of audio searching, comprising:
- obtaining a plurality of audio clips from a plurality of devices, each audio clip associated with one or more of time information indicating a time when the audio clip was captured and location information indicating a location where the audio clip was captured;
- performing audio recognition on the plurality of audio clips;
- receiving input audio; and
- identifying all clips in the plurality of audio clips that match the input audio.
13. The method of claim 12, wherein the audio recognition comprises voice recognition and the input audio comprises a voice recording.
14. The method of claim 13, further comprising indicating time information for all clips in the plurality of audio clips that match the voice in the voice recording.
15. The method of claim 14, further comprising:
- generating audio comprising all clips in the plurality of audio clips that match the voice in the voice recording.
16. The method of claim 13, further comprising indicating location information for all clips in the plurality of audio clips that match the voice in the voice recording.
17. The method of claim 16, further comprising:
- generating audio comprising all clips in the plurality of audio clips that match the voice in the voice recording.
Type: Application
Filed: Aug 23, 2021
Publication Date: Dec 9, 2021
Applicant:
Inventor: Christopher Hamer (Westport, CT)
Application Number: 17/409,532