Patents by Inventor Zvi Figov
Zvi Figov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12266175
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Grant
Filed: December 29, 2022
Date of Patent: April 1, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Mordechai Kadosh, Zvi Figov, Eliyahu Strugo, Mattan Serry, Michael Ben-Haym
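The classify-then-index flow described in this abstract can be sketched briefly: scenes carry a probability of belonging to the opening song, scenes above a threshold are kept, and the resulting time span becomes index data. The function name, tuple layout, and threshold below are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of thresholding per-scene probabilities and saving the
# opening-song span as index data (all names/values are assumptions).

def index_opening_song(scenes, threshold=0.5):
    """scenes: list of (start_sec, end_sec, probability) tuples."""
    intro = [s for s in scenes if s[2] >= threshold]
    if not intro:
        return None  # no scene classified as part of the opening song
    # Temporal location of the opening song: earliest start to latest end.
    return {"intro_start": min(s[0] for s in intro),
            "intro_end": max(s[1] for s in intro)}

scenes = [(0, 10, 0.1), (10, 40, 0.9), (40, 70, 0.8), (70, 100, 0.05)]
print(index_opening_song(scenes))  # {'intro_start': 10, 'intro_end': 70}
```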
-
Publication number: 20250095319
Abstract: The technology relates to methods and systems for performing two-stage suppression of bounding boxes generated during object detection techniques for digital images. The two-stage suppression includes a per-class suppression stage and a class-agnostic suppression stage. In an example method, preliminary bounding boxes are generated for multiple objects in a digital image. A first subset of bounding boxes is selected by performing a per-class suppression of the preliminary bounding boxes. A second subset of bounding boxes is selected by performing a class-agnostic suppression of the first subset of bounding boxes. Based on the second subset of bounding boxes, at least one of an enriched image or a video index is generated.
Type: Application
Filed: March 12, 2024
Publication date: March 20, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shay AMRAM, Moti KADOSH, Yonit HOFFMAN, Zvi FIGOV
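The two stages described above can be illustrated with standard IoU-based non-maximum suppression (NMS) run first within each class label, then across all remaining boxes. The box format, thresholds, and function names are assumptions for the sketch, not the claimed method.

```python
# Two-stage suppression sketch: per-class NMS, then class-agnostic NMS.

def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, thresh):
    # dets: list of (box, score, cls); greedily keep highest-scoring boxes.
    kept = []
    for d in sorted(dets, key=lambda d: -d[1]):
        if all(iou(d[0], k[0]) < thresh for k in kept):
            kept.append(d)
    return kept

def two_stage_suppress(dets, per_class_thresh=0.5, agnostic_thresh=0.7):
    # Stage 1: per-class suppression within each class label.
    classes = {d[2] for d in dets}
    stage1 = [d for c in classes
              for d in nms([x for x in dets if x[2] == c], per_class_thresh)]
    # Stage 2: class-agnostic suppression across all surviving boxes.
    return nms(stage1, agnostic_thresh)
```

The second stage removes near-duplicate boxes that survive stage one only because they carry different class labels.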
-
Publication number: 20250095161
Abstract: Examples of the present disclosure describe systems and methods for track aware object detection. In examples, image content comprising one or more objects is received. Frames in the image content are identified. Candidate bounding boxes are created around objects to be tracked in the frames and a confidence score is assigned to each candidate bounding box. The candidate bounding boxes for each object are compared to a predicted bounding box that is generated based on a current track for the object. Candidate bounding boxes that are determined to be similar to the predicted bounding box and/or that exceed a confidence score threshold are selected. The selected candidate bounding boxes are filtered until a single candidate bounding box that is most representative of each object to be tracked remains. The frame comprising the representative bounding box for each object is then added to a current track for the object.
Type: Application
Filed: December 29, 2023
Publication date: March 20, 2025
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Shay AMRAM, Zvi FIGOV, Moti KADOSH, Yonit HOFFMAN
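The selection step above (compare candidates to the track's predicted box, keep confident and similar ones, reduce to a single representative) can be sketched as follows, with IoU as an assumed similarity measure and illustrative thresholds.

```python
# Sketch of track-aware candidate selection (thresholds and the use of
# IoU as the similarity/representativeness measure are assumptions).

def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_track_box(candidates, predicted, conf_thresh=0.3, iou_thresh=0.4):
    # candidates: list of (box, confidence); predicted: box from the track.
    viable = [c for c in candidates
              if c[1] >= conf_thresh and iou(c[0], predicted) >= iou_thresh]
    if not viable:
        return None
    # Filter down to the single most representative candidate: here, the
    # one best matching the track prediction.
    return max(viable, key=lambda c: iou(c[0], predicted))
```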
-
Patent number: 12190867
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
Type: Grant
Filed: May 31, 2022
Date of Patent: January 7, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventor: Zvi Figov
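The comparison step reads naturally as a similarity test between each phrase candidate's embedding and the segment's average sentence embedding. A minimal sketch, assuming cosine similarity as the comparison value and precomputed embeddings (the embedding model and threshold are not specified by the abstract):

```python
import numpy as np

def detect_keywords(phrase_embs, sentence_embs, threshold=0.6):
    # phrase_embs: {phrase: vector}; sentence_embs: list of vectors for
    # the segment's sentences. Embeddings are assumed precomputed.
    avg = np.mean(sentence_embs, axis=0)  # average sentence embedding
    cos = lambda a, b: float(np.dot(a, b) /
                             (np.linalg.norm(a) * np.linalg.norm(b)))
    # Label phrases whose similarity to the average exceeds the threshold.
    return [p for p, v in phrase_embs.items() if cos(v, avg) > threshold]
```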
-
Publication number: 20240420469
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Application
Filed: April 26, 2024
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
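The duration-matching stage can be sketched as a search for runs of consecutive shots whose durations agree within a tolerance; the subsequent image-content comparison of representative frames is omitted here. Function name, tolerance, and minimum run length are assumptions.

```python
# Sketch of duration-based sequence matching between texted and textless
# shot lists (only the duration step; frame comparison would follow).

def match_sequences(shots_a, shots_b, tol=0.1, min_len=2):
    # shots_*: lists of shot durations in seconds. Returns (i, j, length)
    # for runs where consecutive shot durations agree within `tol`.
    matches = []
    for i in range(len(shots_a)):
        for j in range(len(shots_b)):
            n = 0
            while (i + n < len(shots_a) and j + n < len(shots_b)
                   and abs(shots_a[i + n] - shots_b[j + n]) <= tol):
                n += 1
            if n >= min_len:
                matches.append((i, j, n))
    return matches
```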
-
Publication number: 20240420342
Abstract: An object tracking tool integrates scene transition detection and/or dynamic queue resizing. By integrating shot transition detection, the object tracking tool can change which operations are performed depending on whether a shot transition has been detected. For example, if a shot transition is not detected, lower-complexity interpolation operations can be performed to determine spatial information for objects, instead of using higher-complexity object detection operations, which can reduce computational complexity. As another example, depending on whether a shot transition has been detected, the object tracking tool can adjust operations performed when associating identifiers with objects, which can improve accuracy of object tracking operations. With dynamic queue resizing, an object tracking tool can selectively adjust the maximum size of a queue used to store frames for object tracking.
Type: Application
Filed: June 13, 2023
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zvi FIGOV, Yonit HOFFMAN, Moti KADOSH
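The lower-complexity interpolation mentioned above is, in the simplest reading, linear interpolation of box coordinates between two detected frames, applied only when no shot transition intervenes. A minimal sketch under that assumption:

```python
def interpolate_box(box_prev, box_next, t):
    # Linearly interpolate a (x1, y1, x2, y2) box between two detected
    # frames at fraction t in [0, 1]. Used only when no shot transition
    # is detected, since a box interpolated across a cut is meaningless.
    return tuple(p + (n - p) * t for p, n in zip(box_prev, box_next))
```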
-
Publication number: 20240370661
Abstract: Multimedia content is summarized with the use of summary prompts that are created with audio and visual insights obtained from the multimedia content. An aggregated timeline temporally aligns the audio and visual insights. The aggregated timeline is segmented into coherent segments that each include a unique combination of audio and visual insights. These segments are grouped into chunks, based on prompt size constraints, and are used with identified summarization styles to create the summary prompts. The summary prompts are provided to summarization models to obtain summaries having content and summarization styles based on the summary prompts.
Type: Application
Filed: June 9, 2023
Publication date: November 7, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA DOTAN, Oron NIR
-
Publication number: 20240303442
Abstract: Systems and methods are provided for extracting and processing terms referenced in multimedia content with the use of different term extraction models to determine the relevance of categories to the referenced terms and to rank the categories by relative dominance for the multimedia content. The most dominant category for the multimedia content and/or particular segment(s) of the multimedia content can then be identified and used to link to supplemental content and/or to identify multimedia content related to topics of interest.
Type: Application
Filed: May 17, 2023
Publication date: September 12, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA
-
Publication number: 20240257517
Abstract: A tagging system gathers all events (tagged and untagged) generated by remote sensors at a location or facility over time. Based on the gathered events the tagging system uses machine learning to train a model to learn the sensor layout of a facility or location and the timing between the triggering of sensors. Once trained, the model can predict the movement and location of individuals and objects throughout the facility based on a starting tagged event. Given a series of tagged and untagged events, the system can use the movement predictions of the model to tag the untagged events in the series with the identification of an individual or object that triggered the generation of the untagged event.
Type: Application
Filed: January 30, 2023
Publication date: August 1, 2024
Applicant: Verint Americas Inc.
Inventors: Michael Sutton, Zvi Figov, Nir Naor
-
Publication number: 20240221379
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Application
Filed: December 29, 2022
Publication date: July 4, 2024
Inventors: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM
-
Patent number: 12026200
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Grant
Filed: July 14, 2022
Date of Patent: July 2, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Tom Hirshberg, Maayan Yedidia, Zvi Figov
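Taking "extent to which the person appears" as the prominence signal, the ranking step can be sketched by accumulating per-person appearance counts across frames. The abstract's person-specific feature information is richer than a frame count; this count is an illustrative stand-in.

```python
from collections import Counter

def rank_prominence(frame_appearances):
    # frame_appearances: per-frame lists of person ids detected in that
    # frame. Prominence proxy (an assumption): number of frames in which
    # each tracked person appears. Returns ids, most prominent first.
    counts = Counter(p for frame in frame_appearances for p in set(frame))
    return [p for p, _ in counts.most_common()]
```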
-
Publication number: 20240177514
Abstract: A system learns the structure of a form. The structure of the form can be learned from a single image (e.g., a photograph that includes the form) without user annotation. The form includes typewritten and handwritten text entries. The system groups text entries in the form based on lines detected in the form. The system then measures a distance and an angle between two text entry locations in the group of text entries. The group of text entries, the distances, and the angles can be captured in a bipartite graph. The bipartite graph represents possible pairing solutions where a typewritten text entry is paired with a handwritten text entry. The system identifies an optimal pairing solution, from the possible pairing solutions, using the distances and angles. The optimal pairing solution is identified by minimizing the standard deviation of the distances and/or by minimizing the circular standard deviation of the angles.
Type: Application
Filed: November 29, 2022
Publication date: May 30, 2024
Inventors: Mattan SERRY, Zvi FIGOV
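For small forms, the pairing objective can be illustrated by brute force: try every assignment of typewritten to handwritten entries and keep the one minimizing the standard deviation of the pair distances (the circular standard deviation of angles, also named in the abstract, is omitted from this sketch). All names and the brute-force search itself are illustrative; a bipartite-matching solver would scale better.

```python
import itertools, math, statistics

def pair_entries(typed, written):
    # typed/written: equal-length lists of (x, y) text-entry locations.
    # Exhaustively evaluate each pairing and keep the one minimizing the
    # standard deviation of pair distances.
    best, best_sd = None, math.inf
    for perm in itertools.permutations(range(len(written))):
        dists = [math.dist(typed[i], written[j]) for i, j in enumerate(perm)]
        sd = statistics.pstdev(dists)
        if sd < best_sd:
            best, best_sd = perm, sd
    return [(i, j) for i, j in enumerate(best)]
```

Uniform distances between a label and its answer give a standard deviation near zero, which is why this criterion recovers the layout's repeated label-answer offset.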
-
Patent number: 11995892
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Grant
Filed: May 26, 2022
Date of Patent: May 28, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mattan Serry, Zvi Figov, Yonit Hoffman, Maayan Yedidia
-
Publication number: 20240020338
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Application
Filed: July 14, 2022
Publication date: January 18, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Tom HIRSHBERG, Maayan YEDIDIA, Zvi FIGOV
-
Publication number: 20230419663
Abstract: Examples of the present disclosure describe systems and methods for video genre classification. In one example implementation, video content is received. A plurality of sliding windows of the video content is sampled. The plurality of sliding windows comprises audio data and video data. The audio data is analyzed to identify a set of audio features. The video data is analyzed to identify a set of video features. The set of audio features and the set of video features are provided to a classifier. The classifier is configured to detect a genre for the video content using the set of audio features and the set of video features. The video content is indexed based on the genre.
Type: Application
Filed: June 27, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLCInventors
Inventors: Oron NIR, Mattan SERRY, Yonit HOFFMAN, Michael BEN-HAYM, Zvi FIGOV, Eliyahu STRUGO, Avi NEEMAN
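The sliding-window sampling step is the only mechanically specified part of this pipeline; a minimal sketch, with window and stride sizes as illustrative assumptions:

```python
def sliding_windows(frames, window, stride):
    # Sample fixed-size windows from a frame (or audio sample) sequence;
    # each window would feed the audio/video feature extractors in the
    # described pipeline before classification.
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, stride)]
```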
-
Publication number: 20230409654
Abstract: Examples of the present disclosure describe systems and methods for on-device, in-browser AI processing. In examples, a selection of an AI pipeline is received. Content associated with the AI pipeline is also received. The content is segmented into multiple data segments and a set of data features is generated for the data segments. AI modules associated with the AI pipeline are loaded to create the AI pipeline. The set of data features is provided to the AI pipeline. The AI pipeline is executed to generate insights for the set of data features. The insights are then provided to a user.
Type: Application
Filed: June 21, 2022
Publication date: December 21, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ori ZIV, Barak KINARTI, Ben BAKHAR, Zvi FIGOV, Fardau VAN NEERDEN, Ohad JASSIN, Avi NEEMAN
-
Publication number: 20230343329
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
Type: Application
Filed: May 31, 2022
Publication date: October 26, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventor: Zvi FIGOV
-
Publication number: 20230316753
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Application
Filed: May 26, 2022
Publication date: October 5, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
-
Patent number: 11755643
Abstract: A video indexing system identifies groups of frames within a video frame sequence captured by a static camera during a same scene. Context metadata is generated for each frame in each group based on an analysis of fewer than all frames in the group. The frames are indexed in a database in association with the generated context metadata.
Type: Grant
Filed: July 6, 2020
Date of Patent: September 12, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zvi Figov, Irit Ofer
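The grouping step can be illustrated by splitting a static-camera sequence wherever an inter-frame difference score indicates a scene change; metadata is then computed from one representative frame per group instead of every frame. The difference metric and threshold are assumptions for the sketch.

```python
def group_static_frames(frame_diffs, change_thresh=0.2):
    # frame_diffs[i]: difference score between frame i and frame i+1
    # (metric is an assumption). Start a new group at each scene change;
    # returns (first_frame, last_frame) index pairs per group, so that
    # context metadata need only be computed from one frame per group.
    groups, start = [], 0
    for i, d in enumerate(frame_diffs):
        if d > change_thresh:
            groups.append((start, i))
            start = i + 1
    groups.append((start, len(frame_diffs)))
    return groups
```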
-
Publication number: 20230093385
Abstract: A computer-implemented method of accounting for visibility of a first attribute of one or more attributes associable with an object presented in an image is provided. The method includes inputting a training image of a first object into an attribute identification machine learning model, the training image being associated with labeled visibility data indicating whether the first attribute is visible in the inputted training image, generating, based on the inputted training image, visibility prediction data representing a prediction by the attribute identification machine learning model as to whether the first attribute is predicted to be visible in the inputted training image, comparing the generated visibility prediction data with labeled visibility data, and modifying the attribute identification machine learning model based on the comparison of the generated visibility prediction data and the labeled visibility data.
Type: Application
Filed: September 17, 2021
Publication date: March 23, 2023
Inventors: Zvi FIGOV, Mattan SERRY