Patents by Inventor Zvi Figov
Zvi Figov has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12266175
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Grant
Filed: December 29, 2022
Date of Patent: April 1, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Mordechai Kadosh, Zvi Figov, Eliyahu Strugo, Mattan Serry, Michael Ben-Haym
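The classify-then-index flow described in this abstract can be sketched briefly: scenes carry a probability of belonging to the opening song, scenes above a threshold are kept, and the resulting time span becomes index data. The function name, tuple layout, and threshold below are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of thresholding per-scene probabilities and saving the
# opening-song span as index data (all names/values are assumptions).

def index_opening_song(scenes, threshold=0.5):
    """scenes: list of (start_sec, end_sec, probability) tuples."""
    intro = [s for s in scenes if s[2] >= threshold]
    if not intro:
        return None  # no scene classified as part of the opening song
    # Temporal location of the opening song: earliest start to latest end.
    return {"intro_start": min(s[0] for s in intro),
            "intro_end": max(s[1] for s in intro)}

scenes = [(0, 10, 0.1), (10, 40, 0.9), (40, 70, 0.8), (70, 100, 0.05)]
print(index_opening_song(scenes))  # {'intro_start': 10, 'intro_end': 70}
```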
-
Publication number: 20250095319
Abstract: The technology relates to methods and systems for performing two-stage suppression of bounding boxes generated during object detection techniques for digital images. The two-stage suppression includes a per-class suppression stage and a class-agnostic suppression stage. In an example method, preliminary bounding boxes are generated for multiple objects in a digital image. A first subset of bounding boxes is selected by performing a per-class suppression of the preliminary bounding boxes. A second subset of bounding boxes is selected by performing a class-agnostic suppression of the first subset of bounding boxes. Based on the second subset of bounding boxes, at least one of an enriched image or a video index is generated.
Type: Application
Filed: March 12, 2024
Publication date: March 20, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shay AMRAM, Moti KADOSH, Yonit HOFFMAN, Zvi FIGOV
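The two stages described above can be illustrated with standard IoU-based non-maximum suppression (NMS) run first within each class label, then across all remaining boxes. The box format, thresholds, and function names are assumptions for the sketch, not the claimed method.

```python
# Two-stage suppression sketch: per-class NMS, then class-agnostic NMS.

def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, thresh):
    # dets: list of (box, score, cls); greedily keep highest-scoring boxes.
    kept = []
    for d in sorted(dets, key=lambda d: -d[1]):
        if all(iou(d[0], k[0]) < thresh for k in kept):
            kept.append(d)
    return kept

def two_stage_suppress(dets, per_class_thresh=0.5, agnostic_thresh=0.7):
    # Stage 1: per-class suppression within each class label.
    classes = {d[2] for d in dets}
    stage1 = [d for c in classes
              for d in nms([x for x in dets if x[2] == c], per_class_thresh)]
    # Stage 2: class-agnostic suppression across all surviving boxes.
    return nms(stage1, agnostic_thresh)
```

The second stage removes near-duplicate boxes that survive stage one only because they carry different class labels.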
-
Publication number: 20250095161
Abstract: Examples of the present disclosure describe systems and methods for track aware object detection. In examples, image content comprising one or more objects is received. Frames in the image content are identified. Candidate bounding boxes are created around objects to be tracked in the frames and a confidence score is assigned to each candidate bounding box. The candidate bounding boxes for each object are compared to a predicted bounding box that is generated based on a current track for the object. Candidate bounding boxes that are determined to be similar to the predicted bounding box and/or that exceed a confidence score threshold are selected. The selected candidate bounding boxes are filtered until a single candidate bounding box that is most representative of each object to be tracked remains. The frame comprising the representative bounding box for each object is then added to a current track for the object.
Type: Application
Filed: December 29, 2023
Publication date: March 20, 2025
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Shay AMRAM, Zvi FIGOV, Moti KADOSH, Yonit HOFFMAN
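The selection step above (compare candidates to the track's predicted box, keep confident and similar ones, reduce to a single representative) can be sketched as follows, with IoU as an assumed similarity measure and illustrative thresholds.

```python
# Sketch of track-aware candidate selection (thresholds and the use of
# IoU as the similarity/representativeness measure are assumptions).

def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_track_box(candidates, predicted, conf_thresh=0.3, iou_thresh=0.4):
    # candidates: list of (box, confidence); predicted: box from the track.
    viable = [c for c in candidates
              if c[1] >= conf_thresh and iou(c[0], predicted) >= iou_thresh]
    if not viable:
        return None
    # Filter down to the single most representative candidate: here, the
    # one best matching the track prediction.
    return max(viable, key=lambda c: iou(c[0], predicted))
```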
-
Patent number: 12190867
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
Type: Grant
Filed: May 31, 2022
Date of Patent: January 7, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventor: Zvi Figov
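The comparison step reads naturally as a similarity test between each phrase candidate's embedding and the segment's average sentence embedding. A minimal sketch, assuming cosine similarity as the comparison value and precomputed embeddings (the embedding model and threshold are not specified by the abstract):

```python
import numpy as np

def detect_keywords(phrase_embs, sentence_embs, threshold=0.6):
    # phrase_embs: {phrase: vector}; sentence_embs: list of vectors for
    # the segment's sentences. Embeddings are assumed precomputed.
    avg = np.mean(sentence_embs, axis=0)  # average sentence embedding
    cos = lambda a, b: float(np.dot(a, b) /
                             (np.linalg.norm(a) * np.linalg.norm(b)))
    # Label phrases whose similarity to the average exceeds the threshold.
    return [p for p, v in phrase_embs.items() if cos(v, avg) > threshold]
```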
-
Publication number: 20240420469
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Application
Filed: April 26, 2024
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
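The duration-matching stage can be sketched as a search for runs of consecutive shots whose durations agree within a tolerance; the subsequent image-content comparison of representative frames is omitted here. Function name, tolerance, and minimum run length are assumptions.

```python
# Sketch of duration-based sequence matching between texted and textless
# shot lists (only the duration step; frame comparison would follow).

def match_sequences(shots_a, shots_b, tol=0.1, min_len=2):
    # shots_*: lists of shot durations in seconds. Returns (i, j, length)
    # for runs where consecutive shot durations agree within `tol`.
    matches = []
    for i in range(len(shots_a)):
        for j in range(len(shots_b)):
            n = 0
            while (i + n < len(shots_a) and j + n < len(shots_b)
                   and abs(shots_a[i + n] - shots_b[j + n]) <= tol):
                n += 1
            if n >= min_len:
                matches.append((i, j, n))
    return matches
```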
-
Publication number: 20240420342
Abstract: An object tracking tool integrates scene transition detection and/or dynamic queue resizing. By integrating shot transition detection, the object tracking tool can change which operations are performed depending on whether a shot transition has been detected. For example, if a shot transition is not detected, lower-complexity interpolation operations can be performed to determine spatial information for objects, instead of using higher-complexity object detection operations, which can reduce computational complexity. As another example, depending on whether a shot transition has been detected, the object tracking tool can adjust operations performed when associating identifiers with objects, which can improve accuracy of object tracking operations. With dynamic queue resizing, an object tracking tool can selectively adjust the maximum size of a queue used to store frames for object tracking.
Type: Application
Filed: June 13, 2023
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zvi FIGOV, Yonit HOFFMAN, Moti KADOSH
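The lower-complexity interpolation mentioned above is, in the simplest reading, linear interpolation of box coordinates between two detected frames, applied only when no shot transition intervenes. A minimal sketch under that assumption:

```python
def interpolate_box(box_prev, box_next, t):
    # Linearly interpolate a (x1, y1, x2, y2) box between two detected
    # frames at fraction t in [0, 1]. Used only when no shot transition
    # is detected, since a box interpolated across a cut is meaningless.
    return tuple(p + (n - p) * t for p, n in zip(box_prev, box_next))
```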
-
Publication number: 20240370661
Abstract: Multimedia content is summarized with the use of summary prompts that are created with audio and visual insights obtained from the multimedia content. An aggregated timeline temporally aligns the audio and visual insights. The aggregated timeline is segmented into coherent segments that each include a unique combination of audio and visual insights. These segments are grouped into chunks, based on prompt size constraints, and are used with identified summarization styles to create the summary prompts. The summary prompts are provided to summarization models to obtain summaries having content and summarization styles based on the summary prompts.
Type: Application
Filed: June 9, 2023
Publication date: November 7, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA DOTAN, Oron NIR
-
Publication number: 20240303442
Abstract: Systems and methods are provided for extracting and processing terms referenced in multimedia content with the use of different term extraction models to determine the relevance of categories to the referenced terms and to rank the categories by relative dominance for the multimedia content. The most dominant category for the multimedia content and/or particular segment(s) of the multimedia content can then be identified and used to link to supplemental content and/or to identify multimedia content related to topics of interest.
Type: Application
Filed: May 17, 2023
Publication date: September 12, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA
-
Publication number: 20240257517
Abstract: A tagging system gathers all events (tagged and untagged) generated by remote sensors at a location or facility over time. Based on the gathered events the tagging system uses machine learning to train a model to learn the sensor layout of a facility or location and the timing between the triggering of sensors. Once trained, the model can predict the movement and location of individuals and objects throughout the facility based on a starting tagged event. Given a series of tagged and untagged events, the system can use the movement predictions of the model to tag the untagged events in the series with the identification of an individual or object that triggered the generation of the untagged event.
Type: Application
Filed: January 30, 2023
Publication date: August 1, 2024
Applicant: Verint Americas Inc.
Inventors: Michael Sutton, Zvi Figov, Nir Naor
-
Publication number: 20240221379
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Application
Filed: December 29, 2022
Publication date: July 4, 2024
Inventors: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM
-
Patent number: 12026200
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Grant
Filed: July 14, 2022
Date of Patent: July 2, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Tom Hirshberg, Maayan Yedidia, Zvi Figov
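Taking "extent to which the person appears" as the prominence signal, the ranking step can be sketched by accumulating per-person appearance counts across frames. The abstract's person-specific feature information is richer than a frame count; this count is an illustrative stand-in.

```python
from collections import Counter

def rank_prominence(frame_appearances):
    # frame_appearances: per-frame lists of person ids detected in that
    # frame. Prominence proxy (an assumption): number of frames in which
    # each tracked person appears. Returns ids, most prominent first.
    counts = Counter(p for frame in frame_appearances for p in set(frame))
    return [p for p, _ in counts.most_common()]
```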
-
Publication number: 20240177514
Abstract: A system learns the structure of a form. The structure of the form can be learned from a single image (e.g., a photograph that includes the form) without user annotation. The form includes typewritten and handwritten text entries. The system groups text entries in the form based on lines detected in the form. The system then measures a distance and an angle between two text entry locations in the group of text entries. The group of text entries, the distances, and the angles can be captured in a bipartite graph. The bipartite graph represents possible pairing solutions where a typewritten text entry is paired with a handwritten text entry. The system identifies an optimal pairing solution, from the possible pairing solutions, using the distances and angles. The optimal pairing solution is identified by minimizing the standard deviation of the distances and/or by minimizing the circular standard deviation of the angles.
Type: Application
Filed: November 29, 2022
Publication date: May 30, 2024
Inventors: Mattan SERRY, Zvi FIGOV
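For small forms, the pairing objective can be illustrated by brute force: try every assignment of typewritten to handwritten entries and keep the one minimizing the standard deviation of the pair distances (the circular standard deviation of angles, also named in the abstract, is omitted from this sketch). All names and the brute-force search itself are illustrative; a bipartite-matching solver would scale better.

```python
import itertools, math, statistics

def pair_entries(typed, written):
    # typed/written: equal-length lists of (x, y) text-entry locations.
    # Exhaustively evaluate each pairing and keep the one minimizing the
    # standard deviation of pair distances.
    best, best_sd = None, math.inf
    for perm in itertools.permutations(range(len(written))):
        dists = [math.dist(typed[i], written[j]) for i, j in enumerate(perm)]
        sd = statistics.pstdev(dists)
        if sd < best_sd:
            best, best_sd = perm, sd
    return [(i, j) for i, j in enumerate(best)]
```

Uniform distances between a label and its answer give a standard deviation near zero, which is why this criterion recovers the layout's repeated label-answer offset.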
-
Patent number: 11995892
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Grant
Filed: May 26, 2022
Date of Patent: May 28, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mattan Serry, Zvi Figov, Yonit Hoffman, Maayan Yedidia
-
Publication number: 20240020338
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Application
Filed: July 14, 2022
Publication date: January 18, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Tom HIRSHBERG, Maayan YEDIDIA, Zvi FIGOV
-
Publication number: 20230419663
Abstract: Examples of the present disclosure describe systems and methods for video genre classification. In one example implementation, video content is received. A plurality of sliding windows of the video content is sampled. The plurality of sliding windows comprises audio data and video data. The audio data is analyzed to identify a set of audio features. The video data is analyzed to identify a set of video features. The set of audio features and the set of video features are provided to a classifier. The classifier is configured to detect a genre for the video content using the set of audio features and the set of video features. The video content is indexed based on the genre.
Type: Application
Filed: June 27, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLCInventors
Inventors: Oron NIR, Mattan SERRY, Yonit HOFFMAN, Michael BEN-HAYM, Zvi FIGOV, Eliyahu STRUGO, Avi NEEMAN
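The sliding-window sampling step is the only mechanically specified part of this pipeline; a minimal sketch, with window and stride sizes as illustrative assumptions:

```python
def sliding_windows(frames, window, stride):
    # Sample fixed-size windows from a frame (or audio sample) sequence;
    # each window would feed the audio/video feature extractors in the
    # described pipeline before classification.
    return [frames[i:i + window]
            for i in range(0, len(frames) - window + 1, stride)]
```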
-
Publication number: 20230409654
Abstract: Examples of the present disclosure describe systems and methods for on-device, in-browser AI processing. In examples, a selection of an AI pipeline is received. Content associated with the AI pipeline is also received. The content is segmented into multiple data segments and a set of data features is generated for the data segments. AI modules associated with the AI pipeline are loaded to create the AI pipeline. The set of data features is provided to the AI pipeline. The AI pipeline is executed to generate insights for the set of data features. The insights are then provided to a user.
Type: Application
Filed: June 21, 2022
Publication date: December 21, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Ori ZIV, Barak KINARTI, Ben BAKHAR, Zvi FIGOV, Fardau VAN NEERDEN, Ohad JASSIN, Avi NEEMAN
-
Publication number: 20230343329
Abstract: Examples of the present disclosure describe improved systems and methods for detecting keywords in audio content. In one example implementation, audio content is segmented into one or more audio segments. One or more text segments are generated, each corresponding to one of the audio segments. For each text segment, one or more phrase candidate values are generated using a textual analysis, and one or more sentence embedding values are generated using a sentence embedding analysis. Next, an average sentence embedding value is calculated using the one or more sentence embedding values. Each of the one or more phrase candidate values is compared to the average sentence embedding value. Each phrase candidate value having a comparison value above a threshold value is labeled as representing a keyword.
Type: Application
Filed: May 31, 2022
Publication date: October 26, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventor: Zvi FIGOV
-
Publication number: 20230316753
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
Type: Application
Filed: May 26, 2022
Publication date: October 5, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
-
Patent number: 11755643
Abstract: A video indexing system identifies groups of frames within a video frame sequence captured by a static camera during a same scene. Context metadata is generated for each frame in each group based on an analysis of fewer than all frames in the group. The frames are indexed in a database in association with the generated context metadata.
Type: Grant
Filed: July 6, 2020
Date of Patent: September 12, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zvi Figov, Irit Ofer
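The grouping step can be illustrated by splitting a static-camera sequence wherever an inter-frame difference score indicates a scene change; metadata is then computed from one representative frame per group instead of every frame. The difference metric and threshold are assumptions for the sketch.

```python
def group_static_frames(frame_diffs, change_thresh=0.2):
    # frame_diffs[i]: difference score between frame i and frame i+1
    # (metric is an assumption). Start a new group at each scene change;
    # returns (first_frame, last_frame) index pairs per group, so that
    # context metadata need only be computed from one frame per group.
    groups, start = [], 0
    for i, d in enumerate(frame_diffs):
        if d > change_thresh:
            groups.append((start, i))
            start = i + 1
    groups.append((start, len(frame_diffs)))
    return groups
```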
-
Publication number: 20230093385
Abstract: A computer-implemented method of accounting for visibility of a first attribute of one or more attributes associable with an object presented in an image is provided. The method includes inputting a training image of a first object into an attribute identification machine learning model, the training image being associated with labeled visibility data indicating whether the first attribute is visible in the inputted training image, generating, based on the inputted training image, visibility prediction data representing a prediction by the attribute identification machine learning model as to whether the first attribute is predicted to be visible in the inputted training image, comparing the generated visibility prediction data with labeled visibility data, and modifying the attribute identification machine learning model based on the comparison of the generated visibility prediction data and the labeled visibility data.
Type: Application
Filed: September 17, 2021
Publication date: March 23, 2023
Inventors: Zvi FIGOV, Mattan SERRY