Patents by Inventor Yonit Hoffman
Yonit Hoffman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250131035
Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
Type: Application
Filed: October 9, 2024
Publication date: April 24, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Maayan YEDIDIA, Avner LEVI
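As an illustrative aside, the per-cluster correction step described in this abstract can be sketched as a majority vote over the predictions in a cluster: the most frequent string becomes the cluster representative and overwrites outlier recognitions. This is a minimal sketch; the function names are hypothetical and not taken from the patent.

```python
from collections import Counter

def choose_cluster_representative(cluster):
    """Pick the most frequent predicted string in a cluster of OCR
    predictions; ties resolve to the earliest-seen prediction."""
    return Counter(cluster).most_common(1)[0][0]

def correct_cluster(cluster):
    """Replace every prediction in the cluster with the representative,
    correcting sporadic per-frame recognition errors."""
    rep = choose_cluster_representative(cluster)
    return [rep] * len(cluster)

# Example: a numeric cluster where one frame misread "0" as the letter "O".
predictions = ["2024", "2024", "2O24", "2024"]
corrected = correct_cluster(predictions)
```

A real system would cluster predictions spatially and textually first (as the abstract describes) before this correction step; the sketch only shows the representative-selection idea.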
-
Patent number: 12272377
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: March 5, 2024
Date of Patent: April 8, 2025
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
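The post-processing step in this abstract, smoothing per-segment scores and deriving a confidence per candidate window size, can be illustrated with a simple moving-average sketch. The helpers below are hypothetical illustrations, not the patented implementation.

```python
def smooth_scores(scores, window):
    """Moving-average smoothing of one class's per-segment scores;
    `window` is an odd smoothing width, clipped at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def window_confidence(smoothed, window):
    """Confidence that the class occurs in some span of `window`
    consecutive segments: the best mean smoothed score over all spans."""
    spans = [sum(smoothed[i:i + window]) / window
             for i in range(len(smoothed) - window + 1)]
    return max(spans)

# Raw scores for one class across five segments, then a size-2 window check.
raw = [0.1, 0.9, 0.8, 0.85, 0.2]
smoothed = smooth_scores(raw, 3)
```

Evaluating `window_confidence` for several candidate window sizes and keeping the best-scoring one mirrors, in spirit, the "plurality of candidate window sizes" the abstract mentions.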
-
Patent number: 12266175
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Grant
Filed: December 29, 2022
Date of Patent: April 1, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Mordechai Kadosh, Zvi Figov, Eliyahu Strugo, Mattan Serry, Michael Ben-Haym
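The thresholding-and-indexing flow in this abstract can be sketched in a few lines: threshold each scene's probability, then record the temporal extent covered by intro-labeled scenes. All names and the fixed threshold are illustrative assumptions, not details from the patent.

```python
def label_intro_scenes(scene_probs, threshold=0.5):
    """Classify each scene as part of the opening song (True) or not,
    by thresholding its per-scene probability."""
    return [p >= threshold for p in scene_probs]

def intro_time_span(scene_bounds, labels):
    """Given (start, end) seconds per scene and the labels above,
    return the temporal extent covered by intro-labeled scenes,
    or None if no scene qualifies."""
    spans = [b for b, lab in zip(scene_bounds, labels) if lab]
    if not spans:
        return None
    return (min(s for s, _ in spans), max(e for _, e in spans))

# Three scenes; the first two look like the opening song.
bounds = [(0, 15), (15, 70), (70, 95)]
probs = [0.9, 0.8, 0.1]
span = intro_time_span(bounds, label_intro_scenes(probs))
```

The returned span is the kind of "temporal location" the abstract says may be saved as index data for the multimedia file.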
-
Publication number: 20250095319
Abstract: The technology relates to methods and systems for performing two-stage suppression of bounding boxes generated during object detection techniques for digital images. The two-stage suppression includes a per-class suppression stage and a class-agnostic suppression stage. In an example method, preliminary bounding boxes are generated for multiple objects in a digital image. A first subset of bounding boxes is selected by performing a per-class suppression of the preliminary bounding boxes. A second subset of bounding boxes is selected by performing a class-agnostic suppression of the first subset of bounding boxes. Based on the second subset of bounding boxes, at least one of an enriched image or a video index is generated.
Type: Application
Filed: March 12, 2024
Publication date: March 20, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shay AMRAM, Moti KADOSH, Yonit HOFFMAN, Zvi FIGOV
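The two stages described here map naturally onto greedy non-maximum suppression run twice: once within each class, then once across classes. The sketch below illustrates that structure with IoU-based NMS; thresholds and helper names are assumptions for the example, not values from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, thresh):
    """Greedy non-maximum suppression over (box, score, cls) triples."""
    kept = []
    for d in sorted(dets, key=lambda d: -d[1]):
        if all(iou(d[0], k[0]) < thresh for k in kept):
            kept.append(d)
    return kept

def two_stage_suppression(dets, per_class_t=0.5, agnostic_t=0.7):
    """Stage 1: NMS within each class. Stage 2: class-agnostic NMS
    across the survivors, removing cross-class duplicates."""
    by_class = {}
    for d in dets:
        by_class.setdefault(d[2], []).append(d)
    stage1 = [d for ds in by_class.values() for d in nms(ds, per_class_t)]
    return nms(stage1, agnostic_t)
```

The class-agnostic second pass is what distinguishes this from ordinary per-class NMS: two different class labels on essentially the same box collapse to the higher-scoring one.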
-
Publication number: 20250095161
Abstract: Examples of the present disclosure describe systems and methods for track aware object detection. In examples, image content comprising one or more objects is received. Frames in the image content are identified. Candidate bounding boxes are created around objects to be tracked in the frames and a confidence score is assigned to each candidate bounding box. The candidate bounding boxes for each object are compared to a predicted bounding box that is generated based on a current track for the object. Candidate bounding boxes that are determined to be similar to the predicted bounding box and/or that exceed a confidence score threshold are selected. The selected candidate bounding boxes are filtered until a single candidate bounding box that is most representative of each object to be tracked remains. The frame comprising the representative bounding box for each object is then added to a current track for the object.
Type: Application
Filed: December 29, 2023
Publication date: March 20, 2025
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Shay AMRAM, Zvi FIGOV, Moti KADOSH, Yonit HOFFMAN
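The select-then-filter logic in this abstract can be sketched as: keep candidates that are either similar enough to the track's predicted box or confident enough on their own, then reduce to the single most representative candidate. Thresholds and the tie-break rule below are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_candidate(candidates, predicted_box, conf_thresh=0.3,
                     iou_thresh=0.4):
    """From (box, confidence) candidates, keep those similar to the
    track's predicted box or above the confidence threshold, then
    return the single best one by (similarity, confidence)."""
    keep = [(box, conf) for box, conf in candidates
            if conf >= conf_thresh or iou(box, predicted_box) >= iou_thresh]
    if not keep:
        return None
    return max(keep, key=lambda c: (iou(c[0], predicted_box), c[1]))
```

Returning `None` when nothing survives corresponds to the track receiving no update for that frame.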
-
Publication number: 20240420342
Abstract: An object tracking tool integrates scene transition detection and/or dynamic queue resizing. By integrating shot transition detection, the object tracking tool can change which operations are performed depending on whether a shot transition has been detected. For example, if a shot transition is not detected, lower-complexity interpolation operations can be performed to determine spatial information for objects, instead of using higher-complexity object detection operations, which can reduce computational complexity. As another example, depending on whether a shot transition has been detected, the object tracking tool can adjust operations performed when associating identifiers with objects, which can improve accuracy of object tracking operations. With dynamic queue resizing, an object tracking tool can selectively adjust the maximum size of a queue used to store frames for object tracking.
Type: Application
Filed: June 13, 2023
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zvi FIGOV, Yonit HOFFMAN, Moti KADOSH
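The "lower-complexity interpolation instead of detection" idea is easy to illustrate: when no shot transition separates two anchor detections of the same object, intermediate boxes can be filled in by linear interpolation rather than running the detector on every frame. This is a generic sketch of that trade-off, not the patented mechanism.

```python
def interpolate_boxes(box_a, box_b, n_between):
    """Linearly interpolate n_between boxes between two detections of
    the same object. Only valid when no shot transition separates the
    two anchor frames; across a transition, positions are unrelated
    and full object detection should run instead."""
    boxes = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)
        boxes.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return boxes

# One frame between two anchor detections: the box lands halfway.
mid = interpolate_boxes((0, 0, 10, 10), (10, 10, 20, 20), 1)
```

The shot-transition check acts as the gate: interpolation only between anchors inside the same shot.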
-
Publication number: 20240420469
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Application
Filed: April 26, 2024
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
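The "sequences of shots having similar durations" step can be sketched as matching runs of shot durations between the texted and textless cuts within a tolerance. The tolerance and run length below are illustrative assumptions; frame-content comparison (the abstract's next step) is omitted.

```python
def durations_match(d1, d2, tol=0.25):
    """Two shot durations (in seconds) match within a small tolerance."""
    return abs(d1 - d2) <= tol

def find_matching_sequences(shots_a, shots_b, length=3, tol=0.25):
    """Return (i, j) index pairs where a run of `length` consecutive
    shot durations in shots_a matches a run in shots_b. Candidate
    pairs would then be confirmed by comparing representative frames."""
    pairs = []
    for i in range(len(shots_a) - length + 1):
        for j in range(len(shots_b) - length + 1):
            if all(durations_match(shots_a[i + k], shots_b[j + k], tol)
                   for k in range(length)):
                pairs.append((i, j))
    return pairs

# Texted cut vs. textless reel: the first three shots line up.
texted = [2.0, 3.0, 4.0, 9.0]
textless = [2.1, 3.0, 3.9]
```

Duration matching alone over-generates candidates, which is why the abstract follows it with image comparison on representative frames.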
-
Patent number: 12141200
Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
Type: Grant
Filed: May 27, 2022
Date of Patent: November 12, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Maayan Yedidia, Avner Levi
-
Publication number: 20240370661
Abstract: Multimedia content is summarized with the use of summary prompts that are created with audio and visual insights obtained from the multimedia content. An aggregated timeline temporally aligns the audio and visual insights. The aggregated timeline is segmented into coherent segments that each include a unique combination of audio and visual insights. These segments are grouped into chunks, based on prompt size constraints, and are used with identified summarization styles to create the summary prompts. The summary prompts are provided to summarization models to obtain summaries having content and summarization styles based on the summary prompts.
Type: Application
Filed: June 9, 2023
Publication date: November 7, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA DOTAN, Oron NIR
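The "grouped into chunks, based on prompt size constraints" step is essentially greedy packing of consecutive timeline segments under a size budget. The sketch below uses character counts as a stand-in for the real prompt-size measure (likely tokens); all names are illustrative.

```python
def chunk_segments(segments, max_chars):
    """Group consecutive timeline segments into chunks whose combined
    text fits a prompt-size budget. An oversized single segment
    becomes its own chunk rather than being dropped."""
    chunks, current, size = [], [], 0
    for seg in segments:
        if current and size + len(seg) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg)
    if current:
        chunks.append(current)
    return chunks

# Each chunk would then be combined with a summarization style to form one prompt.
chunks = chunk_segments(["aaaa", "bbbb", "cc"], 8)
```

Keeping segments consecutive preserves the temporal coherence of the aggregated timeline inside each prompt.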
-
Publication number: 20240363139
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Application
Filed: March 5, 2024
Publication date: October 31, 2024
Inventors: Lihi Ahuva SHILOH PERL, Ben FISHMAN, Gilad PUNDAK, Yonit HOFFMAN
-
Publication number: 20240303442
Abstract: Systems and methods are provided for extracting and processing terms referenced in multimedia content with the use of different term extraction models to determine the relevance of categories to the referenced terms and to rank the categories by relative dominance for the multimedia content. The most dominant category for the multimedia content and/or particular segment(s) of the multimedia content can then be identified and used to link to supplemental content and/or to identify multimedia content related to topics of interest.
Type: Application
Filed: May 17, 2023
Publication date: September 12, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA
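Ranking categories "by relative dominance" can be sketched as accumulating each extracted term's per-category relevance (optionally weighted, e.g. by how often the term occurs) and sorting. This is a generic aggregation sketch; the data shapes and weighting are assumptions, not details from the application.

```python
def rank_categories(term_categories, term_weights=None):
    """Aggregate per-term category relevance into a dominance ranking.
    term_categories maps term -> {category: relevance}; optional
    term_weights weight terms (e.g., by frequency in the content)."""
    totals = {}
    for term, cats in term_categories.items():
        w = (term_weights or {}).get(term, 1.0)
        for cat, rel in cats.items():
            totals[cat] = totals.get(cat, 0.0) + w * rel
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Two extracted terms; "sports" accumulates the most relevance.
ranked = rank_categories({"goal": {"sports": 0.9},
                          "coach": {"sports": 0.5, "business": 0.4}})
dominant = ranked[0][0]
```

The top-ranked entry plays the role of the "most dominant category" used for linking to supplemental content.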
-
Publication number: 20240221379
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Application
Filed: December 29, 2022
Publication date: July 4, 2024
Inventors: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM
-
Patent number: 12026200
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Grant
Filed: July 14, 2022
Date of Patent: July 2, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Tom Hirshberg, Maayan Yedidia, Zvi Figov
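Since prominence here "reflects, at least in part, an extent to which the person appears," a minimal sketch is to accumulate each person's detected frames into total screen time and rank on that. A real system would fold in many more accumulated features (size, position, face visibility); the single-feature version below is an assumption for illustration.

```python
def rank_prominence(appearances, fps=30.0):
    """Rank people by accumulated on-screen presence. `appearances`
    maps person_id -> list of frame indices where the person was
    detected; prominence here is simply total screen time in seconds."""
    scores = {pid: len(frames) / fps for pid, frames in appearances.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# p1 appears for 60 frames (2 s at 30 fps), p2 for 15 frames (0.5 s).
ranking = rank_prominence({"p1": list(range(60)), "p2": list(range(15))})
```

An application could use the top of this ranking to, say, pick thumbnail subjects or order chapter highlights.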
-
Patent number: 12020483
Abstract: A system for indexing animated content receives detections extracted from a media file, where each one of the detections includes an image extracted from a corresponding frame of the media file that corresponds to a detected instance of an animated character. The system determines, for each of the received detections, an embedding defining a set of characteristics for the detected instance. The embedding associated with each detection is provided to a grouping engine that is configured to dynamically configure at least one grouping parameter based on a total number of the detections received. The grouping engine is also configured to sort the detections into groups using the grouping parameter and the embedding for each detection. A character ID is assigned to each one of the groups of detections, and the system indexes the groups of detections in a database in association with the character ID assigned to each group.
Type: Grant
Filed: August 26, 2022
Date of Patent: June 25, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Irit Ofer, Avner Levi, Haim Sabo, Reut Amior
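The grouping-engine idea, a distance threshold that is configured dynamically from the total detection count, can be sketched with a greedy embedding grouper. The specific tightening rule and threshold values below are invented for illustration; only the shape of the idea (count-dependent parameter, then grouping, then character IDs) comes from the abstract.

```python
def group_detections(embeddings, base_thresh=0.8):
    """Greedy grouping of character-detection embeddings: each detection
    joins the first group whose seed embedding lies within a distance
    threshold; the threshold tightens when many detections arrive
    (a stand-in for the dynamically configured grouping parameter)."""
    n = len(embeddings)
    thresh = base_thresh * (1.0 if n < 100 else 0.5)
    groups = []  # list of (seed embedding, [member indices])
    for idx, emb in enumerate(embeddings):
        placed = False
        for seed, members in groups:
            dist = sum((a - b) ** 2 for a, b in zip(emb, seed)) ** 0.5
            if dist < thresh:
                members.append(idx)
                placed = True
                break
        if not placed:
            groups.append((emb, [idx]))
    # Assign a character ID to each group for indexing.
    return {f"character_{i}": members
            for i, (_, members) in enumerate(groups)}

groups = group_detections([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])
```

The returned mapping mirrors the final step of the abstract: detections indexed under the character ID of their group.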
-
Publication number: 20240203146
Abstract: Systems and methods for detecting text in videos. To address problems with conventional Optical Character Recognition (OCR) systems, the present disclosure provides detection of text for improved OCR. Aspects of the present disclosure can, therefore, be utilized to detect a textual logo in videos, including when the text of the textual logo is clearly visible and when the text is inferred. Thus, examples capture the appearance time of a textual logo from a video view perspective. Aspects use a multi-threshold pipeline for detecting video frames including the textual logo. A textual-visual scoring system is additionally used to leverage visual aspects of text in logos. A shot detection system is used to detect inferred text beyond a detected video frame. One or more verification models can be further applied.
Type: Application
Filed: December 16, 2022
Publication date: June 20, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Eliyahu Strugo, Yonit Hoffman
-
Patent number: 11995892
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Grant
Filed: May 26, 2022
Date of Patent: May 28, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mattan Serry, Zvi Figov, Yonit Hoffman, Maayan Yedidia
-
Patent number: 11948599
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: January 6, 2022
Date of Patent: April 2, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
-
Publication number: 20240020338
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Application
Filed: July 14, 2022
Publication date: January 18, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Tom HIRSHBERG, Maayan YEDIDIA, Zvi FIGOV
-
Publication number: 20230419663
Abstract: Examples of the present disclosure describe systems and methods for video genre classification. In one example implementation, video content is received. A plurality of sliding windows of the video content is sampled. The plurality of sliding windows comprises audio data and video data. The audio data is analyzed to identify a set of audio features. The video data is analyzed to identify a set of video features. The set of audio features and the set of video features is provided to a classifier. The classifier is configured to detect a genre for the video content using the set of audio features and the set of video features. The video content is indexed based on the genre.
Type: Application
Filed: June 27, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Oron NIR, Mattan SERRY, Yonit HOFFMAN, Michael BEN-HAYM, Zvi FIGOV, Eliyahu STRUGO, Avi NEEMAN
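The pipeline here, sample sliding windows, extract audio and video features, feed both to a classifier, can be sketched with a window sampler and a toy nearest-prototype classifier over the concatenated features. The classifier choice and all names are illustrative assumptions; the patent does not specify this model.

```python
def sliding_windows(n_frames, window, stride):
    """Start indices of sliding windows sampled over the video."""
    return list(range(0, max(n_frames - window, 0) + 1, stride))

def classify_genre(audio_feats, video_feats, prototypes):
    """Toy nearest-prototype classifier over the concatenated audio
    and video feature vector; `prototypes` maps genre -> feature
    vector of the same combined length."""
    feats = list(audio_feats) + list(video_feats)
    def dist(proto):
        return sum((a - b) ** 2 for a, b in zip(feats, proto))
    return min(prototypes, key=lambda g: dist(prototypes[g]))

# Two-dimensional audio and video features against two genre prototypes.
protos = {"news": [0.0, 0.0, 0.0, 0.0], "sports": [1.0, 1.0, 1.0, 1.0]}
genre = classify_genre([0.9, 0.9], [1.0, 0.8], protos)
```

Per-window predictions from such a classifier could then be aggregated (e.g., by majority) into the single genre used to index the video.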
-
Publication number: 20230316753
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Application
Filed: May 26, 2022
Publication date: October 5, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA