Patents by Inventor Vanja Josifovski

Vanja Josifovski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Generating and applying a trained structured machine learning model for determining a semantic label for content of a transient segment of a communication

Patent number: 10540610

Abstract: Methods, apparatus, and computer-readable media are provided for analyzing a cluster of communications, such as B2C emails, to generate a template for the cluster that defines transient segments and fixed segments of the cluster of communications. More particularly, methods, apparatus, and computer-readable media are provided for generating and/or applying a trained structured machine learning model for a generated template that can be used to determine, for one or more transient segments of subsequent communications, a corresponding probability that a given semantic label is the correct semantic label for extracted content of the transient segment(s).

Type: Grant

Filed: April 27, 2016

Date of Patent: January 21, 2020

Assignee: GOOGLE LLC

Inventors: Jie Yang, Amr Ahmed, Luis Garcia Pueyo, Mike Bendersky, Amitabh Saikia, Marc-Allen Cartright, Marc Alexander Najork, MyLinh Yang, Hui Tan, Weinan Zhang, Vanja Josifovski, Alexander J. Smola
Generating and applying event data extraction templates

Patent number: 10360537

Abstract: Techniques are described herein for generating and applying event data extraction templates. In various implementations, a data extraction template may be applied to structured communications to extract, from each structured communication, event data associated with a transient markup language path indicated in the data extraction template. The data extraction template may include an event-related semantic data type assigned to the transient markup language path and a strength of association between the transient structural path and the event-related semantic data type. Feedback may be obtained concerning event data extracted from one or more of the structured communications. Based on the feedback, the strength of association between the transient markup language path and the event-related semantic data type may be altered.

Type: Grant

Filed: April 11, 2017

Date of Patent: July 23, 2019

Assignee: GOOGLE LLC

Inventors: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
Selecting pattern matching segments for electronic communication clustering

Patent number: 10216837

Abstract: Methods, apparatus, systems, and computer-readable media are provided for selecting pattern matching segments suitable for electronic communication clustering. A set of pattern matching segments may be identified that match at least one of a corpus of electronic communication addresses. A measure of coverage of each of the set of pattern matching segments across the corpus of electronic communication addresses may be determined. A score associated with each pattern matching segment may be determined based on the measure of coverage and one or more measures of flexibility associated with each of the set of pattern matching segments. One or more of the pattern matching segments may be selected based on the determined scores. A corpus of electronic communications may then be grouped into a plurality of clusters based on a comparison of the one or more selected pattern matching segments to electronic communication addresses associated with the corpus of electronic communications.

Type: Grant

Filed: December 29, 2014

Date of Patent: February 26, 2019

Assignee: GOOGLE LLC

Inventors: Amitabh Saikia, Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Jie Yang, Mike Bendersky, MyLinh Yang
Generating and applying data extraction templates

Patent number: 10216838

Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.

Type: Grant

Filed: December 29, 2016

Date of Patent: February 26, 2019

Assignee: Google LLC

Inventors: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright
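
The abstract above classifies a structural path as transient when the count of occurrences of its text across a cluster satisfies a criterion. The following is a minimal Python sketch of that idea, not the patented method: the criterion used here (the most common text at a path appears in at most `max_share` of the communications), the path strings, and the function names `classify_paths` and `apply_template` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def classify_paths(cluster, max_share=0.5):
    """Label each structural path in a cluster as 'transient' or 'fixed'.

    cluster: list of dicts, one per communication, mapping a structural
    path (an XPath-like string) to the text found at that path.
    max_share: hypothetical criterion. A path is transient when its most
    common text appears in at most this share of the communications.
    """
    texts_by_path = defaultdict(Counter)
    for message in cluster:
        for path, text in message.items():
            texts_by_path[path][text] += 1

    labels = {}
    for path, counts in texts_by_path.items():
        top_count = counts.most_common(1)[0][1]
        share = top_count / len(cluster)
        labels[path] = "transient" if share <= max_share else "fixed"
    return labels


def apply_template(labels, message):
    """Extract text only from paths that the template marks as transient."""
    return {p: t for p, t in message.items() if labels.get(p) == "transient"}


# Toy cluster: the heading repeats verbatim, the order line changes each time.
cluster = [
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order 1001"},
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order 1002"},
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order 1003"},
]
labels = classify_paths(cluster)
extracted = apply_template(
    labels,
    {"/html/body/h1": "Your order has shipped", "/html/body/p[1]": "Order 1004"},
)
```

In this toy cluster the heading path is labeled fixed and the order-number path transient, so only the order number is extracted from the new message. The semantic data types and confidentiality designations mentioned in the abstract are left out of the sketch.
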
Method and apparatus for web ad matching

Patent number: 9824124

Abstract: A method and apparatus are provided for better web ad matching by combining relevance with consumer click feedback. In one example, the method includes receiving a query page, extracting features from the query page, re-weighting the query page, evaluating the query page in light of each ad in order to score each ad and pick substantially best ad matches of the indexed ads, and returning the substantially best ad matches to the consumer computer.

Type: Grant

Filed: December 12, 2013

Date of Patent: November 21, 2017

Assignee: Excalibur IP, LLC

Inventors: Deepayan Chakrabarti, Deepak K. Agarwal, Vanja Josifovski
AUTOMATIC GENERATION OF TEMPLATES FOR PARSING ELECTRONIC DOCUMENTS

Publication number: 20170308517

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a plurality of electronic documents, each electronic document being associated with an identifier that is associated with a source of the electronic document, grouping electronic documents of the plurality of electronic documents into a plurality of base sub-groups based on respective sources, for each base sub-group of the plurality of base sub-groups, automatically processing electronic documents to provide one or more templates, each template mapping content to one or more markers, and storing the one or more templates in memory, each template being accessible by one or more parsers to parse content from subsequently received electronic documents.

Type: Application

Filed: September 11, 2013

Publication date: October 26, 2017

Applicant: Google Inc.

Inventors: Vanja Josifovski, Srinidhi Viswanatha
Generating and applying data extraction templates

Patent number: 9785705

Abstract: Methods, apparatus, systems, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of plain text communications such as emails may be grouped into clusters based on one or more similarities between the plain text communications. One or more segments of communications of a particular cluster may be classified as transient based on textual pattern matching. One or more other segments of the communications of the particular cluster may be classified as transient based on various criteria. One or more transient segments may be assigned a generic and/or specific semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent plain text communications, content associated with transient (and in some cases, non-confidential) segments.

Type: Grant

Filed: October 16, 2014

Date of Patent: October 10, 2017

Assignee: GOOGLE INC.

Inventors: Marc-Allen Cartright, Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, MyLinh Yang
Information redaction from document data

Patent number: 9734148

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

Type: Grant

Filed: October 21, 2014

Date of Patent: August 15, 2017

Assignee: Google Inc.

Inventors: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang
Query evaluation using ancestor information

Patent number: 9659001

Abstract: Provided are techniques for processing a query. A query is received, wherein the query is formed by one or more paths, and wherein each path includes one or more steps. A hierarchical document including one or more document nodes is received. While processing the query and traversing the hierarchical document, one or more extraction entries are constructed, wherein each extraction entry includes a step instance match candidate identifying a document node and a step instance ancestor path for the document node, and one or more tuples are constructed using the one or more extraction entries by associating the step instance match candidate from one of the one or more extraction entries with the step instance match candidate from at least one of the one or more other extraction entries.

Type: Grant

Filed: June 2, 2015

Date of Patent: May 23, 2017

Assignee: International Business Machines Corporation

Inventors: Vanja Josifovski, Edison L. Ting
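
The abstract above describes building extraction entries (a matched document node plus its ancestor path) during a single traversal, then combining entries into tuples. A rough Python sketch of that shape follows, using the standard-library `xml.etree.ElementTree`. The pairing rule (two candidates belong together when their ancestor paths are identical), the sample document, and the names `extraction_entries` and `build_tuples` are assumptions for illustration, not the algorithm claimed in the patent.

```python
import xml.etree.ElementTree as ET


def extraction_entries(root, step_names):
    """Walk the tree once. For every node whose tag is a query step,
    record the node together with the tuple of its ancestor nodes."""
    entries = []

    def visit(node, ancestors):
        if node.tag in step_names:
            entries.append((node, tuple(ancestors)))
        for child in node:
            visit(child, ancestors + [node])

    visit(root, [])
    return entries


def build_tuples(entries, left_tag, right_tag):
    """Pair candidates whose ancestor paths are the same nodes.
    Elements compare by identity, so equal paths mean a shared parent chain."""
    lefts = [(n, a) for n, a in entries if n.tag == left_tag]
    rights = [(n, a) for n, a in entries if n.tag == right_tag]
    return [(l.text, r.text) for l, la in lefts for r, ra in rights if la == ra]


doc = ET.fromstring(
    "<lib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title><author>Y</author></book>"
    "</lib>"
)
entries = extraction_entries(doc, {"title", "author"})
pairs = build_tuples(entries, "title", "author")
# pairs == [("A", "X"), ("B", "Y")]
```

Keeping the ancestor path with each candidate is what lets the tuples be assembled after the traversal without walking the document again.
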
Generating and applying event data extraction templates

Patent number: 9652530

Abstract: Methods and apparatus are described herein for generating and applying event data extraction templates. In various implementations, a set of structural paths may be identified from a corpus of communications. A first structural path of the set of structural paths, associated with a first segment of text, may be classified as transient in response to a determination that a frequency of occurrences of the first segment of text across the corpus satisfies a criterion. Event heuristics may be applied to the communications of the corpus. A determination may be made, based on the applying, that the communications of the corpus are event-related. An event data type may be assigned to the transient structural path based on the applying. An event data extraction template may be generated to extract, from one or more subsequent communications, one or more event-related segments of text associated with the transient structural path.

Type: Grant

Filed: August 27, 2014

Date of Patent: May 16, 2017

Assignee: GOOGLE INC.

Inventors: Mike Bendersky, Maureen Heymans, Jinan Lou, Jie Yang, MyLinh Yang, Amitabh Saikia, Marc-Allen Cartright, Vanja Josifovski, Hui Tan, Luis Garcia Pueyo
Generating and applying data extraction templates

Patent number: 9563689

Abstract: Methods, apparatus, and computer-readable media are provided for generating and applying data extraction templates. In various implementations, a corpus of structured communications such as emails may be grouped into clusters based on one or more similarities between the structured communications. A set of structural paths may be identified from structured communications of a particular cluster. One or more structural paths of the set may be classified as transient wherein a count of occurrences of one or more associated segments of text across the particular cluster satisfies a criterion. One or more transient paths may be assigned a semantic data type and/or a confidentiality designation based on various signals. A data extraction template may be generated to extract, from subsequent structured communications, segments of text associated with transient (and in some cases, non-confidential) structural paths.

Type: Grant

Filed: August 27, 2014

Date of Patent: February 7, 2017

Assignee: Google Inc.

Inventors: Luis Garcia Pueyo, Vanja Josifovski, Amitabh Saikia, Jie Yang, Mike Bendersky, Srinidhi Viswanatha, Marc-Allen Cartright
CLASSIFYING DOCUMENTS BY CLUSTER

Publication number: 20160314184

Abstract: Methods, apparatus, systems, and computer-readable media are provided for classifying, or “labeling,” documents such as emails en masse based on association with a cluster/template. In various implementations, a corpus of documents may be grouped into a plurality of disjoint clusters of documents based on one or more shared content attributes. A classification distribution associated with a first cluster of the plurality of clusters may be determined based on classifications assigned to individual documents of the first cluster. A classification distribution associated with a second cluster of the plurality of clusters may then be determined based at least in part on the classification distribution associated with the first cluster and a relationship between the first and second clusters.

Type: Application

Filed: April 27, 2015

Publication date: October 27, 2016

Inventors: Mike Bendersky, Jie Yang, Amitabh Saikia, Marc-Allen Cartright, Sujith Ravi, Balint Miklos, Ivo Krka, Vanja Josifovski, James Wendt, Luis Garcia Pueyo
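
The abstract above derives a classification distribution for one cluster from its documents' labels, then uses that distribution and a relationship between clusters to determine the distribution of a second cluster. A minimal Python sketch under stated assumptions: the relationship is reduced to a single hypothetical `similarity` weight in [0, 1], and the blend is a plain linear interpolation with the second cluster's own prior. Neither choice comes from the source.

```python
from collections import Counter


def cluster_distribution(doc_labels):
    """Normalize per-document labels into a probability distribution."""
    counts = Counter(doc_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def propagate(first_dist, similarity, prior):
    """Blend the first cluster's distribution into a second cluster.

    similarity: assumed stand-in for the relationship between clusters.
    prior: the distribution the second cluster would have on its own.
    """
    labels = set(first_dist) | set(prior)
    return {
        label: similarity * first_dist.get(label, 0.0)
        + (1.0 - similarity) * prior.get(label, 0.0)
        for label in labels
    }


first = cluster_distribution(["receipt", "receipt", "receipt", "promo"])
second = propagate(first, similarity=0.5, prior={"receipt": 0.5, "promo": 0.5})
# first  == {"receipt": 0.75, "promo": 0.25}
# second == {"receipt": 0.625, "promo": 0.375}
```
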
Publish-subscribe based methods and apparatuses for associating data files

Patent number: 9405846

Abstract: Various methods and apparatuses are provided which may be implemented using one or more computing devices within a networked computing environment to employ publish-subscribe techniques to associate subscriber encoded data files with a set of publisher encoded data files.

Type: Grant

Filed: November 15, 2011

Date of Patent: August 2, 2016

Assignee: Yahoo! Inc.

Inventors: Alexander Shraer, Maxim Gurevich, Vanja Josifovski, Marcus Fontoura
INFORMATION REDACTION FROM DOCUMENT DATA

Publication number: 20160110352

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for redacting data from a document collection generated for a set of documents that include personal information. The redaction of the data is based in part on a comparison of the document collection to a set of personal documents of users for which the users have provided explicit approval to use in the processing of the document collection.

Type: Application

Filed: October 21, 2014

Publication date: April 21, 2016

Inventors: Mike Bendersky, Vanja Josifovski, Amitabh Saikia, Marc-Allen Cartright, Jie Yang, Luis Garcia Pueyo, MyLinh Yang
QUERY EVALUATION USING ANCESTOR INFORMATION

Publication number: 20150261815

Abstract: Provided are techniques for processing a query. A query is received, wherein the query is formed by one or more paths, and wherein each path includes one or more steps. A hierarchical document including one or more document nodes is received. While processing the query and traversing the hierarchical document, one or more extraction entries are constructed, wherein each extraction entry includes a step instance match candidate identifying a document node and a step instance ancestor path for the document node, and one or more tuples are constructed using the one or more extraction entries by associating the step instance match candidate from one of the one or more extraction entries with the step instance match candidate from at least one of the one or more other extraction entries.

Type: Application

Filed: June 2, 2015

Publication date: September 17, 2015

Inventors: Vanja Josifovski, Edison L. Ting
Using external sources for sponsored search ad selection

Patent number: 9129300

Abstract: A system and a method are provided for using external sources (e.g., landing pages) for sponsored search ad selection. In one example, the system identifies one or more regions of an external source. The one or more regions are relevant to a query. The external source includes a source that includes relevant data that is usable for augmenting an ad selection process. The system extracts one or more features from the one or more regions. The system determines which of the one or more features are relevant for item indexing. The system then augments an item selection process by using the one or more features that are relevant for item indexing.

Type: Grant

Filed: April 21, 2010

Date of Patent: September 8, 2015

Assignee: Yahoo! Inc.

Inventors: Marcus Fontoura, Vanja Josifovski, Evgeniy Gabrilovich, Bo Pang, Yejin Choi, Mauricio Riguette Mediano
Query evaluation using ancestor information

Patent number: 9087139

Abstract: Provided are techniques for processing a query. A query is received, wherein the query is formed by one or more paths, and wherein each path includes one or more steps. A hierarchical document including one or more document nodes is received. While processing the query and traversing the hierarchical document, one or more extraction entries are constructed, wherein each extraction entry includes a step instance match candidate identifying a document node and a step instance ancestor path for the document node, and one or more tuples are constructed using the one or more extraction entries by associating the step instance match candidate from one of the one or more extraction entries with the step instance match candidate from at least one of the one or more other extraction entries.

Type: Grant

Filed: February 12, 2014

Date of Patent: July 21, 2015

Assignee: International Business Machines Corporation

Inventors: Vanja Josifovski, Edison L. Ting
Context transfer in search advertising

Patent number: 8886636

Abstract: A computer-implemented method is disclosed for determining a type of landing page to which to transfer web searchers that enter a particular query, the method comprising: classifying a landing page as one of a plurality of landing page classes with a trained classifier of a computer based on textual content of the landing page; determining, by the computer, characteristics of one or more queries to be associated with the landing page; and choosing, with the computer, whether to retain or to change classification of the landing page to be associated with the one or more queries based on relative average conversion rates of advertisements on a plurality of manually-classified landing pages when associated with the characteristics of the one or more queries.

Type: Grant

Filed: December 23, 2008

Date of Patent: November 11, 2014

Assignee: Yahoo! Inc.

Inventors: Evgeniy Gabrilovich, Andrei Broder, Bo Pang, Vanja Josifovski, Hila Becker
METHODS OF DYNAMICALLY CREATING PERSONALIZED INTERNET ADVERTISEMENTS BASED ON CONTENT

Publication number: 20140310100

Abstract: Advertising is used to generate awareness of commercial Internet web sites. To greatly simplify the marketing of a commercial Internet web site, the automatic creation of an advertising campaign would be desirable. A method of automatically creating an advertising campaign for an Internet web site may be performed by first crawling through the Internet web site to identify products and services offered by the Internet web site. Information about the identified products and services is stored. The system then creates advertisements for the identified products and services. The advertisements may include images, text, a link to the web page where the product or service was found, and keywords associated with the product or service. The automatically created advertisements may then be placed into an advertisement pool for use with advertising supported web sites. The automatic Internet advertisement campaign creation system of the present invention may be used to create free trial advertisement campaigns for potential advertising clients.

Type: Application

Filed: June 27, 2014

Publication date: October 16, 2014

Applicant: Yahoo! Inc.

Inventors: Andrei Zary Broder, Marcus Felipe Fontoura, Vanja Josifovski
Method and system for quantifying user interactions with web advertisements

Patent number: 8812362

Abstract: Methods and systems are provided that may be used to determine a probability of whether a visitor to a web document is likely to click on a web advertisement. An exemplary method may include detecting one or more features in a web document. One or more expert statistical models to which the web document belongs may be determined and associated weightings may be determined based, at least in part, on the one or more features detected. A click-through-rate probability for a web advertisement to be placed on the web document may be estimated based on the one or more expert statistical models.

Type: Grant

Filed: February 20, 2009

Date of Patent: August 19, 2014

Assignee: Yahoo! Inc.

Inventors: Deepak K. Agarwal, Vanja Josifovski, Andrei Broder, Evgeniy Gabrilovich, Robert Hall
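
The abstract above estimates a click-through-rate probability from several expert statistical models, each weighted according to features detected in the web document. One way to picture this is a small mixture-of-experts calculation, sketched below in Python. The softmax gating, the feature names, and the function `estimate_ctr` are assumptions chosen for illustration. The source does not state how the weightings are computed.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_ctr(features, gate_weights, expert_ctrs):
    """Weighted average of per-expert CTR estimates.

    features: dict of feature name -> value detected in the page.
    gate_weights: one dict per expert, feature name -> gating coefficient
                  (hypothetical; a linear score followed by softmax).
    expert_ctrs: one CTR estimate per expert for the candidate ad.
    """
    scores = [
        sum(w.get(name, 0.0) * value for name, value in features.items())
        for w in gate_weights
    ]
    weights = softmax(scores)
    return sum(w * p for w, p in zip(weights, expert_ctrs))


page = {"sports": 1.0}
# Equal gating scores: both experts count equally.
even = estimate_ctr(page, [{}, {}], [0.02, 0.04])
# The first expert is favored for sports pages, pulling the estimate toward 0.02.
skewed = estimate_ctr(page, [{"sports": 5.0}, {}], [0.02, 0.04])
```

Because the weights sum to 1, the estimate always lies between the lowest and highest expert CTR.
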
