Patents by Inventor Swapnil Hajela
Swapnil Hajela has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9514216
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems. In an embodiment, an index of segmented portions may be used by a search engine to respond to a search query. In an embodiment, one or more machine learned models may be used to identify one or more feature properties of a plurality of segmented portions within one or more files, or otherwise inferable from the one or more files. In an embodiment, one or more machine learned models may be used to classify one or more of a plurality of segmented portions as being at least one of a plurality of segment types.
Type: Grant
Filed: September 8, 2014
Date of Patent: December 6, 2016
Assignee: Yahoo! Inc.
Inventors: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
-
Publication number: 20150066934
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems.
Type: Application
Filed: September 8, 2014
Publication date: March 5, 2015
Inventors: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
-
Patent number: 8849725
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems.
Type: Grant
Filed: August 10, 2009
Date of Patent: September 30, 2014
Assignee: Yahoo! Inc.
Inventors: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
-
Patent number: 8255793
Abstract: To provide valuable information regarding a webpage, the webpage must be divided into distinct semantically coherent segments for analysis. A set of heuristics allow a segmentation algorithm to identify an optimal number of segments for a given webpage or any portion thereof more accurately. A first heuristic estimates the optimal number of segments for any given webpage or portion thereof. A second heuristic coalesces segments where the number of segments identified far exceeds the optimal number recommended. A third heuristic coalesces segments corresponding to a portion of a webpage with much unused whitespace and little content. A fourth heuristic coalesces segments of nodes that have a recommended number of segments below a certain threshold into segments of other nodes. A fifth heuristic recursively analyzes and splits segments that correspond to webpage portions surpassing a certain threshold portion size.
Type: Grant
Filed: January 8, 2008
Date of Patent: August 28, 2012
Assignee: Yahoo! Inc.
Inventors: Deepayan Chakrabarti, Manav Ratan Mital, Swapnil Hajela, Emre Velipasaoglu
-
Patent number: 8135717
Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
Type: Grant
Filed: March 30, 2009
Date of Patent: March 13, 2012
Assignee: SAP America, Inc.
Inventors: Ramana B. Rao, Swapnil Hajela, Nareshkumar Rajkumar
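The mark-based lookup described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the mark token `#cap`, the choice of "capitalized" as the selected characteristic, and the function names are all hypothetical, and the sketch uses a plain mark rather than a mark coalesced with a word prefix.

```python
from collections import defaultdict

MARK = "#cap"  # hypothetical mark token; the abstract does not name one


def has_characteristic(word):
    # Assumption for illustration: the "selected characteristic" is capitalization.
    return word[:1].isupper()


def build_mark_index(docs):
    """Map each word, and a mark for each marked word, to (doc_id, position)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split()):
            index[word.lower()].add((doc_id, pos))
            if has_characteristic(word):
                # The mark entry carries the location of the marked word.
                index[MARK].add((doc_id, pos))
    return index


def marked_query(index, word):
    """Form a modified query by adding the mark, then intersect locations."""
    modified = [word.lower(), MARK]
    return set.intersection(*(index.get(tok, set()) for tok in modified))


docs = {1: "Apple ships an apple pie", 2: "an apple a day"}
index = build_mark_index(docs)
marked_hits = sorted(marked_query(index, "apple"))
```

Here `marked_hits` contains only the location of the capitalized occurrence in document 1, because the unmarked occurrences have no entry under the mark token.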
-
Patent number: 8131730
Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
Type: Grant
Filed: March 30, 2009
Date of Patent: March 6, 2012
Assignee: SAP America, Inc.
Inventors: Swapnil Hajela, Nareshkumar Rajkumar
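As a rough illustration of the stopword coalescing described in this abstract, the following Python sketch indexes each stopword as a token joined with a prefix of the next word, and rewrites phrase queries the same way. The stopword list, the one-character prefix length, the `_` joiner, and all function names are assumptions made for the example, not details taken from the patent.

```python
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "of", "to", "in", "on"}  # assumed list
PREFIX_LEN = 1  # assumed: one leading character of the adjacent word


def coalesce(words):
    """Replace each stopword with stopword + prefix of the following word."""
    tokens = []
    for pos, word in enumerate(words):
        if word in STOPWORDS:
            nxt = words[pos + 1] if pos + 1 < len(words) else ""
            # A trailing stopword coalesces with an empty prefix.
            tokens.append(f"{word}_{nxt[:PREFIX_LEN]}")
        else:
            tokens.append(word)
    return tokens


def build_index(docs):
    """Map each token to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(coalesce(text.lower().split())):
            index[token][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    """Rewrite the phrase with coalesced stopwords and match consecutive positions."""
    tokens = coalesce(phrase.lower().split())
    if not tokens:
        return set()
    hits = set()
    for doc_id, starts in index.get(tokens[0], {}).items():
        for start in starts:
            if all(start + i in index.get(tok, {}).get(doc_id, [])
                   for i, tok in enumerate(tokens)):
                hits.add(doc_id)
                break
    return hits


docs = {1: "the cat sat on the mat", 2: "a cat sat near the door"}
index = build_index(docs)
```

With this layout, a query such as `phrase_query(index, "the door")` looks up the token `the_d` instead of the far more frequent bare stopword, so the posting list it scans is much shorter. One limitation of the sketch: a query that ends in a stopword only matches documents that also end at that stopword.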
-
Publication number: 20110035345
Abstract: Exemplary methods and apparatuses are provided which may be used for classifying and indexing segmented portions of web pages and providing related information for use in information extraction and/or information retrieval systems.
Type: Application
Filed: August 10, 2009
Publication date: February 10, 2011
Applicant: Yahoo! Inc.
Inventors: Lei Duan, Fan Li, Srinivas Vadrevu, Emre Velipasaoglu, Swapnil Hajela, Deepayan Chakrabarti
-
Publication number: 20090193005
Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
Type: Application
Filed: March 30, 2009
Publication date: July 30, 2009
Inventors: Ramana B. Rao, Swapnil Hajela, Nareshkumar Rajkumar
-
Publication number: 20090187564
Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
Type: Application
Filed: March 30, 2009
Publication date: July 23, 2009
Inventors: Swapnil Hajela, Nareshkumar Rajkumar
-
Publication number: 20090177959
Abstract: To provide valuable information regarding a webpage, the webpage must be divided into distinct semantically coherent segments for analysis. A set of heuristics allow a segmentation algorithm to identify an optimal number of segments for a given webpage or any portion thereof more accurately. A first heuristic estimates the optimal number of segments for any given webpage or portion thereof. A second heuristic coalesces segments where the number of segments identified far exceeds the optimal number recommended. A third heuristic coalesces segments corresponding to a portion of a webpage with much unused whitespace and little content. A fourth heuristic coalesces segments of nodes that have a recommended number of segments below a certain threshold into segments of other nodes. A fifth heuristic recursively analyzes and splits segments that correspond to webpage portions surpassing a certain threshold portion size.
Type: Application
Filed: January 8, 2008
Publication date: July 9, 2009
Inventors: Deepayan Chakrabarti, Manav Ratan Mital, Swapnil Hajela, Emre Velipasaoglu
-
Patent number: 7516125
Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
Type: Grant
Filed: March 29, 2006
Date of Patent: April 7, 2009
Assignee: Business Objects Americas
Inventors: Ramana B. Rao, Swapnil Hajela, Nareshkumar Rajkumar
-
Patent number: 7512596
Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
Type: Grant
Filed: March 29, 2006
Date of Patent: March 31, 2009
Assignee: Business Objects Americas
Inventors: Swapnil Hajela, Nareshkumar Rajkumar
-
Publication number: 20070027853
Abstract: Phrases in a corpus of documents including stopwords are found using a data processor arranged to execute phrase queries. Memory stores an index structure which maps entries in the index structure to documents in the corpus. Entries in the index structure represent words and other entries represent stopwords found in the corpus coalesced with prefixes of respective adjacent words adjacent to the stopwords. The prefixes comprise one or more leading characters of the respective adjacent words. A query processor forms a modified query by substituting a stopword with a search token representing the stopword coalesced with a prefix of the next word in the query. The processor executes the modified query. Also, index structures including coalesced stopwords are created and maintained.
Type: Application
Filed: March 29, 2006
Publication date: February 1, 2007
Applicant: Inxight Software, Inc.
Inventors: Swapnil Hajela, Nareshkumar Rajkumar
-
Publication number: 20070027854
Abstract: Words having selected characteristics in a corpus of documents are found using a data processor arranged to execute queries. Memory stores an index structure in which entries in the index structure map words and marks for words having the selected characteristics to locations within documents in the corpus. Entries in the index structure represent words and other entries represent marks with the location information of a marked word. The entries for the marks can be tokens coalesced with prefixes of respective marked words or adjacent. A query processor forms a modified query by adding a mark for a word to the query. The processor executes the modified query.
Type: Application
Filed: March 29, 2006
Publication date: February 1, 2007
Applicant: Inxight Software, Inc.
Inventors: Ramana Rao, Swapnil Hajela, Nareshkumar Rajkumar