Patents by Inventor Rajesh M. Desai
Rajesh M. Desai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190294608
Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicate at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements distributed the sub-expressions execute the sub-expressions on the posting lists for the terms in the sub-expression.
Type: Application
Filed: June 13, 2019
Publication date: September 26, 2019
Inventors: Rajesh M. Desai, Alon S. Housfater, Roger C. Raphael, Paul S. Taylor
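The dispatch idea in this abstract can be illustrated with a minimal sketch. It assumes an AND-only query and in-process "processing elements"; the names `ProcessingElement`, `run_and_query`, and the element IDs are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Set


class ProcessingElement:
    """Hypothetical stand-in for a distributed node holding some posting lists."""

    def __init__(self, posting_lists: Dict[str, Set[int]]):
        self.posting_lists = posting_lists

    def execute(self, terms: List[str]) -> Set[int]:
        # Evaluate an AND sub-expression by intersecting local posting lists.
        result = None
        for term in terms:
            postings = self.posting_lists.get(term, set())
            result = set(postings) if result is None else result & postings
        return result if result is not None else set()


def run_and_query(terms: List[str],
                  dispatch_map: Dict[str, str],
                  elements: Dict[str, ProcessingElement]) -> Set[int]:
    # Group the query terms by the element the dispatch map names for each term.
    sub_expressions: Dict[str, List[str]] = {}
    for term in terms:
        element_id = dispatch_map.get(term)
        if element_id is None:
            return set()  # assumption: an unknown term matches no objects
        sub_expressions.setdefault(element_id, []).append(term)
    # Each element runs its sub-expression; partial results are intersected.
    partials = [elements[eid].execute(ts) for eid, ts in sub_expressions.items()]
    return set.intersection(*partials) if partials else set()


elements = {
    "pe1": ProcessingElement({"query": {1, 2, 3}, "index": {2, 3, 4}}),
    "pe2": ProcessingElement({"posting": {3, 4, 5}}),
}
dispatch_map = {"query": "pe1", "index": "pe1", "posting": "pe2"}
matches = run_and_query(["query", "index", "posting"], dispatch_map, elements)
```

Grouping terms by element means each element receives one sub-expression rather than one request per term, which is the part of the abstract this sketch tries to mirror.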
-
Publication number: 20190278747
Abstract: An electronic-discovery system and method, wherein content items and hold anchors are stored in a repository, tracking objects and representational anchor objects are stored in a database system, and the tracking objects represent the content items and the representational anchor objects represent the hold anchors. A first hold anchor is used for placing a hold on the content items for a first defined period of time, and a first representational anchor object and one or more of the tracking objects are used for representing and tracking the holds for the first defined period of time. When the first defined period of time expires, a second hold anchor is used for placing the hold on the content items for a second defined period of time, and a second representational anchor object and the tracking objects are used for representing and tracking the holds for the second defined period of time.
Type: Application
Filed: May 14, 2019
Publication date: September 12, 2019
Inventors: Rajesh M. Desai, Aidon P. Jennery, Lijing E. Lin, Roger C. Raphael
-
Patent number: 10402400
Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicate at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements distributed the sub-expressions execute the sub-expressions on the posting lists for the terms in the sub-expression.
Type: Grant
Filed: June 25, 2015
Date of Patent: September 3, 2019
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Alon S. Housfater, Roger C. Raphael, Paul S. Taylor
-
Publication number: 20190251286
Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
Type: Application
Filed: April 23, 2019
Publication date: August 15, 2019
Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
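As a rough illustration of scanning only changed blocks, the sketch below checks the blocks named in a delta list against a single pattern and maps hits back to files. The regular expression, the in-memory `blocks` dictionary, and the flat `block_to_file` lookup are all assumptions made for the example; the abstract's character scanning and hierarchical reverse mapping techniques are not specified here and may differ substantially.

```python
import re
from typing import Dict, Iterable, List

# Assumed example pattern (digits shaped like a US Social Security number).
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def scan_changed_blocks(blocks: Dict[int, bytes],
                        block_delta_list: Iterable[int],
                        block_to_file: Dict[int, str]) -> List[str]:
    """Return files owning changed blocks that match the pattern."""
    flagged_files: List[str] = []
    for block_id in block_delta_list:      # unchanged blocks are never read
        data = blocks.get(block_id, b"")
        if SSN_PATTERN.search(data):
            # Flat lookup standing in for a block-to-file reverse mapping.
            path = block_to_file.get(block_id)
            if path is not None and path not in flagged_files:
                flagged_files.append(path)
    return flagged_files


blocks = {0: b"hello", 1: b"ssn 123-45-6789", 2: b"id 987-65-4321"}
block_to_file = {1: "/home/a.txt", 2: "/home/b.txt"}
flagged = scan_changed_blocks(blocks, [1], block_to_file)
```

Block 2 also contains a matching string, but it is absent from the delta list, so it is skipped; that is the saving the abstract describes.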
-
Publication number: 20190228487
Abstract: Embodiments generally relate to providing litigation management for multiple remote content systems using asynchronous bi-directional replication pipelines. In some embodiments, a method includes retrieving, at one or more inbound replicators of one or more respective bi-directional pipelines, metadata associated with documents stored in one or more content repositories. The method further includes resolving, at a governance control hub, conflicts associated with legal holds on one or more of the documents based on the metadata. The method further includes sending conflict resolution results from one or more outbound applicators of the bi-directional pipelines to the content repositories, where the content repositories enforce legal holds on the documents.
Type: Application
Filed: January 23, 2018
Publication date: July 25, 2019
Inventors: Roger C. RAPHAEL, Ronald L. RATHGEBER, Rajesh M. DESAI, Gabriel VALENCIA, Justo PEREZ, William Russell BELKNAP, Sudhakar BASIREDDY
-
Publication number: 20190122000
Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
Type: Application
Filed: October 20, 2017
Publication date: April 25, 2019
Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
-
Publication number: 20190095802
Abstract: Provided are techniques for heuristic and non-semantic prediction of the cost to find and review data that is relevant to a task. A corpus of documents is accessed for a domain. Terms associated with the domain are accessed, where the terms have an order on a list. For each of the documents, term positional dispersion is determined for each of the terms in the ordered list associated with the domain. Then, a document review quanta is determined for the document based on a summation of the term positional dispersion for each term in that document adjusted by a weight. A subset of documents in the corpus of documents are selected that are to be reviewed based on the document review quanta for each of the selected documents exceeding a threshold.
Type: Application
Filed: September 25, 2017
Publication date: March 28, 2019
Inventors: Roger C. Raphael, Rajesh M. Desai, Nazrul Islam, Theodore S. Barassi
-
Publication number: 20190079945
Abstract: An electronic-discovery system and method, wherein content items and hold anchors are stored in a repository, tracking objects and representational anchor objects are stored in a database system, and the tracking objects represent the content items and the representational anchor objects represent the hold anchors. A first hold anchor is used for placing a hold on the content items for a first defined period of time, and a first representational anchor object and one or more of the tracking objects are used for representing and tracking the holds for the first defined period of time. When the first defined period of time expires, a second hold anchor is used for placing the hold on the content items for a second defined period of time, and a second representational anchor object and the tracking objects are used for representing and tracking the holds for the second defined period of time.
Type: Application
Filed: September 14, 2017
Publication date: March 14, 2019
Inventors: Rajesh M. Desai, Aidon P. Jennery, Lijing E. Lin, Roger C. Raphael
-
Patent number: 10133713
Abstract: Provided are techniques for a domain specific representation of document text for accelerated natural language processing. A document is selected from a set of documents to be analyzed. A character stream from the document is converted into a token stream based on tokenization rules. Irrelevant tokens are removed from the token stream. The tokens remaining in the token stream are converted into an integer domain representation based on a domain specific ontology dictionary. The integer domain representation are stored to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs. Then, a result set is received from the one or more GPUs.
Type: Grant
Filed: June 8, 2016
Date of Patent: November 20, 2018
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Alon S. Housfater, Philip E. Parker, Roger C. Raphael
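The first three steps of this abstract (tokenize, drop irrelevant tokens, map to integers) can be sketched as follows. The tokenization rule, the stop-word set, the ontology dictionary, and the choice to drop tokens missing from the ontology are all assumptions made for illustration; the GPU queueing step is omitted.

```python
import re
from typing import Dict, List, Set


def to_integer_domain(text: str,
                      ontology: Dict[str, int],
                      irrelevant: Set[str]) -> List[int]:
    # Assumed tokenization rule: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Remove tokens judged irrelevant (here, a plain stop-word set).
    kept = [t for t in tokens if t not in irrelevant]
    # Map remaining tokens to integers; tokens outside the ontology are dropped.
    return [ontology[t] for t in kept if t in ontology]


ontology = {"patient": 1, "fever": 2}          # hypothetical domain dictionary
irrelevant = {"the", "a", "has"}
encoded = to_integer_domain("The patient has a fever", ontology, irrelevant)
```

A compact integer sequence of this kind is easier to ship to an accelerator in fixed-width buffers than raw text, which is presumably why the abstract converts before queueing.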
-
Publication number: 20180276222
Abstract: Provided are techniques for a high performance compliance mechanism for structured and unstructured data in an enterprise. A record to represent a collection of structured objects is generated. The record is stored in a file plan container associated with a disposition schedule.
Type: Application
Filed: March 27, 2017
Publication date: September 27, 2018
Inventors: William R. Belknap, Rajesh M. Desai, Roger C. Raphael, Ronald L. Rathgeber
-
Publication number: 20180268009
Abstract: Processing a database query for sets of data includes assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data. A representation is then generated on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set. Finally, a query is processed based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time.
Type: Application
Filed: March 15, 2017
Publication date: September 20, 2018
Inventors: Rajesh M. Desai, Magesh Jayapandian, Iun V. Leong, Justo L. Perez, Roger C. Raphael, Gabriel Valencia
-
Patent number: 9971760
Abstract: In an approach for parallelizing document processing in an information handling system, a processor receives a document, wherein the document includes text content. A processor extracts information from the text content, utilizing natural language processing and semantic analysis, to form tokenized semantic partitions, comprising a plurality of sub-documents. A processor schedules a plurality of concurrently executing threads to process the plurality of sub-documents.
Type: Grant
Filed: December 22, 2014
Date of Patent: May 15, 2018
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Philip E. Parker, Roger C. Raphael, Paul S. Taylor
-
Patent number: 9971761
Abstract: In an approach for parallelizing document processing in an information handling system, a processor receives a document, wherein the document includes text content. A processor extracts information from the text content, utilizing natural language processing and semantic analysis, to form tokenized semantic partitions, comprising a plurality of sub-documents. A processor schedules a plurality of concurrently executing threads to process the plurality of sub-documents.
Type: Grant
Filed: June 9, 2015
Date of Patent: May 15, 2018
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Philip E. Parker, Roger C. Raphael, Paul S. Taylor
-
Patent number: 9922129
Abstract: Systems and associated methods for clustering a plurality of nodes based on connectivity among the plurality of nodes, determining relevant content of the clusters, and applying knowledge regarding the relevant content are described. The nodes can include for example web-based documents such as web pages. The clusters can include for example groups of web pages that are linked together, as via hyperlinks. The relevant content can include one or more topics associated with the web page, as for example determined via text mining. Applying the knowledge regarding the relevant content can include for example using the one or more topics associated with the web pages to augment search results and/or conduct contextual advertising.
Type: Grant
Filed: September 27, 2010
Date of Patent: March 20, 2018
Assignee: International Business Machines Corporation
Inventors: Varun Bhagwan, Rajesh M. Desai, Jeffrey Alan Kusnitz
-
Patent number: 9898447
Abstract: Provided are techniques for a domain specific representation of document text for accelerated natural language processing. A document is selected from a set of documents to be analyzed. A character stream from the document is converted into a token stream based on tokenization rules. Irrelevant tokens are removed from the token stream. The tokens remaining in the token stream are converted into an integer domain representation based on a domain specific ontology dictionary. The integer domain representation are stored to a Graphics Processing Unit (GPU) processing queue of each of one or more GPUs. Then, a result set is received from the one or more GPUs.
Type: Grant
Filed: June 22, 2015
Date of Patent: February 20, 2018
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Alon S. Housfater, Philip E. Parker, Roger C. Raphael
-
Patent number: 9613041
Abstract: According to one embodiment of the present invention, a system extends a content repository by creating an auxiliary data store outside of the content repository and storing auxiliary data in the auxiliary data store, wherein the auxiliary data is associated with a collection of documents in the content repository. The system stores version information for the auxiliary data store and records of operations against the auxiliary data store in a log in the repository. In response to receiving a request for an operation against the auxiliary data store, the system determines that the auxiliary data store and repository are consistent based on the version information and applies the operation against the auxiliary data store. Embodiments of the present invention further include a method and computer program product for extending a content repository data model in substantially the same manners described above.
Type: Grant
Filed: October 3, 2013
Date of Patent: April 4, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh M. Desai, Magesh Jayapandian, Aidon P. Jennery, Justo L. Perez
-
Patent number: 9606998
Abstract: According to one embodiment of the present invention, a system extends a content repository by creating an auxiliary data store outside of the content repository and storing auxiliary data in the auxiliary data store, wherein the auxiliary data is associated with a collection of documents in the content repository. The system stores version information for the auxiliary data store and records of operations against the auxiliary data store in a log in the repository. In response to receiving a request for an operation against the auxiliary data store, the system determines that the auxiliary data store and repository are consistent based on the version information and applies the operation against the auxiliary data store. Embodiments of the present invention further include a method and computer program product for extending a content repository data model in substantially the same manners described above.
Type: Grant
Filed: June 6, 2014
Date of Patent: March 28, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Rajesh M. Desai, Magesh Jayapandian, Aidon P. Jennery, Justo L. Perez
-
Patent number: 9594813
Abstract: In searching electronic documents, prior to executing a query, a reviewer indicates whether a result set of the query will be dynamic or static. The query is then executed on the electronic documents to obtain an original result set, which is provided to the reviewer through a user interface. Upon determining that one or more changes to one or more of the electronic documents have occurred, and if the result set is static, then the original result set continues to be provided to the reviewer without re-executing the query. If the result set is dynamic, then the query is re-executed on the electronic documents to obtain an updated result set, and the updated result set is provided to the reviewer through the user interface. The original result set may be associated with a search session and/or may be a random sample of the electronic documents for an overview query.
Type: Grant
Filed: November 29, 2014
Date of Patent: March 14, 2017
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Magesh Jayapandian, Aidon P. Jennery, Justo L. Perez
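The static versus dynamic distinction in this abstract can be shown with a small sketch. The class name `ReviewQuery`, the predicate-based query, and the in-memory document list are hypothetical choices for the example, not details from the patent.

```python
from typing import Callable, List


class ReviewQuery:
    """Hypothetical query whose result set is declared static or dynamic up front."""

    def __init__(self, predicate: Callable[[str], bool],
                 documents: List[str], dynamic: bool):
        self.predicate = predicate
        self.documents = documents      # shared, mutable collection
        self.dynamic = dynamic
        self._original = self._execute()  # original result set

    def _execute(self) -> List[str]:
        return [d for d in self.documents if self.predicate(d)]

    def results(self, documents_changed: bool) -> List[str]:
        # Dynamic: re-execute after a change. Static: keep serving the original.
        if self.dynamic and documents_changed:
            return self._execute()
        return list(self._original)


docs = ["contract a", "memo"]
static_q = ReviewQuery(lambda d: "contract" in d, docs, dynamic=False)
dynamic_q = ReviewQuery(lambda d: "contract" in d, docs, dynamic=True)
docs.append("contract b")               # a change to the document set
static_view = static_q.results(documents_changed=True)
dynamic_view = dynamic_q.results(documents_changed=True)
```

The static reviewer keeps a stable set to work through, while the dynamic reviewer sees the newly added document.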
-
Patent number: 9589035
Abstract: In searching electronic documents, prior to executing a query, a reviewer indicates whether a result set of the query will be dynamic or static. The query is then executed on the electronic documents to obtain an original result set, which is provided to the reviewer through a user interface. Upon determining that one or more changes to one or more of the electronic documents have occurred, and if the result set is static, then the original result set continues to be provided to the reviewer without re-executing the query. If the result set is dynamic, then the query is re-executed on the electronic documents to obtain an updated result set, and the updated result set is provided to the reviewer through the user interface. The original result set may be associated with a search session and/or may be a random sample of the electronic documents for an overview query.
Type: Grant
Filed: March 3, 2014
Date of Patent: March 7, 2017
Assignee: International Business Machines Corporation
Inventors: Rajesh M. Desai, Magesh Jayapandian, Aidon P. Jennery, Justo L. Perez
-
Patent number: 9582314
Abstract: Embodiments of the present invention provide a method, system and computer program product for maintaining distributed state consistency in a distributed computing application. In an embodiment of the invention, a method for maintaining distributed state consistency in a distributed computing application can include registering a set of components of a distributed computing application, starting a transaction resulting in changes of state in different ones of the components in the registered set and determining in response to a conclusion of the transaction whether or not an inconsistency of state has arisen amongst the different components in the registered set in consequence of the changes of state in the different ones of the components in the registered set. If an inconsistency has arisen, each of the components in the registered set can be directed to rollback to a previously stored state. Otherwise a committal of state can be directed in each of the components in the registered set.
Type: Grant
Filed: September 25, 2009
Date of Patent: February 28, 2017
Assignee: International Business Machines Corporation
Inventors: Michael Busch, Rajesh M. Desai, Tom William Jacopi, Michael McCandless
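The register, transact, check, then commit-or-rollback flow in this abstract can be sketched in a single process as below. The `Component` class, the `run_transaction` helper, and the consistency check are hypothetical; a real distributed system would also have to handle network failures and partial acknowledgements, which this sketch ignores.

```python
import copy
from typing import Callable, Dict, List


class Component:
    """Hypothetical registered component that keeps a previously stored state."""

    def __init__(self, name: str, state: Dict[str, int]):
        self.name = name
        self.state = state
        self._saved = copy.deepcopy(state)

    def commit(self) -> None:
        self._saved = copy.deepcopy(self.state)

    def rollback(self) -> None:
        self.state = copy.deepcopy(self._saved)


def run_transaction(components: List[Component],
                    apply_changes: Callable[[List[Component]], None],
                    is_consistent: Callable[[List[Component]], bool]) -> bool:
    apply_changes(components)            # the transaction changes component state
    if is_consistent(components):
        for c in components:             # no inconsistency: commit everywhere
            c.commit()
        return True
    for c in components:                 # inconsistency: roll every component back
        c.rollback()
    return False


def same_doc_count(cs: List[Component]) -> bool:
    return len({c.state["docs"] for c in cs}) == 1


def partial_update(cs: List[Component]) -> None:
    cs[0].state["docs"] += 1             # only one component changes


def full_update(cs: List[Component]) -> None:
    for c in cs:
        c.state["docs"] += 1


index = Component("index", {"docs": 1})
store = Component("store", {"docs": 1})
registered = [index, store]
failed = run_transaction(registered, partial_update, same_doc_count)
succeeded = run_transaction(registered, full_update, same_doc_count)
```

Note that every registered component is rolled back on inconsistency, including those whose own change succeeded, which matches the wording of the abstract.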