Patents by Inventor Roger C. Raphael

Roger C. Raphael has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200210619
    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
    Type: Application
    Filed: March 6, 2020
    Publication date: July 2, 2020
    Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
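
The flow described in the abstract above (process a block delta list, scan only the changed blocks, then map each hit from block level back to file level) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the block size, the SSN-style regular expression, and the two-level block-to-inode-to-path lookup standing in for the "hierarchical reverse mapping" are all assumptions made for the example.

```python
import re

BLOCK_SIZE = 16  # assumed block size in bytes, kept tiny for illustration

# Hypothetical "character scanning" rule: a US SSN-shaped string.
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def scan_changed_blocks(device: bytes, block_delta_list: list) -> list:
    """Return the changed block numbers whose bytes match the pattern."""
    hits = []
    for block_no in block_delta_list:
        start = block_no * BLOCK_SIZE
        block = device[start:start + BLOCK_SIZE]
        if SSN_PATTERN.search(block):
            hits.append(block_no)
    return hits


def block_to_file(block_no: int, block_to_inode: dict, inode_to_path: dict):
    """Assumed two-level reverse mapping: block -> inode -> file path."""
    inode = block_to_inode.get(block_no)
    return inode_to_path.get(inode)


# A toy "storage device" made of four fixed-size blocks.
device = b"".join(
    text.ljust(BLOCK_SIZE)
    for text in (b"hello world", b"ssn 123-45-6789", b"no secrets here", b"999-99-9999 old")
)

block_delta_list = [1, 2]  # only blocks 1 and 2 changed since the last scan
block_to_inode = {0: 7, 1: 7, 2: 8, 3: 9}
inode_to_path = {7: "/home/a/notes.txt", 8: "/home/a/todo.txt", 9: "/home/b/old.txt"}

hits = scan_changed_blocks(device, block_delta_list)
files = [block_to_file(b, block_to_inode, inode_to_path) for b in hits]
```

Block 3 also holds a matching string, but it is skipped because it is not in the delta list, which is the source of the savings the abstract describes. A per-block scan like this one would miss a value that straddles a block boundary; the sketch does not try to handle that case.
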
  • Patent number: 10671756
    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: June 2, 2020
    Assignee: International Business Machines Corporation
    Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
  • Patent number: 10671754
    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: June 2, 2020
    Assignee: International Business Machines Corporation
    Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
  • Publication number: 20200167329
    Abstract: A method, computer system, and computer program product for segment differential-based document text-index modeling are provided. The embodiment may include receiving, by a processor, a document with a valid document ID and version ID tuple. The embodiment may also include determining that the received document is a new version of a previously stored document and consequently multiplexing versions of the document into a single indexed document. The embodiment may further include segmenting the received document and building a token vector. The embodiment may also include calculating a difference between the received new version of the document and the previously stored document using information obtained from the segmentation. The embodiment may further include, in response to the calculated difference being below a pre-configured threshold value, discarding the received new version.
    Type: Application
    Filed: November 28, 2018
    Publication date: May 28, 2020
    Inventors: Roger C. Raphael, Rajesh M. Desai, Fumihiko Terui, Justo L. Perez, Thomas Hampp
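
As a rough sketch of the version-handling steps in the abstract above (segment the document, build a token vector, compare against the stored version, discard when the difference falls below a threshold), consider the following. The sentence-based segmentation, the bag-of-words token vector, the generalized Jaccard distance, and the 0.2 threshold are assumptions chosen for illustration; the abstract does not specify any of them.

```python
from collections import Counter

THRESHOLD = 0.2  # assumed pre-configured threshold


def segment(text: str) -> list:
    """Assumed segmentation: split on periods into sentence-like segments."""
    return [s.strip() for s in text.split(".") if s.strip()]


def token_vector(segments: list) -> Counter:
    """Bag-of-words token counts over all segments."""
    return Counter(tok.lower() for seg in segments for tok in seg.split())


def difference(old: Counter, new: Counter) -> float:
    """Generalized Jaccard distance: 0.0 means identical token vectors."""
    total = sum((old | new).values())
    if total == 0:
        return 0.0
    changed = sum(((old - new) + (new - old)).values())
    return changed / total


class VersionedIndex:
    """Keeps all retained versions of a document under one document ID."""

    def __init__(self, threshold: float = THRESHOLD):
        self.threshold = threshold
        self.versions = {}  # doc_id -> list of (version_id, token vector)

    def receive(self, doc_id: str, version_id: int, text: str) -> bool:
        """Store the version and return True, or discard it and return False."""
        vec = token_vector(segment(text))
        stored = self.versions.setdefault(doc_id, [])
        if stored:
            _, latest = stored[-1]
            if difference(latest, vec) < self.threshold:
                return False  # too similar to the stored version
        stored.append((version_id, vec))
        return True


index = VersionedIndex()
first = index.receive("d1", 1, "The cat sat on the mat. It was warm.")
near_duplicate = index.receive("d1", 2, "The cat sat on the mat. It was warm")
rewritten = index.receive("d1", 3, "The dog ran in the park. It was cold.")
```

Here version 2 differs only by a trailing period, so its token vector matches version 1 and it is discarded, while version 3 is kept alongside version 1 under the same document ID.
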
  • Publication number: 20200134757
    Abstract: Provided are techniques for extracting, deriving, and using legal matter semantics to generate e-discovery queries in an e-discovery system. A semantic knowledge graph is iteratively built by receiving meet and confer document instances, legal matter types, historical e-discovery queries for different legal matters, and legal semantic types extracted from the historical e-discovery queries. The legal semantic types are added to the semantic knowledge graph, and a list of terms that serve as a basis of an initial query are identified. An e-discovery query is generated for an e-discovery system. The e-discovery query is modified using the semantic knowledge graph and additional input by receiving a legal matter type and meet and confer information, obtaining the legal semantic types that are relevant to the legal matter type and the meet and confer information, and modifying the e-discovery query. The modified e-discovery query is provided. Then, the modified e-discovery query is executed.
    Type: Application
    Filed: October 30, 2018
    Publication date: April 30, 2020
    Inventors: Roger C. Raphael, Rajesh M. Desai, Nazrul Islam, Satwik Hebbar
  • Patent number: 10628414
    Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicates at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements that receive the sub-expressions execute them on the posting lists for the terms in the sub-expressions.
    Type: Grant
    Filed: June 21, 2016
    Date of Patent: April 21, 2020
    Assignee: International Business Machines Corporation
    Inventors: Rajesh M. Desai, Alon S. Housfater, Roger C. Raphael, Paul S. Taylor
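
The dispatch-map idea in the abstract above can be illustrated with a small in-process sketch: each term maps to the processing element holding its posting list, a query is split into per-element sub-expressions, and the partial results are combined. The node names, the restriction to flat AND/OR queries, and the use of Python sets as posting lists are assumptions for the example only.

```python
# Assumed layout: processing element name -> {term -> posting list of object IDs}.
NODES = {
    "node-a": {"apple": {1, 2, 5}, "banana": {2, 3}},
    "node-b": {"cherry": {2, 5, 8}, "date": set()},
}

# Dispatch map: each dictionary term -> the element holding its posting list.
DISPATCH_MAP = {
    term: node for node, postings in NODES.items() for term in postings
}


def combine(op: str, sets: list) -> set:
    """Apply a flat AND (intersection) or OR (union) over posting sets."""
    return set.intersection(*sets) if op == "AND" else set.union(*sets)


def execute_on_node(node: str, op: str, terms: list) -> set:
    """Run one sub-expression against the posting lists local to a node."""
    return combine(op, [NODES[node][t] for t in terms])


def run_query(op: str, terms: list) -> set:
    """Split the query by node via the dispatch map, then merge the partials."""
    by_node = {}
    for term in terms:
        by_node.setdefault(DISPATCH_MAP[term], []).append(term)
    partials = [execute_on_node(node, op, ts) for node, ts in by_node.items()]
    return combine(op, partials)


and_result = run_query("AND", ["apple", "banana", "cherry"])
or_result = run_query("OR", ["banana", "date"])
```

In the AND query, "apple AND banana" is evaluated on node-a and "cherry" on node-b, so only the two partial results cross the (simulated) network rather than three full posting lists. A real system would replace `execute_on_node` with a remote call; that transport is outside this sketch.
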
  • Publication number: 20190354514
    Abstract: Processing a database query for sets of data includes assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data. A representation is then generated on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set. Finally, a query is processed based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time.
    Type: Application
    Filed: July 29, 2019
    Publication date: November 21, 2019
    Inventors: Rajesh M. Desai, Magesh Jayapandian, Iun V. Leong, Justo L. Perez, Roger C. Raphael, Gabriel Valencia
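
One way to picture the append-only association and dissociation operations with deferred merge, as described in the abstract above, is a log replayed over a merged base at query time. This in-memory sketch is an assumption-laden stand-in: the abstract describes an on-disk representation keyed by ranges of integer identifiers, which is not modeled here.

```python
class AppendOnlySet:
    """A set of integer entity IDs updated only by appending to a log."""

    def __init__(self):
        self.base = set()  # merged state (stands in for the on-disk form)
        self.log = []      # append-only list of (operation, entity_id)

    def associate(self, entity_id: int) -> None:
        """Add an entity: a single append, no lookup of existing members."""
        self.log.append(("+", entity_id))

    def dissociate(self, entity_id: int) -> None:
        """Remove an entity: also a single append."""
        self.log.append(("-", entity_id))

    def members(self) -> set:
        """Query time: replay the log, filtering deletes and duplicates."""
        result = set(self.base)
        for op, entity_id in self.log:
            if op == "+":
                result.add(entity_id)
            else:
                result.discard(entity_id)
        return result

    def merge(self) -> None:
        """Deferred merge: fold the log into the base and empty the log."""
        self.base = self.members()
        self.log.clear()


group = AppendOnlySet()
group.associate(1)
group.associate(2)
group.associate(2)   # duplicate, filtered when queried
group.dissociate(1)  # delete, filtered when queried
pending = len(group.log)
before_merge = group.members()
group.merge()
after_merge = group.members()
```

Writes stay cheap because they never inspect existing members; the cost of reconciling duplicates and deletions is paid at query time or when the merge eventually runs.
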
  • Patent number: 10452631
    Abstract: Processing a database query for sets of data includes assigning a unique identifier from an integer space to each entity within data and creating one or more sets of entities each pertaining to a corresponding entity within the data. A representation is then generated on disk for each set of entities, wherein each representation encompasses and is suited for a range of the unique identifiers of entities within a corresponding set and indicates a presence of an entity within that corresponding set. Finally, a query is processed based on the representation for each set of entities to retrieve data satisfying the query, wherein the representation provides a constant time for association and dissociation operations that are append-only operations with deferred merge and automatic filtering of deleted and duplicate entities at query time.
    Type: Grant
    Filed: March 15, 2017
    Date of Patent: October 22, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rajesh M. Desai, Magesh Jayapandian, Iun V. Leong, Justo L. Perez, Roger C. Raphael, Gabriel Valencia
  • Publication number: 20190304041
    Abstract: Embodiments generally relate to providing litigation management for multiple remote content systems using asynchronous bi-directional replication pipelines. In some embodiments, a method includes retrieving, at one or more inbound replicators of one or more respective bi-directional pipelines, metadata associated with documents stored in one or more content repositories. The method further includes resolving, at a governance control hub, conflicts associated with legal holds on one or more of the documents based on the metadata. The method further includes sending conflict resolution results from one or more outbound applicators of the bi-directional pipelines to the content repositories, where the content repositories enforce legal holds on the documents.
    Type: Application
    Filed: June 18, 2019
    Publication date: October 3, 2019
    Inventors: Roger C. Raphael, Ronald L. Rathgeber, Rajesh M. Desai, Gabriel Valencia, Justo L. Perez, William Russell Belknap, Sudhakar Basireddy
  • Publication number: 20190294608
    Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicates at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements that receive the sub-expressions execute them on the posting lists for the terms in the sub-expressions.
    Type: Application
    Filed: June 13, 2019
    Publication date: September 26, 2019
    Inventors: Rajesh M. Desai, Alon S. Housfater, Roger C. Raphael, Paul S. Taylor
  • Publication number: 20190278747
    Abstract: An electronic-discovery system and method, wherein content items and hold anchors are stored in a repository, tracking objects and representational anchor objects are stored in a database system, and the tracking objects represent the content items and the representational anchor objects represent the hold anchors. A first hold anchor is used for placing a hold on the content items for a first defined period of time, and a first representational anchor object and one or more of the tracking objects are used for representing and tracking the holds for the first defined period of time. When the first defined period of time expires, a second hold anchor is used for placing the hold on the content items for a second defined period of time, and a second representational anchor object and the tracking objects are used for representing and tracking the holds for the second defined period of time.
    Type: Application
    Filed: May 14, 2019
    Publication date: September 12, 2019
    Inventors: Rajesh M. Desai, Aidon P. Jennery, Lijing E. Lin, Roger C. Raphael
  • Patent number: 10402400
    Abstract: Provided are a computer program product, system, and method for distributed processing of a query with distributed posting lists. A dispatch map has entries, wherein each entry identifies one of a plurality of terms in a dictionary, wherein for each of the terms there is a posting list identifying zero or more objects including the term, wherein at least one of the dispatch map entries indicates at least one distributed processing element including the posting list for the term. The dispatch map is used to dispatch sub-expressions comprising portions of a query to distributed processing elements having the posting lists for terms in the sub-expressions, wherein the distributed processing elements that receive the sub-expressions execute them on the posting lists for the terms in the sub-expressions.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: September 3, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rajesh M. Desai, Alon S. Housfater, Roger C. Raphael, Paul S. Taylor
  • Publication number: 20190251286
    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
    Type: Application
    Filed: April 23, 2019
    Publication date: August 15, 2019
    Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
  • Publication number: 20190228487
    Abstract: Embodiments generally relate to providing litigation management for multiple remote content systems using asynchronous bi-directional replication pipelines. In some embodiments, a method includes retrieving, at one or more inbound replicators of one or more respective bi-directional pipelines, metadata associated with documents stored in one or more content repositories. The method further includes resolving, at a governance control hub, conflicts associated with legal holds on one or more of the documents based on the metadata. The method further includes sending conflict resolution results from one or more outbound applicators of the bi-directional pipelines to the content repositories, where the content repositories enforce legal holds on the documents.
    Type: Application
    Filed: January 23, 2018
    Publication date: July 25, 2019
    Inventors: Roger C. Raphael, Ronald L. Rathgeber, Rajesh M. Desai, Gabriel Valencia, Justo Perez, William Russell Belknap, Sudhakar Basireddy
  • Patent number: 10275458
    Abstract: A data structure is generated containing enumerators for data types of a domain, text forms of the enumerators and context patterns for the text forms. The data structure also includes information extraction rules that are associated with the enumerators. The data structure is updated with additional context patterns and text forms that are identified within a set of documents to which text analytic annotators are to be tuned. The set of documents are analyzed against the updated data structure and additional extraction rules are generated based on the analysis.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: April 30, 2019
    Assignee: International Business Machines Corporation
    Inventors: Harish Deshmukh, Philip E. Parker, Roger C. Raphael, Paul S. Taylor, Gabriel Valencia
  • Publication number: 20190122000
    Abstract: A method, system and computer program product for detecting sensitive personal information in a storage device. A block delta list containing a list of changed blocks in the storage device is processed. After identifying the changed blocks from the block delta list, a search is performed on those identified changed blocks for sensitive personal information using a character scanning technique. After identifying a changed block deemed to contain sensitive personal information, the changed block is translated from the block level to the file level using a hierarchical reverse mapping technique. By only analyzing the changed blocks to determine if they contain sensitive personal information, a lesser quantity of blocks needs to be processed in order to detect sensitive personal information in the storage device in near real-time. In this manner, sensitive personal information is detected in the storage device using fewer computing resources in a shorter amount of time.
    Type: Application
    Filed: October 20, 2017
    Publication date: April 25, 2019
    Inventors: Rajesh M. Desai, Mu Qiao, Roger C. Raphael, Ramani Routray
  • Publication number: 20190095802
    Abstract: Provided are techniques for heuristic and non-semantic prediction of the cost to find and review data that is relevant to a task. A corpus of documents is accessed for a domain. Terms associated with the domain are accessed, where the terms have an order on a list. For each of the documents, term positional dispersion is determined for each of the terms in the ordered list associated with the domain. Then, a document review quanta is determined for the document based on a summation of the term positional dispersion for each term in that document adjusted by a weight. A subset of the documents in the corpus is selected for review based on the document review quanta for each of the selected documents exceeding a threshold.
    Type: Application
    Filed: September 25, 2017
    Publication date: March 28, 2019
    Inventors: Roger C. Raphael, Rajesh M. Desai, Nazrul Islam, Theodore S. Barassi
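
The scoring pipeline in the abstract above (per-term positional dispersion, a weighted sum per document, then a threshold) might look something like the sketch below. The abstract does not define dispersion or the weights, so both are assumptions here: dispersion is taken as the population standard deviation of a term's token positions divided by document length, and the weight is 1/(rank + 1) from the term's place in the ordered list.

```python
from statistics import pstdev


def positional_dispersion(tokens: list, term: str) -> float:
    """Assumed measure: spread of a term's positions, normalized by length."""
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    if len(positions) < 2:
        return 0.0  # a term seen once (or never) has no spread
    return pstdev(positions) / len(tokens)


def review_quanta(text: str, ordered_terms: list) -> float:
    """Weighted sum of dispersions; earlier terms in the list weigh more."""
    tokens = text.lower().split()
    return sum(
        positional_dispersion(tokens, term) / (rank + 1)
        for rank, term in enumerate(ordered_terms)
    )


def select_for_review(corpus: dict, ordered_terms: list, threshold: float) -> list:
    """Return the IDs of documents whose review quanta exceed the threshold."""
    return [
        doc_id
        for doc_id, text in corpus.items()
        if review_quanta(text, ordered_terms) > threshold
    ]


corpus = {
    "a": "merger talks merger price filler filler filler merger",
    "b": "merger merger lunch menu today",
    "c": "weather report only",
}
ordered_terms = ["merger", "price"]
selected = select_for_review(corpus, ordered_terms, threshold=0.2)
```

Document "a" scatters "merger" across the whole text and scores highest; document "b" mentions it twice in a row, so its dispersion is low; document "c" contains none of the terms. The scoring uses only token positions, which matches the abstract's "non-semantic" framing.
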
  • Publication number: 20190079945
    Abstract: An electronic-discovery system and method, wherein content items and hold anchors are stored in a repository, tracking objects and representational anchor objects are stored in a database system, and the tracking objects represent the content items and the representational anchor objects represent the hold anchors. A first hold anchor is used for placing a hold on the content items for a first defined period of time, and a first representational anchor object and one or more of the tracking objects are used for representing and tracking the holds for the first defined period of time. When the first defined period of time expires, a second hold anchor is used for placing the hold on the content items for a second defined period of time, and a second representational anchor object and the tracking objects are used for representing and tracking the holds for the second defined period of time.
    Type: Application
    Filed: September 14, 2017
    Publication date: March 14, 2019
    Inventors: Rajesh M. Desai, Aidon P. Jennery, Lijing E. Lin, Roger C. Raphael
  • Patent number: 10169334
    Abstract: A data structure is generated containing enumerators for data types of a domain, text forms of the enumerators and context patterns for the text forms. The data structure also includes information extraction rules that are associated with the enumerators. The data structure is updated with additional context patterns and text forms that are identified within a set of documents to which text analytic annotators are to be tuned. The set of documents are analyzed against the updated data structure and additional extraction rules are generated based on the analysis.
    Type: Grant
    Filed: March 26, 2015
    Date of Patent: January 1, 2019
    Assignee: International Business Machines Corporation
    Inventors: Harish Deshmukh, Philip E. Parker, Roger C. Raphael, Paul S. Taylor, Gabriel Valencia
  • Publication number: 20180365560
    Abstract: A method loads training samples and forms a training data set from the training samples. The method uses a bidirectional LSTM recurrent neural network that includes one or more input cells and one or more output cells and trains it with the training data set. The method determines sensitive information and confidence values based on analyzing a text with the trained neural network. The method selects predicted samples from the text, where the sensitive information confidence value corresponding to the one or more predicted samples is above a threshold value, based on determining that a sensitive information accuracy has improved. The method forms a new training data set, where the new training data set comprises the training samples and the verified one or more predicted samples, and trains the previously trained neural network with the new training data set.
    Type: Application
    Filed: June 19, 2017
    Publication date: December 20, 2018
    Inventors: Mu Qiao, Yuya J. Ong, Ramani Routray, Roger C. Raphael