Patents by Inventor Marcia Lucas Pesce

Marcia Lucas Pesce has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11120031
    Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: September 14, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
  • Publication number: 20210133189
    Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
    Type: Application
    Filed: October 31, 2019
    Publication date: May 6, 2021
    Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
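
The abstract above describes a pipeline of scoring candidate fields and then clustering them with a hashing algorithm. The following is only a rough illustrative sketch of that general idea, not the patented method: the abstract does not name the hashing algorithm or the exact features, so MinHash with banding is used here as a stand-in, and the relevance score is a toy combination of one statistical feature (distinct-value ratio) and one morphological feature (field name appearing in the file name). All function names, parameters, and sample data are hypothetical.

```python
import hashlib
from collections import defaultdict


def _hash(seed, value):
    # Deterministic 64-bit hash of a value under a given seed.
    digest = hashlib.blake2b(f"{seed}:{value}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(values, num_hashes=16):
    # MinHash signature of the set of distinct values (values must be non-empty).
    distinct = set(values)
    return tuple(min(_hash(seed, v) for v in distinct) for seed in range(num_hashes))


def relevance_score(field_name, values, file_name):
    # Toy score (an assumption, not the patent's feature set):
    # half distinct-value ratio, half "field name occurs in file name".
    distinct_ratio = len(set(values)) / len(values)
    name_match = 1.0 if field_name.lower() in file_name.lower() else 0.0
    return 0.5 * distinct_ratio + 0.5 * name_match


def cluster_fields(fields, num_hashes=16, bands=4, min_score=0.0):
    # fields maps (file_name, field_name) -> non-empty list of string values.
    # Fields scoring below min_score are dropped before clustering.
    kept = {
        key: values
        for key, values in fields.items()
        if relevance_score(key[1], values, key[0]) >= min_score
    }
    rows = num_hashes // bands
    parent = {key: key for key in kept}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    buckets = {}
    for key, values in kept.items():
        sig = minhash_signature(values, num_hashes)
        for b in range(bands):
            bucket = (b, sig[b * rows:(b + 1) * rows])
            if bucket in buckets:
                # Same band bucket: merge the two fields' clusters.
                parent[find(key)] = find(buckets[bucket])
            else:
                buckets[bucket] = key

    clusters = defaultdict(set)
    for key in kept:
        clusters[find(key)].add(key)
    return list(clusters.values())


# Hypothetical sample data: two files sharing a "country" domain.
fields = {
    ("customers.csv", "country"): ["BR", "US", "FR", "BR"],
    ("orders.csv", "ship_country"): ["US", "FR", "BR"],
    ("orders.csv", "order_id"): ["1001", "1002", "1003"],
}
clusters = cluster_fields(fields)
```

In this sketch, fields that land in the same cluster are candidates for joining their files on a shared domain. The two country fields have identical value sets and therefore identical signatures, so they fall into one cluster, while `order_id` stays on its own.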