Patents by Inventor Wagner dos Santos Vieira

Wagner dos Santos Vieira has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11120031
    Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: September 14, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
  • Publication number: 20210133189
    Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
    Type: Application
    Filed: October 31, 2019
    Publication date: May 6, 2021
    Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
  • Patent number: 10901782
    Abstract: Techniques are provided for dataflow execution time estimation for distributed processing frameworks. An exemplary method comprises: obtaining an input dataset for a dataflow for execution; determining a substantially minimal data unit for a given operation of the dataflow processed by the given operation; estimating a number of rounds required to execute a number of data units in the input dataset using nodes assigned to execute the given operation; determining an execution time spent by the given operation to process one data unit; estimating the execution time for the given operation based on the execution time spent by the given operation to process one data unit and the number of rounds required to execute the number of data units in the input dataset; and executing the given operation with the input dataset. A persistent cost model is optionally employed to record the execution times of known dataflow operations.
    Type: Grant
    Filed: July 20, 2018
    Date of Patent: January 26, 2021
    Assignee: EMC IP Holding Company LLC
    Inventors: Vinícius Michel Gottin, Jonas F. Dias, Edward José Pacheco Condori, Angelo E. M. Ciarlini, Bruno Carlos da Cunha Costa, Fábio André Machado Porto, Paulo de Figueiredo Pires, Yania Molina Souto, Wagner dos Santos Vieira
  • Publication number: 20200026550
    Abstract: Techniques are provided for dataflow execution time estimation for distributed processing frameworks. An exemplary method comprises: obtaining an input dataset for a dataflow for execution; determining a substantially minimal data unit for a given operation of the dataflow processed by the given operation; estimating a number of rounds required to execute a number of data units in the input dataset using nodes assigned to execute the given operation; determining an execution time spent by the given operation to process one data unit; estimating the execution time for the given operation based on the execution time spent by the given operation to process one data unit and the number of rounds required to execute the number of data units in the input dataset; and executing the given operation with the input dataset. A persistent cost model is optionally employed to record the execution times of known dataflow operations.
    Type: Application
    Filed: July 20, 2018
    Publication date: January 23, 2020
    Inventors: Vinícius Michel Gottin, Jonas F. Dias, Edward José Pacheco Condori, Angelo E. M. Ciarlini, Bruno Carlos da Cunha Costa, Fábio André Machado Porto, Paulo de Figueiredo Pires, Yania Molina Souto, Wagner dos Santos Vieira
  • Patent number: 10360215
    Abstract: Pattern queries are evaluated in parallel over large N-dimensional datasets to identify features of interest.
    Type: Grant
    Filed: March 30, 2015
    Date of Patent: July 23, 2019
    Assignee: EMC Corporation
    Inventors: Angelo E. M. Ciarlini, Fabio A. M. Porto, Amir H. K. Moghadam, Jonas F. Dias, Paulo de Figueiredo Pires, Fabio A. Perosi, Alex L. Bordignon, Bruno Carlos da Cunha Costa, Wagner dos Santos Vieira
  • Patent number: 10324845
    Abstract: Techniques are provided for automatic placement of cache operations in a dataflow. An exemplary method obtains a graph representation of a dataflow of operations; determines a number of executions and a computational cost of the operations, and a computational cost of a caching operation to cache a dataset generated by an operation; establishes a dataflow state structure recording values for properties of the dataflow operations for a number of variations of caching various dataflow operations; determines a cache gain factor for dataflow operations as an estimated reduction in the accumulated cost of the dataflow by caching an output dataset of a given operation; determines changes in the dataflow state structure by caching an output dataset of a different operation in the dataflow; and searches the dataflow state structures to determine the output datasets to cache based on a total dataflow execution cost.
    Type: Grant
    Filed: July 28, 2017
    Date of Patent: June 18, 2019
    Assignee: EMC IP Holding Company LLC
    Inventors: Vinícius Michel Gottin, Edward José Pacheco Condori, Jonas F. Dias, Angelo E. M. Ciarlini, Bruno Carlos da Cunha Costa, Wagner dos Santos Vieira, Paulo de Figueiredo Pires, Fábio André Machado Porto, Yania Molina Souto
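
Illustrative sketch (not taken from the patent text): the abstract of patent 10901782 above estimates an operation's execution time from the time spent on one data unit and the number of rounds needed to process all data units on the assigned nodes. The Python below is one possible reading of that idea. It assumes each node processes one data unit per round, so the number of rounds is the ceiling of data units divided by nodes; the abstract does not state this formula. The names estimate_rounds, estimate_operation_time, CostModel, and "parse_records" are hypothetical, and the dictionary-backed CostModel is only an in-memory stand-in for the persistent cost model mentioned in the abstract.

```python
from typing import Dict, Optional


def estimate_rounds(num_data_units: int, num_nodes: int) -> int:
    """Rounds needed to process all data units.

    Assumption (not from the abstract): each node handles exactly one
    data unit per round, so rounds = ceil(num_data_units / num_nodes).
    """
    if num_data_units < 0 or num_nodes <= 0:
        raise ValueError("need num_data_units >= 0 and num_nodes > 0")
    return -(-num_data_units // num_nodes)  # integer ceiling division


def estimate_operation_time(num_data_units: int, num_nodes: int,
                            unit_time_s: float) -> float:
    """Estimated time = rounds * time to process one data unit."""
    return estimate_rounds(num_data_units, num_nodes) * unit_time_s


class CostModel:
    """Hypothetical in-memory stand-in for a persistent cost model."""

    def __init__(self) -> None:
        self._unit_times: Dict[str, float] = {}

    def record(self, operation: str, unit_time_s: float) -> None:
        # Remember the measured per-unit time of a known operation.
        self._unit_times[operation] = unit_time_s

    def lookup(self, operation: str) -> Optional[float]:
        # Returns None when the operation has not been seen before.
        return self._unit_times.get(operation)


model = CostModel()
model.record("parse_records", 0.5)  # 0.5 s per data unit (made-up value)
unit_time = model.lookup("parse_records")
estimate = estimate_operation_time(10, 4, unit_time)  # 3 rounds * 0.5 s
```

With 10 data units on 4 nodes, the sketch yields 3 rounds, so the estimate is 1.5 seconds. A real framework would also have to account for scheduling overhead and uneven data units, which this sketch ignores.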