Patents by Inventor Adriana Bechara Prado

Adriana Bechara Prado has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Workload-Oriented Prediction of Response Times of Storage Systems

Publication number: 20210342712

Abstract: Training examples are created from telemetry data, in which each training example includes engineered features derived from the telemetry data, storage system characteristics about the storage system that processed the workload associated with the telemetry data, and the response time of the storage system while processing the workload. The training examples are provided to an unsupervised learning process which assigns the training examples to clusters. Training examples of each cluster are used to train/test a separate supervised learning process for the cluster, to cause each supervised learning process to learn a regression between independent variables (system characteristics and workload features) and a dependent variable (storage system response time). To determine a response time of a proposed storage system, the proposed workload is used to select one of the clusters, and then the trained learning process for the selected cluster is used to determine the response time of the proposed storage system.

Type: Application

Filed: May 4, 2020

Publication date: November 4, 2021

Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
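
Illustrative sketch (not taken from the patent text): the cluster-then-regress flow described in the abstract of "Workload-Oriented Prediction of Response Times of Storage Systems" might look roughly like the following. It assumes a single scalar workload feature, a tiny 1-D k-means as the unsupervised step, and a least-squares line as the per-cluster regression; the names kmeans_1d, fit_line, train and predict are hypothetical.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (workload feature, measured response time)


def nearest(value: float, centroids: List[float]) -> int:
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values: List[float], centroids: List[float], iterations: int = 10) -> List[float]:
    """Very small 1-D k-means; stands in for the unsupervised clustering step."""
    centroids = list(centroids)
    for _ in range(iterations):
        groups: List[List[float]] = [[] for _ in centroids]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # Keep the old centroid when a cluster receives no members.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fit_line(points: List[Example]) -> Tuple[float, float]:
    """Least-squares (slope, intercept); stands in for the supervised regression."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def train(examples: List[Example], initial_centroids: List[float]):
    """Cluster the examples, then fit one regression per cluster."""
    centroids = kmeans_1d([x for x, _ in examples], initial_centroids)
    models = []
    for i in range(len(centroids)):
        members = [e for e in examples if nearest(e[0], centroids) == i]
        models.append(fit_line(members) if members else (0.0, 0.0))
    return centroids, models


def predict(feature: float, centroids: List[float], models) -> float:
    """Select the cluster for a proposed workload, then apply that cluster's model."""
    slope, intercept = models[nearest(feature, centroids)]
    return slope * feature + intercept
```

A real system would use many features (workload and system characteristics) and stronger learners; the sketch only shows the routing of a proposed workload to a per-cluster model.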
LOCALITY-AWARE COMPRESSOR-DECOMPRESSOR FOR KEEPING PREDICTION MODELS UP-TO-DATE IN RESOURCE CONSTRAINED NETWORKS

Publication number: 20210334668

Abstract: A global prediction manager for generating predictions using data from data zones includes storage for storing a model repository comprising a global model set and a prediction manager. The prediction manager obtains a local model set from a data zone of the data zones indicating that the global model set is unacceptable; makes a determination that the local model set is acceptable; in response to the determination: distributes the local model set to at least one second data zone of the data zones; obtains compressed telemetry data, that was compressed using the local model set, from the data zone and the at least one second data zone; and generates a global prediction regarding a future operating condition of the data zones using: the compressed local telemetry data and the local model set.

Type: Application

Filed: April 27, 2020

Publication date: October 28, 2021

Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva, Tiago Salviano Calmon
FRAMEWORK FOR MEASURING TELEMETRY DATA VARIABILITY FOR CONFIDENCE EVALUATION OF A MACHINE LEARNING ESTIMATOR

Publication number: 20210334678

Abstract: A deployment manager includes storage for storing a prediction model based on telemetry data from the deployments and a prediction manager. The prediction manager generates, using the prediction model and second telemetry data obtained from a deployment of the deployments: a prediction, and a prediction error estimate; in response to a determination that the prediction indicates a negative impact on the deployment: generates a confidence estimation for the prediction based on a variability of the second telemetry data from the telemetry data; in response to a second determination that the confidence estimation indicates that the prediction error estimate is inaccurate: remediates the prediction based on the variability to obtain an updated prediction; and performs an action set, based on the updated prediction, to reduce an impact of the negative impact on the deployment.

Type: Application

Filed: April 27, 2020

Publication date: October 28, 2021

Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
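
Illustrative sketch (an assumption, not the patented method): the abstract of "Framework for Measuring Telemetry Data Variability..." bases confidence on how far new telemetry varies from the telemetry used for training. One simple way to express such a variability measure is the shift of the observed mean, in units of the training standard deviation. The functions variability and is_confident and the threshold of 3.0 are hypothetical.

```python
import math
import statistics
from typing import List


def variability(training: List[float], observed: List[float]) -> float:
    """Shift of the observed mean from the training mean, in training standard deviations."""
    mu = statistics.mean(training)
    sigma = statistics.pstdev(training)
    shift = abs(statistics.mean(observed) - mu)
    if sigma == 0:
        # Constant training data: any shift at all is treated as unbounded variability.
        return 0.0 if shift == 0 else math.inf
    return shift / sigma


def is_confident(training: List[float], observed: List[float],
                 max_variability: float = 3.0) -> bool:
    """Treat a prediction as trustworthy only if the new telemetry stays close to the training data."""
    return variability(training, observed) <= max_variability
```

When is_confident returns False, the flow in the abstract would go on to remediate the prediction; that step is not sketched here.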
CONFIDENT PEAK-AWARE RESPONSE TIME ESTIMATION BY EXPLOITING TELEMETRY DATA FROM DIFFERENT SYSTEM CONFIGURATIONS

Publication number: 20210334597

Abstract: A prediction manager for providing responsiveness predictions for deployments includes persistent storage and a predictor. The persistent storage stores training data and conditioned training data. The predictor is programmed to obtain training data based on: a configuration of at least one deployment of the deployments, and a measured responsiveness of the at least one deployment, perform a peak extraction analysis on the measured responsiveness to obtain conditioned training data, obtain a prediction model using: the training data, and a first untrained prediction model, obtain a confidence prediction model using: the conditioned training data, and a second untrained prediction model, obtain a combined prediction using: the prediction model, and the confidence prediction model, and perform, based on the combined prediction, an action set to prevent a responsiveness failure.

Type: Application

Filed: April 27, 2020

Publication date: October 28, 2021

Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
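
Illustrative sketch (an assumption, not the patented method): the abstract of "Confident Peak-Aware Response Time Estimation..." mentions a peak extraction analysis on measured responsiveness without saying how peaks are defined. A minimal interpretation keeps samples that are local maxima at or above a threshold; extract_peaks and its threshold parameter are hypothetical.

```python
from typing import List, Tuple


def extract_peaks(samples: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, value) pairs for local maxima that reach the threshold.

    A plateau is reported once, at its first sample.
    """
    peaks = []
    for i in range(1, len(samples) - 1):
        value = samples[i]
        rises = value > samples[i - 1]
        does_not_rise_further = value >= samples[i + 1]
        if value >= threshold and rises and does_not_rise_further:
            peaks.append((i, value))
    return peaks


# Response times in milliseconds (made-up values for illustration).
response_times = [1, 5, 2, 3, 9, 9, 4, 2]
print(extract_peaks(response_times, threshold=4))  # [(1, 5), (4, 9)]
```

The extracted peaks would play the role of the conditioned training data used for the confidence model in the abstract.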
Methods and apparatus for evaluation of combinatorial processes using simulation and multiple parallel statistical analyses of real data

Patent number: 11120174

Abstract: Methods and apparatus are provided for evaluating combinatorial processes using simulation techniques and multiple parallel statistical analyses of real-world data. A simulation model is generated that simulates one or more steps of a combinatorial process. The simulation model comprises key features of the combinatorial process. A plurality of first data mining tasks are performed in parallel over real data of the combinatorial process to obtain key feature prediction models that estimate the key features. The key feature prediction models are bound to the simulation model. Query types to be supported are identified and a plurality of simulation runs are generated in parallel, comprising simulated data for the supported query types. A plurality of second data mining tasks are performed in parallel over the plurality of simulation runs to build global prediction models to answer queries of each supported query type. An answer to a user query is determined using the global prediction models.

Type: Grant

Filed: March 20, 2015

Date of Patent: September 14, 2021

Assignee: EMC IP Holding Company LLC

Inventors: Angelo E. M. Ciarlini, Vinícius Michel Gottin, Rodrigo de Souza Lima Espinha, Adriana Bechara Prado, Rodrigo Dias Arruda Senra
Automatic indexing of relevant domains in a data lake for data discovery and integration

Patent number: 11120031

Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.

Type: Grant

Filed: October 31, 2019

Date of Patent: September 14, 2021

Assignee: EMC IP Holding Company LLC

Inventors: Adriana Bechara Prado, Vitor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
Methods and apparatus for person-centric multichannel opinion mining in data lakes

Patent number: 11113306

Abstract: Person-centric multi-channel opinion mining is performed in a single data repository, such as a data lake. An exemplary method comprises obtaining multi-channel heterogeneous data from a plurality of channels; identifying entities that are targets of opinion information across the plurality of channels; extracting a plurality of user identities from the plurality of channels; aligning the plurality of extracted user identities across the plurality of channels to link common user identities; identifying the entities that are targets of the opinion information of the extracted user identities; linking opinion information of the extracted user identities with a user identity associated with an opinion holder that expressed the opinion information; determining whether the opinion information comprises a positive or negative opinion; and providing a summary of the opinion information of a given opinion holder.

Type: Grant

Filed: April 29, 2016

Date of Patent: September 7, 2021

Assignee: EMC IP Holding Company LLC

Inventors: Karin Breitman, Rodrigo Dias Arruda Senra, Adriana Bechara Prado
COMPRESSION AND DECOMPRESSION OF TELEMETRY DATA FOR PREDICTION MODELS

Publication number: 20210232968

Abstract: An autoregressor that compresses input data for a specific purpose. Input data is compressed using a compression/decompression framework and by accounting for a purpose of a prediction model. The compression aspect of the framework is distributed and the decompression aspect of the framework may be centralized. The compression/decompression framework and a machine learning prediction model can be centrally trained. The compressor is distributed to nodes such that the input data can be compressed and transmitted to a central node. The model and the compression/decompression framework are continually trained on new data. This allows for lossy compression and higher compression rates while maintaining low prediction error rates.

Type: Application

Filed: January 29, 2020

Publication date: July 29, 2021

Inventors: Paulo Abelha Ferreira, Pablo Nascimento da Silva, Adriana Bechara Prado
Method and Apparatus for Estimating a Distribution of Response Times of a Storage System for a Proposed Workload

Publication number: 20210223963

Abstract: A distribution of response times of a storage system can be estimated for a proposed workload using a trained learning process. Collections of information about operational characteristics of multiple storage systems are obtained, in which each collection includes parameters describing the configuration of the storage system that was used to create the collection, workload characteristics describing features of the workload that the storage system processed, and storage system response times. For each collection, workload characteristics are aggregated, and the storage system response information is used to train a probabilistic mixture model. The aggregated workload information, storage system characteristics, and probabilistic mixture model parameters of the collections form training examples that are used to train the learning process.

Type: Application

Filed: January 20, 2020

Publication date: July 22, 2021

Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
STORAGE SYSTEM CONFIGURATION BASED ON WORKLOAD CHARACTERISTICS AND PERFORMANCE METRICS

Publication number: 20210223982

Abstract: One or more aspects of the present disclosure relate to providing storage system configuration recommendations. System configurations of one or more storage devices can be determined based on their respective collected telemetry information. Performance of storage devices having different system configurations can be predicted based on one or more of: the collected telemetry information and each of the different system configurations. In response to receiving one or more requested performance characteristics and workload conditions, one or more recommended storage device configurations can be provided for each request based on the predicted performance characteristics, the requested performance characteristics, and the workload conditions.

Type: Application

Filed: January 17, 2020

Publication date: July 22, 2021

Applicant: EMC IP Holding Company LLC

Inventors: Adriana Bechara Prado, Pablo Nascimento Da Silva, Paulo Abelha Ferreira
Method, medium, and system for recommending compositions of product features using regression trees

Patent number: 11030667

Abstract: Product planning techniques are provided that recommend compositions of product features for weighted heterogeneous consumer segments using regression trees. An exemplary method comprises obtaining historical consumer data comprising product preferences for existing product items for multiple consumer segments; obtaining product features indicating characteristics for each existing product item; prioritizing the consumer segments by obtaining a weight indicating an interest in each consumer segment; computing a total performance metric, for each product item, by calculating a dot product between the consumer segment weights and respective preferences of the consumer segments regarding a given product item; obtaining a regression tree from the existing product items to predict the total performance metric in terms of corresponding product features; and selecting a combination of the product features to be used in future product items based on identified paths in the regression tree.

Type: Grant

Filed: October 31, 2016

Date of Patent: June 8, 2021

Assignee: EMC IP Holding Company LLC

Inventors: Adriana Bechara Prado, Victor Bursztyn, Jonas F. Dias, André de Almeida Maximo, Angelo E. M. Ciarlini
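
Illustrative sketch of one step from the abstract of "Method, medium, and system for recommending compositions of product features using regression trees": the total performance metric is described as a dot product between the consumer segment weights and the segments' preferences for a product item. The function name and the sample data are hypothetical, and the regression tree step is not shown.

```python
from typing import Dict, List


def total_performance(weights: List[float],
                      preferences: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of segment weights with each item's per-segment preferences."""
    totals = {}
    for item, prefs in preferences.items():
        if len(prefs) != len(weights):
            raise ValueError(f"{item}: expected {len(weights)} segment preferences")
        totals[item] = sum(w * p for w, p in zip(weights, prefs))
    return totals


# Three consumer segments, weighted by interest (made-up numbers).
segment_weights = [0.5, 0.25, 0.25]
item_preferences = {
    "item_a": [4.0, 2.0, 2.0],
    "item_b": [1.0, 4.0, 4.0],
}
print(total_performance(segment_weights, item_preferences))
# {'item_a': 3.0, 'item_b': 2.5}
```

These totals would then serve as the target variable when a regression tree is fitted on product features.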
Automatic Indexing of Relevant Domains in a Data Lake for Data Discovery and Integration

Publication number: 20210133189

Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.

Type: Application

Filed: October 31, 2019

Publication date: May 6, 2021

Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
Methods and apparatus for a semantic multi-database data lake

Patent number: 10901973

Abstract: Methods and apparatus are provided for integrating a plurality of different database types in a semantic multi-database data lake. An exemplary method comprises providing a plurality of databases having different database types; translating ontology definition language database commands obtained from a user into a plurality of data definition language and/or data manipulation language commands supported by the different database types in order to replicate data from the user to each of the different database types; obtaining a query specified in a query language of a given database; and delegating the query to the given database. A plurality of cluster gateways optionally manage a corresponding plurality of clusters of database instances and wherein queries are delegated to a given database instance by delegating the queries to the appropriate cluster gateway. Dark data that was not queried by any supported query language in a predefined period of time can be detected.

Type: Grant

Filed: April 29, 2016

Date of Patent: January 26, 2021

Assignee: EMC IP Holding Company LLC

Inventors: Rodrigo Dias Arruda Senra, Karin Breitman, Adriana Bechara Prado, Victor Bursztyn
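
Illustrative sketch (an assumption, not the patented implementation): the abstract of "Methods and apparatus for a semantic multi-database data lake" describes dark data as data not queried by any supported query language within a predefined period. Assuming the lake records a last-queried timestamp per dataset, detection reduces to a comparison against that period; find_dark_data and the dataset names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_dark_data(last_queried: Dict[str, Optional[datetime]],
                   now: datetime,
                   max_idle: timedelta) -> List[str]:
    """Names of datasets never queried, or not queried within max_idle of now."""
    dark = []
    for name, timestamp in last_queried.items():
        if timestamp is None or now - timestamp > max_idle:
            dark.append(name)
    return sorted(dark)


catalog = {
    "orders": datetime(2021, 1, 30),
    "logs": datetime(2020, 10, 1),
    "archive": None,  # never queried
}
print(find_dark_data(catalog, now=datetime(2021, 1, 31), max_idle=timedelta(days=90)))
# ['archive', 'logs']
```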
Adaptive look-ahead configuration for prefetching data in input/output operations based on request size and frequency

Patent number: 10871902

Abstract: Techniques are provided for adaptive look-ahead configuration for data prefetching based on request size and frequency. One method comprises performing the following steps: estimating an earning value for a particular portion based on an average size and frequency of past input/output requests for the particular portion; calculating a quota for the particular portion by normalizing the earning value for the particular portion of the storage system based on earning values of one or more additional portions of the storage system; obtaining a size of a look-ahead window for a new request based on the quota for the particular portion over a prefetch budget assigned to the storage system; and moving a requested data item and one or more additional data items within the look-ahead window from the storage system to the cache memory responsive to the requested data item and/or the additional data items within the look-ahead window not being in the cache memory.

Type: Grant

Filed: April 29, 2019

Date of Patent: December 22, 2020

Assignee: EMC IP Holding Company LLC

Inventors: Jonas F. Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinícius Michel Gottin, Tiago Salviano Calmon, Eduardo Vera Sousa, Owen Martin
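
Illustrative sketch (not taken from the patent text): the abstract of "Adaptive look-ahead configuration for prefetching data..." describes three steps: estimate an earning value per portion from average request size and frequency, normalize it into a quota, and derive a look-ahead window from the quota and a prefetch budget. The exact formulas are not given; the sketch assumes earning = average size × frequency and window = quota × budget. PortionStats, earning and look_ahead_sizes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PortionStats:
    """Observed I/O history for one portion of the storage system."""
    total_bytes: int        # sum of the sizes of past requests
    request_count: int      # number of past requests
    window_seconds: float   # length of the observation window


def earning(stats: PortionStats) -> float:
    """Assumed earning value: average request size times request frequency."""
    if stats.request_count == 0:
        return 0.0
    average_size = stats.total_bytes / stats.request_count
    frequency = stats.request_count / stats.window_seconds
    return average_size * frequency


def look_ahead_sizes(portions: Dict[str, PortionStats], prefetch_budget: int) -> Dict[str, int]:
    """Split the prefetch budget across portions in proportion to their earnings."""
    earnings = {name: earning(stats) for name, stats in portions.items()}
    total = sum(earnings.values())
    if total == 0:
        return {name: 0 for name in portions}
    # quota = earning / total; window size = quota * budget, rounded down.
    return {name: int(prefetch_budget * e / total) for name, e in earnings.items()}


history = {
    "portion_a": PortionStats(total_bytes=3000, request_count=10, window_seconds=10.0),
    "portion_b": PortionStats(total_bytes=1000, request_count=20, window_seconds=10.0),
}
print(look_ahead_sizes(history, prefetch_budget=64))
# {'portion_a': 48, 'portion_b': 16}
```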
Allocation of shared computing resources using a classifier chain

Patent number: 10862765

Abstract: Techniques are provided for allocation of shared computing resources using a classifier chain. An exemplary method comprises obtaining an application for execution in a shared computing environment having multiple resources with multiple combinations of one or more hardware types; obtaining discriminative features for the application; obtaining a trained machine learning classifier chain, wherein the trained machine learning classifier chain comprises multiple classifiers, wherein the multiple classifiers comprise a classifier for each combination of hardware types; and generating, using the at least one trained machine learning classifier chain, a prediction of the combination of hardware types needed to satisfy one or more service level agreement requirements for the application to be executed in the shared computing environment.

Type: Grant

Filed: July 31, 2018

Date of Patent: December 8, 2020

Assignee: EMC IP Holding Company LLC

Inventors: Jonas F. Dias, Adriana Bechara Prado
PROVENANCE-BASED REUSE OF SOFTWARE CODE

Publication number: 20200348929

Abstract: Techniques are provided for provenance-based software script reuse. One method comprises extracting provenance data from source code comprising source code fragments, wherein the extracted provenance data indicates a control flow and a data flow of the source code; encapsulating source code fragments from the source code that satisfy a similarity criterion as a reusable source code fragment; and providing a repository of encapsulated reusable source code fragments for reuse during a development of new software scripts. The repository of encapsulated reusable source code fragments optionally comprises a searchable database further comprising the provenance data, data annotations, input parameters and generated results for the corresponding source code fragment.

Type: Application

Filed: May 2, 2019

Publication date: November 5, 2020

Inventors: Vitor Sousa, Jonas F. Dias, Adriana Bechara Prado
SYSTEM AND METHOD FOR PREDICTION BASED CACHE MANAGEMENT

Publication number: 20200341899

Abstract: A data processing device includes persistent storage, a cache for the persistent storage, and a cache manager. The persistent storage is divided into logical units. The cache manager obtains persistent storage use data; selects model parameters for a cache prediction model based on the persistent storage use data; trains the cache prediction model based on the persistent storage use data using the selected model parameters to obtain a trained cache prediction model; and manages the cache based on logical units of the persistent storage using the trained cache prediction model.

Type: Application

Filed: April 26, 2019

Publication date: October 29, 2020

Inventors: Jonas Furtado Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinicius Michel Gottin, Tiago Salviano Calmon, Owen Martin
ADAPTIVE LOOK-AHEAD CONFIGURATION FOR PREFETCHING DATA IN INPUT/OUTPUT OPERATIONS BASED ON REQUEST SIZE AND FREQUENCY

Publication number: 20200341813

Abstract: Techniques are provided for adaptive look-ahead configuration for data prefetching based on request size and frequency. One method comprises performing the following steps: estimating an earning value for a particular portion based on an average size and frequency of past input/output requests for the particular portion; calculating a quota for the particular portion by normalizing the earning value for the particular portion of the storage system based on earning values of one or more additional portions of the storage system; obtaining a size of a look-ahead window for a new request based on the quota for the particular portion over a prefetch budget assigned to the storage system; and moving a requested data item and one or more additional data items within the look-ahead window from the storage system to the cache memory responsive to the requested data item and/or the additional data items within the look-ahead window not being in the cache memory.

Type: Application

Filed: April 29, 2019

Publication date: October 29, 2020

Inventors: Jonas F. Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinícius Michel Gottin, Tiago Salviano Calmon, Eduardo Vera Sousa, Owen Martin
Allocation of Shared Computing Resources Using a Classifier Chain

Publication number: 20200044938

Abstract: Techniques are provided for allocation of shared computing resources using a classifier chain. An exemplary method comprises obtaining an application for execution in a shared computing environment having multiple resources with multiple combinations of one or more hardware types; obtaining discriminative features for the application; obtaining a trained machine learning classifier chain, wherein the trained machine learning classifier chain comprises multiple classifiers, wherein the multiple classifiers comprise a classifier for each combination of hardware types; and generating, using the at least one trained machine learning classifier chain, a prediction of the combination of hardware types needed to satisfy one or more service level agreement requirements for the application to be executed in the shared computing environment.

Type: Application

Filed: July 31, 2018

Publication date: February 6, 2020

Inventors: Jonas F. Dias, Adriana Bechara Prado
Allocation of Shared Computing Resources Using Source Code Feature Extraction and Clustering-Based Training of Machine Learning Models

Publication number: 20200026577

Abstract: Techniques are provided for allocation of shared computing resources using source code feature extraction and cluster-based training of machine learning models. An exemplary method comprises: obtaining a source code corpus with source code segments for execution in a shared computing environment; extracting discriminative features from the source code segments in the source code corpus; obtaining a trained machine learning model, wherein the trained machine learning model is trained using samples of source code segments from clusters derived from clustering the source code corpus based on (i) a term frequency metric, and/or (ii) observed values of execution metrics; and generating, using the trained model, a prediction of an allocation of one or more resources of the shared computing environment needed to satisfy service level agreement requirements for source code to be executed in the shared computing environment.

Type: Application

Filed: July 19, 2018

Publication date: January 23, 2020

Inventors: Jonas F. Dias, Adriana Bechara Prado, Tiago Salviano Calmon
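
Illustrative sketch (an assumption, not the patented method): the abstract of "Allocation of Shared Computing Resources Using Source Code Feature Extraction..." clusters source code segments based on a term frequency metric. One simple reading is the relative frequency of identifier-like tokens in a segment; term_frequencies and its tokenizer regex are hypothetical choices.

```python
import re
from collections import Counter
from typing import Dict

# Identifier-like tokens: a letter or underscore followed by letters, digits or underscores.
TOKEN_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def term_frequencies(source: str) -> Dict[str, float]:
    """Relative frequency of each identifier-like token in a source code segment."""
    counts = Counter(TOKEN_PATTERN.findall(source))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: count / total for token, count in counts.items()}


segment = "total = total + price"
print(term_frequencies(segment))
```

Vectors of this kind could feed a clustering step, with samples from each cluster then used to train the allocation model.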
