Patents by Inventor Adriana Bechara Prado
Adriana Bechara Prado has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210342712
Abstract: Training examples are created from telemetry data, in which each training example includes engineered features derived from the telemetry data, storage system characteristics about the storage system that processed the workload associated with the telemetry data, and the response time of the storage system while processing the workload. The training examples are provided to an unsupervised learning process which assigns the training examples to clusters. Training examples of each cluster are used to train/test a separate supervised learning process for the cluster, to cause each supervised learning process to learn a regression between independent variables (system characteristics and workload features) and a dependent variable (storage system response time). To determine a response time of a proposed storage system, the proposed workload is used to select one of the clusters, and then the trained learning process for the selected cluster is used to determine the response time of the proposed storage system.
Type: Application
Filed: May 4, 2020
Publication date: November 4, 2021
Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
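The cluster-then-regress flow described in this abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract does not name the learning algorithms, so 1-D k-means and ordinary least squares are stand-ins chosen here, the single workload feature replaces the full set of system characteristics and workload features, and all function names and numbers are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, centroids, iterations=20):
    """Tiny 1-D k-means (assumed stand-in for the unsupervised step)."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return centroids, labels


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (assumed supervised step)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx


def train(examples, initial_centroids):
    """examples: list of (workload_feature, response_time) pairs."""
    features = [f for f, _ in examples]
    centroids, labels = kmeans_1d(features, initial_centroids)
    models = {}
    for c in range(len(centroids)):
        pts = [ex for ex, lab in zip(examples, labels) if lab == c]
        if pts:
            # One separate regression per cluster.
            models[c] = fit_line([p[0] for p in pts], [p[1] for p in pts])
    return centroids, models


def predict(centroids, models, workload_feature):
    """Select the cluster for the proposed workload, then use its model."""
    cluster = min(models, key=lambda c: abs(workload_feature - centroids[c]))
    a, b = models[cluster]
    return a * workload_feature + b


# Made-up telemetry: light workloads and heavy workloads behave differently.
examples = [(1, 2), (2, 4), (3, 6), (10, 50), (11, 55), (12, 60)]
centroids, models = train(examples, initial_centroids=[0, 15])
light_estimate = predict(centroids, models, 2.5)
heavy_estimate = predict(centroids, models, 11.5)
```

Training one regression per cluster lets each model fit a narrower range of workloads than a single global regression would have to cover.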
-
Publication number: 20210334668
Abstract: A global prediction manager for generating predictions using data from data zones includes storage for storing a model repository comprising a global model set and a prediction manager. The prediction manager obtains a local model set from a data zone of the data zones indicating that the global model set is unacceptable; makes a determination that the local model set is acceptable; in response to the determination: distributes the local model set to at least one second data zone of the data zones; obtains compressed telemetry data, that was compressed using the local model set, from the data zone and the at least one second data zone; and generates a global prediction regarding a future operating condition of the data zones using: the compressed local telemetry data and the local model set.
Type: Application
Filed: April 27, 2020
Publication date: October 28, 2021
Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva, Tiago Salviano Calmon
-
Publication number: 20210334678
Abstract: A deployment manager includes storage for storing a prediction model based on telemetry data from the deployments and a prediction manager. The prediction manager generates, using the prediction model and second telemetry data obtained from a deployment of the deployments: a prediction, and a prediction error estimate; in response to a determination that the prediction indicates a negative impact on the deployment: generates a confidence estimation for the prediction based on a variability of the second telemetry data from the telemetry data; in response to a second determination that the confidence estimation indicates that the prediction error estimate is inaccurate: remediates the prediction based on the variability to obtain an updated prediction; and performs an action set, based on the updated prediction, to reduce an impact of the negative impact on the deployment.
Type: Application
Filed: April 27, 2020
Publication date: October 28, 2021
Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
-
Publication number: 20210334597
Abstract: A prediction manager for providing responsiveness predictions for deployments includes persistent storage and a predictor. The persistent storage stores training data and conditioned training data. The predictor is programmed to obtain training data based on: a configuration of at least one deployment of the deployments, and a measured responsiveness of the at least one deployment, perform a peak extraction analysis on the measured responsiveness to obtain conditioned training data, obtain a prediction model using: the training data, and a first untrained prediction model, obtain a confidence prediction model using: the conditioned training data, and a second untrained prediction model, obtain a combined prediction using: the prediction model, and the confidence prediction model, and perform, based on the combined prediction, an action set to prevent a responsiveness failure.
Type: Application
Filed: April 27, 2020
Publication date: October 28, 2021
Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
-
Patent number: 11120174
Abstract: Methods and apparatus are provided for evaluating combinatorial processes using simulation techniques and multiple parallel statistical analyses of real-world data. A simulation model is generated that simulates one or more steps of a combinatorial process. The simulation model comprises key features of the combinatorial process. A plurality of first data mining tasks are performed in parallel over real data of the combinatorial process to obtain key feature prediction models that estimate the key features. The key feature prediction models are bound to the simulation model. Query types to be supported are identified and a plurality of simulation runs are generated in parallel, comprising simulated data for the supported query types. A plurality of second data mining tasks are performed in parallel over the plurality of simulation runs to build global prediction models to answer queries of each supported query type. An answer to a user query is determined using the global prediction models.
Type: Grant
Filed: March 20, 2015
Date of Patent: September 14, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Angelo E. M. Ciarlini, Vinícius Michel Gottin, Rodrigo de Souza Lima Espinha, Adriana Bechara Prado, Rodrigo Dias Arruda Senra
-
Patent number: 11120031
Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
Type: Grant
Filed: October 31, 2019
Date of Patent: September 14, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Adriana Bechara Prado, Vitor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
-
Patent number: 11113306
Abstract: Person-centric multi-channel opinion mining is performed in a single data repository, such as a data lake. An exemplary method comprises obtaining multi-channel heterogeneous data from a plurality of channels; identifying entities that are targets of opinion information across the plurality of channels; extracting a plurality of user identities from the plurality of channels; aligning the plurality of extracted user identities across the plurality of channels to link common user identities; identifying the entities that are targets of the opinion information of the extracted user identities; linking opinion information of the extracted user identities with a user identity associated with an opinion holder that expressed the opinion information; determining whether the opinion information comprises a positive or negative opinion; and providing a summary of the opinion information of a given opinion holder.
Type: Grant
Filed: April 29, 2016
Date of Patent: September 7, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Karin Breitman, Rodrigo Dias Arruda Senra, Adriana Bechara Prado
-
Publication number: 20210232968
Abstract: An autoregressor that compresses input data for a specific purpose. Input data is compressed using a compression/decompression framework and by accounting for a purpose of a prediction model. The compression aspect of the framework is distributed and the decompression aspect of the framework may be centralized. The compression/decompression framework and a machine learning prediction model can be centrally trained. The compressor is distributed to nodes such that the input data can be compressed and transmitted to a central node. The model and the compression/decompression framework are continually trained on new data. This allows for lossy compression and higher compression rates while maintaining low prediction error rates.
Type: Application
Filed: January 29, 2020
Publication date: July 29, 2021
Inventors: Paulo Abelha Ferreira, Pablo Nascimento da Silva, Adriana Bechara Prado
-
Publication number: 20210223963
Abstract: A distribution of response times of a storage system can be estimated for a proposed workload using a trained learning process. Collections of information about operational characteristics of multiple storage systems are obtained, in which each collection includes parameters describing the configuration of the storage system that was used to create the collection, workload characteristics describing features of the workload that the storage system processed, and storage system response times. For each collection, workload characteristics are aggregated, and the storage system response information is used to train a probabilistic mixture model. The aggregated workload information, storage system characteristics, and probabilistic mixture model parameters of the collections form training examples that are used to train the learning process.
Type: Application
Filed: January 20, 2020
Publication date: July 22, 2021
Inventors: Paulo Abelha Ferreira, Adriana Bechara Prado, Pablo Nascimento da Silva
-
Publication number: 20210223982
Abstract: One or more aspects of the present disclosure relate to providing storage system configuration recommendations. System configurations of one or more storage devices can be determined based on their respective collected telemetry information. Performance of storage devices having different system configurations can be predicted based on one or more of: the collected telemetry information and each of the different system configurations. In response to receiving one or more requested performance characteristics and workload conditions, one or more recommended storage device configurations can be provided for each request based on the predicted performance characteristics, the requested performance characteristics, and the workload conditions.
Type: Application
Filed: January 17, 2020
Publication date: July 22, 2021
Applicant: EMC IP Holding Company LLC
Inventors: Adriana Bechara Prado, Pablo Nascimento Da Silva, Paulo Abelha Ferreira
-
Patent number: 11030667
Abstract: Product planning techniques are provided that recommend compositions of product features for weighted heterogeneous consumer segments using regression trees. An exemplary method comprises obtaining historical consumer data comprising product preferences for existing product items for multiple consumer segments; obtaining product features indicating characteristics for each existing product item; prioritizing the consumer segments by obtaining a weight indicating an interest in each consumer segment; computing a total performance metric, for each product item, by calculating a dot product between the consumer segment weights and respective preferences of the consumer segments regarding a given product item; obtaining a regression tree from the existing product items to predict the total performance metric in terms of corresponding product features; and selecting a combination of the product features to be used in future product items based on identified paths in the regression tree.
Type: Grant
Filed: October 31, 2016
Date of Patent: June 8, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Adriana Bechara Prado, Victor Bursztyn, Jonas F. Dias, André de Almeida Maximo, Angelo E. M. Ciarlini
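The total performance metric in this abstract is a plain dot product, which a short sketch can make concrete. Everything below is illustrative: the weights, preference scores, and binary feature flags are made-up numbers, and the full regression tree is reduced to a single split (a depth-one tree) to keep the example short. This simplification is an assumption made here, not something stated in the patent.

```python
def total_performance(segment_weights, segment_preferences):
    """Dot product between segment weights and one product's preferences."""
    return sum(w * p for w, p in zip(segment_weights, segment_preferences))


def squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_single_split(feature_rows, targets):
    """Depth-one regression tree over binary features (hypothetical helper).

    Returns (feature_index, mean_without_feature, mean_with_feature) for the
    split with the lowest total squared error, or None if no split is valid.
    """
    best = None
    for j in range(len(feature_rows[0])):
        without = [t for row, t in zip(feature_rows, targets) if row[j] == 0]
        with_f = [t for row, t in zip(feature_rows, targets) if row[j] == 1]
        if not without or not with_f:
            continue  # a split must leave items on both sides
        err = squared_error(without) + squared_error(with_f)
        if best is None or err < best[0]:
            best = (err, j, sum(without) / len(without), sum(with_f) / len(with_f))
    return best[1:] if best else None


# Hypothetical inputs: three consumer segments, four existing products.
weights = [0.5, 0.3, 0.2]
preferences = [[4, 2, 1], [1, 5, 3], [5, 4, 2], [2, 1, 1]]
features = [[1, 0], [0, 1], [1, 1], [0, 0]]  # two binary product features

scores = [total_performance(weights, p) for p in preferences]
split = best_single_split(features, scores)
```

With these numbers the split lands on the first feature, and products carrying it have the higher mean score. A full tree would repeat the split recursively, and each root-to-leaf path would then name a combination of features.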
-
Publication number: 20210133189
Abstract: Techniques are provided for data discovery and data integration in a data lake. One method comprises obtaining data files from a data lake, wherein each data file comprises multiple records having multiple fields; selecting multiple candidate fields from a data file based on a record type; determining a relevance score for each candidate field from the data file based on multiple features extracted from the data file; and clustering the scored candidate fields into clusters of similar domains using a hashing algorithm, wherein a given cluster comprises candidate fields, wherein multiple data files can be integrated based on a domain of the candidate fields in the given cluster. The relevance score for each candidate field is based on multiple features comprising, for example, features that take into account a morphological or semantic similarity between file name, file metadata and/or file records and features that consider statistics of candidate fields in a data file.
Type: Application
Filed: October 31, 2019
Publication date: May 6, 2021
Inventors: Adriana Bechara Prado, Vítor Silva Sousa, Marcia Lucas Pesce, Paulo de Figueiredo Pires, Fábio André Machado Porto, Altobelli de Brito Mantuan, Rodolpho Rosa da Silva, Wagner dos Santos Vieira
-
Patent number: 10901973
Abstract: Methods and apparatus are provided for integrating a plurality of different database types in a semantic multi-database data lake. An exemplary method comprises providing a plurality of databases having different database types; translating ontology definition language database commands obtained from a user into a plurality of data definition language and/or data manipulation language commands supported by the different database types in order to replicate data from the user to each of the different database types; obtaining a query specified in a query language of a given database; and delegating the query to the given database. A plurality of cluster gateways optionally manage a corresponding plurality of clusters of database instances and wherein queries are delegated to a given database instance by delegating the queries to the appropriate cluster gateway. Dark data that was not queried by any supported query language in a predefined period of time can be detected.
Type: Grant
Filed: April 29, 2016
Date of Patent: January 26, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Rodrigo Dias Arruda Senra, Karin Breitman, Adriana Bechara Prado, Victor Bursztyn
-
Patent number: 10871902
Abstract: Techniques are provided for adaptive look-ahead configuration for data prefetching based on request size and frequency. One method comprises performing the following steps: estimating an earning value for a particular portion based on an average size and frequency of past input/output requests for the particular portion; calculating a quota for the particular portion by normalizing the earning value for the particular portion of the storage system based on earning values of one or more additional portions of the storage system; obtaining a size of a look-ahead window for a new request based on the quota for the particular portion over a prefetch budget assigned to the storage system; and moving a requested data item and one or more additional data items within the look-ahead window from the storage system to the cache memory responsive to the requested data item and/or the additional data items within the look-ahead window not being in the cache memory.
Type: Grant
Filed: April 29, 2019
Date of Patent: December 22, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Jonas F. Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinícius Michel Gottin, Tiago Salviano Calmon, Eduardo Vera Sousa, Owen Martin
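The earning, quota, and look-ahead steps in this abstract can be sketched as a few lines of arithmetic. The abstract does not give the formulas, so this sketch assumes that an earning value is average request size times request frequency, that the quota is the portion's share of all earnings, and that the window is the quota's share of the prefetch budget rounded down. The portion names, the observation interval, and the budget are hypothetical.

```python
import math


def earning_value(request_sizes, interval_seconds):
    """Assumed formula: average request size times requests per second."""
    if not request_sizes:
        return 0.0
    average_size = sum(request_sizes) / len(request_sizes)
    frequency = len(request_sizes) / interval_seconds
    return average_size * frequency


def look_ahead_windows(history, interval_seconds, prefetch_budget):
    """Return (quotas, windows) per portion of the storage system.

    history maps a portion name to the sizes of its past I/O requests.
    """
    earnings = {
        portion: earning_value(sizes, interval_seconds)
        for portion, sizes in history.items()
    }
    total = sum(earnings.values())
    if total == 0:
        # No past requests anywhere: nothing to prefetch.
        return ({p: 0.0 for p in history}, {p: 0 for p in history})
    # Quota: the portion's earning normalized by all portions' earnings.
    quotas = {p: e / total for p, e in earnings.items()}
    # Window: the quota's share of the budget, rounded down (assumption).
    windows = {
        p: math.floor(prefetch_budget * e / total) for p, e in earnings.items()
    }
    return quotas, windows


# Hypothetical request history observed over one second.
history = {"lun0": [8, 8, 8, 8], "lun1": [4, 4], "lun2": []}
quotas, windows = look_ahead_windows(history, interval_seconds=1.0, prefetch_budget=100)
```

Under these assumptions a portion that receives larger or more frequent requests is given a proportionally larger look-ahead window, and an idle portion is given none.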
-
Patent number: 10862765
Abstract: Techniques are provided for allocation of shared computing resources using a classifier chain. An exemplary method comprises obtaining an application for execution in a shared computing environment having multiple resources with multiple combinations of one or more hardware types; obtaining discriminative features for the application; obtaining a trained machine learning classifier chain, wherein the trained machine learning classifier chain comprises multiple classifiers, wherein the multiple classifiers comprise a classifier for each combination of hardware types; and generating, using the at least one trained machine learning classifier chain, a prediction of the combination of hardware types needed to satisfy one or more service level agreement requirements for the application to be executed in the shared computing environment.
Type: Grant
Filed: July 31, 2018
Date of Patent: December 8, 2020
Assignee: EMC IP Holding Company LLC
Inventors: Jonas F. Dias, Adriana Bechara Prado
-
Publication number: 20200348929
Abstract: Techniques are provided for provenance-based software script reuse. One method comprises extracting provenance data from source code comprising source code fragments, wherein the extracted provenance data indicates a control flow and a data flow of the source code; encapsulating source code fragments from the source code that satisfy a similarity criteria as a reusable source code fragment; and providing a repository of encapsulated reusable source code fragments for reuse during a development of new software scripts. The repository of encapsulated reusable source code fragments optionally comprises a searchable database further comprising the provenance data, data annotations, input parameters and generated results for the corresponding source code fragment.
Type: Application
Filed: May 2, 2019
Publication date: November 5, 2020
Inventors: Vitor Sousa, Jonas F. Dias, Adriana Bechara Prado
-
Publication number: 20200341899
Abstract: A data processing device includes persistent storage, a cache for the persistent storage, and a cache manager. The persistent storage is divided into logical units. The cache manager obtains persistent storage use data; selects model parameters for a cache prediction model based on the persistent storage use data; trains the cache prediction model based on the persistent storage use data using the selected model parameters to obtain a trained cache prediction model; and manages the cache based on logical units of the persistent storage using the trained cache prediction model.
Type: Application
Filed: April 26, 2019
Publication date: October 29, 2020
Inventors: Jonas Furtado Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinicius Michel Gottin, Tiago Salviano Calmon, Owen Martin
-
Publication number: 20200341813
Abstract: Techniques are provided for adaptive look-ahead configuration for data prefetching based on request size and frequency. One method comprises performing the following steps: estimating an earning value for a particular portion based on an average size and frequency of past input/output requests for the particular portion; calculating a quota for the particular portion by normalizing the earning value for the particular portion of the storage system based on earning values of one or more additional portions of the storage system; obtaining a size of a look-ahead window for a new request based on the quota for the particular portion over a prefetch budget assigned to the storage system; and moving a requested data item and one or more additional data items within the look-ahead window from the storage system to the cache memory responsive to the requested data item and/or the additional data items within the look-ahead window not being in the cache memory.
Type: Application
Filed: April 29, 2019
Publication date: October 29, 2020
Inventors: Jonas F. Dias, Rômulo Teixeira de Abreu Pinho, Adriana Bechara Prado, Vinícius Michel Gottin, Tiago Salviano Calmon, Eduardo Vera Sousa, Owen Martin
-
Publication number: 20200044938
Abstract: Techniques are provided for allocation of shared computing resources using a classifier chain. An exemplary method comprises obtaining an application for execution in a shared computing environment having multiple resources with multiple combinations of one or more hardware types; obtaining discriminative features for the application; obtaining a trained machine learning classifier chain, wherein the trained machine learning classifier chain comprises multiple classifiers, wherein the multiple classifiers comprise a classifier for each combination of hardware types; and generating, using the at least one trained machine learning classifier chain, a prediction of the combination of hardware types needed to satisfy one or more service level agreement requirements for the application to be executed in the shared computing environment.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Jonas F. Dias, Adriana Bechara Prado
-
Publication number: 20200026577
Abstract: Techniques are provided for allocation of shared computing resources using source code feature extraction and cluster-based training of machine learning models. An exemplary method comprises: obtaining a source code corpus with source code segments for execution in a shared computing environment; extracting discriminative features from the source code segments in the source code corpus; obtaining a trained machine learning model, wherein the trained machine learning model is trained using samples of source code segments from clusters derived from clustering the source code corpus based on (i) a term frequency metric, and/or (ii) observed values of execution metrics; and generating, using the trained model, a prediction of an allocation of one or more resources of the shared computing environment needed to satisfy service level agreement requirements for source code to be executed in the shared computing environment.
Type: Application
Filed: July 19, 2018
Publication date: January 23, 2020
Inventors: Jonas F. Dias, Adriana Bechara Prado, Tiago Salviano Calmon