Patents by Inventor Avrilia Floratou
Avrilia Floratou has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240126521
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Application
Filed: December 27, 2023
Publication date: April 18, 2024
Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
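The core idea above (datasets that carry semantic annotations, which in turn drive suggested operations) can be illustrated with a minimal sketch. All class and operation names here are invented for illustration and do not reflect the patented library's actual API.

```python
# Minimal sketch (all names hypothetical) of a library whose datasets carry
# semantic objects, and whose annotations drive operation suggestions.

class SemanticObject:
    """A semantic label, e.g. 'zipcode' or 'temperature'."""
    def __init__(self, name, suggested_ops):
        self.name = name
        self.suggested_ops = suggested_ops  # operations meaningful for this label

class AnnotatedDataset:
    """A dataset paired with per-column semantic objects."""
    def __init__(self, rows, annotations):
        self.rows = rows                # list of row dicts
        self.annotations = annotations  # column name -> SemanticObject

def suggest_operations(dataset):
    """Suggest data manipulation operations based on the semantic annotations."""
    return {column: sem.suggested_ops
            for column, sem in dataset.annotations.items()}

# Usage: annotate a column as a zipcode, then ask for suggestions.
zipcode = SemanticObject("zipcode", ["validate_format", "join_with_geo_table"])
ds = AnnotatedDataset([{"zip": "53703"}], {"zip": zipcode})
print(suggest_operations(ds))  # {'zip': ['validate_format', 'join_with_geo_table']}
```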
-
Patent number: 11900085
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Grant
Filed: March 11, 2022
Date of Patent: February 13, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Avrilia Floratou, Andreas Christian Mueller, Dalitso Hansini Banda, Joyce Yu Cahoon, Anja Gruenheid, Neha Godwal
-
Publication number: 20240037097
Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
Type: Application
Filed: October 13, 2023
Publication date: February 1, 2024
Inventors: Kameswara Venkatesh EMANI, Avrilia FLORATOU, Carlo Aldo CURINO, Karthik Saligrama RAMACHANDRA, Alekh JINDAL
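The queue mechanism described above can be sketched in a few lines: translatable operations (here, filter predicates) are deferred and batched, and an undeferrable operation flushes them into a single database query. The predicate-only operation model and SQL shape are simplifications invented for illustration.

```python
# Hypothetical sketch of the translatable/non-translatable queue: deferred
# filters are compiled into one SQL query when an undeferrable operation
# (e.g. plotting the data) forces materialization.

class OperationQueue:
    def __init__(self):
        self.translatable = []      # operations we can fold into one SQL query
        self.non_translatable = []  # operations that must run outside the database

    def add_filter(self, predicate):
        # A filter predicate is translatable: record it for later batching.
        self.translatable.append(predicate)

    def flush(self, table):
        """Compile the translatable portion of the queue into a database query."""
        if not self.translatable:
            return f"SELECT * FROM {table}"
        where = " AND ".join(self.translatable)
        self.translatable.clear()
        return f"SELECT * FROM {table} WHERE {where}"

q = OperationQueue()
q.add_filter("price > 100")
q.add_filter("region = 'EU'")
sql = q.flush("sales")
print(sql)  # SELECT * FROM sales WHERE price > 100 AND region = 'EU'
```

Deferring in this way lets the database engine evaluate the whole batch at once instead of round-tripping per statement, which is what makes the approach scalable.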
-
Publication number: 20230394369
Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Application
Filed: August 21, 2023
Publication date: December 7, 2023
Inventors: Avrilia FLORATOU, Ashvin AGRAWAL, MohammadHossein NAMAKI, Subramaniam Venkatraman KRISHNAN, Fotios PSALLIDAS, Yinghui WU
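A provenance relationship, as defined above, pairs input variables, an operation, a caller, and output variables derived from AST nodes. The following toy sketch (simplified far beyond the patented system, and not its actual WIR format) walks the AST of some ML model code and emits such tuples:

```python
# Illustrative sketch: extract (outputs, operation, caller, inputs) provenance
# tuples from assignments of call expressions in ML model code.
import ast

def extract_provenance(source):
    relationships = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            call = node.value
            if isinstance(call.func, ast.Attribute) and isinstance(call.func.value, ast.Name):
                caller, op = call.func.value.id, call.func.attr   # e.g. pd.read_csv
            elif isinstance(call.func, ast.Name):
                caller, op = None, call.func.id                   # bare function call
            else:
                continue
            # Constant arguments (e.g. file paths) reveal data sources.
            inputs = [a.value for a in call.args if isinstance(a, ast.Constant)]
            outputs = [t.id for t in node.targets if isinstance(t, ast.Name)]
            relationships.append((outputs, op, caller, inputs))
    return relationships

code = "df = pd.read_csv('train.csv')\nmodel = fit(df)"
print(extract_provenance(code))
# [(['df'], 'read_csv', 'pd', ['train.csv']), (['model'], 'fit', None, [])]
```

Notice that the `'train.csv'` constant surfaces as an input, which is exactly the kind of data-source dependency the patented approach aims to identify.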
-
Publication number: 20230385649
Abstract: Linguistic schema mapping via semi-supervised learning is used to map a customer schema to a particular industry-specific schema (ISS). The customer schema is received and a corresponding ISS is identified. An attribute in the customer schema is selected for labeling. Candidate pairs are generated that include the first attribute and one or more second attributes which may describe the first attribute. A featurizer determines similarities between the first attribute and second attribute in each generated pair, one or more suggested labels are generated by a machine learning (ML) model, and one of the suggested labels is applied to the first attribute.
Type: Application
Filed: May 28, 2022
Publication date: November 30, 2023
Inventors: Avrilia FLORATOU, Joyce Yu CAHOON, Subramaniam Venkatraman KRISHNAN, Andreas C. MUELLER, Dalitso Hansini BANDA, Fotis PSALLIDAS, Jignesh PATEL, Yunjia ZHANG
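The candidate-pair-plus-featurizer pipeline above can be sketched as follows. The real system scores pairs with a trained ML model; here a simple string-similarity ratio from `difflib` stands in, and the schema attribute names are invented:

```python
# Hedged sketch: generate (customer-attribute, schema-attribute) candidate
# pairs, score each with a stand-in featurizer, and suggest the best label.
from difflib import SequenceMatcher

def suggest_label(customer_attr, schema_attrs):
    """Score every candidate pair; return the best schema attribute and score."""
    scored = [(SequenceMatcher(None, customer_attr.lower(), s.lower()).ratio(), s)
              for s in schema_attrs]
    score, best = max(scored)
    return best, round(score, 2)

industry_schema = ["CustomerName", "PostalCode", "OrderDate"]
print(suggest_label("cust_name", industry_schema))
```

In the semi-supervised setting, each label a user confirms becomes new training signal for the model, improving later suggestions.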
-
Patent number: 11829359
Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
Type: Grant
Filed: July 29, 2022
Date of Patent: November 28, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Kameswara Venkatesh Emani, Avrilia Floratou, Carlo Aldo Curino, Karthik Saligrama Ramachandra, Alekh Jindal
-
Patent number: 11822454
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Grant
Filed: August 25, 2022
Date of Patent: November 21, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
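The candidate-identification step based on relative watermark latency can be sketched simply: an instance whose watermark trails the furthest-ahead instance by more than a threshold is flagged. The threshold value, instance names, and data shape below are illustrative, and the patented system's false-positive filters are omitted.

```python
# Simplified sketch of slow-instance candidate identification: relative
# watermark latency = frontier (max) watermark minus an instance's watermark.

def slow_instance_candidates(watermarks, threshold):
    """watermarks: instance name -> watermark timestamp (seconds)."""
    frontier = max(watermarks.values())
    return sorted(name for name, w in watermarks.items()
                  if frontier - w > threshold)

marks = {"op-1": 1000, "op-2": 998, "op-3": 960}
print(slow_instance_candidates(marks, threshold=30))  # ['op-3']
```

Using latency *relative* to the frontier, rather than absolute lag, avoids flagging the whole pipeline when input is simply delayed upstream.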
-
Patent number: 11775862
Abstract: A system enables tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Grant
Filed: January 14, 2020
Date of Patent: October 3, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
-
Publication number: 20230289154
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Application
Filed: March 11, 2022
Publication date: September 14, 2023
Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
-
Publication number: 20220405186
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Application
Filed: August 25, 2022
Publication date: December 22, 2022
Inventors: Ashvin AGRAWAL, Avrilia FLORATOU, Ke WANG, Daniel E. MUSGRAVE
-
Patent number: 11474945
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Grant
Filed: June 2, 2021
Date of Patent: October 18, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
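The interplay of the cache assignment determiner and the network bandwidth allocator can be illustrated with a small planning sketch. The greedy cache-fill policy, proportional bandwidth split, dataset names, and units below are all invented for illustration, not the patented algorithms.

```python
# Hypothetical sketch of the planning step: choose prefetch datasets that fit
# the local cache, then split network bandwidth across them in proportion to
# each dataset's input-bandwidth characteristic.

def plan_prefetch(future_jobs, cache_capacity_gb, network_bw_mbps):
    """future_jobs: list of (dataset, size_gb, input_bw_mbps), most valuable first."""
    assignment, used = [], 0.0
    for dataset, size, bw in future_jobs:
        if used + size <= cache_capacity_gb:   # cache assignment: greedy fill
            assignment.append((dataset, bw))
            used += size
    total_bw = sum(bw for _, bw in assignment)
    # Bandwidth assignment: proportional to each dataset's input bandwidth need.
    return {ds: round(network_bw_mbps * bw / total_bw, 1) for ds, bw in assignment}

jobs = [("logs/day1", 40, 300), ("clicks", 80, 100), ("images", 200, 500)]
print(plan_prefetch(jobs, cache_capacity_gb=150, network_bw_mbps=1000))
# {'logs/day1': 750.0, 'clicks': 250.0}
```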
-
Patent number: 11461213
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Grant
Filed: October 31, 2019
Date of Patent: October 4, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
-
Publication number: 20210286728
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Application
Filed: June 2, 2021
Publication date: September 16, 2021
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Patent number: 11093541
Abstract: A computer-implemented method according to one embodiment includes receiving an ontology language query, receiving a mapping of an ontology to a relational database, and generating a structured query language (SQL) query, utilizing the ontology language query and the mapping of the ontology to the relational database.
Type: Grant
Filed: July 18, 2016
Date of Patent: August 17, 2021
Assignee: International Business Machines Corporation
Inventors: Avrilia Floratou, Fatma Ozcan
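The translation step above, in its simplest form, looks up ontology terms in the mapping and emits SQL. The mapping structure and query shape below are invented for illustration; real ontology-to-relational mappings (and the generated SQL) are far richer.

```python
# Toy sketch: rewrite a (class, property) ontology query as SQL using a
# hypothetical mapping of ontology classes to tables and properties to columns.

def ontology_to_sql(ont_class, ont_property, mapping):
    table = mapping["classes"][ont_class]
    column = mapping["properties"][ont_property]
    return f"SELECT {column} FROM {table}"

mapping = {
    "classes": {"Employee": "emp"},
    "properties": {"hasSalary": "salary"},
}
print(ontology_to_sql("Employee", "hasSalary", mapping))  # SELECT salary FROM emp
```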
-
Publication number: 20210216905
Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Application
Filed: January 14, 2020
Publication date: July 15, 2021
Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
-
Patent number: 11055225
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Grant
Filed: October 22, 2019
Date of Patent: July 6, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Publication number: 20210133075
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Application
Filed: October 31, 2019
Publication date: May 6, 2021
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
-
Publication number: 20210096996
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Application
Filed: October 22, 2019
Publication date: April 1, 2021
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Patent number: 10642741
Abstract: A computer-implemented method according to one embodiment includes receiving a request for data, locating the data at one or more partitions of a heterogeneously partitioned table, determining an access method associated with each of the one or more partitions, and requesting the data from the one or more partitions, utilizing the access method associated with each of the one or more partitions.
Type: Grant
Filed: February 6, 2017
Date of Patent: May 5, 2020
Assignee: International Business Machines Corporation
Inventors: Avrilia Floratou, Fatma Ozcan, Mir H. Pirahesh, Navneet S. Potti
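The per-partition dispatch described above can be sketched as a lookup: each partition of the heterogeneously partitioned table records its own access method, and the reader plans access partition by partition. Partition names and access-method labels here are invented for illustration.

```python
# Illustrative sketch: each partition carries its own access method, and a
# request is planned by dispatching per partition.

PARTITIONS = [
    {"name": "2023", "access": "parquet_scan"},      # columnar, scan-oriented
    {"name": "2024", "access": "row_store_index"},   # row store with an index
]

def plan_access(requested, partitions):
    """Return (partition, access method) pairs for partitions holding the data."""
    return [(p["name"], p["access"]) for p in partitions if p["name"] in requested]

print(plan_access({"2024"}, PARTITIONS))  # [('2024', 'row_store_index')]
```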
-
Patent number: 10067885
Abstract: In one embodiment, a computer-implemented method includes inserting a set of accessed objects into a cache, where the set of accessed objects varies in size. An object includes a set of object components, and responsive to receiving a request to access the object, it is determined that the object does not fit into the cache given the set of accessed objects and a total size of the cache. A heuristic algorithm is applied, by a computer processor, to identify in the set of object components one or more object components for insertion into the cache. The heuristic algorithm considers at least a priority of the object compared to priorities of one or more objects in the set of accessed objects. The one or more object components are inserted into the cache.
Type: Grant
Filed: November 22, 2016
Date of Patent: September 4, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avrilia Floratou, Uday B. Kale, Nimrod Megiddo, Fatma Ozcan, Navneet S. Potti
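A minimal sketch of the component-admission idea, assuming a simple priority-greedy heuristic (the patented heuristic also weighs the whole object's priority against already-cached objects, which this toy omits). Component names, sizes, and priorities are invented.

```python
# Hedged sketch: when a whole object does not fit in the cache, admit only its
# highest-priority components that do fit in the remaining space.

def admit_components(components, free_space):
    """components: list of (name, size, priority). Greedily admit by priority."""
    admitted = []
    for name, size, priority in sorted(components, key=lambda c: -c[2]):
        if size <= free_space:
            admitted.append(name)
            free_space -= size
    return admitted

parts = [("header", 2, 10), ("index", 5, 8), ("payload", 20, 3)]
print(admit_components(parts, free_space=8))  # ['header', 'index']
```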