Patents by Inventor Avrilia Floratou

Avrilia Floratou has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents that have been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240126521
    Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
    Type: Application
    Filed: December 27, 2023
    Publication date: April 18, 2024
    Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
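The semantically-annotated dataset and "semantic-guided operation" described in the abstract above can be sketched as follows. This is an illustrative toy, not the patented implementation; every name here (`SemanticObject`, `AnnotatedDataset`, `normalize_zip_codes`, `suggest_operations`) is invented for the example.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SemanticObject:
    """A semantic label attached to a dataset column, e.g. 'zip_code'."""
    name: str

@dataclass
class AnnotatedDataset:
    rows: list                                        # list of dicts, one per record
    annotations: dict = field(default_factory=dict)   # column -> SemanticObject

    def annotate(self, column, semantic_object):
        self.annotations[column] = semantic_object
        return self

def normalize_zip_codes(dataset):
    """A semantic-guided operation: infer WHICH column to normalize from the
    annotations instead of asking the user to name it."""
    zip_cols = [c for c, s in dataset.annotations.items() if s.name == "zip_code"]
    for col in zip_cols:
        for row in dataset.rows:
            row[col] = str(row[col]).zfill(5)   # pad to 5 digits
    return dataset

def suggest_operations(dataset):
    """Suggest applicable operations based on the annotations alone."""
    suggestions = []
    if any(s.name == "zip_code" for s in dataset.annotations.values()):
        suggestions.append("normalize_zip_codes")
    return suggestions

ds = AnnotatedDataset(rows=[{"zip": 601}, {"zip": 90210}])
ds.annotate("zip", SemanticObject("zip_code"))
print(suggest_operations(ds))        # ['normalize_zip_codes']
normalize_zip_codes(ds)
print(ds.rows[0]["zip"])             # '00601'
```

The point of the annotation layer is that both the inference ("which column is a zip code?") and the suggestion ("you could normalize it") are driven by the semantic objects rather than by column names or user input.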
  • Patent number: 11900085
    Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
    Type: Grant
    Filed: March 11, 2022
    Date of Patent: February 13, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Avrilia Floratou, Andreas Christian Mueller, Dalitso Hansini Banda, Joyce Yu Cahoon, Anja Gruenheid, Neha Godwal
  • Publication number: 20240037097
    Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
    Type: Application
    Filed: October 13, 2023
    Publication date: February 1, 2024
    Inventors: Kameswara Venkatesh EMANI, Avrilia FLORATOU, Carlo Aldo CURINO, Karthik Saligrama RAMACHANDRA, Alekh JINDAL
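The deferral mechanism in the abstract above — accumulate translatable operations in a queue, then compile them into a single database query when an undeferrable operation arrives — can be sketched as below. The class, the operation vocabulary, and the SQL-building rules are all invented for this example.

```python
TRANSLATABLE = {"filter", "project"}   # ops we can push down into SQL

class DeferredQueue:
    def __init__(self, table):
        self.table = table
        self.translatable = []       # ops that can become one database query
        self.non_translatable = []   # ops that must run client-side

    def add(self, op, arg):
        """Route each data processing operation to the right queue portion."""
        if op in TRANSLATABLE:
            self.translatable.append((op, arg))
        else:
            self.non_translatable.append((op, arg))

    def compile_sql(self):
        """Flush the translatable portion into a single database query."""
        cols, preds = "*", []
        for op, arg in self.translatable:
            if op == "project":
                cols = ", ".join(arg)
            elif op == "filter":
                preds.append(arg)
        sql = f"SELECT {cols} FROM {self.table}"
        if preds:
            sql += " WHERE " + " AND ".join(preds)
        self.translatable.clear()
        return sql

q = DeferredQueue("sales")
q.add("filter", "year = 2023")
q.add("project", ["region", "total"])
# An undeferrable operation (e.g. plotting the result) forces compilation:
print(q.compile_sql())
# SELECT region, total FROM sales WHERE year = 2023
```

The benefit is that many small client-side statements collapse into one query executed by the database engine, which only materializes a result dataset when something genuinely needs it.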
  • Publication number: 20230394369
    Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
    Type: Application
    Filed: August 21, 2023
    Publication date: December 7, 2023
    Inventors: Avrilia FLORATOU, Ashvin AGRAWAL, MohammadHossein NAMAKI, Subramaniam Venkatraman KRISHNAN, Fotios PSALLIDAS, Yinghui WU
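The provenance-relationship (PR) structure in the abstract above — inputs, operation, caller, outputs, derived from AST nodes — can be approximated with Python's standard `ast` module. This is a minimal sketch, not the patented WIR pipeline; the tuple shape and the restriction to simple call assignments are simplifications made for the example.

```python
import ast

def extract_prs(source):
    """Return (outputs, operation, caller, inputs) for each call assignment."""
    prs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            call = node.value
            outputs = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if isinstance(call.func, ast.Attribute):
                operation = call.func.attr                 # e.g. 'fit'
                caller = ast.unparse(call.func.value)      # e.g. 'clf'
            else:
                operation = call.func.id
                caller = None
            inputs = [a.id for a in call.args if isinstance(a, ast.Name)]
            prs.append((outputs, operation, caller, inputs))
    return prs

code = """
df = pd.read_csv(path)
model = clf.fit(df, labels)
"""
for pr in extract_prs(code):
    print(pr)
# (['df'], 'read_csv', 'pd', ['path'])
# (['model'], 'fit', 'clf', ['df', 'labels'])
```

Chaining these PRs (the output `df` of the first is an input of the second) is what lets a provenance tracker walk back from a trained model to the data sources it was trained on.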
  • Publication number: 20230385649
    Abstract: Linguistic schema mapping via semi-supervised learning is used to map a customer schema to a particular industry-specific schema (ISS). The customer schema is received and a corresponding ISS is identified. An attribute in the customer schema is selected for labeling. Candidate pairs are generated that include the first attribute and one or more second attributes which may describe the first attribute. A featurizer determines similarities between the first attribute and second attribute in each generated pair, one or more suggested labels are generated by a machine learning (ML) model, and one of the suggested labels is applied to the first attribute.
    Type: Application
    Filed: May 28, 2022
    Publication date: November 30, 2023
    Inventors: Avrilia FLORATOU, Joyce Yu CAHOON, Subramaniam Venkatraman KRISHNAN, Andreas C. MUELLER, Dalitso Hansini BANDA, Fotis PSALLIDAS, Jignesh PATEL, Yunjia ZHANG
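The candidate-pair / featurizer / suggested-label flow in the abstract above can be sketched as follows. The real system uses a trained ML model over learned features; here a plain string-similarity score stands in for it, and the sample ISS attribute names are invented.

```python
from difflib import SequenceMatcher

ISS_ATTRIBUTES = ["customer_name", "order_date", "total_amount"]  # assumed ISS

def candidate_pairs(customer_attr, iss_attributes):
    """Pair the customer attribute with every ISS attribute it might match."""
    return [(customer_attr, iss) for iss in iss_attributes]

def featurize(pair):
    """One similarity feature between the two attribute names (0..1).
    A stand-in for the patent's featurizer."""
    a, b = pair
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def suggest_labels(customer_attr, iss_attributes, top_k=2):
    """Rank candidate pairs by similarity and return the top suggestions."""
    pairs = candidate_pairs(customer_attr, iss_attributes)
    ranked = sorted(pairs, key=featurize, reverse=True)
    return [iss for _, iss in ranked[:top_k]]

# 'customer_name' ranks first for the customer-schema attribute 'cust_name':
print(suggest_labels("cust_name", ISS_ATTRIBUTES))
```

In the semi-supervised setting, a suggested label the user confirms becomes new training signal for the model, so suggestions improve as more of the customer schema gets mapped.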
  • Patent number: 11829359
    Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
    Type: Grant
    Filed: July 29, 2022
    Date of Patent: November 28, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Kameswara Venkatesh Emani, Avrilia Floratou, Carlo Aldo Curino, Karthik Saligrama Ramachandra, Alekh Jindal
  • Patent number: 11822454
    Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
    Type: Grant
    Filed: August 25, 2022
    Date of Patent: November 21, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
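The detection-then-filtering pipeline in the abstract above — flag instances by relative watermark latency, then drop false positives such as instances still stabilizing after a recent mitigation — can be sketched as below. The threshold values and the single stabilization filter are invented for this example.

```python
import time

def slow_candidates(watermarks, threshold):
    """Instances whose watermark trails the most advanced peer by > threshold
    (relative watermark latency)."""
    frontier = max(watermarks.values())
    return {i for i, w in watermarks.items() if frontier - w > threshold}

def filter_stabilizing(candidates, last_mitigation, stabilization_secs, now):
    """Drop candidates still within the stabilization window after a recent
    mitigation — mitigating them again would be a false positive."""
    return {i for i in candidates
            if now - last_mitigation.get(i, float("-inf")) > stabilization_secs}

watermarks = {"op-1": 1000, "op-2": 995, "op-3": 700}   # event-time watermarks
cands = slow_candidates(watermarks, threshold=100)
print(cands)                                             # {'op-3'}

now = time.time()
# op-3 was mitigated 5 seconds ago; with a 30-second stabilization window it
# is filtered out rather than mitigated again:
print(filter_stabilizing(cands, {"op-3": now - 5}, 30, now))   # set()
```

Using latency *relative* to peers (rather than absolute latency) is what keeps a globally slow but uniformly progressing pipeline from flagging every instance at once.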
  • Patent number: 11775862
    Abstract: A system enables tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
    Type: Grant
    Filed: January 14, 2020
    Date of Patent: October 3, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
  • Publication number: 20230289154
    Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
    Type: Application
    Filed: March 11, 2022
    Publication date: September 14, 2023
    Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
  • Publication number: 20220405186
    Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
    Type: Application
    Filed: August 25, 2022
    Publication date: December 22, 2022
    Inventors: Ashvin AGRAWAL, Avrilia FLORATOU, Ke WANG, Daniel E. MUSGRAVE
  • Patent number: 11474945
    Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
    Type: Grant
    Filed: June 2, 2021
    Date of Patent: October 18, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
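The predictor / cache-assignment / bandwidth-allocator pipeline in the abstract above can be sketched end to end as follows. The recurrence heuristic, the greedy packing, and the even bandwidth split are all simplifications invented for illustration.

```python
def plan_prefetch(job_history, cache_capacity_gb, total_bandwidth):
    # 1. "Future workload predictor": assume jobs seen repeatedly will recur.
    recurring = [j for j in job_history if j["runs"] >= 2]
    # 2. "Cache assignment determiner": greedily pick input datasets that fit
    #    in the local cache, preferring the most frequently re-read data.
    chosen, used = [], 0.0
    for job in sorted(recurring, key=lambda j: j["runs"], reverse=True):
        if used + job["input_gb"] <= cache_capacity_gb:
            chosen.append(job["dataset"])
            used += job["input_gb"]
    # 3. "Network bandwidth allocator": split bandwidth evenly across fetches.
    bw = total_bandwidth / len(chosen) if chosen else 0
    return chosen, bw

history = [
    {"dataset": "logs/day1",   "input_gb": 40, "runs": 5},
    {"dataset": "logs/day2",   "input_gb": 80, "runs": 3},
    {"dataset": "tmp/scratch", "input_gb": 10, "runs": 1},  # one-off: skipped
]
print(plan_prefetch(history, cache_capacity_gb=150, total_bandwidth=10.0))
# (['logs/day1', 'logs/day2'], 5.0)
```

The "plan instructor" role from the abstract would then hand this (dataset, bandwidth) plan to a compute resource, which loads the data into the cluster-local cache before the predicted jobs arrive.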
  • Patent number: 11461213
    Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: October 4, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
  • Publication number: 20210286728
    Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
    Type: Application
    Filed: June 2, 2021
    Publication date: September 16, 2021
    Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
  • Patent number: 11093541
    Abstract: A computer-implemented method according to one embodiment includes receiving an ontology language query, receiving a mapping of an ontology to a relational database, and generating a structured query language (SQL) query, utilizing the ontology language query and the mapping of the ontology to the relational database.
    Type: Grant
    Filed: July 18, 2016
    Date of Patent: August 17, 2021
    Assignee: International Business Machines Corporation
    Inventors: Avrilia Floratou, Fatma Ozcan
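The translation in the abstract above — an ontology-language query plus an ontology-to-relational mapping yielding SQL — can be illustrated with a toy example. The mapping format and the SPARQL-style identifiers are invented; real systems handle joins, filters, and inference, not just this one-table case.

```python
# Mapping of ontology classes/properties to relational tables/columns:
MAPPING = {
    "class":    {"ex:Employee": "employees"},
    "property": {"ex:name": "name", "ex:salary": "salary"},
}

def ontology_to_sql(klass, properties, mapping=MAPPING):
    """SELECT the mapped columns from the table mapped to the ontology class."""
    table = mapping["class"][klass]
    cols = ", ".join(mapping["property"][p] for p in properties)
    return f"SELECT {cols} FROM {table}"

print(ontology_to_sql("ex:Employee", ["ex:name", "ex:salary"]))
# SELECT name, salary FROM employees
```

The mapping is the key artifact: it lets users query in ontology vocabulary while execution happens entirely inside the relational engine.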
  • Publication number: 20210216905
    Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
    Type: Application
    Filed: January 14, 2020
    Publication date: July 15, 2021
    Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
  • Patent number: 11055225
    Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
    Type: Grant
    Filed: October 22, 2019
    Date of Patent: July 6, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
  • Publication number: 20210133075
    Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
    Type: Application
    Filed: October 31, 2019
    Publication date: May 6, 2021
    Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
  • Publication number: 20210096996
    Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
    Type: Application
    Filed: October 22, 2019
    Publication date: April 1, 2021
    Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
  • Patent number: 10642741
    Abstract: A computer-implemented method according to one embodiment includes receiving a request for data, locating the data at one or more partitions of a heterogeneously partitioned table, determining an access method associated with each of the one or more partitions, and requesting the data from the one or more partitions, utilizing the access method associated with each of the one or more partitions.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: May 5, 2020
    Assignee: International Business Machines Corporation
    Inventors: Avrilia Floratou, Fatma Ozcan, Mir H. Pirahesh, Navneet S. Potti
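The locate-then-dispatch flow in the abstract above — find the partitions holding the data, then read each with its own access method — can be sketched as below. The partition metadata and the two stubbed access methods are invented for this example.

```python
PARTITIONS = [
    {"range": (0, 1000),    "method": "row_store"},     # hot, transactional data
    {"range": (1000, 9999), "method": "parquet_scan"},  # cold, columnar data
]

def access(partition, key):
    """Each access method would be a different reader; stubbed here to show
    which method was dispatched."""
    return f"{partition['method']}:key={key}"

def lookup(key):
    """Locate the partition holding `key`, then use its access method."""
    for part in PARTITIONS:
        lo, hi = part["range"]
        if lo <= key < hi:
            return access(part, key)
    raise KeyError(key)

print(lookup(42))      # row_store:key=42
print(lookup(5000))    # parquet_scan:key=5000
```

A single query spanning both ranges would issue both kinds of reads, which is what makes the table "heterogeneously" partitioned: one logical table, multiple physical access paths.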
  • Patent number: 10067885
    Abstract: In one embodiment, a computer-implemented method includes inserting a set of accessed objects into a cache, where the set of accessed objects varies in size. An object includes a set of object components, and responsive to receiving a request to access the object, it is determined that the object does not fit into the cache given the set of accessed objects and a total size of the cache. A heuristic algorithm is applied, by a computer processor, to identify in the set of object components one or more object components for insertion into the cache. The heuristic algorithm considers at least a priority of the object compared to priorities of one or more objects in the set of accessed objects. The one or more object components are inserted into the cache.
    Type: Grant
    Filed: November 22, 2016
    Date of Patent: September 4, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Avrilia Floratou, Uday B. Kale, Nimrod Megiddo, Fatma Ozcan, Navneet S. Potti
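The component-level admission heuristic in the abstract above — if a whole object does not fit, cache only the components selected by a priority-aware heuristic — can be sketched as below. The data shapes and the specific "highest component priority first" rule are invented for this example.

```python
def admit(cache, capacity, obj):
    """Insert the whole object if it fits; otherwise insert the highest-
    priority components that do fit, in descending priority order."""
    used = sum(size for size, _ in cache.values())
    total = sum(c["size"] for c in obj["components"])
    if used + total <= capacity:
        for c in obj["components"]:
            cache[c["name"]] = (c["size"], obj["priority"])
        return [c["name"] for c in obj["components"]]
    admitted = []
    for c in sorted(obj["components"], key=lambda c: c["priority"], reverse=True):
        if used + c["size"] <= capacity:
            cache[c["name"]] = (c["size"], c["priority"])
            used += c["size"]
            admitted.append(c["name"])
    return admitted

cache = {"existing": (6, 1)}            # name -> (size, priority)
obj = {"priority": 5, "components": [
    {"name": "index",   "size": 2, "priority": 9},
    {"name": "payload", "size": 8, "priority": 4},
]}
print(admit(cache, capacity=10, obj=obj))   # ['index'] — only the index fits
```

Caching the high-priority component (the index) instead of rejecting the object outright is the whole point of the heuristic: a partial hit still avoids most of the miss cost.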