Patents by Inventor Avrilia Floratou
Avrilia Floratou has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240126521
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Application
Filed: December 27, 2023
Publication date: April 18, 2024
Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
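The core idea above (datasets that carry semantic annotations, which in turn drive suggested operations) can be illustrated with a minimal sketch. All class and operation names here are invented for illustration and do not reflect the patented library's actual API.

```python
# Minimal sketch (all names hypothetical) of a library whose datasets carry
# semantic objects, and whose annotations drive operation suggestions.

class SemanticObject:
    """A semantic label, e.g. 'zipcode' or 'temperature'."""
    def __init__(self, name, suggested_ops):
        self.name = name
        self.suggested_ops = suggested_ops  # operations meaningful for this label

class AnnotatedDataset:
    """A dataset paired with per-column semantic objects."""
    def __init__(self, rows, annotations):
        self.rows = rows                # list of row dicts
        self.annotations = annotations  # column name -> SemanticObject

def suggest_operations(dataset):
    """Suggest data manipulation operations based on the semantic annotations."""
    return {column: sem.suggested_ops
            for column, sem in dataset.annotations.items()}

# Usage: annotate a column as a zipcode, then ask for suggestions.
zipcode = SemanticObject("zipcode", ["validate_format", "join_with_geo_table"])
ds = AnnotatedDataset([{"zip": "53703"}], {"zip": zipcode})
print(suggest_operations(ds))  # {'zip': ['validate_format', 'join_with_geo_table']}
```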
-
Patent number: 11900085
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Grant
Filed: March 11, 2022
Date of Patent: February 13, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Avrilia Floratou, Andreas Christian Mueller, Dalitso Hansini Banda, Joyce Yu Cahoon, Anja Gruenheid, Neha Godwal
-
Publication number: 20240037097
Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
Type: Application
Filed: October 13, 2023
Publication date: February 1, 2024
Inventors: Kameswara Venkatesh EMANI, Avrilia FLORATOU, Carlo Aldo CURINO, Karthik Saligrama RAMACHANDRA, Alekh JINDAL
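The queue mechanism described above can be sketched in a few lines: translatable operations (here, filter predicates) are deferred and batched, and an undeferrable operation flushes them into a single database query. The predicate-only operation model and SQL shape are simplifications invented for illustration.

```python
# Hypothetical sketch of the translatable/non-translatable queue: deferred
# filters are compiled into one SQL query when an undeferrable operation
# (e.g. plotting the data) forces materialization.

class OperationQueue:
    def __init__(self):
        self.translatable = []      # operations we can fold into one SQL query
        self.non_translatable = []  # operations that must run outside the database

    def add_filter(self, predicate):
        # A filter predicate is translatable: record it for later batching.
        self.translatable.append(predicate)

    def flush(self, table):
        """Compile the translatable portion of the queue into a database query."""
        if not self.translatable:
            return f"SELECT * FROM {table}"
        where = " AND ".join(self.translatable)
        self.translatable.clear()
        return f"SELECT * FROM {table} WHERE {where}"

q = OperationQueue()
q.add_filter("price > 100")
q.add_filter("region = 'EU'")
sql = q.flush("sales")
print(sql)  # SELECT * FROM sales WHERE price > 100 AND region = 'EU'
```

Deferring in this way lets the database engine evaluate the whole batch at once instead of round-tripping per statement, which is what makes the approach scalable.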
-
Publication number: 20230394369
Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Application
Filed: August 21, 2023
Publication date: December 7, 2023
Inventors: Avrilia FLORATOU, Ashvin AGRAWAL, MohammadHossein NAMAKI, Subramaniam Venkatraman KRISHNAN, Fotios PSALLIDAS, Yinghui WU
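A provenance relationship, as defined above, pairs input variables, an operation, a caller, and output variables derived from AST nodes. The following toy sketch (simplified far beyond the patented system, and not its actual WIR format) walks the AST of some ML model code and emits such tuples:

```python
# Illustrative sketch: extract (outputs, operation, caller, inputs) provenance
# tuples from assignments of call expressions in ML model code.
import ast

def extract_provenance(source):
    relationships = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Call):
            call = node.value
            if isinstance(call.func, ast.Attribute) and isinstance(call.func.value, ast.Name):
                caller, op = call.func.value.id, call.func.attr   # e.g. pd.read_csv
            elif isinstance(call.func, ast.Name):
                caller, op = None, call.func.id                   # bare function call
            else:
                continue
            # Constant arguments (e.g. file paths) reveal data sources.
            inputs = [a.value for a in call.args if isinstance(a, ast.Constant)]
            outputs = [t.id for t in node.targets if isinstance(t, ast.Name)]
            relationships.append((outputs, op, caller, inputs))
    return relationships

code = "df = pd.read_csv('train.csv')\nmodel = fit(df)"
print(extract_provenance(code))
# [(['df'], 'read_csv', 'pd', ['train.csv']), (['model'], 'fit', None, [])]
```

Notice that the `'train.csv'` constant surfaces as an input, which is exactly the kind of data-source dependency the patented approach aims to identify.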
-
Publication number: 20230385649
Abstract: Linguistic schema mapping via semi-supervised learning is used to map a customer schema to a particular industry-specific schema (ISS). The customer schema is received and a corresponding ISS is identified. An attribute in the customer schema is selected for labeling. Candidate pairs are generated that include the first attribute and one or more second attributes which may describe the first attribute. A featurizer determines similarities between the first attribute and second attribute in each generated pair, one or more suggested labels are generated by a machine learning (ML) model, and one of the suggested labels is applied to the first attribute.
Type: Application
Filed: May 28, 2022
Publication date: November 30, 2023
Inventors: Avrilia FLORATOU, Joyce Yu CAHOON, Subramaniam Venkatraman KRISHNAN, Andreas C. MUELLER, Dalitso Hansini BANDA, Fotis PSALLIDAS, Jignesh PATEL, Yunjia ZHANG
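The candidate-pair-plus-featurizer pipeline above can be sketched as follows. The real system scores pairs with a trained ML model; here a simple string-similarity ratio from `difflib` stands in, and the schema attribute names are invented:

```python
# Hedged sketch: generate (customer-attribute, schema-attribute) candidate
# pairs, score each with a stand-in featurizer, and suggest the best label.
from difflib import SequenceMatcher

def suggest_label(customer_attr, schema_attrs):
    """Score every candidate pair; return the best schema attribute and score."""
    scored = [(SequenceMatcher(None, customer_attr.lower(), s.lower()).ratio(), s)
              for s in schema_attrs]
    score, best = max(scored)
    return best, round(score, 2)

industry_schema = ["CustomerName", "PostalCode", "OrderDate"]
print(suggest_label("cust_name", industry_schema))
```

In the semi-supervised setting, each label a user confirms becomes new training signal for the model, improving later suggestions.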
-
Patent number: 11829359
Abstract: Systems, methods, and devices are described for performing scalable data processing operations. A queue that includes a translatable portion comprising indications of data processing operations translatable to data queries and a non-translatable portion comprising indications of non-translatable data processing operations is maintained. A determination that a first data processing operation of a first code block statement is translatable to a database query is made. An indication of the first data processing operation is included in the translatable portion of the queue. Responsive to a determination that a second data processing operation of a second code block statement is undeferrable, the translatable portion of the queue is compiled into a database query. An execution of the database query to be executed by a database engine to generate a query result is caused. A result dataset corresponding to the query result is transmitted to an application configured to analyze the result dataset.
Type: Grant
Filed: July 29, 2022
Date of Patent: November 28, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Kameswara Venkatesh Emani, Avrilia Floratou, Carlo Aldo Curino, Karthik Saligrama Ramachandra, Alekh Jindal
-
Patent number: 11822454
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Grant
Filed: August 25, 2022
Date of Patent: November 21, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
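The candidate-identification step based on relative watermark latency can be sketched simply: an instance whose watermark trails the furthest-ahead instance by more than a threshold is flagged. The threshold value, instance names, and data shape below are illustrative, and the patented system's false-positive filters are omitted.

```python
# Simplified sketch of slow-instance candidate identification: relative
# watermark latency = frontier (max) watermark minus an instance's watermark.

def slow_instance_candidates(watermarks, threshold):
    """watermarks: instance name -> watermark timestamp (seconds)."""
    frontier = max(watermarks.values())
    return sorted(name for name, w in watermarks.items()
                  if frontier - w > threshold)

marks = {"op-1": 1000, "op-2": 998, "op-3": 960}
print(slow_instance_candidates(marks, threshold=30))  # ['op-3']
```

Using latency *relative* to the frontier, rather than absolute lag, avoids flagging the whole pipeline when input is simply delayed upstream.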
-
Patent number: 11775862
Abstract: A system enables tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Grant
Filed: January 14, 2020
Date of Patent: October 3, 2023
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
-
Publication number: 20230289154
Abstract: Systems, methods, and devices are described for enabling a user to import a library into a computer program under development. The library includes a data storage interface, one or more semantic objects, and one or more data manipulation or data analysis operations. A user is able to reference code of the library within the computer program under development to generate a dataset from data obtained via the data storage interface and associate the one or more semantic objects with the dataset to generate a semantically-annotated dataset. Systems, methods, and devices enable, based on the importing: the user to invoke a semantic-guided operation of the library that utilizes the semantically-annotated dataset to infer an aspect of a data manipulation or data analysis operation to be performed on the semantically-annotated dataset; or the suggestion of a data manipulation or data analysis operation to the user based on the semantically-annotated dataset.
Type: Application
Filed: March 11, 2022
Publication date: September 14, 2023
Inventors: Avrilia FLORATOU, Andreas Christian MUELLER, Dalitso Hansini BANDA, Joyce Yu CAHOON, Anja GRUENHEID, Neha GODWAL
-
Publication number: 20220405186
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Application
Filed: August 25, 2022
Publication date: December 22, 2022
Inventors: Ashvin AGRAWAL, Avrilia FLORATOU, Ke WANG, Daniel E. MUSGRAVE
-
Patent number: 11474945
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Grant
Filed: June 2, 2021
Date of Patent: October 18, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
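The interplay of the cache assignment determiner and the network bandwidth allocator can be illustrated with a small planning sketch. The greedy cache-fill policy, proportional bandwidth split, dataset names, and units below are all invented for illustration, not the patented algorithms.

```python
# Hypothetical sketch of the planning step: choose prefetch datasets that fit
# the local cache, then split network bandwidth across them in proportion to
# each dataset's input-bandwidth characteristic.

def plan_prefetch(future_jobs, cache_capacity_gb, network_bw_mbps):
    """future_jobs: list of (dataset, size_gb, input_bw_mbps), most valuable first."""
    assignment, used = [], 0.0
    for dataset, size, bw in future_jobs:
        if used + size <= cache_capacity_gb:   # cache assignment: greedy fill
            assignment.append((dataset, bw))
            used += size
    total_bw = sum(bw for _, bw in assignment)
    # Bandwidth assignment: proportional to each dataset's input bandwidth need.
    return {ds: round(network_bw_mbps * bw / total_bw, 1) for ds, bw in assignment}

jobs = [("logs/day1", 40, 300), ("clicks", 80, 100), ("images", 200, 500)]
print(plan_prefetch(jobs, cache_capacity_gb=150, network_bw_mbps=1000))
# {'logs/day1': 750.0, 'clicks': 250.0}
```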
-
Patent number: 11461213
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Grant
Filed: October 31, 2019
Date of Patent: October 4, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
-
Publication number: 20210286728
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Application
Filed: June 2, 2021
Publication date: September 16, 2021
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Patent number: 11093541
Abstract: A computer-implemented method according to one embodiment includes receiving an ontology language query, receiving a mapping of an ontology to a relational database, and generating a structured query language (SQL) query, utilizing the ontology language query and the mapping of the ontology to the relational database.
Type: Grant
Filed: July 18, 2016
Date of Patent: August 17, 2021
Assignee: International Business Machines Corporation
Inventors: Avrilia Floratou, Fatma Ozcan
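The translation step above, in its simplest form, looks up ontology terms in the mapping and emits SQL. The mapping structure and query shape below are invented for illustration; real ontology-to-relational mappings (and the generated SQL) are far richer.

```python
# Toy sketch: rewrite a (class, property) ontology query as SQL using a
# hypothetical mapping of ontology classes to tables and properties to columns.

def ontology_to_sql(ont_class, ont_property, mapping):
    table = mapping["classes"][ont_class]
    column = mapping["properties"][ont_property]
    return f"SELECT {column} FROM {table}"

mapping = {
    "classes": {"Employee": "emp"},
    "properties": {"hasSalary": "salary"},
}
print(ontology_to_sql("Employee", "hasSalary", mapping))  # SELECT salary FROM emp
```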
-
Publication number: 20210216905
Abstract: Embodiments described herein enable tracking machine learning (“ML”) model data provenance. In particular, a computing device is configured to accept ML model code that, when executed, instantiates and trains an ML model, to parse the ML model code into a workflow intermediate representation (WIR), to semantically annotate the WIR to provide an annotated WIR, and to identify, based on the annotated WIR and ML API corresponding to the ML model code, data from at least one data source that is relied upon by the ML model code when training the ML model. A WIR may be generated from an abstract syntax tree (AST) based on the ML model code, generating provenance relationships (PRs) based at least in part on relationships between nodes of the AST, wherein a PR comprises one or more input variables, an operation, a caller, and one or more output variables.
Type: Application
Filed: January 14, 2020
Publication date: July 15, 2021
Inventors: Avrilia Floratou, Ashvin Agrawal, MohammadHossein Namaki, Subramaniam Venkatraman Krishnan, Fotios Psallidas, Yinghui Wu
-
Patent number: 11055225
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Grant
Filed: October 22, 2019
Date of Patent: July 6, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Publication number: 20210133075
Abstract: A system is described herein for mitigating slow process instances in a streaming application. The system includes a slow process instance candidate identifier configured to identify, based on a relative watermark latency, a set of slow process instance candidates from among a plurality of process instances that comprise the streaming application. The system further includes a set of filters configured to remove false positives from the set of slow process instance candidates. The filters account for window operations performed by the process instances as well as stabilization time needed for downstream process instances to stabilize after a slow upstream process instance is mitigated by a mitigation implementer, which may also be included in the system.
Type: Application
Filed: October 31, 2019
Publication date: May 6, 2021
Inventors: Ashvin Agrawal, Avrilia Floratou, Ke Wang, Daniel E. Musgrave
-
Publication number: 20210096996
Abstract: Methods, systems, apparatuses, and computer program products are provided for prefetching data. A workload analyzer may identify job characteristics for a plurality of previously executed jobs in a workload executing on a cluster of one or more compute resources. For each job, identified job characteristics may include identification of an input dataset and an input bandwidth characteristic for the input dataset. A future workload predictor may identify future jobs expected to execute on the cluster based at least on the identified job characteristics. A cache assignment determiner may determine a cache assignment that identifies a prefetch dataset for at least one of the future jobs. A network bandwidth allocator may determine a network bandwidth assignment for the prefetch dataset. A plan instructor may instruct a compute resource of the cluster to load data to a cache local to the cluster according to the cache assignment and the network bandwidth assignment.
Type: Application
Filed: October 22, 2019
Publication date: April 1, 2021
Inventors: Virajith Jalaparti, Sriram S. Rao, Christopher W. Douglas, Ashvin Agrawal, Avrilia Floratou, Ishai Menache, Srikanth Kandula, Mainak Ghosh, Joseph Naor
-
Patent number: 10642741
Abstract: A computer-implemented method according to one embodiment includes receiving a request for data, locating the data at one or more partitions of a heterogeneously partitioned table, determining an access method associated with each of the one or more partitions, and requesting the data from the one or more partitions, utilizing the access method associated with each of the one or more partitions.
Type: Grant
Filed: February 6, 2017
Date of Patent: May 5, 2020
Assignee: International Business Machines Corporation
Inventors: Avrilia Floratou, Fatma Ozcan, Mir H. Pirahesh, Navneet S. Potti
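The per-partition dispatch described above can be sketched as a lookup: each partition of the heterogeneously partitioned table records its own access method, and the reader plans access partition by partition. Partition names and access-method labels here are invented for illustration.

```python
# Illustrative sketch: each partition carries its own access method, and a
# request is planned by dispatching per partition.

PARTITIONS = [
    {"name": "2023", "access": "parquet_scan"},      # columnar, scan-oriented
    {"name": "2024", "access": "row_store_index"},   # row store with an index
]

def plan_access(requested, partitions):
    """Return (partition, access method) pairs for partitions holding the data."""
    return [(p["name"], p["access"]) for p in partitions if p["name"] in requested]

print(plan_access({"2024"}, PARTITIONS))  # [('2024', 'row_store_index')]
```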
-
Patent number: 10067885
Abstract: In one embodiment, a computer-implemented method includes inserting a set of accessed objects into a cache, where the set of accessed objects varies in size. An object includes a set of object components, and responsive to receiving a request to access the object, it is determined that the object does not fit into the cache given the set of accessed objects and a total size of the cache. A heuristic algorithm is applied, by a computer processor, to identify in the set of object components one or more object components for insertion into the cache. The heuristic algorithm considers at least a priority of the object compared to priorities of one or more objects in the set of accessed objects. The one or more object components are inserted into the cache.
Type: Grant
Filed: November 22, 2016
Date of Patent: September 4, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avrilia Floratou, Uday B. Kale, Nimrod Megiddo, Fatma Ozcan, Navneet S. Potti
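A minimal sketch of the component-admission idea, assuming a simple priority-greedy heuristic (the patented heuristic also weighs the whole object's priority against already-cached objects, which this toy omits). Component names, sizes, and priorities are invented.

```python
# Hedged sketch: when a whole object does not fit in the cache, admit only its
# highest-priority components that do fit in the remaining space.

def admit_components(components, free_space):
    """components: list of (name, size, priority). Greedily admit by priority."""
    admitted = []
    for name, size, priority in sorted(components, key=lambda c: -c[2]):
        if size <= free_space:
            admitted.append(name)
            free_space -= size
    return admitted

parts = [("header", 2, 10), ("index", 5, 8), ("payload", 20, 3)]
print(admit_components(parts, free_space=8))  # ['header', 'index']
```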