Patents by Inventor Jingren Zhou

Jingren Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20170351735
    Abstract: Runtime statistics from the actual performance of operations on a set of data are collected and utilized to dynamically modify the execution plan for processing that data. The operations performed are modified to include statistics collection operations, the statistics being tailored to the specific operations being quantified. An optimization policy defines how often optimization is attempted and how much more efficient a new execution plan must be to justify transitioning from the current one. Optimization is based on the collected runtime statistics and also takes into account already materialized intermediate data, gaining further efficiency by avoiding reprocessing.
    Type: Application
    Filed: August 23, 2017
    Publication date: December 7, 2017
    Inventors: Nicolas Bruno, Jingren Zhou
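
The abstract above describes collecting runtime statistics, re-optimizing only when a policy threshold is met, and reusing already materialized intermediate data. A minimal Python sketch of that decision, using purely hypothetical names rather than the patented implementation, might look like this:

```python
# Illustrative sketch only: dynamic re-optimization driven by runtime statistics.
# All class and function names are hypothetical, not taken from the patent.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Stage:
    name: str
    estimated_rows: int                  # optimizer's original cardinality estimate
    observed_rows: Optional[int] = None  # filled in by an injected statistics operator

def effective_cost(stages, materialized):
    """Cost a plan using observed statistics where available, skipping stages whose
    intermediate output is already materialized and can be reused."""
    total = 0
    for stage in stages:
        if stage.name in materialized:
            continue                     # reuse the materialized intermediate
        total += stage.observed_rows if stage.observed_rows is not None else stage.estimated_rows
    return total

def maybe_switch(current, candidate, materialized, min_gain=0.2):
    """Adopt the candidate plan only if it is at least `min_gain` cheaper, mirroring
    the policy's threshold for how much more efficient a new plan must be."""
    if effective_cost(candidate, materialized) < (1 - min_gain) * effective_cost(current, materialized):
        return candidate
    return current

# The scan's observed cardinality is far below the estimate, so a candidate plan
# that reorders the downstream joins now wins, and the scan output is reused.
current = [Stage("scan", 1_000_000, observed_rows=5_000), Stage("join_ab", 2_000_000)]
candidate = [Stage("scan", 1_000_000, observed_rows=5_000), Stage("join_ba", 50_000)]
chosen = maybe_switch(current, candidate, materialized={"scan"})
```

Here the observed scan cardinality overrides the original estimate, and the materialized scan output is excluded from the cost of both plans.
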
  • Patent number: 9836701
    Abstract: A method for machine learning a data set in a data processing framework is disclosed. A forest is trained on the data set, generating a plurality of trees in parallel. Each tree includes leaf nodes having a constant weight. A discriminative value for each leaf node is learned with a supervised model. The forest is reconstructed with the discriminative values replacing the constant weight of each leaf node.
    Type: Grant
    Filed: August 13, 2014
    Date of Patent: December 5, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Weizhu Chen, Wei Lin, Jingren Zhou
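
The abstract above describes replacing the constant leaf weights of a trained forest with discriminative values learned by a supervised model. A rough sketch of the re-learning step, assuming the forest has already been trained and each sample's per-tree leaf index is known (all names hypothetical), could be:

```python
# Illustrative sketch only: re-learn one discriminative value per forest leaf.
import numpy as np

def leaf_features(leaf_ids, leaves_per_tree):
    """leaf_ids: (n_samples, n_trees) array of per-tree leaf indices.
    Returns a 0/1 matrix with one column per (tree, leaf) pair in the forest."""
    n_samples, n_trees = leaf_ids.shape
    offsets = np.concatenate(([0], np.cumsum(leaves_per_tree[:-1])))
    cols = leaf_ids + offsets            # global column index of each landed leaf
    F = np.zeros((n_samples, int(np.sum(leaves_per_tree))))
    F[np.arange(n_samples)[:, None], cols] = 1.0
    return F

def relearn_leaf_values(leaf_ids, leaves_per_tree, y, l2=1.0):
    """Fit one weight per leaf with a supervised linear (ridge) model; these weights
    replace the constant leaf values when the forest is reconstructed."""
    F = leaf_features(leaf_ids, leaves_per_tree)
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + l2 * np.eye(n), F.T @ y)

# Toy example: 4 samples, 2 trees with 2 leaves each.
leaf_ids = np.array([[0, 1], [1, 0], [0, 0], [1, 1]])
leaves_per_tree = np.array([2, 2])
y = np.array([1.0, 0.0, 0.5, 0.2])
new_leaf_values = relearn_leaf_values(leaf_ids, leaves_per_tree, y)
```
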
  • Publication number: 20170339202
    Abstract: A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of the events of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based query language. A script written in the extended query language can be translated into an optimal parallel continuous execution plan. Execution of the plan can be orchestrated by a streaming job manager which schedules vertices on available computing machines. The streaming job manager can monitor overall job execution. Fault tolerance can be provided by tracking execution progress and data dependencies in each vertex. In the event of a failure, another instance of the failed vertex can be scheduled. An optimal recovery point can be determined based on checkpoints and data dependencies.
    Type: Application
    Filed: April 7, 2017
    Publication date: November 23, 2017
    Inventors: Jingren Zhou, Zhengping Qian, Peter Zabback, Wei Lin
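
One element of the abstract above is picking an optimal recovery point for a failed vertex from its checkpoints and tracked data dependencies. A simplified, hypothetical sketch of that selection might be:

```python
# Illustrative sketch only: choose a recovery point for a failed streaming vertex.
def choose_recovery_point(checkpoints, earliest_replayable):
    """checkpoints: list of (sequence_number, required_offsets) tuples, oldest first,
    where required_offsets maps an upstream vertex to the input offset this checkpoint
    would need replayed. earliest_replayable maps an upstream vertex to the earliest
    offset it can still reproduce."""
    for seq, required in reversed(checkpoints):
        if all(earliest_replayable.get(vertex, float("inf")) <= offset
               for vertex, offset in required.items()):
            return seq   # newest checkpoint whose data dependencies are still replayable
    return 0             # otherwise restart the failed vertex from the beginning

checkpoints = [(100, {"upstream_a": 90}), (200, {"upstream_a": 180})]
# upstream_a has discarded everything before offset 150, so checkpoint 200 is usable
# but checkpoint 100 is not.
recovery = choose_recovery_point(checkpoints, {"upstream_a": 150})   # -> 200
```
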
  • Patent number: 9792325
    Abstract: Runtime statistics from the actual performance of operations on a set of data are collected and utilized to dynamically modify the execution plan for processing that data. The operations performed are modified to include statistics collection operations, the statistics being tailored to the specific operations being quantified. An optimization policy defines how often optimization is attempted and how much more efficient a new execution plan must be to justify transitioning from the current one. Optimization is based on the collected runtime statistics and also takes into account already materialized intermediate data, gaining further efficiency by avoiding reprocessing.
    Type: Grant
    Filed: August 25, 2013
    Date of Patent: October 17, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicolas Bruno, Jingren Zhou
  • Patent number: 9641580
    Abstract: A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of the events of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based query language. A script written in the extended query language can be translated into an optimal parallel continuous execution plan. Execution of the plan can be orchestrated by a streaming job manager which schedules vertices on available computing machines. The streaming job manager can monitor overall job execution. Fault tolerance can be provided by tracking execution progress and data dependencies in each vertex. In the event of a failure, another instance of the failed vertex can be scheduled. An optimal recovery point can be determined based on checkpoints and data dependencies.
    Type: Grant
    Filed: July 1, 2014
    Date of Patent: May 2, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jingren Zhou, Zhengping Qian, Peter Zabback, Wei Lin
  • Publication number: 20170090958
    Abstract: Processing a job request for multiple versions of a distributed computing service. The service processing node does this by running, in at least an interleaved fashion (e.g., via time sharing with rapid context switching) or actually concurrently, a first runtime library associated with a first service version of the distributed computerized service and a second runtime library associated with a different service version of the distributed computerized service. While running the first runtime library, job requests of the first service version may be at least partially processed using a first set of one or more executables that interact with the first runtime library. While running the second runtime library, job requests of the second service version may be at least partially processed using a second set of one or more executables that interact with the second runtime library.
    Type: Application
    Filed: June 29, 2016
    Publication date: March 30, 2017
    Inventors: Zhicheng Yin, Xiaoyu Chen, Tao Guan, Paul Michael Brett, Nan Zhang, Jaliya N. Ekanayake, Eric Boutin, Anna Korsun, Jingren Zhou, Haibo Lin, Pavel N. Iakovenko
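
The abstract above describes a single service processing node running runtime libraries for several service versions, interleaved or concurrently. As an illustrative sketch only, with hypothetical names and thread-based concurrency standing in for the actual mechanism:

```python
# Illustrative sketch only: one node hosting several runtime-library versions and
# processing jobs of different versions concurrently.
import concurrent.futures

class VersionedNode:
    def __init__(self, runtimes):
        # runtimes: dict mapping service version -> callable runtime entry point
        self.runtimes = runtimes
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def submit(self, job):
        runtime = self.runtimes[job["version"]]   # pick the matching runtime library
        return self.pool.submit(runtime, job["payload"])

# Two callables stand in for the per-version runtime libraries and executables.
node = VersionedNode({
    "v1": lambda payload: f"v1 handled {payload}",
    "v2": lambda payload: f"v2 handled {payload}",
})
results = [node.submit({"version": v, "payload": p}).result()
           for v, p in [("v1", "jobA"), ("v2", "jobB")]]
```
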
  • Publication number: 20170094020
    Abstract: Processing received job requests for a multi-versioned distributed computerized service. For each received job request, the job request is channeled to an appropriate service processing node, chosen according to the version of the distributed computing service that is to handle the job request. A version of the distributed computing service is assigned to the incoming job request. A service processing node that runs a runtime library for the assigned service version is then identified. The identified service processing node also has an appropriate set of one or more executables that allows the service processing node to play an appropriate role (e.g., compiler, scheduler, worker) in the distributed computing service. The job request is then dispatched to the identified service processing node.
    Type: Application
    Filed: June 29, 2016
    Publication date: March 30, 2017
    Inventors: Zhicheng Yin, Xiaoyu Chen, Tao Guan, Paul Michael Brett, Nan Zhang, Jaliya N. Ekanayake, Eric Boutin, Anna Korsun, Jingren Zhou, Haibo Lin, Pavel N. Iakovenko
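
The companion abstract above describes assigning a service version to an incoming job request and dispatching it to a node that runs that version's runtime library and can play the required role. A hypothetical sketch of such routing:

```python
# Illustrative sketch only: route a job request to a node running the assigned
# service version that can play the required role. All names are hypothetical.
def dispatch(job, nodes, default_version="v2"):
    """nodes: list of dicts like {"name": ..., "version": ..., "roles": {...}}."""
    version = job.get("version", default_version)        # assign a service version
    for node in nodes:
        if node["version"] == version and job["role"] in node["roles"]:
            return node["name"]                           # dispatch target
    raise RuntimeError(f"no node available for version {version!r}, role {job['role']!r}")

nodes = [
    {"name": "n1", "version": "v1", "roles": {"compiler", "scheduler"}},
    {"name": "n2", "version": "v2", "roles": {"worker"}},
]
target = dispatch({"role": "worker"}, nodes)   # -> "n2"
```
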
  • Patent number: 9442760
    Abstract: A job scheduler schedules ready tasks amongst a cluster of servers. Each job might be managed by its own scheduler, in which case multiple job schedulers conduct scheduling for different jobs concurrently. To identify a suitable server for a given task, the job scheduler uses expected server performance information received from multiple servers. For instance, the server performance information might include expected performance parameters for tasks of particular categories if assigned to the server. The job management component then identifies the category of a given task, determines which of the servers can perform the task by a suitable estimated completion time, and assigns the task based on that estimated completion time. The job management component also uses cluster-level information to determine which server to assign a task to.
    Type: Grant
    Filed: October 3, 2014
    Date of Patent: September 13, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eric Boutin, Jaliya Ekanayake, Wei Lin, Bin Shi, Jingren Zhou
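
The abstract above describes choosing a server from per-category expected-performance information the servers publish. A simplified sketch, with hypothetical field names and a toy completion-time model:

```python
# Illustrative sketch only: pick the server with the lowest estimated completion time
# for a task's category, from expected-performance information each server reports.
def estimated_completion(server, category):
    # expected queue wait for this category plus the expected run time on that server
    return server["expected_wait"][category] + server["expected_runtime"][category]

def assign(task, servers):
    category = task["category"]
    eligible = [s for s in servers
                if category in s["expected_runtime"] and category in s["expected_wait"]]
    return min(eligible, key=lambda s: estimated_completion(s, category))["name"]

servers = [
    {"name": "s1", "expected_wait": {"cpu": 4.0}, "expected_runtime": {"cpu": 2.0}},
    {"name": "s2", "expected_wait": {"cpu": 0.5}, "expected_runtime": {"cpu": 3.0}},
]
best = assign({"category": "cpu"}, servers)   # -> "s2" (0.5 + 3.0 < 4.0 + 2.0)
```
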
  • Publication number: 20160098292
    Abstract: A job scheduler schedules ready tasks amongst a cluster of servers. Each job might be managed by its own scheduler, in which case multiple job schedulers conduct scheduling for different jobs concurrently. To identify a suitable server for a given task, the job scheduler uses expected server performance information received from multiple servers. For instance, the server performance information might include expected performance parameters for tasks of particular categories if assigned to the server. The job management component then identifies the category of a given task, determines which of the servers can perform the task by a suitable estimated completion time, and assigns the task based on that estimated completion time. The job management component also uses cluster-level information to determine which server to assign a task to.
    Type: Application
    Filed: October 3, 2014
    Publication date: April 7, 2016
    Inventors: Eric Boutin, Jaliya Ekanayake, Wei Lin, Bin Shi, Jingren Zhou
  • Publication number: 20160048771
    Abstract: A method for machine learning a data set in a data processing framework is disclosed. A forest is trained on the data set, generating a plurality of trees in parallel. Each tree includes leaf nodes having a constant weight. A discriminative value for each leaf node is learned with a supervised model. The forest is reconstructed with the discriminative values replacing the constant weight of each leaf node.
    Type: Application
    Filed: August 13, 2014
    Publication date: February 18, 2016
    Inventors: Weizhu Chen, Wei Lin, Jingren Zhou
  • Patent number: 9235446
    Abstract: Statistics collected during the parallel distributed execution of the tasks of a job may be used to optimize the performance of those tasks or of similar recurring tasks. An execution plan for a job is initially generated, in which the execution plan includes tasks. Statistics regarding operations performed in the tasks are collected while the tasks are executed via parallel distributed execution. Another execution plan is then generated for another recurring job, in which the additional execution plan has at least one task in common with the execution plan for the job. The additional execution plan is subsequently optimized based at least on the statistics to produce an optimized execution plan.
    Type: Grant
    Filed: June 22, 2012
    Date of Patent: January 12, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Nicolas Bruno, Jingren Zhou, Srikanth Kandula, Sameer Agarwal, Ming-Chuan Wu
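
The abstract above describes reusing statistics gathered from one execution when optimizing a recurring job that shares tasks with it. A minimal, hypothetical sketch of that feedback loop:

```python
# Illustrative sketch only: feed statistics from a prior run of a recurring job back
# into the optimizer for the tasks the jobs share. All names are hypothetical.
class StatsStore:
    def __init__(self):
        self._stats = {}                       # task signature -> observed row count

    def record(self, signature, observed_rows):
        self._stats[signature] = observed_rows

    def estimate(self, signature, default):
        # prefer the value observed in an earlier execution of the same task
        return self._stats.get(signature, default)

store = StatsStore()
store.record("scan(log_2012_06_21)|filter(status=200)", 12_345)   # from the first run
# When the recurring job is compiled again, the shared task uses the observation
# instead of the default estimate:
rows = store.estimate("scan(log_2012_06_21)|filter(status=200)", default=1_000_000)
```
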
  • Publication number: 20160006779
    Abstract: A low-latency cloud-scale computation environment includes a query language, optimization, scheduling, fault tolerance and fault recovery. An event model can be used to extend a declarative query language so that temporal analysis of the events of an event stream can be performed. Extractors and outputters can be used to define and implement functions that extend the capabilities of the event-based query language. A script written in the extended query language can be translated into an optimal parallel continuous execution plan. Execution of the plan can be orchestrated by a streaming job manager which schedules vertices on available computing machines. The streaming job manager can monitor overall job execution. Fault tolerance can be provided by tracking execution progress and data dependencies in each vertex. In the event of a failure, another instance of the failed vertex can be scheduled. An optimal recovery point can be determined based on checkpoints and data dependencies.
    Type: Application
    Filed: July 1, 2014
    Publication date: January 7, 2016
    Inventors: Jingren Zhou, Zhengping Qian, Peter Zabback, Wei Lin
  • Patent number: 8996464
    Abstract: A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can avoid having every computing device provide data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they perform a repartitioning, assign a repartitioning index to each individual data element, an index that identifies the computing device for which that data element is destined.
    Type: Grant
    Filed: June 11, 2012
    Date of Patent: March 31, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jingren Zhou, Nicolas Bruno, Wei Lin
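
The abstract above describes tagging each data element with a repartitioning index that names its destination device, under either hash-based or range-based repartitioning. A toy sketch with hypothetical helpers:

```python
# Illustrative sketch only: assign a repartitioning index to each data element,
# for hash-based or range-based repartitioning.
import bisect
import hashlib

def hash_index(key, n_partitions):
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % n_partitions

def range_index(key, boundaries):
    # boundaries are sorted upper bounds; aligning them with the input partition
    # boundaries lets many producers send data to only a few consumers.
    return bisect.bisect_left(boundaries, key)

element = {"key": 42, "value": "..."}
element["repartition_index"] = range_index(element["key"], boundaries=[10, 50, 90])  # -> 1
alt_index = hash_index(element["key"], n_partitions=4)  # hash-based alternative
```
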
  • Publication number: 20150058316
    Abstract: Runtime statistics from the actual performance of operations on a set of data are collected and utilized to dynamically modify the execution plan for processing a set of data. The operations performed are modified to include statistics collection operations, the statistics being tailored to the specific operations being quantified. Optimization policy defines how often optimization is attempted and how much more efficient an execution plan should be to justify transitioning from the current one. Optimization is based on the collected runtime statistics but also takes into account already materialized intermediate data to gain further optimization by avoiding reprocessing.
    Type: Application
    Filed: August 25, 2013
    Publication date: February 26, 2015
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Jingren Zhou
  • Patent number: 8966486
    Abstract: A distributed job-processing environment including a server, or servers, capable of receiving and processing user-submitted job queries for data sets on backend storage servers. The server identifies computational tasks to be completed for the job as well as a time frame within which to complete some of the computational tasks. Computational tasks may include, without limitation, preprocessing, parsing, importing, verifying dependencies, retrieving relevant metadata, checking syntax and semantics, optimizing, compiling, and running. The server performs the computational tasks, and once the time frame expires, a message is transmitted to the user indicating which tasks have been completed. The rest of the computational tasks are subsequently performed, and eventually, job results are transmitted to the user.
    Type: Grant
    Filed: May 3, 2011
    Date of Patent: February 24, 2015
    Assignee: Microsoft Corporation
    Inventors: Thomas Phan, Jingren Zhou
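
The abstract above describes performing a job's computational tasks within a time frame and then reporting which of them completed. A simplified, hypothetical sketch:

```python
# Illustrative sketch only: run a job's front-end tasks within a time budget, then
# report which of them completed; the remaining tasks continue afterwards.
import time

def process_job(tasks, time_frame_seconds):
    """tasks: ordered list of (name, callable). Returns the status message sent to
    the user when the time frame expires, plus the tasks still to be performed."""
    deadline = time.monotonic() + time_frame_seconds
    completed = []
    remaining = list(tasks)
    while remaining and time.monotonic() < deadline:
        name, fn = remaining.pop(0)
        fn()
        completed.append(name)
    message = f"Completed so far: {', '.join(completed) or 'none'}"
    return message, remaining

tasks = [("parse", lambda: None), ("check syntax", lambda: None),
         ("optimize", lambda: time.sleep(0.2)), ("run", lambda: None)]
message, remaining = process_job(tasks, time_frame_seconds=0.1)
```
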
  • Publication number: 20140297680
    Abstract: Embodiments of the present invention allow multiple data streams to be analyzed as a single data set, referred to herein as a stream set. The multiple streams that are included in the stream set may be specified through a user script or query. For example, a query may be used to gather all streams created within a date range. The query could include one or more filters to gather certain information from the data streams or to exclude certain data streams that otherwise fall within the query's range. A stream may be an unstructured byte stream of data created by append-only writing to the end of the stream. The stream could also be a structured stream that includes metadata that defines column structure and affinity/clustering information.
    Type: Application
    Filed: June 11, 2013
    Publication date: October 2, 2014
    Inventors: Edward John Triou, Jr., Fei Xu, Hiren Patel, Jingren Zhou
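
The abstract above describes gathering multiple streams that satisfy a query (for example, a date range with filters) and analyzing them as one stream set. A toy sketch, with hypothetical stream metadata:

```python
# Illustrative sketch only: treat several streams matching a date-range query as
# one logical stream set that downstream operators read as a single data set.
from datetime import date

streams = [
    {"name": "clicks_2013_06_09", "created": date(2013, 6, 9), "records": [1, 2]},
    {"name": "clicks_2013_06_10", "created": date(2013, 6, 10), "records": [3]},
    {"name": "clicks_2013_06_30", "created": date(2013, 6, 30), "records": [4, 5]},
]

def stream_set(streams, start, end, exclude=()):
    """Select every stream created in [start, end], minus any explicitly excluded."""
    return [s for s in streams
            if start <= s["created"] <= end and s["name"] not in exclude]

def read_as_one(selected):
    # downstream operators see a single data set, not individual streams
    for stream in selected:
        yield from stream["records"]

records = list(read_as_one(stream_set(streams, date(2013, 6, 1), date(2013, 6, 15))))
# -> [1, 2, 3]
```
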
  • Patent number: 8819017
    Abstract: Embodiments of the present invention relate to systems, methods, and computer-storage media for affinitizing datasets based on efficient query processing. In one embodiment, a plurality of datasets within a data stream is received. The data stream is partitioned based on efficient query processing. Once the data stream is partitioned, an affinity identifier is assigned to each dataset based on the partitioning. Further, when datasets are broken into extents, the affinity identifier of the parent dataset is retained in the resulting extents. The affinity identifier of each extent is then referenced to preferentially store extents having common affinity identifiers in close proximity to one another across a data center.
    Type: Grant
    Filed: October 15, 2010
    Date of Patent: August 26, 2014
    Assignee: Microsoft Corporation
    Inventors: Jingren Zhou, Patrick James Helland, Jonathan Forbes, Yaron Burd
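
The abstract above describes extents inheriting the affinity identifier of their parent dataset and being stored near other extents with the same identifier. A simplified sketch with hypothetical structures:

```python
# Illustrative sketch only: propagate an affinity identifier from a partitioned
# dataset to its extents and group placement by that identifier.
from collections import defaultdict
from itertools import cycle

def split_into_extents(dataset, extent_size):
    """Each extent inherits the affinity identifier of its parent dataset."""
    rows = dataset["rows"]
    return [{"affinity_id": dataset["affinity_id"], "rows": rows[i:i + extent_size]}
            for i in range(0, len(rows), extent_size)]

def place(extents, racks):
    """Store extents sharing an affinity identifier within the same rack."""
    placement = defaultdict(list)
    rack_for = {}
    rack_cycle = cycle(racks)
    for ext in extents:
        if ext["affinity_id"] not in rack_for:
            rack_for[ext["affinity_id"]] = next(rack_cycle)
        placement[rack_for[ext["affinity_id"]]].append(ext)
    return dict(placement)

datasets = [{"affinity_id": "partition-A", "rows": list(range(5))},
            {"affinity_id": "partition-B", "rows": list(range(7))}]
extents = [e for d in datasets for e in split_into_extents(d, extent_size=3)]
layout = place(extents, racks=["rack1", "rack2"])
```
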
  • Patent number: 8745037
    Abstract: An optimizer uses comprehensive reasoning regarding partitioning, sorting, and grouping properties for query optimization. When optimizing an input query expression, logical exploration generates alternative logical expressions. Physical optimization explores physical operator alternatives for logical operators. Required partitioning, sorting, and grouping properties of inputs to physical operators are determined. Additionally, delivered partitioning, sorting, and grouping properties of outputs from physical operators are determined. In some embodiments, enforcer rules are employed to modify structural property requirements to introduce alternatives for consideration. Property matching identifies valid execution plans in which the delivered partitioning, sorting, and grouping properties satisfy corresponding required partitioning, sorting, and grouping properties. An execution plan having the lowest cost is selected as the optimized execution plan.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: June 3, 2014
    Assignee: Microsoft Corporation
    Inventors: Jingren Zhou, Per-Ake Larson, Ronnie Ira Chaiken
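
The abstract above describes matching the partitioning, sorting, and grouping properties delivered by an operator's input against the properties the operator requires, with enforcer rules adding alternatives when they do not match. A toy sketch of the property check and enforcement, using hypothetical names:

```python
# Illustrative sketch only: compare delivered structural properties with required
# ones and insert enforcer operators (exchange, sort) when they do not match.
def plan_input(delivered, required):
    """Return the enforcer operators, if any, needed on the input to a physical
    operator so that the delivered properties satisfy the required ones."""
    ops = []
    if required.get("partition") and delivered.get("partition") != required["partition"]:
        ops.append(("exchange", required["partition"]))   # repartition enforcer
        delivered = {**delivered, "partition": required["partition"], "sort": []}
    req_sort = required.get("sort", [])
    if delivered.get("sort", [])[:len(req_sort)] != req_sort:
        ops.append(("sort", req_sort))                    # sort enforcer
    return ops

# A merge join requires hash partitioning on 'k' and a sort order led by 'k':
required = {"partition": ("k",), "sort": ["k"]}
delivered = {"partition": ("k",), "sort": []}
enforcers = plan_input(delivered, required)   # -> [("sort", ["k"])]
```

In a full optimizer every alternative carries a cost, and the cheapest plan whose delivered properties satisfy all requirements is chosen; the sketch only shows the property match itself.
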
  • Publication number: 20130346988
    Abstract: Statistics collected during the parallel distributed execution of the tasks of a job may be used to optimize the performance of those tasks or of similar recurring tasks. An execution plan for a job is initially generated, in which the execution plan includes tasks. Statistics regarding operations performed in the tasks are collected while the tasks are executed via parallel distributed execution. Another execution plan is then generated for another recurring job, in which the additional execution plan has at least one task in common with the execution plan for the job. The additional execution plan is subsequently optimized based at least on the statistics to produce an optimized execution plan.
    Type: Application
    Filed: June 22, 2012
    Publication date: December 26, 2013
    Applicant: Microsoft Corporation
    Inventors: Nicolas Bruno, Jingren Zhou, Srikanth Kandula, Sameer Agarwal, Ming-Chuan Wu
  • Publication number: 20130332446
    Abstract: A repartitioning optimizer identifies alternative repartitioning strategies and selects optimal ones, accounting for network transfer utilization and partition sizes in addition to traditional metrics. If prior partitioning was hash-based, the repartitioning optimizer can determine whether a hash-based repartitioning can avoid having every computing device provide data to every other computing device. If prior partitioning was range-based, the repartitioning optimizer can determine whether a range-based repartitioning can generate similarly sized output partitions while aligning input and output partition boundaries, increasing the number of computing devices that do not provide data to every other computing device. Individual computing devices, as they perform a repartitioning, assign a repartitioning index to each individual data element, an index that identifies the computing device for which that data element is destined.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: Microsoft Corporation
    Inventors: Jingren Zhou, Nicolas Bruno, Wei Lin