Patents by Inventor Gaurav Jagtiani
Gaurav Jagtiani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12277040
Abstract: In-place recovery of fatal system errors at virtualization hosts. A device identifies an occurrence of a fatal system error in the first instance of a host operating system (OS) executing in a computer system. The device determines to perform an in-place recovery for the fatal system error. The device performs the in-place recovery, including pausing the execution of a virtual machine (VM) by the first instance of the host OS, preserving a state of the VM within system memory of the computer system, and resuming the execution of the VM by a second instance of the host OS executing in the computer system based on the state of the VM that is preserved within the system memory of the computer system.
Type: Grant
Filed: June 7, 2023
Date of Patent: April 15, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Binit Ranjan Mishra, Mukhtar Ahmed, Christina Marianne Curlette, Steven Adrian West, Gaurav Jagtiani, Naga Kiran Govindaraju, James George Cavalaris, Drew Douglas Cross, Jason Stewart Wohlgemuth, James Anthony Schwartz, Jr., Jennifer Marie Bourlier, Sri Harsha Kanukuntla, Emma Sutherland Boyd, Scott Chao-Chueh Lee, Vijaybalaji Madhanagopal, Terence Kwok Tak Chan, Yuri Dotsenko, Peter Hanpeng Jiang, Aacer Hatem Daken, Emily Nicole Wilson, Emily Cara Clemens, Cody Dean Hartwig, Raz Meir Aloni, Sharon Scarlet Tang, Minsang Kim, Shen Wang
-
Publication number: 20250004882
Abstract: A computer system identifies an event from a management system log associated with a first container host. The presence of the event in the management system log is indicative that the first container host identified a fatal system error at the first container host. Based on the event, the computer system determines that a first instance of a container that is provisioned at the first container host has been isolated to the first container host. Based on the first instance of the container having been isolated to the first container host, the computer system instructs a second container host to provision a second instance of the container at the second container host.
Type: Application
Filed: June 28, 2023
Publication date: January 2, 2025
Inventors: Shekhar AGRAWAL, Abhay Sudhir KETKAR, Gaurav JAGTIANI, Binit Ranjan MISHRA, Emma Sutherland BOYD, Scott Chao-Chueh LEE, James Anthony SCHWARTZ, JR., Hari R. PULAPAKA, Karan MEHRA, Shailesh Padmakar JOSHI, Jason Stewart WOHLGEMUTH, David WIMMEL
-
Publication number: 20240338282
Abstract: In-place recovery of fatal system errors at virtualization hosts. A device identifies an occurrence of a fatal system error in the first instance of a host operating system (OS) executing in a computer system. The device determines to perform an in-place recovery for the fatal system error. The device performs the in-place recovery, including pausing the execution of a virtual machine (VM) by the first instance of the host OS, preserving a state of the VM within system memory of the computer system, and resuming the execution of the VM by a second instance of the host OS executing in the computer system based on the state of the VM that is preserved within the system memory of the computer system.
Type: Application
Filed: June 7, 2023
Publication date: October 10, 2024
Inventors: Binit Ranjan MISHRA, Mukhtar AHMED, Christina Marianne CURLETTE, Steven Adrian WEST, Gaurav JAGTIANI, Naga Kiran GOVINDARAJU, James George CAVALARIS, Drew Douglas CROSS, Jason Stewart WOHLGEMUTH, James Anthony SCHWARTZ, JR., Jennifer Marie BOURLIER, Sri Harsha KANUKUNTLA, Emma Sutherland BOYD, Scott Chao-Chueh LEE, Vijaybalaji MADHANAGOPAL, Terence Kwok Tak CHAN, Yuri DOTSENKO, Peter Hanpeng JIANG, Aacer Hatem DAKEN, Emily Nicole WILSON, Emily Cara CLEMENS, Cody Dean HARTWIG, Raz Meir ALONI, Sharon Scarlet TANG, Minsang KIM, Shen WANG
-
Patent number: 12028223
Abstract: A computer implemented method includes receiving telemetry data corresponding to capacity health of nodes in a cloud based computing system. The received telemetry data is processed via a prediction engine to provide predictions of capacity health at multiple dimensions of the cloud based computing system. Node recoverability information is received and node recovery execution is initiated as a function of the representations of capacity health and node recoverability information.
Type: Grant
Filed: June 6, 2022
Date of Patent: July 2, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shandan Zhou, Sam Prakash Bheri, Karthikeyan Subramanian, Yancheng Chen, Gaurav Jagtiani, Abhay Sudhir Ketkar, Hemant Malik, Thomas Moscibroda, Shweta Balkrishna Patil, Luke Rafael Rodriguez, Dalianna Victoria Vaysman
-
Publication number: 20240201767
Abstract: The present disclosure relates to utilizing a host failure recovery system to efficiently and accurately determine the health of host devices. For example, the host failure recovery system detects when a host server is failing by utilizing a power failure detection model that determines whether a host server is operating in a healthy power state or an unhealthy power state. In particular, the host failure recovery system utilizes a multi-layer power failure detection model that determines power-draw failure events on a host device. The failure detection model determines, with high confidence, the health of a host device based on power-draw signals and/or usage characteristics of the host device. Additionally, the host failure recovery system can initiate a quick recovery of a failing host device.
Type: Application
Filed: December 20, 2022
Publication date: June 20, 2024
Inventors: Emma Sutherland BOYD, Shekhar AGRAWAL, Amruta Bhalchandra PATHAK, Yu YAO, Aravind Narayanan KRISHNAMOORTHY, Derek James BOYER, Binit Ranjan MISHRA, Gaurav JAGTIANI, Abhay Sudhir KETKAR, Tri Minh TRAN
-
Publication number: 20230396511
Abstract: A computer implemented method includes receiving telemetry data corresponding to capacity health of nodes in a cloud based computing system. The received telemetry data is processed via a prediction engine to provide predictions of capacity health at multiple dimensions of the cloud based computing system. Node recoverability information is received and node recovery execution is initiated as a function of the representations of capacity health and node recoverability information.
Type: Application
Filed: June 6, 2022
Publication date: December 7, 2023
Inventors: Shandan ZHOU, Sam Prakash BHERI, Karthikeyan SUBRAMANIAN, Yancheng CHEN, Gaurav JAGTIANI, Abhay Sudhir KETKAR, Hemant MALIK, Thomas MOSCIBRODA, Shweta Balkrishna PATIL, Luke Rafael RODRIGUEZ, Dalianna Victoria VAYSMAN
-
Patent number: 10810096
Abstract: Various techniques for deferred server recovery are disclosed herein. In one embodiment, a method includes receiving a notification of a fault from a host in the computing system. The host is performing one or more computing tasks for one or more users. The method can then include determining whether recovery of the fault in the received notification is deferrable on the host. In response to determining that the fault in the received notification is deferrable, the method includes setting a time delay to perform a pending recovery operation on the host at a later time and disallowing additional assignment of computing tasks to the host.
Type: Grant
Filed: May 21, 2018
Date of Patent: October 20, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Nic Allen, Gaurav Jagtiani
-
Publication number: 20200150972
Abstract: A method for opportunistically performing an action in a cloud computing system may include detecting a reboot event corresponding to a computing entity in the cloud computing system. The computing entity may be, for example, a host machine in the cloud computing system or a virtual machine in the cloud computing system. The method may also include causing the computing entity to be held in a stopped state and performing the action while the computing entity is being held in the stopped state, thereby eliminating a need to perform the action at a future time subsequent to the reboot event. The nature of the action is such that it would affect the computing entity if the action were performed subsequent to the reboot event. The method may also include causing the computing entity to be started after the action has been performed.
Type: Application
Filed: November 9, 2018
Publication date: May 14, 2020
Inventors: Abhay Sudhir KETKAR, Gaurav JAGTIANI, Ajay MANI, Richard Thomas RUSSO, Shweta Balkrishna PATIL, James Cameron WHITE
-
Patent number: 10547516
Abstract: Methods, systems, and computer program products are described herein for minimizing the downtime for nodes in a network-accessible server set. The downtime may be minimized by determining an optimal timeout value for which a fabric controller waits to perform a recovery action. The optimal timeout value may be determined for each cluster in the network-accessible server set. The optimal timeout value advantageously reduces the overall downtime for customer workloads running on a node for which contact has been lost. The optimal timeout value for each cluster may be based on a predictive model based on the observed historical patterns of the nodes within that cluster. In the event that an optimal timeout value is not determined for a particular cluster (e.g., due to a lack of observed historical patterns), the fabric controller may fall back to a less than optimal timeout value.
Type: Grant
Filed: June 30, 2017
Date of Patent: January 28, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sathyanarayana Singh, Gaurav Jagtiani, Rohit Pandey, Durmus Ugur Karatay, Gil Lapid Shafriri
-
Patent number: 10496503
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Grant
Filed: November 13, 2017
Date of Patent: December 3, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
-
Publication number: 20190007278
Abstract: Methods, systems, and computer program products are described herein for minimizing the downtime for nodes in a network-accessible server set. The downtime may be minimized by determining an optimal timeout value for which a fabric controller waits to perform a recovery action. The optimal timeout value may be determined for each cluster in the network-accessible server set. The optimal timeout value advantageously reduces the overall downtime for customer workloads running on a node for which contact has been lost. The optimal timeout value for each cluster may be based on a predictive model based on the observed historical patterns of the nodes within that cluster. In the event that an optimal timeout value is not determined for a particular cluster (e.g., due to a lack of observed historical patterns), the fabric controller may fall back to a less than optimal timeout value.
Type: Application
Filed: June 30, 2017
Publication date: January 3, 2019
Inventors: Sathyanarayana SINGH, Gaurav JAGTIANI, Rohit PANDEY, Durmus Ugur KARATAY, Gil Lapid SHAFRIRI
-
Publication number: 20180267872
Abstract: Various techniques for deferred server recovery are disclosed herein. In one embodiment, a method includes receiving a notification of a fault from a host in the computing system. The host is performing one or more computing tasks for one or more users. The method can then include determining whether recovery of the fault in the received notification is deferrable on the host. In response to determining that the fault in the received notification is deferrable, the method includes setting a time delay to perform a pending recovery operation on the host at a later time and disallowing additional assignment of computing tasks to the host.
Type: Application
Filed: May 21, 2018
Publication date: September 20, 2018
Inventors: Nic Allen, Gaurav Jagtiani
-
Patent number: 10007586
Abstract: Various techniques for deferred server recovery are disclosed herein. In one embodiment, a method includes receiving a notification of a fault from a host in the computing system. The host is performing one or more computing tasks for one or more users. The method can then include determining whether recovery of the fault in the received notification is deferrable on the host. In response to determining that the fault in the received notification is deferrable, the method includes setting a time delay to perform a pending recovery operation on the host at a later time and disallowing additional assignment of computing tasks to the host.
Type: Grant
Filed: March 10, 2016
Date of Patent: June 26, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Nic Allen, Gaurav Jagtiani
-
Patent number: 9940210
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Grant
Filed: June 26, 2015
Date of Patent: April 10, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
-
Publication number: 20180067830
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Application
Filed: November 13, 2017
Publication date: March 8, 2018
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
-
Publication number: 20170199795
Abstract: Various techniques for deferred server recovery are disclosed herein. In one embodiment, a method includes receiving a notification of a fault from a host in the computing system. The host is performing one or more computing tasks for one or more users. The method can then include determining whether recovery of the fault in the received notification is deferrable on the host. In response to determining that the fault in the received notification is deferrable, the method includes setting a time delay to perform a pending recovery operation on the host at a later time and disallowing additional assignment of computing tasks to the host.
Type: Application
Filed: March 10, 2016
Publication date: July 13, 2017
Inventors: Nic Allen, Gaurav Jagtiani
-
Publication number: 20150293821
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Application
Filed: June 26, 2015
Publication date: October 15, 2015
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
-
Patent number: 9141487
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Grant
Filed: January 15, 2013
Date of Patent: September 22, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
-
Publication number: 20140201564
Abstract: Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources.
Type: Application
Filed: January 15, 2013
Publication date: July 17, 2014
Applicant: Microsoft Corporation
Inventors: Gaurav Jagtiani, Abhishek Singh, Ajay Mani, Akram Hassan, Thiruvengadam Venketesan, Saad Syed, Sushant Pramod Rewaskar, Wei Zhao
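
Illustrative sketch (not taken from any of the listed filings): the deferred server recovery abstracts above (for example, patent 10007586) describe receiving a fault notification from a host, deciding whether recovery is deferrable, and, if so, setting a time delay for the pending recovery while disallowing new task assignments to the host. The Python below is a minimal sketch of that decision flow under stated assumptions. The names Host, DEFERRABLE_FAULTS, and handle_fault, the fault labels, and the one-hour default delay are all hypothetical; the abstract does not say which faults are deferrable or what happens to non-deferrable faults, so the immediate branch here is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fault labels treated as deferrable; the abstract names none.
DEFERRABLE_FAULTS = {"disk_degraded", "correctable_memory_error"}


@dataclass
class Host:
    name: str
    accepts_new_tasks: bool = True
    # Time (in seconds, on a caller-supplied clock) at which recovery should run.
    recovery_due_at: Optional[float] = None


def handle_fault(host: Host, fault: str, now: float, delay_s: float = 3600.0) -> str:
    """Handle a fault notification from a host and return the chosen mode."""
    if fault in DEFERRABLE_FAULTS:
        # Deferrable: schedule the pending recovery for later and stop
        # assigning additional tasks to the host in the meantime.
        host.recovery_due_at = now + delay_s
        host.accepts_new_tasks = False
        return "deferred"
    # Assumption: a non-deferrable fault is scheduled for recovery right away.
    host.recovery_due_at = now
    return "immediate"
```

A caller would pass its own clock value as now, for example handle_fault(Host("h1"), "disk_degraded", now=time.time()), and a separate scheduler (not shown) would run the recovery once recovery_due_at is reached.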