Using multiple thermal points to enable component level power and thermal management
A component in a computer includes multiple functional unit blocks (FUB). Each FUB may be associated with a sensor and may be managed individually. When the sensor detects that a problem associated with a particular FUB may arise, a controller may be used to adjust operation of the FUB instead of operation of the entire component.
This application claims benefit of U.S. patent application Ser. No. 10/350,712, filed Jan. 24, 2003
FIELD OF THE INVENTION
The present invention relates to the field of computing systems, more particularly relating to methods and apparatus for component management including one or more of thermal management, performance management, and power management.
BACKGROUND
Designers of computing systems such as, for example, mobile computer systems, are faced with a delicate balance. They seek to increase performance of the computer systems but at the same time control power consumption and temperature caused by components of the computer systems. The components may include, for example, a processor, chipsets, etc.
Typically, a processor has a discrete operating point, characterized by a given frequency and power. The frequency may be some multiple of an external clock delivered to the processor. The power consumed by the processor may be a function of the frequency and voltage applied to the processor. As the voltage level is increased, the frequency may be increased, resulting in a nonlinear increase in power consumption. An increase in the power consumption may cause an increase in temperature. When the temperature is too high, the processor may fail. Typically, to decrease the temperature, the voltage and frequency pair may be adjusted to decrease the power consumption of the processor.
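The nonlinear relationship between the voltage and frequency pair and the power consumption may be illustrated with a short sketch. It assumes the common first-order approximation for dynamic power, P = C * V^2 * f, which is not stated in this description; the capacitance value and the two operating points are hypothetical numbers chosen only for illustration.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """First-order dynamic power estimate, P = C * V^2 * f (an assumed model)."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical effective switched capacitance, in farads.
C_EFF = 1.0e-9

# Hypothetical operating points: (voltage in volts, frequency in hertz).
LOW_POINT = (1.0, 1.0e9)
HIGH_POINT = (1.2, 1.2e9)

p_low = dynamic_power(C_EFF, *LOW_POINT)
p_high = dynamic_power(C_EFF, *HIGH_POINT)

# Raising both voltage and frequency by 20% raises power by more than 20%.
power_ratio = p_high / p_low
print(f"power ratio: {power_ratio:.3f}")
```

Under this assumed model, a 20% increase in both voltage and frequency increases the estimated power by roughly 73%, which is why lowering the voltage and frequency pair may be an effective way to reduce temperature.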
Similarly, chipsets may receive clock signals and may operate at a certain frequency. During normal operation, the chipsets may cause a rise in temperature, and when the temperature is too high, operation of the chipsets may also fail. More recent chipsets may include a mechanism (e.g., throttling) to lower the clock-frequency to control the temperature generated by the chipsets. In addition to adjusting the frequency, heat sinks, airflows or combinations of heat sinks and airflows may also be used as thermal solutions to control the temperature generated by the chipsets and by the processor.
Although the above techniques provide some forms of thermal solutions, one common theme among them is that the solutions apply to the entire component (e.g., processor) at the expense of the performance of the component as a whole, and thus may not be efficient.
BRIEF DESCRIPTION OF THE DRAWINGS
The following drawings disclose various embodiments of the present invention for purposes of illustration only and are not intended to limit the scope of the invention.
In one embodiment, a method for monitoring multiple functional unit blocks (FUB) of a component is disclosed. Each FUB may be associated with a sensor. When the sensor detects that operation of a particular FUB may be affected, a controller associated with the FUB may perform appropriate adjustment relating to the FUB.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures, processes and devices are shown in block diagram form or are referred to in a summary manner in order to provide an explanation without undue detail.
As used herein, the term “when” may be used to indicate the temporal nature of an event. For example, the phrase “event ‘A’ occurs when event ‘B’ occurs” is to be interpreted to mean that event A may occur before, during, or after the occurrence of event B, but is nonetheless associated with the occurrence of event B. For example, event A occurs when event B occurs if event A occurs in response to the occurrence of event B or in response to a signal indicating that event B has occurred, is occurring, or will occur.
Modern computer components (e.g., processors, chipsets, etc.) are designed with increasing frequency and power density for higher performance. Their performance may be limited by the amount of heat that can be extracted using the available cooling technology or power/frequency throttling techniques. Typically, each of the computer components may have multiple FUBs. Each FUB may perform a different function and may potentially be a hot spot of the component when the FUB reaches a certain thermal point. Currently, when a FUB becomes a hot spot, power throttling is applied to the entire component to reduce the temperature of the entire component.
For another embodiment, each of the eight FUBs 205-240 may be associated with a sensor (not shown) to monitor its operating condition. There may be a different sensor for each FUB. Alternatively, two or more FUBs may share the same sensor. For one embodiment, each of the eight FUBs 205-240 of the component 200 may be managed independently of the other FUBs. Managing the FUBs may include, for example, monitoring and throttling the operating condition of the FUBs. For example, a sensor may monitor and send operating condition information of a FUB to a controller, and when necessary the controller may throttle the power and/or the frequency applied to the FUB. Managing the FUB may also include performing other operations that may help control the operating condition of the FUB. It may be noted that a component may have only one FUB. In this case, managing the only FUB is similar to managing the entire component.
For example, when the sensor associated with the FUB 205 is a thermal sensor, and it detects that the temperature of the FUB 205 violates a certain temperature threshold, appropriate actions may be taken to reduce the temperature of the FUB 205. This may include, for example, throttling the applied voltage and/or frequency or adjusting the power applied to the FUB 205. The temperature threshold may be predetermined, or it may be determined dynamically. Being able to independently manage the FUB 205 may enable the neighboring FUBs 210-240 to continue to operate at their normal levels of performance.
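The independent management described above may be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Fub` record, the `manage_fub` helper, the frequency step, and all numeric values are hypothetical, and "violates" is assumed here to mean that the measured temperature exceeds the threshold.

```python
from dataclasses import dataclass


@dataclass
class Fub:
    fub_id: int
    temperature_c: float   # latest reading from the FUB's thermal sensor
    threshold_c: float     # per-FUB temperature threshold (assumed fixed here)
    frequency_mhz: float   # frequency currently applied to the FUB


def manage_fub(fub, step_mhz=100.0, floor_mhz=200.0):
    """Throttle one FUB if it exceeds its threshold. Returns True if throttled."""
    if fub.temperature_c > fub.threshold_c:
        fub.frequency_mhz = max(floor_mhz, fub.frequency_mhz - step_mhz)
        return True
    return False


# Eight FUBs numbered 205, 210, ..., 240, all starting cool and at full speed.
fubs = {fid: Fub(fid, 60.0, 85.0, 1000.0) for fid in range(205, 245, 5)}

# Only FUB 205 becomes a hot spot.
fubs[205].temperature_c = 90.0

throttled = [fid for fid, fub in fubs.items() if manage_fub(fub)]
print(throttled)
```

In this sketch only FUB 205 is slowed down, while FUBs 210-240 keep their original frequency.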
It may be noted that when the component 200 is a processor, the component 200 may also include multiple execution cores and other manageable resources on the same silicon die. For example, the component 200 may be a processor that supports Hyperthreading Technology (HT) to provide multithreading and parallel execution capabilities. Hyperthreading Technology is developed by Intel Corporation of Santa Clara, Calif. In this example, the processor that supports HT may include multiple execution cores (or logical processors) on the same processor die. Each of these execution cores and resources may also be managed individually as a FUB to enable better management of its operating condition. Other components in the computer system may also be managed based on their FUBs using the techniques described herein.
It may be noted when the controller is on the same die as the component (as illustrated in
For one embodiment, inputs from each FUB of the component may be viewed as a bit setting indicating the condition of that FUB. For example, in the case where the controller is external to the component, the component may export status information from each FUB in the form of a data packet, perhaps using multiple bits to represent the status information for each FUB. For example, it may be possible to use two (2) bits to represent the status information. Other numbers of bits may also be used for different levels of control. In the current example, the component has four (4) FUBs, and two (2) bits are used to define the different possible status information, as shown in the following table.
The example table above shows that each FUB may have a “Normal” operating mode, a “Hot” mode where some action is required, and a “Critical Hot” mode where immediate action is required. Immediate action may include shutting down the component, because otherwise the component may be damaged. The bit settings may be defined to indicate more exact temperatures of each FUB, as measured in degrees Celsius, for example. Each FUB may have a different thermal point or operating threshold at which adjusting or corrective action may need to be taken.
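One possible way to pack and unpack such status information for four FUBs may be sketched as follows. The specific bit values assigned to each mode, the bit ordering within the packet, and the names `FubStatus`, `pack_status`, and `unpack_status` are assumptions made for illustration; they are not taken from the table.

```python
from enum import IntEnum


class FubStatus(IntEnum):
    # Hypothetical 2-bit encoding; 0b11 is left unused in this sketch.
    NORMAL = 0b00
    HOT = 0b01
    CRITICAL_HOT = 0b10


BITS_PER_FUB = 2
NUM_FUBS = 4


def pack_status(statuses):
    """Pack per-FUB statuses into one integer, with FUB 0 in the lowest bits."""
    packet = 0
    for index, status in enumerate(statuses):
        packet |= int(status) << (index * BITS_PER_FUB)
    return packet


def unpack_status(packet):
    """Recover the list of per-FUB statuses from a packed integer."""
    mask = (1 << BITS_PER_FUB) - 1
    return [
        FubStatus((packet >> (index * BITS_PER_FUB)) & mask)
        for index in range(NUM_FUBS)
    ]


packet = pack_status(
    [FubStatus.NORMAL, FubStatus.HOT, FubStatus.NORMAL, FubStatus.CRITICAL_HOT]
)
decoded = unpack_status(packet)
print(f"{packet:08b}", [status.name for status in decoded])
```

With two bits per FUB, the status of all four FUBs fits in a single byte, so an external controller could read the condition of every FUB from one small packet.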
When the threshold is violated, the process flows from block 410 to block 415 where the controller may perform one or more operations to adjust the operating condition of the FUB. This may include, for example, applying one or more of frequency, voltage, thermal, power, and performance throttling to the FUB. At block 420, a test is made to determine if there exists any dependent FUB. When there is a dependent FUB, the process flows from block 420 to block 425 where the operating condition of the dependent FUB may also be adjusted. When there is no dependent FUB, the process continues at block 405 where the controller receives updated operating condition information from the sensor.
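The flow through blocks 405 to 425 may be sketched as a single pass of a control loop. This is a simplified illustration under stated assumptions: the operating condition is modeled as a hypothetical integer performance level, a threshold is treated as violated when the reading exceeds it, and the function and variable names are invented for this example.

```python
def control_step(readings, thresholds, dependents, levels):
    """One pass of the loop: throttle each violating FUB and its dependent FUBs.

    Returns the list of FUBs whose level was lowered, in the order adjusted.
    """
    adjusted = []
    for fub, value in readings.items():           # block 405: sensor information
        if value > thresholds[fub]:               # block 410: threshold violated?
            levels[fub] = max(0, levels[fub] - 1)  # block 415: adjust the FUB
            adjusted.append(fub)
            for dep in dependents.get(fub, []):   # block 420: any dependent FUB?
                levels[dep] = max(0, levels[dep] - 1)  # block 425: adjust it too
                adjusted.append(dep)
    return adjusted


# Hypothetical data: FUB "A" is too hot, and FUB "C" depends on "A".
readings = {"A": 95.0, "B": 70.0, "C": 60.0}
thresholds = {"A": 85.0, "B": 85.0, "C": 85.0}
dependents = {"A": ["C"]}
levels = {"A": 3, "B": 3, "C": 3}

adjusted = control_step(readings, thresholds, dependents, levels)
print(adjusted, levels)
```

In a running system this step would be repeated each time the controller receives updated operating condition information.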
At block 625, the controller adjusts the operating condition of the one or more FUBs that neighbor the first FUB. For example, when the temperature of the first FUB violates a temperature threshold, it may be possible to indirectly reduce the temperature of the first FUB by reducing the temperature of its neighboring FUBs. The process then continues at block 605 where the controller receives updated operating condition information from the sensor.
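The choice between adjusting the first FUB directly and adjusting its neighbors may be sketched as follows. The adjacency map, the `can_adjust` flags, and the `choose_targets` helper are hypothetical; the sketch assumes that the neighbors are adjusted only when the first FUB itself cannot be adjusted.

```python
def choose_targets(fub, can_adjust, neighbors):
    """Return the FUBs to throttle for a hot FUB.

    The hot FUB itself is chosen when it can be adjusted; otherwise its
    neighboring FUBs are chosen so that its temperature drops indirectly.
    """
    if can_adjust.get(fub, False):
        return [fub]
    return list(neighbors.get(fub, []))


# Hypothetical 2x2 layout: each FUB borders the two FUBs it shares an edge with.
neighbors = {
    "fub0": ["fub1", "fub2"],
    "fub1": ["fub0", "fub3"],
    "fub2": ["fub0", "fub3"],
    "fub3": ["fub1", "fub2"],
}

# Assume fub0 cannot be throttled (for example, it is performance critical).
can_adjust = {"fub0": False, "fub1": True, "fub2": True, "fub3": True}

print(choose_targets("fub0", can_adjust, neighbors))
print(choose_targets("fub1", can_adjust, neighbors))
```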
The processor 705 may or may not include multiple logical processors. For example, the processor 705 may support Hyperthreading Technology (HT) and may include two logical processors 706 and 707. The chipset 710 may include a graphics controller 712, a memory controller 713, and an input/output (I/O) controller 714. Clock generator 715 may provide clock signals to the processor 705, the chipset 710, and the memory 720. It may also provide clock signals to other components in the computer system 700. Each of these components may include multiple FUBs, and the operating condition of each of the FUBs may be individually managed, as described above. The computer system 700 may be powered by an alternating current (AC) power source (not shown) or by a direct current (DC) power source (not shown) using one or more batteries.
The computer system 700 may include a storage device 728 that may include a machine-readable medium on which is stored sequences of instructions (e.g., a software application) embodying any one, or all, of the embodiments described herein. Execution of the sequences of instructions may cause the processor 705 to perform operations according to embodiments of the invention. The sequences of instructions may be loaded into the memory 720 from the storage device 728 or from one or more other digital processing systems (e.g., a server computer system) over a network connection (not shown). The sequences of instructions may be stored concurrently in several storage devices (e.g., DRAM and a hard disk, such as virtual memory). The sequences of instructions may also reside, completely or at least partially, within the memory 720 and/or within the processor 705.
In other embodiments, hard-wired circuitry may be used in place of or in combination with the sequences of instructions to implement various aspects of the invention. Thus, the present invention is not limited to any specific combination of hardware circuitry and software, nor to any particular source for the instructions executed by the computer or digital processing system.
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims
45. A method, comprising:
- monitoring one or more regions of an integrated circuit using a sensor associated with each of the one or more regions; and
- when a sensor associated with a first region detects that the first region violates an operating threshold, adjusting operating condition of the first region.
46. The method of claim 45, wherein the integrated circuit is capable of generating heat.
47. The method of claim 46, wherein the integrated circuit is a processor or a chipset.
48. The method of claim 45, wherein adjusting the operating condition of the first region comprises:
- if the operating condition of the first region can be adjusted, adjusting the operating condition of the first region; otherwise adjusting operating condition of a second region, the second region being a neighbor of the first region.
49. The method of claim 48, further comprising:
- if a third region depends on the first region and the operating condition of the first region is adjusted, adjusting operating condition of the third region.
50. The method of claim 45, wherein operating condition of a fourth region is not impacted by adjusting operating condition of the first region.
51. The method of claim 45, wherein the sensors are thermal sensors, and wherein the operating threshold is a temperature threshold.
52. The method of claim 45, wherein adjusting operating condition of the first region comprises:
- throttling frequency applied to the first region.
53. The method of claim 52, wherein adjusting operating condition of the first region further comprises:
- throttling voltage applied to the first region.
54. The method of claim 45, wherein adjusting operating condition of the first region comprises:
- throttling power consumption of the first region.
55. A method, comprising:
- when a first region of a heat generating integrated circuit in a computer system violates an operating threshold, throttling operating condition of the first region instead of throttling operating condition of the integrated circuit.
56. The method of claim 55, wherein throttling the operating condition of the first region includes throttling operating condition of a second region of the integrated circuit when the second region depends on the first region.
57. The method of claim 55, wherein throttling the operating condition of the first region includes throttling operating condition of a third region of the integrated circuit when throttling the operating condition of the third region enables the first region to not violate the operating threshold.
58. The method of claim 55, wherein the heat-generating integrated circuit is a processor.
59. The method of claim 55, wherein the operating threshold is a thermal threshold.
60. The method of claim 55, wherein throttling the operating condition of the first region includes throttling power consumption of the first region.
61. A system, comprising:
- a heat-generating component having multiple regions, each of the regions associated with a sensor to monitor its operating condition; and
- a controller coupled to the heat-generating component, wherein the controller is to manage operating condition of each of the multiple regions based on operating condition information provided by the sensor associated with each of the multiple regions.
62. The system of claim 61, wherein the controller and the heat-generating component share a single die.
63. The system of claim 61, wherein the controller and the heat-generating component are on different dies.
64. The system of claim 61, wherein the sensors are thermal sensors, and wherein when the operating condition information of a first region detected by its corresponding thermal sensor indicates that temperature of the first region violates a temperature threshold, the controller is to manage the operating condition of the first region by reducing the temperature of the first region.
65. The system of claim 61, wherein when operating condition of a first region violates a threshold, the controller is to manage the operating condition of the first region by throttling the operating condition of the first region.
66. The system of claim 65, wherein the controller is to throttle the operating condition of the first region instead of throttling operating condition of the heat-generating component.
67. The system of claim 65, wherein the controller is to throttle the operating condition of the first region by adjusting operating condition of a second region.
68. The system of claim 67, wherein the operating condition of the second region is throttled when it is not desirable to throttle the operating condition of the first region.
69. The system of claim 65, wherein the controller is to further throttle operating condition of a third region when the third region depends on the first region.
70. A computer readable medium comprising executable instructions which, when executed in a processing system, causes the processing system to perform a method, comprising:
- monitoring one or more regions of an integrated circuit (IC) using a sensor associated with each of the one or more regions; and
- when a sensor associated with a first region of the IC detects that the first region violates an operating threshold, adjusting operating condition of the first region.
71. The computer readable medium of claim 70, wherein the IC is capable of generating heat.
72. The computer readable medium of claim 71, wherein the IC is a processor or a chipset.
73. The computer readable medium of claim 70, wherein adjusting the operating condition of the first region comprises:
- if the operating condition of the first region can be adjusted, adjusting the operating condition of the first region, otherwise adjusting operating condition of a second region of the IC, the second region being a neighbor of the first region.
74. The computer readable medium of claim 73, further comprising:
- if a third region of the IC depends on the first region and the operating condition of the first region is adjusted, adjusting operating condition of the third region.
75. The computer readable medium of claim 70, wherein operating condition of a fourth region of the IC is not impacted by adjusting operating condition of the first region of the IC.
76. The computer readable medium of claim 70, wherein the sensors are thermal sensors, and wherein the operating threshold is a temperature threshold.
77. The computer readable medium of claim 70, wherein adjusting operating condition of the first region comprises:
- throttling frequency applied to the first region.
78. The computer readable medium of claim 77, wherein adjusting operating condition of the first region further comprises:
- throttling voltage applied to the first region.
79. The computer readable medium of claim 70, wherein adjusting operating condition of the first region comprises:
- throttling power consumption of the first region.
80. A computer readable medium comprising executable instructions which, when executed in a processing system, causes the processing system to perform a method, comprising:
- when a first functional unit block (FUB) of a heat generating integrated circuit (IC) in a computer system violates an operating threshold, throttling operating condition of the first FUB instead of throttling operating condition of the IC.
81. The computer readable medium of claim 80, wherein throttling the operating condition of the first FUB includes throttling operating condition of a second FUB of the IC when the second region depends on the first region.
82. The computer readable medium of claim 80, wherein throttling the operating condition of the first region includes throttling operating condition of a third region of the IC when throttling the operating condition of the third region enables the first region to not violate the operating threshold.
83. The computer readable medium of claim 80, wherein the heat-generating IC is a processor.
84. The computer readable medium of claim 80, wherein the operating threshold is a thermal threshold.
85. The computer readable medium of claim 80, wherein throttling the operating condition of the first region includes throttling power consumption of the first region.
86. An integrated circuit (IC), comprising:
- two or more regions, each of the regions associated with a different sensor; and
- a controller associated with the two or more regions, wherein the controller is to receive operating condition information from the sensor for each of the two or more regions, and wherein the controller is capable of adjusting operating condition of each of the two or more regions based on their operating condition information.
87. The IC of claim 86, wherein the controller is to adjust the operating condition of a first region when the operating condition information of the first region indicates that the first region is a hot spot.
88. The apparatus of claim 87, wherein the controller is to adjust the operating condition of a first region rather than the operating condition of all of the two or more regions.
Type: Application
Filed: Jan 13, 2006
Publication Date: Jun 8, 2006
Inventor: Kelan Silvester (Portland, OR)
Application Number: 11/332,003
International Classification: G06F 1/26 (20060101);