Patents by Inventor Per H. Hammarlund
Per H. Hammarlund has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7502912Abstract: A method and apparatus for rescheduling operations in a processor. More particularly, the present invention relates to optimally using a scheduler resource in a processor by analyzing, predicting, and sorting the write order of instructions into the scheduler so that the duration the instructions sit idle in the scheduler is minimized. The analyses, prediction, and sorting may be done between an instruction queue and a scheduler by using delay units. The prediction can be based on history (latency, dependency, and resource) or on a general prediction scheme.Type: GrantFiled: December 30, 2003Date of Patent: March 10, 2009Assignee: Intel CorporationInventors: Avinash Sodani, Per H. Hammarlund, Stephan J. Jourdan
-
Patent number: 7502892Abstract: Embodiments of the present invention relate to cache coherency. In an embodiment of the invention, a cache includes one or more cache lines. A store pipeline may retrieve a tag associated with one of the cache lines. The data associated with the cache line may not retrieved and the cache line may be updated if, based on the tag, the cache line is determined to be in a modified or exclusive state.Type: GrantFiled: December 30, 2003Date of Patent: March 10, 2009Assignee: Intel CorporationInventors: Per H. Hammarlund, Hermann W. Gartler
-
Patent number: 7174428Abstract: Embodiments of the present invention provide a method, apparatus and system for memory renaming. In one embodiment, a decode unit may decode a load instruction. If the load instruction is predicted to be memory renamed, the load instruction may have a predicted store identifier associated with the load instruction. The decode unit may transform the load instruction that is predicted to be memory renamed into a data move instruction and a load check instruction. The data move instruction may read data from the cache based on the predicted store identifier and load check instruction may compare an identifier associated with an identified source store with the predicted store identifier. A retirement unit may retire the load instruction if the predicted store identifier matches an identifier associated with the identified source store.Type: GrantFiled: December 29, 2003Date of Patent: February 6, 2007Assignee: Intel CorporationInventors: Sebastien Hily, Per H. Hammarlund, Avinash Sodani
-
Patent number: 7130965Abstract: Embodiments of the present invention relate to a memory management scheme and apparatus that enables efficient cache memory management. The method includes writing an entry to a store buffer at execute time; determining if the entry's address is in a first-level cache associated with the store buffer before retirement; and setting a status bit associated with the entry in said store buffer, if the address is in the cache in either exclusive or modified state. The method further includes immediately writing the entry to the first-level cache at or after retirement when the status bit is set; and de-allocating the entry from said store buffer at retirement. The method further may comprise resetting the status bit if the cacheline is allocated over or is evicted from the cache before the store buffer entry attempts to write to the cache.Type: GrantFiled: December 23, 2003Date of Patent: October 31, 2006Assignee: Intel CorporationInventors: Per H. Hammarlund, Stephan Jourdan, Sebastien Hily, Aravindh Baktha, Hermann Gartler
-
Patent number: 6990551Abstract: A system and method for reducing linear address aliasing is described. In one embodiment, a portion of a linear address is combined with a process identifier, e.g., a page directory base pointer to form an adjusted-linear address. The page directory base pointer is unique to a process and combining it with a portion of the linear address produces an adjusted-linear address that provides a high probability of no aliasing. A portion of the adjusted-linear address is used to search an adjusted-linear-addressed cache memory for a data block specified by the linear address. If the data block does not reside in the adjusted-linear-addressed cache memory, then a replacement policy selects one of the cache lines in the adjusted-linear-addressed cache memory and replaces the data block of the selected cache line with a data block located at a physical address produced from translating the linear address.Type: GrantFiled: August 13, 2004Date of Patent: January 24, 2006Assignee: Intel CorporationInventors: Herbert H. J. Hum, Stephan J. Jourdan, Per H. Hammarlund
-
Patent number: 6981129Abstract: Breaking replay dependency loops in a processor using a rescheduled replay queue. The processor comprises a replay queue to receive a plurality of instructions, and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and increments a counter for each of the plurality of instructions to reflect the number of times each of the plurality of instructions has been executed. The scheduler also dispatches each instruction to the execution unit either when the counter does not exceed a maximum number of replays or, if the counter exceeds the maximum number of replays, when the instruction is safe to execute. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.Type: GrantFiled: November 2, 2000Date of Patent: December 27, 2005Assignee: Intel CorporationInventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
-
Patent number: 6877086Abstract: Rescheduling multiple micro-operations in a processor using a replay queue. The processor comprises a replay queue to receive a plurality of instructions and an execution unit to execute the plurality of instructions. A scheduler is coupled between the replay queue and the execution unit. The scheduler speculatively schedules instructions for execution and dispatches each instruction to the execution unit. A checker is coupled to the execution unit to determine whether each instruction has executed successfully. The checker is also coupled to the replay queue to communicate to the replay queue each instruction that has not executed successfully.Type: GrantFiled: November 2, 2000Date of Patent: April 5, 2005Assignee: Intel CorporationInventors: Darrell D. Boggs, Douglas M. Carmean, Per H. Hammarlund, Francis X. McKeen, David J. Sager, Ronak Singhal
-
Patent number: 6798364Abstract: A method and apparatus for variable length coding is described. A method comprises receiving a group of data having a group of set values, identifying a group of positions of the group of set values within the group of data without branching, for each of the group of positions, encoding a run of non-set values preceding each of the group of positions.Type: GrantFiled: February 5, 2002Date of Patent: September 28, 2004Assignee: Intel CorporationInventors: Yen-Kuang Chen, Matthew J. Holliman, Herbert Hum, Per H. Hammarlund, Thomas Huff, William W. Macy
-
Patent number: 6675282Abstract: A system and method for storing only one copy of a data block that is shared by two or more processes is described. In one embodiment, a global/non-global predictor predicts whether a data block, specified by a linear address, is shared or not shared by two or more processes. If the data block is predicted to be non-shared, then a portion of the linear address referencing the data block is combined with a process identifier that is unique to form a global/non-global linear address. If the data block is predicted to be shared, then the global/non-global linear address is the linear address itself. If the prediction as to whether or not the data block is shared is incorrect, then the actual value of whether or not the data block is shared is used in computing a corrected global/non-global linear address.Type: GrantFiled: February 12, 2003Date of Patent: January 6, 2004Assignee: Intel CorporationInventors: Herbert H. J. Hum, Stephan J. Jourdan, Deborrah Marr, Per H. Hammarlund
-
Patent number: 6643747Abstract: A request received from a requester to access a processor cache or register file or the like is buffered, by storing requestor identification, request type, address, and a status of the request. This buffered request may be forwarded to the cache if it has the highest priority among a number of buffered requests that also wish to access the cache. The priority is a function of at least the requestor identification, the requester type, and the status of the request. For buffered requests which include a read, the buffered request is deleted after, not before, receiving an indication that the requestor has received the data read from the cache.Type: GrantFiled: December 27, 2000Date of Patent: November 4, 2003Assignee: Intel CorporationInventors: Per H. Hammarlund, Douglas M. Carmean, Michael D. Upton
-
Publication number: 20030146858Abstract: A method and apparatus for variable length coding is described. A method comprises receiving a group of data having a group of set values, identifying a group of positions of the group of set values within the group of data without branching, for each of the group of positions, encoding a run of non-set values preceding each of the group of positions.Type: ApplicationFiled: February 5, 2002Publication date: August 7, 2003Inventors: Yen-Kuang Chen, Matthew J. Holliman, Herbert Hum, Per H. Hammarlund, Thomas Huff, William W. Macy
-
Publication number: 20030120892Abstract: A system and method for storing only one copy of a data block that is shared by two or more processes is described. In one embodiment, a global/non-global predictor predicts whether a data block, specified by a linear address, is shared or not shared by two or more processes. If the data block is predicted to be non-shared, then a portion of the linear address referencing the data block is combined with a process identifier that is unique to form a global/non-global linear address. If the data block is predicted to be shared, then the global/non-global linear address is the linear address itself. If the prediction as to whether or not the data block is shared is incorrect, then the actual value of whether or not the data block is shared is used in computing a corrected global/non-global linear address.Type: ApplicationFiled: February 12, 2003Publication date: June 26, 2003Applicant: Intel CorporationInventors: Herbert H.J. Hum, Stephan J. Jourdan, Deborrah Marr, Per H. Hammarlund
-
Patent number: 6560690Abstract: A system and method for storing only one copy of a data block that is shared by two or more processes is described. In one embodiment, a global/non-global predictor predicts whether a data block, specified by a linear address, is shared or not shared by two or more processes. If the data block is predicted to be non-shared, then a portion of the linear address referencing the data block is combined with a process identifier that is unique to form a global/non-global linear address. If the data block is predicted to be shared, then the global/non-global linear address is the linear address itself. If the prediction as to whether or not the data block is shared is incorrect, then the actual value of whether or not the data block is shared is used in computing a corrected global/non-global linear address.Type: GrantFiled: December 29, 2000Date of Patent: May 6, 2003Assignee: Intel CorporationInventors: Herbert H. J. Hum, Stephan J. Jourdan, Deborrah Marr, Per H. Hammarlund
-
Publication number: 20020087824Abstract: A system and method for reducing linear address aliasing is described. In one embodiment, a portion of a linear address is combined with a process identifier, e.g., a page directory base pointer to form an adjusted-linear address. The page directory base pointer is unique to a process and combining it with a portion of the linear address produces an adjusted-linear address that provides a high probability of no aliasing. A portion of the adjusted-linear address is used to search an adjusted-linear-addressed cache memory for a data block specified by the linear address. If the data block does not reside in the adjusted-linear-addressed cache memory, then a replacement policy selects one of the cache lines in the adjusted-linear-addressed cache memory and replaces the data block of the selected cache line with a data block located at a physical address produced from translating the linear address.Type: ApplicationFiled: December 29, 2000Publication date: July 4, 2002Inventors: Herbert H.J. Hum, Stephan J. Jourdan, Per H. Hammarlund
-
Publication number: 20020087795Abstract: A system and method for storing only one copy of a data block that is shared by two or more processes is described. In one embodiment, a global/non-global predictor predicts whether a data block, specified by a linear address, is shared or not shared by two or more processes. If the data block is predicted to be non-shared, then a portion of the linear address referencing the data block is combined with a process identifier that is unique to form a global/non-global linear address. If the data block is predicted to be shared, then the global/non-global linear address is the linear address itself. If the prediction as to whether or not the data block is shared is incorrect, then the actual value of whether or not the data block is shared is used in computing a corrected global/non-global linear address.Type: ApplicationFiled: December 29, 2000Publication date: July 4, 2002Inventors: Herbert H.J. Hum, Stephan J. Jourdan, Deborrah Marr, Per H. Hammarlund
-
Publication number: 20020083244Abstract: A request received from a requester to access a processor cache or register file or the like is buffered, by storing requestor identification, request type, address, and a status of the request. This buffered request may be forwarded to the cache if it has the highest priority among a number of buffered requests that also wish to access the cache. The priority is a function of at least the requestor identification, the requester type, and the status of the request. For buffered requests which include a read, the buffered request is deleted after, not before, receiving an indication that the requestor has received the data read from the cache.Type: ApplicationFiled: December 27, 2000Publication date: June 27, 2002Inventors: Per H. Hammarlund, Douglas M. Carmean, Michael D. Upton
-
Patent number: 6105111Abstract: A cache technique for maximizing cache efficiency by assigning ages to elements which access the cache, is described. In one embodiment, the cache technique includes receiving a first element of a first type by a cache and writing the first element in a set of the cache. The first element has a first age. The cache technique further includes receiving a second element of a second type by the cache and writing the second element in the set of the cache. The second element has a middle age, where the first age is a more recently used age than the middle age. In another embodiment, the cache technique includes receiving a first element of a first stream by a cache and writing the first element in a set of the cache. The first element has a first age. The cache technique further includes receiving a second element of a second stream by the cache and writing the second element in the set of the cache. The second element has a middle age, where the first age is a more recently used age than the middle age.Type: GrantFiled: March 31, 1998Date of Patent: August 15, 2000Assignee: Intel CorporationInventors: Per H. Hammarlund, Glenn J. Hinton