Patents by Inventor Hazim Shafi
Hazim Shafi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20100251160Abstract: Methods and systems are disclosed for measuring performance event rates at a computer and reporting the performance event rates using timelines. A particular method tracks, for a time period, the occurrences of a particular event at a computer. Event rates corresponding to different time segments within the time period are calculated, and the time segments are assigned colors based on their associated event rates. The event rates are used to display a colored timeline for the time period, including displaying a colored timeline portion for each time segment in its associated color.Type: ApplicationFiled: March 26, 2009Publication date: September 30, 2010Applicant: Microsoft CorporationInventors: Hazim Shafi, Khaled S. Sedky
-
Publication number: 20100223600Abstract: A thread execution analyzer analyzes blocking events of threads in a program using execution data and callstacks collected at the blocking events. The thread execution analyzer attempts to identify an application programming interface (API) responsible for each blocking event and provides blocking analysis information to a user. The blocking analysis information may be used by a developer of the program to understand the causes of blocking events that occur for threads of the program.Type: ApplicationFiled: February 27, 2009Publication date: September 2, 2010Applicant: Microsoft CorporationInventors: Hazim Shafi, Brian Adelberg, Khaled S. Sedky
-
Patent number: 7698508Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.Type: GrantFiled: February 14, 2007Date of Patent: April 13, 2010Assignee: International Business Machines CorporationInventors: Ramakrishnan Rajamony, Hazim Shafi, William Evan Speight, Lixin Zhang
-
Patent number: 7657729Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.Type: GrantFiled: July 13, 2006Date of Patent: February 2, 2010Assignee: International Business Machines CorporationInventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
-
Publication number: 20090319996Abstract: Thread blocking synchronization event analysis software uses kernel context switch data and thread unblocking data to form a visualization of thread synchronization behavior. The visualization provides interactive access to source code responsible for thread blocking, identifies blocking threads and blocked threads, summarizes execution delays due to synchronization and lists corresponding APIs and objects, correlates thread synchronization events with application program phases, and otherwise provides information associated with thread synchronization. The visualization may operate within an integrated development environment.Type: ApplicationFiled: June 23, 2008Publication date: December 24, 2009Applicant: MICROSOFT CORPORATIONInventors: Hazim Shafi, Brian Adelberg, Maria Blees, Paulo Janotti, Khaled Sedky
-
Patent number: 7487297Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.Type: GrantFiled: June 6, 2006Date of Patent: February 3, 2009Assignee: International Business Machines CorporationInventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
-
Publication number: 20080263284Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.Type: ApplicationFiled: June 24, 2008Publication date: October 23, 2008Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine
-
Patent number: 7437517Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.Type: GrantFiled: January 11, 2005Date of Patent: October 14, 2008Assignee: International Business Machines CorporationInventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine
-
Patent number: 7409504Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.Type: GrantFiled: October 6, 2005Date of Patent: August 5, 2008Assignee: International Business Machines CorporationInventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright
-
Patent number: 7395407Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.Type: GrantFiled: October 14, 2005Date of Patent: July 1, 2008Assignee: International Business Machines CorporationInventors: Xiaowei Shen, Hazim Shafi
-
Patent number: 7380068Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.Type: GrantFiled: October 27, 2005Date of Patent: May 27, 2008Assignee: International Business Machines CorporationInventors: Hazim Shafi, William E. Speight
-
Patent number: 7370155Abstract: A method and data processing system for sequentially coupling successive, homogenous processor requests for a cache line in a chain before the data is received in the cache of a first processor within the chain. Chained intermediate coherency states are assigned to track the chain of processor requests and subsequent access permission provided, prior to receipt of the data at the first processor starting the chain. The chained intermediate coherency state assigned identifies the processor operation and a directional identifier identifies the processor to which the cache line is to be forwarded. When the data is received at the cache of the first processor within the chain, the first processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chain is immediately stopped when a non-homogenous operation is snooped by the last-in-chain processor.Type: GrantFiled: October 6, 2005Date of Patent: May 6, 2008Assignee: International Business Machines CorporationInventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright
-
Publication number: 20080046736Abstract: A data processing system includes a system memory and a cache hierarchy that caches contents of the system memory. According to one method of data processing, a storage modifying operation having a cacheable target real memory address is received. A determination is made whether or not the storage modifying operation has an associated bypass indication. In response to determining that the storage modifying operation has an associated bypass indication, the cache hierarchy is bypassed, and an update indicated by the storage modifying operation is performed in the system memory. In response to determining that the storage modifying operation does not have an associated bypass indication, the update indicated by the storage modifying operation is performed in the cache hierarchy.Type: ApplicationFiled: August 3, 2006Publication date: February 21, 2008Inventors: Ravi K. Arimilli, Francis P. O'Connell, Hazim Shafi, Derek E. Williams, Lixin Zhang
-
Publication number: 20080016330Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.Type: ApplicationFiled: July 13, 2006Publication date: January 17, 2008Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
-
Publication number: 20070283101Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.Type: ApplicationFiled: June 6, 2006Publication date: December 6, 2007Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
-
Patent number: 7281092Abstract: A system and method of managing cache hierarchies with adaptive mechanisms. A preferred embodiment of the present invention includes, in response to selecting a data block for eviction from a memory cache (the source cache) out of a collection of memory caches, examining a data structure to determine whether an entry exists that indicates that the data block has been evicted from the source memory cache, or another peer cache, to a slower cache or memory and subsequently retrieved from the slower cache or memory into the source memory cache or other peer cache. Also, a preferred embodiment of the present invention includes, in response to determining the entry exists in the data structure, selecting a peer memory cache out of the collection of memory caches at the same level in the hierarchy to receive the data block from the source memory cache upon eviction.Type: GrantFiled: June 2, 2005Date of Patent: October 9, 2007Assignee: International Business Machines CorporationInventors: Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
-
Publication number: 20070136535Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.Type: ApplicationFiled: February 14, 2007Publication date: June 14, 2007Inventors: Ramakrishnan Rajamony, Hazim Shafi, William Speight, Lixin Zhang
-
Publication number: 20070101067Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.Type: ApplicationFiled: October 27, 2005Publication date: May 3, 2007Inventors: Hazim Shafi, William Speight
-
Publication number: 20070088919Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.Type: ApplicationFiled: October 14, 2005Publication date: April 19, 2007Applicant: International Business MachinesInventors: Xiaowei Shen, Hazim Shafi
-
Publication number: 20070083716Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.Type: ApplicationFiled: October 6, 2005Publication date: April 12, 2007Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Williams, Kenneth Wright