Patents by Inventor Hazim Shafi

Hazim Shafi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MEASUREMENT AND REPORTING OF PERFORMANCE EVENT RATES

Publication number: 20100251160

Abstract: Methods and systems are disclosed for measuring performance event rates at a computer and reporting the performance event rates using timelines. A particular method tracks, for a time period, the occurrences of a particular event at a computer. Event rates corresponding to different time segments within the time period are calculated, and the time segments are assigned colors based on their associated event rates. The event rates are used to display a colored timeline for the time period, including displaying a colored timeline portion for each time segment in its associated color.
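
As a rough illustration of the approach this abstract describes, the Python sketch below counts event timestamps per fixed-length time segment, converts the counts to rates, and maps each rate to a color. The function names, the fixed segment length, and the color thresholds are assumptions made for this example; the abstract does not specify them.

```python
from collections import Counter


def event_rates(timestamps, start, end, segment_len):
    """Return the events-per-unit-time rate for each whole segment in [start, end)."""
    n_segments = int((end - start) // segment_len)
    covered_end = start + n_segments * segment_len
    counts = Counter()
    for t in timestamps:
        if start <= t < covered_end:
            counts[int((t - start) // segment_len)] += 1
    return [counts[i] / segment_len for i in range(n_segments)]


def color_for(rate, thresholds=((10.0, "red"), (2.0, "yellow"))):
    """Map a rate to a color; thresholds are hypothetical, highest first."""
    for minimum, color in thresholds:
        if rate >= minimum:
            return color
    return "green"


def colored_timeline(timestamps, start, end, segment_len):
    """One color per time segment, in timeline order."""
    return [color_for(r) for r in event_rates(timestamps, start, end, segment_len)]
```

A display layer would then draw one timeline portion per segment in the returned color.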

Type: Application

Filed: March 26, 2009

Publication date: September 30, 2010

Applicant: Microsoft Corporation

Inventors: Hazim Shafi, Khaled S. Sedky

THREAD EXECUTION ANALYZER

Publication number: 20100223600

Abstract: A thread execution analyzer analyzes blocking events of threads in a program using execution data and callstacks collected at the blocking events. The thread execution analyzer attempts to identify an application programming interface (API) responsible for each blocking event and provides blocking analysis information to a user. The blocking analysis information may be used by a developer of the program to understand the causes of blocking events that occur for threads of the program.
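
One way to picture the API identification step is to walk a callstack captured at a blocking event from the innermost frame outward and report the last library frame before application code. The sketch below does that in Python; the frame representation, the `app_modules` parameter, and the rule itself are assumptions for illustration, not the method claimed in the application.

```python
def responsible_api(callstack, app_modules):
    """Guess which API call blocked a thread.

    callstack: list of (module, function) tuples, innermost frame first.
    app_modules: set of module names that belong to the program itself.
    Returns "module!function" for the library frame called directly by
    application code, or None if no such frame can be identified.
    """
    api = None
    for module, function in callstack:
        if module in app_modules:
            # First application frame reached: the previous frame is the
            # API the application called.
            return api
        api = f"{module}!{function}"
    # No application frame on the stack; treat the cause as unknown.
    return None
```

For a stack that passes through a system wait routine before reaching application code, the function reports the wait routine the application called.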

Type: Application

Filed: February 27, 2009

Publication date: September 2, 2010

Applicant: Microsoft Corporation

Inventors: Hazim Shafi, Brian Adelberg, Khaled S. Sedky

System and method for reducing unnecessary cache operations

Patent number: 7698508

Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.
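
The replacement check in this abstract can be sketched as follows. This is a minimal software model, assuming the write-back data structure is a set of addresses known to be present in the lower cache and that it is populated when a line is filled from the lower cache; both are assumptions for the example, and modified (dirty) data is not modeled.

```python
class LowerCache:
    def __init__(self):
        self.lines = {}
        self.castouts_received = 0

    def accept_castout(self, addr, data):
        self.lines[addr] = data
        self.castouts_received += 1


class UpperCache:
    def __init__(self, lower):
        self.lower = lower
        self.lines = {}
        # Stand-in for the write-back data structure: addresses believed
        # to be present in the lower cache (hypothetical representation).
        self.present_in_lower = set()

    def fill(self, addr, data, came_from_lower):
        self.lines[addr] = data
        if came_from_lower:
            self.present_in_lower.add(addr)

    def replace(self, addr):
        """Remove a line, casting it out only if the lower cache lacks it."""
        data = self.lines.pop(addr)
        if addr in self.present_in_lower:
            self.present_in_lower.discard(addr)
            return "dropped"  # lower cache already holds the data
        self.lower.accept_castout(addr, data)
        return "cast_out"
```

Skipping the cast-out for lines the lower cache already holds is what saves the unnecessary cache operation named in the title.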

Type: Grant

Filed: February 14, 2007

Date of Patent: April 13, 2010

Assignee: International Business Machines Corporation

Inventors: Ramakrishnan Rajamony, Hazim Shafi, William Evan Speight, Lixin Zhang

Efficient multiple-table reference prediction mechanism

Patent number: 7657729

Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.
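
The table cascade described in this abstract might look like the following sketch. It assumes each history table is a simple stride detector, that tables differ in which address bits form the index and in which strides they accept, and that a 128-byte cache line is used; none of these details come from the abstract.

```python
LINE = 128  # assumed cache-line size in bytes


class StrideTable:
    def __init__(self, index_shift, allowed_strides):
        self.index_shift = index_shift   # which address bits form the index
        self.allowed = allowed_strides   # line strides this table detects
        self.last = {}                   # index -> last line number seen

    def check(self, addr):
        """Return a predicted prefetch address, or None if no pattern fits."""
        line = addr // LINE
        key = addr >> self.index_shift
        prev = self.last.get(key)
        self.last[key] = line
        if prev is not None and (line - prev) in self.allowed:
            return (line + (line - prev)) * LINE
        return None


class PrefetchEngine:
    def __init__(self, tables):
        self.tables = tables  # consulted in this preset order

    def access(self, addr):
        """Return (table_index, prefetch_address) for the first match."""
        for i, table in enumerate(self.tables):
            prediction = table.check(addr)
            if prediction is not None:
                return i, prediction
            # No match here: pass the address on to the next table.
        return None
```

Because an address that matches an earlier table never reaches the later ones, a unit-stride stream does not disturb the history kept for larger strides.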

Type: Grant

Filed: July 13, 2006

Date of Patent: February 2, 2010

Assignee: International Business Machines Corporation

Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang

ANALYSIS OF THREAD SYNCHRONIZATION EVENTS

Publication number: 20090319996

Abstract: Thread blocking synchronization event analysis software uses kernel context switch data and thread unblocking data to form a visualization of thread synchronization behavior. The visualization provides interactive access to source code responsible for thread blocking, identifies blocking threads and blocked threads, summarizes execution delays due to synchronization and lists corresponding APIs and objects, correlates thread synchronization events with application program phases, and otherwise provides information associated with thread synchronization. The visualization may operate within an integrated development environment.

Type: Application

Filed: June 23, 2008

Publication date: December 24, 2009

Applicant: MICROSOFT CORPORATION

Inventors: Hazim Shafi, Brian Adelberg, Maria Blees, Paulo Janotti, Khaled Sedky

Dynamically adjusting a pre-fetch distance to enable just-in-time prefetching within a processing system

Patent number: 7487297

Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.
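
A simple feedback loop gives a feel for the dynamic adjustment in this abstract: lengthen the prefetch distance when data arrives after the demand, and shorten it when data arrives well before the demand. The class name, the one-step adjustment, the `slack` tolerance, and the distance bounds are all assumptions for this sketch; the abstract does not give the adjustment rule.

```python
class PrefetchDistanceController:
    def __init__(self, distance=1, min_distance=1, max_distance=16, slack=2):
        self.distance = distance          # how far ahead to prefetch
        self.min_distance = min_distance
        self.max_distance = max_distance
        self.slack = slack                # cycles of earliness tolerated

    def observe(self, data_arrival_cycle, demand_cycle):
        """Adjust the distance from one prefetch/demand pair and return it."""
        if data_arrival_cycle > demand_cycle:
            # Data came back late, so the processor waited: prefetch earlier.
            self.distance = min(self.distance + 1, self.max_distance)
        elif demand_cycle - data_arrival_cycle > self.slack:
            # Data sat unused for a while: prefetch later.
            self.distance = max(self.distance - 1, self.min_distance)
        return self.distance
```

Under these assumptions the distance settles where data returns within `slack` cycles of the demand, which approximates the just-in-time behavior the abstract describes.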

Type: Grant

Filed: June 6, 2006

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang

Methods and Arrangements to Manage On-Chip Memory to Reduce Memory Latency

Publication number: 20080263284

Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.

Type: Application

Filed: June 24, 2008

Publication date: October 23, 2008

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine

Methods and arrangements to manage on-chip memory to reduce memory latency

Patent number: 7437517

Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.

Type: Grant

Filed: January 11, 2005

Date of Patent: October 14, 2008

Assignee: International Business Machines Corporation

Inventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine

Chained cache coherency states for sequential non-homogeneous access to a cache line with outstanding data response

Patent number: 7409504

Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.

Type: Grant

Filed: October 6, 2005

Date of Patent: August 5, 2008

Assignee: International Business Machines Corporation

Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright

Mechanisms and methods for using data access patterns

Patent number: 7395407

Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.

Type: Grant

Filed: October 14, 2005

Date of Patent: July 1, 2008

Assignee: International Business Machines Corporation

Inventors: Xiaowei Shen, Hazim Shafi

System and method for contention-based cache performance optimization

Patent number: 7380068

Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.
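
The threshold test in this abstract reduces to a small check at the end of each measurement interval. In the sketch below, `contention_mode` is a placeholder for whatever functional mode the cache supports, and the 0.75 threshold and cycle-count inputs are assumptions; the abstract names neither.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    name: str
    contention_mode: bool = False  # hypothetical functional mode flag


def update_mode(cache, busy_cycles, interval_cycles, threshold=0.75):
    """Enable the cache's mode once interconnect utilization hits the threshold.

    Returns the utilization measured over the interval.
    """
    utilization = busy_cycles / interval_cycles
    if utilization >= threshold:
        cache.contention_mode = True
    return utilization
```

The sketch only enables the mode; whether and when it is switched off again is not stated in the abstract and is left out here.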

Type: Grant

Filed: October 27, 2005

Date of Patent: May 27, 2008

Assignee: International Business Machines Corporation

Inventors: Hazim Shafi, William E. Speight

Chained cache coherency states for sequential homogeneous access to a cache line with outstanding data response

Patent number: 7370155

Abstract: A method and data processing system for sequentially coupling successive, homogenous processor requests for a cache line in a chain before the data is received in the cache of a first processor within the chain. Chained intermediate coherency states are assigned to track the chain of processor requests and subsequent access permission provided, prior to receipt of the data at the first processor starting the chain. The chained intermediate coherency state assigned identifies the processor operation and a directional identifier identifies the processor to which the cache line is to be forwarded. When the data is received at the cache of the first processor within the chain, the first processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chain is immediately stopped when a non-homogenous operation is snooped by the last-in-chain processor.

Type: Grant

Filed: October 6, 2005

Date of Patent: May 6, 2008

Assignee: International Business Machines Corporation

Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright

Data Processing System and Method for Reducing Cache Pollution by Write Stream Memory Access Patterns

Publication number: 20080046736

Abstract: A data processing system includes a system memory and a cache hierarchy that caches contents of the system memory. According to one method of data processing, a storage modifying operation having a cacheable target real memory address is received. A determination is made whether or not the storage modifying operation has an associated bypass indication. In response to determining that the storage modifying operation has an associated bypass indication, the cache hierarchy is bypassed, and an update indicated by the storage modifying operation is performed in the system memory. In response to determining that the storage modifying operation does not have an associated bypass indication, the update indicated by the storage modifying operation is performed in the cache hierarchy.
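
The two store paths in this abstract can be modeled in a few lines. The sketch assumes the bypass indication is a boolean passed with each store and uses dictionaries for the cache hierarchy and system memory. Dropping any stale cached copy on a bypassing store is an added assumption to keep the model consistent; the abstract does not mention it.

```python
class MemorySystem:
    def __init__(self):
        self.cache = {}   # stands in for the whole cache hierarchy
        self.memory = {}  # system memory

    def store(self, addr, value, bypass=False):
        """Perform a store, honoring the (hypothetical) bypass flag."""
        if bypass:
            # Assumption: discard any stale cached copy, then update memory.
            self.cache.pop(addr, None)
            self.memory[addr] = value
        else:
            self.cache[addr] = value

    def load(self, addr):
        """Read from the cache if present, otherwise from memory."""
        if addr in self.cache:
            return self.cache[addr]
        return self.memory.get(addr)
```

A write stream marked with the bypass flag therefore never allocates cache entries, which is how the cache pollution named in the title is avoided.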

Type: Application

Filed: August 3, 2006

Publication date: February 21, 2008

Inventors: Ravi K. Arimilli, Francis P. O'Connell, Hazim Shafi, Derek E. Williams, Lixin Zhang

Efficient Multiple-Table Reference Prediction Mechanism

Publication number: 20080016330

Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.

Type: Application

Filed: July 13, 2006

Publication date: January 17, 2008

Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang

Just-In-Time Prefetching

Publication number: 20070283101

Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.

Type: Application

Filed: June 6, 2006

Publication date: December 6, 2007

Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang

System and method of managing cache hierarchies with adaptive mechanisms

Patent number: 7281092

Abstract: A system and method of managing cache hierarchies with adaptive mechanisms. A preferred embodiment of the present invention includes, in response to selecting a data block for eviction from a memory cache (the source cache) out of a collection of memory caches, examining a data structure to determine whether an entry exists that indicates that the data block has been evicted from the source memory cache, or another peer cache, to a slower cache or memory and subsequently retrieved from the slower cache or memory into the source memory cache or other peer cache. Also, a preferred embodiment of the present invention includes, in response to determining the entry exists in the data structure, selecting a peer memory cache out of the collection of memory caches at the same level in the hierarchy to receive the data block from the source memory cache upon eviction.
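
The eviction decision in this abstract can be sketched with a small model of peer caches sharing a reuse-history structure. Here the data structure is a set of block identifiers that were evicted to the slower level and later fetched back, and the peer is simply the first other cache in the group. Both choices are assumptions for illustration; the abstract does not say how the peer is selected.

```python
class CacheGroup:
    def __init__(self, names):
        self.caches = {name: set() for name in names}  # peer caches, same level
        self.slower = set()    # blocks held in the slower cache or memory
        self.evicted = set()   # blocks previously sent to the slower level
        self.reused = set()    # history: evicted and later retrieved

    def fetch(self, cache, block):
        if block in self.evicted:
            # The block is coming back after an eviction: record the reuse.
            self.evicted.discard(block)
            self.reused.add(block)
        self.slower.discard(block)
        self.caches[cache].add(block)

    def evict(self, source, block):
        """Evict a block; return the name of the destination."""
        self.caches[source].discard(block)
        if block in self.reused:
            # History entry exists: keep the block at this level in a peer.
            peer = next(name for name in self.caches if name != source)
            self.caches[peer].add(block)
            return peer
        self.slower.add(block)
        self.evicted.add(block)
        return "slower"
```

A block goes to the slower level on its first eviction and to a peer cache once the history shows it was needed again.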

Type: Grant

Filed: June 2, 2005

Date of Patent: October 9, 2007

Assignee: International Business Machines Corporation

Inventors: Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang

System and Method for Reducing Unnecessary Cache Operations

Publication number: 20070136535

Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.

Type: Application

Filed: February 14, 2007

Publication date: June 14, 2007

Inventors: Ramakrishnan Rajamony, Hazim Shafi, William Speight, Lixin Zhang

System and method for contention-based cache performance optimization

Publication number: 20070101067

Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.

Type: Application

Filed: October 27, 2005

Publication date: May 3, 2007

Inventors: Hazim Shafi, William Speight

Mechanisms and methods for using data access patterns

Publication number: 20070088919

Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.

Type: Application

Filed: October 14, 2005

Publication date: April 19, 2007

Applicant: International Business Machines

Inventors: Xiaowei Shen, Hazim Shafi

Chained cache coherency states for sequential non-homogeneous access to a cache line with outstanding data response

Publication number: 20070083716

Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.

Type: Application

Filed: October 6, 2005

Publication date: April 12, 2007

Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Williams, Kenneth Wright
