Patents by Inventor Hazim Shafi

Hazim Shafi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20100251160
    Abstract: Methods and systems are disclosed for measuring performance event rates at a computer and reporting the performance event rates using timelines. A particular method tracks, for a time period, the occurrences of a particular event at a computer. Event rates corresponding to different time segments within the time period are calculated, and the time segments are assigned colors based on their associated event rates. The event rates are used to display a colored timeline for the time period, including displaying a colored timeline portion for each time segment in its associated color.
    Type: Application
    Filed: March 26, 2009
    Publication date: September 30, 2010
    Applicant: Microsoft Corporation
    Inventors: Hazim Shafi, Khaled S. Sedky
  • Publication number: 20100223600
    Abstract: A thread execution analyzer analyzes blocking events of threads in a program using execution data and callstacks collected at the blocking events. The thread execution analyzer attempts to identify an application programming interface (API) responsible for each blocking event and provides blocking analysis information to a user. The blocking analysis information may be used by a developer of the program to understand the causes of blocking events that occur for threads of the program.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 2, 2010
    Applicant: Microsoft Corporation
    Inventors: Hazim Shafi, Brian Adelberg, Khaled S. Sedky
  • Patent number: 7698508
    Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.
    Type: Grant
    Filed: February 14, 2007
    Date of Patent: April 13, 2010
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, William Evan Speight, Lixin Zhang
  • Patent number: 7657729
    Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.
    Type: Grant
    Filed: July 13, 2006
    Date of Patent: February 2, 2010
    Assignee: International Business Machines Corporation
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Publication number: 20090319996
    Abstract: Thread blocking synchronization event analysis software uses kernel context switch data and thread unblocking data to form a visualization of thread synchronization behavior. The visualization provides interactive access to source code responsible for thread blocking, identifies blocking threads and blocked threads, summarizes execution delays due to synchronization and lists corresponding APIs and objects, correlates thread synchronization events with application program phases, and otherwise provides information associated with thread synchronization. The visualization may operate within an integrated development environment.
    Type: Application
    Filed: June 23, 2008
    Publication date: December 24, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Hazim Shafi, Brian Adelberg, Maria Blees, Paulo Janotti, Khaled Sedky
  • Patent number: 7487297
    Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Publication number: 20080263284
    Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
    Type: Application
    Filed: June 24, 2008
    Publication date: October 23, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine
  • Patent number: 7437517
    Abstract: Methods, systems, and media for reducing memory latency seen by processors by providing a measure of control over on-chip memory (OCM) management to software applications, implicitly and/or explicitly, via an operating system are contemplated. Many embodiments allow part of the OCM to be managed by software applications via an application program interface (API), and part managed by hardware. Thus, the software applications can provide guidance regarding address ranges to maintain close to the processor to reduce unnecessary latencies typically encountered when dependent upon cache controller policies. Several embodiments utilize a memory internal to the processor or on a processor node so the memory block used for this technique is referred to as OCM.
    Type: Grant
    Filed: January 11, 2005
    Date of Patent: October 14, 2008
    Assignee: International Business Machines Corporation
    Inventors: Dilma Menezes da Silva, Elmootazbellah Nabil Elnozahy, Orran Yaakov Krieger, Hazim Shafi, Xiaowei Shen, Balaram Sinharoy, Robert Brett Tremaine
  • Patent number: 7409504
    Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: August 5, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright
  • Patent number: 7395407
    Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.
    Type: Grant
    Filed: October 14, 2005
    Date of Patent: July 1, 2008
    Assignee: International Business Machines Corporation
    Inventors: Xiaowei Shen, Hazim Shafi
  • Patent number: 7380068
    Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.
    Type: Grant
    Filed: October 27, 2005
    Date of Patent: May 27, 2008
    Assignee: International Business Machines Corporation
    Inventors: Hazim Shafi, William E. Speight
  • Patent number: 7370155
    Abstract: A method and data processing system for sequentially coupling successive, homogenous processor requests for a cache line in a chain before the data is received in the cache of a first processor within the chain. Chained intermediate coherency states are assigned to track the chain of processor requests and subsequent access permission provided, prior to receipt of the data at the first processor starting the chain. The chained intermediate coherency state assigned identifies the processor operation and a directional identifier identifies the processor to which the cache line is to be forwarded. When the data is received at the cache of the first processor within the chain, the first processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chain is immediately stopped when a non-homogenous operation is snooped by the last-in-chain processor.
    Type: Grant
    Filed: October 6, 2005
    Date of Patent: May 6, 2008
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Edward Williams, Kenneth Lee Wright
  • Publication number: 20080046736
    Abstract: A data processing system includes a system memory and a cache hierarchy that caches contents of the system memory. According to one method of data processing, a storage modifying operation having a cacheable target real memory address is received. A determination is made whether or not the storage modifying operation has an associated bypass indication. In response to determining that the storage modifying operation has an associated bypass indication, the cache hierarchy is bypassed, and an update indicated by the storage modifying operation is performed in the system memory. In response to determining that the storage modifying operation does not have an associated bypass indication, the update indicated by the storage modifying operation is performed in the cache hierarchy.
    Type: Application
    Filed: August 3, 2006
    Publication date: February 21, 2008
    Inventors: Ravi K. Arimilli, Francis P. O'Connell, Hazim Shafi, Derek E. Williams, Lixin Zhang
  • Publication number: 20080016330
    Abstract: A method and an apparatus for enabling a prefetch engine to detect and support hardware prefetching with different streams in received accesses. Multiple (simple) history tables are provided within (or associated with) the prefetch engine. Each of the multiple tables is utilized to detect different access patterns. The tables are indexed by different parts of the address and are accessed in a preset order to reduce the interference between different patterns. When an address does not fit the patterns of a first table, the address is passed to the next table to be checked for a match of different patterns. In this manner, different patterns may be detected at different tables within a single prefetch engine.
    Type: Application
    Filed: July 13, 2006
    Publication date: January 17, 2008
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Publication number: 20070283101
    Abstract: A method and an apparatus for performing just-in-time data prefetching within a data processing system comprising a processor, a cache or prefetch buffer, and at least one memory storage device. The apparatus comprises a prefetch engine having means for issuing a data prefetch request for prefetching a data cache line from the memory storage device for utilization by the processor. The apparatus further comprises logic/utility for dynamically adjusting a prefetch distance between issuance by the prefetch engine of the data prefetch request and issuance by the processor of a demand (load request) targeting the data/cache line being returned by the data prefetch request, so that a next data prefetch request for a subsequent cache line completes the return of the data/cache line at effectively the same time that a demand for that subsequent data/cache line is issued by the processor.
    Type: Application
    Filed: June 6, 2006
    Publication date: December 6, 2007
    Inventors: Wael R. El-Essawy, Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Patent number: 7281092
    Abstract: A system and method of managing cache hierarchies with adaptive mechanisms. A preferred embodiment of the present invention includes, in response to selecting a data block for eviction from a memory cache (the source cache) out of a collection of memory caches, examining a data structure to determine whether an entry exists that indicates that the data block has been evicted from the source memory cache, or another peer cache, to a slower cache or memory and subsequently retrieved from the slower cache or memory into the source memory cache or other peer cache. Also, a preferred embodiment of the present invention includes, in response to determining the entry exists in the data structure, selecting a peer memory cache out of the collection of memory caches at the same level in the hierarchy to receive the data block from the source memory cache upon eviction.
    Type: Grant
    Filed: June 2, 2005
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, William E. Speight, Lixin Zhang
  • Publication number: 20070136535
    Abstract: A system and method for cache management in a data processing system. The data processing system includes a processor and a memory hierarchy. The memory hierarchy includes at least an upper memory cache, at least a lower memory cache, and a write-back data structure. In response to replacing data from the upper memory cache, the upper memory cache examines the write-back data structure to determine whether or not the data is present in the lower memory cache. If the data is present in the lower memory cache, the data is replaced in the upper memory cache without casting out the data to the lower memory cache.
    Type: Application
    Filed: February 14, 2007
    Publication date: June 14, 2007
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, William Speight, Lixin Zhang
  • Publication number: 20070101067
    Abstract: A data processing unit, method, and computer-usable medium for contention-based cache performance optimization. Two or more processing cores are coupled by an interconnect. Coupled to the interconnect is a memory hierarchy that includes a collection of caches. Resource utilization over a time interval is detected over the interconnect. Responsive to detecting a threshold of resource utilization of the interconnect, a functional mode of a cache from the collection of caches is selectively enabled.
    Type: Application
    Filed: October 27, 2005
    Publication date: May 3, 2007
    Inventors: Hazim Shafi, William Speight
  • Publication number: 20070088919
    Abstract: The present invention comprises a data access pattern interface that allows software to specify one or more data access patterns such as stream access patterns, pointer-chasing patterns and producer-consumer patterns. Software detects a data access pattern for a memory region and passes the data access pattern information to hardware via proper data access pattern instructions defined in the data access pattern interface. Hardware maintains the data access pattern information properly when the data access pattern instructions are executed. Hardware can then use the data access pattern information to dynamically detect data access patterns for a memory region throughout the program execution, and voluntarily invoke appropriate memory and cache operations such as pre-fetch, pre-send, acquire-ownership and release-ownership.
    Type: Application
    Filed: October 14, 2005
    Publication date: April 19, 2007
    Applicant: International Business Machines
    Inventors: Xiaowei Shen, Hazim Shafi
  • Publication number: 20070083716
    Abstract: A method for sequentially coupling successive processor requests for a cache line before the data is received in the cache of a first coupled processor. Both homogenous and non-homogenous operations are chained to each other, and the coherency protocol includes several new intermediate coherency responses associated with the chained states. Chained coherency states are assigned to track the chain of processor requests and the grant of access permission prior to receipt of the data at the first processor. The chained coherency states also identify the address of the receiving processor. When data is received at the cache of the first processor within the chain, the processor completes its operation on (or with) the data and then forwards the data to the next processor in the chain. The chained coherency protocol frees up address bus bandwidth by reducing the number of retries.
    Type: Application
    Filed: October 6, 2005
    Publication date: April 12, 2007
    Inventors: Ramakrishnan Rajamony, Hazim Shafi, Derek Williams, Kenneth Wright