Patents by Inventor Volker Strumpen
Volker Strumpen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7933940Abstract: Parallel prefix circuits for computing a cyclic segmented prefix operation with a mesh topology are disclosed. In one embodiment of the present invention, the elements (prefix nodes) of the mesh are arranged in row-major order. Values are accumulated toward the center of the mesh and partial results are propagated outward from the center of the mesh to complete the cyclic segmented prefix operation. This embodiment has been shown to be time-optimal. In another embodiment of the present invention, the prefix nodes are arranged such that the prefix node corresponding to the last element in the array is located at the center of the array. This alternative embodiment is not only time-optimal when accounting for wire-lengths (and therefore propagation delays), but it is also asympotically optimal in terms of minimizing the number of segmented prefix operators.Type: GrantFiled: April 20, 2006Date of Patent: April 26, 2011Assignee: International Business Machines CorporationInventors: Matteo Frigo, Volker Strumpen
-
Publication number: 20100122034Abstract: A tile for use in a tiled storage array provides re-organization of values within the tile array without requiring sophisticated global control. The tiles operate to move a requested value to a front-most storage element of the tile array according to a global systolic clock. The previous occupant of the front-most location is moved or swapped backward according to the systolic clock, and the new occupant is moved forward according to the systolic clock, according to the operation of the tiles, while providing for multiple in-flight access requests within the tile array. The placement heuristic that moves the values is determined according to the position of the tiles within the array and the behavior of the tiles. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the tile array.Type: ApplicationFiled: November 13, 2008Publication date: May 13, 2010Applicant: International Business Machines CorporationInventors: Volker Strumpen, Matteo Frigo
-
Publication number: 20100122012Abstract: Systolic networks within a tiled storage array provide for movement of requested values to a front-most tile, while making space for the requested values at the front-most tile by moving other values away. A first and second information pathway provide different linear pathways through the tiles. The movement of other values, requests for values and responses to requests is controlled according to a clocking logic that governs the movement on the first and second information pathways according to a systolic duty cycle. The first information pathway may be a move-to-front network of a spiral cache, crossing the spiral push-back network which forms the push-back network. The systolic duty cycle may be a three-phase duty cycle, or a two-phase duty cycle may be provided if the storage tiles support a push-back swap operation.Type: ApplicationFiled: December 17, 2009Publication date: May 13, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Fadi H. Gebara, Jeremy D. Schaub, Volker Strumpen
-
Publication number: 20100122031Abstract: A spiral cache memory provides low access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most storage tile of the spiral. If the spiral cache needs to eject a value to make space for a value moved to the front-most tile, space is made by ejecting a value from the cache to a backing store. A buffer along with flow control logic is used to prevent overflow of writes of ejected values to the generally slow backing store. The tiles in the spiral cache may be single storage locations or be organized as some form of cache memory such as direct-mapped or set-associative caches. Power consumption of the spiral cache can be reduced by dividing the cache into an active and inactive partition, which can be adjusted on a per-tile basis. Tile-generated or global power-down decisions can set the size of the partitions.Type: ApplicationFiled: November 13, 2008Publication date: May 13, 2010Applicant: International Business Machines CorporationInventors: Volker Strumpen, Matteo Frigo
-
Publication number: 20100122033Abstract: An integrated memory system with a spiral cache responds to requests for values at a first external interface coupled to a particular storage location in the cache in a time period determined by the proximity of the requested values to the particular storage location. The cache supports multiple outstanding in-flight requests directed to the same address using an issue table that tracks multiple outstanding requests and control logic that applies the multiple requests to the same address in the order received by the cache memory. The cache also includes a backing store request table that tracks push-back write operations issued from the cache memory when the cache memory is full and a new value is provided from the external interface, and the control logic to prevent multiple copies of the same value from being loaded into the cache or a copy being loaded before a pending push-back has been completed.Type: ApplicationFiled: December 17, 2009Publication date: May 13, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Fadi H. Gebara, Jeremy D. Schaub, Volker Strumpen
-
Publication number: 20100122057Abstract: A tiled storage array provides reduction in access latency for frequently-accessed values by re-organizing to always move a requested value to a front-most storage element of array. The previous occupant of the front-most location is moved backward according to a systolic pulse, and the new occupant is moved forward according to the systolic pulse, preserving the uniqueness of the stored values within the array, and providing for multiple in-flight access requests within the array. The placement heuristic that moves the values according to the systolic pulse can be implemented by control logic within identical tiles, so that the placement heuristic moves the values according to the position of the tiles within the array. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the array.Type: ApplicationFiled: November 13, 2008Publication date: May 13, 2010Applicant: International Business Machines CorporationInventors: Volker Strumpen, Matteo Frigo
-
Publication number: 20100122035Abstract: A spiral cache memory provides reduction in access latency for frequently-accessed values by self-organizing to always move a requested value to a front-most central storage element of the spiral. The occupant of the central location is swapped backward, which continues backward through the spiral until an empty location is swapped-to, or the last displaced value is cast out of the last location in the spiral. The elements in the spiral may be cache memories or single elements. The resulting cache memory is self-organizing and for the one-dimensional implementation has a worst-case access time proportional to N, where N is the number of tiles in the spiral. A k-dimensional spiral cache has a worst-case access time proportional to N1/k. Further, a spiral cache system provides a basis for a non-inclusive system of cache memory, which reduces the amount of space and power consumed by a cache memory of a given size.Type: ApplicationFiled: November 13, 2008Publication date: May 13, 2010Applicant: International Business Machines CorporationInventors: Volker Strumpen, Matteo Frigo
-
Publication number: 20090326840Abstract: A method, system and computer program product for generating device fingerprints and authenticating devices uses initial states of internal storage cells after each of a number multiple power cycles for each of a number of device temperatures to generate a device fingerprint. The device fingerprint may include pairs of expected values for each of the internal storage cells and a corresponding probability that the storage cell will assume the expected value. Storage cells that have expected values varying over the multiple temperatures may be excluded from the fingerprint. A device is authenticated by a similarity algorithm that uses a match of the expected values from a known fingerprint with power-up values from an unknown device, weighting the comparisons by the probability for each cell to compute a similarity measure.Type: ApplicationFiled: June 26, 2008Publication date: December 31, 2009Applicant: International Business Machines CorporationInventors: Fadi H. Gebara, Joonsoo Kim, Jeremy D. Schaub, Volker Strumpen
-
Publication number: 20070260663Abstract: Parallel prefix circuits for computing a cyclic segmented prefix operation with a mesh topology are disclosed. In one embodiment of the present invention, the elements (prefix nodes) of the mesh are arranged in row-major order. Values are accumulated toward the center of the mesh and partial results are propagated outward from the center of the mesh to complete the cyclic segmented prefix operation. This embodiment has been shown to be time-optimal. In another embodiment of the present invention, the prefix nodes are arranged such that the prefix node corresponding to the last element in the array is located at the center of the array. This alternative embodiment is not only time-optimal when accounting for wire-lengths (and therefore propagation delays), but it is also asympotically optimal in terms of minimizing the number of segmented prefix operators.Type: ApplicationFiled: April 20, 2006Publication date: November 8, 2007Inventors: Matteo Frigo, Volker Strumpen
-
Publication number: 20060230409Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new threads and having a novel operational semantics. If a hardware thread is available to shepherd a forked thread, the fork and join instructions have thread creation and termination/synchronization semantics, respectively. If no hardware thread is available, however, the fork and join instructions assume subroutine call and return semantics respectively. The link register of the processor is used to determine whether a given join instruction should be treated as a thread synchronization operation or as a return from subroutine operation.Type: ApplicationFiled: April 7, 2005Publication date: October 12, 2006Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen
-
Publication number: 20060230408Abstract: A method and processor architecture for achieving a high level of concurrency and latency hiding in an “infinite-thread processor architecture” with a limited number of hardware threads is disclosed. A preferred embodiment defines “fork” and “join” instructions for spawning new context-switched threads. Context switching is used to hide the latency of both memory-access operations (i.e., loads and stores) and arithmetic/logical operations. When an operation executing in a thread incurs a latency having the potential to delay the instruction pipeline, the latency is hidden by performing a context switch to a different thread. When the result of the operation becomes available, a context switch back to that thread is performed to allow the thread to continue.Type: ApplicationFiled: April 7, 2005Publication date: October 12, 2006Inventors: Matteo Frigo, Ahmed Gheith, Volker Strumpen