Patents by Inventor David Kruckemyer

David Kruckemyer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160188472
    Abstract: A distributed implementation for cache coherence comprises distinct agent interface units, coherency controllers, and memory interface units. The distinct units are separated logically and physically. Units are interconnected, and communicate with each other, by a transport network. Different organizations of connectivity are possible and chosen based on system performance and physical floorplan design constraints. The cache coherence subsystem is designed using software that exports a description of the system in a hardware description language.
    Type: Application
    Filed: July 23, 2015
    Publication date: June 30, 2016
    Applicant: ARTERIS, INC.
    Inventors: Craig Stephen Forrest, David A. Kruckemyer, David M. Parry
  • Publication number: 20160188470
    Abstract: A system and method for performing coherent cache snoops whereby a single or limited number of sharing coherent agents are snooped for a data access. A directory may store information identifying which coherent agents have a shared copy of a cache line. If more than one might be in a shared state, one is promoted to an owner state within the directory. Accesses to the shared cache line are responded to by a snoop to just one, or a number less than all, of the caching agents sharing the cache line.
    Type: Application
    Filed: December 31, 2014
    Publication date: June 30, 2016
    Inventors: David A. Kruckemyer, Craig Stephen Forrest
  • Patent number: 9280479
    Abstract: A memory system having increased throughput is disclosed. Specifically, the memory system includes a first level write combining queue that reduces the number of data transfers between a level one cache and a level two cache. In addition, a second level write merging buffer can further reduce the number of data transfers within the memory system. The first level write combining queue receives data from the level one cache. The second level write merging buffer receives data from the first level write combining queue. The level two cache receives data from both the first level write combining queue and the second level write merging buffer. Specifically, the first level write combining queue combines multiple store transactions from the load store units to associated addresses. In addition, the second level write merging buffer merges data from the first level write combining queue.
    Type: Grant
    Filed: May 22, 2012
    Date of Patent: March 8, 2016
    Assignee: Applied Micro Circuits Corporation
    Inventors: David A. Kruckemyer, John Gregory Favor, Matthew W. Ashcraft
  • Publication number: 20150188832
    Abstract: A multiple channel data transfer system (10) includes a source (12) that generates data packets with sequence numbers for transfer over multiple request channels (14). Data packets are transferred over the multiple request channels (14) through a network (16) to a destination (18). The destination (18) re-orders the data packets received over the multiple request channels (14) into a proper sequence in response to the sequence numbers to facilitate data processing. The destination (18) provides appropriate reply packets to the source (12) over multiple response channels (20) to control the flow of data packets from the source (12).
    Type: Application
    Filed: March 2, 2015
    Publication date: July 2, 2015
    Inventors: Randal G. Martin, Steven C. Miller, Mark D. Stadler, David A. Kruckemyer
  • Patent number: 8971329
    Abstract: A multiple channel data transfer system (10) includes a source (12) that generates data packets with sequence numbers for transfer over multiple request channels (14). Data packets are transferred over the multiple request channels (14) through a network (16) to a destination (18). The destination (18) re-orders the data packets received over the multiple request channels (14) into a proper sequence in response to the sequence numbers to facilitate data processing. The destination (18) provides appropriate reply packets to the source (12) over multiple response channels (20) to control the flow of data packets from the source (12).
    Type: Grant
    Filed: November 18, 2008
    Date of Patent: March 3, 2015
    Assignee: Silicon Graphics International Corp.
    Inventors: Randal G. Martin, Steven C. Miller, Mark D. Stadler, David A. Kruckemyer
  • Patent number: 8850121
    Abstract: A load/store unit with an outstanding load miss buffer and a load miss result buffer is configured to read data from a memory system having a level one cache. Missed load instructions are stored in the outstanding load miss buffer. The load/store unit retrieves data for multiple dependent missed load instructions using a single cache access and stores the data in the load miss result buffer. The outstanding load miss buffer stores a first missed load instruction in a first primary entry. Additional missed load instructions that are dependent on the first missed load instructions are stored in dependent entries of the first primary entry or in shared entries. If a shared entry is used for a missed load instruction the shared entry is associated with the primary entry.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: September 30, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Matthew W. Ashcraft, John Gregory Favor, David A. Kruckemyer
  • Patent number: 8806135
    Abstract: A load/store unit with an outstanding load miss buffer and a load miss result buffer is configured to read data from a memory system having a level one cache. Missed load instructions are stored in the outstanding load miss buffer. The load/store unit retrieves data for multiple dependent missed load instructions using a single cache access and stores the data in the load miss result buffer. When missed load instructions are reissued from the outstanding load miss buffer, data for the missed load instructions are read from the load miss result buffer rather than the level one cache. Because the data is stored in the load miss result buffer, other instructions that may change the data in level one cache do not cause data hazards with the missed load instructions.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: August 12, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Matthew W. Ashcraft, John Gregory Favor, David A. Kruckemyer
  • Patent number: 8793435
    Abstract: A load/store unit with an outstanding load miss buffer and a load miss result buffer is configured to read data from a memory system having a level one cache. Missed load instructions are stored in the outstanding load miss buffer. The load/store unit retrieves data for multiple dependent missed load instructions using a single memory access and stores the data in the load miss result buffer. The load miss result buffer includes dependent data lines, dependent data selection circuits, shared data lines and shared data selection circuits. The dependent data selection circuits are configured to select a subset of data from the memory system for storing in an associated dependent data line. Similarly, the shared data selection circuits are configured to select a subset of data from the memory system for storing in an associated shared data line.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: July 29, 2014
    Assignee: Applied Micro Circuits Corporation
    Inventors: Matthew W. Ashcraft, John Gregory Favor, David A. Kruckemyer
  • Patent number: 7453878
    Abstract: A multiple channel data transfer system (10) includes a source (12) that generates data packets with sequence numbers for transfer over multiple request channels (14). Data packets are transferred over the multiple request channels (14) through a network (16) to a destination (18). The destination (18) re-orders the data packets received over the multiple request channels (14) into a proper sequence in response to the sequence numbers to facilitate data processing. The destination (18) provides appropriate reply packets to the source (12) over multiple response channels (20) to control the flow of data packets from the source (12).
    Type: Grant
    Filed: July 20, 2001
    Date of Patent: November 18, 2008
    Assignee: Silicon Graphics, Inc.
    Inventors: Randal G. Martin, Steven C. Miller, Mark D. Stadler, David A. Kruckemyer
  • Patent number: 7437597
    Abstract: A write-back cache has error-correction code (ECC) fields storing ECC bits for cache lines. Clean cache lines are re-fetched from memory when an ECC error is detected. Dirty cache lines are corrected using the ECC bits or signal an uncorrectable error. The type of ECC code stored is different for clean and dirty lines. Clean lines use an error-detection code that can detect longer multi-bit errors than the error correction code used by dirty lines. Dirty lines use a correction code that can correct a bit error in the dirty line, while the detection code for clean lines may not be able to correct any errors. Dirty lines' ECC is optimized for correction while clean lines' ECC is optimized for detection. A single-error-correction, double-error-detection (SECDED) code may be used for dirty lines while a triple-error-detection code is used for clean lines.
    Type: Grant
    Filed: May 18, 2005
    Date of Patent: October 14, 2008
    Assignee: Azul Systems, Inc.
    Inventors: David A. Kruckemyer, Kevin B. Normoyle, Jack H. Choquette
  • Patent number: 7366847
    Abstract: A multi-processor, multi-cache system has filter pipes that store entries for request messages sent to a central coherency controller. The central coherency controller orders requests from filter pipes using coherency rules but does not track completion of invalidations. The central coherency controller reads snoop tags to identify sharing caches having a copy of a requested cache line. The central coherency controller sends an ordering message to the requesting filter pipe. The ordering message has an invalidate count indicating the number of sharing caches. Each sharing cache receives an invalidation message from the central coherency controller, invalidates its copy of the cache line, and sends an invalidation acknowledgement message to the requesting filter pipe. The requesting filter pipe decrements the invalidate count until all sharing caches have acknowledged invalidation.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: April 29, 2008
    Assignee: Azul Systems, Inc.
    Inventors: David A. Kruckemyer, Kevin B. Normoyle, Robert G. Hathaway
  • Patent number: 7296141
    Abstract: A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. The second tag may equal the first tag if the branch delay slot is unconditional for that branch, and may equal a different tag if the branch delay slot is conditional for the branch. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages. If the tag in a pipeline stage matches the first tag, the instruction is not cancelled. If the tag mismatches, the instruction is cancelled.
    Type: Grant
    Filed: August 18, 2004
    Date of Patent: November 13, 2007
    Assignee: Broadcom Corporation
    Inventor: David A. Kruckemyer
  • Patent number: 7269714
    Abstract: A processor is described which includes a first pipeline, a second pipeline, and a control circuit. The first pipeline includes a first stage at which instruction results are committed to architected state. The first stage is separated from an issue stage of the first pipeline by a first number of stages. The second pipeline includes a second stage at which an exception is reportable, wherein the second stage is separated from the issue stage of the second pipeline by a second number of stages which is greater than the first number. The control circuit is configured to inhibit co-issuance of a first instruction to the first pipeline and a second instruction to the second pipeline if the first instruction is subsequent to the second instruction in program order.
    Type: Grant
    Filed: February 4, 2002
    Date of Patent: September 11, 2007
    Assignee: Broadcom Corporation
    Inventors: Tse-Yu Yeh, David A. Kruckemyer, Robert Rogenmoser
  • Publication number: 20070186054
    Abstract: A multi-processor, multi-cache system has filter pipes that store entries for request messages sent to a central coherency controller. The central coherency controller orders requests from filter pipes using coherency rules but does not track completion of invalidations. The central coherency controller reads snoop tags to identify sharing caches having a copy of a requested cache line. The central coherency controller sends an ordering message to the requesting filter pipe. The ordering message has an invalidate count indicating the number of sharing caches. Each sharing cache receives an invalidation message from the central coherency controller, invalidates its copy of the cache line, and sends an invalidation acknowledgement message to the requesting filter pipe. The requesting filter pipe decrements the invalidate count until all sharing caches have acknowledged invalidation.
    Type: Application
    Filed: February 6, 2006
    Publication date: August 9, 2007
    Inventors: David Kruckemyer, Kevin Normoyle, Robert Hathaway
  • Patent number: 7225300
    Abstract: Several cluster chips and a shared main memory are connected by interconnect buses. Each cluster chip has multiple processors using multiple level-2 local caches, two memory controllers and two snoop tag partitions. The interconnect buses connect all local caches to all snoop tag partitions on all cluster chips. Each snoop tag partition has all the system's snoop tags for a partition of the main memory space. The snoop index is a subset of the cache index, with remaining chip-select and interleave address bits selecting which of the snoop tag partitions on the multiple cluster chips stores snoop tags for that address. The number of snoop entries in a snoop set is equal to a total number of cache entries in one cache index for all local caches on all cluster chips. Cache coherency request processing is distributed among the snoop tag partitions on different cluster chips, reducing bottlenecks.
    Type: Grant
    Filed: September 15, 2004
    Date of Patent: May 29, 2007
    Assignee: Azul Systems, Inc
    Inventors: Jack H. Choquette, David A. Kruckemyer, Robert G. Hathaway
  • Patent number: 7219216
    Abstract: A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages to determine which instructions to cancel. The assignment of tags for a fetch group of concurrently fetched instructions may be performed in parallel. A plurality of branch sequence numbers may be generated, and one of the plurality may be selected for each instruction responsive to the cumulative number of branch instructions preceding that instruction within the fetch group. The selection may be further responsive to whether or not the instruction is in a conditional delay slot.
    Type: Grant
    Filed: January 28, 2005
    Date of Patent: May 15, 2007
    Assignee: Broadcom Corporation
    Inventor: David A. Kruckemyer
  • Patent number: 7203827
    Abstract: A link address/sequential address generation circuit is provided for generating a link/sequential address. The circuit receives the most significant bits of at least two addresses: a first address of a first set of bytes including a branch instruction and a second address of a second set of bytes contiguous to the first set. The least significant bits of the branch PC (those bits not included in the most significant bits of the addresses received by the circuit) are used to generate the least significant bits of the link/sequential address and to select one of the first address and the second address to supply the most significant bits.
    Type: Grant
    Filed: March 1, 2005
    Date of Patent: April 10, 2007
    Assignee: Broadcom Corporation
    Inventors: David A. Kruckemyer, Daniel C. Murray
  • Patent number: 6993632
    Abstract: A system may include two or more agents, at least some of which may cache data. In response to a read transaction, a caching agent may snoop its cached data and provide a response in a response phase of the transaction. Particularly, the response may include an exclusive indication used to represent both exclusive and modified states within that agent. In one embodiment, the agent responding exclusive may be responsible for providing the data for a read transaction, and may transmit an indication of which of the exclusive or modified state that agent had the data in concurrent with transmitting the data.
    Type: Grant
    Filed: June 1, 2004
    Date of Patent: January 31, 2006
    Assignee: Broadcom Corporation
    Inventors: David A. Kruckemyer, Joseph B. Rowlands
  • Patent number: 6976152
    Abstract: An apparatus for a processor includes a first scoreboard, a second scoreboard, and a control circuit coupled to the first scoreboard and the second scoreboard. The control circuit is configured to update the first scoreboard to indicate that a write is pending for a first destination register of a first instruction in response to issuing the first instruction into a first pipeline. The control circuit is configured to update the second scoreboard to indicate that the write is pending for the first destination register in response to the first instruction passing a first stage of the pipeline. Replay may be signaled for a given instruction at the first stage. In response to a replay of a second instruction, the control circuit is configured to copy a contents of the second scoreboard to the first scoreboard. In various embodiments, additional scoreboards may be used for detecting different types of dependencies.
    Type: Grant
    Filed: February 4, 2002
    Date of Patent: December 13, 2005
    Assignee: Broadcom Corporation
    Inventors: Tse-Yu Yeh, David A. Kruckemyer, Randel P. Blake-Campos, Robert Rogenmoser, Robert Stepanian
  • Patent number: 6971038
    Abstract: A processor may include an execution circuit, an issue circuit coupled to the execution circuit, and a clock tree for clocking circuitry in the processor. The issue circuit issues an instruction to the execution circuit, and generates a control signal responsive to whether or not the instruction is issued to the execution circuit. The execution circuit includes at least a first subcircuit and a second subcircuit. A portion of the clock tree supplies a plurality of clocks to the execution circuit, including at least a first clock clocking the first subcircuit and at least a second clock clocking the second subcircuit. The portion of the clock tree is coupled to receive the control signal for collectively conditionally gating the plurality of clock, and is also configured to individually conditionally gate at least some of the plurality of clocks responsive to activity in the respective subcircuits of the execution circuit.
    Type: Grant
    Filed: February 1, 2002
    Date of Patent: November 29, 2005
    Assignee: Broadcom Corporation
    Inventors: Sribalan Santhanam, Vincent R. von Kaenel, David A. Kruckemyer