Patents by Inventor Gerard R. Williams, III
Gerard R. Williams, III has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9383806
Abstract: An apparatus for performing instruction throttling for a multi-processor system is disclosed. The apparatus may include a power estimation circuit, a table, a comparator, and a finite state machine. The power estimation circuit may be configured to receive information on high power instructions issued to a first processor and a second processor, and generate a power estimate dependent upon the received information. The table may be configured to store one or more pre-determined power threshold values, and the comparator may be configured to compare the power estimate with at least one of the pre-determined power threshold values. The finite state machine may be configured to adjust the throttle level of the first and second processors dependent upon the result of the comparison.
Type: Grant
Filed: April 17, 2013
Date of Patent: July 5, 2016
Assignee: Apple Inc.
Inventors: Wei-Han Lien, Gerard R Williams, III, Rohit Kumar, Sandeep Gupta, Suresh Periyacheri, Shih-Chieh R Wen
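The four blocks named in this abstract (power estimation, threshold table, comparator, finite state machine) can be sketched in software. This is only an illustration under stated assumptions, not the patented circuit: the threshold values, the unweighted sum used as the power estimate, and the one-level-per-step state machine are all hypothetical choices.

```python
from dataclasses import dataclass

# Hypothetical threshold table: each threshold the estimate meets or
# exceeds raises the target throttle level by one (0 = no throttling).
THRESHOLDS = [10, 20, 30]


def estimate_power(high_power_ops_cpu0: int, high_power_ops_cpu1: int) -> int:
    """Power estimate that depends on high-power instructions issued to
    both processors. An unweighted sum is an assumption made here."""
    return high_power_ops_cpu0 + high_power_ops_cpu1


def target_level(estimate: int) -> int:
    """Comparator: count how many table thresholds the estimate reaches."""
    return sum(1 for threshold in THRESHOLDS if estimate >= threshold)


@dataclass
class ThrottleFSM:
    """State machine that moves the shared throttle level one step at a
    time toward the level suggested by the comparison (an assumption)."""
    level: int = 0

    def step(self, estimate: int) -> int:
        target = target_level(estimate)
        if target > self.level:
            self.level += 1
        elif target < self.level:
            self.level -= 1
        return self.level
```

Calling `step` once per sampling interval with a fresh estimate raises or lowers the level gradually rather than jumping straight to the target, which is one plausible way to avoid oscillation.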
-
Publication number: 20160147290
Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, and another core may be implemented at a lower maximum performance, but may be optimized for efficiency. Additionally, in some embodiments, some features of the instruction set architecture implemented by the processor may be implemented in only one of the cores that make up the processor. If such a feature is invoked by a code sequence while a different core is active, the processor may swap cores to the core that implements the feature. Alternatively, an exception may be taken and an exception handler may be executed to identify the feature and activate the corresponding core.
Type: Application
Filed: November 20, 2014
Publication date: May 26, 2016
Inventors: David J. Williamson, Gerard R. Williams, III, James N. Hardage, Jr., Richard F. Russo
-
Publication number: 20160147289
Abstract: In an embodiment, an integrated circuit may include one or more processors. Each processor may include multiple processor cores, and each core has a different design/implementation and performance level. For example, a core may be implemented for high performance, but may have a higher minimum voltage at which it operates correctly. Another core may be implemented at a lower maximum performance, but may be optimized for efficiency and may operate correctly at a lower minimum voltage. The processor may support multiple processor states (PStates). Each PState may specify an operating point and may be mapped to one of the processor cores. During operation, one of the cores is active: the core to which the current PState is mapped. If a new PState is selected and is mapped to a different core, the processor may automatically context switch the processor state to the newly-selected core and may begin execution on that core.
Type: Application
Filed: November 20, 2014
Publication date: May 26, 2016
Inventors: David J. Williamson, Gerard R. Williams, III
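The PState-to-core mapping described in this abstract can be modeled in a few lines. The sketch below is a toy model under assumptions: the class names, the dictionary standing in for architectural state, and the three-entry PState table in the usage note are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    # Stand-in for the architectural state that would be context switched.
    context: dict = field(default_factory=dict)


class Processor:
    """Toy model: exactly one core is active, the one the current PState maps to."""

    def __init__(self, pstate_to_core: dict, cores: dict, initial_pstate: int):
        self.pstate_to_core = pstate_to_core  # PState -> core name
        self.cores = cores                    # core name -> Core
        self.pstate = initial_pstate

    @property
    def active_core(self) -> Core:
        return self.cores[self.pstate_to_core[self.pstate]]

    def set_pstate(self, new_pstate: int) -> Core:
        """Select a new PState; move the context only if the mapped core changes."""
        old_core = self.active_core
        self.pstate = new_pstate
        new_core = self.active_core
        if new_core is not old_core:
            new_core.context = old_core.context
            old_core.context = {}
        return new_core
```

With a mapping such as `{0: "little", 1: "little", 2: "big"}`, moving from PState 0 to 1 keeps execution on the same core, while moving to PState 2 transfers the context to the other core.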
-
Patent number: 9336003
Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.
Type: Grant
Filed: January 25, 2013
Date of Patent: May 10, 2016
Assignee: Apple Inc.
Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
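Because the reservation-station sets are non-overlapping, each execution pipeline is reachable through exactly one dispatch buffer, so routing reduces to a table lookup. The sketch below illustrates that idea only; the buffer, station, and pipeline names are hypothetical, as is the choice to reject overlapping configurations with an exception.

```python
# Hypothetical topology; every name here is illustrative only.
BUFFER_TO_STATIONS = {
    "dispatch_buffer_0": ["rs_int0", "rs_int1"],
    "dispatch_buffer_1": ["rs_mem0", "rs_fp0"],
}
STATION_TO_PIPELINE = {
    "rs_int0": "int_pipe0",
    "rs_int1": "int_pipe1",
    "rs_mem0": "mem_pipe0",
    "rs_fp0": "fp_pipe0",
}


def build_routing(buffer_to_stations: dict, station_to_pipeline: dict) -> dict:
    """Map each execution pipeline to its (dispatch buffer, reservation station).

    Raises ValueError if a reservation station is coupled to more than one
    dispatch buffer, since the sets are meant to be non-overlapping.
    """
    routing = {}
    seen = set()
    for buffer, stations in buffer_to_stations.items():
        for station in stations:
            if station in seen:
                raise ValueError(f"{station} is coupled to more than one dispatch buffer")
            seen.add(station)
            routing[station_to_pipeline[station]] = (buffer, station)
    return routing


ROUTING = build_routing(BUFFER_TO_STATIONS, STATION_TO_PIPELINE)


def dispatch(target_pipeline: str) -> tuple:
    """Return the dispatch buffer and reservation station for an op's pipeline."""
    return ROUTING[target_pipeline]
```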
-
Patent number: 9317285
Abstract: A system and method for efficiently reducing the power consumption of register file accesses. A processor is operable to execute instructions with two or more data types, each with an associated size and alignment. Data operands for a first data type use operand sizes equal to an entire width of a physical register within a physical register file. Data operands for a second data type use operand sizes less than an entire width of a physical register. Accesses of the physical register file for operands associated with a non-full-width data type do not access a full width of the physical registers. A given numerical value may be bypassed for the portion of the physical register that is not accessed.
Type: Grant
Filed: April 30, 2012
Date of Patent: April 19, 2016
Assignee: Apple Inc.
Inventors: Sandeep Gupta, Conrado Blasco-Allue, John H. Mylius, Gerard R. Williams, III, James B. Keller
-
Patent number: 9311100
Abstract: A circuit for implementing a branch target buffer. The branch target buffer may include a memory that stores a plurality of entries. Each entry may include a tag value, a target value, and a prediction accuracy value. A received index value corresponding to an indirect branch instruction may be used to select one of the entries of the plurality of entries, and a received tag value may then be compared to the tag value of the selected entries in the memory. An entry in the memory may be selected in response to a determination that the received tag does not match the tag value of compared entries. The selected entry may be allocated to the indirect branch instruction dependent upon the prediction accuracy values of the plurality of entries.
Type: Grant
Filed: January 7, 2013
Date of Patent: April 12, 2016
Assignee: Apple Inc.
Inventors: Sandeep Gupta, Shyam Sundar, Wei-Han Lien, Gerard R. Williams, III, Conrado Blasco-Allue
-
Patent number: 9280471
Abstract: Systems, processors, and methods for sharing an agent's private cache with other agents within a SoC. Many agents in the SoC have a private cache in addition to the shared caches and memory of the SoC. If an agent's processor is shut down or operating at less than full capacity, the agent's private cache can be shared with other agents. When a requesting agent generates a memory request and the memory request misses in the memory cache, the memory cache can allocate the memory request in a separate agent's cache rather than allocating the memory request in the memory cache.
Type: Grant
Filed: November 15, 2013
Date of Patent: March 8, 2016
Assignee: Apple Inc.
Inventors: Manu Gulati, Harshavardhan Kaushikkar, Gurjeet S. Saund, Wei-Han Lien, Gerard R. Williams, III, Sukalpa Biswas, Brian P. Lilly, Shinye Shiu
-
Publication number: 20160055099
Abstract: A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
Type: Application
Filed: November 2, 2015
Publication date: February 25, 2016
Inventors: Brian P. Lilly, Gerard R. Williams, III, Mahnaz Sadoughi-Yarandi, Perumal R. Subramonium, Hari S. Kannan, Prashant Jain
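The three-tier victim selection in this abstract maps directly onto a short function. This is a minimal sketch, assuming each line in a set carries a valid bit, a flag saying whether it is also held in the L1 cache, and a timestamp where a smaller value means less recently used; the `Line` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Line:
    way: int
    valid: bool
    in_l1: bool      # also stored in the higher-level (e.g. L1) cache
    last_used: int   # smaller value = less recently used (an assumption)


def pick_victim(lines: List[Line]) -> Line:
    """Choose the line to evict from one non-empty cache set."""
    def lru(group: List[Line]) -> Line:
        return min(group, key=lambda line: line.last_used)

    # 1. Prefer the least recently used invalid line.
    invalid = [line for line in lines if not line.valid]
    if invalid:
        return lru(invalid)
    # 2. All lines valid: prefer the LRU line that is not also in the L1.
    not_in_l1 = [line for line in lines if not line.in_l1]
    if not_in_l1:
        return lru(not_in_l1)
    # 3. Every line is valid and also in the L1: plain LRU over the set.
    return lru(lines)
```

Preferring lines that are absent from the L1 avoids evicting data the core is likely still using, at the cost of occasionally evicting a line that is not the strict LRU choice.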
-
Publication number: 20160048395
Abstract: In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, a branch direction predictor may be updated responsive to a misprediction and also responsive to the branch prediction being within a threshold of transitioning between predictions. To avoid a lookup to determine if the threshold update is to be performed, the branch predictor may detect the threshold update during prediction, and may transmit an indication with the branch.
Type: Application
Filed: October 27, 2015
Publication date: February 18, 2016
Inventors: Ian D. Kountanis, Gerard R. Williams, III, James B. Keller
-
Patent number: 9223577
Abstract: Various techniques for processing instructions that specify multiple destinations. A first portion of a processor pipeline is configured to split a multi-destination instruction into a plurality of single-destination operations. A second portion of the pipeline is configured to process the plurality of single-destination operations. A third portion of the pipeline is configured to merge the plurality of single-destination operations into one or more multi-destination operations. The one or more multi-destination operations may be performed. The first portion of the pipeline may include a decode unit. The second portion of the pipeline may include a map unit, which may in turn include circuitry configured to maintain a list of free architectural registers and a mapping table that maps physical registers to architectural registers. The third portion of the pipeline may comprise a dispatch unit. In some embodiments, this may provide certain advantages such as reduced area and/or power consumption.
Type: Grant
Filed: September 26, 2012
Date of Patent: December 29, 2015
Assignee: Apple Inc.
Inventors: John H. Mylius, Gerard R. Williams, III, James B. Keller, Fang Liu, Shyam Sundar
-
Patent number: 9201658
Abstract: In an embodiment, a processor may be configured to fetch N instruction bytes from an instruction cache (a “fetch group”), even if the fetch group crosses a cache line boundary. A branch predictor may be configured to produce branch predictions for up to M branches in the fetch group, where M is a maximum number of branches that may be included in the fetch group. In an embodiment, branch prediction values from multiple entries in each table may be read and respective branch prediction values may be combined to form branch predictions for up to M branches in the fetch group.
Type: Grant
Filed: September 24, 2012
Date of Patent: December 1, 2015
Assignee: Apple Inc.
Inventors: Ian D. Kountanis, Gerard R. Williams, III, James B. Keller
-
Patent number: 9176879
Abstract: A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
Type: Grant
Filed: July 19, 2013
Date of Patent: November 3, 2015
Assignee: Apple Inc.
Inventors: Brian P. Lilly, Gerard R. Williams, III, Mahnaz Sadoughi-Yarandi, Perumal R. Subramonium, Hari S. Kannan, Prashant Jain
-
Patent number: 9128725
Abstract: Methods and apparatuses for managing load-store dependencies in an out-of-order processor. A load store dependency predictor may include a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes a counter to indicate a strength of the dependency prediction. If the counter is above a threshold, a dependency is enforced for the load-store pair. If the counter is below the threshold, the dependency is not enforced for the load-store pair. When a store is dispatched, the table is searched, and any matching entries in the table are armed. If a load is dispatched, matches on an armed entry, and the counter is above the threshold, then the load will wait to issue until the corresponding store issues.
Type: Grant
Filed: May 4, 2012
Date of Patent: September 8, 2015
Assignee: Apple Inc.
Inventors: Stephan G. Meier, John H. Mylius, Gerard R. Williams, III, Suparn Vats
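The arm-on-store, check-on-load behavior in this abstract can be illustrated with a small table model. The sketch below makes several assumptions that the abstract does not state: entries are keyed by store and load program counters, the threshold is 2, and an entry is disarmed when its store issues.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD = 2  # hypothetical; the abstract gives no value


@dataclass
class Entry:
    store_pc: int
    load_pc: int
    counter: int        # strength of the dependency prediction
    armed: bool = False


class DependencyPredictor:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def record_pair(self, store_pc: int, load_pc: int, counter: int) -> None:
        """Add a load-store pair that was found to be dependent."""
        self.entries.append(Entry(store_pc, load_pc, counter))

    def dispatch_store(self, store_pc: int) -> None:
        """Search the table and arm every entry that matches the store."""
        for entry in self.entries:
            if entry.store_pc == store_pc:
                entry.armed = True

    def load_must_wait(self, load_pc: int) -> bool:
        """True if the load matches an armed entry whose counter is above the threshold."""
        return any(
            entry.load_pc == load_pc and entry.armed and entry.counter > THRESHOLD
            for entry in self.entries
        )

    def issue_store(self, store_pc: int) -> None:
        """Disarm matching entries once the store issues (an assumption)."""
        for entry in self.entries:
            if entry.store_pc == store_pc:
                entry.armed = False
```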
-
Patent number: 9128857
Abstract: Techniques are disclosed related to flushing one or more data caches. In one embodiment an apparatus includes a processing element, a first cache associated with the processing element, and a circuit configured to copy modified data from the first cache to a second cache in response to determining an activity level of the processing element. In this embodiment, the apparatus is configured to alter a power state of the first cache after the circuit copies the modified data. The first cache may be at a lower level in a memory hierarchy relative to the second cache. In one embodiment, the circuit is also configured to copy data from the second cache to a third cache or a memory after a particular time interval. In some embodiments, the circuit is configured to copy data while one or more pipeline elements of the apparatus are in a low-power state.
Type: Grant
Filed: January 4, 2013
Date of Patent: September 8, 2015
Assignee: Apple Inc.
Inventors: Brian P. Lilly, Gerard R. Williams, III
-
Patent number: 9098418
Abstract: Processors and methods for coordinating prefetch units at multiple cache levels. A single, unified training mechanism is utilized for training on streams generated by a processor core. Prefetch requests are sent from the core to lower level caches, and a packet is sent with each prefetch request. The packet identifies the stream ID of the prefetch request and includes relevant training information for the particular stream ID. The lower level caches generate prefetch requests based on the received training information.
Type: Grant
Filed: March 20, 2012
Date of Patent: August 4, 2015
Assignee: Apple Inc.
Inventors: Hari S. Kannan, Brian P. Lilly, Gerard R. Williams, III, Mahnaz Sadoughi-Yarandi, Perumal R. Subramoniam, Pradeep Kanapathipillai
-
Patent number: 9043554
Abstract: Systems, processors, and methods for keeping uncacheable data coherent. A processor includes a multi-level cache hierarchy, and uncacheable load memory operations can be cached at any level of the cache hierarchy. If an uncacheable load misses in the L2 cache, then allocation of the uncacheable load will be restricted to a subset of the ways of the L2 cache. If an uncacheable store memory operation hits in the L1 cache, then the hit cache line can be updated with the data from the memory operation. If the uncacheable store misses in the L1 cache, then the uncacheable store is sent to a core interface unit. Multiple contiguous store misses are merged into larger blocks of data in the core interface unit before being sent to the L2 cache.
Type: Grant
Filed: December 21, 2012
Date of Patent: May 26, 2015
Assignee: Apple Inc.
Inventors: Brian P. Lilly, Gerard R. Williams, III, Perumal R. Subramoniam, Pradeep Kanapathipillai
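The merging of contiguous store misses in the last sentence of this abstract can be illustrated as follows. This is a simplified sketch under assumptions: stores arrive in order as hypothetical (address, bytes) pairs, only a store that begins exactly where the previous block ends is merged, and no maximum block size is modeled.

```python
from typing import List, Tuple

Store = Tuple[int, bytes]  # (start address, data); a hypothetical representation


def merge_contiguous(stores: List[Store]) -> List[Store]:
    """Merge each store into the previous block when it starts exactly
    where that block ends; otherwise start a new block."""
    merged: List[Store] = []
    for addr, data in stores:
        if merged and merged[-1][0] + len(merged[-1][1]) == addr:
            prev_addr, prev_data = merged[-1]
            merged[-1] = (prev_addr, prev_data + data)
        else:
            merged.append((addr, data))
    return merged
```

For example, stores of two bytes at 0x100 and one byte at 0x102 would leave this function as a single three-byte block at 0x100, while a later store at 0x200 would stay separate.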
-
Publication number: 20150143044
Abstract: Systems, processors, and methods for sharing an agent's private cache with other agents within a SoC. Many agents in the SoC have a private cache in addition to the shared caches and memory of the SoC. If an agent's processor is shut down or operating at less than full capacity, the agent's private cache can be shared with other agents. When a requesting agent generates a memory request and the memory request misses in the memory cache, the memory cache can allocate the memory request in a separate agent's cache rather than allocating the memory request in the memory cache.
Type: Application
Filed: November 15, 2013
Publication date: May 21, 2015
Applicant: APPLE INC.
Inventors: Manu Gulati, Harshavardhan Kaushikkar, Gurjeet S. Saund, Wei-Han Lien, Gerard R. Williams, III, Sukalpa Biswas, Brian P. Lilly, Shinye Shiu
-
Patent number: 9015422
Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetcher in which patterns may include wild cards for some cache blocks. The wild card may match any access for the corresponding cache block (e.g. no access, demand access, prefetch, successful prefetch, etc.). Furthermore, patterns with irregular strides and/or irregular access patterns may be included in the matching patterns and may be detected for prefetch generation. In an embodiment, the AMPM prefetcher may implement a chained access map for large streaming prefetches. If a stream is detected, the AMPM prefetcher may allocate a pair of map entries for the stream and may reuse the pair for subsequent access map regions within the stream. In some embodiments, a quality factor may be associated with each access map and may control the rate of prefetch generation.
Type: Grant
Filed: July 16, 2013
Date of Patent: April 21, 2015
Assignee: Apple Inc.
Inventors: Stephan G. Meier, Gerard R. Williams, III, Hari S. Kannan, Pavlos Konas
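Wild-card matching against an access map can be shown with a one-character-per-cache-block encoding. The encoding below is an assumption made for illustration, not the patent's representation: `.` for no access, `A` for demand access, `P` for prefetch, `S` for successful prefetch, and `*` in a pattern for the wild card.

```python
WILDCARD = "*"  # hypothetical symbol for "matches any access state"


def matches(pattern: str, access_map: str) -> bool:
    """Return True if every non-wild-card position of the pattern equals
    the state recorded for the corresponding cache block."""
    if len(pattern) != len(access_map):
        return False
    return all(p == WILDCARD or p == a for p, a in zip(pattern, access_map))


def first_match(patterns: list, access_map: str):
    """Return the first pattern that matches the access map, or None."""
    for pattern in patterns:
        if matches(pattern, access_map):
            return pattern
    return None
```

Here the pattern `A*A*` matches the map `A.AP` because the second and fourth blocks are wild cards, but it does not match `A..P`, where the third block has no demand access.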
-
Publication number: 20150026413
Abstract: In an embodiment, a processor may implement an access map-pattern match (AMPM)-based prefetcher in which patterns may include wild cards for some cache blocks. The wild card may match any access for the corresponding cache block (e.g. no access, demand access, prefetch, successful prefetch, etc.). Furthermore, patterns with irregular strides and/or irregular access patterns may be included in the matching patterns and may be detected for prefetch generation. In an embodiment, the AMPM prefetcher may implement a chained access map for large streaming prefetches. If a stream is detected, the AMPM prefetcher may allocate a pair of map entries for the stream and may reuse the pair for subsequent access map regions within the stream. In some embodiments, a quality factor may be associated with each access map and may control the rate of prefetch generation.
Type: Application
Filed: July 16, 2013
Publication date: January 22, 2015
Inventors: Stephan G. Meier, Gerard R. Williams, III, Hari S. Kannan, Pavlos Konas
-
Publication number: 20150026404
Abstract: A mechanism for evicting a cache line from a cache memory includes first selecting for eviction a least recently used cache line of a group of invalid cache lines. If all cache lines are valid, selecting for eviction a least recently used cache line of a group of cache lines in which no cache line of the group of cache lines is also stored within a higher level cache memory such as the L1 cache, for example. Lastly, if all cache lines are valid and there are no non-inclusive cache lines, selecting for eviction the least recently used cache line stored in the cache memory.
Type: Application
Filed: July 19, 2013
Publication date: January 22, 2015
Applicant: Apple Inc.
Inventors: Brian P. Lilly, Gerard R. Williams, III, Mahnaz Sadoughi-Yarandi, Perumal R. Subramonium, Hari S. Kannan, Prashant Jain