Patents Examined by Cheng-Yuan Tseng
  • Patent number: 11803378
    Abstract: A method for processing a medical image is provided. The method may include obtaining the medical image, and processing the medical image using a processing program executed by at least one CPU. The processing program may include one or more optimized computation units, optimized using an instruction set supported by the at least one CPU. The instruction set may be configured to optimize at least one of an operation time of the processing program, a resource of the at least one CPU occupied by the processing program, and a count of instructions included in the processing program.
    Type: Grant
    Filed: June 6, 2022
    Date of Patent: October 31, 2023
    Assignee: SHANGHAI UNITED IMAGING HEALTHCARE CO., LTD.
    Inventors: Wanli Teng, Yecheng Han, Yong E
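
A minimal sketch of the idea in the abstract above: detect what the host CPU's instruction set supports and route the image-processing work to a computation unit built for it. The feature-detection shortcut, function names, and SSE2 stand-in are illustrative assumptions, not taken from the patent.

```python
# Hypothetical dispatch to computation units optimized for the host CPU's
# instruction set; feature names and kernels are illustrative only.
import platform

def detect_isa_features():
    # A real implementation would query CPUID (e.g., via a cpuinfo library);
    # here we pretend every x86_64 host supports SSE2 as a stand-in.
    return {"sse2"} if platform.machine() in ("x86_64", "AMD64") else set()

def convolve_scalar(pixels, kernel):
    # Portable fallback: plain 1-D convolution, one multiply at a time.
    k, out = len(kernel), []
    for i in range(len(pixels) - k + 1):
        out.append(sum(pixels[i + j] * kernel[j] for j in range(k)))
    return out

def convolve_vectorized(pixels, kernel):
    # Stand-in for a SIMD-optimized unit: same result, but in a real
    # intrinsics-based build it would retire fewer instructions per element.
    k = len(kernel)
    return [sum(p * w for p, w in zip(pixels[i:i + k], kernel))
            for i in range(len(pixels) - k + 1)]

def build_processing_program(features):
    # Pick the computation unit the detected instruction set supports.
    return convolve_vectorized if "sse2" in features else convolve_scalar

unit = build_processing_program(detect_isa_features())
print(unit([1, 2, 3, 4, 5], [1, 0, -1]))   # [-2, -2, -2]
```
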
  • Patent number: 11783167
    Abstract: Some embodiments provide a neural network inference circuit for executing a neural network that includes multiple layers of computation nodes. At least a subset of the layers includes non-convolutional layers. The neural network inference circuit includes multiple cores with memories that store input values for the layers. The cores are grouped into multiple clusters. For each cluster, the neural network inference circuit includes a set of processing circuits for receiving input values from the cores of the cluster and executing the computation nodes of the non-convolutional layers.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: October 10, 2023
    Assignee: PERCEIVE CORPORATION
    Inventors: Jung Ko, Kenneth Duong, Steven L. Teig
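
As a rough structural illustration (not Perceive's actual circuit), the following models cores that store input values, grouped into clusters whose shared processing circuit runs a non-convolutional node over values gathered from the cluster's cores. Class names and sizes are invented.

```python
# Illustrative model of cores grouped into clusters, with one shared
# post-processing unit per cluster for non-convolutional nodes.
class Core:
    def __init__(self):
        self.memory = []            # input values stored for the layers

class Cluster:
    def __init__(self, num_cores):
        self.cores = [Core() for _ in range(num_cores)]

    def run_nonconv_node(self, op):
        # Gather input values from the cluster's cores, then apply a
        # non-convolutional computation node (here: an element-wise op).
        gathered = [v for core in self.cores for v in core.memory]
        return [op(v) for v in gathered]

relu = lambda v: max(0.0, v)
cluster = Cluster(num_cores=4)
for i, core in enumerate(cluster.cores):
    core.memory = [i - 1.5]         # pretend each core stored one activation
print(cluster.run_nonconv_node(relu))  # [0.0, 0.0, 0.5, 1.5]
```
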
  • Patent number: 11782711
    Abstract: Systems, apparatuses, and methods related to dynamic precision bit string accumulation are described. Dynamic bit string accumulation can be performed using an edge computing device. In an example method, dynamic precision bit string accumulation can include performing an iteration of a recursive operation using a first bit string and a second bit string and determining that a result of the iteration of the recursive operation contains a quantity of bits in a particular bit sub-set of the result that is greater than a threshold quantity of bits associated with the particular bit sub-set. The method can further include writing a result of the iteration of the recursive operation to a first register and writing at least a portion of the bits associated with the particular bit sub-set of the result to a second register.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Vijay S. Ramesh, Richard C. Murphy
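
A toy software analogue of the spill mechanism described above: accumulate into a fixed-width result and, when a tracked bit sub-set outgrows its allotted width, park the extra bits in a second register rather than discarding them. The widths, masks, and register names here are invented for illustration.

```python
SUBSET_BITS = 4                      # threshold width of the tracked sub-set

reg_result, reg_spill = 0, 0

def accumulate(a, b):
    global reg_result, reg_spill
    total = reg_result + a * b       # one iteration of the recursive op
    subset = total & 0xFF            # the "particular bit sub-set" (low byte)
    if subset.bit_length() > SUBSET_BITS:
        # The sub-set needs more bits than its budget: write the overflow
        # to the second register, keep the in-budget portion in the first.
        reg_spill = subset >> SUBSET_BITS
        total = (total & ~0xFF) | (subset & (2**SUBSET_BITS - 1))
    reg_result = total

for a, b in [(3, 5), (7, 9), (2, 2)]:
    accumulate(a, b)
print(reg_result, reg_spill)         # 2 1
```
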
  • Patent number: 11775310
    Abstract: A processing system includes a system interconnect, a processor coupled to communicate with other components in the processing system through the system interconnect, distributed general purpose registers (GPRs) in the processing system wherein a first subset of the distributed GPRs is located in the processor and a second subset of the distributed GPRs is located in the processing system and external to the processor, and a first set of conductors directly connected between the processor and the second subset of the distributed GPRs. An instruction execution pipeline in the processor accesses any register in the first and second subsets of the distributed GPRs as part of the processor's GPRs during instruction execution, in which the second subset of the distributed GPRs is accessed through the first set of conductors.
    Type: Grant
    Filed: November 16, 2021
    Date of Patent: October 3, 2023
    Assignee: NXP B.V.
    Inventors: Michael Andrew Fischer, Kevin Bruce Traylor
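
A hedged sketch of the register-file split: high-numbered GPRs live outside the processor and are reached over a dedicated direct link, while the pipeline addresses all registers uniformly. The counts and class names are assumptions for illustration.

```python
class RemoteLink:
    """Stands in for the direct set of conductors to the external GPRs."""
    def __init__(self, backing):
        self.backing = backing
    def read(self, idx):     return self.backing[idx]
    def write(self, idx, v): self.backing[idx] = v

class DistributedGPRs:
    LOCAL_COUNT = 16                     # first subset, inside the processor

    def __init__(self, remote_link):
        self.local = [0] * self.LOCAL_COUNT
        self.link = remote_link          # second subset, external

    def __getitem__(self, r):
        # Instruction execution sees one flat register file; the split
        # between local and remote registers is invisible to it.
        if r < self.LOCAL_COUNT:
            return self.local[r]
        return self.link.read(r - self.LOCAL_COUNT)

    def __setitem__(self, r, v):
        if r < self.LOCAL_COUNT:
            self.local[r] = v
        else:
            self.link.write(r - self.LOCAL_COUNT, v)

gprs = DistributedGPRs(RemoteLink([0] * 16))
gprs[3], gprs[20] = 7, 9                 # one local write, one remote write
print(gprs[3] + gprs[20])                # 16
```
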
  • Patent number: 11775454
    Abstract: Embodiments of the present invention include a drive-to-drive storage system comprising a host server having a host CPU and a host storage drive, one or more remote storage drives, and a peer-to-peer link connecting the host storage drive to the one or more remote storage drives. The host storage drive includes a processor and a memory, wherein the memory has stored thereon instructions that, when executed by the processor, cause the processor to transfer data from the host storage drive via the peer-to-peer link to the one or more remote storage drives when the host CPU issues a write command.
    Type: Grant
    Filed: May 2, 2022
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Oscar P. Pinto, Robert Brennan
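
A minimal sketch, with invented names, of the data path described above: the host drive's own processor forwards write data straight to remote drives over a peer-to-peer link, so the payload never travels back through the host CPU.

```python
class RemoteDrive:
    def __init__(self): self.blocks = {}
    def p2p_receive(self, lba, data): self.blocks[lba] = data

class HostDrive:
    def __init__(self, peers):
        self.blocks, self.peers = {}, peers   # peer-to-peer link targets

    def handle_write(self, lba, data, replicate=True):
        # Triggered by the host CPU's write command; the drive's embedded
        # processor runs this transfer, not the host.
        self.blocks[lba] = data
        if replicate:
            for peer in self.peers:           # data moves drive-to-drive
                peer.p2p_receive(lba, data)

remote = RemoteDrive()
host_drive = HostDrive(peers=[remote])
host_drive.handle_write(lba=42, data=b"payload")
print(remote.blocks[42])                      # b'payload'
```
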
  • Patent number: 11762665
    Abstract: A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: September 19, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory F. Grohoski, Manish K. Shah, Kin Hing Leung
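
A toy placement sketch of the fault-tolerance idea above: a parameter store tags one subarray (here, a row) of FCUs as unusable, and placement shifts the configuration data onto the remaining rows by re-routing. The array shape and tagging scheme are invented.

```python
ROWS, COLS = 4, 4
param_store = {"bad_row": 2}            # tagged unusable subarray

def place(config_units):
    usable_rows = [r for r in range(ROWS) if r != param_store["bad_row"]]
    placement = {}
    for i, unit in enumerate(config_units):
        r, c = divmod(i, COLS)
        if r >= len(usable_rows):
            raise ValueError("not enough usable FCUs")
        placement[unit] = (usable_rows[r], c)   # route around the bad row
    return placement

print(place([f"cfg{i}" for i in range(12)]))
```
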
  • Patent number: 11762797
    Abstract: A system includes a fabric switch including a motherboard, a baseboard management controller (BMC), a network switch configured to transport network signals, and a PCIe switch configured to transport PCIe signals; a midplane; and a plurality of device ports. Each of the plurality of device ports is configured to connect a storage device to the motherboard of the fabric switch over the midplane and carry the network signals and the PCIe signals over the midplane. The storage device is configurable in multiple modes based on a protocol established over a fabric connection between the system and the storage device.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: September 19, 2023
    Inventors: Sompong Paul Olarig, Fred Worley, Son Pham
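
A hedged sketch of the mode-selection idea: the same device port carries both network and PCIe signals over the midplane, and the drive picks its operating mode from the protocol negotiated over the fabric connection. The protocol and mode names are illustrative, not taken from the patent.

```python
def configure_device_mode(negotiated_protocol):
    modes = {
        "nvme-of": "ethernet_ssd",   # served through the network switch
        "nvme":    "pcie_ssd",       # served through the PCIe switch
    }
    try:
        return modes[negotiated_protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {negotiated_protocol}")

print(configure_device_mode("nvme-of"))   # ethernet_ssd
print(configure_device_mode("nvme"))      # pcie_ssd
```
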
  • Patent number: 11762662
    Abstract: A graphics processing device comprises a set of compute units to execute multiple threads of a workload, a cache coupled with the set of compute units, and a prefetcher to prefetch instructions associated with the workload. The prefetcher is configured to use a thread dispatch command that is used to dispatch threads to execute a kernel to prefetch instructions, parameters, and/or constants that will be used during execution of the kernel. Prefetch operations for the kernel can then occur concurrently with thread dispatch operations.
    Type: Grant
    Filed: October 25, 2021
    Date of Patent: September 19, 2023
    Assignee: Intel Corporation
    Inventors: James Valerio, Vasanth Ranganathan, Joydeep Ray, Pradeep Ramani
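
A sketch (invented API) of the overlap described above: because the dispatch command names the kernel, the prefetcher can start pulling its instructions and constants toward the cache while threads are still being dispatched.

```python
import threading, time

cache = set()

def prefetch(kernel):
    for addr in kernel["code_addrs"]:    # instructions, params, constants
        time.sleep(0.001)                # pretend memory latency
        cache.add(addr)

def dispatch_threads(kernel, n):
    # Kick off the prefetch first; it runs concurrently with dispatch.
    t = threading.Thread(target=prefetch, args=(kernel,))
    t.start()
    workers = [threading.Thread(target=lambda: None) for _ in range(n)]
    for w in workers: w.start()
    for w in workers: w.join()
    t.join()

kernel = {"code_addrs": range(0x1000, 0x1010)}
dispatch_threads(kernel, n=4)
print(len(cache))                        # 16 addresses warmed in cache
```
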
  • Patent number: 11763141
    Abstract: A neural processing unit (NPU) is described. The NPU includes an NPU direct memory access (NDMA) core. The NDMA core includes a read engine having a read buffer. The NDMA core also includes a write engine having a write buffer. The NPU also includes a controller. The controller is configured to direct the NDMA core to perform hardware memory bandwidth optimization for reading/writing NDMA data in the read buffer and/or NDMA data in the write buffer. The NDMA core is also configured to transparently combine NDMA transaction requests for a data stripe to increase local access to available tensors in artificial neural networks.
    Type: Grant
    Filed: April 4, 2022
    Date of Patent: September 19, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Jinxia Bai, Rosario Cammarota, Michael Goldfarb
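
A toy sketch of the request-combining behavior: adjacent (address, length) reads for a data stripe are merged so fewer, larger transactions hit memory. The request format is an invention for illustration.

```python
def combine_requests(requests):
    merged = []
    for addr, length in sorted(requests):
        if merged and addr <= merged[-1][0] + merged[-1][1]:
            # Contiguous or overlapping with the previous request: extend it.
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, max(prev_len, addr + length - prev_addr))
        else:
            merged.append((addr, length))
    return merged

stripe = [(0, 64), (64, 64), (128, 64), (512, 64)]
print(combine_requests(stripe))   # [(0, 192), (512, 64)]
```
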
  • Patent number: 11755365
    Abstract: A method of scheduling tasks in a processor comprises receiving a plurality of tasks that are ready to be executed (i.e., all their dependencies have been met and all the resources required to execute each task are available) and adding the received tasks to a task queue (or "task pool"). The number of executing tasks is monitored and, in response to determining that an additional task can be executed by the processor, a task is selected from the task pool, based at least in part on a comparison of indications of resources used by tasks being executed and indications of resources used by individual tasks in the task pool, and the selected task is sent for execution.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: September 12, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Isuru Herath, Richard Broadhurst
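
A minimal sketch of the selection step, assuming invented resource names: pick the next task from the pool by comparing its resource needs against what the currently executing tasks already occupy.

```python
CAPACITY = {"alu": 8, "mem_bw": 4}

def pick_task(task_pool, executing):
    in_use = {k: sum(t[k] for t in executing) for k in CAPACITY}
    for task in task_pool:
        # A task is schedulable if every resource it needs still fits
        # alongside the tasks already running.
        if all(in_use[k] + task[k] <= CAPACITY[k] for k in CAPACITY):
            return task
    return None    # nothing fits; wait for a running task to finish

executing = [{"alu": 5, "mem_bw": 1}]
pool = [{"alu": 6, "mem_bw": 1}, {"alu": 2, "mem_bw": 3}]
print(pick_task(pool, executing))   # {'alu': 2, 'mem_bw': 3}
```
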
  • Patent number: 11748103
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: February 15, 2022
    Date of Patent: September 5, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
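
A software analogue (not Intel's actual encoding) of the first compress scheme in the abstract: pack the non-zero elements together and record each one's matrix position in a header, from which the matrix can be rebuilt.

```python
def compress(matrix):
    header, packed = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                header.append((r, c))   # position of each non-zero element
                packed.append(v)
    return header, packed

def decompress(header, packed, shape):
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), v in zip(header, packed):
        out[r][c] = v
    return out

m = [[0, 3, 0], [0, 0, 0], [5, 0, 7]]
hdr, data = compress(m)
assert decompress(hdr, data, (3, 3)) == m
print(hdr, data)    # [(0, 1), (2, 0), (2, 2)] [3, 5, 7]
```
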
  • Patent number: 11748599
    Abstract: Techniques including receiving a first set of values for processing by a machine learning (ML) network, storing a first portion of the first set of values in an on-chip memory, processing the first portion of the first set of values in a first layer of the ML network to generate a second portion of a second set of values, overwriting the stored first portion with the generated second portion, processing the second portion in a second layer of the ML network to generate a third portion of a third set of values, storing the third portion, repeating the steps of storing the first portion, processing the first portion, overwriting the stored first portion, processing the second portion, and storing the third portion for a fourth portion of the first set of values until all portions of the first set of values are processed to generate the third set of values.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: September 5, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Kumar Desappan, Mihir Narendra Mody, Pramod Kumar Swami, Anshu Jain, Rishabh Garg
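
A toy sketch of the buffer-reuse pattern described above: each input portion's slot in on-chip memory is overwritten by the layer-1 output it produced, which layer 2 then consumes, so only one portion-sized buffer is live at a time. The layers are stand-in functions.

```python
layer1 = lambda xs: [x * 2 for x in xs]      # stand-in for the first layer
layer2 = lambda xs: [x + 1 for x in xs]      # stand-in for the second layer

def run(portions):
    third_set = []
    for first_portion in portions:           # e.g. tiles of a feature map
        on_chip = list(first_portion)        # store the first portion on-chip
        on_chip = layer1(on_chip)            # overwrite it with the second
        third_set.append(layer2(on_chip))    # store the third portion out
    return third_set

print(run([[1, 2], [3, 4]]))                 # [[3, 5], [7, 9]]
```
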
  • Patent number: 11748279
    Abstract: A system on chip, an access command routing method, and a terminal are disclosed. The system on chip includes an IP core and a bus. The IP core is configured to: obtain, based on an access address corresponding to an access command, an address range configuration identifier corresponding to the access address; and transmit the access command and the address range configuration identifier to the bus, where the address range configuration identifier is used by the bus to route the access command. The bus is configured to route the access command to a system cache or an external memory based on the address range configuration identifier.
    Type: Grant
    Filed: August 4, 2021
    Date of Patent: September 5, 2023
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Shiming He, Bo Sun, Wenmin Zhou, Zhiqiang Zhang
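
A hedged sketch of the routing split: the IP core looks up an address-range configuration identifier for each access, and the bus uses only that identifier, never the raw address, to route the command to the system cache or to external memory. The ranges and identifier names are invented.

```python
RANGES = [
    (0x0000_0000, 0x1000_0000, "CACHEABLE"),     # id routed to system cache
    (0x1000_0000, 0x8000_0000, "EXTERNAL"),      # id routed to external memory
]

def range_id(addr):
    for lo, hi, rid in RANGES:
        if lo <= addr < hi:
            return rid
    raise ValueError(f"unmapped address {addr:#x}")

def bus_route(command, rid):
    # The bus never re-decodes the address; the identifier alone decides.
    target = "system_cache" if rid == "CACHEABLE" else "external_memory"
    return f"{command} -> {target}"

print(bus_route("READ 0x0800_0000", range_id(0x0800_0000)))
print(bus_route("WRITE 0x4000_0000", range_id(0x4000_0000)))
```
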
  • Patent number: 11740911
    Abstract: A system includes a multidimensional array of homogenous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: August 29, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory F. Grohoski, Manish K. Shah, Kin Hing Leung
  • Patent number: 11741026
    Abstract: An accelerator, a method of operating the accelerator, and an electronic device including the accelerator are disclosed. A method of operating the accelerator, which is configured to perform a target operation, includes packing input data with a data layout determined based on a word width of a memory in the accelerator and a spatial size of a filter to be applied in the target operation, storing the packed input data in the memory, and performing the target operation between a portion of the input data stored in the same word in the memory and weights of the filter.
    Type: Grant
    Filed: February 23, 2021
    Date of Patent: August 29, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hanmin Park, Hyung-Dal Kwon, Jaehyeong Sim, Seungwook Lee, Jae-Eon Jo
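
An illustrative packing (parameters invented) of the layout idea above: inputs are arranged so the elements one filter application needs land in the same memory word, making each word a self-contained operand for the target operation; the cost is that overlapping windows duplicate data across words.

```python
WORD_ELEMS = 4          # word width of the accelerator memory, in elements
FILTER_W = 3            # spatial size of the 1-D filter

def pack(inputs):
    words = []
    # Stride 1: each word holds one filter window, padded to the word width.
    for i in range(len(inputs) - FILTER_W + 1):
        window = inputs[i:i + FILTER_W]
        words.append(window + [0] * (WORD_ELEMS - FILTER_W))
    return words

def convolve_packed(words, weights):
    # The whole multiply-accumulate for one output reads a single word.
    return [sum(w * x for w, x in zip(weights, word)) for word in words]

words = pack([1, 2, 3, 4, 5])
print(convolve_packed(words, [1, 0, -1]))    # [-2, -2, -2]
```
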
  • Patent number: 11734207
    Abstract: The present disclosure generally relates to utilizing a port scheduler within a data storage device controller to schedule data transfers and determine which port should be used for each data packet transferred. The data storage device comprises a multi-port system on a host interface. The port scheduler can consider factors such as link workload, per-port idle time, link power state, per-port throughput, link speed, data-transfer priority, and quality of service (QoS). Based on an analysis of one or more of these factors, the port scheduler can transfer data on a port that is not associated with the data, ensuring efficient multi-port usage.
    Type: Grant
    Filed: February 2, 2022
    Date of Patent: August 22, 2023
    Assignee: Western Digital Technologies, Inc.
    Inventors: Shay Benisty, Judah Gamliel Hahn, Avichay Haim Hodes
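
A sketch with invented weights: score each port on the factors the abstract lists and send the packet out the best-scoring port, even when that port is not the one the data is associated with.

```python
def score(port):
    s = 0.0
    s -= port["queued_bytes"] / 1e6          # link workload (lower is better)
    s += port["idle_ms"] * 0.01              # prefer ports sitting idle
    s -= 5.0 if port["power_state"] == "low" else 0.0   # wake-up penalty
    s += port["link_gbps"] * 0.1             # faster links score higher
    return s

def pick_port(ports, high_priority=False):
    if high_priority:                        # QoS: priority traffic takes
        return max(ports, key=lambda p: p["link_gbps"])   # the fastest link
    return max(ports, key=score)

ports = [
    {"name": "port0", "queued_bytes": 8e6, "idle_ms": 0,
     "power_state": "active", "link_gbps": 16},
    {"name": "port1", "queued_bytes": 1e5, "idle_ms": 120,
     "power_state": "active", "link_gbps": 8},
]
print(pick_port(ports)["name"])              # port1: lightly loaded and idle
```
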
  • Patent number: 11726929
    Abstract: An accelerator, an operation method of the accelerator, and an accelerator system including the accelerator are disclosed. The operation method includes receiving one or more workloads assigned by a host controller, determining reuse data of the workloads based on hardware resource information and/or a memory access cost of the accelerator when a plurality of processing units included in the accelerator perform the workloads, and providing a result of performing the workloads.
    Type: Grant
    Filed: February 2, 2021
    Date of Patent: August 15, 2023
    Assignees: Samsung Electronics Co., Ltd., SNU R&DB FOUNDATION
    Inventors: Seung Wook Lee, Soojung Ryu, Jintaek Kang, Sunjung Lee
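
A toy sketch of the reuse decision: weigh how often each workload touches a tensor against the cost of re-fetching it from off-accelerator memory, and keep the highest-benefit tensors resident. The cost model and field names are assumptions.

```python
def pick_reuse_data(tensors, on_chip_capacity):
    # benefit = accesses avoided * per-access cost of going off-chip
    ranked = sorted(tensors,
                    key=lambda t: t["accesses"] * t["offchip_cost"],
                    reverse=True)
    chosen, used = [], 0
    for t in ranked:
        if used + t["size"] <= on_chip_capacity:
            chosen.append(t["name"])
            used += t["size"]
    return chosen

tensors = [
    {"name": "weights",     "size": 6, "accesses": 50, "offchip_cost": 4},
    {"name": "activations", "size": 4, "accesses": 10, "offchip_cost": 4},
    {"name": "lut",         "size": 2, "accesses": 90, "offchip_cost": 1},
]
print(pick_reuse_data(tensors, on_chip_capacity=8))   # ['weights', 'lut']
```
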
  • Patent number: 11727257
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: August 15, 2023
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
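
A rough structural sketch (field names invented) of the descriptor scheme: an operand specifier can point at a register holding a data structure descriptor, which says whether the operand is a fabric vector or a memory vector and, for 4-D and circular-buffer shapes, where the extended descriptor lives.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataStructureDescriptor:
    kind: str                            # "fabric" or "memory"
    memory_shape: Optional[str] = None   # "1d", "4d", or "circular"
    extended_reg: Optional[int] = None   # XDSR holding 4-D/circular params

dsr_file = {
    0: DataStructureDescriptor("fabric"),
    1: DataStructureDescriptor("memory", memory_shape="1d"),
    2: DataStructureDescriptor("memory", memory_shape="4d", extended_reg=7),
}

def resolve_operand(specifier):
    d = dsr_file[specifier]
    if d.kind == "fabric":
        return "stream operand from fabric"
    if d.memory_shape in ("4d", "circular"):
        return f"memory vector; fetch extra params from XDSR {d.extended_reg}"
    return "1-D memory vector"

for spec in (0, 1, 2):
    print(spec, "->", resolve_operand(spec))
```
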
  • Patent number: 11720351
    Abstract: Disclosed are systems and methods related to providing optimized software implementations of artificial intelligence ("AI") networks. The system receives operations ("ops"), each consisting of a set of instructions to be performed within an AI network. The system then receives microkernels, each implementing one or more instructions to be performed within the AI network on a specific hardware component. Next, the system generates a kernel for each of the operations. Generating the kernel for each of the operations includes configuring input data to be received from the AI network; detecting a specific hardware component to be used; selecting one or more microkernels to be invoked by the kernel based on the detection of the specific hardware component; and configuring output data to be sent to the AI network as a result of the invocation of the microkernel(s).
    Type: Grant
    Filed: March 17, 2021
    Date of Patent: August 8, 2023
    Assignee: OnSpecta, Inc.
    Inventors: Victor Jakubiuk, Sebastian Kaczor
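
A hedged sketch of the kernel-generation flow: every op gets a generated kernel that, at run time, detects the hardware and invokes the microkernel built for it, falling back to a portable path otherwise. Names are illustrative, not OnSpecta's API.

```python
MICROKERNELS = {
    ("matmul", "avx"):  lambda a, b: "matmul via AVX microkernel",
    ("matmul", "neon"): lambda a, b: "matmul via NEON microkernel",
}

def detect_hardware():
    return "avx"                          # stand-in for real feature detection

def make_kernel(op_name):
    # One generated kernel per op: configure input, pick the microkernel
    # for the detected hardware, return output to the network.
    def kernel(inputs):
        hw = detect_hardware()
        micro = MICROKERNELS.get(
            (op_name, hw),
            lambda a, b: f"{op_name} via portable fallback")
        return micro(*inputs)             # invoke the selected microkernel
    return kernel

kernel = make_kernel("matmul")
print(kernel(([1], [2])))                 # matmul via AVX microkernel
```
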
  • Patent number: 11720509
    Abstract: A system includes a fabric switch including a motherboard, a baseboard management controller (BMC), a network switch configured to transport network signals, and a PCIe switch configured to transport PCIe signals; a midplane; and a plurality of device ports. Each of the plurality of device ports is configured to connect a storage device to the motherboard of the fabric switch over the midplane and carry the network signals and the PCIe signals over the midplane. The storage device is configurable in multiple modes based on a protocol established over a fabric connection between the system and the storage device.
    Type: Grant
    Filed: October 5, 2020
    Date of Patent: August 8, 2023
    Inventors: Sompong Paul Olarig, Fred Worley, Son Pham