Patents by Inventor Shorin Kyo

Shorin Kyo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11256543
    Abstract: A processor and an instruction scheduling method for X-channel interleaved multi-threading, where X is an integer greater than one. The processor includes a decoding unit and a processing unit. The decoding unit is configured to obtain one instruction from each of Z predefined threads in each cyclic period, decode the Z obtained instructions to obtain Z decoding results, and send the Z decoding results to the processing unit, where each cyclic period includes X sending periods, one decoding result is sent to the processing unit in each sending period, a decoding result of the Z decoding results may be repeatedly sent by the decoding unit in a plurality of sending periods, wherein 1≤Z<X or Z=X, and wherein Z is an integer. The processing unit (32) is configured to execute the instruction based on the decoding result.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: February 22, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Shorin Kyo, Ye Gao, Shinri Inamori
  • Publication number: 20200012524
    Abstract: A processor and an instruction scheduling method for X-channel interleaved multi-threading, where X is an integer greater than one. The processor includes a decoding unit and a processing unit. The decoding unit is configured to obtain one instruction from each of Z predefined threads in each cyclic period, decode the Z obtained instructions to obtain Z decoding results, and send the Z decoding results to the processing unit, where each cyclic period includes X sending periods, one decoding result is sent to the processing unit in each sending period, a decoding result of the Z decoding results may be repeatedly sent by the decoding unit in a plurality of sending periods, wherein 1≤Z<X or Z=X, and wherein Z is an integer. The processing unit (32) is configured to execute the instruction based on the decoding result.
    Type: Application
    Filed: September 20, 2019
    Publication date: January 9, 2020
    Inventors: Shorin Kyo, Ye Gao, Shinri Inamori
  • Patent number: 10114639
    Abstract: An arithmetic device which controls a parallel arithmetic operation includes a global memory, a plurality of compute units, each of the compute units including a local memory and a plurality of processing elements, and each of the processing elements including a private memory and processing data blocks stored in the private memory, an attribute group holding unit which includes a specific attribute which includes a parameter indicative of a size of the data block, an arithmetic attribute which includes a parameter indicating whether the data block is a data relevant to processing, and indicating a transfer order when the data block is data relevant to processing, and a policy attribute which includes a parameter indicative of how to execute a transfer of the data block and how to execute processing of the data block.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: October 30, 2018
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo
  • Publication number: 20170228232
    Abstract: An arithmetic device which controls a parallel arithmetic operation includes a global memory, a plurality of compute units, each of the compute units including a local memory and a plurality of processing elements, and each of the processing elements including a private memory and processing data blocks stored in the private memory, an attribute group holding unit which includes a specific attribute which includes a parameter indicative of a size of the data block, an arithmetic attribute which includes a parameter indicating whether the data block is a data relevant to processing, and indicating a transfer order when the data block is data relevant to processing, and a policy attribute which includes a parameter indicative of how to execute a transfer of the data block and how to execute processing of the data block.
    Type: Application
    Filed: April 28, 2017
    Publication date: August 10, 2017
    Inventor: Shorin KYO
  • Publication number: 20170147529
    Abstract: Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.
    Type: Application
    Filed: February 3, 2017
    Publication date: May 25, 2017
    Inventor: Shorin KYO
  • Patent number: 9639337
    Abstract: An attribute group storage unit acquires and holds attribute groups set to respective data blocks. A scenario determination unit determines respective transfer systems of the respective blocks between a memory of the lowest hierarchy and a memory of another hierarchy based on those attribute groups and a configuration of an arithmetic unit which is the parallel processor, and controls the transfer of the respective data blocks according to the determined transfer systems, and the parallel arithmetic operation corresponding to the transfer. Each of the attribute groups is necessary to determine the transfer systems, and includes one or more attributes not depending on the configuration of the parallel processor. The attribute groups of the write blocks are set assuming that each of the write blocks has already been located in the memory of another hierarchy, and is transferred to the memory of the lowest hierarchy.
    Type: Grant
    Filed: June 21, 2012
    Date of Patent: May 2, 2017
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo
  • Patent number: 9589624
    Abstract: In a semiconductor device in accordance with one embodiment, a memory access control unit counts the number of addresses accessed by burst access to each address included in an address set of an external memory that is going to be accessed. When the number of addresses is larger than a reference value, the memory access control unit performs burst access to the address, and when the number of addresses is smaller than the reference value, the memory access control unit performs random access to the address.
    Type: Grant
    Filed: August 19, 2015
    Date of Patent: March 7, 2017
    Assignee: Renesas Electronics Corporation
    Inventor: Shorin Kyo
  • Patent number: 9495213
    Abstract: When executing a first kernel and a second kernel related to each other by the arithmetic unit, if an allocation attribute of a continuous write block of the first kernel and an allocation attribute of a continuous read block corresponding to the continuous write block of the second kernel are the same, a scenario determination unit executes the first kernel and the second kernel in a pipeline by using the continuous write block for execution of the second kernel through the private memory or the local memory without transferring it to the global memory. At this time, the scenario determination unit logically adds a margin attribute and a dependence attribute of the continuous read block of the second kernel respectively to a margin attribute and a dependence attribute set for the read block for each of the read block of the first kernel.
    Type: Grant
    Filed: January 20, 2015
    Date of Patent: November 15, 2016
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo
  • Publication number: 20160055897
    Abstract: In a semiconductor device in accordance with one embodiment, a memory access control unit counts the number of addresses accessed by burst access to each address included in an address set of an external memory that is going to be accessed. When the number of addresses is larger than a reference value, the memory access control unit performs burst access to the address, and when the number of addresses is smaller than the reference value, the memory access control unit performs random access to the address.
    Type: Application
    Filed: August 19, 2015
    Publication date: February 25, 2016
    Inventor: Shorin KYO
  • Publication number: 20150370755
    Abstract: To improve processing efficiency of a SIMD processor that divides two-dimensional data into blocks, each having a width of PE number N, to store the data in a local memory of each of PEs by a lateral direction priority method. When designating a local address of N pieces of data arranged in a row direction from head data whose coordinate values in two-dimensional data are (X,Y) to a PE array 110, the N pieces of data being stored in local memories, a CP 150 broadcasts a local address A1, a local address A2, and a threshold number Z obtained by an address calculation unit. Each of the PEs compares a magnitude relation between the threshold number Z and its own number, and selects one of the local address A1 and the local address A2 according to the comparison result.
    Type: Application
    Filed: August 28, 2015
    Publication date: December 24, 2015
    Applicant: Renesas Electronics Corporation
    Inventor: Shorin KYO
  • Publication number: 20150363357
    Abstract: Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.
    Type: Application
    Filed: August 24, 2015
    Publication date: December 17, 2015
    Inventor: Shorin KYO
  • Patent number: 9158737
    Abstract: To improve processing efficiency of a SIMD processor that divides two-dimensional data into blocks, each having a width of PE number N, to store the data in a local memory of each of PEs by a lateral direction priority method. When designating a local address of N pieces of data arranged in a row direction from head data whose coordinate values in two-dimensional data are (X,Y) to a PE array 110, the N pieces of data being stored in local memories, a CP 150 broadcasts a local address A1, a local address A2, and a threshold number Z obtained by an address calculation unit. Each of the PEs compares a magnitude relation between the threshold number Z and its own number, and selects one of the local address A1 and the local address A2 according to the comparison result.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: October 13, 2015
    Assignee: Renesas Electronics Corporation
    Inventor: Shorin Kyo
  • Patent number: 9129085
    Abstract: Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.
    Type: Grant
    Filed: July 3, 2012
    Date of Patent: September 8, 2015
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo
  • Publication number: 20150227466
    Abstract: When executing a first kernel and a second kernel related to each other by the arithmetic unit, if an allocation attribute of a continuous write block of the first kernel and an allocation attribute of a continuous read block corresponding to the continuous write block of the second kernel are the same, a scenario determination unit executes the first kernel and the second kernel in a pipeline by using the continuous write block for execution of the second kernel through the private memory or the local memory without transferring it to the global memory. At this time, the scenario determination unit logically adds a margin attribute and a dependence attribute of the continuous read block of the second kernel respectively to a margin attribute and a dependence attribute set for the read block for each of the read block of the first kernel.
    Type: Application
    Filed: January 20, 2015
    Publication date: August 13, 2015
    Inventor: Shorin Kyo
  • Patent number: 8769244
    Abstract: Uniforming of the processing load is efficiently realized. Each processing element configuring an SIMD parallel computer system includes a data storage module that stores data processed or transferred, a number-of-data-sets storage device that stores number of data sets, and a front data storage device that stores the front data. Each processing element further includes a control processor that compares the number of data sets stored in one processing element with the number of data sets stored in the own processing element, and issues a data distribution leveling instruction that designates an action for updating contents of the data storage module, the number-of-data-sets storage device, and the front data storage device according to a rule determined based on a comparison result of the own processing element and that of the other processing elements and an action for moving the data stored in the one processing element to the own processing element.
    Type: Grant
    Filed: April 8, 2009
    Date of Patent: July 1, 2014
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Patent number: 8732687
    Abstract: For a program that is made up of functions in units, each function is divided into instruction code blocks having a size CS where CS is the instruction cache line size of a target processor and an instruction code block that is Xth counting from the top of each function F is expressed as (F, X). Flow information of nodes that take (F, X) as identification names is extracted from an executable file of the function program. For each identification name, as neighborhood weight of each identification name that differs from that identification name, information for which that the frequency of appearance of each identification name is taken into consideration that belongs to a function that differs from that function in the neighborhood of each appearing node in the flow information is found. Based on said neighborhood weight information, the functions are arranged in the memory space such that the number of conflicts of said instruction cache is reduced.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: May 20, 2014
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Patent number: 8683106
    Abstract: Nowadays, many architectures have processing units with different bandwidth requirements which are connected over a pipelined ring bus. The proposed invention can optimize the data transfer for the case where processing units with lower bandwidth requirements can be grouped and controlled together for a data transfer, so that the available bus bandwidth can be optimally utilized.
    Type: Grant
    Filed: March 3, 2008
    Date of Patent: March 25, 2014
    Assignee: NEC Corporation
    Inventors: Hanno Lieske, Shorin Kyo
  • Patent number: 8635432
    Abstract: There is provided an SIMD processor array system in which data can be efficiently transferred between processor elements located at different distances. The SIMD processor array system includes a control processor (CP) that is capable of issuing a plurality of instructions at the same time, and a PE array that includes a plurality of mutually-connected processing elements (PEs) to be controlled by the CP. The CP issues an inter-PE data shift instruction to each PE. According to the inter-PE data shift instruction, each PE performs a data sending operation of copying all the contents of a transfer data storing part of an adjoining PE to a transfer data storing part (MBF) of the own PE, and a data fetch operation of copying part or all of the contents of the MBF of the adjoining PE to a transfer data fetch and storing part (RBUF) of the own PE if part of the contents of the MBF of the adjoining PE coincides with the contents of an ID storing part (IDB) of the own PE.
    Type: Grant
    Filed: March 4, 2009
    Date of Patent: January 21, 2014
    Assignee: NEC Corporation
    Inventor: Shorin Kyo
  • Publication number: 20130080739
    Abstract: To improve processing efficiency of a SIMD processor that divides two-dimensional data into blocks, each having a width of PE number N, to store the data in a local memory of each of PEs by a lateral direction priority method. When designating a local address of N pieces of data arranged in a row direction from head data whose coordinate values in two-dimensional data are (X,Y) to a PE array 110, the N pieces of data being stored in local memories, a CP 150 broadcasts a local address A1, a local address A2, and a threshold number Z obtained by an address calculation unit. Each of the PEs compares a magnitude relation between the threshold number Z and its own number, and selects one of the local address A1 and the local address A2 according to the comparison result.
    Type: Application
    Filed: July 30, 2012
    Publication date: March 28, 2013
    Inventor: Shorin KYO
  • Publication number: 20130024658
    Abstract: Technology to suppress the drop in SIMD processor efficiency that occurs when exchanging two-dimensional data in a plurality of rectangular regions, between an external section and a plurality of processor elements in an SIMD processor, so that one rectangular region corresponds to one processor element. In the SIMD processor, an address storage unit in a memory controller is capable of setting N number of addresses Ai (i=1 through N) in an external memory by utilizing a control processor. A parameter storage unit is capable of setting a first parameter OSV, a second parameter W, and a third parameter L by utilizing a control processor. A data transfer unit executes the transfer of data between an external memory, and the buffers in N number of processor elements contained in the applicable SIMD processor, based on the contents of the address storage unit and the parameter storage unit.
    Type: Application
    Filed: July 3, 2012
    Publication date: January 24, 2013
    Inventor: Shorin KYO
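
The instruction scheduling rule in the abstract of patent 11256543 (X sending periods per cyclic period, Z decoded instructions, repeats allowed when Z is less than X) can be illustrated with a minimal sketch. The abstract does not say how repeated decoding results are distributed over the sending periods, so the round-robin rule below is an assumption, and the function name `sending_schedule` and its parameters are hypothetical, not taken from the patent.

```python
def sending_schedule(x, z, cycles=1):
    """Return, for each sending period, the index of the thread whose
    decoding result is sent to the processing unit.

    x: number of sending periods per cyclic period (integer > 1)
    z: number of predefined threads decoded per cyclic period (1 <= z <= x)
    cycles: number of cyclic periods to generate

    Assumption: when z < x, decoding results are repeated round-robin
    (slot s sends the result of thread s mod z). The abstract only says
    that a result "may be repeatedly sent"; it does not fix this order.
    """
    if x <= 1:
        raise ValueError("x must be an integer greater than one")
    if not 1 <= z <= x:
        raise ValueError("z must satisfy 1 <= z <= x")

    schedule = []
    for _ in range(cycles):
        # One decoding result is sent in each of the x sending periods.
        for slot in range(x):
            schedule.append(slot % z)
    return schedule


if __name__ == "__main__":
    # Four channels, two active threads: each result is sent twice per cycle.
    print(sending_schedule(4, 2))
    # Four channels, four active threads: each result is sent once per cycle.
    print(sending_schedule(4, 4))
```

With Z equal to X every thread gets exactly one sending period per cyclic period; with Z smaller than X some or all results occupy more than one period, which is the case the abstract describes as repeated sending.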