SOUND OUTPUT DEVICE, SOUND GENERATION METHOD, AND PROGRAM
According to the present disclosure, a sound output device includes: a sound acquisition part configured to acquire sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and a head-related transfer function adjustment part configured to adjust a head-related transfer function of sound captured by the sound guide part. Since the head-related transfer function adjustment part adjusts the head-related transfer function of sound captured by the sound guide part, it is possible to listen to both ambient sound and sound provided from a sound output device such that the listener does not feel strangeness even in the state in which the listener is wearing the sound output device.
The present application claims the benefit under 35 U.S.C. § 120 as a continuation application of U.S. application Ser. No. 15/765,365, filed on Apr. 2, 2018, which is a national stage filing under 35 U.S.C. 371 of International Patent Application Serial No. PCT/JP2016/076145, filed Sep. 6, 2016, entitled “SOUND OUTPUT DEVICE, SOUND GENERATION METHOD, AND PROGRAM”, which claims priority under 35 U.S.C. § 119(a)-(d) or 35 U.S.C. § 365(b) to Japanese application number 2015-201000, filed Oct. 9, 2015, the entire contents of each of which are incorporated herein by reference in their entireties.
TECHNICAL FIELD
The present disclosure relates to sound output devices, sound generation methods, and programs.
BACKGROUND ART
In the related art, small earphones configured to convert electrical signals output from reproduction devices or the like into sound signals through speakers have become widespread. Such earphones emit sound such that the sound is heard only by a listener wearing the earphones. Therefore, such earphones have been used in various kinds of environments.
Such earphones have forms that allow the earphones to be inserted into ears of listeners. For example, in-ear earphones have forms that allow users to use the earphones by inserting the earphones deeply into their ears (ear canals). Because of their structure, most in-ear earphones have closed designs. Such earphones have relatively good noise isolation performance, and therefore they have the advantage that users can enjoy music or the like even in somewhat noisy places.
In general, an in-ear earphone has a speaker unit and a housing as basic structural elements. The speaker unit is configured to convert electrical signals into sound signals. The housing has a substantially cylindrical shape, and the housing also serves as a sound tube. The speaker unit is attached on one end of the housing (outer side of ear canal). The housing has an emission outlet through which vibrating air generated in the speaker unit is emitted to an ear canal and transmitted to an eardrum. In addition, in general, an ear tip (removable part) is attached to the other end of the housing (part to be inserted into ear canal). The ear tip has a shape that fits a listener's ear canal when worn by the listener. For example, Patent Literature 1 proposes an in-ear earphone device in which a sound tube is arranged to tilt from a position other than the center of the housing such that the housing fits into a concha auriculae and the sound tube is arranged close to an entrance of an ear canal.
CITATION LIST
Patent Literature
Patent Literature 1: JP 4709017B
DISCLOSURE OF INVENTION
Technical Problem
Even in the case where a listener is wearing earphones and listening to provided sound, the listener has to listen to ambient sound at the same time if a person around the listener speaks to the listener, for example. However, with most conventional earphones such as in-ear earphones, it is extremely difficult for a listener to listen to ambient sound while wearing the earphones. This is because such earphones have structures that completely cover ear openings to improve reproduction sound quality and to prevent reproduction sound from leaking to the outside. For example, listeners may feel inconvenience if they cannot listen to ambient sound during driving, being navigated, or doing outdoor or indoor sports such as walking, jogging, cycling, mountaineering, skiing, or snowboarding. In addition, in such a situation, the listeners may encounter dangerous situations. In addition, convenience may deteriorate if listeners cannot hear ambient sound during communication or a presentation. In addition, when a listener is wearing the conventional earphones, people around the listener can see earphones covering ear openings of the listener. Therefore, the people around the listener wearing the earphones may hesitate to speak to the listener, and this may interrupt communication between people.
In view of such circumstances, it is desirable to listen to both ambient sound and sound provided from a sound output device such that a listener does not feel strangeness even in the state in which the listener is wearing the sound output device.
Solution to Problem
According to the present disclosure, there is provided a sound output device including: a sound acquisition part configured to acquire sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and a head-related transfer function adjustment part configured to adjust a head-related transfer function of sound captured by the sound guide part.
The sound output device may further include a sound environment adjustment part configured to adjust a sound environment of sound captured by the sound guide part.
In addition, the head-related transfer function adjustment part may change the head-related transfer function such that a sound image of the sound is localized at a place different from a place of ambient sound directly entering an ear of a listener.
In addition, the head-related transfer function adjustment part may change the head-related transfer function such that a sound image of the sound is localized above a head of the listener or near a foot of the listener.
In addition, the head-related transfer function adjustment part may adjust the head-related transfer function on a basis of operation performed by a listener.
In addition, the sound environment adjustment part may adjust the sound environment on a basis of operation performed by a listener.
In addition, the sound environment adjustment part may adjust the sound environment on a basis of sound information of an ambient environment of the listener.
In addition, the sound environment adjustment part may adjust the sound environment on a basis of a result of separating the sound information of the ambient environment into human voice and environmental sound other than the human voice.
In addition, the sound environment adjustment part may acquire a result of analyzing sound information of an ambient environment of the listener from another device, and adjust the sound environment.
In addition, the sound environment adjustment part may adjust the sound environment on a basis of location information of a listener.
In addition, the head-related transfer function adjustment part may adjust the head-related transfer function on a basis of a direction of a head of a listener.
In addition, the head-related transfer function adjustment part may adjust a head-related transfer function such that a sound image location is a constant location regardless of a direction of a head of a listener.
In addition, the sound output device may include a sound output part configured to output sound to be transmitted to an ear of the listener without passing through the sound guide part.
In addition, one of sound to be output to the sound guide part and sound to be transmitted to an ear of the listener without passing through the sound guide part may be delayed.
In addition, the sound output device may include a delay part configured to delay sound to be transmitted to an ear of the listener without passing through the sound guide part in comparison with sound to be output to the sound guide part.
In addition, the sound output device may include a location information acquisition part configured to acquire location information of a listener, and the sound acquisition part may acquire navigation information based on the location information.
In addition, the sound acquisition part may acquire speech of the listener or voice for giving an instruction on movement of the listener.
In addition, the sound acquisition part may acquire guidance information for explaining any event visually recognized by the listener in a language designated by the listener from among a plurality of languages.
In addition, according to the present disclosure, there is provided a sound generation method including: acquiring sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and adjusting a head-related transfer function of sound captured by the sound guide part.
In addition, according to the present disclosure, there is provided a program causing a computer to function as: a means for acquiring sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and a means for adjusting a head-related transfer function of sound captured by the sound guide part.
Advantageous Effects of Invention
As described above, according to the present disclosure, it is possible to listen to both ambient sound and sound provided from a sound output device such that the listener does not feel strangeness even in the state in which the listener is wearing the sound output device.
Note that the effects described above are not necessarily limitative. With or in the place of the above effects, there may be achieved any one of the effects described in this specification or other effects that may be grasped from this specification.
Hereinafter, (a) preferred embodiment(s) of the present disclosure will be described in detail with reference to the appended drawings. Note that, in this specification and the appended drawings, structural elements that have substantially the same function and structure are denoted with the same reference numerals, and repeated explanation of these structural elements is omitted.
Note that, the description is given in the following order.
1. Configuration Example of Sound Output Device
1. Configuration Example of Sound Output Device
First, with reference to
The sound output device 100 illustrated in
As described later, the supporting part 130 fits to a vicinity of an opening of an ear canal (such as intertragic notch), and supports the sound guide part 120 near the other end 122 such that the sound output hole at the other end 122 of the sound guide part 120 faces deep in the ear canal. The outside diameter of the sound guide part 120 near at least the other end 122 is much smaller than the internal diameter of the opening of the ear canal. Therefore, the other end 122 does not completely cover the ear opening of the listener even in the state in which the other end 122 of the sound guide part 120 is supported by the supporting part 130 near the opening of the ear canal. In other words, the ear opening is open. The sound output device 100 is different from conventional earphones. The sound output device 100 can be referred to as an ‘ear-open-style’ device.
In addition, the supporting part 130 includes an opening part 131 configured to allow an entrance of an ear canal (ear opening) to open to the outside even in a state in which the sound guide part 120 is supported by the supporting part 130. In the example illustrated in
The tube-shaped sound guide part 120 captures sound generated by the sound generation part 110 into the tube from the one end 121 of the sound guide part 120, propagates air vibration of the sound, emits the air vibration to an ear canal from the other end 122 supported by the supporting part 130 near the opening of the ear canal, and transmits the air vibration to an eardrum.
As described above, the supporting part 130 that supports the vicinity of the other end 122 of the sound guide part 120 includes the opening part 131 configured to allow the opening of an ear canal (ear opening) to open to the outside. Therefore, the sound output device 100 does not completely cover an ear opening of a listener even in the state in which the listener is wearing the sound output device 100. Even in the case where a listener is wearing the sound output device 100 and listening to sound output from the sound generation part 110, the listener can sufficiently hear ambient sound through the opening part 131.
Note that, although the sound output device 100 according to the embodiment allows an ear opening to open to the outside, the sound output device 100 can suppress sound generated by the sound generation part 110 (reproduction sound) from leaking to the outside. This is because the sound output device 100 is worn such that the other end 122 of the sound guide part 120 faces deep in the ear canal near the opening of the ear canal, air vibration of generated sound is emitted near the eardrum, and this enables good sound quality even in the case of reducing output from the sound output device 100.
In addition, directivity of air vibration emitted from the other end 122 of the sound guide part 120 also contributes to prevention of sound leakage.
Returning to the description with reference to
In addition, the sound guide part 120 further includes a deformation part 124 between the curved clip part 123 and the other end 122 that is arranged near an opening of an ear canal. When excessive external force is applied, the deformation part 124 deforms such that the other end 122 of the sound guide part 120 is not inserted too deeply into the ear canal.
When using the sound output device 100 having the above-described configuration, it is possible for a listener to naturally hear ambient sound even while wearing the sound output device 100. Therefore, it is possible for the listener to fully utilize his/her functions as human beings depending on his/her auditory property, such as recognition of spaces, recognition of dangers, and recognition of conversations and subtle nuances in the conversations.
As described above, in the sound output device 100, the structure for reproduction does not completely cover the vicinity of the opening of an ear. Therefore, ambient sound is acoustically transparent. In a way similar to environments of a person who does not wear general earphones, it is possible to hear ambient sound as it is, and it is also possible to hear both the ambient sound and sound information or music simultaneously by reproducing desired sound information or music through its pipe or duct shape.
Basically, in-ear earphones that have been widespread in recent years have closed structures that completely cover ear canals. Therefore, a user hears his/her own voice and chewing sound in a different way from a case where his/her ear canals are open to the outside. In many cases, this causes users to feel strangeness and discomfort. This is because the user's own vocalized sound and chewing sound are emitted to closed ear canals through bones and muscles. Therefore, low frequencies of the sound are enhanced and the enhanced sound propagates to eardrums. When using the sound output device 100, such a phenomenon does not occur. Therefore, it is possible to enjoy usual conversations even while listening to desired sound information.
On the other hand, although users can simultaneously hear both actual sound in an ambient environment and necessary sound information reproduced by the sound output device 100 (such as music or information sound from a radio or a network), these sounds may interrupt each other. In addition, the ambient environmental sound is naturally ‘heard in the same way as usual’. Therefore, sound sources are localized with appropriate senses of distance. However, when reproduction sound information or reproduction music is reproduced near ear canals in a way similar to a case of using general earphones, sound images have close distances and lateralization occurs. In a similar way, sound images also have close distances and lateralization occurs in the case of listening to reproduction sound information or reproduction music in a stereo state. As described above, when simultaneously listening to both ambient environmental sound and reproduction sound information or the like in the case where senses of distance between them are different from ‘results of listening’, sometimes ‘listening fatigue’ occurs and it takes a while to recognize content of the sound. For example, in the case where an alarm is ringing in an ambient environmental sound while listening to music, sometimes it takes a while to change the target to be aurally focused on.
Therefore, according to the embodiment of the present disclosure, it is possible to solve such problems by creating a phenomenon known as the so-called ‘cocktail party effect’ as a system. There are various theories as to a principle of the cocktail party effect. One theory is that it is possible to distinguish different pieces of sound image location information since it is possible to spatially recognize the pieces of sound image location information in a three-dimensional space in one's head. For example, it is difficult to separate and distinguish conversations of people when reproducing content in which conversations in a conference are recorded through a monaural microphone. However, it is possible to separate and distinguish conversations when using headphones for reproducing content in which conversations in a conference are recorded through binaural recording.
In other words, although sound information, music, or the like is reproduced as it is by the sound output device 100 near ear canals of ears, sound images are localized at artefactual locations by using signal processing. It is possible to reduce listening fatigue of users by providing sound sources that fit an ambient sound environment or by providing sound sources as if the sound sources are in a natural space. In addition, it is possible to selectively listen to ambient environmental sound and reproduction sound information depending on a sound image map recognized by a user (in his/her head) without paying attention to transition time and listening fatigue.
Such sound image localization can be referred to as ‘audio augmented reality’ (AR) that applies the AR technology that is generally popular in a field of video to a field of audio. In addition, it is also considered that reproduction sound information is overlaid on ambient sound. The embodiment of the present disclosure also describes new UX in addition to a system focusing on solving the above-described problem.
As illustrated in
As illustrated in the configuration example in
In the configuration example in
In general, most of the HRTFs are measured in an anechoic chamber or a room with less reverberation. By convoluting the HRTFs and sound of the sound source 406 through the filters 415, it is possible for the person 400 to recognize an approximate direction of the sound source 406 and an approximate distance to the sound source 406, and it is possible to localize a sound image. In addition, according to the embodiment, acoustic transfer functions L and R are convoluted through filters 418 to blend the sound source 406 in an ambient environment as a sound image during reproduction, as illustrated in
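The two filter stages described above can be sketched in a few lines. This is a minimal illustration only, assuming that the HRTF filters 415 and the acoustic-environment filters 418 are applied in series as time-domain impulse responses; the function name `render_binaural` and all impulse-response values are hypothetical and are not measured data.

```python
import numpy as np

def render_binaural(dry, hrir_l, hrir_r, room_l, room_r):
    """Convolve a dry mono source with per-ear head-related impulse
    responses (standing in for filters 415), then with per-ear acoustic
    transfer functions of the environment (standing in for filters 418)."""
    left = np.convolve(np.convolve(dry, hrir_l), room_l)
    right = np.convolve(np.convolve(dry, hrir_r), room_r)
    return left, right

# Toy impulse responses (hypothetical values chosen for readability).
dry = np.array([1.0, 0.5, 0.25])
hrir_l = np.array([0.0, 1.0])      # left ear: one-sample delay
hrir_r = np.array([0.6])           # right ear: attenuated, no delay
room = np.array([1.0, 0.0, 0.3])   # direct path plus one reflection

left, right = render_binaural(dry, hrir_l, hrir_r, room, room)
```

The interaural delay and level difference come from the first stage, while the second stage only adds the reflection pattern of the simulated space, which is why the two sets of filters can be selected independently.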
In a case of actual application of the system illustrated in
The sound image location control part 424 controls a sound image location of the sound source 406 in response to operation performed on the UI 422. In this case, an optimal filter is selected from the database 420 in response to the operation performed on the UI 422. In addition, the sound environment control part 426 controls sound of the sound source 406 in response to the operation performed on the UI 422. In this case, the optimal filter corresponding to a desired sound environment is selected from the database 421 in response to the operation performed on the UI 422.
For example, sometimes locations at which users want to localize a sound image of the sound source 406 are different depending on differences in hearing sensation between individuals or depending on usage situations. For this reason, the users are allowed to operate the UIs 422 to select locations of the sound image localization. This enables construction of a system with high convenience for listeners (users). In addition, it is known that the HRTFs are different between individuals due to their ear shapes. Therefore, it is possible for the user to select optimal HRTFs corresponding to an individual difference from HRTFs corresponding to a plurality of ear shapes that are classified for sound image locations and stored in the database 420.
Also in the case of the sound environment, it is possible for the user to select an optimal sound environment by using the UI 422 to set the sound of the sound source 406 in a desired sound environment. For example, it is possible to listen to the sound of the sound source 406 in a sound environment such as a concert venue, a movie theater, or the like.
For example, it is possible to apply the configuration illustrated in
In the case illustrated in
As described above, there is only a little difference in a way of listening and a distance to a sound image between ambient environmental sound heard by the user and instruction sound from the sound output device 100. Therefore, it is possible to prevent ‘distraction of attention’ due to ears focusing on specific sound, and it is possible to guide the attention to the sound image location. Therefore, it is also possible to reduce time necessary for transitioning attention of a user from the ambient environmental sound to the instruction sound, in comparison with conventional cases in which the instruction sound is lateralized.
Also in the cases of
Note that, in the case of the usage methods illustrated in
On the other hand, in the case where the VAD 442a determines that the sound signal stream is non-voice, it is determined that the collected sound is ambient environmental sound itself. To analyze features of the ambient environmental sound, the ambient environmental sound is classified by using band pass filters (BPFs) 442e for respective bands, energy in the respective bands is calculated, and a buffer 442f stores them in addition to their time-series change (variation). This result is checked against a prepared ambient sound environment database 442g, the pattern matching part 442h matches the result with the ambient sound environment database 442g, and a spectrum characteristic of a most similar optimal sound environment is selected. The optimal sound filter generation part 442i integrates a characteristic obtained from the non-voice and a characteristic obtained when it is determined that the sound signal is the voice, to generate filters simulating an ambient sound environment.
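The band-energy analysis and database matching for non-voice sound can be illustrated roughly as follows. This sketch makes several assumptions that are not in the description: band energies are taken from an FFT rather than from a bank of band-pass filters, the time-series variation stored in the buffer is ignored, matching uses Euclidean distance on normalized band profiles, and the band edges and database entries are invented for the example.

```python
import numpy as np

def band_energies(signal, sample_rate, band_edges_hz):
    """Energy of `signal` in each [lo, hi) band, computed from its FFT
    (a stand-in for the band pass filters 442e)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in band_edges_hz])

def match_environment(energies, database):
    """Return the name of the database entry whose normalized band
    profile is closest to the measured one."""
    profile = energies / energies.sum()
    return min(database,
               key=lambda name: np.linalg.norm(profile - database[name]))

# Hypothetical three-band layout and reference profiles.
bands = [(0, 500), (500, 2000), (2000, 8000)]
database = {
    "street": np.array([0.7, 0.2, 0.1]),
    "office": np.array([0.2, 0.6, 0.2]),
    "cafe": np.array([0.1, 0.3, 0.6]),
}

# One second of synthetic low-frequency-dominated ambient sound.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
ambient = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)

best = match_environment(band_energies(ambient, sample_rate, bands), database)
```

In a fuller version, the selected entry would index a stored spectrum characteristic from which the filters simulating the ambient sound environment are generated.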
In addition, as indicated by a dashed arrow in
Note that, as illustrated using dashed lines in
Note that, in the above-described example, the sound environment information is recognized and determined on the basis of the information collected by the microphone 440 and the optimal filters 418 are set. However, as illustrated in
Note that,
In addition, since the sound output device 100 includes the GPS 446, it is possible to navigate the listener 400 on the basis of information acquired through the GPS 446. Therefore, for example, it is possible for a user to listen to navigation information from the sound output device 100 while hearing ambient sound even in the case of driving a car as illustrated in
The wireless communication part 710 in the system 700 transmits the location information to a navigation system 702. The navigation system 702 transmits navigation voice information to the sound source 406 on the basis of the location information.
In addition, information acquired by the sensors 416 and the microphone 440 is also transmitted to the system 700 in the smartphone or the cloud via the wireless communication part 432. The sound image location control part 424 provided in the system 700 in the smartphone or the cloud controls the filters 415 on the basis of information of the sensors 416 to control a sound image location. In addition, the sound environment recognition control part 442 provided in the system 700 in the smartphone or the cloud recognizes a sound environment on the basis of information of the microphone 440 to control the filters 418.
According to the configuration illustrated in
For example, when a destination, a target object, a future movement direction, or the like is on the left or right side in the case where the navigation voice information is provided from the sound source 406 to a user, it is desirable to present sound as if a sound image is localized in a direction toward the destination, the target object, or the future movement direction. For example, when the destination, the movement direction, or the like is on the left side, a sound image of the navigation voice information is set to be located on the left side. Therefore, it is possible for the user to recognize the direction quickly and easily. This results in safer behavior of the user.
Therefore, for example, when the destination, the movement direction, or the like is on the left side, the sound image location control part 424 controls the left and right filters 415a-1, 415a-2, 415b-1, and 415b-2 on the basis of the navigation information such that the navigation voice information is localized on the left side of the user and the user can hear the navigation voice information that comes from the left side of the user.
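One way to picture this control is as a conversion from a destination direction to a listener-relative angle, followed by selection of the closest stored filter. The sketch below is an assumption-laden illustration: it supposes that bearings and head yaw are given in degrees clockwise from north, that head direction is available from the sensors, and that filters are stored on a fixed angular grid. The function names and the 30-degree grid are hypothetical.

```python
def relative_azimuth(target_bearing_deg, head_yaw_deg):
    """Angle of the target relative to where the head points, wrapped to
    [-180, 180); negative values mean the target is to the listener's left."""
    return (target_bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def nearest_filter_angle(azimuth_deg, available_angles_deg):
    """Pick the stored filter angle closest to the requested azimuth,
    measuring distance around the circle."""
    def circular_distance(angle):
        return abs((angle - azimuth_deg + 180.0) % 360.0 - 180.0)
    return min(available_angles_deg, key=circular_distance)

# Hypothetical database with filters stored every 30 degrees.
angles = list(range(-180, 180, 30))

# Destination due west (bearing 270) while the listener faces north (yaw 0).
az = relative_azimuth(270.0, 0.0)
chosen = nearest_filter_angle(az, angles)
```

Recomputing the relative azimuth whenever the head yaw changes keeps the sound image fixed in the world rather than fixed to the head.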
An addition part 704a adds outputs from the filter 415a-1 and the filter 415b-1 and transmits it to the wireless communication part 710. An addition part 704b adds outputs from the filter 415a-2 and the filter 415b-2 and transmits it to the wireless communication part 710. The wireless communication part 710 transmits sound information obtained from the addition parts 704a and 704b to the wireless communication part 432 of the sound output device 100. The sound output device 100 uses an amplifier to amplify the sound information transmitted from the system 700, and provides the amplified sound information to the user.
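The filter-and-add structure can be sketched as below. This is only an illustration under assumptions: the two inputs are treated as generic signals a and b, the suffixes -1 and -2 are taken to denote the two output channels without asserting which ear each one feeds, and the helper names and filter values are hypothetical.

```python
import numpy as np

def add_padded(x, y):
    """Sum two signals of possibly different length, zero-padding the
    shorter one (the role played by an addition part)."""
    out = np.zeros(max(len(x), len(y)))
    out[:len(x)] += x
    out[:len(y)] += y
    return out

def mix(sig_a, sig_b, f_a1, f_a2, f_b1, f_b2):
    """Filter each input once per output channel, then add per channel."""
    out_1 = add_padded(np.convolve(sig_a, f_a1), np.convolve(sig_b, f_b1))
    out_2 = add_padded(np.convolve(sig_a, f_a2), np.convolve(sig_b, f_b2))
    return out_1, out_2

# Toy signals and filters (hypothetical values).
sig_a = np.array([1.0, 0.0])
sig_b = np.array([0.0, 1.0])
out_1, out_2 = mix(sig_a, sig_b,
                   f_a1=np.array([1.0]), f_a2=np.array([0.5]),
                   f_b1=np.array([0.2, 0.1]), f_b2=np.array([1.0]))
```

Because the sum is formed before transmission, only two channels need to be sent over the wireless link regardless of how many filtered inputs contribute to them.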
By using such a configuration, it is possible for the sound image location control part 424 to freely set a sound image location. For example, as illustrated in
Note that, according to the configuration illustrated in
Therefore, it is possible for the user to clearly distinguish ambient environmental sound and sound provided from the sound output device 100. For example, it becomes easier to distinguish navigation voice information or stereo music from the ambient environmental sound when localizing the navigation voice information or the stereo music right above or right below the user. Therefore, even in the case where the ambient environmental sound includes human voice or music, it is possible for the user to clearly distinguish the ambient environmental sound from the navigation voice information or stereo music provided from the sound output device 100. Accordingly, it is possible to clearly distinguish the ambient environmental sound from the sound provided from the sound output device 100 even in the case of using the sound output device 100 while driving a car as illustrated in
Next, a case where a plurality of listeners enjoy the same content will be described.
In general, most devices for providing voice serving as the sub-information of Kabuki or opera provide the voice through an earphone. Here, examples of the sub-information voice include explanation of content of the show in a plurality of languages. However, the earphone covers an ear. Therefore, users cannot enjoy direct sound of a play, song, or music played in front of the users through their ears. For this reason, some viewers choose not to listen to sub-information. However, by using the sound output device 100 according to the embodiment, it is possible to deliver direct sound of opera, Kabuki, or the like to ears with no interruption. Therefore, it is possible for the user to directly listen to direct sound from the virtual speaker 900 as the ambient environmental sound. In addition, by the sound output device 100 outputting voice of the sub-information, it is possible to localize a sound image of the sub-voice information at a left rear side of a listener, and the listener can hear the explanation as if someone whispers into his/her ear, for example. Accordingly, it is possible for the user to directly hear live sound of a show or the like and enjoy the atmosphere in the venue while getting explanatory information.
In the configuration example illustrated in
In the past, voice of an instructor, navigation voice, explanatory sub-voice, or the like is targeted as a single dry source sound source. However, when treating it as a ‘single object’, it is possible to extend and apply the system according to the embodiment of the present disclosure such that a plurality of objects are simultaneously reproduced as sound sources. For example, as illustrated in
For example, when this system is applied to all users in an exhibition hall, it is possible for all the users to experience existence of a sound source (virtual speaker 900) in the same sound image location while having conversation with each other, share the existence of the virtual sound image, and enjoy mixture of a real world and virtual sound sources. Of course, the sound is dedicated to each individual. Therefore, it is possible to reproduce sound in a language corresponding to each individual. In contrast to speakers, a plurality of users who speak different languages from each other can enjoy the same content.
Note that, it is preferable to prepare prerecorded explanatory voice although players do a show (such as playing music, singing a song, doing a play, or the like) in real time. In general, explanatory content based on average show progress time is created in advance, and an operator changes a speed of a sequence to fast or slow in view of actual progress speed of the show. Accordingly, it is possible to optimally adjust the explanatory voice.
Next, a system in which provision of voice from the sound output device 100 and provision of voice from a stereo sound source or the like are combined will be described. For example, it is possible to express a virtual speaker by using headphones while targeting on stereo sound sources such as music. In this case, it is possible to localize a sound image as if the virtual speakers 900 are at virtual sound image locations.
For example, in the case of building a system as illustrated in
However, in the embodiment according to the present disclosure, as illustrated in
In
On the other hand, the system illustrated in
Note that, in
The delay parts 860 and 862 are provided for synchronizing sound from real speakers 804 and sound from the sound output device 100 during reproduction between multi channels.
In
Td1+Tac=Td2+Tpr+Tw1
In general, video is also reproduced simultaneously with sound from the real speaker 804. Therefore, it is desirable to suppress values in the left-hand side and the right-hand side in the above-listed equation to be minimized. For example, on the assumption that Bluetooth (registered trademark) is used as the wireless communication of the system in
The above described numerical values may be set in advance in a device or equipment or may be manually set by a user on the assumption of a usage environment for the user. On the other hand, it is also possible to automatically measure and set delay values.
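The balance expressed by Td1+Tac=Td2+Tpr+Tw1 can be made concrete with a small calculation. The reading of the symbols used here is an assumption: Td1 and Td2 are taken as the delays set in the delay parts 860 and 862, Tac as the acoustic propagation time from the real speaker to the listener, Tpr as signal processing time, and Tw1 as wireless transmission time. All numerical values are hypothetical, and the function name `balance_delays` is introduced only for illustration.

```python
def balance_delays(t_ac_ms, t_pr_ms, t_w1_ms):
    """Smallest non-negative (Td1, Td2) in milliseconds that satisfy
    Td1 + Tac = Td2 + Tpr + Tw1: only the faster path is delayed."""
    diff = (t_pr_ms + t_w1_ms) - t_ac_ms
    td1 = max(diff, 0.0)   # delay added on the real-speaker path
    td2 = max(-diff, 0.0)  # delay added on the wearable-device path
    return td1, td2

# Hypothetical case: a 10 m acoustic path at roughly 343 m/s (about 29 ms),
# 5 ms of processing, and 100 ms of wireless latency.
t_ac = 10.0 / 343.0 * 1000.0
td1, td2 = balance_delays(t_ac, t_pr_ms=5.0, t_w1_ms=100.0)

left_side = td1 + t_ac
right_side = td2 + 5.0 + 100.0
```

Setting one of the two delays to zero keeps both sides of the equation as small as the slower path allows, which matters when video is reproduced together with the sound.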
For example, the measurement signal A and the measurement signal B that are reproduced simultaneously can be analyzed separately by using FFT, as long as they have waveforms with different frequency components as illustrated in
Note that, in
In addition, as illustrated in
In addition, in a way similar to
Note that, in
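The separate analysis of two simultaneously reproduced measurement signals with different frequency components can be sketched as follows. This is only an illustration under stated assumptions: the sampling rate, the tone frequencies (1 kHz and 3 kHz), the burst shapes, the onset times, and the helper names `tone_burst` and `band_onset` are all hypothetical, and the envelope threshold is one simple way of finding an onset, not necessarily the analysis used in the embodiment.

```python
import numpy as np

FS = 48_000  # assumed sampling rate in Hz


def tone_burst(freq_hz, onset_s, duration_s=0.2, total_s=0.5, fs=FS):
    """Sine burst of freq_hz that starts at onset_s; silent elsewhere."""
    t = np.arange(int(total_s * fs)) / fs
    active = (t >= onset_s) & (t < onset_s + duration_s)
    return np.where(active, np.sin(2 * np.pi * freq_hz * (t - onset_s)), 0.0)


def band_onset(signal, freq_hz, half_width_hz=200.0, fs=FS, threshold=0.5):
    """Isolate one measurement signal with an FFT band mask; return its onset (s)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Keep only the bins around the frequency of the wanted measurement signal.
    spectrum[np.abs(freqs - freq_hz) > half_width_hz] = 0.0
    band = np.fft.irfft(spectrum, n=len(signal))
    # Rough envelope: rectify, then average over a 5 ms window.
    win = int(0.005 * fs)
    envelope = np.convolve(np.abs(band), np.ones(win) / win, mode="same")
    # First sample at which the envelope exceeds a fraction of its maximum.
    first = np.argmax(envelope > threshold * envelope.max())
    return first / fs


# Hypothetical capture at the microphone: signal A arrives at 100 ms,
# signal B arrives at 130 ms, and both overlap in time.
captured = tone_burst(1000.0, 0.100) + tone_burst(3000.0, 0.130)
onset_a = band_onset(captured, 1000.0)
onset_b = band_onset(captured, 3000.0)
delay_diff = onset_b - onset_a
print(f"estimated time difference: {delay_diff * 1000:.1f} ms")
```

Because both signals pass through the same band mask and the same smoothing window, any bias in the individual onset estimates largely cancels in the difference, which is the quantity needed for setting the delay.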
Note that, with regard to measurement of delay between the real speaker 804 and the microphone 440 of the sound output device 100, technologies described in JP 4285457B, JP 4210859B, JP 4407541B, and JP 4466453B are known as automatic sound field correction technologies for multichannel speakers. For example, when such technologies are applied to the sound output device 100, it is possible to measure respective distances from a plurality of speakers (three speakers SP-L, C, and R) by arranging microphones 440 on respective parts corresponding to ears in the sound output device 100. It is possible to perform measurement itself sequentially by using the TSP, or it is possible to perform measurement simultaneously in the respective speakers by using independent sine waves in a way similar to
In addition, as an example of reproduction for a user, it is possible to use sound that includes such a delay-measurable component as a ‘device start-up sound’. In addition, in the case of the movie theater, the measurement signal may be mixed into a ‘theater etiquette PSA’ or an advertisement shown before the movie. Therefore, it is possible to measure the delay time of each user without letting the users recognize the measurement.
As described above, according to the embodiment, it is possible to localize sound images at desired locations by adjusting head-related transfer functions in the case where the sound output device 100 illustrated in
The preferred embodiment(s) of the present disclosure has/have been described above with reference to the accompanying drawings, whilst the present disclosure is not limited to the above examples. A person skilled in the art may find various alterations and modifications within the scope of the appended claims, and it should be understood that they will naturally come under the technical scope of the present disclosure.
Further, the effects described in this specification are merely illustrative or exemplified effects, and are not limitative. That is, with or in the place of the above effects, the technology according to the present disclosure may achieve other effects that are clear to those skilled in the art from the description of this specification.
Additionally, the present technology may also be configured as below.
(1)
A sound output device including:
a sound acquisition part configured to acquire sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and
a head-related transfer function adjustment part configured to adjust a head-related transfer function of sound captured by the sound guide part.
(2)
The sound output device according to (1), further including
a sound environment adjustment part configured to adjust a sound environment of sound captured by the sound guide part.
(3)
The sound output device according to (1),
in which the head-related transfer function adjustment part changes the head-related transfer function such that a sound image of the sound is localized at a place different from a place of ambient sound directly entering an ear of a listener.
(4)
The sound output device according to (1),
in which the head-related transfer function adjustment part changes the head-related transfer function such that a sound image of the sound is localized above a head of the listener or near a foot of the listener.
(5)
The sound output device according to (1),
in which the head-related transfer function adjustment part adjusts the head-related transfer function on a basis of operation performed by a listener.
(6)
The sound output device according to (2),
in which the sound environment adjustment part adjusts the sound environment on a basis of operation performed by a listener.
(7)
The sound output device according to (2),
in which the sound environment adjustment part adjusts the sound environment on a basis of sound information of an ambient environment of the listener.
(8)
The sound output device according to (7),
in which the sound environment adjustment part adjusts the sound environment on a basis of a result of separating the sound information of the ambient environment into human voice and environmental sound other than the human voice.
(9)
The sound output device according to (2),
in which the sound environment adjustment part acquires a result of analyzing sound information of an ambient environment of the listener from another device, and adjusts the sound environment.
(10)
The sound output device according to (2),
in which the sound environment adjustment part adjusts the sound environment on a basis of location information of a listener.
(11)
The sound output device according to (1),
in which the head-related transfer function adjustment part adjusts the head-related transfer function on a basis of a direction of a head of a listener.
(12)
The sound output device according to (2),
in which the head-related transfer function adjustment part adjusts a head-related transfer function such that a sound image location is a constant location regardless of a direction of a head of a listener.
(13)
The sound output device according to (1), including
a sound output part configured to output sound to be transmitted to an ear of the listener without passing through the sound guide part.
(14)
The sound output device according to (13),
in which one of sound to be output to the sound guide part and sound to be transmitted to an ear of the listener without passing through the sound guide part is delayed.
(15)
The sound output device according to (13), including
a delay part configured to delay sound to be transmitted to an ear of the listener without passing through the sound guide part in comparison with sound to be output to the sound guide part.
(16)
The sound output device according to (1), including
a location information acquisition part configured to acquire locational information of a listener,
in which the sound acquisition part acquires navigation information based on the location information.
(17)
The sound output device according to (1),
wherein the sound acquisition part acquires speech of the listener or voice for giving an instruction on movement of the listener.
(18)
The sound output device according to (1),
in which the sound acquisition part acquires guidance information for explaining any event visually recognized by the listener in a language designated by the listener from among a plurality of languages.
(19)
A sound generation method including:
acquiring sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and
adjusting a head-related transfer function of sound captured by the sound guide part.
(20)
A program causing a computer to function as:
a means for acquiring sound to be output to the other end of a sound guide part, one end of which is arranged near an entrance of an ear canal of a listener, the sound guide part having a hollow structure; and
a means for adjusting a head-related transfer function of sound captured by the sound guide part.
REFERENCE SIGNS LIST
- 100 sound output device
- 415, 418 filter
- 416 sensor (acceleration sensor and gyro sensor)
- 422 UI
- 424 sound image location control part
- 426 sound environment control part
- 440 microphone
- 442 sound environment recognition control part
- 446 GPS
- 804 speaker
- 860 delay part
Claims
1-20. (canceled)
21. A sound output device comprising:
- a support configured to fit the sound output device to and support the sound device from an intertragic notch of an ear of a listener without hanging from a top of the ear; and
- at least one processor configured to: acquire sound to be output to a first end of a sound guide; and adjust a head-related transfer function of sound captured by the sound guide, wherein the sound guide comprises a second end that is closer to the intertragic notch than the first end of the sound guide is, wherein the support is configured to suspend the first end of the sound guide behind a lobe of the ear, wherein the sound guide has a hollow structure, and wherein the hollow structure of the sound guide curves around an axis parallel to an ear canal of the listener.
22. The sound output device according to claim 21, wherein the at least one processor is further configured to:
- adjust a sound environment of sound captured by the sound guide.
23. The sound output device according to claim 21,
- wherein the head-related transfer function is adjusted such that a location of sound source of the sound is localized at a place different from a place of ambient sound directly entering an ear of a listener.
24. The sound output device according to claim 21,
- wherein the head-related transfer function is adjusted such that a location of sound image of the sound is localized above a head of the listener or near a foot of the listener.
25. The sound output device according to claim 21,
- wherein the head-related transfer function is adjusted on a basis of operation performed by a listener.
26. The sound output device according to claim 22,
- wherein the sound environment is adjusted on a basis of operation performed by a listener.
27. The sound output device according to claim 22,
- wherein the sound environment is adjusted on a basis of sound information of an ambient environment of the listener.
28. The sound output device according to claim 27,
- wherein the sound environment is adjusted on a basis of a result of separating the sound information of the ambient environment into human voice and environmental sound other than the human voice.
29. The sound output device according to claim 22,
- wherein the at least one processor is configured to acquire a result of analyzing sound information of an ambient environment of the listener from another device, and adjust the sound environment.
30. The sound output device according to claim 22,
- wherein the sound environment is adjusted on a basis of location information of a listener.
31. The sound output device according to claim 21,
- wherein the head-related transfer function is adjusted on a basis of a direction of a head of a listener.
32. The sound output device according to claim 22,
- wherein the head-related transfer function is adjusted such that a sound image location is a constant location regardless of a direction of a head of a listener.
33. The sound output device according to claim 21, comprising
- a sound output configured to output sound to be transmitted to an ear of the listener without passing through the sound guide.
34. The sound output device according to claim 33,
- wherein one of sound to be output to the sound guide and sound to be transmitted to an ear of the listener without passing through the sound guide is delayed.
35. The sound output device according to claim 33,
- wherein the at least one processor is configured to delay sound to be transmitted to an ear of the listener without passing through the sound guide in comparison with sound to be output to the sound guide.
36. The sound output device according to claim 21, comprising
- wherein the at least one processor is configured to acquire location information of a listener, and acquire navigation information based on the location information.
37. The sound output device according to claim 21,
- wherein the at least one processor is configured to acquire speech of the listener or voice for giving an instruction on movement of the listener.
38. The sound output device according to claim 21,
- wherein the at least one processor is configured to acquire guidance information for explaining any event visually recognized by the listener in a language designated by the listener from among a plurality of languages.
39. A sound generation method comprising:
- acquiring sound to be output to a first end of a sound guide of a sound output device, wherein the sound guide is configured to fit the sound output device to and support the sound output device from an intertragic notch of an ear of a listener without hanging from a top of the ear and to suspend the first end of the sound guide behind a lobe of the ear; and
- adjusting a head-related transfer function of sound captured by the sound guide,
- wherein the sound guide comprises a second end that is closer to the intertragic notch than the first end of the sound guide is,
- wherein the sound guide has a hollow structure, and
- wherein the hollow structure of the sound guide curves around an axis parallel to an ear canal of the listener.
40. At least one non-transitory computer-readable storage medium encoded with executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method comprising:
- acquiring sound to be output to a first end of a sound guide of a sound output device, wherein the sound guide is configured to fit the sound output device to and support the sound output device from an intertragic notch of an ear of a listener without hanging from a top of the ear and to suspend the first end of the sound guide behind a lobe of the ear; and
- adjusting a head-related transfer function of sound captured by the sound guide,
- wherein the sound guide comprises a second end that is closer to the intertragic notch than the first end of the sound guide is,
- wherein the sound guide has a hollow structure, and
- wherein the hollow structure of the sound guide curves around an axis parallel to an ear canal of the listener.
Type: Application
Filed: Sep 22, 2020
Publication Date: Jan 7, 2021
Applicant: Sony Corporation (Tokyo)
Inventors: Kohei Asada (Kanagawa), Go Igarashi (Tokyo), Koji Nageno (Tokyo), Haruo Oba (Kanagawa), Homare Kon (Tokyo)
Application Number: 17/028,752