DEVICE DISCOVERY SYSTEM
In one embodiment, a device discovery system includes a data storage medium to store a clustered data structure including device signatures grouped according to clusters. Each device signature includes device information. Each cluster from a sub-set of the clusters has a different device name. The system also includes an input/output sub-system to receive, from a remote device, a first device signature describing information about a first device, and a processor to perform a decision process based on the clustered data structure with the first device signature as input yielding an output including a first device name or an indication that a name associated with the first device signature is unknown. The processor is operative to prepare a response message including data about the output. The input/output sub-system is operative to send the response message to the remote device.
The present disclosure generally relates to device discovery based on clustering.
BACKGROUND
The Internet of Things (IoT), using technology such as Internet Protocol version 6, by way of example only, enables a practically unlimited number of devices, such as sensors and actuators, to connect to either private networks or the Internet at large and be monitored or controlled from remote servers. One of the main industries capitalizing on this functionality is the home automation industry, where millions of devices can be purchased in local retail stores all over the world and connected to home gateways as part of an elaborate interconnected system. The devices range from connected televisions to motion detectors, and from connected doors and/or windows to individual lights. In such a system, all devices, of any size and manufactured in any country, may be controlled by one or more home automation applications that may run on most mobile devices. Each device/thing may also be monitored and/or configured by its manufacturer's servers. However, just as each home automation device provides some useful function, each device/thing may also pose a threat to the system: malware, a rootkit, or an advanced persistent threat can hide in any of the connected devices and either sabotage or spy on any aspect of the digital home, or use the device as a platform from which to mount attacks on other nodes on the Internet. For security and other reasons, it is therefore useful to know the type, make, and model of each connected device in the home so that appropriate decisions can be made based on them.
The present disclosure will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:
There is provided in accordance with an embodiment of the present disclosure a device discovery system including a data storage medium to store a clustered data structure including a plurality of device signatures grouped according to a plurality of clusters clustered in accordance with a clustering algorithm based on the plurality of device signatures as input to the clustering algorithm. The clustered data structure also includes a plurality of device names. Each of the plurality of device signatures includes device information. A sub-set of the plurality of clusters is associated with the plurality of device names such that each first cluster of the plurality of clusters in the sub-set has a different one of the plurality of device names. Each of the plurality of device names includes a device attribute. The system also includes an input/output sub-system to receive, from a remote device, a first device signature describing information about a first device. The system also includes a device identification processor to perform a decision process based on the clustered data structure with the first device signature as input, yielding an output including a first device name among the plurality of device names or an indication that a name associated with the first device signature is unknown. The device identification processor is operative to prepare a response message including data about the output. The input/output sub-system is operative to send the response message to the remote device.
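The summary above leaves the data layout and the decision process abstract. The Python sketch below is one way to picture them under stated assumptions: signatures are plain dictionaries, the distance function and match threshold are supplied by the caller, and the names `Cluster`, `ClusteredDataStructure`, `closest_cluster`, `decide`, `threshold`, and `min_confidence` are hypothetical, not taken from the disclosure. The optional confidence check mirrors the criterion that the detailed description attaches to the closest matching cluster.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature format: coordinate name -> collected value.
Signature = Dict[str, object]
Distance = Callable[[Signature, Signature], float]


@dataclass
class Cluster:
    cluster_id: int
    signatures: List[Signature]
    name: Optional[str] = None  # only a sub-set of clusters carries a device name
    confidence: float = 0.0     # confidence that `name` is correct


@dataclass
class ClusteredDataStructure:
    clusters: List[Cluster] = field(default_factory=list)
    generation_index: int = 0


def closest_cluster(store: ClusteredDataStructure, sig: Signature,
                    distance: Distance) -> Tuple[Optional[Cluster], float]:
    """Nearest cluster, measured to its closest member signature."""
    best, best_d = None, float("inf")
    for cluster in store.clusters:
        for member in cluster.signatures:
            d = distance(sig, member)
            if d < best_d:
                best, best_d = cluster, d
    return best, best_d


def decide(store: ClusteredDataStructure, request_id: str, sig: Signature,
           distance: Distance, threshold: float,
           min_confidence: float = 0.0) -> dict:
    """Build a response carrying a device name, or "unknown" when none applies."""
    response = {"request_id": request_id,
                "generation_index": store.generation_index,
                "cluster_id": None, "name": "unknown"}
    cluster, d = closest_cluster(store, sig, distance)
    if cluster is None or d > threshold:
        return response  # no cluster is close enough
    response["cluster_id"] = cluster.cluster_id
    # Include the name only if the cluster is named and confident enough.
    if cluster.name is not None and cluster.confidence >= min_confidence:
        response["name"] = cluster.name
    return response
```

Measuring distance to the nearest member keeps the sketch short; comparing against cluster centroids, which the text also mentions, would reduce the number of distance evaluations for large clusters.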
DETAILED DESCRIPTION
Reference is now made to
The device discovery system 10 is operative to try to identify all devices in the home from a domain of many thousands or tens of thousands of possible device models, produced and manufactured across the globe, a number that increases every day. Current methods of device identification typically work with supervised data that has been tested and classified based on a trusted training set of known data in a controlled environment, and do not provide a solution for analyzing the large volume of unsupervised data that cannot easily be labelled. In contrast, the device discovery system 10 is based on an unsupervised data model and automates the process of disambiguating or labeling unknown devices, as will be described in more detail below. It should also be noted that there is typically no standard communication protocol for home automation devices; hence, no standard communication methods, protocols or information may be assumed per home automation device or device type. The device discovery system 10 provides device discovery services based on a best-effort clustering of a set of heterogeneous devices with different discovery protocol information, as will be described in more detail below.
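The disclosure does not name a particular clustering algorithm. Purely as an illustration, the sketch below uses leader (threshold) clustering, a swapped-in technique chosen because it needs no labels or preset number of clusters and, like the behavior described for a signature that is not close enough to any existing cluster, opens a new cluster when no existing one lies within a threshold. The function name `leader_cluster` and its parameters are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def leader_cluster(items: List[T], distance: Callable[[T, T], float],
                   threshold: float) -> List[List[T]]:
    """Leader clustering: one pass, no labels, no preset number of clusters."""
    leaders: List[T] = []
    clusters: List[List[T]] = []
    for item in items:
        # (distance, index) to the nearest existing leader, if any.
        d, i = min(((distance(item, lead), idx) for idx, lead in enumerate(leaders)),
                   default=(float("inf"), -1))
        if d <= threshold:
            clusters[i].append(item)  # join the nearest sufficiently close cluster
        else:
            leaders.append(item)      # nothing close enough: start a new cluster
            clusters.append([item])
    return clusters
```

For example, clustering the numbers 1, 2, 10, 11, 30 with absolute difference and a threshold of 2 yields [[1, 2], [10, 11], [30]]. The result depends on input order, which is one reason a periodic re-clustering pass may be worthwhile.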
The device discovery system 10 provides device discovery services to a plurality of discovery service providers 22. Each of the discovery service providers 22 provides services to a plurality of homes 24, e.g., applying security and other policies to the devices in the homes 24. Each home 24 may include a gateway data collection agent 26. The gateway data collection agent 26 of each home 24, supported by the discovery service providers 22, collects and probes device identification information and device properties from network protocols, agents, services and techniques used by some devices/things in the home, such as: Dynamic Host Configuration Protocol (DHCP); Hypertext Transfer Protocol (HTTP) User Agents; Network Mapper (Nmap) port scans; Universal Plug and Play (UPnP) discovery; Bonjour; banner grabbing; Address Resolution Protocol (ARP); and media access control (MAC) address prefix, by way of example only. This information may be collected in either a single pass or multiple passes for each new device connected to the home 24, depending on performance requirements. Alternatively or additionally, this information may be collected passively by eavesdropping on broadcast messages or actively by querying the devices, depending on the protocol. It should be noted that not every device necessarily responds to all discovery protocols, and even two instances of the same device model might not respond to the same protocols, as the response may also depend on the specific network setup in which the devices reside. The gateway data collection agent 26 creates a message 28 which includes a home identification (ID) and discovery data based on the above information for the new device. The message 28 is received by the discovery service provider 22 associated with the home 24.
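As a rough illustration of how a gateway data collection agent might assemble message 28 from one or more collection passes, the sketch below merges per-protocol results and drops protocols the device did not answer. The JSON layout and the names `build_discovery_message`, `home_id`, and `discovery` are assumptions; the disclosure does not specify a wire format.

```python
import json
from typing import Dict, List, Optional


def build_discovery_message(home_id: str,
                            passes: List[Dict[str, Optional[dict]]]) -> str:
    """Merge several collection passes into one message (home ID + discovery data)."""
    discovery: Dict[str, dict] = {}
    for results in passes:            # one {protocol: data} dict per pass
        for protocol, data in results.items():
            if data:                  # skip protocols the device did not answer
                # Later passes overwrite same-named fields for a protocol.
                discovery.setdefault(protocol, {}).update(data)
    return json.dumps({"home_id": home_id, "discovery": discovery}, sort_keys=True)
```

Keeping only the protocols that answered reflects the point above that two devices, even of the same model, may respond to different subsets of protocols.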
The discovery service provider 22 may further process the discovery data to form a new device signature for the new device according to data formatting requirements of the device discovery system 10. It should be noted that the new device is described as a “new” device for the sake of convenience only; the device discovery system 10 may equally be used with the device signature of a device which has been installed in the home 24 for any period of time. The discovery service provider 22 creates a name request 30 including a request identification (ID) and the new device signature. The name request 30 is then sent to the device discovery system 10 for processing. It will be appreciated throughout that the functionality of the device discovery system 10 may be incorporated into each of the discovery service providers 22.
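The sketch below shows one hypothetical way a discovery service provider could flatten discovery data into a device signature and wrap it in name request 30. The `protocol.field` key naming, the sorting of list values, the derived `mac_prefix` coordinate (first three octets), and the use of a random UUID as the request ID are all illustrative assumptions rather than formatting requirements stated in the disclosure.

```python
import uuid
from typing import Dict, Optional


def to_signature(discovery: Dict[str, dict]) -> dict:
    """Flatten {protocol: {field: value}} into {"protocol.field": value}."""
    sig = {}
    for protocol, fields in discovery.items():
        for key, value in fields.items():
            if isinstance(value, list):
                value = sorted(value)  # order-insensitive lists, e.g. open ports
            sig[f"{protocol}.{key}"] = value
            if key == "mac" and isinstance(value, str):
                # Hypothetical derived coordinate: vendor prefix (first three octets).
                sig[f"{protocol}.mac_prefix"] = value.upper()[:8]
    return sig


def make_name_request(discovery: Dict[str, dict],
                      request_id: Optional[str] = None) -> dict:
    """Pair a request ID with the formatted signature."""
    return {"request_id": request_id or str(uuid.uuid4()),
            "signature": to_signature(discovery)}
```

Sorting list-valued coordinates means two scans that report the same open ports in a different order produce identical signatures.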
The device discovery system 10 includes a data storage medium 12, an input/output sub-system 14, a device identification processor 16, a cluster naming processor 18 and a clustering update processor 20. The data storage medium 12 is operative to store a clustered data structure 32 and optionally a classifier 34 based on the data of the clustered data structure 32. The clustered data structure 32 is described in more detail with reference to
Reference is now made to
In one embodiment, each new device signature 40 received by the device discovery system 10 (
The clustered data structure 32 also includes a plurality of device names 42, which are added after the clustering process. Each device name includes an attribute of the device. For example, a device attribute may include at least one of the following: a device type; a device manufacturer; a device model; a device operating system; and a device hardware element. Each device name 42 is associated with one of the clusters 38. The clusters 38 receive their device names 42 via a naming process discussed in more detail with reference to
Reference is now made to
Next, the device identification processor 16 is operative to perform a decision process yielding an output (block 52). The process of block 52 is broken down into several sub-steps included in the dotted line box as shown in
The next step is a test for minimum information content of the new device signature. If the collected discovery data does not include enough information (as predefined by the device discovery system 10), the new device signature may be discarded, as it is not useful for classifying the device. A non-limiting example of a device signature not including enough information is a device signature which only includes a MAC address prefix or data obtained from banner grabbing. The device identification processor 16 is operative to determine if the new device signature includes a predefined minimum information content (decision block 54). If the new device signature does not include the minimum information content (branch 56), the device identification processor 16 is operative to prepare a response message 36 (block 58) and the input/output sub-system 14 is operative to send the response message 36 to the discovery service provider 22 (or another remote device) indicating that the new device signature lacks the minimum information content (block 60). If the new device signature does include the minimum information content, the process continues down one of two optional branches, along branch 62 according to a first option or along branch 64 according to a second option, depending on the implementation of the device discovery system 10.
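The minimum-information test can be sketched as a simple predicate. Here, as an assumption, a signature qualifies if it carries at least one non-empty coordinate other than those judged too weak on their own (matching the example of a MAC address prefix or banner-grabbing data). The coordinate names in `WEAK_COORDINATES`, the `min_strong` parameter, and the `"insufficient_information"` status string are hypothetical.

```python
from typing import Optional

# Hypothetical coordinates judged too weak to identify a device on their own.
WEAK_COORDINATES = {"arp.mac", "arp.mac_prefix", "banner.text"}


def has_minimum_information(sig: dict, min_strong: int = 1) -> bool:
    """True if the signature has at least `min_strong` non-empty, non-weak coordinates."""
    strong = [k for k, v in sig.items()
              if k not in WEAK_COORDINATES and v not in (None, "", [], {})]
    return len(strong) >= min_strong


def check_request(request: dict) -> Optional[dict]:
    """Return an 'insufficient information' response, or None to continue."""
    if not has_minimum_information(request["signature"]):
        return {"request_id": request["request_id"],
                "status": "insufficient_information"}
    return None
```

Counting coordinates is only one possible proxy for information content; a deployment would tune the rule, for example by requiring particular coordinate combinations.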
According to the first option (branch 62), the process continues as follows. The device identification processor 16 is operative to perform a decision process based on the clustered data structure 32 with the new device signature as input yielding an output (block 66). As part of the decision process, the device identification processor 16 is operative to compare the new device signature to the clustered data structure 32 to find a closest matching cluster of the clusters 38 (
The process continues at a decision point 68. If there is not a closest matching cluster from the clusters 38 (
Going back to decision point 68, if there is a closest matching cluster from the clusters 38 (
If the criterion is not fulfilled (branch 92), processing continues with the step of block 76 where the identification of the closest matching cluster may be included in the response message 36 but the device name 42 of the closest matching cluster 38 is not included. If the criterion is fulfilled (branch 94), the device identification processor 16 prepares the response message 36 which may include the request identification (ID) included in the name request 30, the generation index 44 (
More details regarding the comparison of the new device signature to the clustered data structure 32 are now described. The details that follow may be applied to clustering and re-clustering of the device signatures 40 (
Each signature may have multiple coordinates (for example, but not limited to, lists of open ports, operating system version, DHCP option list, MAC address, UPnP settings, and the central processing unit (CPU) name string). Some coordinates may be missing while others are available. Of the available coordinates, some are strings, some are version numbers of installed components, some are network addresses, some are lists of numbers (such as a list of open ports), and some are plain numbers (for example, but not limited to, time since last reset). Further, different signatures may contain different coordinates depending on what was available for collection. Specifying a distance metric between two signatures is therefore non-trivial. For a given coordinate which exists in both signatures, a variety of metrics could be used depending on the nature of that coordinate. The device discovery system 10 may be operative to apply different metrics to the different coordinates in the device signatures, by way of example only: edit (Levenshtein) distances for strings, Jaccard similarity for lists of numbers, absolute differences for numbers, and a custom weighted metric for version numbers and network addresses.
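The per-coordinate metrics named above can be sketched as follows. Levenshtein distance, Jaccard distance (one minus Jaccard similarity), and absolute difference are standard; normalizing the edit distance by the longer string's length is an added assumption so that it falls in [0, 1]. The version metric is a hypothetical example of a "custom weighted metric" that assumes purely numeric dotted versions and weights each component ten times less than the one before it.

```python
from itertools import zip_longest
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def string_distance(a: str, b: str) -> float:
    """Levenshtein distance normalized to [0, 1]."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def jaccard_distance(a: Iterable, b: Iterable) -> float:
    """1 - Jaccard similarity, e.g. for lists of open ports."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def abs_distance(a: float, b: float) -> float:
    return abs(a - b)


def version_distance(a: str, b: str) -> float:
    """Hypothetical weighting: each component counts 10x less than the previous one."""
    pa = [int(p) for p in a.split(".")]
    pb = [int(p) for p in b.split(".")]
    return sum(abs(x - y) * 10.0 ** -i
               for i, (x, y) in enumerate(zip_longest(pa, pb, fillvalue=0)))
```

For example, the open-port lists [80, 443] and [80, 8080] share one of three distinct ports, so their Jaccard distance is 2/3.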
For a coordinate which appears in one of the device signatures being compared but is missing from the other, the value of the metric for that coordinate needs to be defined. By way of example only, the value may be set to some fixed positive number $C_i$ (with $i$ being the index of the coordinate) if the $i$-th coordinate is present in exactly one of the two signatures, essentially pushing the two signatures in the comparison pair apart by $C_i$. If the coordinate is missing from both signatures, the value may be set to zero, having no effect on the distance between the two signatures.
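The three cases for a single coordinate (present in both, present in exactly one, missing from both) map directly onto a small function. In this sketch the per-coordinate metrics and penalties are passed in as dictionaries keyed by coordinate name; the names `coordinate_distance` and `per_coordinate_distances`, and the sorted coordinate order, are hypothetical choices.

```python
from typing import Callable, Dict, List

Metric = Callable[[object, object], float]


def coordinate_distance(x: dict, y: dict, key: str, metric: Metric,
                        penalty: float) -> float:
    """Distance for one coordinate, applying the penalty C_i when it is one-sided."""
    in_x, in_y = key in x, key in y
    if in_x and in_y:
        return metric(x[key], y[key])  # both present: use the coordinate's own metric
    if in_x or in_y:
        return penalty                 # present in exactly one: push apart by C_i
    return 0.0                         # missing from both: no effect on the distance


def per_coordinate_distances(x: dict, y: dict, metrics: Dict[str, Metric],
                             penalties: Dict[str, float]) -> List[float]:
    """One distance per known coordinate, in a fixed (sorted) order."""
    return [coordinate_distance(x, y, k, metrics[k], penalties[k])
            for k in sorted(metrics)]
```

For example, with an `os` penalty of 0.5 and an `uptime` penalty of 1.0, comparing {"uptime": 10} with {"os": "rtos"} gives [0.5, 1.0], since each coordinate is present on only one side.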
The per-coordinate distances may then be normalized, optionally weighted according to importance, and combined into a single distance, for example using a Euclidean distance or another suitable distance measure, as will now be described in more detail. Given the normalized distance metrics $M_1(x,y), M_2(x,y), \ldots, M_n(x,y)$ between a signature $x$ and a signature $y$ for coordinates 1 to $n$, the individual metrics can be combined into a single metric such as the Euclidean distance:

$$M_{\mathrm{comb}}(x,y) = \sqrt{M_1(x,y)^2 + M_2(x,y)^2 + \cdots + M_n(x,y)^2}$$

in essence creating a norm out of the given metrics. Alternatively, an L1 norm (the sum of the per-coordinate distances) or an L-max norm (the maximum of the per-coordinate distances) may be used. Each of the given metrics may also be assigned a specific weight signifying its importance for measuring the distance; a higher weight lends more importance to that coordinate. With $w_1, w_2, \ldots, w_n$ being the weights of the individual coordinates, the weighted Euclidean combination becomes:

$$M_{\mathrm{comb,weighted}}(x,y) = \sqrt{w_1 M_1(x,y)^2 + w_2 M_2(x,y)^2 + \cdots + w_n M_n(x,y)^2}$$

Complex signatures can therefore be evaluated against each other to produce a measure of similarity or distance between them. The specific values of the weights $w_1, \ldots, w_n$, the choice of how to combine the different metrics, the choice of specific metrics for specific coordinates, and the values of the penalties $C_1, \ldots, C_n$ for coordinates missing in either signature are all heuristic parameters that may be evaluated on an implementation-by-implementation basis. They apply both to clustering and to comparing a single new device signature with the device signatures or centroids of the clustered data structure 32.
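The combination step translates directly into code. This sketch computes the weighted Euclidean form as well as the L1 and L-max alternatives; applying the weights to the L1 and L-max forms too is an assumption, since the text describes weights explicitly only for the Euclidean case. The function name `combine` and its `norm` parameter values are hypothetical.

```python
import math
from typing import Optional, Sequence


def combine(distances: Sequence[float], weights: Optional[Sequence[float]] = None,
            norm: str = "l2") -> float:
    """Combine per-coordinate distances M_i(x, y) into a single distance."""
    if weights is None:
        weights = [1.0] * len(distances)
    if len(weights) != len(distances):
        raise ValueError("one weight per coordinate is required")
    if norm == "l2":    # weighted Euclidean: sqrt(sum of w_i * M_i^2)
        return math.sqrt(sum(w * d * d for w, d in zip(weights, distances)))
    if norm == "l1":    # sum of the (weighted) per-coordinate distances
        return sum(w * d for w, d in zip(weights, distances))
    if norm == "lmax":  # largest (weighted) per-coordinate distance
        return max((w * d for w, d in zip(weights, distances)), default=0.0)
    raise ValueError(f"unknown norm: {norm}")
```

For example, per-coordinate distances of 3.0 and 4.0 combine to 5.0 under the unweighted Euclidean norm, 7.0 under L1, and 4.0 under L-max.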
Referring once again to
The device identification processor 16 (
For the purposes of future re-clustering of the clustered data structure 32, the device identification processor 16 (
The device identification processor 16 (
Reference is now made to
The name suppliers 108 may include manufacturers, research laboratories and users in the homes 24 (
The cluster naming processor 18 may be operative to present the name response messages 110 to a human operator for inspection. The human operator may assign the confidence level 46 (
Reference is now made to
Reference is now made to
Reference is now made to
Reference is now made to
In practice, some or all of these functions may be combined in a single physical component or, alternatively, implemented using multiple physical components. These physical components may comprise hard-wired or programmable devices, or a combination of the two. In some embodiments, at least some of the functions of the processing circuitry may be carried out by a programmable processor under the control of suitable software. This software may be downloaded to a device in electronic form, over a network, for example. Alternatively or additionally, the software may be stored in tangible, non-transitory computer-readable storage media, such as optical, magnetic, or electronic memory.
It is appreciated that software components may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example: as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present disclosure.
It will be appreciated that various features of the disclosure which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the disclosure which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable sub-combination.
It will be appreciated by persons skilled in the art that the present disclosure is not limited by what has been particularly shown and described hereinabove. Rather the scope of the disclosure is defined by the appended claims and equivalents thereof.
Claims
1. A device discovery system comprising:
- a data storage medium to store a clustered data structure including: a plurality of device signatures grouped according to a plurality of clusters clustered in accordance with a clustering algorithm based on the plurality of device signatures as input to the clustering algorithm; and a plurality of device names, each of the plurality of device signatures including device information, a sub-set of the plurality of clusters being associated with the plurality of device names such that each first cluster of the plurality of clusters in the sub-set has a different one of the plurality of device names, each of the plurality of device names including a device attribute;
- an input/output sub-system to receive, from a remote device, a first device signature describing information about a first device; and
- a device identification processor to: perform a decision process based on the clustered data structure with the first device signature as input yielding an output including a first device name among the plurality of device names or an indication that a name associated with the first device signature is unknown; and prepare a response message including data about the output, wherein the input/output sub-system is operative to send the response message to the remote device.
2. The system according to claim 1, wherein each of the plurality of device names includes at least one of the following: a device type; a device manufacturer; and a device model.
3. The system according to claim 1, wherein the device identification processor is operative to compare the first device signature to the clustered data structure to find a closest matching cluster of the plurality of clusters for the first device signature.
4. The system according to claim 1, wherein: the output includes a reference to a closest matching cluster of the plurality of clusters for the first device signature, the first device name being a name of the closest matching cluster; and the response message includes the first device name.
5. The system according to claim 1, wherein the clustered data structure includes a generation index, the system further comprising a clustering update processor to periodically: re-cluster the clustered data structure in accordance with the clustering algorithm yielding a new generation of the clustered data structure; apply the plurality of device names to the new generation of the clustered data structure; and update the generation index of the clustered data structure in accordance with the new generation of the clustered data structure.
6. The system according to claim 5, wherein the input/output sub-system is operative to, in response to the updating of the generation index, send a message to the remote device informing the remote device of an updated value of the generation index or publish an updated value of the generation index.
7. The system according to claim 5, wherein: the clustering update processor is operative to, in response to the clustered data structure being re-clustered, create a classifier based on the clustered data structure; and the device identification processor is operative to perform the decision process with the first device signature as input to the classifier yielding the output.
8. The system according to claim 5, wherein the response message includes the generation index.
9. The system according to claim 1, wherein:
- the input/output sub-system is operative to receive, from the remote device, a second device signature describing information about a third device;
- the device identification processor is operative to determine if the second device signature includes a minimum information content; and
- if the second device signature does not include the minimum information content, the input/output sub-system is operative to send a response message to the remote device indicating that the second device signature lacks the minimum information content.
10. The system according to claim 1, wherein if the first device signature is measured by the device identification processor as not close enough to any of the plurality of clusters based on a threshold, then the device identification processor is operative to add a new cluster to the clustered data structure.
11. The system according to claim 1, wherein the first device name of the closest matching cluster has a level of confidence that the first device name of the closest matching cluster is correct.
12. The system according to claim 11, wherein the response message includes the level of confidence.
13. The system according to claim 11, wherein the device identification processor is operative to include the first device name in the response message if the level of confidence fulfills a criterion of the remote device.
14. The system according to claim 1, further comprising a cluster naming processor to: prepare a first name-enquiry message for sending to a first name-supplier to find a second device name for a second cluster of the plurality of clusters; receive a first name-response from the first name-supplier; generate the second device name based on the first name-response; and assign the second device name to the second cluster.
15. The system according to claim 14, wherein the first name-enquiry message includes, or references, data from at least some device signatures of the plurality of device signatures included in the second cluster.
16. The system according to claim 14, wherein the cluster naming processor is operative to assign a level of confidence that the second device name of the second cluster is correct.
17. The system according to claim 16, wherein the input/output sub-system is operative to receive the level of confidence from the first name-supplier.
18. The system according to claim 16, wherein the cluster naming processor is operative to calculate the level of confidence based on a similarity of the first name-response from the first name-supplier with at least one second name-response from at least one second name-supplier.
19. The system according to claim 16, wherein the cluster naming processor is operative to select the second cluster to obtain a device name for from the plurality of clusters based on a prioritization of the second cluster from among the plurality of clusters.
20. A device discovery method comprising:
- storing a clustered data structure including: a plurality of device signatures grouped according to a plurality of clusters clustered in accordance with a clustering algorithm based on the plurality of device signatures as input to the clustering algorithm; and a plurality of device names, each of the plurality of device signatures including device information, a sub-set of the plurality of clusters being associated with the plurality of device names such that each first cluster of the plurality of clusters in the sub-set has a different one of the plurality of device names, each of the plurality of device names including a device attribute;
- receiving, from a remote device, a first device signature describing information about a first device;
- performing a decision process based on the clustered data structure with the first device signature as input yielding an output including a first device name among the plurality of device names or an indication that a name associated with the first device signature is unknown;
- preparing a response message including data about the output; and
- sending the response message to the remote device.
Type: Application
Filed: Mar 14, 2016
Publication Date: Sep 14, 2017
Inventors: Steve EPSTEIN (Hashmonaim), Ezra DARSHAN (Beit Shemesh), Harel CAIN (Jerusalem), Shali MOR (Modiin)
Application Number: 15/068,754