The Find Point Clusters tool finds clusters of point features in surrounding noise based on their spatial distribution.
Analysis using GeoAnalytics Tools
Analysis using GeoAnalytics Tools is run using distributed processing across multiple ArcGIS GeoAnalytics Server machines and cores. GeoAnalytics Tools and standard feature analysis tools in ArcGIS Enterprise have different parameters and capabilities. To learn more about these differences, see Feature analysis tool differences.
A non-governmental organization is studying a particular pest-borne disease and has a point dataset representing households in a study area, some of which are infested, some of which are not. Using the Find Point Clusters tool, an analyst can determine clusters of infested households to help pinpoint an area to begin treatment and extermination of pests.
The input for Find Point Clusters is a single point layer.
The Choose the clustering method you want to use parameter determines whether a defined distance or self-adjusting clustering algorithm is used. Defined distance (DBSCAN) finds clusters of points that are in close proximity based on a specified search range. Self-adjusting (HDBSCAN) finds clusters in a similar way to DBSCAN but uses varying search ranges, which allows for clusters of varying density based on cluster probability (or stability).
All results include a field called CLUSTER_ID, which indicates the cluster each feature belongs to, and a field called COLOR_ID, which is a label used for drawing the results so that each cluster is visually distinct from its neighboring clusters. For both fields, a value of -1 indicates that a feature has been labeled as noise.
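As a rough illustration of the CLUSTER_ID convention, the following sketch counts the features in each cluster and the features labeled as noise. The `rows` list and the `summarize_clusters` helper are hypothetical stand-ins for attribute records read from a result layer; they are not part of the tool or of ArcGIS API for Python.

```python
from collections import Counter

# Hypothetical attribute rows shaped like the tool's output (illustration only)
rows = [
    {"CLUSTER_ID": 1},
    {"CLUSTER_ID": 1},
    {"CLUSTER_ID": 2},
    {"CLUSTER_ID": -1},  # -1 marks a feature labeled as noise
]

def summarize_clusters(rows):
    """Return (features per cluster, number of noise features)."""
    noise = sum(1 for r in rows if r["CLUSTER_ID"] == -1)
    sizes = Counter(r["CLUSTER_ID"] for r in rows if r["CLUSTER_ID"] != -1)
    return dict(sizes), noise

sizes, noise = summarize_clusters(rows)
print(sizes, noise)
```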
If the Self-adjusting (HDBSCAN) clustering method is used, results will also include the following fields:
- PROB—The probability that a feature belongs in its assigned cluster.
- OUTLIER—The likelihood that a feature is an outlier within its own cluster. A larger value indicates that the feature is more likely to be an outlier.
- EXEMPLAR—Indicates which features are most representative of each cluster. These features are indicated by a value of 1.
- STABILITY—The persistence of each cluster across a range of scales. A larger score indicates that a cluster persists over a wider range of distance scales.
The Minimum number of points to be considered a cluster parameter is used differently depending on the clustering method chosen:
- Defined distance (DBSCAN)—Specifies the number of features that must be found within a search range of a point for that point to start forming a cluster. The results may include clusters with fewer features than this value. The search range distance is set using the Limit the search range to parameter.
- Self-adjusting (HDBSCAN)—Specifies the number of features neighboring each point (including the point itself) that will be considered when estimating density. This number is also the minimum cluster size allowed when extracting clusters.
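The defined distance rule above can be sketched as a simple neighbor count. This is a minimal illustration of the classic DBSCAN core-point test, not the tool's implementation; `is_core_point` is a hypothetical helper, distances are planar, and counting the point itself toward the total is an assumption.

```python
from math import hypot

def is_core_point(index, points, search_range, min_points):
    """True if at least min_points features lie within search_range of points[index].

    Assumption: the point itself counts toward the total, as in the classic
    DBSCAN definition. The tool's exact counting rule may differ.
    """
    px, py = points[index]
    neighbors = sum(
        1 for (x, y) in points if hypot(x - px, y - py) <= search_range
    )
    return neighbors >= min_points

# Three points close together and one isolated point
points = [(0, 0), (0, 1), (1, 0), (10, 10)]
core_flags = [is_core_point(i, points, 1.5, 3) for i in range(len(points))]
print(core_flags)  # [True, True, True, False]
```

With a search range of 1.5 and a minimum of 3 points, the three nearby points can each start a cluster, while the isolated point cannot and would be left as noise unless it falls within range of a core point.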
When using the HDBSCAN algorithm with an input layer containing more than 3 million features, the tool may fail unless your administrator increases the value of the javaHeapSize parameter on the GeoAnalyticsTools GP Service. Roughly 2 GB of heap space is needed per 3 million features. The amount of RAM specified by javaHeapSize should be available on each GeoAnalytics Server machine in addition to the 16 GB normally required by GeoAnalytics Server. For example, if you want to cluster 9 million features with HDBSCAN you should set javaHeapSize to no less than 6144 MB, or 6 GB. In this case, each GeoAnalytics Server machine should have a total of at least 22 GB of RAM available.
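The sizing guidance above can be expressed as a short calculation. This is a sketch under stated assumptions: `hdbscan_memory_estimate` is a hypothetical helper, and rounding the feature count up to the next 3 million is an assumption, since the guidance only says "roughly 2 GB per 3 million features".

```python
import math

BASE_RAM_GB = 16                # RAM normally required by GeoAnalytics Server
HEAP_GB_PER_BLOCK = 2           # roughly 2 GB of heap per block of features
FEATURES_PER_BLOCK = 3_000_000  # block size used in the guidance

def hdbscan_memory_estimate(feature_count):
    """Return (javaHeapSize in MB, total RAM in GB per machine)."""
    # Assumption: round up to whole blocks of 3 million features
    blocks = math.ceil(feature_count / FEATURES_PER_BLOCK)
    heap_gb = blocks * HEAP_GB_PER_BLOCK
    return heap_gb * 1024, BASE_RAM_GB + heap_gb

heap_mb, total_ram_gb = hdbscan_memory_estimate(9_000_000)
print(heap_mb, total_ram_gb)  # 6144 22
```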
To learn more, see How Density-based Clustering works in the ArcGIS Pro documentation.
ArcGIS API for Python example
The Find Point Clusters tool is available through ArcGIS API for Python.
This example finds clusters of retail locations.
# Import the required ArcGIS API for Python modules
import arcgis  # needed for arcgis.env below
from arcgis.gis import GIS
from arcgis.geoanalytics import analyze_patterns
# Connect to your ArcGIS Enterprise portal and check that GeoAnalytics is supported
portal = GIS("https://myportal.domain.com/portal", "gis_publisher", "my_password", verify_cert=False)
if not portal.geoanalytics.is_supported():
    print("Quitting, GeoAnalytics is not supported")
    raise SystemExit(1)
# Find the big data file share dataset you're interested in using for analysis
search_result = portal.content.search("", "Big Data File Share")
# Look through search results for a big data file share with the matching name
bd_file = next(x for x in search_result if x.title == "bigDataFileShares_RetailLocation")
# Look through the big data file share for points of sale
pos = next(x for x in bd_file.layers if x.properties.name == "POS")
# Set the tool environment settings
arcgis.env.verbose = True
# Run the tool Find Point Clusters
output = analyze_patterns.find_point_clusters(pos, 10, "Kilometers", "POS_Clusters")
# Visualize the tool results if you are running Python in a Jupyter Notebook
processed_map = portal.map('USA')
processed_map.add_layer(output)
processed_map
Use Find Point Clusters to find clusters of point features in surrounding noise based on their spatial distribution. Other tools that may be useful are the following:
Map Viewer analysis tools
To determine if there is any statistically significant clustering in the spatial pattern of your data, use the Find Hot Spots tool.
To create a density map of your point or line features, use the Calculate Density tool.
To determine if there are any statistically significant outliers in the spatial pattern of your data, use the Find Outliers tool.
ArcGIS Desktop analysis tools
The Density-based Clustering geoprocessing tool performs the same function as Find Point Clusters.
The Find Point Clusters GeoAnalytics tool is also available in ArcGIS Pro.