Hi, I'm Ivan Saakov, Engineering Manager at the inDrive Security Operations Center.

In this article, I share my experience and the architecture behind migrating Splunk Enterprise from a traditional on-premises bare-metal cluster with local disks to AWS using SmartStore technology. SmartStore enables the use of S3-compatible storage for warm Splunk buckets while keeping them fully searchable.

To keep this article focused, I intentionally omit the basic Splunk Enterprise installation and configuration steps and instead concentrate on the migration-specific settings that matter most.

The key achievement of the approach I used is zero downtime:

Splunk Support generally recommends stopping data ingestion for this type of migration. I deliberately deviated from that guidance. In a mission-critical Security Operations Center (SOC) environment, stopping ingestion or alerting is simply unacceptable.

1. Terminology and Architecture

Definitions and abbreviations:

2. SmartStore Logic

SmartStore changes the storage paradigm:

The Cache Manager, which runs on the indexer nodes, is responsible for intelligently managing the data lifecycle on fast local NVMe disks. Its behavior is based on two mechanisms:

3. Hardware Sizing (AWS)

For SmartStore, the balance between CPU and local cache performance is critical. Using EBS volumes in AWS is possible, but in practice it is usually more expensive when the IOPS and throughput requirements are comparable.
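As a rough illustration of how that trade-off can be sized, the back-of-the-envelope arithmetic below estimates the cluster-wide local NVMe cache target. All numbers are illustrative assumptions, not inDrive's actual figures.

```shell
# Illustrative SmartStore cache sizing: compressed daily volume times the
# number of days most searches reach back, plus eviction headroom.
daily_ingest_gb=500   # raw ingest per day (assumption)
compression_pct=50    # compressed size as a percent of raw (assumption)
cache_days=30         # how far back typical searches go (assumption)
padding_gb=100        # mirrors eviction_padding = 102400 MB

total_gb=$(( daily_ingest_gb * compression_pct / 100 * cache_days + padding_gb ))
echo "cluster-wide cache target: ${total_gb} GB"
```

Divide the result by the number of indexers to size the per-node NVMe; this is the figure that drives the instance choice below.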

Instance choice: i3en family

Important nuance: Ephemeral storage

Mitigation:

4. Amazon S3 Configuration (Production Hardening)

5. Security and IAM: Hybrid Access

We required access to a single S3 bucket from both the Source and Target environments. The following set of S3 permissions proved sufficient:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListAndLocation",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::splunk-bucket"
    },
    {
      "Sid": "ObjectRW",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload"
      ],
      "Resource": "arn:aws:s3:::splunk-bucket/*"
    }
  ]
}
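Before wiring the policy into Splunk, it can be sanity-checked with the AWS CLI. Each command below exercises one of the granted actions; `splunk-bucket` is the placeholder bucket name from the policy above.

```shell
# s3:ListBucket and s3:GetBucketLocation
aws s3 ls s3://splunk-bucket/
aws s3api get-bucket-location --bucket splunk-bucket

# s3:PutObject, then s3:DeleteObject, on a throwaway key
echo test | aws s3 cp - s3://splunk-bucket/iam-smoke-test
aws s3 rm s3://splunk-bucket/iam-smoke-test
```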

Access implementation

6. Traffic Balancing (AWS ALB)

Proper ALB configuration is critical for stable ingestion through HTTP Event Collector (HEC) and for the Web UI.

Global Listener Settings

Target Groups

A. HEC (IDX)

B. Web GUI (SHC)
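For health checks on these two target groups, one workable approach (a sketch, with a hypothetical ALB DNS name) is to point the HEC target group at HEC's built-in health endpoint and the Web target group at the login page, both of which answer without authentication:

```shell
# HEC target group: /services/collector/health returns 200 when HEC is up
curl -sk https://splunk-alb.example.com:8088/services/collector/health

# Web (SHC) target group: the login page answers 200 without a session
curl -sk -o /dev/null -w '%{http_code}\n' https://splunk-alb.example.com/en-US/account/login
```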

7. Infrastructure Setup: Multisite Cluster

To achieve a zero-downtime migration, I used a temporary architecture in which two Splunk clusters run in parallel, each managed by its own Cluster Manager:

8. Site Architecture (Multisite)

In a SmartStore + Multisite configuration, it is critical to assign roles to sites correctly.

Architecture:

Why Site 0?

If you place the SH nodes in site1, Splunk automatically enables search affinity and attempts to read data from local peers. In SmartStore, that is counterproductive: the local cache may be empty while the required bucket resides in S3. Forcing site affinity interferes with Cache Manager logic and increases latency. Placing the SHC in site0 disables that behavior and allows the SHC to request data from any available peer. At the same time, each SHC node can still be placed in its own AWS Availability Zone without any issue.

Stage 1. Configure the Target CM (AWS)

At this stage, we establish connectivity for the multisite IDX cluster. Cluster Manager initialization command:

/opt/splunk/bin/splunk edit cluster-config \
  -mode manager \
  -multisite true \
  -site site1 \
  -available_sites site1,site2 \
  -site_replication_factor origin:1,total:2 \
  -site_search_factor origin:1,total:2 \
  -replication_factor 2 \
  -search_factor 2 \
  -cluster_label idx-aws-smartstore \
  -secret 'ClusterSecretKey'

server.conf:

[clustering]
mode = manager
multisite = true
available_sites = site1,site2
cluster_label = idx-aws-smartstore
site_replication_factor = origin:1, total:2
site_search_factor = origin:1, total:2
constrain_singlesite_buckets = false

Notes:

Target License Manager (LM)

The License Manager is the single source of truth and can be co-located with the CM. I recommend pointing both the Target and Source clusters at the new LM; the goal is to avoid license violations while the two environments run in parallel.
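A minimal sketch of pointing an instance at the new LM, reusing the aws-splunk-cm host from this article (server.conf on every node in both environments):

```ini
[license]
manager_uri = https://aws-splunk-cm:8089
```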

Note that all cluster components except the indexers (CM, SH, HF, and DS) should send their _internal and _audit logs to the target AWS cluster via outputs.conf.
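A sketch of that outputs.conf; the indexer hostnames below are placeholders for your Target IDX nodes:

```ini
# outputs.conf on CM, SH, HF, and DS (never on the indexers themselves)
[tcpout]
defaultGroup = aws_idx

[tcpout:aws_idx]
server = aws-idx1:9997, aws-idx2:9997
useACK = true
```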

Stage 2. Configure the Target IDX

Initialize peers in different AZs, that is, different sites.

For the node in AZ-A (site1):

/opt/splunk/bin/splunk edit cluster-config -mode peer -manager_uri https://aws-splunk-cm:8089 -multisite true -site site1 -secret 'ClusterSecretKey'

For the node in AZ-B (site2):

/opt/splunk/bin/splunk edit cluster-config -mode peer -manager_uri https://aws-splunk-cm:8089 -multisite true -site site2 -secret 'ClusterSecretKey'

server.conf:

[imds]
imds_version = v2

[cachemanager]
# LRU-K: evict buckets based on both recency and frequency of access
eviction_policy = lruk
# Extra free space, in MB, kept on the cache volume (102400 MB = 100 GB)
eviction_padding = 102400

It is important to configure incoming data streams. New indexers do not listen on any ports by default. Configure:
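At a minimum, that means opening the S2S and HEC listeners in inputs.conf on each Target indexer; the token stanza name and value below are placeholders:

```ini
# inputs.conf on each Target IDX node
[splunktcp://9997]
disabled = 0

[http]
disabled = 0
port = 8088
enableSSL = 1

[http://hec-ingest]
token = <your-hec-token>
```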

Integration Layer setup

Target HF: the following items must be migrated:

Target DS: the following items must be migrated:

Stage 3. Build a Hybrid SHC

At this stage, we temporarily expand the Source SHC so that it can work with both the Source and Target IDX clusters at the same time.

This approach allows us to:

Hybrid SHC strategy:

Configure Multi-Cluster Search on the Source SHC

On every Source SH node, edit server.conf and replace the old [clustering] section with the following:

[clustering]
mode = searchhead
manager_uri = clustermanager:multi, clustermanager:single

[clustermanager:multi]
multisite = true
site = site0
manager_uri = https://aws-splunk-cm:8089

[clustermanager:single]
manager_uri = https://old-splunk-cm:8089

site = site0 is critical for SmartStore + Multisite. SH nodes must not participate in site affinity. Each SH node will query both IDX clusters in parallel.

Verify that the SH can see the new IDX nodes in AWS:

index=_internal | dedup splunk_server | table splunk_server

Initialize the Target SH

Important: During initialization, point to the old Deployer URL so the new SH nodes immediately retrieve the current application bundle.

/opt/splunk/bin/splunk init shcluster-config -mgmt_uri https://aws-splunk-cm:8089 -replication_port 9200 -conf_deploy_fetch_url https://old-splunk-cm:8089 -secret 'OLDClusterSecretKey'

In server.conf, before restart, specify the same multi-cluster search configuration used on the Source SH nodes:

[clustering]
mode = searchhead
manager_uri = clustermanager:multi, clustermanager:single

[clustermanager:multi]
multisite = true
site = site0
manager_uri = https://aws-splunk-cm:8089

[clustermanager:single]
manager_uri = https://old-splunk-cm:8089

Join into a Single SHC

Join the Target SH to the existing Source SHC:

/opt/splunk/bin/splunk add shcluster-member -current_member_uri https://old-splunk-search:8089

Zero-downtime mechanics: what happens under the hood

During SHC consolidation, several independent mechanisms operate in parallel:

Configuration Management Switchover (Deployer Switchover)

After SHC synchronization succeeds, configuration management can be fully moved to the Target Deployer across all SHC nodes. Edit server.conf on every SH:

[shclustering]
conf_deploy_fetch_url = https://aws-splunk-cm:8089

After that, execute a rolling restart of the SHC to apply the settings.
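The rolling restart can be triggered from the current captain:

```shell
/opt/splunk/bin/splunk rolling-restart shcluster-members
```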

Stage 4. Migrate Data to S3 (Push to Cloud)

At this stage, we begin migrating existing warm and cold index buckets from the old indexer cluster to AWS S3, which will then serve as the remote store for the Target cluster.

Migration strategy

I recommend migrating in stages. Start with one non-critical index such as test_index. Verify that bucket upload to S3 succeeds and that no errors are present. Then gradually add the remaining indexes, in batches or all at once, depending on channel throughput and system load.

indexes.conf on the Source IDX cluster:

[volume:remote_store]
storageType = remote
path = s3://splunk_s3

remote.s3.region = eu-central-1
remote.s3.endpoint = https://s3.eu-central-1.amazonaws.com
remote.s3.encryption = sse-s3
remote.s3.supports_versioning = false

remote.s3.access_key = XXX
remote.s3.secret_key = XXX

[test_index]
remotePath = volume:remote_store/test_index

The selected indexes begin uploading existing warm and cold buckets to S3 in the background. The process is asynchronous and does not interrupt ingestion. New hot buckets continue to be written locally.

Migration validation
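One way to validate the upload is Splunk's built-in remote file system tool, which lists the buckets an index has actually landed in S3. Run it on a Source indexer after the volume is configured:

```shell
/opt/splunk/bin/splunk cmd splunkd rfs -- ls --starts-with index:test_index
```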

Force hot bucket rollover

To close active write files and turn all hot buckets into warm buckets so they can be uploaded to S3, use:

/opt/splunk/bin/splunk _internal call /data/indexes/*/roll-hot-buckets -auth admin:password

Do not move on to the next stage until Upload Queue reaches 0.

Stage 5. Attach the Target IDX Cluster

Apply the following configuration to the Target cluster in indexes.conf:

[default]
repFactor = auto
bucketMerging = true

homePath   = /splunk_cache/$_index_name/db
coldPath   = /splunk_cache/$_index_name/colddb
thawedPath = /splunk_cache/$_index_name/thaweddb

remotePath = volume:remote_store/$_index_name

[volume:remote_store]
storageType = remote
path = s3://splunk_s3

remote.s3.region = eu-central-1
remote.s3.endpoint = https://s3.eu-central-1.amazonaws.com
remote.s3.encryption = sse-s3
remote.s3.supports_versioning = false

[test_index]

Once the configuration is applied, the Target SmartStore indexers connect to S3, discover the uploaded bucket metadata, and begin serving searches over the data in S3 through the Cache Manager. After ingestion is switched over, they also start writing new hot buckets locally.
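A quick sanity check from the SHC is to confirm that the Target peers now return events for a migrated index, for example:

```
| tstats count where index=test_index by splunk_server
```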

Cutover to the Target SmartStore Cluster

Stage 6. Finalize SHC Configuration (Post-Migration Cleanup)

After a successful cutover, the Target IDX cluster handles ingestion and search, and the Source IDX cluster no longer participates in search.

Post-Migration Final State: remove the [clustermanager:multi] and [clustermanager:single] sections. Leave only a direct reference to the Target CM.

server.conf:

[general]
site = site0
serverName = aws-splunk-search

[license]
manager_uri = https://aws-splunk-cm:8089

[replication_port://9200]

[shclustering]
conf_deploy_fetch_url = https://aws-splunk-cm:8089
mgmt_uri = https://aws-splunk-search:8089
replication_factor = 3
shcluster_label = shc_aws_prod

[clustering]
mode = searchhead
multisite = true
manager_uri = https://aws-splunk-cm:8089

Remove Source SH nodes from the SHC

Run the removal command from any active SHC member, preferably the Captain, specifying the URI of the old server being removed, then stop that server:

/opt/splunk/bin/splunk remove shcluster-member -mgmt_uri https://old-splunk-search:8089
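After removal, the member list can be confirmed from any remaining SHC node:

```shell
/opt/splunk/bin/splunk show shcluster-status
```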

After this stage:

Conclusion

The migration is complete. The Source IDX cluster is effectively no longer used. The Target IDX cluster operates in production mode, the data has been migrated and now resides in AWS S3, and the SHC has been fully moved to AWS. The old servers can be decommissioned permanently.

The key zero-downtime condition was achieved:

Happy Splunking!