Opportunity Name: Optimize OpenSearch Domains with Over-Provisioned Storage
AWS Resource Type: Amazon OpenSearch Service
Opportunity Description:
This Finder identifies OpenSearch domains whose attached storage volumes are larger than their usage requires, which leads to unnecessary costs. AWS charges a premium for OpenSearch EBS volumes (gp2 and gp3 are 35% and 52.5% more expensive than their EC2 equivalents, respectively), so right-sizing these volumes is a high-impact optimization opportunity.
CloudFix analyzes 30 days of CloudWatch metrics to predict future storage requirements and recommends an optimal volume size that maintains performance while reducing costs.
Criteria for identifying the opportunity:
- OpenSearch domain has ≥30 days of available CloudWatch metrics.
- Volume size is greater than 10 GB (the AWS minimum size for OpenSearch volumes).
- A linear regression model predicts a smaller required storage size over the next 3 months:
  - PredictedFreeStorageSpaceIn3Months = 4 × CurrentFreeStorageSpace − 3 × FreeStorageSpace30DaysAgo
  - RecommendedVolumeSize = int((CurrentVolumeSize − MinimumFreeStorageSpace) × 1.3)
- The RecommendedVolumeSize is less than the current size.
- For gp2 volumes, the Fixer ensures IOPS requirements are still met post-resize.
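The sizing criteria above can be sketched in Python as follows. This is an illustrative sketch, not CloudFix's actual implementation. The function names are hypothetical. MinimumFreeStorageSpace is not defined in this document, so it is assumed here to be the smaller of the current and predicted free space, floored at zero. The gp2 check assumes the standard gp2 baseline of 3 IOPS per GiB with a floor of 100 and a cap of 16,000 IOPS, and `required_iops` is an assumed input that the caller would have to derive from the domain's observed workload.

```python
MIN_VOLUME_GB = 10  # minimum OpenSearch volume size cited in the criteria


def predict_free_space_in_3_months(current_free_gb, free_30_days_ago_gb):
    """Extend the 30-day trend three more 30-day periods into the future."""
    monthly_change = current_free_gb - free_30_days_ago_gb
    # Algebraically equal to 4 * current - 3 * free_30_days_ago.
    return current_free_gb + 3 * monthly_change


def recommend_volume_size(current_volume_gb, current_free_gb, free_30_days_ago_gb):
    """Return a smaller volume size in GB, or None if no reduction applies."""
    predicted_free = predict_free_space_in_3_months(
        current_free_gb, free_30_days_ago_gb
    )
    # ASSUMPTION: MinimumFreeStorageSpace is the smaller of current and
    # predicted free space, never negative.
    min_free = max(0, min(current_free_gb, predicted_free))
    recommended = int((current_volume_gb - min_free) * 1.3)
    recommended = max(recommended, MIN_VOLUME_GB)
    return recommended if recommended < current_volume_gb else None


def gp2_baseline_iops(volume_gb):
    """gp2 baseline: 3 IOPS per GiB, at least 100, at most 16,000."""
    return min(max(100, 3 * volume_gb), 16000)


def meets_gp2_iops(volume_gb, required_iops):
    """ASSUMPTION: required_iops comes from the domain's observed workload."""
    return gp2_baseline_iops(volume_gb) >= required_iops
```

For example, a 500 GB volume with 400 GB free today and 410 GB free 30 days ago has a predicted 370 GB free in three months, so the sketch recommends int((500 − 370) × 1.3) = 169 GB, which on gp2 would provide a baseline of 507 IOPS.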
Potential Savings (range in % on annual basis):
Typical savings are 12% or more, depending on current usage and provisioned storage sizes.
What happens when the Fixer is executed?
The Fixer automates the storage volume resizing process:
- Calls the UpdateDomainConfig API to apply the new volume size.
- Monitors progress with the DescribeDomainChangeProgress API.
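The two API calls above could be combined roughly as shown below. This is a minimal sketch rather than the Fixer's real code: `resize_domain_volume` and its parameters are hypothetical, and `client` is assumed to be a boto3 OpenSearch client (`boto3.client("opensearch")`) or any object exposing the same two methods. For gp3 volumes, a real call may also need to pass the domain's existing `Iops` and `Throughput` in `EBSOptions`; that should be verified against the API reference.

```python
import time


def resize_domain_volume(client, domain_name, volume_type, new_size_gb,
                         poll_seconds=60, max_polls=240, sleep=time.sleep):
    """Apply a new EBS volume size and poll until the change finishes.

    Returns "COMPLETED", "FAILED", or "TIMED_OUT".
    """
    response = client.update_domain_config(
        DomainName=domain_name,
        EBSOptions={
            "EBSEnabled": True,
            "VolumeType": volume_type,
            "VolumeSize": new_size_gb,  # size in GiB per data node
        },
    )
    # The change ID, when present, lets us track this specific change.
    change_id = (
        response.get("DomainConfig", {})
        .get("ChangeProgressDetails", {})
        .get("ChangeId")
    )

    for _ in range(max_polls):
        kwargs = {"DomainName": domain_name}
        if change_id:
            kwargs["ChangeId"] = change_id
        progress = client.describe_domain_change_progress(**kwargs)
        status = progress.get("ChangeProgressStatus", {}).get("Status")
        if status in ("COMPLETED", "FAILED"):
            return status
        sleep(poll_seconds)
    return "TIMED_OUT"
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real domain.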
Is it possible to roll back once CloudFix implements the Fixer?
Yes. Rollback scenarios include:
- Automatic rollback if free storage space drops below 20%, triggered via CloudWatch alarms.
- Manual rollback from a snapshot restore if a cluster enters a “red” state.
The rollback automation ensures the volume is resized back up to provide 30% free space. The alarm remains active for 90 days post-resize.
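The rollback thresholds described above can be illustrated with a short calculation. The sketch below is hypothetical and does not reflect CloudFix's actual alarm configuration. The alarm name, period, and statistic are assumptions. It also assumes that CloudWatch reports `FreeStorageSpace` in megabytes under the `AWS/ES` namespace with `DomainName` and `ClientId` dimensions, and that "30% free space" means free space divided by the new volume size. The returned dictionary is shaped like the keyword arguments of CloudWatch's `put_metric_alarm`, but nothing is sent to AWS here.

```python
import math


def rollback_alarm_threshold_mb(volume_size_gb, free_pct=20):
    """Free-space level (in MB) below which the rollback alarm fires."""
    return volume_size_gb * 1024 * free_pct / 100


def rollback_volume_size_gb(used_gb, target_free_pct=30):
    """Smallest whole-GB size that leaves target_free_pct of the volume free."""
    return math.ceil(used_gb * 100 / (100 - target_free_pct))


def build_alarm_params(domain_name, account_id, volume_size_gb):
    """Build put_metric_alarm-style arguments (names and period are assumed)."""
    return {
        "AlarmName": f"opensearch-low-storage-{domain_name}",  # hypothetical
        "Namespace": "AWS/ES",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Minimum",  # node with the least free space
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": rollback_alarm_threshold_mb(volume_size_gb),
        "ComparisonOperator": "LessThanThreshold",
    }
```

Under these assumptions, a 100 GB volume would alarm below 20,480 MB free, and a domain using 70 GB would be rolled back to a 100 GB volume.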
Can CloudFix implement the fix automatically once I accept the recommendation?
Yes. Fixer execution is fully automated upon user approval.
Does this fix require downtime?
No. However, temporary performance degradation (e.g., latency spikes) may occur during resizing. AWS temporarily increases the instance count during volume modification, which may put extra load on the cluster's master nodes. We recommend scheduling the fix during a maintenance window.
Additional Resources:
Bill Gleeson