Troubleshooting a Product Search Performance Problem

  1. Start the Arthas service

    cd /usr/local/src/
    source /etc/profile
    ./as.sh
  2. Locate the performance bottleneck

    # Trace calls to merchandiseSearch that take longer than 1 second (#cost is in milliseconds)
    [arthas@13]$ trace com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service merchandiseSearch '#cost > 1000'
    Press Q or Ctrl+C to abort.
    Affect(class count: 1 , method count: 1) cost in 243 ms, listenerId: 7
    `---ts=2025-06-18 18:33:52.078;thread_name=qtp1405163418-423;id=423;is_daemon=false;priority=5;TCCL=org.springframework.boot.loader.LaunchedURLClassLoader@49c2faae
        `---[6994.776167ms] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:merchandiseSearch()
            +---[0.00% 0.009284ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchMerchandiseV2Param:getKeyWord() #283
            +---[0.00% 0.006202ms ] com.ylsk.common.utils.StringUtils:isNotBlank() #284
            +---[0.00% 0.006645ms ] com.ylsk.web.mobileApi.merchandise.entity.MerSearchV2ParamsVo:<init>() #289
            +---[0.00% 0.104357ms ] org.springframework.beans.BeanUtils:copyProperties() #290
            +---[0.00% 0.02407ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:fillMustMerList() #292
            +---[92.31% 6456.983087ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:loadMerInfoBySearch() #295
            +---[0.00% 0.012519ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchResultVo:getMerchandiseList() #298
            +---[0.00% 0.006747ms ] org.springframework.util.CollectionUtils:isEmpty() #299
            +---[0.10% 7.320169ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:loadOtherInfoToMerList() #301
            +---[0.00% 0.006406ms ] org.assertj.core.util.Lists:newArrayList() #306
            +---[0.00% 0.009533ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchResultVo:getTotalSize() #308
            +---[0.00% 0.006345ms ] com.ylsk.platform.Conv:NL() #308
            +---[0.00% 0.005198ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchResultVo:getMerchandiseList() #309
            +---[0.00% 0.008116ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchMerchandiseV2Param:getCustId() #312
            +---[7.56% 528.538416ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:fillMerchandiseBuyGiveRemain() #313
            +---[0.00% 0.00982ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchMerchandiseV2Param:getPage() #316
            +---[0.00% 0.007291ms ] com.ylsk.web.mobileApi.merchandise.entity.SearchMerchandiseV2Param:getPageSize() #317
            +---[0.00% 0.013504ms ] com.ylsk.common.utils.Pagination:getIsCanGoNext() #318
            +---[0.00% min=0.003737ms,max=0.009941ms,total=0.013678ms,count=2] com.ylsk.web.mobileApi.merchandise.entity.SearchMerchandiseV2Param:getBranchId() #323
            +---[0.00% 0.004901ms ] com.yvan.platform.StringUtils:isNullOrEmpty() #323
            `---[0.02% 1.31669ms ] com.ylsk.web.mobileApi.common.service.CommonDataService:getDeliverTips() #324
  3. The main bottleneck is the ES search: loadMerInfoBySearch() accounts for 92.31% of the total time (6456.98 ms of the 6994.78 ms call)
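
When a trace tree is long, ranking the sub-calls by cost makes the hot spot easier to see. The following is a minimal sketch, assuming the trace output has been saved as text; the helper name `slowest_calls` and the regular expression are illustrative and not part of Arthas. Aggregated lines (those with `min=`, `max=`, `total=`, `count=`) are deliberately ignored by this simple pattern.

```python
import re

# Matches single-call Arthas trace lines such as:
#   +---[92.31% 6456.983087ms ] com.example.Service:method() #295
# The root line (no percentage) and aggregated min/max/total lines do not match.
LINE_RE = re.compile(r"\[(?P<pct>[\d.]+)% (?P<ms>[\d.]+)ms \] (?P<call>\S+)")


def slowest_calls(trace_text, top=3):
    """Return the `top` most expensive sub-calls as (call, ms, pct) tuples."""
    rows = []
    for line in trace_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((m.group("call"), float(m.group("ms")), float(m.group("pct"))))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:top]


# Three lines copied from the trace output above.
sample = """\
+---[92.31% 6456.983087ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:loadMerInfoBySearch() #295
+---[0.10% 7.320169ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:loadOtherInfoToMerList() #301
+---[7.56% 528.538416ms ] com.ylsk.web.mobileApi.merchandise.service.MobileMerchandiseV2Service:fillMerchandiseBuyGiveRemain() #313
"""

for call, ms, pct in slowest_calls(sample):
    # Print only the method name after the last ':' to keep the line short.
    print(f"{pct:6.2f}%  {ms:10.2f} ms  {call.rsplit(':', 1)[-1]}")
```

On the sample, loadMerInfoBySearch() is listed first, followed by fillMerchandiseBuyGiveRemain(), which matches the reading of the raw trace.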

  4. Check the health of each node in the ES cluster

  • GET /_nodes/stats
    "os": {
      "cpu": {
        "percent": 2, // CPU usage: 2%
        "load_average": {
          "1m": 0, "5m": 0.03, "15m": 0.05 // system load is very low
        }
      },
      "mem": {
        "total_in_bytes": 8200249344, // total memory: 7.6 GB
        "free_in_bytes": 141103104, // free memory: 134 MB
        "used_in_bytes": 8059146240, // used memory: 7.5 GB
        "used_percent": 98 // 🚨 memory usage: 98%
      }
    },
    "jvm": {
      "uptime_in_millis": 57706435160, // JVM uptime: 668 days
      "mem": {
        "heap_used_in_bytes": 2111900680, // heap used: 2 GB
        "heap_used_percent": 49, // heap usage: 49%
        "heap_max_in_bytes": 4260102144, // max heap: 4 GB
        "non_heap_used_in_bytes": 174629224 // non-heap memory: 166 MB
      }
    }
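
To compare nodes quickly, the same fields can be reduced to a few numbers per node. The following is a minimal sketch, assuming the `/_nodes/stats` response has already been parsed into a dict and one node entry is passed in; the helper name `summarize_node` and the output keys are illustrative.

```python
GIB = 1024 ** 3


def summarize_node(stats):
    """Summarize OS and JVM memory for one node entry of a /_nodes/stats response."""
    os_mem = stats["os"]["mem"]
    jvm_mem = stats["jvm"]["mem"]
    return {
        "os_total_gib": round(os_mem["total_in_bytes"] / GIB, 1),
        "os_used_percent": os_mem["used_percent"],
        "heap_max_gib": round(jvm_mem["heap_max_in_bytes"] / GIB, 1),
        "heap_used_percent": jvm_mem["heap_used_percent"],
        # Fraction of physical RAM reserved by the JVM heap alone.
        "heap_share_of_ram": round(
            jvm_mem["heap_max_in_bytes"] / os_mem["total_in_bytes"], 2
        ),
    }


# Values copied from the node stats excerpt above.
node = {
    "os": {"mem": {"total_in_bytes": 8200249344, "used_percent": 98}},
    "jvm": {"mem": {"heap_max_in_bytes": 4260102144, "heap_used_percent": 49}},
}
print(summarize_node(node))
```

For this node the heap takes roughly half of physical RAM while being only 49% used, so the 98% OS figure is mostly memory outside the heap, including the filesystem cache that ES relies on for search speed.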
  5. Check the health status of all indices
  • GET /_cat/indices?v
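
With many indices, it can help to filter the listing down to the ones that are not green. The following is a minimal sketch, assuming the listing was fetched with `GET /_cat/indices?format=json` and parsed into a list of dicts; the helper name `unhealthy_indices` and the sample rows are hypothetical and do not come from the cluster described here.

```python
def unhealthy_indices(rows):
    """Return (index, health) pairs for indices whose health is not green."""
    return [(r["index"], r.get("health")) for r in rows if r.get("health") != "green"]


# Hypothetical rows for illustration only; not taken from the original cluster.
sample_rows = [
    {"health": "green", "status": "open", "index": "example-a"},
    {"health": "yellow", "status": "open", "index": "example-b"},
    {"health": "red", "status": "open", "index": "example-c"},
]

for index, health in unhealthy_indices(sample_rows):
    print(f"{health:7s} {index}")
```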

Summary:

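  • Conclusion: all three nodes of the ES cluster showed signs of insufficient memory (memory usage as high as 98%, including the file cache)
  • Logging in to the servers to verify memory usage showed that, excluding the file cache, only node B among nodes A, B, and C reached 95%; the other two nodes were at about 75%
  • Looking back, two additional middleware services (RabbitMQ and Kafka) had been installed on node B some time earlier (to cut costs, the company had moved from Huawei Cloud services to self-hosted deployments)
  • As a temporary fix, the JVM memory was reduced to 3 GB; afterwards, the memory of every node was upgraded to 16 GB
  • The performance problem was ultimately resolved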
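
The difference between the 98% figure (file cache included) and the 95% / 75% figures (file cache excluded) can be reproduced on a Linux host from `/proc/meminfo`. The following is a minimal sketch, assuming that `MemTotal - MemAvailable` is an acceptable approximation of memory used excluding reclaimable cache; the helper name and the sample values are hypothetical, and the original check may have used a different tool such as `free`.

```python
def used_percent_excluding_cache(meminfo_text):
    """Approximate memory usage (%) excluding reclaimable cache from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return round(100 * (total - available) / total, 1)


# Hypothetical sample; on a real host, read the text from /proc/meminfo instead.
sample_meminfo = """\
MemTotal:        8000000 kB
MemFree:          140000 kB
MemAvailable:     400000 kB
Cached:          1500000 kB
"""

print(used_percent_excluding_cache(sample_meminfo))
```

A host where this number stays high even after the cache is discounted, as on node B, is under real memory pressure rather than simply using spare RAM for caching.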