Java S3 客户端连接 Ceph 一段时间后出现了这种异常,未能排查出具体的原因。从错误状态看是访问拒绝,猜测可能是资源连接太久,有资源没有释放造成泄漏。也有可能是资源的长时间不用,类似长连接断开后未能再次建立连接。

ceph 版本:

1
2
$ ceph --version
ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) luminous (stable)

依赖的 java sdk 版本是:

1
2
3
4
5
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.10.0</version>
</dependency>

出现问题的几个类:AmazonHttpClient,AmazonHttpClient,HttpResponse,S3ErrorResponseHandler,AmazonServiceException,仔细看了一遍代码,还是未能找出具体的原因。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
2020-12-07 10:30:49.133 ERROR 1323266 --- [http-nio-8090-exec-9] c.f.s.web.controller.ImageController     : null (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: tx000000000000000002a25-005fcd93d9-8532-default)

com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: tx000000000000000002a25-005fcd93d9-8532-default)
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1160)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:748)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:467)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:302)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3769)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1189)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1057)
at cn.xxx.siamat.biz.service.image.storage.impl.ImageStorageServiceImpl.getObject(ImageStorageServiceImpl.java:121)
at cn.fraudmetrix.siamat.web.controller.ImageController.image(ImageController.java:31)
at sun.reflect.GeneratedMethodAccessor54.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:870)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:776)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:991)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:925)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:978)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:870)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:855)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)

检查 ceph 的状态,osd状态显示为 up:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
$ ceph -s
cluster:
id: 3634dbd3-54bd-43a2-9ce1-5596c50d6d39
health: HEALTH_OK

services:
mon: 1 daemons, quorum retina-d-033072
mgr: retina-d-033072(active)
osd: 1 osds: 1 up, 1 in
rgw: 1 daemon active

data:
pools: 6 pools, 48 pgs
objects: 6.65k objects, 295MiB
usage: 1.51GiB used, 1.74TiB / 1.75TiB avail
pgs: 48 active+clean

最后,我们重启应用解决。但这不是最终的解决方式,这是一颗定时炸弹!