I got an alert and followed the error trace back to the point of the exception. At that point, we were connecting to the Elasticsearch server. The error Sentry was showing, “Connection timed out”, made it clear that our code was trying to reach a third-party service and could not. Since the code in question was connecting to Elasticsearch, I concluded that our Elasticsearch server was down. I brought the DevOps team into the loop; they confirmed that there was an issue with the server and fixed it.
Handling exceptions more carefully: The Elasticsearch server was back and the issue was fixed, but that was not enough, because the same failure would be a show stopper for end users the next time the server went down. So I decided to handle the exception more carefully: we caught TimeoutException and logged a proper error instead of letting it propagate and break the application. With the exception handled at the place where we connect, the application no longer breaks there when Elasticsearch is down. It could still break wherever that connection is used, though, because the connect step now logs the error and returns no connection at all. That was our next interesting challenge. To handle it, we tracked down the places where this particular connection was being used. Luckily there was just one, so we added a default value there and updated the docstring to say so. Learning: whenever we connect to any third-party service, we should think through and handle all the possible failures, such as timeouts and internal errors in that service.
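
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `SearchClient`, `connect_to_search`, and `search_products` are hypothetical names, the real Elasticsearch client is replaced by a stand-in class, and Python's built-in `TimeoutError` stands in for whatever timeout exception the real client raises.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


class SearchClient:
    """Stand-in for a real search client (hypothetical, for illustration)."""

    def search(self, query: str) -> List[str]:
        return [f"result for {query}"]


def connect_to_search(connect: Callable[[], SearchClient]) -> Optional[SearchClient]:
    """Return a connected client, or None if the server cannot be reached.

    Callers must handle the None case: a failed connection is logged
    here, not raised.
    """
    try:
        return connect()
    except (TimeoutError, ConnectionError) as exc:
        # Log a proper error instead of letting the exception break the app.
        logger.error("Could not connect to the search server: %s", exc)
        return None


def search_products(client: Optional[SearchClient], query: str) -> List[str]:
    """Run a search, falling back to an empty list when there is no client."""
    if client is None:
        # Default value for the one place that uses the connection.
        return []
    return client.search(query)


def failing_connect() -> SearchClient:
    """Simulate the outage: every connection attempt times out."""
    raise TimeoutError("Connection timed out")


client = connect_to_search(failing_connect)
print(search_products(client, "shoes"))  # prints []
```

The connect step and the use site are handled separately on purpose: swallowing the exception at connect time only helps if every caller knows it may receive `None`, which is why the docstring spells that out.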
