We had a SharePoint 2016 medium-size farm (2 WFEs, 2 App Servers, 2 DB Servers & 2 OWA Servers) configured and running perfectly fine for a few months. Then one fine day, opening Excel files in the browser started throwing the error below:
We’re sorry. We ran into a problem completing your request. Please try that again in a few minutes.
Strangely, all other Office file types (Word, PowerPoint) continued to open in the browser without any issues.
Update: As per Microsoft, this is a known issue (“System Center 2016 Operations Manager APM Agent causing heap corruption in SharePoint”) and documented here.
As with all such issues, we started by looking into the OWA server logs.
When we looked into the “Microsoft Office Web Apps” category in the Windows event logs, we found the error “Could not establish trust relationship for the SSL/TLS secure channel with authority” being logged repeatedly.
The natural conclusion was that something was wrong with the certificates; maybe they had expired. But when we checked, the certificate was good and valid, and we found that this error had been logged even before the Excel issue appeared. On top of that, we found this article, which explains why these certificate errors were being logged. So apparently it was fine to ignore these errors. We moved on…
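For anyone who wants to script the certificate validity check, here is a minimal Python sketch using only the standard library. It is not the procedure we followed (we inspected the certificate by hand). The helper names `days_until_expiry` and `fetch_not_after` and the host name in the usage comment are made up for illustration.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return the number of days until a certificate's notAfter timestamp.

    not_after uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT". A negative result means
    the certificate has already expired.
    """
    if now is None:
        now = time.time()
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400.0


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the certificate's notAfter string.

    The default context verifies the certificate chain and host name, so an
    expired or untrusted certificate raises ssl.SSLCertVerificationError
    during the handshake instead of returning a value.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Usage (hypothetical host name, replace with your own OWA URL host):
#   remaining = days_until_expiry(fetch_not_after("owa.example.com"))
#   print("certificate expires in %.1f days" % remaining)
```

A handshake failure here would point at a genuine trust problem, while a comfortably positive number of days matches what we saw: the certificate itself was fine.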
Next, we looked into the Windows event logs under “System” and found that the application pool for Excel Services was failing frequently.
This could well be the cause of the issue, but we had no idea what was causing the failure. Clueless about the source of this behaviour, we decided to rebuild the OWA farm. So we removed all the OWA components and reinstalled them, following this nice TechNet article about how to deploy an Office Web Apps farm. However, even after all that effort, we were still greeted with the same error message, “We’re sorry. We ran into a problem completing your request. Please try that again in a few minutes”, when we tried to open any Excel sheet in the browser. Time to look into something else.
When we looked into the Windows event logs under “Application”, we found many errors related to the w3wp process being logged frequently. This looked strange but seemingly unrelated, as the faulting module path mentioned “Microsoft Monitoring Agent” and nothing related to our Office Web Apps.
However, since this was causing the w3wp process to fail, it couldn’t be entirely unrelated either 🙂
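If you have many of these crash entries to go through, a small script can tally which module is behind the w3wp failures. The sketch below assumes the event messages have been exported as plain text and follow the usual “Application Error” layout (lines such as “Faulting application name: …” and “Faulting module path: …”). The sample message, including the DLL name, versions and exception code, is invented for illustration and is not copied from our logs.

```python
from collections import Counter

# Invented sample in the usual "Application Error" message layout.
# The DLL name and all values are placeholders, not real log data.
SAMPLE_MESSAGE = r"""
Faulting application name: w3wp.exe, version: 0.0.0.0, time stamp: 0x00000000
Faulting module name: example_module.dll, version: 0.0.0.0, time stamp: 0x00000000
Exception code: 0xc0000000
Faulting application path: c:\windows\system32\inetsrv\w3wp.exe
Faulting module path: C:\Program Files\Microsoft Monitoring Agent\Agent\example_module.dll
"""


def parse_crash(message):
    """Split an event message into a {field name: value} dictionary."""
    fields = {}
    for line in message.splitlines():
        # Split on the first colon only, so drive letters in paths survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def faulting_modules(messages, application="w3wp.exe"):
    """Count crashes of `application`, grouped by faulting module path."""
    counts = Counter()
    for message in messages:
        fields = parse_crash(message)
        # The name line also carries ", version: ..., time stamp: ...".
        app = fields.get("Faulting application name", "").split(",")[0].strip()
        if app.lower() != application.lower():
            continue
        counts[fields.get("Faulting module path", "<unknown>")] += 1
    return counts


if __name__ == "__main__":
    for module, count in faulting_modules([SAMPLE_MESSAGE]).most_common():
        print(count, module)
```

Grouping by module path rather than module name is a deliberate choice: the path is what pointed us at the “Microsoft Monitoring Agent” folder in the first place.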
So, what did we learn from that last Error #3? Let’s fix the faulting application, “Microsoft Monitoring Agent”, which is the SCOM (System Center Operations Manager) agent. It is used by many organizations to monitor server health.
We went ahead and uninstalled the SCOM agent from both of our OWA servers. After that, when we opened the Excel sheet again, we finally saw the moment the team had been waiting on for days: the Excel sheet opened nicely in the browser 🙂
We installed the SCOM agent again on both servers, and the issue didn’t return.
However unlikely it may sound, some corruption in the installed SCOM agent was causing the Excel Services application pool to crash. Simply uninstalling and reinstalling it fixed the issue. No such errors are being logged any more.
You may still see this message logged as information in the Windows event logs every now and then, but this is expected and by design.