Keeping your cool can be challenging when a crisis occurs and you’re faced with a massive wave of support requests and frustrated customers. How can you consistently provide transparent and accurate communication that maintains a trusting relationship with your customers, during a time when your product or service is failing many or all of them?
It all starts with the resolve to be prepared and stay calm — and having a crisis management plan in place.
About this guide
Creating a well-documented and efficient crisis management process means accounting for all scenarios, gathering input from all relevant stakeholders, and defining how and when to communicate with customers. Behind that plan you need a team that’s trained to use it, so that when a crisis happens, they’re ready to go. Finally, you want a way of measuring how well your team does during these crisis situations so that you can do even better the next time a crisis hits. Accomplishing all of that is the subject of this guide and we cover the following topics:
- What qualifies as a crisis?
- The 4 keys to crisis management
- How do you build trust?
- Stay calm, be prepared
- How to create a crisis management plan
- Assemble your incident response team
- Communicate with your customers effectively
- An example crisis management process
- Measure how well you did
- Congratulations! You’re just getting started
This guide is a companion to the crisis management talk that Dave presented at Relate, the Zendesk global user conference. You can find a recording of Dave's talk here: Manage customer satisfaction in a crisis.
What qualifies as a crisis?
First, what are we talking about when we say “crisis”? There are several types of events that can qualify, including:
- Service disruption
- Security incident
- Legal entanglement
- Public relations nightmare
- Physical emergency
The key element that ties all of these types of crises together is that they can affect many customers at the same time. Unlike events that affect a single customer, a crisis that affects many customers usually requires public communication, and that means you want your team to speak in a single voice.
The 4 keys to crisis management
These situations can damage the trust relationships you have with your customers, so you want to do everything you can to manage and repair, if necessary, that trust — and as quickly as possible. Here are what we believe are the 4 keys to effectively managing a crisis.
Restore service ASAP — This probably goes without saying, but it’s certainly true. This means assessing the scope, repairing the damage, and putting measures into place to prevent it from happening again.
Don’t be defensive! — This is no time to deny, cover up, or shift blame. Transparency and honesty are key, both internally and externally.
Provide a consistent response — This consistency falls along two lines: your crisis communications should be consistent with your brand’s voice (taking into account the seriousness of the situation, of course), and you want a consistent customer experience during crisis situations, so that the next time one happens, your customers will know what to expect and trust that you can execute.
Handle the crisis efficiently — A well-defined process allows a small team to handle the situation and lets the rest of your team function as normally as possible, with no duplication of effort.
How do you build trust?
So how do you build a trust relationship in the first place? Dr. Heidi Grant Halvorson, in a talk on the science of perception at Relate Live San Francisco in May 2016, said that the two things that engender trust are warmth and honesty — and this is exactly what customers want from you when there’s a crisis. They want to know that you can resolve the problem (and hopefully prevent it from happening again), and they want to know that you care about them.
Stay calm, be prepared
Calmness is a way to demonstrate competence — it’s grace under pressure, not apathy or coldness. Emotions are contagious. If your team is calm, they can help your customers stay calm as well.
What’s more, anxiety can lead to bad decision-making, so keeping your team calm can result in better outcomes for your team and your customers.
As a leader, your emotions carry extra weight with your team. If you’re nervous, your team will be too. So how do we keep everyone calm?
Or put more simply, this haiku employing the name we use at Zendesk to describe our crisis management process:
A Red Alert is
not the time to figure out
how to handle one
How to create a crisis management plan
The best way to stay calm is to have a strong and tested plan in place, and people available that know how to execute it.
“If you fail to plan, you are planning to fail” - Benjamin Franklin
If you don’t have a plan for handling crises, it will be much more stressful when one hits. You’ll need to figure out what to do and how to do it amid a possibly chaotic situation. Having a plan helps you and your team stay calm and provide a consistent response, so it contributes to your competence.
Your crisis management plan should include the following:
1. Owners responsible for writing, documenting, and maintaining the plan
2. A clear definition of what qualifies as a crisis
3. Every step in the crisis management process detailed
4. Roles and duties: who’s in charge of what during the crisis
5. Staffing: how to ensure people will be available when needed
6. Training: how to ensure the team can execute the plan
7. Tools: agreement on tools helps ensure consistency of response
8. Communications: protocols for internal and external communication — who speaks for the company, format and channels used, and the cadence of those messages.
9. Special cases: anything outside the norm
10. Metrics: how you measure success, how you know you’re getting better
11. The expectation that it will evolve over time
Partner with internal stakeholders
A crisis management plan doesn’t just involve your support team — there are many internal stakeholders that may need to be involved. Make them a part of your planning phase, whether they’ll need to help manage crises when they occur, or just need to be informed while one is happening.
Here are some stakeholders you should consider including — it’s probably best to meet with these teams one at a time to find out what’s worked and what hasn’t in previous crisis situations, and what their needs are going forward:
Your Support team — This may be obvious, but it’s critical that your Support team knows when there’s a crisis situation happening so that they can properly respond to customers, and look for support requests that might help define the scope of an issue when it’s first emerging. It’s important to keep the entire team informed about the current status and messaging that’s going out to customers until the crisis is resolved and they can return to normal procedures.
Engineering (e.g. to fix problems from bad code deploys)
Operations (e.g. most SaaS issues)
Security (e.g. DDoS attacks)
Legal (e.g. lawsuits)
Human Resources (e.g. if the crisis involves internal personnel)
Facilities or Disaster Response (e.g. natural disaster or accident)
Marketing — They may want or need to adjust outgoing messaging during the crisis. Sending an email blast encouraging people to try a new feature may not be received well by customers when they can’t use your product at all.
Public Relations and Legal — These teams may need to get involved if the crisis has an extraordinarily large customer impact, or for crises that involve legal issues.
Sales, Account Management, Customer Success, and Executives — All these teams need to know when a crisis is ongoing, so they don’t go into a meeting with a customer or prospect and get ambushed with questions they don’t have the answer for.
Key customers and partners — Certain customers may expect proactive or verbal confirmation of crisis events as part of your support contract. Similarly, resellers or other partners might need to be kept apprised of the situation in the same way that internal customer-facing teams would. If your business depends on third-party tools, you may want to arrange with them to be notified as part of their crisis-management process (hopefully they have one too).
Document your plan
As with any complex system or process, it’s unrealistic to assume that everyone involved will remember all the details of the plan in the heat of the moment. If your Support team is geographically distributed or divided into shifts, then it’s also important that everyone is using a single source of truth, so that they can all respond consistently. That’s why it’s so important to have everything documented and available to everyone.
“If it isn’t written down, don’t expect anyone to remember it.” – Dave Dyson (as far as I can tell)
Here’s what you should be aiming for with your documented plan:
Complete — Cover both process and people. To put it simply: if it’s not written down, don’t expect your team to remember it. Cover the process from beginning to end, what the decision points are and how to make them, who’s in charge of what, how to get ahold of them, and sample messages — everything someone might need in order to execute the plan, when they’re by themselves on a weekend evening.
Clear — Be sure your plan is easy to digest when onboarding and in the heat of the moment. Use headings, a table of contents, and diagrams. Slide decks and even pre-recorded video can make training easier, but when budgeting, don’t underestimate the time and cost of keeping those up to date, since your plan will change over time.
Accessible — Don’t host your plan documentation in your support tool (or the product you support), so that if your support tool goes down, your team will still have all the information they need. This could mean having the plan available in a different software tool or printed out (and stored in a secure location, if it includes privileged information such as passwords to necessary tools).
Up to date — A plan that no longer matches the reality on the ground will only cause problems. Either your team will ignore the documentation, or it will lead them to make incorrect decisions — and any new team members will be getting incorrect information. Review your documentation whenever you make changes to your process, when training new team members, and when reviewing incidents that didn’t go well.
Assemble your incident response team
A crisis-management plan is no good without a team that can execute it successfully and consistently when the need arises. Depending on the size of your organization, your incident response team might just be you, or you might have a dedicated team whose primary responsibility is crisis management. For many organizations, however, crisis management will be a role that’s part of the job of some people in your company, primarily the support team. Here’s a sample team roster, based on our process at Zendesk:
Incident Lead — An experienced agent or team lead on your support team, who owns the problem ticket, gathers scope and impact information to share with the rest of the incident team, and provides status updates back to the support team.
Support Duty Manager — A manager on the support team who manages support resources during the crisis (for example, assigning additional staff to phone coverage), crafts the customer-facing messages and coordinates with whoever’s leading the effort to restore service. Pro Tip: Have a backup Support Manager on standby, in case your primary is unavailable - not only does this help ensure coverage, it allows you to handle multiple simultaneous issues, should that arise.
Operations Manager — A manager from the team that’s working to correct the problem (Operations, Engineering, Security, etc.), who manages the team working to restore service, confirms facts about the incident, and provides the internal-facing post-mortem report.
Incident Manager — A manager from the Support Operations team, who can assist with large incidents as needed, crafts the public-facing post-mortem from the internal version, and can evaluate the quality of the response.
After you have your incident team roles defined, you’ll want to make sure they’re available when lightning strikes. In a small team, this could mean long on-call shifts, but as your team grows, you can and should spread the duty out across more people and shorter, less frequent shifts. As a global organization, we generally schedule 8-hour shifts spanning normal work hours for seven days at a time, spaced apart by several weeks. This allows for good work-life balance while being frequent enough that skills don’t deteriorate.
Paging systems such as PagerDuty or OpsGenie can allow you to schedule shifts and send notifications when an incident occurs. Make sure there’s an escalation path so that if someone can’t be available, there’s someone else who can step in.
After your incident response team has been notified, they need a place to work. Chat rooms are ideal for a distributed staff, and make it easy to review the incident later. Make checking in the first step for each member of your incident team; that way, everyone involved knows who they’re working with.
Train your team to handle a crisis
Because you’re putting your best people in charge of executing your crisis management process, you might think that they’ll automatically be able to master it on their own. However, because this high-stakes and detail-oriented process is different from the day-to-day processes that your team is used to, they’ll perform better with an effective training program. Here are some training tips you may find useful:
Onboarding — Don’t assume a new person on the team will just pick this up on their own, even though they’re great at the usual tasks they do every day. This is a critical function, and training helps you ensure they understand every step, as well as the reasons behind those steps.
Shadowing — Having a trainee shadow experienced staff during an incident can help reinforce the training they’ve had and may also highlight any discrepancies in your training materials or questions that weren’t answered.
Visual aids — While videos and slide decks may help ensure consistency and may be easier to digest than text, they require additional effort to maintain and update as your crisis management process evolves. Expect to spend time maintaining them.
Drills — If your crises don’t happen very often, congratulations are in order! But that also can mean that your crisis management team might forget key steps in your process. If that’s the case, then running occasional drills can be worthwhile.
Scorecards — This can be a great way to help ensure that all steps were followed, both in onboarding and drills, as well as after any real-world crisis.
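A scorecard can be as simple as a checklist of steps grouped by role, with each step marked as done or missed after a drill or a real incident. The sketch below, in Python, shows one hypothetical way to record and score such a checklist. The role names come from the sample roster above; the step names, the sample results, and the scoring rule (the fraction of steps completed) are assumptions made for illustration, not a description of any particular tool.

```python
def score_response(scorecard):
    """Return the fraction of steps completed and the (role, step) pairs missed.

    `scorecard` maps each role to a dict of {step description: True/False}.
    """
    total = 0
    missed = []
    for role, steps in scorecard.items():
        for step, done in steps.items():
            total += 1
            if not done:
                missed.append((role, step))
    completed = total - len(missed)
    score = completed / total if total else 0.0
    return score, missed


# Hypothetical results from a single drill.
drill_scorecard = {
    "Incident Lead": {
        "Checked in to the incident chat room": True,
        "Created the Problem ticket": True,
    },
    "Support Duty Manager": {
        "Checked in to the incident chat room": True,
        "Met the update cadence": False,
    },
}

if __name__ == "__main__":
    score, missed = score_response(drill_scorecard)
    print(f"Steps completed: {score:.0%}")
    for role, step in missed:
        print(f"Missed: {role}: {step}")
```

Keeping the missed steps alongside the score makes it easier to tell whether a gap calls for coaching one person or for updating the training material itself.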
An example crisis management process
Now that you’ve got the tools to build your crisis-management plan, it’s time to think about the detailed steps to include. Every business will have its own needs, but the following outline should serve well as a template that you and your stakeholders can use to build on.
Initial Alert. Someone in your internal organization thinks a situation merits a crisis response, and pulls the alarm bell. For example, you might have a special email address to use for this purpose. You might also have software that monitors your systems and pulls the alarm when certain thresholds are met. In any case, automatic alerts are sent to your incident response team and other internal stakeholders.
Incident response team check-in. The members of your incident response team respond to their alerts (or escalate them to a backup if they’re unavailable), and gather in the designated chatroom or other space.
Verify the crisis criteria are met. If the consensus is that the criteria have been met, the incident response team continues with the process; otherwise, they stand down and notify internal stakeholders of the false alarm.
Determine scope and impact. Create an internal-facing Problem ticket, and publish that ticket to the support team so they can attach customer reports (as Incident tickets) as they come in. Look for patterns in the customer reports, and in what your support team and internal monitoring tools are telling you, to get an idea of who’s affected by the issue and what those effects are.
Public acknowledgement. Post an acknowledgement to Twitter, your system status page, your Help Center, or whatever other public-facing channels are part of your process. Proactively contact key customers or partners, and create Incident tickets on their behalf to attach to the Problem ticket.
Provide status updates. As your operations or other team works to restore service, follow your defined communications cadence and update customers regularly via the same channels you used initially – Twitter, Help Center article, system status page, key customer tickets, etc.
Resolve the issue. Once services have been restored, send a final update to the channels above, and solve the Problem ticket (which will automatically solve the attached Incidents), linking to the Help Center article for the crisis, where you’ll publish the post-mortem. Breathe a sigh of relief, as most of your team can resume their normal tasks.
Wrap up. Craft the internal and public-facing post-mortem reports and post them. Review your checklist to determine whether your incident management process was followed and whether any changes need to be made to cover unforeseen circumstances.
Communicate with your customers effectively
When a crisis strikes, solving the underlying issue is obviously the first and most critical task, but the other half of the battle is communicating effectively with your customer-facing stakeholders and directly with your customers. Poor communication can destroy the trust that you might have gained as a result of effectively resolving the cause of the crisis.
You’ve got to be proactive and let everyone know what’s happening. Here are some tips for effectively communicating with everyone during a crisis.
Timely — Communicate as soon as you can, and then follow a set cadence for follow-up communications. If you need to adjust the cadence (for example, if it becomes clear that a situation will last for hours or days), then communicate what the new cadence will be. Meeting a cadence means that sometimes you will not have new information to relay, but radio silence will needlessly increase your customers’ stress.
Relevant — Communicate the scope of the issue (who is affected, or likely to be), as well as the impact (how it will affect them). Include workarounds, if any.
Accurate — Don’t speculate — incorrect information can set unrealistic expectations. Having to correct yourself publicly damages trust in your competence. Therefore, you should ensure that a situation meets your crisis criteria before communicating publicly – build in a window of time at the start of your process for this verification step.
Compassionate — This means taking ownership and acknowledging the impact the crisis is having on your customers. Apologize for that impact, avoid being defensive, and don’t shift blame — your customers depend on you.
Honest — This should go without saying, but… don’t lie to your customers. It will come back to haunt you.
Transparent — Share as much as you can, to the extent that it can be helpful in setting expectations for your customers. They don’t need to know which version of some third-party tool you use to build your software, but it can help them to know that you’re working on reverting a build, or that you’re having problems with a server that’s impacting connectivity with your East Coast customers (for example).
One voice — There are two parts to this. First, only one person at a time should be messaging your customers – there should be a single source of truth. This prevents contradictory information from being sent out; mixed messages will damage your perceived competence. Second, the voice should be consistent in tone and terminology from one situation to the next. Including sample message templates in your process documentation is a great way to ensure this.
One of the best ways to build confidence and trust is to proactively communicate what’s happening during a crisis. This lets your customers know that you’re on top of the situation. Here are effective ways to proactively communicate with your customers.
Provide customers with a system status page — A system status page gives your customers a place to go when they suspect something is amiss. The more detailed a breakdown you have (feature-specific availability, rather than a simple “up/down” status for your entire product) the easier it is for your customers to know what’s going on. Include updating the page as part of your crisis management communication cadence, so that it always has the most up-to-date information.
Post updates on social media — Twitter, for example, is a great way to communicate your status publicly during a crisis, since it’s a place that people often go in order to complain when things aren’t working. If you’re demonstrating competence and warmth here, it can defuse tensions. Additionally, customers can follow your Twitter feed to receive notifications when you post updates. Note: it’s a good idea to suspend ordinary marketing tweets during a crisis, as they’re not likely to go over well with frustrated customers.
Directly contact important customers via email or phone — Critical customers and partners may have special SLAs (service level agreements) and expectations of service, requiring you to contact them when there’s a crisis. Even if this is not the case, doing so can help to demonstrate the importance of your relationship with them. Ideally, all affected customers should be made aware of the situation, but that may not be practical, depending on the nature of your business and your scale.
Make a record of all outgoing communications — Your incident chat room and the problem ticket are good places for this. This allows your incident staff and anyone examining your response later on to reconstruct the timeline of events.
Publish a summary of your crisis-management process — This is a great way to set expectations for your customers – if they understand the process and how you’ll be communicating with them, they’ll most likely be more patient during a crisis.
A sample communications cadence
Here’s a sample cadence of communications, based on what we use in our process at Zendesk. The needs of your business and customer might dictate different times, but having a well-defined cadence will help ensure consistent execution of your process.
Initial public acknowledgement of incident — ASAP after the incident is verified, but within 15 minutes. The sooner you acknowledge an incident publicly, the less anxious customers become.
Description of incident scope and impact — ASAP but within 30 minutes. Incident scope should be specific enough for customers to self-identify if they are impacted.
Status updates on investigation/resolution — Every 30 minutes thereafter. When possible, status updates should provide new information to demonstrate progress is being made toward resolution.
"All clear" — ASAP after service is restored. Include a pointer to the article where the post-incident summary will be published.
Post-incident summary — Within three business days of the incident. Include scope, impact, timeline, root cause, resolution, and planned improvements.
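To make a cadence like this easy to follow in the heat of the moment, it can help to turn it into concrete clock times as soon as an incident is verified. The Python sketch below does that for the sample cadence above. It rests on several assumptions: the 15- and 30-minute windows are measured from the moment of verification, status updates are counted from the scope-and-impact deadline, business days exclude only Saturdays and Sundays (no holidays), and the function names are made up for this example.

```python
from datetime import datetime, timedelta

# Values taken from the sample cadence above; adjust them to your own process.
ACK_WITHIN = timedelta(minutes=15)
SCOPE_WITHIN = timedelta(minutes=30)
UPDATE_EVERY = timedelta(minutes=30)
SUMMARY_BUSINESS_DAYS = 3


def communication_deadlines(verified_at, update_count=3):
    """Return (label, due time) pairs for the first messages of an incident."""
    deadlines = [
        ("Initial public acknowledgement", verified_at + ACK_WITHIN),
        ("Scope and impact description", verified_at + SCOPE_WITHIN),
    ]
    # Assumption: updates are spaced from the scope-and-impact deadline.
    for n in range(1, update_count + 1):
        due = verified_at + SCOPE_WITHIN + n * UPDATE_EVERY
        deadlines.append((f"Status update {n}", due))
    return deadlines


def add_business_days(start, days):
    """Return `start` moved forward by `days` weekdays (weekends skipped)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


if __name__ == "__main__":
    verified = datetime(2024, 1, 15, 9, 0)  # hypothetical verification time
    for label, due in communication_deadlines(verified):
        print(f"{due:%H:%M}  {label}")
    summary_due = add_business_days(verified, SUMMARY_BUSINESS_DAYS)
    print(f"Post-incident summary due by {summary_due:%Y-%m-%d}")
```

The "all clear" message is left out on purpose, since it depends on when service is restored rather than on a fixed offset.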
Post-crisis follow up with customers
Your communication with customers doesn’t begin and end with crisis alerts and updates. It’s just as important to provide them with a summary after the crisis has been resolved. This gives them more insight and details about what happened and why, and how you’re going to work to prevent future occurrences. Sharing these summaries publicly demonstrates transparency and shows how you’re working to improve. Here’s what each summary should contain.
Scope and customer impact — Which customers were affected, and what the effects were.
Root cause — What caused the issue.
Incident duration — The start and end time of the crisis, including instigating events, when the problem was noticed, when solutions were implemented, and the time when all customers were restored to normal service.
Communication timeline — Include all your outgoing public messaging with times and dates.
Resolution steps — What was required to restore service in this case.
Recommendations for improvement — Any changes being made to prevent the issue from happening again, including product, process, and training.
Handling special cases
A process like the one described above will probably cover the majority of the situations you’re likely to face. But every business will have situations that are uncommon, yet important enough to be included in their planning, such as the following.
3rd-Party partner or tool outages — Document the process for notifying the 3rd party, and tracking the progress they make on resolving the issue. In your customer communication, avoid throwing your partner under the bus! You can point customers to the partner’s system status page if that’s the best way for them to stay apprised, but remember that your customers are looking to you, not to them. You should still follow all the steps of your process.
Support tool/channels down — This can be especially stressful to both your customers and your support team, so having a plan in place will help all of you stay calm. As mentioned earlier, keep your crisis-management documentation somewhere that won’t be affected if your support tool is unavailable. You may need to set up backup communication channels; for example, a backup phone system, if your customers expect to be able to contact you that way. Twitter can be especially helpful as an outgoing and proactive communication channel here.
Shift handoffs — If your support team has multiple shifts or is geographically distributed, it’s helpful to be able to hand off long-running issues. This is one reason why a dedicated chat room is so useful. Insist on a warm handoff between teams, to ensure the incoming team is fully apprised of the situation.
Multiple simultaneous incidents — Sometimes multiple independent problems strike at the same time, or overlap. It’s probably best in these cases to handle them separately (and be clear in your outgoing communication which issue is being updated). This is where having backup incident management team members can come in handy — if you have enough people available, you can have independent teams operating for each crisis.
Measure how well you did
When crises are sporadic, they can be easy to ignore in favor of day-to-day ticket volume. But they can have a long-term effect on the trust your customers have in you, so it’s important to track them so you have a benchmark to improve upon. Well-structured reports can also help you measure the cost of your crisis situations, which allows you to budget for them going forward. Finally, reports can also help focus attention on areas of your product or service that need attention from your Product teams.
Here are some useful metrics for tracking the impact crises have on your team and business:
Number and rate of incidents — Counting the number of customer incidents is a good way to help quantify the customer impact of your crisis situations. Using “time over time” reporting can show you if you’re improving.
Resolution time per incident — Long resolution times might point to areas that can be improved such as diagnosis, resolution, or communication between teams.
Cost to team — If you have an estimated cost for handling any given support ticket, you can multiply that by the number of incidents to get a rough idea of what it’s costing your team to handle these issues. You can get finer resolution if you capture handle time per ticket, and multiply that by the number of incident reports and an hourly support cost estimate.
Customer impact — Add up the duration of each crisis incident multiplied by the number of users affected (either by counting the number of users that report an issue or, if possible, by calculating the number of users actually affected). If you have the data available, you could even weight these by the MRR or predicted customer lifetime value, in order to get a sense of the amount of revenue that’s at risk.
Customer Satisfaction — Compare the CSAT average of crisis incidents versus other support requests. If your process works well, you might find (as we have, at times) that incident tickets result in a higher satisfaction score than ordinary tickets!
Breakdowns by problem cause (type of incident, product area affected, etc.) — Using a product/feature category drop-down field (for us, it’s our About field) can point you to product, process, or infrastructure areas that might be in need of some love. For a software company, that might mean improvements to a deploy process, a software feature that’s easily overloaded, or servers that need additional redundancy.
Crisis management process scorecard — This is a checklist of all the important steps of your process. As part of your post-mortem review, make sure all the proper steps were followed. To build this checklist, go through your process documentation, and for each of your crisis-management roles, include all the tasks that person is responsible for (checking in when notified, meeting internal and external communication cadences, managing incident tickets, crafting the post-mortem, etc.). Then, score the response after each crisis. Gaps can point you to team members who might need coaching, or to places where your training, documentation, or even your process needs to be updated to match the “facts on the ground”.
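The cost-to-team and customer-impact calculations in the list above are simple arithmetic, and can be sketched in a few lines of Python. Everything named here is hypothetical: the `Crisis` record, the function names, and all of the sample figures are made up to show the shape of the calculation, and the impact figure is expressed in "user-hours" as one possible unit.

```python
from dataclasses import dataclass


@dataclass
class Crisis:
    name: str
    duration_hours: float   # from start of the crisis to full restoration
    users_affected: int     # reported or calculated
    incident_tickets: int   # customer reports attached to the Problem ticket


def cost_to_team(crises, cost_per_ticket):
    """Rough cost: incident tickets times an estimated cost per ticket."""
    return sum(c.incident_tickets for c in crises) * cost_per_ticket


def cost_from_handle_time(crises, avg_handle_hours, hourly_support_cost):
    """Finer estimate: tickets times handle time per ticket times hourly cost."""
    tickets = sum(c.incident_tickets for c in crises)
    return tickets * avg_handle_hours * hourly_support_cost


def customer_impact(crises):
    """User-hours of disruption: duration times users affected, summed."""
    return sum(c.duration_hours * c.users_affected for c in crises)


# Made-up sample data for one reporting period.
quarter = [
    Crisis("API outage", duration_hours=2.0, users_affected=500, incident_tickets=40),
    Crisis("Login errors", duration_hours=0.5, users_affected=1200, incident_tickets=25),
]

if __name__ == "__main__":
    print("Cost (per-ticket estimate):", cost_to_team(quarter, cost_per_ticket=12.0))
    print("Cost (handle-time estimate):", cost_from_handle_time(quarter, 0.25, 40.0))
    print("Customer impact (user-hours):", customer_impact(quarter))
```

Running the same calculation for each reporting period gives you the "time over time" comparison described above. Weighting by MRR or lifetime value would mean replacing `users_affected` with a revenue figure for the affected accounts, if you have that data.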
Congratulations! You’re just getting started
Remember, your plan is going to evolve over time. Use your metrics and checklists to make sure your team is following the procedure as intended, and listen to them when they suggest improvements. Keep your documentation and training materials up to date. Your metrics will also let you know how well your customers think you’re doing — if they’re not happy, look for ways to improve. Keep talking to your internal stakeholders as well.
And above all, stay vigilant – your customers are depending on you.