Feedback on Collaboration for Mission-Critical Work

@bbodenmiller, I hit the character limit on the forum for a reply, so I’m answering your other questions here. I really appreciate your feedback, which was requested on this thread, and I’ll try to cover each of your points.

This thread is getting long so I might intentionally repeat/adapt some of the context from our previous discussions to err on the side of clarity.

There’s some risk of me getting called out for sounding like I’m repeating what I said before, but repeating some statements in a long thread is how I’m hoping to de-risk people missing part of what I’m trying to share.

The intention of safe limits is to have free Mattermost deployments adopt automation and scale capabilities earlier in their lifecycles, with Mattermost’s customer-facing teams supporting champions in making the business case to their stakeholders and in transitioning smoothly.

Until these changes are made, we’ll continue to have situations where Mattermost champions are in “dilemmas of success”–deployments that have grown on the free version far past safe limits, where higher seat counts, high adoption, and accelerated workflows paradoxically increase the complexity and time needed for procurement.

We thought of 10,000 users as an “insanely high” number for the starting point of safe limits, for a free product whose recommended size is 50. The intention was to have impact limited to a very, very small number of deployments on the planet, which we could work with directly to ensure they’re continuing to smoothly operate.

In my mind, in running 10,000 users without enabling automation and scale capabilities, the Mattermost brand has the most extreme risk exposure: a) more things can go wrong, b) more end users impacted, c) longer time to mitigate the issues with procurement and implementation changes, d) geometrically greater brand risk given a * b * c.

So at the 10,000 user starting point, our thinking is we have a geometrically smaller number of deployments impacted (we estimate ~0.8% of free servers have more than 500 users, and the share above 10,000 is nearly zero), and a geometrically greater risk.

That said, I’d like to ask @bbodenmiller a clarifying question–with the caveat that this isn’t something we’re necessarily committing to doing, as there are many stakeholders involved in this–but for you or anyone in the community:

If we shipped user limit warnings at 10,000 in Feb, for admin only, and functional user limits were at 25,000 for March, would anyone consider that a breaking change?
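To make the question concrete, here is a minimal sketch of how a two-tier limit like this could behave, assuming the 10,000 warning threshold and 25,000 functional limit from the question above. The names (`WARNING_THRESHOLD`, `HARD_LIMIT`, `check_user_limit`) are hypothetical, and this is an illustration only, not Mattermost’s actual implementation or configuration.

```python
from dataclasses import dataclass

# Hypothetical thresholds taken from the question above;
# these are not actual Mattermost settings.
WARNING_THRESHOLD = 10_000  # admins start seeing a warning at this count
HARD_LIMIT = 25_000         # no new users can be added at this count


@dataclass(frozen=True)
class LimitStatus:
    show_admin_warning: bool  # True only for admins at or above the warning threshold
    can_add_users: bool       # False once the functional limit is reached


def check_user_limit(active_users: int, is_admin: bool) -> LimitStatus:
    """Illustrative two-tier check: warn admins first, block new users later.

    Assumes "at 10,000" means "at or above 10,000"; the original
    discussion does not specify the exact boundary behavior.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    over_warning = active_users >= WARNING_THRESHOLD
    return LimitStatus(
        show_admin_warning=over_warning and is_admin,
        can_add_users=active_users < HARD_LIMIT,
    )
```

In this sketch, a server between the two thresholds keeps working normally for end users; only admins see a warning, which is the window in which a transition or bridge plan would be worked out.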

Whether it’s 10,000 or 25,000 users, if the upgrade cycle is 3-6 months, and the functional limits to the free Team Edition ship in March, then it sounds like it’s around 5-8 months from now until users above 10,000-25,000 need to be deactivated from the system, or a bridge solution to a proof-of-value environment can be deployed, working with the Mattermost customer team.

Did I get that right?

I can’t personally conceive of running at 25,000 users on Team Edition, without scale and automation features enabled, without an incredible number of things breaking and being destructive to the brand.

I’m trying to understand if the issue is user limits in themselves, or certain free instances being impacted.

Regardless of the answer, it’s imperative that our customer-facing teams are connecting with our largest free deployments–especially ones we don’t know about yet–so upcoming changes don’t impact operations for any instance.

Side note: I think there’s going to be a safe limit for users set so high that effectively no one is impacted, and hence it’s not a breaking change. The analogy here is SpaceX Starship. My understanding is that when getting approval to fly an experimental 100 ton rocket over the ocean, there was a concern that falling debris from a failure might hit a whale. When the math was done, the surface area of whales relative to the surface area of the ocean was effectively zero. It’s kind of the same here: there’s some super high limit we can use as a starting point that won’t hit any actual instance, and therefore wouldn’t be a breaking change.

We’re planning to do a blog post based on what’s shared at the top of this thread (Feedback on Collaboration for Mission-Critical Work) which is about accelerating mission critical work in the world.

It’ll discuss our focus on critical infrastructure organizations with our enterprise offerings, and also our nonprofit self-managed license for communities below 1000 users. We’ll also be potentially announcing a new cloud offering for nonprofits over 1000 users, very steeply discounted, with a value exchange where they may receive earlier versions of new improvements to share their feedback ahead of rolling out to broader audiences.

We’ll be discussing the changes in our free offering as well, with recommendations on securing deployment configurations across private networks and behind VPNs to provide a layered security foundation for the platform. We’ll also be sharing about safe limits rolling out, starting at 10,000.

There’s no mechanism to provide feedback on the blog itself, but we can look into linking to the Mattermost forums, or this thread specifically.

Interestingly the reason we had “unlimited” on the pricing page was that we used to have a cloud freemium version limited to 10 free users.

We iterated on the 10 user limit in free, and put in “unlimited” for users, but with limits in other areas.

Then the cloud freemium version was discontinued, so there’s only the commercial cloud offerings now, but I think the pricing page needs to clean up the artifacts, because all versions–including paid versions–should have safe limits depending on their configuration.

Appreciate the feedback here, we should remove the out-of-date mentions.

100% agree.

I’m personally committed to ensuring we successfully bridge anyone impacted by this change.

It’ll be unsurprising to our customer-facing teams that transitions can take more than a few months.

Sometimes it feels like the larger the purchase, the longer it takes, even when you’ve got a ton of end users using the system, and more users at the doors asking for access.

Often it’s engineers, technical teams and operators driving Mattermost adoption. These are the folks at the core of every critical infrastructure company, and we’re focused on accelerating their most vital workflows.

When Mattermost transitions from free to commercial early, it’s an easier path because it’s within discretionary budgets of technical leadership, there’s validation from tech teams that their workflows are faster, that their transparency and focus are higher, and error rates are reduced–because Mattermost is built for operators and technologists.

Supporting Mattermost champions in making the business case, and winning the support of stakeholders is what our customer-facing function is built to do.

This includes all the support to keep things operational including figuring out bridge solutions, proper sizing, architecture selection, roll out planning, trial licenses, migration acceleration, and so forth.

In enterprises, when a free Mattermost deployment hits 100-500 users, IT often declares it a “successful pilot”. Then either the champion or procurement contacts Mattermost, or one of our resellers, and works with a mix of our online documentation, our system integrator community, and Mattermost support and technical account management teams on upgrading to one of our commercial versions, accelerating automation and scale, and smoothly rolling out to internal users.

There are larger transitions in the 1000s that have been a little more work. For a 10,000 user transition from free to commercial it might be a mix of pilot-to-production and upgrade-for-scale flows. I think the best outcome would be landing on the Kubernetes operator in a scaled, high availability environment so all the future management would flow smoothly.

That’s very good to hear. We’ve seen some really painful ones.

At the same time, no one really calls us when everything is working well, so our data isn’t super normalized.

One of the reasons for setting safe limits at 10,000 users on the Team Edition was that we didn’t expect IT departments to start considering a massive Microsoft Teams or Slack purchase, or exiting Mattermost altogether, when the limit is astronomically high.

A few thoughts here:

Developing a free-to-paid business case at 10,000 users.

If I was a CIO and someone told me there was a piece of open source infrastructure that had a new limit of 10,000 users, coming down over time, I imagine I’d be like “Okay. Sounds like we get 10,000 free users. Figure out who to remove so we can stay under the limit, and let me know when they announce limits coming down, and we can deal with it then.”

What we intend for safe limits to trigger, both at 10,000 users and in future at lower levels, is an assessment of Mattermost’s value among its user base.

This happens regularly in our paid deployments for initial purchase, for license expansion and for license renewals. There’s a process to speak to all the organizations using Mattermost to understand:

a) why they want it,

b) what’s the benefit for them of continuing to have it, and

c) whether the organizations would use their own budget to buy seats.

Some orgs may or may not have their own IT budgets, so sometimes it’s a “what if” assessment.

Through this process, the business value of Mattermost seats is established and the data can be provided to Central IT for review and budgeting.

In some enterprises, Central IT works through a charge-back system to group buy licenses for all the organizations that want them, and then get reimbursed by organizations using the shared services. In other enterprises, Central IT might just allocate the purchase from a shared services budget, or a departmental budget (e.g. product and engineering), or there could be a mix.

Meanwhile, it’s a priority for Mattermost’s customer-facing teams to support Mattermost champions in working with their organizations through this process, ensuring there’s no operational disruption, as well as sharing patterns, practices and materials to speed success in the commercial journey.

Why buy Mattermost, when Microsoft Teams and Slack are available?

I think Microsoft Teams has an estimated 300 million users right now, so I can certainly see the perspective that everything else is “niche”, perhaps Slack included.

What I can share publicly is what’s available on our website, which is the use of large Mattermost deployments for mission critical work.

This includes the U.S. Department of Defense using Mattermost for operational ChatOps in complex and real world environments, and the use of Mattermost in one of its largest-ever readiness exercises for providing airlift, aerial refueling, aeromedical evacuation, and humanitarian and disaster assistance.

We also enable critical infrastructure enterprises internationally, including utilities like RTE, which manages France’s national power grid.

So my perspective is slightly different.

I think of Mattermost’s focus on accelerating the world’s mission critical work as an important and large market, and a meaningful mission for our company and our community.

When the work of an organization is vital to the safety of a society, there’s an elevated need for resilience, and for platforms dedicated to the success of operators and their technical teams, which can adapt to their specific needs.

This is the business case made over and over again within our commercial customer community.

Resilience in Microsoft Teams environments

In the context of resiliency, Mattermost’s ability to supplement Microsoft Teams is an essential service for many customers who run both.

My co-founder and I are former Microsoft engineers who worked out of Redmond, and we have enormous respect for many of the security professionals who work there. At the same time, the cyber pressure Microsoft faces is extraordinary and increasing, due to the growing magnitude of confidential data they are protecting, advancements in offensive cyber capabilities, and the sheer ferocity of the newest generations of attacks.

It’s difficult for many critical infrastructure enterprises to get comfortable storing 100% of their confidential communications in Microsoft Teams as a multi-tenant shared service. Especially when it’s operating on a shared code base on public cloud with known points of ingress and egress, and uses availability keys in its encryption protocol.

I’ve talked to some security professionals who share that for safety reasons they need to operate under the assumption that Microsoft Teams is breached at all times, and any incident response or discussion of confidential vulnerabilities data needs to happen in an “out-of-band” system.

Part of this is the principle that security operations should be separated from the environments they’re protecting. So if security professionals are protecting a network, they should be on a separate network. If they’re protecting Microsoft Teams, defending against threats and responding to breaches or new vulnerabilities potentially in M365, they should be on a platform out-of-band from Microsoft Teams.

The other part of this is the principle that the effort that goes into a breach is proportional to the economic value of the breach. At 300 million users, the economic value of breaching Microsoft Teams is astronomically high for attackers, compared to the value of breaching a self-sovereign Mattermost system used by only a single target.

Also, this doesn’t apply just to Microsoft Teams. If you have any primary platform for collaboration that your security operations team is tasked to defend, they should have an out-of-band platform for security operations.

As a counter example, when you read through the public SolarWinds disclosures, it seems clear the security organization didn’t use out-of-band communication. So when their primary collaboration system was breached without their knowledge, information about vulnerabilities in their systems and in their customer base, as well as their response plans, was flowing to attackers, compounding the problem and creating a cycle where recovery was nearly impossible.

Beyond the security perspective, for resiliency in mission critical operations, out-of-band scenarios also apply to platform engineering, SRE, DevOps, and DevSecOps teams when organizations want control of their own uptime SLAs.

From this context, for any critical infrastructure enterprise running Microsoft Teams or any other primary collaboration platform at scale, there is an opportunity for Mattermost to provide out-of-band support for resilience in addition to workflow acceleration.

Focus and Adaptability to mission critical workflows

Mattermost provides operators, technologists and developers an environment focused on accelerating their vital workflows, increasing their visibility, and reducing error rates and blind spots in the delivery of mission critical work.

For technologists and DevSecOps practitioners, Mattermost’s ability to integrate with systems like GitLab, to be customized, to integrate with custom tooling, in-house systems, and custom security and compliance requirements, and to connect into DevSecOps infrastructure inside and outside the Microsoft portfolio all contribute to providing the focus needed to accelerate technical and operational success.

For operators in mission critical and high compliance environments, Mattermost’s open source code base and self-sovereign design let organizations adapt Mattermost to nearly any purpose–deployment in private and air gapped networks, integration with custom compliance tools, customized federated communications across security boundaries, deep AI automation and a host of scenarios and capabilities that centralized SaaS platforms would struggle to provide.

In contrast, general collaboration tools like Microsoft Teams are designed to serve “everyone”, and there can be a level of information overload for end users, a level of request overload to Central IT for supporting specialization, and a level of system limitation that makes mission critical work significantly more difficult to complete.

Without the right tool for the job, without the right support for the vital work of operational and technical organizations, the risk of blind spots, errors and delivery delays can elevate. This can lead to risks in retaining the strongest engineers, and recruiting new stars. Unaddressed, these risks all together can snowball towards compromising operational resiliency and success for the enterprise itself.

Mattermost is built to address these risks, both independently and as a supplement to Microsoft Teams through our interoperability investments: Mattermost and Microsoft Solutions

Circling back to your question on Slack, we don’t see them that often in critical infrastructure enterprises, and my understanding is our value propositions are quite different.

While Slack in its startup days served developers, after its acquisition by Salesforce its user interface was re-designed and it seems its long term direction has shifted to sales and marketing integration, with a new Slack CEO publicly announcing in 2023: “My vision is that we become the core engagement layer for all of Salesforce, for how people engage with Salesforce. There should be no starting point besides Slack.” Also, our understanding is that to become a Slack customer, CIOs need to approve Slack’s policy of deleting all customer IP stored in Slack when the customer downgrades their subscription tier.

Mattermost takes a different approach, with a focus on critical infrastructure enterprises and accelerating the success of operators, technologists and developers in their mission critical work, and offering self-sovereign deployment and full control of private data and IP.

@bbodenmiller thanks again for the feedback, which is what this thread is for. Very open to more input from you or any reader.