Connected Cache for Enterprise - A Cloud-Native Game Changer

Picture the following scenario:
It's Patch Tuesday. You've finally gotten the go-ahead to move your device patching to Autopatch, and all devices are onboarded. Then you start to get user reports of poor Teams meeting performance, and your CEO is struggling to open webpages.
You get a Teams message from someone on your network team saying your internet bandwidth utilisation is basically 100%.
All of that planning and hard work, destroyed in an instant, and trust in the solution you worked so hard to push for evaporates...
But there's nothing you could have done, is there?
Why is Caching Important?
You'd be surprised at how much data gets naturally transferred to and from an endpoint device that's on for 6-8 hours a day. Windows Updates (Feature, Quality, Driver), Defender Definitions, Inbox Windows Apps, Microsoft Store Apps, Intune Win32 Apps, Edge Updates, and Office Installs and Updates would all, without any configuration set, likely have to come directly from the internet.
If someone is working from home, bandwidth constraints are (generally) less of a problem. But if your organisation has at least one office with a non-negligible number of people working from or visiting it on a semi-regular basis, suddenly all of the bandwidth required by all those endpoints doing those updates can cause serious network problems.
Or what about that remote location with heavily constrained network bandwidth availability? You could easily make a satellite connection completely unusable with all of that traffic.
The idea of transparent proxy caching at scale isn't new either. A scenario I've seen it used heavily for is LAN parties like Dreamhack or Epic, where you've got hundreds or even thousands of bandwidth-hungry gamers trying to install the same stuff from Steam; solutions like LANCache can massively help there.

So unless you're that one higher education customer I spoke to, who had multiple 20Gbps internet connections to their sites and literally said "We don't care" when the question of bandwidth came up, you should probably be at least somewhat interested in anything that can take some of the strain off your internet connection.
And if you're not, I can almost guarantee your network team will be!
Gonna Get Myself Connected (Cache)
Let's start with a quick MCC 101. Microsoft Connected Cache isn't a new thing. MCC has been something you've been able to set up via Configuration Manager for some time. The basic principle is that a CM Distribution Point stores Delivery Optimization-supported content the first time devices need it, and subsequent requests for that same content are served from the local DP rather than the internet.

It's pretty trivial to actually set up, with the majority of "complexity" coming from the network configuration and policy settings needed to make sure clients know the cache exists and try to pull content from it.
But this obviously has one pretty massive caveat:
What if you're not using Config Mgr?
Well...
MCC for Enterprise and Education
Microsoft Connected Cache for Enterprise and Education is the same concept, but deployable locally and attached to an Azure resource, removing the need for any ConfigMgr infrastructure.
Instead, you can run it on either a Windows or Linux host somewhere on your network(s) and just kind of leave it to do its thing. Got some spare NUCs and 2TB NVMe drives lying about? Hell, with enough creativity, you can pretty much run it on anything...
From a technical perspective, it's essentially just some Docker containers being orchestrated by Azure IoT Edge. This makes it pretty flexible to deploy on Linux; on Windows, it utilises Windows Subsystem for Linux (WSL) as an abstraction layer to do the same thing.
That local node then both caches and serves content to devices requesting it, and fires telemetry up to the Azure resource so you can view some neat metrics.
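If you want to see those moving parts for yourself on a Linux node, a quick sanity check is to list the IoT Edge modules and the containers behind them. Here's a minimal Python sketch, assuming the standard iotedge and docker CLIs are present on the host and you can run them with sudo (module names vary by deployment, so treat the output as illustrative):

```python
import subprocess

def run(cmd: list[str]) -> str:
    """Run a command on the MCC host and return its stdout (raises if it fails)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

if __name__ == "__main__":
    # The IoT Edge runtime manages the modules (edgeAgent, edgeHub and the
    # Connected Cache module itself on a typical node); names differ per deployment.
    print(run(["sudo", "iotedge", "list"]))

    # Underneath, it's all just containers, so docker shows the same picture.
    print(run(["sudo", "docker", "ps", "--format", "{{.Names}}\t{{.Status}}"]))
```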
Best thing of all? It's free*!
*By "free", you've still got to have hardware for it to run on, but there's no cost for the Azure resource.
A Local Example In Action
What if you're weird like me and you've got an unnecessarily over-complicated home network, a VM host, and too much time on your hands?
You can deploy yourself a home MCC box, of course! Something that can actually be incredibly helpful if you're regularly spinning up VMs to test Autopilot and Intune stuff like I do.
Let's go through what I did as well as preliminary results, shall we...
Step 1: The Azure Resource
Genuinely as simple as just following the instructions. Done in just a few minutes!
Step 2: The MCC Node
For initial testing, I decided to use my Proxmox node to create an LXC host running Ubuntu 20.04, and because this is only for me, I even under-specced it, giving it just 2 cores, 1GB of RAM, and enough disk space to cover the minimum 50GB requirement:

From there, I simply followed the instructions to download the necessary scripts, set them as executable, and used the provided provisioning code to deploy the necessary node configuration. I had some initial trouble with Ubuntu's AppArmor, but that could just be down to the fact I'm doing something a bit... unorthodox.
Setup takes just a few minutes and as soon as that's done, MCC is ready to cache!
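Before touching any client policy, it's worth confirming the node will actually serve content. Here's a minimal sketch, assuming the cache verification path from the Connected Cache docs (double-check the current docs for the exact URL) and a node reachable over HTTP at a hypothetical hostname:

```python
import urllib.request

# Hypothetical hostname for the MCC node; swap in your own.
MCC_HOST = "mcc.local"

# Test path documented for verifying a Connected Cache node; cacheHostOrigin tells
# the cache which upstream CDN host to pull the file from on a miss.
TEST_PATH = "/mscomtest/wuidt.gif?cacheHostOrigin=au.download.windowsupdate.com"

with urllib.request.urlopen(f"http://{MCC_HOST}{TEST_PATH}", timeout=10) as resp:
    body = resp.read()
    print(f"HTTP {resp.status}, {len(body)} bytes returned")
    # A 200 with a small GIF payload means the node can reach the upstream CDN
    # and serve content; run it a second time and it should come from cache.
```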
Step 3: The Network Configuration
At home, I run OPNsense as my router/firewall, which is just a personal preference. However, because of the route I'm going with for my DO policy, it actually mattered here, as I needed to be able to set DHCP options on my Trusted VLAN DHCP range.
For the config I was going with, I needed to configure DHCP Option 234 as a GUID that would be used to identify the DO Group, and Option 235 to point to the hostname of my MCC host.
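If it helps to see what those two options boil down to, here's a minimal sketch that generates a Group ID for Option 234 and pairs it with a placeholder cache hostname for Option 235. How you define custom options differs between DHCP servers, so check your own platform's syntax:

```python
import uuid

# Option 234: an arbitrary GUID that identifies the DO group. Generate it once
# and reuse the same value on every scope that should share a group.
do_group_id = str(uuid.uuid4())

# Option 235: the DNS name of the MCC node (placeholder hostname).
do_cache_host = "mcc.example.internal"

print(f"DHCP Option 234 (string): {do_group_id}")
print(f"DHCP Option 235 (string): {do_cache_host}")
```

The important bit is that every scope that should sit in the same DO group gets the same GUID value.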

Step 4: The Intune Policy
Arguably the most important part, as without this, devices aren't going to know that a cache server exists, or how to use it.
The policy I ended up configuring looks like this:

This is just some tweaks to my OpenIntuneBaseline DO policy, which is itself derived from the excellent DO blog by Johan Arwidmark at 2Pint.
The key changes are:
- DO Cache Host - The DNS name for my MCC node. Technically not needed as I'm setting the cache host source below, but it's a useful backstop in case you mess up your DHCP Option config.
- DO Cache Host Source - Tells the client to use what's set in DHCP Option 235.
- DO Cache Server Fallback - How long to try to get content from the cache before going out to the CDN.
- DO Download Mode - Peering will cross NATs, but be restricted by the Group ID.
- DO Group ID Source - This tells it to use our DHCP Option 234 as the Group ID.
Technically, I could skip the DHCP options and rely entirely on DNS-SD and the fallback behaviour for DOGroupIdSource, but I went with what I knew. YMMV, so apply an appropriate configuration in your own environment. A quick way to check what the client actually picked up is shown below.
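Once the policy lands, it's worth confirming what the client actually received. Here's a minimal sketch that dumps DO policy values from the registry; the two paths below are assumptions based on where Policy CSP and GPO-delivered settings typically land, so verify them against the documentation for your build:

```python
import winreg

# Candidate locations for DO policy values (assumptions, not gospel):
# Policy CSP-delivered settings usually land under PolicyManager, while
# GPO/ADMX-delivered ones land under Policies.
CANDIDATE_KEYS = [
    r"SOFTWARE\Microsoft\PolicyManager\current\device\DeliveryOptimization",
    r"SOFTWARE\Policies\Microsoft\Windows\DeliveryOptimization",
]

def dump_do_policy(subkey: str) -> None:
    """Print every value under the given DO policy key, if it exists."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            print(f"-- {subkey}")
            index = 0
            while True:
                try:
                    name, value, _ = winreg.EnumValue(key, index)
                    print(f"{name} = {value}")
                    index += 1
                except OSError:
                    break  # no more values under this key
    except FileNotFoundError:
        print(f"-- {subkey} (not present)")

for candidate in CANDIDATE_KEYS:
    dump_do_policy(candidate)
```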
Step 5: The Results!
After getting it all set up and making sure clients could hit MCC correctly, I sent one VM through Autopilot, fired off the 24H2 Feature Update on another, and did some app installs on my physical laptop. Given the small-scale nature of my tests, my cache efficiency is rubbish, but it's definitely an interesting graph!

A subsequent test (once I was confident that things had been cached) gave me even more incredible results! The below are local stats from a VM I sent through Autopilot and then immediately put through the 23H2 > 24H2 upgrade:

The above represents an approximate 94% Bandwidth Saving!
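For clarity, that percentage is just the bytes served locally (cache plus peers) divided by the total bytes downloaded. A quick sketch with made-up numbers in the same ballpark as my test:

```python
# Made-up numbers purely to show the maths, not my actual stats.
bytes_from_cache_server = 9_400_000_000   # served by the local MCC node
bytes_from_peers = 0                      # no other peers in this tiny lab
bytes_from_internet = 600_000_000         # pulled straight from the CDN

total = bytes_from_cache_server + bytes_from_peers + bytes_from_internet
saving = (bytes_from_cache_server + bytes_from_peers) / total
print(f"Bandwidth saving: {saving:.0%}")  # -> 94%
```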
I've personally seen MCC and DO save tens of terabytes of data from being downloaded from the internet in the space of a month in a (comparatively) small environment.
Now scale that up to hundreds or even thousands of endpoints across multiple sites, and it's very easy to see why this is an absolute game changer, not just for organisations using Intune for app deployments and Windows Updates, but also for those building cloud-native devices via Autopilot.
Important Points of Note
So you're sold on the idea and want to get this out in the wild as soon as you can? Great! But let's just cover some of the things you're going to need to do. And yup, these are non-negotiable.
- Speak to your Network Team!
It feels stupid having to mention it, but you're not getting far without full cooperation from whoever runs your network. There are (again, non-negotiable) network connectivity requirements for both the devices running MCC as well as the clients utilising it. Do any of the following apply to your environment?
- Unauthenticated proxy access blocked
- SSL inspection forced on all traffic
- Inability to allow access via wildcard URLs
- VPN that doesn't allow or support split tunnelling/local breakout
If so, you're likely already having a terrible time with devices accessing cloud services, but the above will straight-up torpedo any possibility of utilising MCC.
Honestly, it's in their best interest to help you here, too.
- Speak to your Security Team!
Another thing that sounds stupid, but whatever device(s) you're running MCC on need to stay up to date. While this is hopefully an existing process for your Windows endpoints, a completely different security discussion has to happen if you're allowing the use of WSL on specific devices.
Conversely, if you decide running it on Linux is easier/better, how are you keeping the host updated? Microsoft will be keeping the Azure IoT and MCC containers patched, but if you don't have an existing patch process or strategy for Linux, you'd need to make one before deploying this.
- Learn about Delivery Optimization
Connected Cache is the orchestrator and really important, but it's just a small part of the solution. From a Windows client perspective, it's ALL in the DO config, so do yourself a favour and get intimately familiar with the DO docs, the available settings, and how they interact with each other before diving in.
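One practical way to build that familiarity is to watch what DO actually does on a test client while content downloads. Here's a minimal sketch, assuming a Windows client with the built-in Delivery Optimization PowerShell cmdlets (the exact properties, like BytesFromCacheServer, vary by Windows build, so treat the selection as illustrative):

```python
import json
import subprocess

# Ask PowerShell for the current DO job status and return it as JSON.
ps_command = (
    "Get-DeliveryOptimizationStatus | "
    "Select-Object FileId, DownloadMode, BytesFromHttp, BytesFromPeers, BytesFromCacheServer | "
    "ConvertTo-Json"
)

result = subprocess.run(
    ["powershell.exe", "-NoProfile", "-Command", ps_command],
    capture_output=True, text=True, check=True,
)

jobs = json.loads(result.stdout) if result.stdout.strip() else []
if isinstance(jobs, dict):  # ConvertTo-Json emits a single object when there's one job
    jobs = [jobs]

for job in jobs:
    print(job)
```

Kick off a Store app install or an update scan on the client, run this a few times, and you can see at a glance whether bytes are coming from the cache server, peers, or straight from the internet.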

- Identify the Right Environment Configuration
The settings I used in my local example above came from knowledge gained implementing the CM version of MCC with a customer, but you need to apply config that's right for your environment, not just what some guy on the internet says. If you've got hundreds of sites with hundreds of DHCP scopes, or some wacky DNS zone configuration, you're going to need to come up with something more manageable.
I'd argue that the Connected Cache for Enterprise solution should be a near-mandatory inclusion in any organisation's shift to cloud management. It may well surface some challenges caused by historic decisions, but just because "that's how we've always done it" doesn't mean it's going to work moving forward.
Thanks for reading!