r/sysadmin 11h ago

You have to be joking Microsoft

1.2k Upvotes

Is the move to full cloud even worth it anymore? These constant outages are making me think I should just stick to my hybrid setup.


r/linux 13h ago

Distro News Debian Urgently Seeks Volunteers After Data Protection Team Resigns

linuxiac.com
1.0k Upvotes

r/kubernetes 1h ago

Help choosing a distributed storage solution


I’m running a small 3-node cluster on mini PCs in my home lab for things like Nextcloud, databases, and other services that need persistent storage. Currently all persistent volume claims are backed by my main NAS via NFS, but too many times I’ve had unexpected downtime because the NAS decided to break. I want to replicate identical data across drives in my cluster for high availability and redundancy. What would be the best way to handle this?

All three are equipped with an i5-7500, 32 GiB RAM, a 256 GB NVMe drive, and a 1 TB SATA SSD intended to be the replicated disk, and they are connected to a 1 GbE switch since they don’t have any faster NICs installed. I’ve looked into Longhorn and Ceph, but both strongly recommend 10 GbE, which is not possible for me. I’ve looked at MinIO/Garage, but that would only give me S3, which feels limiting (though I don’t have a lot of experience with object storage, so I may be naive in my thinking).
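A rough back-of-the-envelope sketch of why the 10 GbE recommendation matters, under a deliberately simplified model (an assumption, not how Longhorn or Ceph actually schedule I/O): every write is synchronously copied to each of the other replicas over the writing node's single NIC, so that link is shared by (replicas - 1) copies.

```python
def link_mb_per_s(gbit_per_s: float) -> float:
    """Theoretical link ceiling in MB/s (decimal units, no protocol overhead)."""
    return gbit_per_s * 1000 / 8


def replicated_write_ceiling(gbit_per_s: float, replicas: int) -> float:
    """Upper bound on sustained write throughput (MB/s) when each write must be
    sent to (replicas - 1) peer nodes over the same NIC.

    Simplified model (assumption): one local copy plus a network copy to every
    other replica; ignores protocol overhead, reads, and rebuild traffic.
    """
    if replicas < 2:
        raise ValueError("need at least 2 replicas for network replication")
    return link_mb_per_s(gbit_per_s) / (replicas - 1)


if __name__ == "__main__":
    for speed in (1, 10):
        ceiling = replicated_write_ceiling(speed, 3)
        print(f"{speed} GbE, 3 replicas: ~{ceiling:.1f} MB/s")
```

Under these assumptions the network, not the SATA SSD, is the bottleneck on 1 GbE (about 62.5 MB/s of replicated writes), and even at full line rate resyncing a whole 1 TB replica would take at least ~2.2 hours (1,000,000 MB / 125 MB/s).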


r/sysadmin 5h ago

Don't forget to request SLA compensation for today's 365 outage

238 Upvotes

Today’s outage, if it affected you, should have lasted long enough to qualify for an SLA payout. Make sure you look up how to submit a claim. It may not be worth the effort if you are a small direct customer, but if you purchase through a VAR or CSP, they should handle most of the process for you. Typically, you will only need to provide specifics that Microsoft requires, such as the start time, end time, and the number of licenses affected.

Microsoft can be inconsistent with the compensation amounts. We have received some significant refunds for past outages, as well as a few that were honestly quite insulting.
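To sanity-check whether an outage plausibly crosses an SLA threshold before filing, here is a minimal sketch of the monthly-uptime arithmetic. It assumes the whole tenant was affected for the entire outage window (Microsoft's published formula works in user-minutes, so partial outages count for less) and uses 99.9% only as an example target; check the current Microsoft SLA document for the actual targets and credit tiers.

```python
def monthly_uptime_pct(outage_minutes: float, days_in_month: int = 30) -> float:
    """Uptime percentage for the month, assuming every user was affected for
    the whole outage (a simplification of user-minute weighting)."""
    total_minutes = days_in_month * 24 * 60
    return (total_minutes - outage_minutes) / total_minutes * 100


def allowed_downtime_minutes(target_pct: float, days_in_month: int = 30) -> float:
    """Minutes of downtime a month can absorb before dropping below target_pct."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - target_pct / 100)


if __name__ == "__main__":
    outage = 9.5 * 60  # example: a roughly 9.5-hour outage
    uptime = monthly_uptime_pct(outage)
    print(f"uptime: {uptime:.2f}%")
    print(f"below example 99.9% target: {uptime < 99.9}")
    print(f"99.9% allows ~{allowed_downtime_minutes(99.9):.1f} min per 30-day month")
```

In a 30-day month a 99.9% target leaves only about 43 minutes of downtime, so a multi-hour outage lands well below it even before any user-minute weighting.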


r/sysadmin 10h ago

Rant I Feel Like Nobody Knows Anything Anymore

640 Upvotes

I'm a relatively new sysadmin. Been in my current role for a few years, worked my way up from call center helpdesk to desktop support and now here. Even got myself a promotion to a higher grade sysadmin on my team. I'm at a stage in my career where I can generally work independently, but I still do need some mentorship and guidance, especially with niche applications and systems.

There is nobody. I'm expected to fly solo in a world where all the search engines are broken, every application either has or is pretending to have some bullshit LLM thing slapped on top of it, MS's documentation and infrastructure are total garbage, and every learning opportunity is a sales pitch or an outright grift. I spend 60-70% of my day just trying to figure out how to do the simplest things with broken tools. Workarounds piled on top of workarounds.

Couple that with all the outages in the past year, and I feel like I'm in the wrong career. Many days, it just feels like the whole tech world has lost its goddamn mind. Does anybody actually know how to write any software anymore? Does anybody actually know how to wire up a network anymore? Does anybody actually know how to do ANYTHING??

I go to get official MS-developed stuff off GitHub and find codebases riddled with vibe-coded nonsense and documentation full of typos. I try to wrestle Intune into shape, try to get our environment squared away for Win11, and I feel like I'm fighting my tools more than anything else. Nothing works anymore. Nobody knows what they're doing. It's all coming down.

I make good money to do what I do, but man this is a frustrating, extremely stressful career. I feel like I spend all my time in pointless meetings with people who don't know what they're talking about, and there is no higher authority I can appeal to, no-one I can ask for help. Things fall apart and the center cannot hold.

Cheers


r/kubernetes 13h ago

How do you handle orphaned ConfigMaps and Secrets without breaking prod?

14 Upvotes

I'm doing some spring cleaning on our clusters and seeing tons of ConfigMaps and Secrets that look unused, but I'm paranoid about deleting them.

You know the deal: teams refactor, Helm releases get abandoned, but the old configs stick around because kubectl apply doesn't prune them automatically. Since K8s garbage collection only works if ownerReferences are set (which we often miss), they just pile up.

How are you guys handling this?

  • Manual cleanup? (Sounds like a nightmare)
  • Custom scripts? (Grepping for references in all manifests?)
  • Just let them rot? (Storage is cheap, right?)

I'm specifically worried about edge cases like secrets used in Ingress TLS or imagePullSecrets that are harder to track down than standard volume mounts.

Anyone have a solid workflow for this that doesn't involve "scream testing" (delete and wait for someone to complain)?
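One way to avoid pure scream testing is to build a reference set from live objects before deleting anything. A minimal sketch, assuming you feed it the parsed items arrays from kubectl get configmaps/secrets/pods/ingress -A -o json. It covers plain and projected volumes, env/envFrom, imagePullSecrets, and Ingress TLS, and skips objects Kubernetes or Helm manage themselves. It does NOT see references from CRDs, ServiceAccounts, workloads scaled to zero, CronJobs between runs, or apps reading objects via the API, so treat the output as candidates to review, not a delete list. All function names are hypothetical.

```python
from typing import Iterable, Optional

Ref = tuple[str, str]  # (namespace, name)

# Managed by Kubernetes or Helm themselves; skipped to cut noise.
SKIP_CONFIGMAPS = {"kube-root-ca.crt"}
SKIP_SECRET_TYPES = {"kubernetes.io/service-account-token", "helm.sh/release.v1"}


def pod_refs(pod: dict) -> tuple[set[Ref], set[Ref]]:
    """Collect (configmap_refs, secret_refs) from a single Pod object."""
    ns = pod["metadata"]["namespace"]
    spec = pod.get("spec", {})
    cms: set[Ref] = set()
    secrets: set[Ref] = set()

    def add(target: set[Ref], name: Optional[str]) -> None:
        if name:
            target.add((ns, name))

    # Plain and projected volumes.
    for vol in spec.get("volumes", []):
        add(cms, vol.get("configMap", {}).get("name"))
        add(secrets, vol.get("secret", {}).get("secretName"))
        for src in vol.get("projected", {}).get("sources", []):
            add(cms, src.get("configMap", {}).get("name"))
            add(secrets, src.get("secret", {}).get("name"))

    # env and envFrom in regular and init containers.
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        for env in c.get("env", []):
            value_from = env.get("valueFrom", {})
            add(cms, value_from.get("configMapKeyRef", {}).get("name"))
            add(secrets, value_from.get("secretKeyRef", {}).get("name"))
        for env_from in c.get("envFrom", []):
            add(cms, env_from.get("configMapRef", {}).get("name"))
            add(secrets, env_from.get("secretRef", {}).get("name"))

    # Image pull secrets, one of the easy-to-miss cases.
    for ips in spec.get("imagePullSecrets", []):
        add(secrets, ips.get("name"))
    return cms, secrets


def ingress_tls_refs(ingress: dict) -> set[Ref]:
    """Secrets referenced by Ingress spec.tls[].secretName."""
    ns = ingress["metadata"]["namespace"]
    tls_entries = ingress.get("spec", {}).get("tls", [])
    return {(ns, t["secretName"]) for t in tls_entries if t.get("secretName")}


def orphan_candidates(
    configmaps: Iterable[dict],
    secrets: Iterable[dict],
    pods: Iterable[dict],
    ingresses: Iterable[dict],
) -> tuple[set[Ref], set[Ref]]:
    """Return (configmaps, secrets) that no Pod or Ingress references."""
    used_cms: set[Ref] = set()
    used_secrets: set[Ref] = set()
    for pod in pods:
        cms, secs = pod_refs(pod)
        used_cms |= cms
        used_secrets |= secs
    for ing in ingresses:
        used_secrets |= ingress_tls_refs(ing)

    def key(obj: dict) -> Ref:
        return (obj["metadata"]["namespace"], obj["metadata"]["name"])

    cm_orphans = {
        key(cm) for cm in configmaps
        if cm["metadata"]["name"] not in SKIP_CONFIGMAPS and key(cm) not in used_cms
    }
    secret_orphans = {
        key(s) for s in secrets
        if s.get("type") not in SKIP_SECRET_TYPES and key(s) not in used_secrets
    }
    return cm_orphans, secret_orphans
```

A cautious workflow would be to label the candidates first and only delete them after a full release cycle has passed without anything breaking.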


r/sysadmin 11h ago

General Discussion The "Green Dashboard" is gaslighting my entire department

466 Upvotes

It’s happening again.

Tickets are flooding in. "Outlook isn't syncing." "Teams messages are failing." My phone is vibrating off the desk.

I check the Microsoft Service Health Dashboard.

There is nothing more infuriating than having to tell 500 panicked users (and my boss) that "Yes, it is broken," while the vendor insists everything is fine. I finally dug up the advisory MO1221364 buried in the admin center, blaming a "third-party networking issue" (classic).

Can we talk about the emotional toll of this? We are the ones on the front lines taking the heat, while the dashboard stays green for 4 hours to protect their SLA credits.

How many of you are currently staring at a "Healthy" dashboard while your infrastructure burns?


r/sysadmin 12h ago

Microsoft Latest MS update: "We're continuing to review what actions are required to restore the affected infrastructure to a heathy [sic] state and rebalance the service traffic to achieve recovery." -Ruh roh

432 Upvotes

The kind of thing you say when you have no idea what's going on.


r/kubernetes 1h ago

What is the best way to reduce inherited dependencies in Kubernetes workloads?


Our Kubernetes deployments often inherit dozens, sometimes hundreds, of unnecessary packages from base images. These increase vulnerability exposure, bloat images, and make debugging runtime issues a nightmare. We try pruning, but it's tricky to know which system libraries or language runtimes are safe to remove.

Do you build minimal images or prune existing ones? How do you ensure compatibility with Kubernetes tools and sidecars while keeping the attack surface low?


r/sysadmin 13h ago

Microsoft 365 Exchange down?

501 Upvotes

Can't send or receive any emails all of a sudden. Are they down?


r/kubernetes 23h ago

Making and Scaling a Game Server in Kubernetes using Agones

noe-t.dev
41 Upvotes

Hi everyone. I just wrote an article about using Agones, a Kubernetes framework for running and orchestrating game servers. This is my first time writing a blog article, and I’d really appreciate any feedback or advice you might have.

In this article, I go over the development of a basic game in Go, its integration with Agones, building a matchmaking service also in Go and deploying everything with autoscaling based on player activity.

Also, since this has become an issue on this subreddit recently, I just want to clarify that this article is not AI-generated slop but very much human-made slop 😅. That might be worse, given English is not my first language, but I hope you’ll still enjoy it.


r/kubernetes 11h ago

[Help] with K3S + Traefik + Gateway API + TCP/UDPRoutes

3 Upvotes

Hi all,

I am playing with K3S to try and learn a bit of Kubernetes. I have set up a Fedora VM with K3S and, following recent docs, I am trying to set up the Gateway API, which is meant to replace Ingress.

K3S comes with Traefik installed via Helm, and as per their docs "you should customize Traefik by creating an additional HelmChartConfig manifest in /var/lib/rancher/k3s/server/manifests". Following Traefik's docs, I created such a file to enable the Gateway API, disable Ingress, and then enable Traefik's dashboard and create an HTTPRoute for it:

https://paste-bin.org/deahjffpii

This is working perfectly fine, and I can access Traefik's dashboard by browsing to https://traefik.k3s.local.

Now, I want to be able to create not only HTTPRoutes but also TCPRoutes and UDPRoutes, as I am trying to set up Syncthing as a deployment in the environment.

Traefik's documentation says to enable "experimentalChannel" to support TCPRoutes and UDPRoutes: https://doc.traefik.io/traefik-hub/api-gateway/reference/install/ref-helm. The Traefik Helm chart installed here is version 37.1.1, and these are the values that can be used to customize it: https://github.com/k3s-io/k3s-charts/blob/main/charts/traefik/37.1.1%2Bup37.1.0/values.yaml. That file references the "experimentalChannel" setting as well, so I just added it to the previous HelmChartConfig file:

[...]
# Enable Gateway API and disable Ingress
    providers:
      kubernetesGateway:
        enabled: true
        experimentalChannel: true
      kubernetesIngress:
        enabled: false
      kubernetesCRD:
        enabled: true
[...]

Helm reloads Traefik just fine, but when I try to create a TCPRoute or UDPRoute, I keep getting this error:

Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: [resource mapping not found for name: "syncthing-tcp" namespace: "syncthing" from "": no matches for kind "TCPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first, resource mapping not found for name: "syncthing-udp" namespace: "syncthing" from "": no matches for kind "UDPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first, resource mapping not found for name: "syncthing-discovery" namespace: "syncthing" from "": no matches for kind "UDPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first]
helm.go:92: 2026-01-22 18:07:48.516328647 +0100 CET m=+0.768768674 [debug] [resource mapping not found for name: "syncthing-tcp" namespace: "syncthing" from "": no matches for kind "TCPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first, resource mapping not found for name: "syncthing-udp" namespace: "syncthing" from "": no matches for kind "UDPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first, resource mapping not found for name: "syncthing-discovery" namespace: "syncthing" from "": no matches for kind "UDPRoute" in version "gateway.networking.k8s.io/v1alpha2"
ensure CRDs are installed first]
unable to build kubernetes objects from release manifest

I have tried many things, but nothing seems to work. I don't want to mess with how K3S installs Traefik, but I'm not sure what else to try. Any ideas?!

Cheers
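The error says the API server has no TCPRoute/UDPRoute kinds at gateway.networking.k8s.io/v1alpha2, i.e. the CRDs themselves appear to be missing. Judging by that message, enabling experimentalChannel turned on Traefik's support for those routes but did not install the CRDs here. A minimal sketch to confirm which of the two CRDs exist and serve v1alpha2, assuming you save the output of kubectl get crd -o json to a file. The CRD names follow Gateway API's plural.group naming, and the script name is hypothetical.

```python
import json
import sys

GROUP = "gateway.networking.k8s.io"
VERSION = "v1alpha2"
# Experimental-channel route kinds the Syncthing manifests need.
WANTED = {"TCPRoute": f"tcproutes.{GROUP}", "UDPRoute": f"udproutes.{GROUP}"}


def missing_kinds(crd_list: dict) -> list[str]:
    """Return kinds from WANTED whose CRD is absent or does not serve VERSION."""
    served: dict[str, set[str]] = {}
    for crd in crd_list.get("items", []):
        name = crd["metadata"]["name"]
        served[name] = {
            v["name"] for v in crd.get("spec", {}).get("versions", []) if v.get("served")
        }
    return sorted(
        kind for kind, crd_name in WANTED.items()
        if VERSION not in served.get(crd_name, set())
    )


if __name__ == "__main__":
    # Usage: kubectl get crd -o json > crds.json && python check_crds.py crds.json
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as f:
            missing = missing_kinds(json.load(f))
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print("usage: check_crds.py crds.json")
```

If both kinds come back missing, the likely next step is installing the Gateway API experimental-channel CRDs for the Gateway API release your Traefik version supports, before applying the routes.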


r/sysadmin 2h ago

Microsoft back online. Excuse: too many servers were shut down during maintenance.

29 Upvotes

“Preliminary root cause: We identified that the issue was caused by elevated service load resulting from reduced capacity during maintenance for a subset of North America hosted infrastructure.”

For 9 and a half hours? You can’t shift the traffic to another region? You can’t abort the maintenance and turn it back on? This smells fishy…


r/linux 13h ago

Software Release I wrote a configurable browser launcher.

147 Upvotes

More than a pretty launcher, Switchyard lets you configure websites to open in a given browser based on domain matches, patterns, and regular expressions. It’s inspired by apps like Choosy on the Mac.
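For anyone curious how rule-based launching works in general, here is a minimal sketch of first-match routing over the three rule types mentioned (domain match, pattern, regular expression). This is NOT Switchyard's code or config format. The Rule structure, the glob-on-hostname interpretation of "pattern", and the browser names are all hypothetical.

```python
import fnmatch
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Rule:
    kind: str     # "domain", "pattern" (glob on hostname), or "regex" (on full URL)
    value: str
    browser: str


def matches(rule: Rule, url: str) -> bool:
    host = urlparse(url).hostname or ""
    if rule.kind == "domain":
        # Exact domain or any subdomain of it.
        return host == rule.value or host.endswith("." + rule.value)
    if rule.kind == "pattern":
        return fnmatch.fnmatch(host, rule.value)
    if rule.kind == "regex":
        return re.search(rule.value, url) is not None
    raise ValueError(f"unknown rule kind: {rule.kind}")


def pick_browser(rules: list[Rule], url: str, default: str) -> str:
    """Return the browser of the first matching rule, else the default."""
    for rule in rules:
        if matches(rule, url):
            return rule.browser
    return default


if __name__ == "__main__":
    rules = [
        Rule("domain", "github.com", "firefox"),
        Rule("pattern", "*.google.com", "chromium"),
        Rule("regex", r"^https://meet\.", "chromium"),
    ]
    print(pick_browser(rules, "https://gist.github.com/x", default="epiphany"))  # firefox
```

Evaluating rules in order and stopping at the first match keeps behavior predictable when several rules overlap.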

Find it on Flathub: https://flathub.org/en/apps/io.github.alyraffauf.Switchyard

Or GitHub: https://github.com/alyraffauf/switchyard


r/sysadmin 6h ago

They actually labelled them false positive

52 Upvotes

LMAO! Microsoft had the balls to label today's Exchange and Teams issues as a false positive!

WOW. That's craziness.


r/sysadmin 9h ago

Is anyone back up yet?

83 Upvotes

Microsoft 364 Service Health says they're deploying mitigations and monitoring... but I haven't seen any change yet. Not a single external email is coming in.

Is anyone else getting anything yet?

Edit: we're good now.


r/sysadmin 14h ago

General Discussion Does anyone have a user with an extreme setup that you don't even know where to start with?

206 Upvotes

So I have a user who was having Outlook issues. They hit the toggle to switch over to New Outlook to see if it would fix it (it did, ironically enough), but it wouldn't show all their folders.

They hit me up and asked about it. I saw there was a "Show more folders" button at the bottom of the list and hit it. I got a warning about a 10,000-folder limit: if you proceed, it will show all your folders, but in alphabetical order.

I queried his mailbox and this user had close to 15,000 folders just in their main Inbox. WHY? I don't know.

Mind you, this user has Auto Archive turned on for anything older than 2 years, so it's not like he has a treasure trove of old emails.

So I told him if he wanted to use New Outlook, his folders would have to be in alphabetical order. He then asks if we could schedule a meeting to discuss what that meant. I just swapped him back to Classic and the issue he was apparently having was gone, and he was good.

Eventually, he will have to deal with his monstrosity of a folder structure, but not today, thankfully.

So ya, anyone have a crazy user experience?

EDIT - I know not related to IT but this particular user is a flat-earther. Make of that what you will.


r/linux 19h ago

Alternative OS 30 years of ReactOS

reactos.org
327 Upvotes

r/sysadmin 12h ago

Widespread Connectivity Issues? M365 Admin, Exchange Online PS, and GitHub Actions

119 Upvotes

Is anyone else seeing major instability across the Microsoft stack right now?

I'm currently experiencing:

  • M365 Admin Center: Pages are only partially loading or timing out completely.
  • Exchange Online: Cannot establish a session via PowerShell (Connect-ExchangeOnline fails).
  • GitHub Actions: Significant delays in workflow runs; jobs are queuing for much longer than normal.

It seems like a broader connectivity issue affecting multiple services. I haven't seen an official MO post in the health dashboard yet because the dashboard itself is barely loading.

Can anyone confirm if they are seeing similar behavior?


r/sysadmin 12h ago

What is going on lately

98 Upvotes

Cloudflare went down last year, AWS and Azure maybe a couple of months ago, Verizon last week. This is worse than Y2K.


r/kubernetes 21h ago

Helm + container images across clusters... need better options

8 Upvotes

Running container images via Helm across clusters is a mess. Every small change in image or values can break stuff. Charts get messy fast. Env overrides, tags, versions all pile up. I tried Chainguard for auditing and building images, but it feels heavy and rigid for our setup. Any suggestions for something lighter or more flexible that works at scale? Workflows, tools, whatever. Need ideas.
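One common way to tame per-cluster overrides is a single base values file plus one small override file per cluster, passed with repeated -f flags (later files win). The sketch below mimics how Helm combines values files: maps merge key by key, while scalars and lists from the later file replace earlier ones outright. It is an illustration for reasoning about override stacks, not Helm's code, and the file contents are hypothetical.

```python
import copy
from typing import Any


def merge_values(base: dict, override: dict) -> dict:
    """Return base with override applied: nested dicts merge recursively,
    everything else (scalars, lists) in override replaces the base value."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def render_values(*layers: dict) -> dict:
    """Apply layers left to right, like `helm install -f base.yaml -f prod.yaml`."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged = merge_values(merged, layer)
    return merged


if __name__ == "__main__":
    base = {"image": {"repository": "registry.example.com/app", "tag": "1.4.2"},
            "replicaCount": 2, "extraArgs": ["--log-level=info"]}
    prod_eu = {"image": {"tag": "1.4.3"}, "extraArgs": ["--log-level=warn"]}
    print(render_values(base, prod_eu))
```

Keeping each cluster's file down to the few keys that genuinely differ (often just the image tag and replica count) and diffing the output of helm template per cluster makes it much easier to see what a change will actually touch.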


r/sysadmin 17h ago

Question Do you permit selling or giving old equipment to employees?

219 Upvotes

Do you or your company permit giving/selling old equipment to employees?
When I started at my current employer, the tech at my site would give old but usable equipment to employees.
However my supervisor changed the policy to no longer allow this and I had to deal with people insisting that I give them old equipment for home use.
The policy had changed because some old voip phones that were being disposed of showed up on FB Marketplace with the company logo visible in the pictures.


r/linux 19h ago

Discussion Prominent Intel Compiler Engineer Heads Off To AMD

phoronix.com
222 Upvotes

r/kubernetes 18h ago

Announcing the Checkpoint/Restore Working Group

kubernetes.io
4 Upvotes

This new Kubernetes working group will focus on the Checkpoint/Restore in Userspace ecosystem, including CRIU itself and related tools (checkpointctl, criu-coordinator, checkpoint-restore-operator).


r/sysadmin 9h ago

End-user Support Lol. It feels good to punt the IT help tickets back to "pending" cause not my problem

48 Upvotes

We use Slack more than email nowadays at my state gov workplace, so I'm just telling people "go look at https://status.cloud.microsoft/" and see you tomorrow, because there's nothing we (local IT) can do about it and I'm not salaried, so I don't even care after hours.