nirik :fedora: :redhat:
@nirik.fosstodon.org.ap.brid.gy
6 followers 1 following 270 posts
Sysadmin shouting at clouds. #fedora #redhat [bridged from https://fosstodon.org/@nirik on the fediverse by https://fed.brid.gy/ ]
nirik.fosstodon.org.ap.brid.gy
Today here in #fedora work:
* Some meetings.
* Updated the room icons for our matrix meeting rooms. Great work from the design team!
* Dug around a bunch on hardware planning for next year and warranty stuff.
* Looked into some mail problems, unsubscribed a user possibly causing bounces.
* Filed […]
Original post on fosstodon.org
Reposted by nirik :fedora: :redhat:
mattblaze.federate.social.ap.brid.gy
Apparently there's a new wave of people joining mastodon, and with them a new wave of self-appointed cops "welcoming" them with long lists of mostly fictitious "rules" they need to follow.

This is a social media platform, with many different ways to use it. It's fine. Mostly, just try to be […]
Original post on federate.social
Reposted by nirik :fedora: :redhat:
lilpecan.mastodon.social.ap.brid.gy
Today I saw someone toot, in dismay, that people are posting their everyday life to Mastodon when they could simply buy a diary for $1.99. I can't stop thinking about this.
What if this is all they have? What if they're isolated with or without people in their life? What if the only kind word they […]
Original post on mastodon.social
Reposted by nirik :fedora: :redhat:
glyph.mastodon.social.ap.brid.gy
[uspol]

these things are simply true and good:

- trans rights are human rights.
- everyone deserves health care.
- human movement should not be a crime.

it doesn't matter if there are complex philosophical and biological questions about the abstract nature of gender. it doesn't matter if […]
Original post on mastodon.social
Reposted by nirik :fedora: :redhat:
paulehoffman.infosec.exchange.ap.brid.gy
I have discovered that some reading-books-on-devices people don't know about https://standardebooks.org/. They produce well-edited, great-looking editions of free books. Of course, this means that a lot of their books are old, but they do some recent things as well. They live on donations […]
Original post on infosec.exchange
nirik.fosstodon.org.ap.brid.gy
Another Saturday, another weekly recap!

This week: mailman bounce fun, small datacenter move update, mass update/reboot recap, builder adjustments, DANE updated, slim7x and Radxa Orion O6 info.

https://www.scrye.com/blogs/nirik/posts/2025/10/04/infra-weekly-recap-early-october-2025/

#fedora
infra weekly recap: Early October 2025
Another Saturday, so time for another weekly recap!

## Looping mailing list bounces are not fun
We had a bit of fun in the early part of the week with our mailman server alerting about having a lot of mails in the queue. Looking at it, I found that they were almost entirely bounces, but why? Well, it was a sad confluence of events: some providers send bounce emails that are almost completely useless. I'll go ahead and name names: mimecast (used by Red Hat) and ibm (their own thing I guess?). These bounces don't include the original email, don't include headers from the original email, and don't include who the email was sent to. So, for example, say a fictional someone named Bob Smith signs up for a fedora list with [email protected]. They then leave the company, and emails to them bounce with a message saying "foobar@somethinginternal" bounced. You have no way to tell who it really was unless their internal name and external name match up. mailman cannot process these bounce messages at all, so it just drops them. But it gets worse. If someone in that state was an owner of a list, and they also enabled the 'send bounce emails you cannot process to list owners' option, then... the email bounces, the bounce can't be processed, it gets sent to the list owners, where it causes another bounce, and so on. ;( I managed to figure out the addresses causing the current issue, but it's frustrating (there's a small sketch of the parsing problem further down this post). </end rant>

## rdu2-cc to rdu3 dc move
The move of our rdu2 community cage hardware to rdu3 continues. I was working on network acls to pass to the networking folks this week. Hopefully I got everything to at least a working starter state. It's still looking like we are going to try a November move, but I am hoping we can get a new replacement machine in before that so I can actually migrate pagure.io before the move. The rest of the hosts there are not too critical and can be down, but it would be nice to avoid downtime for pagure.io.

## mass update/reboot
We did another mass update/reboot cycle this week. We wanted to get everything up and on the latest updates before going into final freeze next week. To give a bit of history here, you may wonder why we do these periodic outages instead of just making it so everything is up all the time? We may consider that again, but at least in the past there were problems with databases and a few other difficult-to-manage things. Of course you can definitely set up databases with clusters these days, and we might try to move to that at some point, but in the past failing over and back was prone to a lot of issues. In the meantime, a few hours every few months doesn't seem like an undue burden.

## some builder adjustments / additions
This last week I brought online 5 more buildhw-a64s (hardware aarch64 builders). With that done, I then adjusted our 'heavybuilder' and 'secure-boot' channels to take advantage of the new hardware. So, I think we are in pretty good shape on x86_64 and aarch64 now. On power, our power10 buildvm's seem to be doing fine to me. We are planning some changes there in coming weeks though: we are moving from an 'entire machine is a kvm host' setup to using lpars (logical partitions). This will allow us to move half of the current builders to a second power10 chassis and perhaps increase performance. On s390x, nothing much has changed. We continue to restrict ci jobs there and I try to balance the number of builders vs specs.
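Back to the bounce rant above, as promised: a minimal sketch of the parsing problem, using hypothetical bounce files and the made-up addresses from the Bob Smith example. This assumes the useless bounces at least put the internal alias in a machine-readable field; some only say it in free text, which is even worse:

# A well-formed DSN names the failed subscriber address in its
# delivery-status part, which mailman can match against its lists:
grep -i '^Final-Recipient:' good-bounce.eml
#   Final-Recipient: rfc822; [email protected]

# The useless bounces only name an internal alias that appears nowhere
# in the subscriber database, so there is nothing to match:
grep -i '^Final-Recipient:' bad-bounce.eml
#   Final-Recipient: rfc822; foobar@somethinginternal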
## DANE update
Small update for anyone who noticed or cares: I updated the ssl cert for *.fedoraproject.org last week, and finally got to updating the DANE record for it today. DANE is a way to tie an ssl cert to dns for the host. Postfix and exim, at least, can automatically use that to verify things, as can a firefox extension I have that tells you if it validates or not. (There's a small TLSA sketch at the end of this post.)

## slim7x laptop news
I've kept using my lenovo slim7x laptop and switched over to mainline rawhide kernels a while back. The only thing missing for me that wasn't upstreamed was bluetooth support, and since that was heading upstream, I got the Fedora kernel maintainers to just include the patch. Recent merge window kernels seem to have broken something in the devicetree file for the laptop though. It boots to a blank screen with the dtb from the kernel; passing it an old one works fine. There's still work to get the devicetree files on the live media, at which point booting from usb on these just becomes a manual step of passing the right dtb, which is a great deal better than 'build your own live media with devicetree files on it'. I guess for now I'll just keep daily driving it, but the lack of webcam is kinda annoying.

## Radxa Orion O6
Picked up one of these the other day with a set of flimsy excuses: "I can use it to build kernels for the laptop" and "I can help test fedora rcs". It was also on sale at the time. :) I just installed it this morning. Pretty painless overall: just switching it to 'acpi' mode from 'devicetree', and then an annoying detour of it not liking the first usb stick I plugged in. With an older stick it booted right up with the f43 workstation live and was installed a few minutes later. Will probably do a separate blog post with a review soon.

## comments? additions? reactions?
As always, comment on mastodon:
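Footnote on the DANE update above: a quick sketch of checking and generating a TLSA record. The port and the '3 1 1' parameters here are illustrative assumptions, not necessarily the exact record we publish:

# Look up the TLSA record that ties the cert to dns (here, https on 443):
dig +short TLSA _443._tcp.fedoraproject.org

# Generate a '3 1 1' digest (end-entity cert, public key, sha-256)
# from a pem cert, suitable for publishing in that record:
openssl x509 -in cert.pem -noout -pubkey \
  | openssl pkey -pubin -outform DER \
  | sha256sum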
nirik.fosstodon.org.ap.brid.gy
Oh look:
curl https://kojipkgs.fedoraproject.org/compose/branched/Fedora-43-20251004.n.0/STATUS
FINISHED

(This means this prerelease fedora 43 nightly compose built everything, without even any optional / non-blocking failures.)

Hopefully that bodes well for going into final freeze next week. 😃

#fedora
nirik.fosstodon.org.ap.brid.gy
Wed in my #fedora land:
* Fixed some idempotency issues in infra ansible, but introduced another more complex problem.
* Mass update/reboot outage. Updated and rebooted pretty much everything.
* Tried to provision the last 5 aarch64 builders, but ran into annoying problems like ipmi serial console […]
Original post on fosstodon.org
nirik.fosstodon.org.ap.brid.gy
Got behind a bit again. ;(
Tuesday:
* Bunch o meetings
* DC folks swapped SFPs in our power10, from 10G to 25G. Great, but it meant an outage until we could get the switch updated too. 😦
* Created an eln-toolbox repo on quay.io
* Worked on network acls for the November dc move
* Updated/rebooted all our signing hosts.
nirik.fosstodon.org.ap.brid.gy
Today in my #fedora land:
* Looked at large queue on mailman. Tracked down a loop (bounce -> list owners -> list owner email bounces, repeat).
* Tried to finish off the vlan move in the DC, but our contact was out sick today. ;(
* Confirmed the 4 machines I found Friday with network link down are fixed.
* […]
Original post on fosstodon.org
Reposted by nirik :fedora: :redhat:
abbra.mastodon.social.ap.brid.gy
This is now in Rawhide and is integrated in fedora-packager for the FEDORAPROJECT.ORG realm. Just install the fedora-packager-kerberos package in Rawhide and try kinit for your Fedora account.
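A quick sketch of what trying that looks like (the username here is a placeholder):

sudo dnf install fedora-packager-kerberos
kinit [email protected]
klist    # should now show a ticket in the FEDORAPROJECT.ORG realm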
nirik.fosstodon.org.ap.brid.gy
Saturday recap of my week in #fedora infra.

This post has: logging and looping apps, some monitoring news, the case of a very annoying sporadic 503 error, and revisiting our 'oncall' role.

https://www.scrye.com/blogs/nirik/posts/2025/09/27/infra-weekly-recap-end-of-sept-2025/
infra weekly recap: End of sept 2025
Another Saturday, time for another weekly recap. September is almost over and time is flying by. But of course fall is the very best season, so we have that going for us.

## Logging and looping apps
This week our central log server ran low on disk space again, and it was the same issue we have hit in the past: an application that processes fedora messaging messages hit one where processing caused a traceback. So, it puts the message back on the queue and retries. This results in rather a lot of logs. :( We have actually discussed some fixes for this in the past, but haven't gotten around to implementing any of them. Perhaps this latest issue can revive that work.

## Monitoring news
We currently have a nagios setup and a new zabbix setup we are moving to. This week we found that nagios wasn't monitoring some of the external servers it should have been. Turns out it was down to some changes we made for the datacenter move, when we had 2 nagios servers and only wanted one to monitor external things; those checks then didn't properly get moved to the new server after the move. So, we fixed that, and then I had to fix a number of aws security groups and firewalls for those servers to get everything working again. I'm really looking forward to just finishing the switch to zabbix. I really hope we can turn nagios off before the end of the year. zabbix has a number of advantages and will get us to a nicer space.

## Really very annoying sporadic 503 errors on kojipkgs requests
Some small percentage of the time, we have been seeing 503 errors from requests to our kojipkgs servers. This is most visible on builds, but it also started to affect composes and other things. The setup here is somewhat complex: a build or compose request to kojipkgs.fedoraproject.org goes to one of two internal proxies (proxy110/proxy101), where apache terminates ssl and proxies to a haproxy instance. That haproxy in turn has two varnish servers behind it (kojipkgs01 and kojipkgs02). varnish on those servers caches as much as it possibly can in memory, and anything it cannot cache is requested from a local apache in front of a netapp nfs server. The problem was the haproxy -> varnish step. Sometimes haproxy would see failed health checks and disable a backend, and sometimes that would happen with _both_ backends, so clients would get a 503. Digging around on it I found:

- Seemingly no particular pattern to when it happened
- There was no change in packages or configuration on the proxy servers
- Rebooting proxies or varnish servers caused no change
- I could not duplicate it from any machines other than the proxy servers
- I could duplicate it via curl on the proxies
- When it happened, the proxy had connections showing it had sent a SYN

tcpdump would have been nice, but given the massive amount of traffic it wasn't too practical. Finally, I realized all the stuck connections were using a very high local port. Switching curl to use local ports under 32k, the problem was gone. But why? I still don't know why... but just switching the proxies to use a local port range under 32k seems to have completely cleared the problem up (see the sketch below). Networking folks say there have been no changes, and I could not find anything that changed on our end either. So, builds are back to normal, and composes (that were sometimes failing and sometimes taking 2x or more time to finish) are back to normal also.
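For the curious, the workaround boils down to something like this (the exact range values here are an assumption; the real constraint was just keeping the ceiling under 32k):

# Linux picks local ports from this range by default (typically 32768-60999):
sysctl net.ipv4.ip_local_port_range

# Pin outgoing connections to ports under 32k; persist via /etc/sysctl.d/
# and keep the floor above any ports services actually listen on:
sysctl -w net.ipv4.ip_local_port_range="15000 32000"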
## Oncall
A number of years ago in Fedora Infrastructure we set up a weekly role called 'oncall'. This person would watch chats for people asking about problems or pinging specific team members, and would instead triage their issue and guide them in filing a ticket, or decide if the problem was urgent enough to bother others on the team with. In this week's infra meeting we talked about it, and while this was getting used back when we were on IRC, it doesn't really seem to be getting much use at all on matrix. There could of course be a number of reasons for that: people realize they should just file a ticket, there just have not been many urgent issues, people are unaware of it, or general answers to questions have been good enough that people don't escalate. Anyhow, we are thinking of dropping it or revamping how it works. I'm planning on starting a discussion thread on it soon...

## comments? additions? reactions?
As always, comment on mastodon: https://fosstodon.org/@nirik/115277573858704803
nirik.fosstodon.org.ap.brid.gy
Today in #fedora infra land:
* Meetings: sprint planning, infra
* rebooted some things to make sure yesterday's 503 tweaking was all cleared
* Fixed outage ticket for updates/reboots next week (before final freeze)
* internal paperwork
* Some bz bug catchup/builds
* renewed *.fedoraproject.org […]
Original post on fosstodon.org
nirik.fosstodon.org.ap.brid.gy
Today in my #fedora corner:
* meetings
* Most of the rest of the day trying to track down this kojipkgs 503 issue. Joined by @adamw in the afternoon.
Finally found that limiting local ports to under 32k appeared to fix it. ;(
Not clear why. Have questions for networking folks and I still need to […]
Original post on fosstodon.org
Reposted by nirik :fedora: :redhat:
fedora.fosstodon.org.ap.brid.gy
This guide provides a step-by-step walk-through for integrating a uTrust FIDO2 security key (Identiv uTrust) with Fedora 42 to secure:

* LUKS2 full disk encryption (FDE)
* Graphical login (LightDM + Cinnamon)
* Sudo elevation

Learn more […]
Original post on fosstodon.org
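The full steps are behind the link, but for a rough idea, the LUKS2 piece of a setup like this typically uses systemd-cryptenroll (the device path here is a placeholder, and the guide may use different tooling):

# Enroll the FIDO2 key as an unlock method for the encrypted volume:
sudo systemd-cryptenroll --fido2-device=auto /dev/nvme0n1p3
# Then add 'fido2-device=auto' to that volume's options in /etc/crypttab
# so it is tried at boot.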
nirik.fosstodon.org.ap.brid.gy
Today/Second Monday in my #fedora corner:
* Fixed a raft of nagios checks for aws hosts. We broke this in the dc move. oops.
* Looked over some minor anubis issues
* looked at kojipkgs sometimes returning a 503 for builds. Very weird and spent a long time digging ( […]
Original post on fosstodon.org
nirik.fosstodon.org.ap.brid.gy
Yesterday / Monday in my #fedora corner:
* blockerbug meeting (at least for a part of it)
* prepped for finishing up vlan moves at a datacenter.
* had the call with folks there, but the person who could actually make the switch changes was not there. ;( Rescheduled for next Monday
* Looked at […]
Original post on fosstodon.org
Reposted by nirik :fedora: :redhat:
fedora.fosstodon.org.ap.brid.gy
Windows 10 support ends on October 14.

Help the people around you continue to be protected by helping them switch to Linux. They don't need to go buy a new computer just because Microsoft made a decision for them.

Follow @Endof10 for success stories and resources for how to help people switch! […]
Original post on fosstodon.org
nirik.fosstodon.org.ap.brid.gy
Some great RPG info today at @pancakescon

Storytelling, Villains, great stuff

Got a bunch of new tabs opened. :)
nirik.fosstodon.org.ap.brid.gy
@warthog9 do it!

I bet you will find there are others that use it too...
nirik.fosstodon.org.ap.brid.gy
Saturday recap of my week in #fedora land.

This week: Fedora 43 beta released, anubis rolled out, and some lists/gmail fun.

https://www.scrye.com/blogs/nirik/posts/2025/09/20/misc-infra-bits-from-3rd-week-of-sept-2025/
Misc infra bits from 3rd week of sept 2025
Another week gone by; pretty busy, but not massively so.

## Fedora 43 Beta out
Beta release was Tuesday. Things went pretty smoothly from my view overall. We managed to completely peg one of our 10G connections in our main datacenter with folks syncing and downloading. Thanks to all the community who worked on things, and I do hope you all enjoy it.

## Anubis rollout
Last week I was testing anubis in staging, but this week I rolled it out to production in a number of places. The biggest win was pagure.io, which has been particularly hard hit by scrapers. Here's the load last week; can you tell when anubis was enabled? I then enabled things for a bunch of other sites of ours: koji, src, koschei, lists, etc. There are still 2 outstanding issues I know of:

* Some folks have reported that it's giving them challenges all the time. I'm not sure why this would be happening, but hopefully we can track it down.
* rss feeds on bodhi and badges aren't working anymore. I need to allow the rss feed paths that they use. In the meantime, using a user-agent without Mozilla in it should work around that (there's a small sketch at the end of this post).

So, hopefully that saves us a bunch of bandwidth, cpu time, database cycles and more. Thanks anubis developers!

## Mailing list issue with gmail accounts
Just an early heads-up that we had a bit of an issue last night with the fedora devel list. As near as I can tell, my current theory is:

* Some spam got through to the list. I'm still not 100% sure how it got through, but it was using <> and playing other weird tricks.
* That spam had some bad archives attached to it.
* gmail subscribers got those emails and then google decided to block lists.fedoraproject.org because of it.
* lists.fedoraproject.org got the bounces from google and disabled a bunch (~340) of users.

I deleted the spam from the archives and put some things in place on the list to hopefully ensure it doesn't allow such posts through. Next week we will see if we can't re-enable all those users, provided google is accepting lists emails again.

## comments? additions? reactions?
As always, comment on mastodon:
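On the rss workaround above: anubis only challenges clients that look like browsers, so a feed reader can slip past by not claiming to be Mozilla. A sketch (the feed url is a placeholder):

# Fetch a feed with a non-browser user-agent so anubis passes it through:
curl -A 'feedreader/1.0' "$FEED_URL"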
nirik.fosstodon.org.ap.brid.gy
Today for me in #fedora land:
* Enabled anubis for pagure.io
* Reworked the role for proxies, fixed various issues in testing in staging
* Enabled anubis in prod proxies, fixed some more thinkos and issues.
* Enabled anubis for koji, lists, and koschei. More tomorrow.