• 2 Posts
  • 55 Comments
Joined 1 year ago
Cake day: June 15th, 2023

  • It’s Canonical. They prefer implementing everything themselves quickly, rather than developing a more sustainable project with the rest of the community over a longer timescale. When they do that, there will be very little buy-in from the wider community.

    Others could technically implement another snap store for their own distro, but they’d have to build a lot of the backend that Canonical didn’t release. It’s easier to use Flatpak or AppImage or whatever than to hitch themselves to Canonical’s homegrown solution, which might get abandoned down the line like Mir or Ubuntu Touch.


  • It’s Canonical. They prefer implementing everything themselves quickly, rather than developing a more sustainable project with the rest of the community over a longer timescale. It makes sense that when they do that, there will be very little buy-in from the wider community. Much like Unity and Mir.

    As you say - why would others put time into the less supported system? Better alternatives exist. If Canonical wants its own software ecosystem, it will have to maintain it itself - and based on Mir and Ubuntu Touch, it doesn’t have a good track record of that.










  • -----BEGIN PGP SIGNATURE-----
    iQIzBAEBCgAdFiEETYf5hKIig5JX/jalu9uZGunHyUIFAmaB8YEACgkQu9uZGunH
    yUKi7Q/+OJPzHWfGPtzk53KnMJ3GC8KQGEUCzKkSKmE0ugdI9h1Lj4SkvHpKWECK
    Y1GxNujMPRM/aAS2M97AEbtYolenWzgYmO1wt131/hEG4tk+iYeB2Sfyvngbg5KI
    y4D7mqpcVWYSf6S13vUX8VuyKeTxK6xdkp95E0wPVLfJwx5o5nH0njLXxeW0IblY
    URLonem/yuBrJ6Ny3XX9+sKRKcdI9tOqhMhTxPcQySXcTx1pAG7YE7G5UqTbJxis
    wy7LbYZB5Yy0FO3CtRIkA+cclG4y2RMM9M9buHzXTWCyDuoQao68yEVh4OdqwH1U
    5AUnqdve5SiwygF/vc50Ila6VjJ4hyz1qVQnjqqD96p7CSVzVudLDDZMQZ8WvqLh
    qaFr51xJvH6p6/CP1ji4HHucbJf6BhtSqc8ID9KFfaXxjfZHiUtgsVDYMV0e7u9v
    lhcDH/3kmw/JImX25qsEsBeQyzOJsBvxOYD3lZrwSY9+7KNGVQstFrEvCuVPHr72
    BQJPIhg3+9g6m36+9Uhs1N6b8G9DsZ6OgnNqr9dGturUg6CtRsLSpqoZq0FT9cLA
    tnFTJDaXgx1DZnsLGDSoQQYjZ3vS+YYZ8jG86KGLFyXVK+uSssvorm9YR1/GGOy7
    suaxro72An+MxCczF5TIR9n3gisKvcwa8ZbdoaGd9cigyzWlYg8=
    =EgZm
    -----END PGP SIGNATURE-----
    

  • It’s a proprietary config file. I think it’s a list of rules to forbid certain behaviours on the system. Presumably it’s downloaded by some userland service, but it has to be parsed by the kernel driver. I think the files get loaded OK, but the driver crashes when iterating over an array of pointers. Possibly these are the rules and some have uninitialised pointers, but that’s speculation based on some kernel dumps posted on Twitter. So the bug probably existed in the kernel driver for quite a while, and then they pushed a (somehow) malformed config file that triggered the crash.
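
    A very rough C sketch of that suspected failure mode, purely illustrative - the struct and function names are made up, and none of this is CrowdStrike’s actual code:

        /* Hypothetical layout of a parsed channel file: an array of pointers
         * to rules. If the parser leaves an entry uninitialised (or NULL),
         * the loop below dereferences garbage. In a kernel-mode driver that
         * is an unhandled page fault, i.e. a bugcheck/BSOD. */
        #include <stddef.h>

        struct rule {
            int id;
            /* ...matching criteria, action, etc.... */
        };

        struct channel_file {
            size_t rule_count;
            const struct rule *rules[256];  /* parsed rule pointers */
        };

        int apply_rules(const struct channel_file *cf)
        {
            for (size_t i = 0; i < cf->rule_count; i++) {
                const struct rule *r = cf->rules[i];
                /* If rules[i] was never filled in, this read faults. */
                if (r->id != 0) {
                    /* evaluate the rule against system activity... */
                }
            }
            return 0;
        }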


  • For this Channel File, yes. I don’t know what the failure rate is - this article mentions 40-70%, but there could well be a lot of variance between different companies’ machines.

    The driver has presumably had this bug for some time, but they’ve never had a channel file trigger it before. I can’t find any good information on how they deploy these channel files, other than that they push several changes per day. One would hope these are always run past a diverse set of test machines to validate that there’s no impact to functionality, but only they know the procedure there. It might vary based on how urgent a mitigation is or how invasive it’ll be - though they could just be winging it. It’d be interesting to find out exactly how this all went down.
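
    For what it’s worth, one common way such pushes are staged - and I have no idea whether CrowdStrike does anything like this - is to bucket hosts deterministically and only include a host once the rollout percentage covers its bucket, widening the percentage as telemetry comes back clean. A toy C sketch with entirely hypothetical names:

        #include <stdbool.h>
        #include <stdint.h>

        /* FNV-1a hash of the host ID, used only for stable bucketing. */
        static uint32_t fnv1a(const char *s)
        {
            uint32_t h = 2166136261u;
            while (*s) {
                h ^= (uint8_t)*s++;
                h *= 16777619u;
            }
            return h;
        }

        /* A host takes the new channel file only once the rollout
         * percentage (0-100) covers its bucket (0-99). Start at 1%,
         * watch crash/telemetry rates, then widen. */
        bool host_in_rollout(const char *host_id, unsigned rollout_percent)
        {
            return (fnv1a(host_id) % 100u) < rollout_percent;
        }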





  • It’s not that clear-cut a problem. There seem to be two elements: the kernel driver had a memory-safety bug, and a definitions file was deployed incorrectly, triggering the bug. The kernel driver definitely deserves a lot of scrutiny, and static analysis should have told them this bug existed. The live updates are a bit different, since this is a real-time response system. If malware starts actively exploiting a software vulnerability, they can’t wait for distribution maintainers to package their mitigation - it has to be deployed ASAP. They certainly should roll out definitions progressively and monitor for anything anomalous, but it has to be quick or the malware could beat them to it.

    This is more a code-safety issue than a CI/CD-strategy one. The bug was in the driver all along, but it had never been triggered before, so it passed the tests and got rolled out to everyone. Critical code like this ought to be written in memory-safe languages like Rust.
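
    To the code-safety point, even in C a defensive validation pass would have turned a bad channel file into a rejected config instead of a crash. A minimal sketch, reusing the same hypothetical layout as the earlier comment’s example - note that a NULL check only catches a missing entry, while a genuinely uninitialised garbage pointer is exactly the kind of thing a memory-safe language rules out:

        #include <stddef.h>

        struct rule { int id; /* ... */ };

        struct channel_file {
            size_t rule_count;
            const struct rule *rules[256];
        };

        /* Returns 0 on success, -1 if the channel file fails validation. */
        int apply_rules_checked(const struct channel_file *cf)
        {
            if (cf == NULL || cf->rule_count > 256)
                return -1;                    /* reject a malformed file */
            for (size_t i = 0; i < cf->rule_count; i++) {
                const struct rule *r = cf->rules[i];
                if (r == NULL)
                    return -1;                /* missing rule: reject, don’t fault */
                /* evaluate the rule... */
            }
            return 0;
        }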



  • This doesn’t really answer my question, but CrowdStrike do explain a bit here: https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

    These channel files are configuration for the driver and are pushed several times a day. It seems the driver can take a page fault if certain conditions are met. A mistake in a config file triggered this condition and put a lot of machines into a BSOD bootloop.

    I think it makes sense that this was a pre-existing bug in the driver which was triggered by an erroneous config. What I still don’t know is whether these channel updates get a staged deployment (presumably driver updates do), and what fraction of machines that got the bad update actually hit the BSOD.

    Anyway, they should rewrite it in Rust.