Linux maintains bugs: The real reason ifconfig on Linux is deprecated

In my third installment of FreeBSD vs Linux, I will discuss underlying reasons for why Linux moved away from ifconfig(8) to ip(8).

In the past, when people said, “Linux is a kernel, not an operating system”, I knew that was true but I always thought it was a rather pedantic criticism. Of course no one runs just the Linux kernel, you run a distribution of Linux. But after reviewing userland code, I understand the significant drawbacks to developing “just a kernel” in isolation from the rest of the system.

Let’s say a userland program wants to request an object from the kernel. The kernel structure might be something like this:

struct foo {
     size_t size;
     char name[20];
     int val;
};

On POSIX systems, a typical way to communicate with the kernel is to open a file descriptor to the appropriate system and send an ioctl(2) with a pointer to where the kernel should store the responding data. FreeBSD might perform this task as follows:

struct foo x;
ioctl(fd, CMD_REQUEST_FOO, &x);

Linux should do the same and to be fair it typically does. This manifests as software source that requires the Linux kernel’s headers. But because userland tools are maintained independently of the kernel, and sometimes are even explicitly written to be cross-platform, they typically maintain their own copy of data structures and macros independent of the Linux source tree.

So far so good. This might even produce the exact same binary output. But what happens if the kernel structure or behavior changes? This could be due to a bug fix, an added feature or an optimization – either way, the structure may change.

On FreeBSD this is not a problem. They update the kernel and userland tools in tandem. In fact, because both the kernel and userland application are in the same source tree they can even share the same header files. For 3rd party userland applications, FreeBSD provides highly stable libraries that do all the kernel-interactions, such as lib80211(3) – it’s worth noting that OpenBSD and NetBSD do not have these libraries because the kernel interface itself is highly stable anyway. FreeBSD even provides a COMPAT layer in the rare cases that an older binary fails to run on modern versions of FreeBSD.

Conversely on Linux, because the kernel and the rest of the operating system are not developed in tandem, updating or fixing a kernel struct would almost certainly break a downstream application. The only way to prevent this would be to conduct regular, massively coordinated updates to system utilities when the kernel changes, and properly version applications for specific kernel releases. Quite a herculean endeavor. This also explains why systemtap, one of Linux’s many answers to dtrace(1), does not work on Ubuntu.

Also, Linux can never have an equivalent of a lib80211(3) because there is no single standard library set. Even for the standard C library set, Linux has Glibc, uClibC, Dietlibc, Bionic and Musl. Rather than guessing the underlying C library implementation or falling into “dependency hell”, applications default to the most low-level implementation of their requested functionality. Some tools, such as ifconfig(8), resort to just reading from the /proc filesystem.

Linux’s solution to this problem was to create a policy of never breaking userland applications. This means userland interfaces to the Linux kernel never change under any circumstances, even if they malfunction and have known bugs. That is worth reiterating. Linux maintains known bugs – and actively refuses to fix them. In fact, if you attempt to fix them, Linus will curse at you, as manifest by this email.

And this leads back to the topic. Have you ever wondered why nearly every distribution deprecated ifconfig(8), a standard networking tool dating back to classic Unix? When Linux first implemented multiple IPv4 addresses on the same physical interface, it did so by cloning the interface in software and assigning each clone a unique IPv4 address. For example, eth0 could be cloned with eth0:1, eth0:2, etc. From a programmatic perspective, eth0 still only had one IPv4 address. As time passed and developers updated the kernel, it allowed users to assign multiple IPv4 addresses directly to the same interface, bypassing the need for cloning.

But Linux’s API has not changed. It still only returns a single legacy IPv4 address per interface. An interface could have multiple IPv4 addresses but ifconfig(8) will still only report a single address. In other words, as it currently stands ifconfig(8) lies to you. I do not fully understand why they did not just update ifconfig(8) – random IRC rumors say there was a failed attempt due to ifconfig(8)’s convoluted code-base. But for whatever reason, this led to the completely new tool ip(8).

By contrast, FreeBSD just updates their ifconfig(8) in tandem with any kernel updates and there were no problems. Simple.

This also explains why Linux has multiple tools for seemingly highly correlated network tasks. Rather than working together to create a consolidated tool, Linux has iw(8), iwconfig(8) and brctl(8), etc, whereas FreeBSD just has different drivers for its ifconfig(8) implementation. For the record, I think ip(8)’s syntax is cleaner than ifconfig(8)’s syntax, as the latter is a victim of IPv4 legacy syntax. If both tools worked just fine, it might be worth having ifconfig(8) for legacy scripts during a transitionary period, but making ip(8) the future. That would be perfectly fine, but it would be ideal if both tools just worked, rather than needing to abandon the tool because it is broken.

Written with love on a laptop running OpenBSD 6.3.

Thoughts?

fsync(2) on FreeBSD vs Linux

Even with our modern technology, hard-disk operations tend to be the slowest and most painful part of any modern system. As such, modern operating systems implement a buffering mechanism. In this scheme, when an application calls write(2), rather than immediately performing physical disk operations, the operating system stores the data in a kernel buffer. When the buffer exceeds a certain size or when an application calls the fsync(2) system call, the kernel begins writing to the disk.

This scheme is significantly faster, as demonstrated most clearly by the massive performance differential between the GNU and BSD yes(1), as initially noted here. Note: FreeBSD’s yes(1) has now reached parity with GNU.

So far so good. But what happens when a disk write operation fails? This could be due to a hardware or network failure, but ultimately it is not the fault of the operating system. However, the operating system must properly handle the failure.

On Linux, when an application’s fsync(2) call fails, the kernel returns a disk error. However, it then clears the buffer and properly sets the buffer as “dirty” (EIO flag). When the application issues another fsync(2) and the disk succeeds, the kernel clears the error bit, and reports a successful write to the application. As such the previously failed data never hit the disk and, if discarded by the application, the data was lost.

On FreeBSD, when an application’s fsync(2) call fails, the kernel also reports the error to the application. But unlike Linux, it maintains the “dirty” bit, thus not overwriting the kernel buffer until the page buffer is cleared, even if a subsequent fsync(2) succeeds. This way, the page data is not lost.

This is another example of the superiority of FreeBSD over Linux. FreeBSD can better survive a disk failure, while Linux’s implementation is fundamentally broken. In the past I have experienced Linux’s ext4 fail into read-only mode to prevent disk corruption. While that might be a fall-back mechanism, it is not a long-term solution. Instead, userland applications have to keep track of whether the kernel was successful or not. Depending on your perspective, this is a stack violation.

Additionally, any long-term solution to change the behavior of the operating system would mean all user-land applications would potentially break. Linus Torvalds has notoriously stated:

Breaking user programs simply isn't acceptable

In fact, he’s repeated this policy in more colorful language here. So you’re stuck with bad behavior.

Now consider if you want to build an operating system that will run for potentially a hundred years and produce zero errors or catch errors and properly perform exception handling. Go with FreeBSD.

It’s worth noting that Illumos (Solaris) properly implements fsync(2), whereas OpenBSD and NetBSD also failed on this issue and I fully anticipate them to fix the problem.

Linux kernel code vs FreeBSD kernel code

Linux driver code contains some serious garbage. I heard this refrain, but I did not realize how bad it was until I looked at it myself. Here is just one example.

Device drivers typically read static memory, usually known as EEPROM or ROM, from the chip to identify version, hard-coded information, device capabilities, etc. These values are used throughout execution of the driver. Reading the ROM is among the first things that happen when the device is attached and powered on.

In the case of FreeBSD, after the kernel reads the ROM, it uses a struct pointer with all the variables pre-populated, and points it at the ROM blob data stored in memory. For example:

struct r88e_rom {
	uint8_t		reserved1[16];
	uint8_t		cck_tx_pwr[R88E_GROUP_2G];
	uint8_t		ht40_tx_pwr[R88E_GROUP_2G - 1];
	uint8_t		tx_pwr_diff;
	uint8_t		reserved2[156];
	uint8_t		channel_plan;
	uint8_t		crystalcap;
#define R88E_ROM_CRYSTALCAP_DEF		0x20

	uint8_t		thermal_meter;
	uint8_t		reserved3[6];
	uint8_t		rf_board_opt;
	uint8_t		rf_feature_opt;
	uint8_t		rf_bt_opt;
	uint8_t		version;
	uint8_t		customer_id;
	uint8_t		reserved4[3];
	uint8_t		rf_ant_opt;
	uint8_t		reserved5[6];
	uint16_t	vid;
	uint16_t	pid;
	uint8_t		usb_opt;
	uint8_t		reserved6[2];
	uint8_t		macaddr[IEEE80211_ADDR_LEN];
	uint8_t		reserved7[2];
	uint8_t		string[33];	/* "realtek 802.11n NIC" */
	uint8_t		reserved8[256];
} __packed;

_Static_assert(sizeof(struct r88e_rom) == R88E_EFUSE_MAP_LEN,
    "R88E_EFUSE_MAP_LEN must be equal to sizeof(struct r88e_rom)!");

Notice the assertion at the bottom, which ensures that the ROM struct’s size equals a pre-defined length. The code will fail to compile if this assertion is not valid. Later, the kernel will instantiate a struct pointer and point it to the ROM, stored in the variable buf, as follows:

struct r88e_rom *rom = (struct r88e_rom *)buf;

Now, rom->channel_plan is set to the correct value. Simple.

Unfortunately, this is not how the same code is written on Linux. As mentioned, the Linux driver also begins by reading the ROM blob and storing it in a variable called hwinfo. But rather than creating an equivalent struct pointer, the Linux code uses offset values of the ROM on an as-needed basis. For example, the driver reads the version as follows:

rtlefuse->eeprom_version = *(u16 *)&hwinfo[params[7]];

In this example, params[7] comes from a list of ROM offset values set in the previous calling function. (That alone made tracing difficult.) The rtlefuse->eeprom_version is now the same as FreeBSD’s rom->version. This manual process repeats for every variable in the ROM.

While that may be just annoying and require a negligible bit more CPU power, this would not be a problem if it was all done in one place. But instead, the driver reads from the hwinfo blob on a seemingly as-needed basis during execution. And because these as-needed instances occur during normal execution, the driver reads in the same static value from hwinfo every time a simple WiFi function occurs, such as changing the channel.

Okay, but even that might not be too difficult…right? Here’s the real kicker.

Sometimes, the driver works by using incrementing offsets from the ROM blob. For example, consider read_power_value_fromprom (in drivers/net/wireless/realtek/rtlwifi/hw.c). It initializes eeaddr as a u32 (uint32_t), then assigns it the offset value EEPROM_TX_PWR_INX. So far so good. But then, rather than using new offsets for every successive value, it increments the eeaddr value in multiple doubly-nested for-loops. Here is a simplified version of the code:

for (rfpath = 0; rfpath < MAX_RF_PATH; rfpath++) {
	/* 2.4G default value */
	for (group = 0; group < MAX_CHNL_GROUP_24G; group++) {
		pwrinfo24g->index_cck_base[rfpath][group] =
		    hwinfo[eeaddr++];
		if (pwrinfo24g->index_cck_base[rfpath][group] == 0xFF)
			pwrinfo24g->index_cck_base[rfpath][group] = 0x2D;
	}
}

Notice the line hwinfo[eeaddr++]! Merely reading in that variable changes the offset. It’s the Heisenberg Uncertainty Principle equivalent of code. This is a cleaned-up version of the 188-line function. The actual function has 6 nested for-loops, some with if-statements, each incrementing the eeaddr parameter as they go along.

Why would anyone do it this way? You are needlessly using up the CPU, making the code difficult to follow and repeatedly reading in static values, and any minor modification, re-ordering or re-structuring will essentially break the entire function.

And perhaps the worst offender is when, 20 functions deep, you are not even working with hwinfo anymore. You are working with a pointer to hwinfo that has been incremented God knows where, with its own offsets that are near impossible to track down.

In my efforts to port this driver to FreeBSD, I literally resorted to printing out the entire ROM, manually finding the value in memory, and backing into the equivalent offset. Other bizarre code: I have seen if-conditions that are impossible to reach, misplaced code that should go in the previous function, and code that does bits of a task while another function does the entire task, so unnecessarily repeated code, etc.

How does this make it into the Linux Kernel?

To be fair, this does not appear to be the fault of Larry Finger, who maintains this driver. This is the fault of Realtek, for vomiting this terrible driver in the first place, providing absolutely zero documentation and refusing to respond to any contact attempts.

I hope my FreeBSD port is cleaner and more performant!

Switched from Ubuntu-based to Fedora

tl;dr: Fedora’s debugging packages work, Ubuntu’s are out of date.

Linux = Linux = Linux, whether Arch or Slackware or Ubuntu or OpenSUSE or Linux from scratch as I once did (before there were instructions!). Unless and until the kernel forks and someone decides to modify the syscall table, they all use the same basic syscalls, they typically share the same basic libraries and core utilities, etc. They’re all the same.

Why did I use Ubuntu-based distributions? (Note: Not Debian) Because Ubuntu came pre-configured with all the things I did not care to learn or manually configure: ACPI, firmware, X11, a pretty WM theme, etc. I did not particularly care whether I was running Mint, Elementary, Ubuntu MATE or basic Ubuntu (except Unity…nah). As long as it did not do strange things like remove /sbin/ifconfig or have a radically different file structure than I was used to. I felt at home with knowing where the standard file paths were, and knew how to administer my machine. Their package repository was pretty solid. It had almost everything I wanted – and what little was not on it was typically available in Debian-package format. The broader Linux community effectively standardized on this package format. This is crucial. Debian’s apt and FreeBSD’s pkg are in my muscle-memory at this point.

Literally one thing pushed me over: Ubuntu’s SystemTap was broken. Utterly broken!

I got into OS-level programming, specifically, porting a Linux WiFi driver to FreeBSD. I wanted to use SystemTap, Linux’s answer to DTrace, to help understand what is going on during live execution. But SystemTap does not work on Ubuntu – at least currently.

But wait, I thought Linux = Linux = Linux and programs from 20 years ago will still work. Why does SystemTap fail?

SystemTap works by producing C code for a kernel module, compiling it and loading it into memory. Some time ago, the kernel team changed the get_user_pages() kernel API call. This meant that any code compiled against the old function definition failed. I encountered this in the professional space when the VMware kernel modules failed to build and I hacked it until it worked. (They think I’m a wizard now). I was on Kernel 4.10 but the version of systemtap Ubuntu used was nearly 2 years old. This meant no one from the Ubuntu team was using it.

I submitted a bug report and installed Fedora 26.

SystemTap was developed by Red Hat and was trivial to get working under Fedora. And while not every single package is available (Bitcoin, Steam thus far), there is enough that moving over was trivial. Also, they come in Cinnamon, which I prefer, with a pretty theme. And it provides a clean terminal out-of-the-box. Which I need. (I would rather use stock XFCE if their terminal was clean than fully-loaded CentOS with an ugly terminal)

dnf took a little getting used to, but it is just a hop over from apt. So whatever on that front….

I would be willing to try OpenSUSE again, but the last time I did, they got rid of /sbin/ifconfig for /sbin/ip, which is unacceptable. Silly, perhaps…Does it come in Cinnamon? What does it offer? Are the packages as clean and up to date? I may never know, unless another business-need arises. I do not care to run any of these “hardware” distributions, like Arch. I paid my Linux dues around kernel 2.2 on Slackware and it’s time to move on from that.

Thoughts?

But look, if you’re 99% of the Linux world, any specific distribution is trivial. Pick one and go with it. Unless you’re doing very specific tasks like me, it really does not matter what you use. So stop Distro Hopping!

Custom Kernel Modules for Chromebook

Note: I wrote this about a year and a half ago, but I refer to it all the time. Hopefully the instructions have not changed too much! Enjoy!

I recently purchased a Chromebook. It’s great, it symbolizes the direction the PC market should head – inexpensive, low-powered ARM processor, defense in depth resistance to malware and simple for non-technical users. And with crouton, it functions quite cleanly as a Debian-based workstation.

With its simplicity and low price, there are certain key features that are lacking in the stripped down Linux kernel that can make it frustrating for a power-user. Unfortunately, Chromium addons have not or cannot satisfy some tasks that require kernel-level functionality. Even in crouton, you may find your ability limited to the user-space. For those looking for casual additions, recompiling the kernel may seem like daunting over-kill. Instead, compiling and inserting a single module may serve as an apt alternative. In this guide, I will explain how to compile a custom kernel module to add additional functionality to your Chromebook and how to circumvent the built-in security mechanisms that prevent you from adding into the kernel-space. This guide is specifically written for an ARM-based CPU using kernel 3.10.18 for the CIFS (SMB) module, but can be trivially ported to any other architecture, kernel and module.

Compiling the Kernel Module

As mentioned, Chromium OS is a stripped down version of Linux. Therefore, you should be able to compile and dynamically link kernel modules from the stock kernel into Chromium.

Per Google’s documentation, you must compile the kernel and modules on an x86_64 CPU, even if you will be compiling an ARM or 32-bit x86 module. This is possible thanks to GNU C Compiler’s cross-platform capability. The documentation also specifies using Ubuntu, but it worked just fine on my Debian 8 workstation.

If you have not already done so, install git, subversion and perform the basic configurations:

sudo apt-get install git subversion
git config --global user.email "name@domain.tld"
git config --global user.name "Your Name"

Google manages its various git repositories with depot_tools, a custom git wrapper. You can clone the associated git repository and set your PATH environment variable to include the wrapper scripts as follows.

git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

export PATH=`pwd`/depot_tools:"$PATH"

Next, make a directory where your Chromium OS build will reside, download the Chromium source, and synchronize it to the latest updates. This takes around 30 minutes to complete.

mkdir chromiumos && cd chromiumos
repo init -u https://chromium.googlesource.com/chromiumos/manifest.git
repo sync

Once completed, you will need to download the cross-platform SDK environment, build the dependencies and enter a chroot(1) environment. This will take another 30 minutes.

cros_sdk

Now that you are inside the chroot(1) environment, you need to specify the hardware configuration for your Chromebook device, either x86-generic, amd64-generic or arm-generic. You can determine your architecture by running uname -m on your Chromebook. For my ARM-based CPU, I did the following:

export BOARD=arm-generic

Now you must prepare the core packages associated with your board.

./setup_board --board=${BOARD}
./build_packages --board=${BOARD}

Change directory to ~/trunk/src/third_party/kernel/ and then to whichever subdirectory is associated with your kernel (ie, v3.10 for 3.10.18). You can determine your kernel version by running uname -r on your Chromebook.

Next, we will need to tell the kernel which hardware platform you are on and start with the base configuration of the kernel. You can list the available base configurations by running find ./chromeos/config. In my case I am using NVIDIA’s Tegra motherboard, which is ./chromeos/config/armel/chromeos-tegra.flavour.config, so I use chromeos-tegra as follows:

./chromeos/scripts/prepareconfig chromeos-tegra

If you are compiling for a non x86_64 CPU, set the architecture and compiler settings as follows:

export ARCH=arm
export CROSS_COMPILE=armv7a-cros-linux-gnueabi-

This next portion is the same as compiling any other kernel module. Configure the kernel by running make menuconfig

Select whichever controls you would like to install and save. Once completed you will have a .config file that corresponds to your hardware. Since we are only compiling the kernel modules, you can either run make modules to compile all kernel modules, or make fs/cifs/cifs.ko to build only a specific module. I prefer the former because your module may require other dependencies in other modules, such as with crypto/md4.ko for cifs. You can verify that the file was built for the right architecture by running file fs/cifs/cifs.ko. Great! On to inserting the module!

ChromiumOS’s Security Mechanisms

ChromeOS is the official signed release of ChromiumOS, which is what you run in developer mode. Even in developer mode, Google implemented multiple defensive mechanisms to slow down a would-be attacker from gaining access to the underlying system. To protect the kernel, Google utilized the Linux Security Module (LSM), which validates files from the root partition against a list of cryptographic hash values stored in the kernel, thereby preventing an attacker from loading a malicious kernel module. In effect, the only way to insert a kernel module is to have it stored on the root partition. But by default, the root partition is set to read-only, so you cannot simply move a file to the root partition and load it.

Therefore, we must disable the root partition verification by running the following script.

sudo /usr/share/vboot/bin/make_dev_ssd.sh --remove_rootfs_verification --partitions 4

Now, reboot the machine and from ChromiumOS remount the root partition to be read-writeable, as follows:

sudo mount -o remount,rw /

From here, you should be able to simply insert the kernel module with insmod. Now, you can install
Enjoy!

FreeBSD and Linux Remote Dual Booting

The following is a quick and dirty guide on how to setup remote dual booting for FreeBSD (12.0-CURRENT) and Linux (Ubuntu 16.04). Granted, this method is slightly a hack, but it works and suits my needs.

Why remote dual-booting? I am currently developing a FreeBSD kernel module for a PCIe card. The device is supported on Linux and I am using the Linux implementation as documentation. As such, I find myself frequently rebooting into Linux to look at printk() outputs, or booting into FreeBSD to test kernel code. This device is located at my house, and I typically work on it during my downtime at work.

Why not use Grub? I would have preferred Grub! But for whatever reason, Grub failed to install on FreeBSD. I do not know why, but even a very minimalistic attempt gave a non-descriptive error message.

efibootmgr? Any change I made with efibootmgr failed to survive a reboot. This is apparently a known problem. Also, this tool only exists on Linux, as FreeBSD does not seem to have an efibootmgr equivalent.

Ugh, so what do I do???

The solution I came up with was to manually swap EFI files on the EFI partition on an as-needed basis.

First, I went into the BIOS and disabled legacy BIOS booting, enabled EFI booting, and disabled secure booting.

Then, I installed Ubuntu. I had to manually create the partition tables, since by default the installer would consume the entire disk. However, this does not automatically create the EFI partition. So, you must manually create one. I set mine to 200 MB as the first partition. After installation, I booted up and mounted /dev/sda1. I found that Ubuntu had created /EFI/ubuntu/grubx64.efi and other related files. Great!

Next, I installed FreeBSD and while manually setting up the partition tables, FreeBSD auto-created an EFI partition. One already existed, so I safely deleted it, and proceeded with the rest of the install. Right before rebooting, I mounted /dev/ada0p1 (sda1 on Linux) as /boot.local/ and /dev/da0p1 as /boot.installer/. I then copied /boot.installer/EFI/BOOT/BOOTX64.EFI to /boot.local/EFI/BOOT/EFIBOOT/BOOTX64.EFI (I think I had to re-create EFI/BOOT, I’m forgetting off-hand). Then I rebooted.

When I rebooted the machine, Ubuntu still came up. This is because Ubuntu edits the EFI boot order and places ubuntu as the first partition. Ordinarily you should be able to use efibootmgr here to boot into FreeBSD and use the non-existent FreeBSD equivalent to boot back, but with the lack of that option, I mounted the EFI partition (/dev/sda1) as /boot/efi, and when I wanted to boot into FreeBSD, I renamed /boot/efi/EFI/ubuntu/grubx64.efi to ubuntu.efi and then copied /boot/efi/EFI/BOOT/BOOTX64.EFI to /boot/efi/EFI/ubuntu/grubx64.efi. When I rebooted, FreeBSD came back up! Then on the FreeBSD side, I mounted /dev/sda1 to /boot/efi and copied /boot/efi/EFI/ubuntu/ubuntu.efi to /boot/efi/EFI/ubuntu/grubx64.efi.

And that’s it! I can now remotely boot back and forth between the two systems.

Ugly? Yes. But it does the job.

Linux could fix this problem by debugging their efibootmgr utility and FreeBSD could fix this by having an efibootmgr equivalent at all.

Thoughts?

Differences between Mint and Ubuntu

I looked into the differences between Mint and Ubuntu to see which was best for me. I watched tons of videos, reviews, comparisons, ran them both for months, etc. Here’s what I learned…

They’re the same damn thing. No really, they are identical. The only differences are what software comes pre-installed and some user interface prettiness. Otherwise, no difference in the underlying system at all. Literally no difference.

Next question?