Been hitting a number of VM related bugs the last few days.
- I noticed trinity wasn’t making any forward progress, with a couple dozen child processes all busy in the kernel. Looking closer, they were all spinning in lru_add_drain_all. No idea wtf is going on with that one until I can reproduce it.
- Reproducing it is hard, because I keep hitting some migration-related lockup instead, which sometimes triggers pretty quickly.
- cat’ing /proc/slab_allocators goes boom on PAGEALLOC_DEBUG kernels.
The first bug is the one that concerns me most right now, though the second is conceivably something that some non-fuzzer workloads may hit too. Other than these bugs, 3.14-rc6 is working pretty well for me.
Big-animal changes over the early 2013 version include:
- Some material buried in the introduction and a few key Quick Quizzes has been pulled up into a new “How To Use This Book” chapter.
- There is significantly more material in the “Beyond Partitioning” section.
- A new “Hazard Pointers” section has been added to the “Deferred Processing” chapter.
- The “Data Structures” chapter has been filled out.
- Formal verification has been moved from an appendix to a new chapter following the pre-existing “Validation” chapter.
- A new “Putting It All Together” chapter contains some case studies.
- New cartoons have been added and some of the old cartoons have been updated.
Interestingly enough, even before its first edition, this book has seen some use in various classrooms. The memory-barrier information and the “Counting” chapter seem to be the most popular in that environment.
So what next?
Why, the second edition, of course!!! What else? :-)
I suppose everyone has to pass through a hardware phase, and mine is now: I implemented an LED blinker with an ATtiny2313. I don't think it even merits the usual blog write-up. Basically all it took was following tutorials to the letter.
For the initial project, I figured that learning gEDA would take too much time, so I unleashed an inner hipster and used Fritzing. Hey, it lets you plan breadboards, so there. And it was quite the learning experience, no mistake. Crashes, impossible-to-undo changes, UI elements outside of the screen, everything. Black magic everywhere: I could never figure out how to merge wires, dedicate a ground wire/plane, or edit labels (so all of them are incorrect in the schematic above). The biggest problem was the lack of library support together with an awful parts editor. Editing schematics in Inkscape was so painful that I resigned myself to doing a piss-poor job, evident in all the crooked lines around the ATtiny2313. I understand that Fritzing's main focus is the iPad, but this is just at the level of a typical outsourced Windows application.
Inkscape deserves a special mention due to the way Fritzing requires SVG files to be in a particular format. If you load and edit some of those, the grouping defeats Inkscape's features, so one cannot even select elements at times. And editing the raw XML causes the weirdest effects, so it's not like LyX-on-TeX, where you edit and visualize. At least our flagship vector graphics package didn't crash.
avr-gcc is awesome though. 100% turnkey: yum install and you're done. Same for avrdude. No muss, no fuss, everything works.
And then I realised that nouveau already has all the information that i915 wants, and maybe we could just have the Switcheroo code hand that over instead of forcing i915 to probe again. Sigh.
|Opened since 2014-02-28||12||29||9||(50)|
|Closed since 2014-02-28||8||41||8||(57)|
|Changed since 2014-02-28||18||54||16||(88)|
Weekly Fedora kernel bug statistics – March 07 2014 is a post from: codemonkey.org.uk
Looking at a review by Solly today, I saw something deeply disturbing. A simplified version that I tested follows:
import unittest

class Context(object):
    def __init__(self):
        self.func = None

    def kill(self):
        self.func(31)

class TextGuruMeditationMock(object):
    # The .run() normally is implemented in the report.Text.
    def run(self):
        return "Guru Meditation Example"

    @classmethod
    def setup_autorun(cls, ctx, dump_with=None):
        ctx.func = lambda *args: cls.handle_signal(dump_with, *args)

    @classmethod
    def handle_signal(cls, dump_func, *args):
        try:
            res = cls().run()
        except Exception:
            dump_func("Unable to run")
        else:
            dump_func(res)

class TestSomething(unittest.TestCase):
    def test_dump_with(self):
        ctx = Context()

        class Writr(object):
            def __init__(self):
                self.res = ''

            def go(self, out):
                self.res += out

        target = Writr()
        TextGuruMeditationMock.setup_autorun(ctx, dump_with=target.go)
        ctx.kill()
        self.assertIn('Guru Meditation', target.res)
Okay, obviously we're setting a signal handler, which is a little lambda, which invokes the dump_with, which ... is a class method? How does it receive its self?!
I guess that the deep Python magic occurs in how the method target.go is prepared to become an argument. The only explanation I see is that Python creates some kind of activation record for this, which includes the instance (target) and the method, and that record is the object being passed down as dump_with. I knew that Python did it for scoped functions, where we have the global dict, the local dict, and all that good stuff. But this is different, isn't it? How does it even know that target.go belongs to target? In what part of the Python spec is this described?
UPDATE: Commenters provided hints with the key idea being a "bound method" (a kind of user-defined method).
A user-defined method object combines a class, a class instance (or None) and any callable object (normally a user-defined function).
When a user-defined method object is created by retrieving a user-defined function object from a class, its im_self attribute is None and the method object is said to be unbound. When one is created by retrieving a user-defined function object from a class via one of its instances, its im_self attribute is the instance, and the method object is said to be bound.
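To make the mechanism concrete, here is a minimal sketch (reusing a Writr class like the one in the test above) showing that attribute access on an instance produces a bound method object that carries the instance along:

```python
class Writr(object):
    def __init__(self):
        self.res = ''

    def go(self, out):
        self.res += out

target = Writr()
m = target.go  # attribute access creates a bound method object

# The bound method remembers its instance (im_self in Python 2,
# __self__ in Python 3) and the underlying function (im_func/__func__).
print(m.__self__ is target)                # True
print(m.__func__ is Writr.__dict__['go'])  # True

# So it can be passed around and called like any plain callable;
# the instance is supplied automatically as self.
m('hello')
print(target.res)                          # 'hello'
```

This is why handing target.go to setup_autorun works: the object passed down as dump_with is the bound method itself, not a bare function.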
Thanks, Josh et al.!
Spent some time chasing down what looks like a race condition in the watchdog code in trinity.
The symptom was a crash on x86-64, where it would try and decode a 32-bit syscall using the 64-bit syscall table. This segfaulted, because the 64-bit table is shorter. I stared at the code for quite a while, and adding debugging printfs at the crash site made the bug disappear. What I think was happening was that the child processes are updating two separate variables (one, a bool that says if we’re doing 32 or 64 bit calls, and two the syscall number), and the watchdog code was reading them in the middle of them being updated. I added some locking code to make sure we don’t read either value before an update is complete.
I’ve not managed to reproduce the bug since, so I’m really hoping I got it right.
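The shape of the fix can be sketched like this (a Python analogue of the C code, with made-up names — trinity itself does this with shared state between the child processes and the watchdog): both fields are updated and read under one lock, so a reader can never see a 32-bit flag paired with a 64-bit syscall number.

```python
import threading

# Shared state: the 32/64-bit flag and the syscall number must be
# read and written together, or a reader can see a mismatched pair.
lock = threading.Lock()
state = {"do_32bit": False, "syscall_nr": 0}

def child_update(do_32bit, nr):
    # Child side: update both fields under the lock.
    with lock:
        state["do_32bit"] = do_32bit
        state["syscall_nr"] = nr

def watchdog_read():
    # Watchdog side: take a consistent snapshot of both fields.
    with lock:
        return state["do_32bit"], state["syscall_nr"]
```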
Christoffer Dall led a session today at Linaro Connect discussing standards for portable ARM virtual machines (video). About a week ago, Christoffer posted a draft specification to the linux-arm-kernel, kvm and xen mailing lists, which attracted lots of useful feedback. Today we went over the major points of contention, and Christoffer is going to take the feedback to prepare a new draft.
Many of the issues raised boil down to how much reach the spec should have. If it specifies too much, then it will be burdensome for vendors to be compliant, but if it specifies too little then it won’t be useful for making portable disk images. Today we talked about how specific it must be on the topics of required hardware, required virtual interfaces (virtio, xenbus), firmware interface (UEFI) and hardware description (ACPI, FDT).
We also talked about the use-cases covered by this spec. For instance, while there is interest in supporting some hypothetical future version of ARM Windows as either a host or a guest, it is pointless to try and guess what requirements Microsoft will have. For now the focus is on Linux hosts running either Xen, KVM or QEMU, with guests running predominantly Linux (while still supporting any guest OS that conforms). OS vendors should be able to use the spec to design installation and update tools that will work with any compliant virtual machine.
The ARM Server Base System Architecture (SBSA) specification defines the basic requirements for ARM server hardware. Christoffer used the SBSA as a starting point, but quickly realized that the peripheral options described in the SBSA make little sense in a virtual environment. For instance, a virtual machine can certainly emulate a SATA controller, but it can provide far better performance with an interface designed for virtualization. It was asked whether the spec should specify a choice of either virtio or xenbus, but the problem with doing so is that it effectively requires OSes to implement support for both in order to be compliant. This isn’t a problem for Linux guests because the kernel already has drivers for both, but it could be a problem for non-Linux guests.
Instead the choice was made to treat virtual buses in exactly the same way we treat real hardware; it is still up to the OS to include driver support for the platform it is running on. OS vendors are strongly encouraged to support both, but the spec does not require them to do so. If only one is supported then the onus is on them to list it in their own requirements.
Particular attention was given to the SBSA serial port requirement. Level 1 of the SBSA requires that the platform implement a debug port which is register compatible with ARM’s pl011 UART. Ian Campbell and Stefano Stabellini from Citrix were concerned that implementing full pl011 emulation would perform poorly and would require a lot of work to implement. However, Alexander Graf pointed out that an always-available console device would eliminate a lot of the pain of failed booting without any log output. It was also pointed out that the SBSA does not actually require a full pl011 implementation. DMA and IRQ support are not necessary, which makes emulation trivial, and the virtual UART is only expected to be used during early boot scenarios. Normally console output will be reported first via the UEFI console before ExitBootServices() is called, and then via the VM’s preferred console device. At the close of the discussion we decided to require the SBSA debug port definition in the VM spec.
The requirement of UEFI for the firmware interface was mostly uncontroversial. In the earlier mailing list discussion, Dennis Gilmore did take issue with specifying UEFI over U-Boot, given that UEFI is not in heavy use on 32-bit ARM. U-Boot is also making strides forward in standardizing the boot flow, which would make it more suitable for VM scenarios. Dennis is concerned that UEFI would require a lot of new effort to get working. However, that work has already been completed. There is a 32-bit port of UEFI running under QEMU, mainline GRUB includes ARM UEFI support, and merging kernel support is in progress.
None of the VM developers in the room today seemed concerned about requiring UEFI for virtual firmware, and the UEFI spec covers quite a few standard booting scenarios, including removable media, network booting, and booting from a block device. The feeling is that it is important for both 64-bit and 32-bit virtual machines to have the same behaviour, and so the UEFI requirement will remain.
Deciding whether an FDT or an ACPI hardware description is required was more of a concern. Jon Masters from Red Hat has previously stated that Red Hat Enterprise Linux will only support booting with ACPI. There is concern that the specification will not be acceptable to Red Hat if it does not require ACPI. However, ACPI is still a work in progress and we don’t yet know how to implement it in a VM. Since all of the VMs already use FDT, and will continue to do so for the foreseeable future, it was decided to make FDT support mandatory in version 1 of the spec. A future version 2 will allow ACPI to be provided in addition to FDT, with the expectation that an OS vendor can choose to make ACPI support mandatory for their product.
For the next steps, Christoffer is going to take all the comments from the mailing list and today’s meeting and he will post a second draft of the spec. Then after further feedback, the specification will probably get published, possibly as a Linaro whitepaper.
|Opened since 2014-02-01||15||76||32||(123)|
|Closed since 2014-02-01||33||66||29||(128)|
|Changed since 2014-02-01||33||256||36||(325)|
As part of the Lightsaber project, I’ve been looking for a low-pin-count way to add controls, since the ATtiny85 that I’m using only has 6 IO pins. For the prototype I connected a button and a potentiometer to a pin each. I’d like to have an accelerometer and another button or two, but that uses up pins pretty quickly. However, if I hang all the controls off an i2c bus, then I only need to use two IO pins.
The Wii Nunchuk just happens to be an i2c device. It also happens to have 4 inputs built into it: 2 buttons, a 2-axis joystick and a 3-axis accelerometer. That’s pretty close to everything I want. It also aggregates reading all of those sensor inputs into a single i2c transaction, which means less work for the ATtiny85 software.
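That single transaction returns a six-byte report, and decoding it is simple bit-twiddling. A sketch in Python — the packet layout here is the commonly documented unencrypted-mode format, not something I've verified against these particular clones:

```python
def decode_nunchuk(data):
    """Decode a 6-byte Nunchuk report (unencrypted mode).

    Bytes 0-1: joystick X/Y. Bytes 2-4: accelerometer high 8 bits.
    Byte 5: C/Z buttons (active low) plus the accelerometer low 2 bits.
    """
    return {
        'joy_x': data[0],
        'joy_y': data[1],
        'accel_x': (data[2] << 2) | ((data[5] >> 2) & 0x03),
        'accel_y': (data[3] << 2) | ((data[5] >> 4) & 0x03),
        'accel_z': (data[4] << 2) | ((data[5] >> 6) & 0x03),
        'button_c': not (data[5] & 0x02),  # a clear bit means pressed
        'button_z': not (data[5] & 0x01),
    }
```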
Official Wii Nunchuks aren’t the cheapest things in the world. Even 8 years after the Wii was first released, a genuine Nintendo Nunchuk is £15. That blows my budget for this project. I can however order replica Nunchuks via Aliexpress for a mere £2.95 each including shipping. I ordered a lot of 5 to experiment with a couple of weeks ago, and they arrived today.
For such a low price I was not expecting much, and indeed, my expectations were met. They work, I’ll say that much for them, but I wouldn’t want to use them for actually playing a game. Button presses don’t always make contact and feel a bit sloppy. I’m not too worried about that though, because I’m going to gut them for the electronics and throw away the plastic.
More troublesome though is that the clones don’t behave exactly the same way as an official Nintendo Nunchuk. In fact, in the lot of 5 I purchased I seem to have two different variants, each of which behaves differently. Two of them I was able to get working completely by following the instructions in this forum post. The other three are recognized by the new code, but the event reports are still encrypted. I need to do some debugging to figure out what else is needed.
Cracking the case open, it’s clear that the plastic moulding is a direct copy of the Nintendo part. The boards in both versions have exactly the same outline as the Nintendo one, and all the plastic looks identical. One of them even uses the Nintendo tri-wing screws. The boards themselves look designed to be as cheap as possible. The main controller is an anonymous chip-on-board under a blob of epoxy. Soldering quality is marginal at best.
Not that any of this bothers me. If I can get it mounted inside the Lightsaber hilt then it will do the job nicely for less than it would cost to buy each button and sensor individually.
|Opened since 2014-02-21||1||23||7||(31)|
|Closed since 2014-02-21||5||25||4||(34)|
|Changed since 2014-02-21||11||232||11||(254)|
This was a relatively small release. Among the more notable changes in man-pages-3.61 are the following:
- As ever, Peng Haitao continued adding notes on thread-safety to various manual pages.
- A note from Christoph Hellwig prompted me to perform a task that has been queued for a while: merging the text of the man pages for the "directory file descriptor" APIs into their corresponding traditional pages. When the "directory file descriptor" pages were originally written (mostly in 2006), the APIs were not part of POSIX and (in most cases) were not available on other systems. So, it made some sense to wall them off into their own separate pages. Eight years later, with the APIs now all in POSIX (except scandirat()), it is much more sensible to document the newer APIs alongside their traditional counterparts, so that the newer APIs are not "hidden", and the reader can more easily see the differences between the APIs.
Thus, the text of 14 pairs of pages has been merged, and the "merged from" pages have been converted to links to the "merged to" pages. Along the way, a few other fixes were made to the pages, as noted below. The resulting merged pages are: access(2), chmod(2), chown(2), link(2), mkdir(2), mknod(2), open(2), readlink(2), rename(2), stat(2), symlink(2), unlink(2), mkfifo(3), and scandir(3).
One page that did not undergo such a change was utimensat(2), which is different enough from utime(2) that it warrants a separate page. Unlike the other pages, the utimensat(2) page was also already self-contained, rather than defining itself in terms of differences from the traditional API as the other pages did.
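A quick way to see the difference between the two API families is from Python, which exposes the *at() calls via the dir_fd parameter (Linux-specific sketch; the file name is made up for the example):

```python
import os
import tempfile

d = tempfile.mkdtemp()
with open(os.path.join(d, "f.txt"), "w") as f:
    f.write("hi")

# Traditional open(2): a relative path is resolved against the CWD.
# openat(2)-style: it is resolved against an explicit directory fd instead,
# which avoids races with the directory being moved or replaced.
dfd = os.open(d, os.O_RDONLY)
fd = os.open("f.txt", os.O_RDONLY, dir_fd=dfd)
data = os.read(fd, 16)
os.close(fd)
os.close(dfd)
print(data)  # b'hi'
```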
Last year I hacked up a small shell script to test various IO related things like “create a RAID5 array, put an XFS file system on it, create a bunch of files on it”.
Despite its crudeness, it ended up finding a bunch of kernel bugs. Unfortunately many of them were not easily reproducible, and required hours of runtime. There were also some problems with scaling the tests. Every time I wanted to add another test, or another filesystem, the overall runtime grew dramatically. Before my test box with 4 SATA disks died, it would take over 3 hours for a single run.
So I’ve been sketching up ideas for a replacement to address a number of these shortcomings.
Firstly, it’s in C. Shell was fun for coming up with an initial proof of concept, but for some things, like better management of threads, it’s just not going to work. Speaking of threads, one of the reasons the runtime was previously so long was that it never took advantage of idle disks. So if, for example, I have 4 disks and I want to run a 2-disk RAID0 stripe in one test, I should be able to launch additional threads to do something interesting with the other 2 idle disks.
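The disk-scheduling idea amounts to something like this (a Python stand-in for the eventual C design; the disk and test names are made up): each test claims only the disks it needs from a shared pool, so concurrent tests can run on whatever is left idle.

```python
import threading
import queue

# Pool of free disks; a test blocks until it can claim enough of them.
pool = queue.Queue()
for name in ["sda", "sdb", "sdc", "sdd"]:
    pool.put(name)

log = []
log_lock = threading.Lock()

def run_test(test_name, ndisks):
    claimed = [pool.get() for _ in range(ndisks)]
    try:
        # Real code would build the array/filesystem here;
        # we just record which disks the test ran on.
        with log_lock:
            log.append((test_name, sorted(claimed)))
    finally:
        for d in claimed:
            pool.put(d)  # release the disks for the next test

threads = [threading.Thread(target=run_test, args=("raid0-stripe", 2)),
           threading.Thread(target=run_test, args=("xfs-create", 2))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```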
The code for this is still very early, and doesn’t do much of anything yet, but it’ll show up on github at some point.
In the meantime, I’ve been trying to put together something to test on. For reasons unexplained, the quad opteron that held all my disks no longer powers up. I spent a couple hours trying to revive it with various spare parts, without luck.
Yesterday the idea occurred to me that I could just use a USB hub and a bunch of old memory sticks for now.
It would have the advantage of being easily portable while travelling. Then I rediscovered just how crap no-name Chinese USB hubs are. Devices sometimes showing up, sometimes not. Devices falling off the bus. Sometimes the whole hub disappearing. Sometimes refusing to even power up. I tossed the idea. For now, I’ve got this USB-SATA thing connected to an SSD. Portable, fast, and, surprisingly, entirely stable.
I’ve got a bunch of other ideas for this tool beyond what the io-tests shell script did, and I suspect after next month’s VM/FS summit, I’ll have a load more.
To Red Hat's credit, having the CTO immediately and publicly accept responsibility and offer reparations seems like the best thing they could possibly do in the situation and demonstrates that there are members of senior management who clearly understand the importance of community collaboration to Red Hat's success. But that leaves open the question of how this happened in the first place.
Red Hat is big on collaboration. Workers get copies of the Red Hat Brand Book, an amazingly well-written description of how Red Hat depends on the wider community. New hire induction sessions stress the importance of open source and collaboration. Red Hat staff are at the heart of many vital free software projects. As far as fundamentally Getting It is concerned, Red Hat are a standard to aspire to.
Which is why something like this is somewhat unexpected. Someone in Red Hat made a deliberate choice to exclude Piston from the Summit. If the suggestion that this was because of commercial concerns is true, it's antithetical to the Red Hat Way. Piston are a contributor to upstream OpenStack, just as Red Hat are. If Piston can do a better job of selling that code than Red Hat can, the lesson that Red Hat should take away is that they need to do a better job - not punish someone else for doing so.
However, it's not entirely without precedent. The most obvious example is the change to kernel packaging that happened during the RHEL 6 development cycle. Previous releases had included each individual modification that Red Hat made to the kernel as a separate patch. From RHEL 6 onward, all these patches are merged into one giant patch. This was intended to make it harder for vendors like Oracle to compete with RHEL by taking patches from upcoming RHEL point releases, backporting them to older ones and then selling that to Red Hat customers. It obviously also had the effect of hurting other distributions such as Debian who were shipping 2.6.32-based kernels - bugs that were fixed in RHEL had to be separately fixed in Debian, despite Red Hat continuing to benefit from the work Debian put into the stable 2.6.32 point releases.
It's almost three years since that argument erupted, and by and large the community seems to have accepted that the harm Oracle were doing to Red Hat (while giving almost nothing back in return) justified the change. The parallel argument in the Piston case might be that there's no reason for Red Hat to give advertising space to a company that's doing a better job of selling Red Hat's code than Red Hat are. But the two cases aren't really equal - Oracle are a massively larger vendor who take significantly more from the Linux community than they contribute back. Piston aren't.
Which brings us back to how this could have happened in the first place. The Red Hat company culture is supposed to prevent people from thinking that this kind of thing is acceptable, but in this case someone obviously did. Years of Red Hat already having strong standing in a range of open source communities may have engendered some degree of complacency and allowed some within the company to lose track of how important Red Hat's community interactions are in perpetuating that standing. This specific case may have been resolved without any further fallout, but it should really trigger an examination of whether the reality of the company culture still matches the theory. The alternative is that this kind of event becomes the norm rather than the exception, and it takes far less time to lose community goodwill than it takes to build it in the first place.
 And, in the spirit of full disclosure, a competitor to my current employer
 Furthering the spirit of full disclosure, a former employer
|Opened since 2014-02-14||0||3||23||3||(29)|
|Closed since 2014-02-14||0||10||14||7||(31)|
|Changed since 2014-02-14||0||11||40||7||(58)|
The 5-day course is intended for programmers developing system-level, embedded, or network applications for Linux and UNIX systems, or programmers porting such applications from other operating systems (e.g., Windows) to Linux or UNIX. The course is based on my book, The Linux Programming Interface (TLPI), and covers topics such as low-level file I/O; signals and timers; creating processes and executing programs; POSIX threads programming; interprocess communication (pipes, FIFOs, message queues, semaphores, shared memory); network programming (sockets); and server design.
The course has a lecture+lab format, and devotes substantial time to working on some carefully chosen programming exercises that put the "theory" into practice. Students receive a copy of TLPI, along with a 600-page course book containing the more than 1000 slides that are used in the course. A reading knowledge of C is assumed; no previous system programming experience is needed.
Some useful links for anyone interested in the course:
- course overview (includes sample course materials, course dates and locations, and prices);
- course topic list; and
- information about the trainer (i.e., me).
Some work today on trinity to rid it of some hard-coded limits on the number of child processes. Now, if you have some ridiculously overpowered machine with hundreds of processors, it should run at least one child process per thread instead of maxing out at 64 like before. (It also allows overriding the maximum number of running children with the -C parameter as always, now with no upper bound, other than memory allocation for all the arrays).
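The new default amounts to something like this (a hypothetical sketch, not trinity's actual C code): an explicit override always wins, with no upper bound, and otherwise the child count tracks the number of CPU threads.

```python
import os

def default_children(override=None):
    # A -C style override wins, with no upper bound;
    # otherwise run one child per CPU thread (no more 64-child cap).
    if override is not None:
        return override
    return os.cpu_count() or 1
```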
Aside from that, some digging through Coverity, and some abortive attempts at cleaning up some more “big function” drivers. Some of them are such a mess that they need bigger changes than simply hoisting code out into functions. There’s a point, though, where I start to feel uncomfortable changing them without hardware to test on.
The sole change in this release is the conversion of various man pages that contained non-ASCII characters, as well as the changelog, to UTF-8 encoding, a task completed thanks largely to some scripts provided by Peter Schiffer.
Update, 2014-02-18: turns out that a couple of section 7 pages had encoding errors added in man-pages-3.59. So, I've decided to make a quick small 3.60 release that fixes those issues, and includes a few other unrelated minor fixes.
- Last week’s cleanups to the staging/bcm driver had a neat side effect. Dan Carpenter’s smatch tool started picking up some new warnings now that the functions are bite-sized enough for it to parse.
- This incentivized me enough to continue working on splitting up some of the mega-functions we have in the kernel. Not done by a long stretch yet, but I should have a bunch of patches for 3.15 by the end of the week.
- Finally got around to doing something about this atrocity. I haven’t really cared about reiserfs for the better part of a decade, but damn that was too ugly to live.
|Opened since 2014-02-07||0||5||16||16||(37)|
|Closed since 2014-02-07||1||4||12||14||(31)|
|Changed since 2014-02-07||0||12||32||20||(64)|