Problems getting stack traces from a Python program (Kubuntu 12.10 development version)

I’m trying to get to the bottom of this bug on Launchpad, which completely breaks Synaptics touchpad configuration under KDE:

https://bugs.launchpad.net/ubuntu/+source/synaptiks/+bug/1039261

The tl;dr version is that the Python interpreter is somehow emitting two calls to the Xorg libXi function XIQueryVersion(): the first call sends a client XInput version number of 2.1 and the second one sends 2.0 (as seen using xtrace).

The second call causes a BadValue error, because you’re not meant to send a lower value on any later calls (as can be seen from this Xorg libXi git commit).

This causes the comical error:

The version of the XInput extension installed on your system is too old. Version 2.0 was found, but at least version 2.0 is required

The problem is that the Python code only contains the second call, the one sending the 2.0 version number; there is no other call in the package that would send anything else, let alone the 2.1 value.

So I want to generate a call trace every time the XIQueryVersion() function is called, but I’m struggling to get it to work.

The killer at the moment is that both ltrace and gdb (when told to trace children) hang when Python runs dash to run ldconfig.real, which then blocks – so I never get to the point where the function gets called the first time.

With GDB I’m using:

set detach-on-fork off
set follow-fork-mode child
set follow-exec-mode new
catch load /libXi/
break XIQueryVersion

…and this is what happens:

chris@chris-ultralap:~/Code/Ubuntu$ gdb /usr/bin/python
GNU gdb (GDB) 7.5-ubuntu
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /usr/bin/python...Reading symbols from /usr/lib/debug/usr/bin/python2.7...done.
done.
(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb) set follow-exec-mode new
(gdb) catch load /libXi/
Catchpoint 1 (load)
(gdb) break XIQueryVersion
Function "XIQueryVersion" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (XIQueryVersion) pending.
(gdb) run /usr/bin/synaptiks
Starting program: /usr/bin/python /usr/bin/synaptiks
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New process 3788]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Thread 0x7ffff6ccc700 (LWP 3788) is executing new program: /bin/dash
[New process 3789]
process 3789 is executing new program: /bin/dash
process 3789 is executing new program: /sbin/ldconfig.real

…and there it hangs, forever. We never even get to the point where the Python interpreter loads libXi.so, let alone calls the function. :-(

Any ideas?
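
One avenue that might be worth trying alongside gdb, since it avoids following forked children altogether, is to log the Python-level stack from inside the interpreter. The sketch below is only an illustration: trace_calls and fake_query_version are made-up names, and it assumes the binding that reaches XIQueryVersion() is an ordinary Python callable (for example a ctypes function object) that can be rebound before use. A call issued from inside a C library never passes through such a wrapper, so this can only rule the Python side in or out.

```python
import io
import sys
import traceback


def trace_calls(func, label, stream=None):
    """Return a wrapper that dumps the Python stack before every call to func."""
    def wrapper(*args, **kwargs):
        out = stream if stream is not None else sys.stderr
        out.write("call to {0} with args {1!r}\n".format(label, args))
        # print_stack() writes the current call chain, oldest frame first.
        traceback.print_stack(file=out)
        return func(*args, **kwargs)
    return wrapper


# Hypothetical stand-in for the real binding, so the sketch runs anywhere.
def fake_query_version(major, minor):
    return (major, minor)


log = io.StringIO()
traced = trace_calls(fake_query_version, "XIQueryVersion", stream=log)
result = traced(2, 0)
print(log.getvalue())
```

If the log only ever shows the 2.0 call, the 2.1 request is presumably coming from native code loaded into the process rather than from the Python package itself.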

Patch for Modules to use shell functions with BASH, not aliases

Whilst the Modules system is awesome in making life easy to maintain multiple versions of packages and their dependencies (and is heavily used in HPC centres like VLSCI) it can have some annoyances (and seems to be fairly half-heartedly maintained looking at the bugtracker on SourceForge). One thing that’s bitten us from time to time is that you can’t really use its “set-alias” functionality as the bash shell does not expand aliases in non-interactive shells and that includes jobs that are launched from an HPC queuing system like Torque, PBSPro, etc.
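
The alias behaviour is easy to see outside of Modules. The following is a minimal sketch, assuming bash is on the PATH and is not running in POSIX mode; run_bash and zz_demo_greet are names invented for the illustration. It runs the same two-line script through a non-interactive "bash -c", once defining an alias and once defining a shell function.

```python
import shutil
import subprocess


def run_bash(script):
    """Run script with a non-interactive 'bash -c'; return (exit status, stdout)."""
    proc = subprocess.run(
        ["bash", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout.strip()


# zz_demo_greet is a made-up name, chosen so it should not clash with a real command.
ALIAS_SCRIPT = "alias zz_demo_greet='echo hello'\nzz_demo_greet\n"
FUNC_SCRIPT = "zz_demo_greet() { echo hello; }\nzz_demo_greet\n"

HAVE_BASH = shutil.which("bash") is not None
if HAVE_BASH:
    alias_status, alias_out = run_bash(ALIAS_SCRIPT)
    func_status, func_out = run_bash(FUNC_SCRIPT)
    print("alias:   ", alias_status, repr(alias_out))
    print("function:", func_status, repr(func_out))
```

The alias variant should fail with a "command not found" style exit status while the function variant prints hello, which is why emitting functions instead of aliases is the useful behaviour for batch jobs.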

It does have the compile-time option “--disable-shell-alias”, but annoyingly the condition is only applied when your shell is “sh”, not “bash”, so I’ve ended up having to patch Modules to make this work for bash as well. This patch is against 3.2.9c:

--- utility.c.orig      2011-11-29 08:27:13.000000000 +1100
+++ utility.c   2012-05-16 15:08:34.012038000 +1000
@@ -1422,7 +1422,7 @@
         **  Shells supporting extended bourne shell syntax ....
         **/
        if( (!strcmp( shell_name, "sh") && bourne_alias)
-               ||  !strcmp( shell_name, "bash")
+               || ( !strcmp( shell_name, "bash") && bourne_alias )
                 ||  !strcmp( shell_name, "zsh" )
                 ||  !strcmp( shell_name, "ksh")) {
            /**
@@ -1471,7 +1471,7 @@
 
            fprintf( aliasfile, "'%c", alias_separator);
 
-        } else if( !strcmp( shell_name, "sh")
+        } else if( ( !strcmp( shell_name, "sh") || !strcmp( shell_name, "bash") )
                &&   bourne_funcs) {
        /**

Hopefully this patch will be of use to people..

The ZaTab from @ZaReason – a fully open source Android (or whatever you want to load) tablet (UPDATED)

ZaReason are a US company who only make Linux based computers and have recently been tweeting about building a completely open source tablet device, shipping with Android but unlocked so you can install whatever you would wish on it. They have even been working with the Software Freedom Conservancy to ensure that it passes muster as an open source device.

ZaReason ZaTab

Other than some photos of it on Twitter, details have been a little lacking, but now the ZaReason CEO (who is in New Zealand working on setting up a store there) has tweeted the URL for pre-orders, which includes the details about it:

  • Pure Android
  • Allwinner A10 SoC
  • 9.7″ IPS 1024×768 display
  • 5 point capacitive touchscreen
  • 16 GB internal storage + microSD for additional storage
  • 1 GB ram
  • b/g/n WiFi
  • Front and Back cameras
  • Sturdy metal back
  • High-capacity 8000 mAh battery
  • Ultra-light 630 grams

Ports:

  • Headphone
  • microSD card slot
  • mini-HDMI video out
  • 2x micro-USB ports

The device is shipping with CyanogenMod 9 (so based on the Ice Cream Sandwich release of Google’s Android Open Source Project, AOSP) and yes, it has root access available (CyanogenMod 9 doesn’t enable it by default, but it is just a configuration option). I would suspect this means it won’t ship with the Google Apps package, which is not open source, and so you won’t have access to the Google Play Store (formerly the Android Market). You could still access the F-Droid open source application repository from it, though, and should you feel the need for the proprietary Google Apps then you could reflash CyanogenMod 9 with the Google Apps package available from their site. Most importantly, it ships with all the source code.

The System on a Chip (SoC) is the Allwinner A10, which has a 1GHz ARM Cortex-A8 CPU and a Mali-400 MP1 GPU. Whilst the GPU manufacturer releases GPL driver code, you apparently need a proprietary DDK to be able to produce a functional driver, and so the Lima project has been born to create a fully open driver. Quite how the ZaReason people are dealing with this is unclear; if they are really shipping a fully open tablet then perhaps the Lima driver is a lot more stable than the Lima project website claims. :-) (Clarified below in the update)

They’ve also promised the boot loader is unlocked, so you can put whatever software you wish to try out on it. On Twitter they said that they intend to try and get KDE going on it (presumably the KDE Plasma Active project, possibly using Mer as the supporting distribution). There’s even been a tongue-in-cheek reference to Gentoo.. :-)

Update:

Two updates on the ZaTab:

  • Firstly, the pre-orders they are taking are limited to FOSS people, and they want you to contact them via email as part of the process (they have concerns about fraud).
  • Secondly, on the openness and the GPU driver, they have confirmed that “Initially there may be some binary blobs. Lima isn’t far enough along at this time, but we have high hopes for it”.

Xorg.conf for multihead on Dell Latitude Z600

I spent a little time recently creating a minimal xorg.conf file to get multi-display working on my work Dell Latitude Z600, such that it sets up an external HP monitor (via the DisplayPort connector, DP1) to the right of the laptop display (LVDS1), with both display and monitor running at their native resolutions. It took a little fiddling to get it right, but this works for me (of course you’d want to adjust the PreferredMode to suit your equipment). Without this it defaults to cloning the main display and running both at 1280×1024, which isn’t very nice.

Section "Screen"
        Identifier "Screen0"
        Monitor "LVDS1"
EndSection

Section "Screen"
        Identifier "Screen1"
        Monitor "DP1"
EndSection

Section "Monitor"
        Identifier      "LVDS1"
        Option  "Primary"       "true"
        Option  "PreferredMode" "1600x900"
EndSection

Section "Monitor"
        Identifier      "DP1"
        Option  "Primary"       "false"
        Option  "RightOf"       "LVDS1"
        Option  "PreferredMode" "1920x1080"
EndSection

Section "Device"
        Identifier "Intel0"
        Option "Monitor-LVDS1" "LVDS1"
        Option "Monitor-DP1" "DP1"
EndSection

NB: Whilst the xorg.conf does set the primary display to the laptop screen (LVDS1) either KDE or X itself continues to use the external display (DP1) as the primary. However, I can change that in my KDE settings and that then persists across logins.
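
For trying a layout out before committing it to xorg.conf, the same arrangement can also be expressed as an xrandr command line. The sketch below is a rough illustration rather than part of the configuration above: xrandr_args and apply_layout are hypothetical helper names, and the output names and modes are simply the ones from my setup. It builds the argument list and, by default, only prints it.

```python
import subprocess


def xrandr_args(primary, primary_mode, secondary, secondary_mode):
    """Build an xrandr command placing `secondary` to the right of `primary`."""
    return [
        "xrandr",
        "--output", primary, "--primary", "--mode", primary_mode,
        "--output", secondary, "--mode", secondary_mode, "--right-of", primary,
    ]


def apply_layout(args, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(" ".join(args))
    if not dry_run:
        subprocess.check_call(args)


layout = xrandr_args("LVDS1", "1600x900", "DP1", "1920x1080")
apply_layout(layout)  # dry run: prints the command without touching the display
```

Whether the desktop environment then honours the --primary flag any better than the xorg.conf option is something I would expect to vary, so treat it as an experiment.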

A Week or so with the Samsung Galaxy Nexus

After a couple of good years with my Nokia N900 I’ve come to the sad conclusion that there’s no future for that platform, due to the combined actions of Nokia and Intel: Nokia for dumping Linux and going with Windows Phone for their smartphones after getting a new CEO (ex-Microsoft), and Intel for dumping MeeGo and setting up a partnership with Samsung for yet another mobile Linux platform called Tizen (which at least went for the code first, hype second path, unlike MeeGo). Intel are now on their third mobile Linux project, as their Moblin project was merged with Nokia’s Maemo to form MeeGo (announced less than 2 years ago), so they have form here as a serial abandoner.

Looking at what is left in the mobile space it was really a no-brainer: neither Windows Phone nor Apple’s iOS appealed at all, so it had to be an Android phone. The timing was pretty good as Samsung and Google had just started shipping their jointly designed Galaxy Nexus with Android 4 (aka Ice Cream Sandwich or ICS). It has the advantage of apparently being a phone recommended for the AOSP (Android Open Source Project) should I feel the need once my warranty expires – though I can’t find a reference to that now! I ordered an unlocked Galaxy Nexus with a 2 year warranty from Mobicity, as I didn’t fancy the rubbish that carriers tend to put onto their phones, nor getting handcuffed into a contract I didn’t want. As an added bonus Mobicity let you pick one of 3 optional accessories for free – I picked the screen protector (the others were a charger and a Bluetooth headset, from memory).

As an amusing aside, I did try and see if Dick Smith Electronics would price match with Kogan for the Galaxy Nexus, as Kogan was far cheaper and DSE was only selling them online, but with a manufacturer’s warranty (unlike Mobicity or Kogan). Unfortunately DSE declined to do so on the grounds that Kogan didn’t have a physical retail outlet, which was a bit rich given that neither does DSE for these phones. But then I found out they are now owned by Woolworths, and I didn’t fancy supporting the largest owners of poker machines in Australia.

Despite the best efforts of UPS (who said it would take 6 days to cross Melbourne having taken 24 hours from Hong Kong – it actually arrived the following morning) I received it intact and on time.

Samsung / Google Galaxy Nexus

I’ve been playing with it, er, using it in anger for over a week now and so far I’m very happy. I’d have to say the best description of the overall experience is “smooth”. Android 4 seems light years ahead of Android 2.3.3 on my wife’s Huawei Sonic, though part of that will be the fact that it’s just a much more capable phone with a larger screen and a much more powerful processor.

Good bits:

  • Auto-language select – it started up in Chinese characters but before I could really wonder how I’d fix that it detected it had an Australian SIM in it and autoconfigured the locale to match.
  • No extra cruft – I’ve not spotted any “extras” from Samsung on the phone – the Market is the standard Android Market, etc.
  • Good size screen – the phone feels much smaller in the pocket than my old N900 due to its narrowness despite it having a much wider screen.
  • Android Market – heaps of apps, though the usual criticism of it not being easy to search for open source applications applies here.
  • Camera – it’s “only” 5 megapixels, but it’s still pretty good (though I’ve not yet figured out how to turn the flash off).
  • NFC – OK, a little bit of a toy at the moment, but there are a couple of apps that will read cards, and they confirm that the reason my Myki and Uni ID card interfere with each other is that they’re the same type of technology. The same goes for my credit card and my bank card (same tech again).
  • Compass – my N900 had GPS and accelerometers (as does the Galaxy Nexus of course) but the compass allows neat things like Google Sky, where you can just point your phone at the sky and have it show you a labelled view of planets, stars and constellations.
  • IPv6 works on Wifi – I know people say IPv6 has worked on Wifi since Android 2.2, but it certainly doesn’t on my wife’s Android 2.3 phone. But the Galaxy Nexus seems quite happy on my home network with native dual stack IPv6 courtesy of Internode.

Of course nothing is ever perfect, so here are my feelings on the bad bits:

  • No real keyboard – I really miss the N900’s physical keyboard, it made typing easy. The on-screen keyboard that Android has is good, and quite usable for SMS, Twitter, etc, but for things like the Connectbot SSH client you can’t beat a real keyboard.
  • No NTP synchronisation possible – you can get root on the phone (and void your warranty) but this *really* shouldn’t be necessary!
  • NITZ sucks – whilst it gets the time right the timezone is out by an hour. Probably a carrier issue but I don’t think phones should be relying on it. Had to set it by hand to fix it up.
  • Short notification sounds – a minor nit but the default notification sounds that are used for things like SMS, etc, are really short and quite easy to miss.
  • Not entirely open source – whilst the N900 wasn’t either it does seem to have been more open than Android, and it didn’t try and avoid GPL code at all costs like Android does.
  • No update to Android 4.0.2 available (yet) – so far it appears that Samsung haven’t pushed an Android 4.0.2 update to the region my phone was intended for – though other Galaxy Nexus owners around the world have reported getting updates at other times (including someone at Mobicity where I bought it). I suspect this is just an organisational delay and nothing more serious, but it is annoying. If it wasn’t for the warranty issue I’d consider reflashing the phone with the stock Google firmware for the Galaxy Nexus and pick the updates up directly from them in future.

To finish it off, here are three images taken with the camera in the Samsung Galaxy Nexus (as I said, I was happy with it); the first one was used on the weather slot as a background by the ABC News people last week!

Melbourne summer morning
Swanston St Skyline
The Light Side and the Dark Side

Vacation 1.2.7.1 Released

Vacation 1.2.7.1 is a bug fix only release which now complies with RFC 3834, “Recommendations for Automatic Responses to Electronic Mail”. A big shout of thanks to Dr. Tilmann Bubeck, the Fedora packager, for bug fixes and a German translation of the manual page.

You can download this latest version of Vacation from: http://sourceforge.net/projects/vacation/files/vacation/1.2.7.1/

It includes:

  • a fix from Dr. Tilmann Bubeck to stop Vacation from munging the GECOS information of users and instead pass it quoted to the MTA for it to deal with (fixes Fedora bug #553505 and SourceForge issue #2928189).
  • Vacation now adds the Auto-Submitted: header as per RFC 3834 (fix from Dr. Tilmann Bubeck).
  • Vacation now abides by the RFC 3834 header “Auto-Submitted:” (fixes SourceForge issue #3062665).
  • Fixes up some Coverity grumbles (a redundant fopen() and others).
  • Compiles cleanly with GCC 4.6.2.
  • Now includes a vacation.spec file contributed by Magnus Stenman.
  • The old HTML version of the manual page was out-of-date and so it has been removed (along with html2man) leaving the nroff version the master.
  • Added German translation of the nroff manual page (Dr. Tilmann Bubeck).
  • Note that the English man page has been renamed to vacation-en.man and vacation.man is a symlink to it, so German speakers can just change that symlink before installing to pick up the German translation.
  • Clean up of some old directories in the source code that have been made obsolete by source code control (they contained old, applied, patches).
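
For anyone unfamiliar with what RFC 3834 asks of an auto-responder, the two header-related behaviours can be sketched roughly as follows. This is an illustrative Python sketch and not Vacation’s actual C code: should_auto_reply and build_reply are made-up names, the addresses are placeholders, and replying to the From: header is a simplification (the RFC directs responses to the envelope sender).

```python
from email import message_from_string
from email.message import EmailMessage


def should_auto_reply(msg):
    """Only answer mail that is not itself marked as automatically submitted."""
    value = msg.get("Auto-Submitted", "no")
    # Ignore any parameters after the keyword, e.g. "auto-generated; x=y".
    keyword = value.split(";", 1)[0].strip().lower()
    return keyword == "no"


def build_reply(msg, sender, body):
    """Build a response carrying the header RFC 3834 asks automatic replies to add."""
    reply = EmailMessage()
    reply["From"] = sender
    reply["To"] = msg["From"]  # simplification: see the note above
    reply["Subject"] = "Auto: " + msg.get("Subject", "")
    reply["Auto-Submitted"] = "auto-replied"
    if msg["Message-ID"]:
        reply["In-Reply-To"] = msg["Message-ID"]
    reply.set_content(body)
    return reply


# Addresses below are placeholders for illustration.
human = message_from_string("From: alice@example.org\nSubject: Hello\n\nHi\n")
robot = message_from_string(
    "From: list@example.org\nAuto-Submitted: auto-generated\n\nDigest\n")

print(should_auto_reply(human), should_auto_reply(robot))
reply = build_reply(human, "bob@example.org", "I am away.")
```

The point of both halves is the same: two automatic responders that follow these rules cannot end up replying to each other forever.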

You can be involved in the development of Vacation by subscribing to the vacation-announce and vacation-list mailing lists and/or logging bugs and feature requests on the SourceForge tracker.

First Release Candidate for Vacation 1.2.7.1

Vacation 1.2.7.1 rc1 is the first release candidate for the first bug fix only release in the 1.2.7 branch.

This release fixes up a warning for orighdr in GCC 4.6.x. It also includes a German translation of the manual page courtesy of Dr. Tilmann Bubeck, the Fedora packager, and some cleanup work (removing obsolete directories).

Note that the English man page has been renamed to vacation-en.man and vacation.man is a symlink to it, so German speakers can just change that symlink before installing to pick up Dr. Bubeck’s translation.

Please do grab this and test it out!

If I don’t hear any problems before next weekend I intend to release this as the official Vacation 1.2.7.1.

Old patch for Bonnie++ to use random data rather than 0s

Way back in 2007 I posted a blog entry about testing ZFS/FUSE with Bonnie++ using random data rather than 0s, and I said:

it’s not ready for production use as it isn’t controlled by a command line switch and relies on /dev/urandom existing. yes, I’m going to send the patch to Russell to look at

I didn’t get any feedback on the patch, so I’ve decided to post it here in case people are interested.

diff -ur bonnie++-1.03a/bonnie++.cpp bonnie++-1.03a-urand/bonnie++.cpp
--- bonnie++-1.03a/bonnie++.cpp 2002-12-04 00:40:35.000000000 +1100
+++ bonnie++-1.03a-urand/bonnie++.cpp   2007-01-01 13:03:41.644378000 +1100
@@ -41,6 +41,9 @@
 #include <string.h>
 #include <sys/utsname.h>
 #include <signal.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
 
 #ifdef AIX_MEM_SIZE
 #include <cf.h>
@@ -148,6 +151,28 @@
   }
 }
 
+void load_random_data(char *temp_buffer,int length)
+{
+       int filedes, numbytes;
+
+       filedes=open("/dev/urandom",O_RDONLY);
+       if(filedes<0)
+       {
+               perror("Open of /dev/urandom failed, falling back to 0's");
+               memset(temp_buffer, 0, length);
+       }
+       else
+       {
+               numbytes=read(filedes,temp_buffer,length);
+               if(numbytes!=length)
+                       {
+                               perror("Read from /dev/urandom failed, falling back to 0's");
+                               memset(temp_buffer, 0, length);
+                       }
+               close(filedes);
+       }
+}
+
 int main(int argc, char *argv[])
 {
   int    file_size = DefaultFileSize;
@@ -477,7 +502,8 @@
       return 1;
     globals.decrement_and_wait(FastWrite);
     if(!globals.quiet) fprintf(stderr, "Writing intelligently...");
-    memset(buf, 0, globals.chunk_size());
+    // memset(buf, 0, globals.chunk_size());
+    load_random_data(buf, globals.chunk_size());
     globals.timer.timestamp();
     bufindex = 0;
     // for the number of chunks of file data
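
One reason the choice of fill data can matter is that a buffer of zeros is trivially compressible, so any layer that compresses (or otherwise special-cases) such blocks has far less work to do than it would with real data. The sketch below only illustrates that general point using zlib; it says nothing about what ZFS or Bonnie++ do internally, and CHUNK is an arbitrary size picked for the example.

```python
import os
import zlib

CHUNK = 64 * 1024  # arbitrary illustrative size, not Bonnie++'s actual chunk size

zero_chunk = bytes(CHUNK)          # what memset(buf, 0, ...) produces
random_chunk = os.urandom(CHUNK)   # comparable to a read from /dev/urandom

zero_packed = len(zlib.compress(zero_chunk))
random_packed = len(zlib.compress(random_chunk))

print("zeros : {0} -> {1} bytes".format(CHUNK, zero_packed))
print("random: {0} -> {1} bytes".format(CHUNK, random_packed))
```

The zero-filled chunk shrinks to a tiny fraction of its size while the random one does not shrink at all, which is the difference the patch is meant to take out of the benchmark.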

Second beta release of Vacation 1.2.7.1

Vacation 1.2.7.1 beta2 is the second beta for the first bug fix only release in the 1.2.7 branch.

This release just fixes up some issues that Coverity revealed, none of which appeared to be harmful.

NB: If you compile with GCC 4.6 and see a complaint about orighdr being set but never used in rfc822.c don’t worry, it’s already fixed in git and will be in the first RC (assuming nothing bad is found in this version).

Please grab this beta release and test it and report any problems!

Recovering 5.25″ Floppy Disks

When I was at the University College of Wales, Aberystwyth they were just starting to supplement their serial terminals connected to X.25 PADs with some PCs (Viglen I think), complete with hard disks and 5.25″ floppy drives. So I have had two boxes of 5.25″ floppies which dutifully came out to Australia with me when I emigrated from the UK back in 2002. These floppies are now well over 20 years old, so I reckoned it was about time to see whether they were still readable and, if so, what was on them. Labels like “Honeywell Backup Disk #1” only say so much.. ;-)

The first problem was that I didn’t have a 5.25″ floppy drive. Luckily my boss at VLSCI was able to lend me one. The second problem was I didn’t have a floppy cable with the 5.25″ connector on it. Fortunately Bernie at work had an old PC that was in bits which did have one, so I was able to borrow that. Then I found that the old Dell PC I was thinking of using had a really weird connector and wasn’t going to be that useful. My even older Olivetti Netstrada (a quad processor Pentium Pro monster) did have IDE, but its bunch of SCSI drives and their cables were going to make the connectors rather hard to get to.

My final box was a VIA EPIA V box (originally from EverythingLinux back in 2003) which did have an easily accessible floppy connector on the motherboard, but only a single power connector for a drive. So it was either the internal IDE disk, or the floppy, but not both. I could have gone and bought a power splitter, but I thought I’d take the cheaper way and netboot it (the onboard ethernet chipset has PXE support) – it should be pretty easy.. Hah! :-) This is what the box looked like after some careful assembly..

Resurrecting 5.25" floppy disks - hardware

The first test was to see if the 5.25″ floppy disk drive worked. Luckily I had a floppy labelled as “system boot disk” and after some mucking around in the BIOS (it turns out you can set it to boot from floppy without having enabled the floppy controller, which results in it not booting from floppy and much cursing until you discover it) it booted first time – a 23 year old DOS boot disk complete with partly bilingual Welsh/English welcome screen from 1988!

UCW Aberystwyth 5.25" floppy system disk from 1988

This was very promising – the first disk had worked first time and a quick test of swapping it out for another and doing a “DIR” also worked. Now to get the data off these before they went to the great /dev/null in the sky..

My plan to recover the info was to netboot this machine as a Mythbuntu diskless front end box – Mythbuntu makes that easy to set up and with a little fiddling of the DHCP server to make sure it would only ever try and serve this box, and do so with a static address, it worked. Or at least it would load the kernel. Which then complained that it couldn’t boot as it needed a CPU which had PAE support. :-(

This VIA EPIA V has a low power (5W) 533MHz VIA Eden CPU (appropriately the kernel detects it as a VIA Samuel 2) and whilst it is IA-32 it doesn’t have some of the newer features which are selected for Pentium class processors in the current Linux kernel.
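
Whether a given CPU advertises PAE can be checked from the flags line of /proc/cpuinfo on any Linux system that does boot on it. The sketch below is a minimal illustration: cpu_flags is a made-up helper and SAMPLE is an invented excerpt, not the real output from this box.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags listed for the first CPU in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Made-up excerpt for illustration; real input would come from /proc/cpuinfo.
SAMPLE = (
    "processor\t: 0\n"
    "model name\t: hypothetical i586-class CPU\n"
    "flags\t\t: fpu de tsc msr mmx\n"
)

flags = cpu_flags(SAMPLE)
print("PAE supported" if "pae" in flags else "no PAE: needs a non-PAE kernel")
```

On a real machine the text would be read with open("/proc/cpuinfo").read() instead of the sample string.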

Oh well, that’s fine, I knew I could use Debian instead, so I used ltsp-build-client (first creating an /etc/sysconfig/ltspdist file containing the line VENDORDEF="Debian" so it would use the right set of scripts), thus:

ltsp-build-client --chroot sid --mirror http://mirror.internode.on.net/pub/debian/ --dist sid --purge-chroot --arch i386 --accept-unsigned-packages

The problem was that all the various kernel command line options for specifying the NFS server for the root filesystem just didn’t seem to work. It would just sit there saying something like “Waiting for root filesystem” and eventually give up and drop me to a busybox shell prompt, even though a simple cat of /proc/cmdline showed the options were being set correctly. A little more thought and an examination of the config file for the kernel showed that Debian doesn’t ship kernels with CONFIG_ROOT_NFS set (the option that lets the kernel itself mount its root filesystem over NFS), so it was never going to work. :-(
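
Checking for an option like this before spending time on boot parameters is straightforward, since distributions generally install the kernel configuration as a text file under /boot. The sketch below is a minimal illustration: kernel_option is a made-up helper, SAMPLE is an invented excerpt, and CONFIG_ROOT_NFS is the name the mainline kernel uses for root-on-NFS support.

```python
def kernel_option(config_text, name):
    """Return 'y', 'm', a literal value, or None if the option is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# {0} is not set".format(name):
            return None
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


# Made-up excerpt in the style of a /boot/config-<version> file.
SAMPLE = """\
CONFIG_NFS_FS=y
# CONFIG_ROOT_NFS is not set
CONFIG_IP_PNP=y
"""

for option in ("CONFIG_NFS_FS", "CONFIG_ROOT_NFS", "CONFIG_IP_PNP"):
    print(option, "->", kernel_option(SAMPLE, option))
```

A result of None for CONFIG_ROOT_NFS means the kernel cannot mount an NFS root by itself, whatever is passed on the command line; any NFS root support would then have to come from the initramfs.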

Whilst I could have rolled my own kernel I decided to instead have a look to see if I could find a Linux distro that included PXE booting as an option and a Google search for “linux distro pxe” turned up PLoP Linux as the first hit.

PLoP Linux is a small (75MB) distro aimed at data recovery operations that comes as an ISO, tar.gz or zip file for i586, x86-64 and (crucially for me) i486 processors. They even have a separate tar file for the PXE files. It was easy to set up and so I booted the PC with high hopes. Then I got the same error about the kernel requiring PAE support in the CPU that I got with Ubuntu. Whilst there was an i486 tar file there wasn’t an i486 PXE tar file! That was easily solved by grabbing the i486 ISO, doing a loopback mount of it and stealing the kernel and initrd.img files from it instead.

This time it booted, and I found that it had just what I was after – a Linux shell prompt, working networking, mtools (for mcopy and mdir) and (most importantly) ddrescue to let me create complete images of the floppies. I created a directory for each floppy disk and then did ddrescue /dev/fd0 floppy.img to make the image. I created another directory called Contents and from there did mcopy -vms a: (yes, I ordered the options that way deliberately) to copy all the files and subdirectories off complete with their last modification times (from 1987 and 1988 generally).
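
With 21 disks to get through, the per-floppy steps are worth scripting. The sketch below shows one way that could look in Python; it is not what I actually ran. recovery_plan and run_plan are hypothetical helpers, disk01 is a placeholder label, and putting Contents inside each floppy’s own directory is an assumption about the layout. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess


def recovery_plan(label, device="/dev/fd0", drive="a:"):
    """Return (working directory, command) pairs for imaging one floppy."""
    contents = os.path.join(label, "Contents")
    return [
        # Image the whole disk first, so later work can use the copy.
        (label, ["ddrescue", device, "floppy.img"]),
        # Then copy files and subdirectories, keeping modification times.
        (contents, ["mcopy", "-vms", drive]),
    ]


def run_plan(plan, dry_run=True):
    """Create each directory and run each command there, or just print them."""
    for workdir, command in plan:
        print("{0}$ {1}".format(workdir, " ".join(command)))
        if not dry_run:
            os.makedirs(workdir, exist_ok=True)
            subprocess.check_call(command, cwd=workdir)


plan = recovery_plan("disk01")
run_plan(plan)  # dry run: only prints what would be executed
```

Calling run_plan(plan, dry_run=False) would execute the two commands for real, so it needs the disk in the drive and both tools installed.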

In all I was able to recover 20 of the 21 floppies with no errors at all, which amazed me as I was expecting them to have degraded over time (especially as one box was just a flimsy cardboard box). I was hoping to find the original B source code from HoneyBoard (the bulletin board that Alan Cox and others I knew there wrote) and AberMUD from the Honeywell L66, but sadly that doesn’t appear to be the case. There are 3 B programs, but one is just 3 lines (calling a drl) and the other two appear to be two versions of some sort of shell which I didn’t write, as the first has a btidy timestamp from April 1987, before I arrived at Aberystwyth.

The 21st disk was completely unreadable – the drive didn’t seem to want to acknowledge its existence and ddrescue couldn’t see anything as the floppy driver in the kernel couldn’t get the drive to provide any data. I might try ddrescue’s mode of copying data in reverse to see if that manages any better..