Some tests I'd want to run...

03 Jun, 2013
0 Comments

Some days I wish I had spare hardware of decent size that I could just do testing on.

Requirements :
4x Machines, at least two of them identical in all aspects.

Of these, two would be set up to use Gluster, one would be load generator, and one would be load tester.

Some numbers that I want, just as comparison:
glusterfs fuse mount, replica 1, stripe 1, 2 nodes, no mdadm raid.
Raw disk performance with 4, 8, 16, 32, 48 load threads, single client.
2 tests: random read/writes (majority read, but lots of them over a big space) and a majority big-block write.
IO/s (IOPS) and Throughput being the most interesting ones.
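
Something like fio could drive both workloads; a rough sketch, where the mount point, file sizes and thread counts are placeholders to sweep:
fio --name=randrw --directory=/mnt/test --rw=randrw --rwmixread=70 --bs=4k --size=8g \
    --numjobs=16 --iodepth=16 --ioengine=libaio --runtime=300 --time_based --group_reporting
fio --name=bigwrite --directory=/mnt/test --rw=write --bs=1m --size=8g \
    --numjobs=16 --ioengine=libaio --runtime=300 --time_based --group_reporting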
--
nfs mount, same test. Comparison.
--
The same test but with each glusterfs brick being on raid1 (mdadm)
--
Add a fs-cache on SSD for the client, and compare the same numbers.
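
For the NFS-mounted comparison that would be FS-Cache via cachefilesd, roughly along these lines (the SSD path and server name are placeholders):
yum install cachefilesd
# set "dir /ssd/fscache" in /etc/cachefilesd.conf, pointing at the SSD
systemctl start cachefilesd.service
mount -t nfs -o fsc storage1:/testvol /mnt/test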
--
compare raid 1 and raid 10 ( 2 disk vs. 4 disk ) on the storage nodes.
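
The arrays on the storage nodes would be built roughly like this, one or the other (device names are placeholders):
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde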
--
Next up, the "interesting" tests.
Create 4 virtual machines: three on the "load generator" machine, one on the "measure" machine. On the load generator, have one machine cycle through random intervals of "sleep: read-heavy: write-heavy: sleep", and two others mostly idle but doing scattered random reads with small writes.
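
A rough sketch of that load pattern, run inside the busy guest (device paths, sizes and intervals are placeholders):
while true; do
    sleep $((RANDOM % 60))
    dd if=/dev/vdb of=/dev/null bs=1M count=2048 iflag=direct    # read-heavy burst
    sleep $((RANDOM % 60))
    dd if=/dev/zero of=/scratch/load.$$ bs=1M count=1024 oflag=direct conv=fsync    # write-heavy burst
done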

With this as "background load", wait until the system stabilizes, and once again measure the threaded abilities on the other machine. Watch latency-numbers.
--
With this background load, which is faster, raid1 vs. raid10? How do threading options in the cluster affect this? Read-ahead vs. cache?
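
The gluster knobs I'd sweep are roughly these (option names from memory of the 3.x series; double-check against `gluster volume set help`):
gluster volume set testvol performance.io-thread-count 16
gluster volume set testvol performance.read-ahead on
gluster volume set testvol performance.cache-size 256MB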
--
How can you tune it to avoid high latency bursts?
How can you _warn_ and know that latency is about to hit the ceiling? How much load can run passively in the background before latency hits 30 seconds or more for an operation?
(In a shared host scenario like this, 30-second disk latency spikes would be considered unbearable.)
--
Basically, I want to know when the thing breaks, and how to avoid it breaking too early. If you need to sacrifice throughput to get this, I'm fine with that. But how does it break, and when? And how do you avoid it?

Code Kata

02 Apr, 2013
0 Comments

So, the last two days I sat down to do a testing/deployment-related code kata for Python & JavaScript.
It was a bit of hit and miss, but I'll at least document it to see what there is to think about.

The coding task was a generic/simple "count to a hundred" task. In bash it would look like this:
for i in `seq 1 100`; do echo $i;done
And not much worse in Python.

Except that it had to be both a package (library) and a standalone script at the same time. It needed documentation & test-cases.

Even this isn't much to go on, so the code was purposefully implemented as "crappy" as possible. Functions named "print1" and "print2" were made parts of the public API. ( Test-cases were written to make sure I didn't break this API )

Not bad enough. We needed code to test the output to screen as well. ( Mocking "print" would have been too pretty, so I went with an output pipeline ). And then we needed to object-orient it.

At this point you should have a sordid mess on your hands: different object classes for different numbers, various inconsistent names and calling conventions. Declare a couple of them as "public interfaces" and make sure you write test-cases for them in their own suite. This set of test-cases is not allowed to be refactored, and should capture bugs you make.

On the other end, you also write test-cases for the things that _aren't_ part of your API, but put these in a different suite.

Soon we had a nice list of objects, one for each number, with dynamic re-creation of its own class with increments. At this point, don't hold back. Adopt strange conventions. Write temporary files and pass them around. Toss in a lambda for no other reason than because you heard about them. Make a mess.

The next step from there is adding a text representation of itself (implementing __str__ for the class). This means that you need to refactor your mess. Toss out your "ugly" code while retaining the public API. If your API has some objects, you still need to support them while adding this functionality elsewhere. Revamp your functions to use the new, organized class.

After this, split out the new functionality into its own package. Copy the tests over, and update your old package to include & depend on the new one. Make sure you don't change your tests: no changes to import paths or other things.

After this, it's time to go online. Make a simple HTTP frontend for it. Make sure it's REST-ful, with stable URIs for each number. Do not include English words in your URIs; you want to be translatable. Once it's all working in the web browser, add some basic HTML templating to get it rendering centered.

This is where I went wrong. I should have spent the time to implement web-based test cases, Selenium or something similar. My bad, and I'll redo this part of the kata.

However, at this point, change your mind and add an AJAX version of the interface. Since AJAX is all about JSON these days, bolt JSON on top of it. Use Accept: headers, and keep backwards compatibility with the web-site version.
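
A quick curl check along these lines shows the idea; the /numbers/42 URI and the port are placeholders, not necessarily the scheme you ended up with:
curl -H 'Accept: text/html' http://localhost:8000/numbers/42
curl -H 'Accept: application/json' http://localhost:8000/numbers/42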

Are your URIs stable? Is the interface cacheable? Are you stateless?

In my next exercise for myself, I'll add failure modes to the HTTP interface, and try to learn error handling in JavaScript. I don't know what the best way of doing it is; I think I'll figure something out.

And if you think the above is messy? Yeah, it is. It's supposed to be; the exercise is to wade deep into the mess, then get your head above the surface and bring it together. Hold yourself to high standards: run lint at all times, make sure your documentation is there, that your functions aren't too long, and that you avoid duplication.

But not to start with. In the beginning, you need to make a mess. The better your mess, the more fruitful it is.

Update

02 Apr, 2013
0 Comments

A quick note,
The site is back after a move, but I may have managed to corrupt the gallery style. Work on that will have to come in a bit (note to self).

Took the time to upgrade to habari-0.9, appears to have gone well.

Varnish is currently acting as a cache in front, will have to see about replacing lighttpd with something else.

Signing off.

Testing Gnome - The Report

09 Aug, 2012
0 Comments

So, today I posted my report on "Testing Gnome" to the Desktop Development mailing list.

Below is the full text of the report; attached is a PDF: report-edited.pdf

The license for this is CC-BY-NC-SA; if you contact me, we'll discuss other options.

Some notes on Gnome Testing

Abstract

A short report on my work exploring the various methods of testing currently employed in the Gnome projects, the state of testing as a whole, and the current state of our QA work.

First comes a brief section about various kinds of testing and how they are currently in use (or not), followed by a suggestion on how to improve the current state of things.

About the Author

I'm D.S. "Spider" Ljungmark, Long time Linux user, developer, administrator and general geek.  Over the last year I've become more deeply involved in software testing; I decided to apply some more of this domain experience onto the Gnome ecosystem in order to see what I could learn and help out with.

Before this, I was a Gentoo packager for several years until retiring from the project, where I was also working. ( spider@gentoo.org )

This article is a distilled form of my talks on the subject, held at Guadec in Spain, July/August 2012.

About software testing

Software testing is a relatively young field in terms of popularity, and has been developing rather rapidly during the last decade, especially as development models have changed, causing the testing process to change as well.

When it comes to naming things in "Software Testing", there's an abundance of confusion, disparity and competing methodologies. I'll just go over a few of these.

Unit Tests

Unit tests are probably the most well-known method of testing, and are generally well regarded. They are a programmer aid at a low level of the code, testing the smallest piece possible for known-good / known-bad functionality. They are more a tool to prevent regressions when re-factoring and maintaining code than a tool to increase an arbitrary notion of "Quality".

The value of unit testing vs. Code by Contract and strict-syntax languages can be debated forever in academia, but it is generally recognized that unit tests increase in value as you add developers to a project.

Integration Tests

Integration tests come in (at least) two shapes, but in all forms they test the edges and the cooperation between one project and others.

In the simplest form, this is a linking level test, making sure that the application still builds against libraries after upgrades.

In the more advanced setting it is a protocol-level test, where tests run against an external server in order to make sure that communication doesn't break. Integration tests of this kind usually belong at the lower end of the protocol stack, in libraries implementing HTTP/IMAP access, D-Bus calls, and similar.

Smoke Tests

Smoke testing gets its name from the hardware world, where a unit was plugged into a power source. If blue smoke didn't appear, it passed the smoke test.

In the software world it usually centres around the software starting after a build and supporting something minimal.

Performance Tests (KPI tests)

KPI stands for "Key Performance Indicator", and KPI tests are a way of making sure there are no regressions in performance. The testing is done either manually (stopwatch and button presses) or programmatically. Commonly they cover start-up, common actions, or load testing.

Example: measuring the time of an import job run repeatedly, or measuring the CPU load of a user login to a web forum.
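
As a sketch, a manual KPI run can be as simple as timing the operation in a loop; ./import-job.sh here is a stand-in for whatever is being measured:
for run in $(seq 1 10); do
    /usr/bin/time -f "%e" -a -o kpi.log ./import-job.sh   # append wall-clock seconds to kpi.log
done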

Acceptance Tests

Acceptance tests come in two forms: classical and "Agile".

In classical acceptance testing, a client verifies a delivered product (bridge, software, car) against the contract or work order, and it is only performed at the end of the development cycle. It usually involves massive amounts of work and long hours of lawyers.

In the more modern "Agile" method of the same name, this is limited to testing the desired function of a single feature after completion. This usually also involves making sure that development process was followed (documentation, unit tests, check-ins) as well as ensuring that the user story was fulfilled according to spec.

Acceptance tests of this kind are usually automated, while the first kind is a mix of automated and manual work.

Functional Testing, System Testing

Functional Testing is what we consider when we ask whether something meets its design criteria. It is usually contrasted with System Testing, which asks whether something meets user criteria.

Both are commonly performed manually.

Gnome Method of Testing

The Gnome Method of Feature Testing was established in the 1.4 to 2.0 days, and reintroduced to popularity in the 3.0 release. It involves removing all traces of a feature from the code base, waiting for a release cycle and listening for complaints that it was removed and that Gnome developers aren't listening to feedback. If there are no complaints, the feature was broken and doesn't need to be fixed. If there are complaints, it's considered for redevelopment.

For more information about Software Testing, see http://www.satisfice.com/ or similar resources.

The Status and History of Software Testing in Gnome

Unit Tests

Unit testing is commonplace in many OSS projects, and is what people mostly think of when they hear "testing". Various projects use it to differing degrees, but overall it's well regarded, recommended and in use.

Integration Tests

Integration testing is done by several parties in the development and distribution cycle. Of the two flavours, a few projects have the service type of integration tests: testing the ability to connect to localhost sftp and imap servers, or libsoup tests against remote servers. On the whole, this kind of testing requires a formal setup and a laboratory with various kinds of servers to integrate against, something that few OSS developers have access to, and which would require infrastructure support from a larger organisation. There is certainly some of this happening behind closed doors at various distributions, but it remains a hidden effort, if it exists.

The other kind of integration is an area where the distributors today are doing a stellar job, both testing new libraries against previously compiled software and testing rebuilds of the whole software stack to ensure source-level compatibility. The job is spread out between many parties, but with "upstream first" and similar incentives a lot of the work is shared. Trivial fixes tend to be solved in many places at once, while more advanced fixes usually end up upstream with a "formal blessing". Of course, there have been rare cases of the procedure not working, but on the whole, this level of testing is a solved issue.

Smoke Tests

Smoke testing is basically done by a developer in the compile - launch - crash cycle, and is universally performed by developers as well as distributions. There are some notable exceptions, such as Fedora's Rawhide (re: https://lwn.net/Articles/506831/).

Performance Tests

As a rule there are few performance concerns, and fewer tests still. The GTK+ toolkit has some, but there is no regular tracking of performance regressions (due in large part to the lack of continuous integration and associated quality reports).

Examples of regular KPIs that could, and maybe should, be tracked: "start-up until icons rendered" in Gnome Documents (currently 8+ seconds at times), "start-up time of Rhythmbox" until it can select a song and play it (varies depending on library size), "nautilus browsing of /bin", or the time it takes Evolution to display a mailing list folder.

On lower levels, there are tests on the performance of gstreamer encoding and decoding, as well as on the timing of certain unit tests; however, there appears to be little concern for this overall, and no continuous reporting of progress, regression or stagnation.

Automated Acceptance Testing

There have been attempts in the past, with Mago (last 2.12 or similar?), the Linux Desktop Testing Project and Dogtail.

Some applications (PiTiVi) have actively maintained acceptance tests built with these tools, tests that work and are a core part of the current development; but as a whole this has been a largely abandoned and ignored part of the development culture. The various attempts to organise such testing have stagnated, as they are not part of the daily development.

Musings

In order to have such tests working, they have to be part of the project and be maintained together with the project. This means being part of the build system (not building without the testing infrastructure in place) as well as being part of the commit and release criteria. Staged commits, automatic reverts and a culture of accepting that "breaking tests is bad" need to be in place before we can enforce testing. Adding testing as an external project is doomed to obsolescence and breakage during the next development cycle, making the tests a burden rather than a gain.

System Testing

There is no concentrated effort in Gnome at the moment to do this kind of testing. Various distributions have organised test days (Fedora, Ubuntu) and test teams (Ubuntu, SUSE), which do manual testing with users as well as professional testers.

These tests probably overlap, and vary in quality and in the range of what is tested: everything from "create a new launcher in the dash" to "merge conflicts in file copying between Samba and local content".

Other than that, there is the well-known Debian model: release software to a range of users (experimental, testing), and if nobody complains for a time, assume that it's working. This method has dubious value for the quality of the actual testing going on, and while not time-efficient, it is definitely cost-efficient. That is, as long as time is not related to money.

The Development Model and Testing

The current development model is based on a few layers:

  First the Original Developer

   codes

   smoke tests

   (unit tests)

   (acceptance tests)

  Then the code reaches "other developers" (jhbuild users)

   smoke tests

   functional acceptance tests ("This feature doesn't work")

   integration tests ( limited to compiling and launching)

   (unit tests)

  Then a release is rolled out, and Distributions take over

   smoke tests

   integration tests

   Function tests (Gedit can't edit files!)

   ( Functional Acceptance Tests )

  The distributions then roll out to beta users

   Function Tests

   Acceptance Tests

   Integration tests

  And then it reaches users

  

The Proposal

Without changing the world, building CI systems and forcing a culture of testing and reverts on the world, there is a somewhat easy way of changing the current state of things.

My suggestion is to write test cases for System Testing upstream, and to give our distributors that coherent set of System Tests as well.

The tests do not have to be all-encompassing, but they should target the "core applications" to make sure that we retain functionality and do not regress there. Currently, there are several such test cases from Ubuntu, SUSE and Fedora, and organising this effort upstream appears to make sense.

This could later be coordinated with Live Images for a faster feedback loop between "code hitting the tree" and "a user testing it", and it would also give a basic set of functionality that should not be broken before a release of the software.

Furthermore, I suggest having these kinds of tests for all features announced in Release Notes, as those features are seen as "promises" to the community.

Other gains

Other than the immediate gains, this would give us an easier way of getting interested parties involved. By bridging the gap between "just use jhbuild" and "I want to try something", we can engage more users in a smoother way.

Having this set of tests would be a way for us to indicate the features that we consider "core" and close to the heart, and where we expect the distributions not to break things without very good reason.

It may also serve as a check on our own development model, so that we do not accidentally break work flows and user experiences during redesigns.

Coverage

We do not need to aim for complete coverage with these tests; doing so is unnecessary and will complicate things. It is better to have a small set of tests that we can maintain than to reach too far and end up unsupported.

85% test coverage is a noble goal in Unit Testing, but is impossibly high for Acceptance Testing and System Testing.

Implementation Details

In the beginning, it's better to keep things simple, working from the Wiki and organising test cases. Based on feedback and interest from various parties, we can then branch out and maybe formalise and systematise things further.

Why not?

Why not automated? Isn't automation better?

Not in this case. In many cases we would end up spending more time running and maintaining automated test suites than we would spend on the testing itself. Test suites are also not proof that the software is sane and doing something useful.

This would also allow us to involve our large community.

Guadec speech: Testing Gnome, A Software Tester's Approach

28 Jul, 2012
0 Comments

So, I haven't finished the report that goes with these slides yet, nor is the work on test cases finished enough to publish.

However, the slides should be available anyhow, release early and so on.

Enjoy, 2 pdf files,
gnome-testing (pdf) ( First part, about the current state of testing )
new-tests (pdf) ( Second part, about what we can do better )
White slides are speaker notes, a bit more verbose than usual; the rest are the slides as shown.

Images are licensed CC BY-NC-SA , slide texts are CC BY-SA .

I'm on a ....

24 Jul, 2012
0 Comments

Well, today I'm off to Guadec 2012 in A Coruña, Spain, where I will, amongst other things, talk about "Testing Gnome", applying some of the domain knowledge of a software tester to the OSS world and seeing some of the fallout.

It's been a generally interesting piece of work, and hopefully I'll be able to continue working on it. But now, I need to find my shoes, cause I have an airport to attend to.

Fedora 17, NetworkManager and VLAN

16 Jun, 2012
0 Comments

If you want VLAN (802.1q) to work with a Fedora 17 NetworkManager setup, the configuration for the VLAN interface should NOT contain TYPE=Ethernet.

If it does, it will fail. Hard.

So, this (example) is a working one:
ifcfg-em1.1:

VLAN=yes
DEVICE=em1.1
REORDER_HDR=yes
PHYSDEV=em1
UUID=73771fae-1cdc-b68b-632c-312f9aa400f7
NAME="Vlan 1"
HWADDR=8C:89:A5:C1:1A:C2
BOOTPROTO=dhcp
# TYPE=Ethernet
DEFROUTE=no
USERCTL=no
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
ONBOOT=yes
PEERDNS=yes
PEERROUTES=yes
NM_CONTROLLED=yes

ifcfg-em1:

DEVICE=em1
PHYSDEV=em1
VLAN=yes
NAME="External"
HWADDR=8C:89:A5:C1:1A:C2
UUID="b503e152-be11-47b4-ac42-1fbf2f7d882c"
BOOTPROTO=dhcp
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
PEERDNS=yes
PEERROUTES=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_PRIVACY=rfc3041
NM_CONTROLLED=yes
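
After restarting NetworkManager, something along these lines should confirm the 802.1Q tag on the interface:
systemctl restart NetworkManager.service
ip -d link show em1.1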

A Debugging Workshop

14 Jun, 2012
0 Comments

So, for my day job I prepared some lectures and workshops, of which at least one will be useful for a wider audience, so I'm reproducing parts of it here.

This post will be edited for formatting at some point later.

So, in our business we're working with _many_ servers that aren't really like each other. For most of these there isn't much in the way of administration, which means that debugging things turns into a day-to-day occurrence.

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

– Brian W. Kernighan

Basics: The environment

Free disk space
$ df -h

Free inodes
$ df -i

CPU usage
$ top
Shift-P

  • Is something using a lot of CPU?

  • Wedged process?

  • Does the list look like what you expect?

  • Load average? Over 1.0 is usually trouble.

Memory usage
$ top
Shift-M

  • Check if things look bogus there.

System logs
$ less /var/log/messages

Look out for errors / repeated things.

Network connectivity

Some basics:
If you can SSH into the machine, you only need to check name resolution and reachability from the outside/inside networks.

$ host google.com
=> network is reachable
=> DNS works
If you can't resolve, try the internal networks.

--
Local name resolving
$ hostname
$ host $(hostname)
Not able to resolve your own hostname? Bad news, my friend. Check /etc/hosts; the hostname should point at 127.0.0.1.

$ ping ping.sunet.se
Routing works; check for speed patterns and jumps in latency.

$ route -n
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0
Do you have a routing table? Here the default route (0.0.0.0) goes via 192.168.1.1; if you don't have a route for 0.0.0.0, you have issues.

Can it be reached over the network?
$ curl -v http://my-hostname-internally/
If not, check the server firewall:
$ iptables -v -L

For debugging, make sure that the INPUT policy is ACCEPT.
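
Temporarily, and only while debugging, that can be done like this (save the old rules first so you can restore them):
iptables-save > /root/iptables.backup
iptables -P INPUT ACCEPT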

For further debugging of the various services you may need, check the result of:
$ ps aux | grep <service>
R and S states are good (running, sleeping); Z, D and T are bad (zombie, waiting for IO, stopped).
Compare the running user with the expected user. Also check the timestamp for when it was started.

Dig into the application you're looking at: config files, logs, and check ports and connectivity with `curl`.

Explore further using:
$ lsof -nP -p $PID
and
$ strace -ttTf -s0 -e open,connect,send,recv -p $PID

Spotify, Aargh

24 Mar, 2012
0 Comments

So,
I go back in the day; there's a post around here about OiNK going down due to the copyright courts.

Recently, I'm in an office where I don't have access to my own music library, meaning no downloads, only streaming services. Since Google Music (or whatever they call it this week, Google Play Music I think?) is unavailable in my country, thanks-a-lot, and WiMP requires the discontinued Adobe Air, that leaves me with Spotify for my daily needs.

And some days it is a complete act of frustration.

To start with, something simple: the lack of music. Artists such as "Godspeed You! Black Emperor" are nowhere to be found. Along with a lot of others.

Then there comes the ephemeral feeling of things. I had a "Grawwl" playlist, featuring Novembre and Eluveitie. Now, what happens? Both the albums I had in this playlist have disappeared. Not _all_ the albums by those bands, just some.

What sense does this make? No clue.

Even more annoying: when you search for them, the albums you had in the lists are there. They've just managed to disappear from your playlists.

Next up, we have the completely -broken- metadata. Deep Purple's "Burn" is apparently from 2005. And the anniversary edition from 2004. Oh, right, three songs are missing from the first of those. Their first album ("Shades of Deep Purple", 1968) came out in 2003, a year without any re-release of that album according to Discogs.com.

Then we have the live pollution. There is absolutely no separation of live albums from studio albums, and while I guess some live recordings really are better than a band's studio albums, I generally try to separate the two. Especially when I'm looking for a certain song.

Then we can complain further on how _utterly horrible_ it is to hit artist collisions. Things like ISIS get a horrible mixture of things. Click around a bit on Spotify and find a mix of R'n'B/hip-hop artists next to the post rock that I was looking for. Jarring and disturbing for all the wrong reasons.

Then we come to the classical music. Oh my fucking god. Is Beethoven an artist or a composer? What, you can't search for composers? Oh right, which orchestra/conductor/soloist was it on that Bach sonata? The Azerbaijan symphonic tincan orchestra?

No separation, improper tagging, no way of following it up, and a bad selection.

Then we can add certain things like gapless recordings and audio quality. Well, they simply don't deal with gapless recordings properly at all; Dream Theater's Awake is a typical example. The Mirror flowing into Lie? Right, it's missing the beats.

So. Poor selection, worse tags than even the _pirates_ have, and an annoying UI where you can't even copy an artist's name and paste it into a chat message?

Right, of course. I forgot to mention: low quality album art.

So, overall, it's a frustrating thing. It feels ephemeral; you can't trust it.

Converting Fedora 16 from i686 to x86_64

22 Mar, 2012
1 Comment

So,
after a piece of hardware melted down on me, it turned out to be time to upgrade to a 64-bit distribution on one of my machines that had previously been on 32-bit.

Crazy as I am, I decided to try and do it on the fly rather than getting a USB stick and sorting it out that way. These are some of my notes from that.

First, it's time to clean out your machine. Make sure that
yum check
as well as
yum distro-sync
are clean and finish satisfyingly; this should have your machine on the same package set as the official mirrors.
(Also, clean out any packages you don't use; the less you have to work with, the easier the next steps are.)

After this, grab a current kernel from the updates repo of the x86_64 branch.
A check of `/etc/yum.repos.d/fedora-updates.repo` will show you:
#baseurl=http://download.fedoraproject.org/pub/fedora/linux/updates/$releasever/$basearch/
Download the kernel, and install it: `rpm -Uvh --ignorearch kernel*.x86_64*rpm`

Now reboot into your new kernel and see that things work. Yey.

Time to get a list of installed programs, and sort things out a bit more.
Install the `yum-downloadonly` plugin for yum.
Then run `rpm -qa --qf '%{name}.%{arch}\n' | grep -E '\.i.86' | sed s/i.86/x86_64/g > rpmlist.txt`

Then it's time to download the packages (this will take some time):
yum --downloadonly --downloaddir=$SOMEPLACEWITHSPACE install $(cat rpmlist.txt)

After this, there's a huge slew of rpm packages to upgrade:
rpm -Uvh *rpm --replacepkgs --replacefiles --ignorearch

This churned on for a while, then hung with a db4 compat issue. Removing the __db.00* files in /var/lib/rpm and doing an rpm --rebuilddb seems to have fixed the database without complications.
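
In commands, that cleanup was roughly:
rm -f /var/lib/rpm/__db.00*
rpm --rebuilddb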

After this, I tested: went down into runlevel 1 (probably should have done this before the mass upgrade; it seems like a saner thing to do), started some commands, and verified that things looked okay. Then rebooted it all to make sure.

Success.
