Date: Mon, 13 Jan 2003 07:10:46 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: Niklas J. Saers <niklas@saers.com>
Subject: Re: Regression testing

* From: "Niklas J. Saers" <niklas@saers.com> [ Date: 2003-01-13 ]
        [ Subject: Regression testing ]
> Hi, I'm writing a master's thesis on FreeBSD's methodology, and I was
> wondering if there are any documents on testing (and in particular
> regression testing) for the FreeBSD project? I ask you because I noticed
> in src/MAINTAINERS that you are listed under "regression" as willing to
> help. If there are none, would you mind telling me about how much this is
> used, how large parts of the system are tested, what frameworks are used
> and such?

If you take a look at src/tools/regression you'll see most of what we
currently have.  It consists of some library function tests for things
in libc, and a generalised infrastructure for doing regression tests
of most types of user applications that FreeBSD provides.  The latter
is something I came up with which allows arbitrary input / flags to be
given to a program, and then for the output to be checked for validity
against a known good copy of output.  Another big set of regression
work in the system is phk@'s GEOM test framework, which is very
extensive and involves running parts of the GEOM code in userland.
Those tests are in the same general area of the source tree.
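
As a rough illustration of the output-comparison idea described above,
here is a minimal Python sketch.  It is not the m4 framework in
src/tools/regression; the names run_case, run_all, and the two "upper"
cases are hypothetical, and the program under test is a small Python
one-liner chosen only so that the example is self-contained.

```python
import subprocess
import sys


def run_case(argv, stdin_data, expected):
    """Run argv with stdin_data on standard input.

    The case passes when standard output is byte-for-byte identical
    to the known good copy in `expected`.
    """
    result = subprocess.run(
        argv,
        input=stdin_data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.stdout == expected


def run_all(cases):
    """Run every (name, argv, input, expected) case and map name -> pass/fail."""
    return {
        name: run_case(argv, data, expected)
        for name, argv, data, expected in cases
    }


# Hypothetical program under test: copies stdin to stdout in upper case.
PY = sys.executable
UPPER = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"

# Each case pairs a command line and input with its known good output.
cases = [
    ("upper-basic", [PY, "-c", UPPER], b"hello\n", b"HELLO\n"),
    ("upper-empty", [PY, "-c", UPPER], b"", b""),
]

results = run_all(cases)
for name, ok in results.items():
    print("PASS" if ok else "FAIL", name)
```

In a real tree the expected bytes would more likely live in files next
to the test script than inline, so that a known good copy can be
regenerated and reviewed when behaviour changes on purpose.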

If you would like anything more specific, let me know, by all means.

Thanx,
juli.
--

Date: Mon, 13 Jan 2003 07:49:22 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: Niklas J. Saers <niklas@saers.com>
Subject: Re: Regression testing

* From: "Niklas J. Saers" <niklas@saers.com> [ Date: 2003-01-13 ]
        [ Subject: Re: Regression testing ]
> I wonder if there is something written down surrounding this topic? What
> choices have been made and why? If I understand this framework correctly,
> it is a set of macros that will test a utility with given input against a
> predicted output, and geom does something else. How much of userland has
> been covered by these tests, and are there any particular targets for how
> much is wanted? How common is it for developers to use this to write
> tests?

Anywhere under that area where you see a 'regress.sh', there is a set of
tests written with this framework.  They are processed via m4 using the
macros in question, and then passed to a shell interpreter.  Utilities like
sed(1) are covered because they had a number of bugs until recently, and
I've been encouraged to make such tests for utilities I have an interest
in, and a handful of others have as well.

It's sadly infrequent that tests are written, but there have been a
handful of times people other than myself have done them, or whatnot.
An interesting test in the same location but not using the m4 is a
Makefile which is a self-test for the make(1) program in that it is
a check to make sure that make is behaving properly, and so it tries
to do both right and wrong things.

> Also, why has this strategy been chosen? I mean, the number of possible
> option combinations is enormous, and having a predicted output for every
> combination isn't only an enormous amount per tool, but given its size
> the output is likely to contain errors introduced when it was
> generated. I am sure you have discussed many testing strategies, so I'm
> curious. :)

The only options to be tested are the very basic ones, which will likely
lead to seeing (clearly) any fundamental errors (e.g. off-by-one in
output vs. input), and the ones where bugs have been fixed, to be sure
the old bug doesn't come back.

It was chosen because people said we needed tests for such things, but
nobody wanted to do them, and as I was very concerned with how things
like xargs(1) worked at the time (I practically rewrote it) I needed
a set of tests anyway, to make sure _I_ didn't change any behaviour.

And there have been cases where the tests were wrong!  For example
when I added Base64 support to uuencode and uudecode I had tests here,
but the input was not _exotic_ enough, it was not _binary_ data,
as it were.  Now, the test input and output are from my /dev/urandom
a long time ago :)  So we were doing the wrong thing, but the test
cases didn't show that.  Now, of course they do.
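
To make the point about test input concrete, here is a small Python
sketch under stated assumptions.  The function buggy_encode is a
hypothetical encoder with a deliberately planted bug (it drops the high
bit of every byte); it is not the actual uuencode or uudecode defect,
which is not described here.  Plain text passes the round-trip check,
while binary input exposes the problem.

```python
import base64
import os


def buggy_encode(data: bytes) -> bytes:
    """Hypothetical encoder with a planted bug: clears the high bit of each byte."""
    return base64.b64encode(bytes(b & 0x7F for b in data))


def round_trips(encode, data: bytes) -> bool:
    """Encode with `encode`, decode with a known good decoder, compare to the input."""
    return base64.b64decode(encode(data)) == data


# Input that is not exotic enough: every byte is 7-bit ASCII.
text_input = b"The quick brown fox\n"

# Input that exercises every byte value, plus some random data.
binary_input = bytes(range(256)) + os.urandom(1024)

text_ok = round_trips(buggy_encode, text_input)          # bug goes unnoticed
binary_ok = round_trips(buggy_encode, binary_input)      # bug is exposed
good_ok = round_trips(base64.b64encode, binary_input)    # correct encoder passes

print("text input passes:  ", text_ok)
print("binary input passes:", binary_ok)
print("good encoder passes:", good_ok)
```

Including bytes(range(256)) alongside the random data keeps the check
deterministic: every byte value is covered no matter what os.urandom
happens to return.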

What it has come down to, what I have done, is what I need to accomplish
what I need to accomplish.  A lot of it is that I'm stubborn, but not
so stubborn as to want to accidentally break something :)  I like
to know when I do that, and this has helped me a few times.

> BTW, on what mailinglists have discussions surrounding tests been held?
> I'd love to have a read. :)

There haven't really been any.  The initial stuff was just shell, and
eventually when people showed interest in doing more tests, and when I
had to add a lot of tests, I decided based on many recommendations
to use m4.  I think the m4 tests still test both GNU and BSD m4, since
we now use OpenBSD's m4, which supports most of the important GNU
extensions to m4.

Such suggestions likely came from other developers on IRC.

Thanx,
juli.
--

Date: Mon, 13 Jan 2003 08:46:06 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: Niklas J. Saers <niklas@saers.com>
Subject: Re: Regression testing

* From: "Niklas J. Saers" <niklas@saers.com> [ Date: 2003-01-13 ]
        [ Subject: Re: Regression testing ]
> You talk about GNU and OpenBSD tests. Do you import tests from these
> projects?

No.

> > I've been encouraged to make such tests for utilities I have an interest
> > in, and a handful of others have as well.
> 
> Have many other than phk and rwatson contributed tests?

Brian Fundakowski Feldman (green@) did, Tim J. Robbins (tjr@) has, and
is fairly active with libc tests.

> > It's sadly infrequent that tests are written, but there have been a
> > handful of times people other than myself have done them, or whatnot.
> 
> Actually, I've found when talking to developers that many have written
> their own little tests, but strip them away when the code is committed.
> I haven't set out to test this specifically, but I think that elements
> such as that they don't consider the tests general enough, that the tests
> are specific to their equipment and that they haven't put enough work into
> the tests to give them the high standard they feel they should have,
> contribute to them not committing the tests.

That's very much true.  Lots of code in the tree even has #ifdef TEST code.
Off the top of my head, I immediately think of xargs(1)'s strnsubst.c, which
went through a number of incarnations, and so I kept testing to be sure I
was getting it right.  That was for xargs -I.
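
The habit of keeping a self-test inside the source file has a rough
analogue in Python, sketched below.  This is not the logic of
strnsubst.c: subst_bounded and self_test are hypothetical names, and the
size rule (stop replacing once the result would exceed maxsize
characters) is an assumption made for the example.

```python
def subst_bounded(text: str, match: str, repl: str, maxsize: int) -> str:
    """Replace occurrences of `match` in `text` from left to right.

    Replacement stops at the first occurrence that would make the
    result longer than `maxsize` characters; the rest is kept verbatim.
    """
    if not match:
        return text
    pieces = []
    size = 0  # length of the pieces emitted so far
    pos = 0   # position in `text` of the first unprocessed character
    while True:
        hit = text.find(match, pos)
        if hit == -1:
            break
        # Result length if this replacement is made and the tail is kept as-is.
        tail = len(text) - hit - len(match)
        projected = size + (hit - pos) + len(repl) + tail
        if projected > maxsize:
            break
        pieces.append(text[pos:hit])
        pieces.append(repl)
        size += (hit - pos) + len(repl)
        pos = hit + len(match)
    pieces.append(text[pos:])
    return "".join(pieces)


def self_test() -> bool:
    """Checks kept next to the code, in the spirit of an #ifdef TEST block."""
    assert subst_bounded("echo {} {}", "{}", "file", 100) == "echo file file"
    assert subst_bounded("echo {} {}", "{}", "file", 12) == "echo file {}"
    assert subst_bounded("echo", "{}", "file", 100) == "echo"
    return True


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    self_test()
    print("self-test passed")
```

Unlike a private test that is stripped before commit, a block like this
stays with the code and can be rerun after every rewrite.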

> > The only options to be tested are the very basic ones,
> 
> Why are these the only options to be tested?

Because there's no point in testing everything.  I mean, there is, but
that's not really necessary.  The basic ones show off fundamental problems.
The more specific ones show off specific problems.  Testing everything would
show everything but would be cumbersome, and likely to fall out of date
more rapidly.

> > What it has come down to, what I have done, is what I need to accomplish
> > what I need to accomplish.
> 
> Do you plan on using energy encouraging other committers to make tests for
> their applications?

I have, but it doesn't go far.  Personally, I'll do more with regression tests
as it occurs to me to do so.  I worked extensively on make(1), so for a while
I did a lot of make(1) regression work.  Right now I am not doing anything
that really makes sense in that context, so.

> > A lot of it is that I'm stubborn, but not so stubborn as to want to
> > accidentally break something :)  I like to know when I do that, and this
> > has helped me a few times.
> 
> Hehe, this is what led me to write tests myself as well. It's good not
> having to wonder if the right thing has been done. :)

Computers will do what you tell them to, so if you can tell them to make
sure they're actually doing it, then you can tell them to do more and more,
and not have to second guess yourself until you tire of working on them.

> Thanks for your quick and helpful answers. :)

Sure.  My pleasure!

Thanx,
juli.
-- 

Date: Wed, 15 Jan 2003 04:02:25 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: Niklas J. Saers <niklas@saers.com>
Subject: Re: Regression testing

* From: "Niklas J. Saers" <niklas@saers.com> [ Date: 2003-01-15 ]
        [ Subject: Re: Regression testing ]
> Hi Juli, and thanks for your answers. I only have a few more questions. :)
> 
> > > You talk about GNU and OpenBSD tests. Do you import tests from these
> > > projects?
> > No.
> 
> Do you import any tests at all? Or is the only similarity that you all use
> M4?

That's the only similarity.

> > I have, but it doesn't go far.  Personally, I'll do more with regression
> > tests as it occurs to me to do so.  I worked extensively on make(1), so
> > for a while I did a lot of make(1) regression work.  Right now I am not
> > doing anything that really makes sense in that context, so.
> 
> Ok. In what way did you encourage people to write tests?

By pointing out situations where they have helped, and generally encouraging
people to write them on the grounds upon which they are obviously useful.
Not as aggressive as I could be, but better than forcing people to do them,
or all but.  Maybe our current tests will gain prominence in some fashion,
and that will encourage people to write more.  I'm not sure.

Thanx,
juli.
-- 

Date: Wed, 15 Jan 2003 15:08:15 -0800
From: Juli Mallett <jmallett@FreeBSD.org>
To: Niklas J. Saers <niklas@saers.com>
Subject: Re: Regression testing

* From: "Niklas J. Saers" <niklas@saers.com> [ Date: 2003-01-15 ]
        [ Subject: Re: Regression testing ]
> Hi Juli,
> 
> > > Do you import any tests at all? Or is the only similarity that you all
> > > use M4?
> > That's the only similarity.
> 
> Why aren't the tests imported? I mean, that sounds to me like it would
> take far less time than to write them ourselves?

The framework to support them may be too extensive to justify polluting the
tree with.  Or we may not be able to use such at all.  Or they may simply
be of no use to us.  They may test similar things in similar ways, but that
doesn't mean much.

Thanx,
juli.
--