Comments on: Testing, 1, 2, 3.. Testing... (Reflective Testing)
Testing, 1, 2, 3.. Testing. In my old TV studio days, that's how we tested microphones (and set the audio channel gain as well).
Thinking back on it now, it's funny how much we tested microphones. In fact, to someone who's not been part of live performance shows, those of us who set up microphones may seem... well, overly paranoid.
But, more often than not, it was the microphones that failed during live shows. Failures did not happen often, maybe once a year (1 in 1000 shows), and normally during the large audience 6 PM evening news. For any of a variety of reasons, one of the microphones would fail. Murphy's law. And, we'd all wonder (normally because the director was shouting it)... didn't we test that microphone? Did he pick up the wrong one? Or, did a wire come loose inside? In the worst case, we'd have the floor manager hand over a new mic or get the talent to trade mics.
Anyway, what does this have to do with REBOL? Well, REBOL 3.0 needs testing. And, it needs a lot of testing - more than even microphones.
Much of this testing can be done by me, the designer. But, there's only so much I can do. With 56 datatypes times 68 polymorphic actions per datatype, as well as a few hundred built-in natives and mezzanines, there are a lot of combinations. Add in function refinements (modifiers) and you get a sense of the level of testing required. I figure we need a minimum of 100'000 tests in the test suite. 500'000 may be more likely in the end.
Right now for R3, I have a "hand made" test suite with 3'000 tests (what I call test vectors, because each tests one specific element of the language). Realizing that more is needed, and also that time is short, I've started using the technique I'd call "reflective testing". (There's probably an official term for it, not sure.)
Reflective testing is where the language itself provides the information to generate the tests. Those of you who know REBOL know how this works, and some of you have already written various reflective tests and analyzers over the years. To those of you, there is nothing new here.
I must admit that in the past I've not seen the value of reflective tests and generally discounted them because they are not created from specifying expected behavior, but from what the language actually does. That permits serious errors to go undetected simply because the language allows them.
These days, I'm rethinking the approach. Reflective testing can be valuable if you certify the results by inspection. That is, you carefully check the results.
The basic method becomes:
- Use REBOL as a test generator to generate tests for specific datatypes or functions.
- Inspect the results. Although this is time-consuming, it's a lot less time-consuming than typing in thousands of tests and figuring out all the valid test combinations manually.
- Once certified, a test can now be stored and used later to check for regressions (new bugs).
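REBOL itself is the natural vehicle for this, but the generate-then-certify loop can be sketched in any language. Here is a minimal Python sketch of the three steps above; the sample functions (`double`, `shout`) and their inputs are made-up stand-ins for the datatypes and actions being probed, not part of any real suite:

```python
# Reflective testing sketch: generate test vectors from what the code
# actually does, certify them by inspection, then replay for regressions.

# Hypothetical functions standing in for the language features under test.
def double(x):
    return x * 2

def shout(s):
    return s.upper() + "!"

# Sample inputs to probe each function with (illustrative only).
SAMPLE_INPUTS = {double: [0, 3, -1], shout: ["hi", "ok"]}

def generate_vectors(funcs):
    """Call each function on its sample inputs and record the ACTUAL
    result as the candidate expected value. A human must then certify
    these vectors by inspection before they are trusted."""
    vectors = []
    for fn, inputs in funcs.items():
        for arg in inputs:
            try:
                result = ("ok", fn(arg))
            except Exception as e:       # exceptions are valid tests too
                result = ("error", type(e).__name__)
            vectors.append((fn.__name__, arg, result))
    return vectors

def check_regressions(vectors, namespace):
    """Replay certified vectors against the current build and report
    any mismatches (new bugs)."""
    failures = []
    for name, arg, expected in vectors:
        fn = namespace[name]
        try:
            actual = ("ok", fn(arg))
        except Exception as e:
            actual = ("error", type(e).__name__)
        if actual != expected:
            failures.append((name, arg, expected, actual))
    return failures

vectors = generate_vectors(SAMPLE_INPUTS)
failures = check_regressions(vectors, {"double": double, "shout": shout})
```

The key property is that the expected values come from the implementation itself, which is why the inspection step is what turns generated output into an actual test.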
Using this approach, tens of thousands of test vectors can be created in a week. While it is true that there will still be various holes in the tests created by special-case combinations, I think this method yields greater coverage and leverage in the testing process. In fact, I've already found several bugs or oddities during the tests.
And, no, we never tested microphones by tapping on them. If you did, you'd risk the control room engineer running after you with a sharp object.
Coverage tests and function usage profiles would help too, particularly to assist in figuring out which functions are the most important to optimize.
[And, no, we never tested microphones by tapping on them. If you did, you'd risk the control room engineer running after you with a sharp object.]
..hahaha.. it is, though, the simplest/most effective test there is ;-) So how did you test it? Wizzle?
I'm curious, Carl, how much brain-cracking it takes to build this "splendid idea!" reflective test tool.. Hope it takes off positively..
Will you describe reflective testing in DocBase? At this time I can't wrap my head around how to write one, but I would like to participate at some point, if there are some basic examples somewhere. I imagine that this will be an ongoing process for a long time.
One still has limited testing.
If you can define each particular.
"To define is to limit."
Baby steps. ;-) Patience. Persistence. A little guru meditation.
BH: profiles could be gathered from existing published code libraries, if someone wants to do that. (Feels like a flashback of RISC days.)
N: Yes, tapping is a simple "logic test", a true or false. The vocal test was necessary to set the rough gain level on the board. Also, a tap test won't show certain types of distortion (dead battery, loose connection, impedance mismatch.)
H: I'll write a bit more soon on it, and the new R3 works better for it with a few new functions like spec-of, words-of, etc.
EA: Yes, limited, but even a bit more testing is a bit more. Define... well, it defines itself actually, in this case. Rest sounds oddly familiar.
Before anyone spends too much time on an engine to generate tests, ping me. I've made a start on this.
I also think it would be very cool to have a tool that people could run, that would select N random tests from the entire suite and ask them whether they think the answer is right, or what the answer should be. It would be like a puzzle. People could do a few and submit their results; tests are vetted; we find out where there is consensus... or not. It's fun, and it shows the power of REBOL.
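As a rough sketch of how such a vetting puzzle could work (in Python rather than REBOL, and with made-up example vectors and helper names like `pick_puzzles` and `tally`):

```python
import random

# Candidate test vectors awaiting human vetting: (expression, recorded result).
# These examples are invented for illustration.
PENDING = [
    ("1 + 2", "3"),
    ("reverse [a b c]", "[c b a]"),
    ('length? "abc"', "3"),
    ("first []", "** error"),
]

def pick_puzzles(vectors, n, seed=None):
    """Select n random vectors to show a user as puzzles: does the
    recorded result look right for this expression?"""
    rng = random.Random(seed)
    return rng.sample(vectors, n)

def tally(votes):
    """votes: list of (vector, looks_right_bool) submitted by users.
    Return the fraction of 'yes' votes per vector, so we can see
    where there is consensus... or not."""
    counts = {}
    for vec, ok in votes:
        yes, total = counts.get(vec, (0, 0))
        counts[vec] = (yes + int(ok), total + 1)
    return {vec: yes / total for vec, (yes, total) in counts.items()}
```

Vectors with high agreement could be promoted to certified status; contested ones get flagged for closer inspection.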
Currently, we have a 1-page reflective test that generates about 30'000 test vectors. Of those, about 20% are exceptions, but those are also valid tests.
Gregg: I'll have to revisit your tests... and see how we can integrate them.
What I meant there -- and I should have elaborated more -- was a sense of happy relief (i.e., "at least it is a limited amount to test"; or "only so much ground to cover").
funny all this testing stuff... I *just* wrote an engine which allows one to store tests within a function body invisibly (for R2)..
you simply store a block as the first thing within the func body, containing a list of unit tests to perform.
A simple automated engine, scans all functions of a context, testing all funcs which have unit tests within.
it traps eval errors and allows timed iterations to detect speed regressions, but most of all, the tests are local to the functions, so there is no need for external files, and the tests follow the func wherever it ends up :-)
as it goes, you get a nice report (VERY easy to rapidly scan and read).
next step is storing the results in comparative log files so you can draw charts of the results. I also thought of building image maps of the results, but this would take far more time to build than the testing "suite" itself.
the whole engine is under 2k, basically 2 funcs. If anyone is interested, I'll be happy to share. :-)
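The same local-tests idea has a natural analogue in Python, sketched below. A hypothetical `with_tests` decorator stands in for the block stored at the head of the func body, and a small engine scans a context for functions carrying embedded tests, trapping errors and timing each call:

```python
import time

def with_tests(cases):
    """Attach a block of unit tests (args_tuple, expected) to a function,
    mirroring the R2 trick of keeping tests inside the func itself so
    they follow it wherever it ends up. (Decorator name is invented.)"""
    def deco(fn):
        fn._tests = cases
        return fn
    return deco

@with_tests([((2, 3), 5), ((0, 0), 0)])
def add(a, b):
    return a + b

@with_tests([((4,), 16)])
def square(x):
    return x * x

def run_context_tests(context):
    """Scan a context (any name -> value mapping, e.g. globals()) and run
    every function that carries embedded tests. Eval errors are trapped
    as failures; each call is timed so speed regressions can be charted
    from the report later."""
    report = []
    for name, fn in context.items():
        cases = getattr(fn, "_tests", None)
        if not callable(fn) or cases is None:
            continue
        for args, expected in cases:
            start = time.perf_counter()
            try:
                ok = fn(*args) == expected
            except Exception:            # trapped eval error = failure
                ok = False
            elapsed = time.perf_counter() - start
            report.append((name, args, ok, elapsed))
    return report

report = run_context_tests(globals())
```

Each report row gives the function name, arguments, pass/fail, and elapsed time, which maps directly onto the "easy to rapidly scan" report and the comparative speed logs described above.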