What is Validation?

by Guest 9 May 2005 14:00

Some questions keep popping up about validation and I thought I'd try to clarify it a bit.

What is Validation

I've never seen a good definition anywhere of what validation in IS (Integration Services) is supposed to be or what it's supposed to do. There is a lot of talk about what components do when validating, but before launching into that, I thought it would be helpful to give a little history and explain some of the philosophy behind validation. This will likely be better than a straight definition because, hopefully, you'll understand the evolution and reasoning behind it rather than just a static description.

Early Design Iterations

Early on, I remember walking into what was perhaps the first design review of what later became the runtime. Among the proposals in that review was a new method on the task interfaces called Custom or something similar. (I can't remember exactly; after all, it _has_ been 5 years now.) Nobody liked the name, but everyone liked the concept. The idea was that the runtime would call this method "sometime" before execution to give the task a chance to "do stuff" before executing. Pretty vague, yes. I wish all ideas and designs would just spring into my mind fully formed on the first day of a project. It just doesn't seem to happen that way. If you know of anyone who does this, let us know. We're hiring. :)

As I recall, Gert Draper, who was our PUM (Product Unit Manager) at the time, suggested that we call it Validate. Hmmm, has a nice ring to it. It makes sense, because what folks typically want to do before executing is make sure that execution will succeed. Gert asked why we needed to pass in variables, connections, etc. "Well," I said, somewhat unsure of my answer. After all, this was Gert. :) "Because the tasks may need to use them for the work they do during validation." Knees knocking, hands shaking, teeth chattering. Gert says, "Oh, OK." Boom! A new method is born, and with it the start of a new concept that needed to be developed. It's interesting how one little old method can spawn so many practical and philosophical discussions.

Validation caused a great deal of confusion and discussion. We talked about "deep validation" vs. "light validation", and that debate raged for months. What type of validation should components do? What should the other kinds of components do? At the time, we didn't yet have the notion of extensible log providers, connections, or enumerators. Later, when those were "componentized", we passed them only variables. The logic went, "Why would anyone need connections for a log provider?" Hmmmm...

The light validation discussion was very interesting. There was a lot of talk about how to indicate to a task that validation was just preliminary rather than pre-execution validation. The thinking was that you wouldn't want to validate a task strictly while it was still being configured or had property mappings, etc. We didn't have the notion of warnings or information events either, only errors, which were inflexible and, for purposes of validation, terminating. Out of these discussions grew the notion of warnings; information events came later. Eventually, we scrapped the whole idea of light vs. deep validation. Warnings let tasks report an issue while still saying they could function in spite of it. Slowly we migrated to a point where validation was just validation, with no variants. That is how we arrived at the definition we have today.

Validation is what a component does to detect any issues that would cause it to fail during execution.

This led to a few other "rules":

  • When a component validates, it should always verify that, given its current property settings, it will succeed during execution.
  • Components should not return after finding the first error. They should keep validating until they have found all errors. This gives a better picture, because the whole stack of errors is visible at once.
  • Components should emit warnings for cases where the problem is not fatal but could cause trouble, for example, when the Send Mail task doesn't have a subject. These are non-terminating errors.
  • Some others I can't remember right now...
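The rules above can be sketched in a few lines of Python. This is a minimal, hypothetical model: `SendMailTask`, `ValidationEvents`, `ExecResult`, and the property names are illustrative stand-ins, not the real SSIS object model. It shows a component that checks its current property settings, keeps going after the first error so that every error is reported together, and reports a missing subject as a warning rather than an error.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ExecResult(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class ValidationEvents:
    """Hypothetical sink for the errors and warnings raised during validation."""
    errors: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)


@dataclass
class SendMailTask:
    """Hypothetical stand-in for a Send Mail task, not the real SSIS class."""
    smtp_server: str = ""
    to: str = ""
    subject: str = ""

    def validate(self, events: ValidationEvents) -> ExecResult:
        # Check the *current* property settings against what execution will need.
        # Keep going after the first problem so every error is reported at once.
        if not self.smtp_server:
            events.errors.append("No SMTP server is specified.")
        if not self.to:
            events.errors.append("No recipient is specified.")
        # A missing subject won't stop the mail from being sent, so it is a
        # warning (non-terminating) rather than an error.
        if not self.subject:
            events.warnings.append("The subject is empty.")
        return ExecResult.FAILURE if events.errors else ExecResult.SUCCESS


events = ValidationEvents()
result = SendMailTask().validate(events)
# result is ExecResult.FAILURE; both errors and the warning are in `events`.
```

Because the task collects errors instead of returning early, a user fixing the package sees the missing server and the missing recipient in the same pass.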

Now the question is: when do components get validated? Essentially, we look at validation in a number of different ways depending on the situation. If you're designing a package, you'd like to find out about its errors while you're designing it, not when you go to execute it. So the designer takes some liberties here and validates components whenever the UI for the component is modified. The designer also validates a package when it is opened. These are design-time validations and should not be confused with execution-time validations. Their purpose is to alert the package writer to problems with the package as soon as possible.

Execution Validation

Execution validation is perhaps where most folks get tripped up. Execution-time validation happens at two key points: when the package is executed, and when the runtime executes each task in the package. In the designer, this can be confusing. Because the designer validates all the time, and because the package appears to run in the same process as the designer, there seems to be no clear distinction between design-time and execution-time validation. But the designer doesn't run the package in its own process; it runs it out of process (for a number of reasons I won't go into here). A separate host process loads the package and executes it. When the designer calls Execute() on the package, the package validates. Everything in the package gets validated, from the package down through the containers to the components. This is general validation. Then, when the runtime calls Execute() on each task, the TaskHost calls Validate() again. This is component validation.
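The two execution-time validation points can be modeled with a small container tree. This is a hypothetical Python sketch, not the runtime's actual code: `Container`, `Task`, and `run_package` are illustrative names, and a task's `ready` flag stands in for whatever checks its real Validate() would perform. Package-level validation walks the whole tree first; then each task is validated again right before it executes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task; `ready` stands in for whatever its Validate() checks."""
    name: str
    ready: bool = True

    def validate(self, trace: list[str]) -> bool:
        trace.append(f"validate {self.name}")
        return self.ready

    def execute(self, trace: list[str]) -> None:
        trace.append(f"execute {self.name}")


@dataclass
class Container:
    """Hypothetical container; the package is simply the outermost one."""
    name: str
    children: list[Container | Task] = field(default_factory=list)

    def validate(self, trace: list[str]) -> bool:
        # General validation: visit every child down the tree (no early exit).
        results = [child.validate(trace) for child in self.children]
        return all(results)

    def execute(self, trace: list[str]) -> bool:
        for child in self.children:
            if isinstance(child, Task):
                # Component validation: validate again right before executing.
                if not child.validate(trace):
                    return False
                child.execute(trace)
            elif not child.execute(trace):
                return False
        return True


def run_package(package: Container) -> tuple[bool, list[str]]:
    trace: list[str] = []
    # First point: the package validates everything when Execute() is called.
    if not package.validate(trace):
        return False, trace
    # Second point: each task is validated again as it is executed.
    return package.execute(trace), trace
```

For a package holding task A and a sequence container with task B, `run_package` returns `True` with the trace `validate A`, `validate B`, `validate A`, `execute A`, `validate B`, `execute B`: every task is validated once with the package and once more just before it runs. If any task fails the package-level pass, nothing executes.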

Early vs. Late Validation

Why? This is the confusing part. We want folks to always do strict validation (remember, no light validation). Packages have the notion of late or dynamic configuration via property expressions, foreach loops, etc. And we assume that you would rather have a task fail validation than corrupt data or otherwise fail execution in a destructive way. So we validate twice: we validate the whole package, and we validate each individual task right before it executes. In fact, if you were to look at the TaskHost code, you'd see something like this:

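ExecResult TaskHost::Execute(Parameters params)
{
    // Late validation: validate the task immediately before executing it.
    ExecResult result = task->Validate(params);
    if (FAILED(result))
        return result;  // never execute a task that failed validation
    return task->Execute(params);
}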

Now, some tasks aren't going to be ready to execute when the package's Execute() method gets called. They may rely on a variable value that gets set by another task earlier in the package's execution. They may be waiting for a file to be generated or dropped, etc. So if there were no way to validate tasks after Execute() is called on the package, such a package would never run successfully. However, the runtime has no way of knowing on its own whether a given part of the package should be validated early or late; it needs to be told. Enter the "DelayValidation" property.

DelayValidation

This property tells the runtime, "Don't validate me until the very last instant." It is available on all containers and all hosted objects. It's a simple flag. When the early, package-level validation happens, the runtime checks this flag. If it is set to true, the runtime skips validating that part of the package. Think scope here. If a container has multiple child containers, which in turn have multiple grandchild containers, and that container has DelayValidation set to true, none of it gets validated early. The whole thing gets skipped.
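The scoping rule can be sketched as a simple tree walk. In this hypothetical Python model, `Node` stands in for any container or task and `delay_validation` mirrors the DelayValidation property; `early_validation_targets` lists what the early, package-level pass would validate. None of these names come from the SSIS API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical container or task; `delay_validation` mirrors DelayValidation."""
    name: str
    delay_validation: bool = False
    children: list[Node] = field(default_factory=list)


def early_validation_targets(node: Node) -> list[str]:
    """Return the names the early, package-level pass would validate."""
    if node.delay_validation:
        # Skip this node and its entire subtree: children, grandchildren, ...
        return []
    names = [node.name]
    for child in node.children:
        names.extend(early_validation_targets(child))
    return names


package = Node("Package", children=[
    Node("Task A"),
    Node("Loop", delay_validation=True, children=[
        Node("Task B"),
        Node("Inner", children=[Node("Task C")]),
    ]),
])
print(early_validation_targets(package))  # ['Package', 'Task A']
```

Note that "Inner" and "Task C" are skipped even though their own flags are false: the flag on "Loop" covers everything beneath it.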

Late Validation

Later in the package execution, the runtime calls the Execute() method of each individual task, and it implicitly calls the task's Validate() method as part of that call. Now, if the task fails validation, you can be fairly sure it would also have failed execution, because there is very little chance that anything will change between the time the runtime calls Validate() and the time it calls Execute().
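Putting early validation, DelayValidation, and late validation together, the following hypothetical Python sketch (illustrative names such as `HostedTask`, `run_package`, and the `FileName` variable; not the SSIS API) runs a two-task package in which the second task needs a variable that the first task sets. Without DelayValidation, early validation rejects the package before anything runs; with it, the second task is validated only after the variable exists, just before it executes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class HostedTask:
    """Hypothetical task plus its TaskHost settings (not the real SSIS API)."""
    name: str
    validate: Callable[[dict], bool]
    execute: Callable[[dict], None]
    delay_validation: bool = False


def run_package(tasks: list[HostedTask], variables: dict) -> list[str]:
    trace: list[str] = []
    # Early validation: every task that has not set DelayValidation.
    failed = [t.name for t in tasks
              if not t.delay_validation and not t.validate(variables)]
    if failed:
        return [f"early validation failed: {name}" for name in failed]
    # Late validation: validate each task immediately before executing it.
    for task in tasks:
        if not task.validate(variables):
            trace.append(f"late validation failed: {task.name}")
            return trace
        task.execute(variables)
        trace.append(f"executed {task.name}")
    return trace


def make_tasks(delay: bool) -> list[HostedTask]:
    # The loader needs a variable that only the first task sets (hypothetical).
    set_name = HostedTask("Set FileName", lambda v: True,
                          lambda v: v.update(FileName="in.csv"))
    load = HostedTask("Load file", lambda v: "FileName" in v,
                      lambda v: None, delay_validation=delay)
    return [set_name, load]


print(run_package(make_tasks(delay=False), {}))
# ['early validation failed: Load file']
print(run_package(make_tasks(delay=True), {}))
# ['executed Set FileName', 'executed Load file']
```

In this sketch a delayed task is validated only once, immediately before it runs, while non-delayed tasks are validated twice, once with the package and once by the host.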

Validation is important because it gives early warning of critical issues. It's how all the nifty little icons pop up in the designer when there's an error. When you move your mouse cursor over the little red X on a task and the error message shows up in the tooltip, that's the result of validation. Validation keeps your system safer because components check that an operation will succeed, or at least has a good chance of succeeding, before performing potentially invasive or damaging operations. Validation is to packages what compiling is to source code.

Reproduced by kind permission of Kirk Haselden (Microsoft). 

Comments (1)

5/16/2010 10:29:40 AM

Dmitry

Hi Guest,

Interesting article. But I would like to ask: is there any way to disable validation entirely? There are some bugs in validation that break our data; in some cases we can't seem to prevent it from running the actual query instead of just validating it. I could validate the tasks by running them one by one from Visual Studio, and after that I would like not to validate them at all. Is that possible?

Thanks,
Dmitry
