About Classification

Supervised Classification

Working with the Sample Editor

The Sample Editor window is the principal tool for inputting samples. For a selected class, it shows histograms of selected features of samples in the currently active map. The same values can be displayed for all image objects at a certain level or all levels in the image object hierarchy.

You can use the Sample Editor window to compare the attributes or histograms of image objects and samples of different classes. It is helpful to get an overview of the feature distribution of image objects or samples of specific classes. The features of an image object can be compared to the total distribution of this feature over one or all image object levels.

Use this tool to assign samples using a Nearest Neighbor classification or to compare an image object to already existing samples, in order to determine to which class an image object belongs. If you assign samples, features can also be compared to the samples of other classes. Only samples of the currently active map are displayed.

  1. Open the Sample Editor window using Classification > Samples > Sample Editor from the main menu
  2. By default, the Sample Editor window shows diagrams for only a selection of features. To select the features to be displayed in the Sample Editor, right-click in the Sample Editor window and select Select Features to Display
  3. In the Select Displayed Features dialog box, double-click a feature from the left-hand pane to select it. To remove a feature, click it in the right-hand pane
  4. To add the features used for the Standard Nearest Neighbor expression, select Display Standard Nearest Neighbor Features from the context menu.


The Sample Editor window compares the Active Class and Compare Class with the red arrow indicating the feature value of a selected image object

Comparing Features

To compare samples or layer histograms of two classes, select the classes or the levels you want to compare in the Active Class and Compare Class lists.

Values of the active class are displayed in black in the diagram, the values of the compared class in blue. The value range and standard deviation of the samples are displayed on the right-hand side.

Viewing the Value of an Image Object

When you select an image object, the feature value is highlighted with a red pointer. This enables you to compare different objects with regard to their feature values. The following functions help you to work with the Sample Editor:

In addition, the Sample Editor window allows you to generate membership functions. The following options are available:

Selecting Samples

A Nearest Neighbor classification needs training areas. Therefore, representative samples of image objects need to be collected.

  1. To assign sample objects, activate the input mode. Choose Classification > Samples > Select Samples from the main menu bar. The map view changes to the View Samples mode.
  2. To open the Sample Editor window, which helps to gather adequate sample image objects, do one of the following:
    • Choose Classification > Samples > Sample Editor from the main menu.
    • Choose View > Sample Editor from the main menu.
  3. To select a class from which you want to collect samples, do one of the following:
    • Select the class in the Class Hierarchy window if available.
      **Select the class from the Active Class drop-down list in the Sample Editor window.
      This makes the selected class your active class so any samples you collect will be assigned to that class.
  4. To define an image object as a sample for a selected class, double-click the image object in the map view. To undo the declaration of an object as sample, double-click it again. You can select or deselect multiple objects by holding down the Shift key.
    As long as the sample input mode is activated, the view will always change back to the Sample View when an image object is selected. Sample View displays sample image objects in the class color; this way the accidental input of samples can be avoided.
  5. To view the feature values of the sample image object, go to the Sample Editor window. This enables you to compare different image objects with regard to their feature values.
  6. Click another potential sample image object for the selected class. Analyze its membership value and its membership distance to the selected class and to all other classes within the feature space. Here you have the following options:
    • The potential sample image object includes new information to describe the selected class: low membership value to selected class, low membership value to other classes.
    • The potential sample image object is really a sample of another class: low membership value to selected class, high membership value to other classes.
    • The potential sample image object is needed as sample to distinguish the selected class from other classes: high membership value to selected class, high membership value to other classes.
      In the first iteration of selecting samples, start with only a few samples for each class, covering the typical range of the class in the feature space. Otherwise, its heterogeneous character will not be fully considered.
  7. Repeat the same for remaining classes of interest.
  8. Classify the scene.
  9. The results of the classification are now displayed in the map view. In the View Settings dialog box, the mode has changed from Samples to Classification.
  10. Note that some image objects may have been classified incorrectly or not at all. All image objects that are classified are displayed in the appropriate class color. If you hover the cursor over a classified image object, a tool -tip pops up indicating the class to which the image object belongs, its membership value, and whether or not it is a sample image object. Image objects that are unclassified appear transparent. If you hover over an unclassified object, a tool-tip indicates that no classification has been applied to this image object. This information is also available in the Classification tab of the Image Object Information window.
  11. The refinement of the classification result is an iterative process:
    • First, assess the quality of your selected samples
    • Then, remove samples that do not represent the selected class well and add samples that are a better match or have previously been misclassified
    • Classify the scene again
    • Repeat this step until you are satisfied with your classification result.
  12. When you have finished collecting samples, remember to turn off the Select Samples input mode. As long as the sample input mode is active, the viewing mode will automatically switch back to the sample viewing mode, whenever an image object is selected. This is to prevent you from accidentally adding samples without taking notice.


Map view with selected samples in View Samples mode. (Image data courtesy of Ministry of Environmental Affairs of Sachsen-Anhalt, Germany.)

Assessing the Quality of Samples

Once a class has at least one sample, the quality of a new sample can be assessed in the Sample Selection Information window. It can help you to decide if an image object contains new information for a class, or if it should belong to another class.


The Sample Selection Information window.
  1. To open the Sample Selection Information window choose Classification > Samples > Sample Selection Information or View > Sample Selection Information from the main menu
  2. Names of classes are displayed in the Class column. The Membership column shows the membership value of the Nearest Neighbor classifier for the selected image object
  3. The Minimum Dist. column displays the distance in feature space to the closest sample of the respective class
  4. The Mean Dist. column indicates the average distance to all samples of the corresponding class
  5. The Critical Samples column displays the number of samples within a critical distance to the selected class in the feature space
  6. The Number of Samples column indicates the number of samples selected for the corresponding class.
    The following highlight colors are used for a better visual overview:
    • Gray: Used for the selected class.
    • Red: Used if a selected sample is critically close to samples of other classes in the feature space.
    • Green: Used for all other classes that are not in a critical relation to the selected class.

The critical sample membership value can be changed by right-clicking inside the window. Select Modify Critical Sample Membership Overlap from the context menu. The default value is 0.7, which means all membership values higher than 0.7 are critical.


The Modify Threshold dialog box

To select which classes are shown, right-click inside the dialog box and choose Select Classes to Display.

Navigating Samples

To navigate to samples in the map view, select samples in the Sample Editor window to highlight them in the map view.

  1. Before navigating to samples you must select a class in the Select Sample Information dialog box.
  2. To activate Sample Navigation, do one of the following:
    • Choose Classification > Samples > Sample Editor Options > Activate Sample Navigation from the main menu
    • Right-click inside the Sample Editor and choose Activate Sample Navigation from the context menu.
  3. To navigate samples, click in a histogram displayed in the Sample Editor window. A selected sample is highlighted in the map view and in the Sample Editor window.
  4. If there are two or more samples so close together that it is not possible to select them separately, you can use one of the following:
    • Select a Navigate to Sample button.
    • Select from the sample selection drop-down list.


For sample navigation choose from a list of similar samples

Deleting Samples

Training and Test Area Masks

Existing samples can be stored in a file called a training and test area (TTA) mask, which allows you to transfer them to other scenes.

To allow mapping samples to image objects, you can define the degree of overlap that a sample image object must show to be considered within in the training area. The TTA mask also contains information about classes for the map. You can use these classes or add them to your existing class hierarchy.

Creating and Saving a TTA Mask


The Create TTA Mask from Samples dialog box
  1. From the main menu select Classification > Samples > Create TTA Mask from Samples
  2. In the dialog box, select the image object level that contains the samples that you want to use for the TTA mask. If your samples are all in one image object level, it is selected automatically and cannot be changed
  3. Click OK to save your changes. Your selection of sample image objects is now converted to a TTA mask
  4. To save the mask to a file, select Classification > Samples > Save TTA Mask. Enter a file name and select your preferred file format.

Loading and Applying a TTA Mask

To load samples from an existing Training and Test Area (TTA) mask:


Apply TTA Mask to Level dialog box
  1. From the main menu select Classification > Samples > Load TTA Mask.
  2. In the Load TTA Mask dialog box, select the desired TTA Mask file and click Open.
  3. In the Load Conversion Table dialog box, open the corresponding conversion table file. The conversion table enables mapping of TTA mask classes to existing classes in the currently displayed map. You can edit the conversion table.
  4. Click Yes to create classes from the conversion table. If your map already contains classes, you can replace them with the classes from the conversion file or add them. If you choose to replace them, your existing class hierarchy will be deleted.
    If you want to retain the class hierarchy, you can save it to a file.
  5. Click Yes to replace the class hierarchy by the classes stored in the conversion table.
  6. To convert the TTA Mask information into samples, select Classification > Samples > Create Samples from TTA Mask. The Apply TTA Mask to Level dialog box opens.
  7. Select which level you want to apply the TTA mask information to. If the project contains only one image object level, this level is preselected and cannot be changed.
  8. In the Create Samples dialog box, enter the Minimum Overlap for Sample Objects and click OK.
    The default value is 0.75. Since a single training area of the TTA mask does not necessarily have to match an image object, the minimum overlap decides whether an image object that is not 100% within a training area in the TTA mask should be declared a sample.
    The value 0.75 indicates that 75% of an image object has to be covered by the sample area for a certain class given by the TTA mask in order for a sample for this class to be generated.
    The map view displays the original map with sample image objects selected where the test area of the TTA mask have been.

The Edit Conversion Table

You can check and edit the linkage between classes of the map and the classes of a Training and Test Area (TTA) mask.

You must edit the conversion table only if you chose to keep your existing class hierarchy and used different names for the classes. A TTA mask has to be loaded and the map must contain classes.


Edit Conversion Table dialog box
  1. To edit the conversion table, choose Classification > Samples > Edit Conversion Table from the main menu
  2. The Linked Class list displays how classes of the map are linked to classes of the TTA mask. To edit the linkage between the TTA mask classes and the classes of the current active map, right-click a TTA mask entry and select the appropriate class from the drop-down list
  3. Choose Link by name to link all identical class names automatically. Choose Unlink all to remove the class links.

Creating Samples Based on a Shapefile

You can use shapefiles to create sample image objects. A shapefile, also called an ESRI shapefile, is a standardized vector file format used to visualize geographic data. You can obtain shapefiles from other geo applications or by exporting them from eCognition maps. A shapefile consists of several individual files such as .shx, .shp and .dbf.

To provide an overview, using a shapefile for sample creation comprises the following steps:

Creating the Samples


Edit a process to use a shapefile
Add a shapefile to an existing project
Add a Parent Process
Add segmentation Child Process

The segmentation finds all objects of the shapefile and converts them to image objects in the thematic layer.

Classify objects using shapefile information

The child process identifies image objects using information from the thematic layer – use the threshold classifier and a feature created from the thematic layer attribute table, for example ‘Image Object ID’ or ‘Class’ from a shapefile ‘Thematic Layer 1’

Converting Objects to samples


Process to import samples from shapefile

Selecting Samples with the Sample Brush

The Sample Brush is an interactive tool that allows you to use your cursor like a brush, creating samples as you sweep it across the map view. Go to the Sample Editor toolbar (View > Toolbars > Sample Editor) and press the Select Sample button. Right-click on the image in map view and select Sample Brush.

Drag the cursor across the scene to select samples. By default, samples are not reselected if the image objects are already classified but existing samples are replaced if drag over them again. These settings can be changed in the Sample Brush group of the Options dialog box. To deselect samples, press Shift as you drag.

The Sample Brush will select up to one hundred image objects at a time, so you may need to increase magnification if you have a large number of image objects.

Setting the Nearest Neighbor Function Slope

The Nearest Neighbor Function Slope defines the distance an object may have from the nearest sample in the feature space while still being classified. Enter values between 0 and 1. Higher values result in a larger number of classified objects.

  1. To set the function slope, choose Classification > Nearest Neighbor > Set NN Function Slope from the main menu bar.
  2. Enter a value and click OK.


The Set Nearest Neighbor Function Slope dialog box

Using Class-Related Features in a Nearest Neighbor Feature Space

To prevent non-deterministic classification results when using class-related features in a nearest neighbor feature space, several constraints have to be mentioned:

Supervised Classification Algorithms (former name 'Classifier')

Overview

The supervised classification algorithm allows classifying based on different statistical classification algorithms:

The algorithm can be applied either pixel- or object-based. For an examples project containing please refer to the eCognition Knowledge-Base and Tutorials.

Bayes

A Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem (from Bayesian statistics) with strong independence assumptions. In simple terms, a Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4” in diameter. Even if these features depend on each other or upon the existence of the other features, a Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple. An advantage of the naive Bayes classifier is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.

KNN (K Nearest Neighbor)

The k-nearest neighbor algorithm (k-NN) is a method for classifying objects based on closest training examples in the feature space. k-NN is a type of instance-based learning, or lazy learning where the function is only approximated locally and all computation is deferred until classification. The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). The 5-nearest-neighbor classification rule is to assign to a test sample the majority class label of its 5 nearest training samples. If k = 1, then the object is simply assigned to the class of its nearest neighbor.

This means k is the number of samples to be considered in the neighborhood of an unclassified object/pixel. The best choice of k depends on the data: larger values reduce the effect of noise in the classification, but the class boundaries are less distinct.

eCognition software has the Nearest Neighbor implemented as a classifier that can be applied using the algorithm classifier (KNN with k=1) or using the concept of classification based on the Nearest Neighbor Classification.

SVM (Support Vector Machine)

A support vector machine (SVM) is a concept in computer science for a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes the input is a member of. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are pided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. Support Vector Machines are based on the concept of decision planes defining decision boundaries. A decision plane separates between a set of objects having different class memberships.

Important parameters for SVM

There are different kernels that can be used in Support Vector Machines models. Included in eCognition are linear and radial basis function (RBF). The RBF is the most popular choice of kernel types used in Support Vector Machines. Training of the SVM classifier involves the minimization of an error function with C as the capacity constant.

Decision Tree (CART resp. classification and regression tree)

Decision tree learning is a method commonly used in data mining where a series of decisions are made to segment the data into homogeneous subgroups. The model looks like a tree with branches - while the tree can be complex, involving a large number of splits and nodes. The goal is to create a model that predicts the value of a target variable based on several input variables. A tree can be “learned” by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions. The purpose of the analyses via tree-building algorithms is to determine a set of if-then logical (split) conditions.

Important Decision Tree parameters

The minimum number of samples that are needed per node are defined by the parameter Min sample count. Finding the right sized tree may require some experience. A tree with too few of splits misses out on improved predictive accuracy, while a tree with too many splits is unnecessarily complicated. Cross validation exists to combat this issue by setting eCognitions parameter Cross validation folds. For a cross-validation the classification tree is computed from the learning sample, and its predictive accuracy is tested by test samples. If the costs for the test sample exceed the costs for the learning sample this indicates poor cross-validation and that a different sized tree might cross-validate better.

Random Trees

The random trees classifier is more a framework that a specific model. It uses an input feature vector and classifies it with every tree in the forest. It results in a class label of the training sample in the terminal node where it ends up. This means the label is assigned that obtained the majority of "votes". Iterating this over all trees results in the random forest prediction. All trees are trained with the same features but on different training sets, which are generated from the original training set. This is done based on the bootstrap procedure: for each training set the same number of vectors as in the original set ( =N ) is selected. The vectors are chosen with replacement which means some vectors will appear more than once and some will be absent. At each node not all variables are used to find the best split but a randomly selected subset of them. For each node a new subset is construced, where its size is fixed for all the nodes and all the trees. It is a training parameter, set to . None of the trees that are built are pruned.

In random trees the error is estimated internally during the training. When the training set for the current tree is drawn by sampling with replacement, some vectors are left out. This data is called out-of-bag data - in short "oob" data. The oob data size is about N/3. The classification error is estimated based on this oob-data.