Let's start with a very simple binner. The bins are equidistant: the whole range of an attribute is divided into n intervals of equal width. Data points whose attribute value lies in the k-th interval belong to the k-th bin. The output is therefore the original table with the bin information appended to each instance (i.e. row). This node also needs a dialog, because the user should be able to set the number of bins and choose the column whose values are to be binned.
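To make the idea of equidistant bins concrete, here is a minimal plain-Java sketch that maps a value to its zero-based bin index. It is not part of the KNIME API: the class EquidistantBinning and the method binIndex are hypothetical names, and the range limits min and max are assumed to be known in advance. In this sketch a value that lies exactly on an interior boundary falls into the upper bin, whereas the cell factory shown later in this section compares with <= and therefore puts it into the lower one.

```java
// Plain-Java illustration of equidistant binning; not part of the KNIME API.
final class EquidistantBinning {

    /**
     * Returns the zero-based bin index of value, where the range
     * [min, max] is split into numberOfBins intervals of equal width.
     */
    static int binIndex(double value, double min, double max, int numberOfBins) {
        if (numberOfBins < 1) {
            throw new IllegalArgumentException("numberOfBins must be at least 1");
        }
        if (value < min || value > max) {
            throw new IllegalArgumentException("value lies outside [min, max]");
        }
        if (max == min) {
            return 0; // degenerate range: everything falls into the first bin
        }
        double width = (max - min) / numberOfBins;
        int index = (int) ((value - min) / width);
        // The maximum itself would land in bin numberOfBins, so clamp it.
        return Math.min(index, numberOfBins - 1);
    }

    public static void main(String[] args) {
        // Range [0, 10] split into 5 bins of width 2.
        System.out.println(binIndex(0.0, 0.0, 10.0, 5));  // prints 0
        System.out.println(binIndex(4.5, 0.0, 10.0, 5));  // prints 2
        System.out.println(binIndex(10.0, 0.0, 10.0, 5)); // prints 4
    }
}
```

The clamp in the last step is a design choice: without it the maximum of the range would be assigned to a bin that does not exist.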
Node model:
Before we implement the actual binning algorithm in the execute method, we must define the fields we need in the NodeModel. (After creation, the NodeModel already contains sample code, which can be deleted.) SettingsModels provide a convenient way to exchange settings between the NodeModel and the NodeDialog. As you will see later, the NodeDialog also works with SettingsModels, which is why we use them for the number of bins and for the column whose values should be binned:
    // The settings model for the number of bins
    private final SettingsModelIntegerBounded m_numberOfBins =
        new SettingsModelIntegerBounded(
            NumericBinnerNodeModel.CFGKEY_NR_OF_BINS,
            NumericBinnerNodeModel.DEFAULT_NR_OF_BINS,
            1, Integer.MAX_VALUE);

    // The settings model storing the column to bin
    private final SettingsModelString m_column = new SettingsModelString(
        NumericBinnerNodeModel.CFGKEY_COLUMN_NAME, "");
To receive the settings from the dialog, they must be written to a NodeSettings object. NodeSettings transfers the settings from the dialog to the model and vice versa. Each field requires a key by which it is identified and retrieved from NodeSettings. It is good practice to define these keys as static final Strings in the NodeModel.
    /** Configuration key for the number of bins. */
    public static final String CFGKEY_NR_OF_BINS = "numberOfbins";

    /** Configuration key for the selected column. */
    public static final String CFGKEY_COLUMN_NAME = "columnName";
The settings are transferred between the NodeModel and the NodeDialog by implementing the validateSettings, loadValidatedSettingsFrom, and saveSettingsTo methods. All of these can safely be delegated to the SettingsModels. The validateSettings method checks whether the values are present and valid (for example, within the permitted range).
    /**
     * @see org.knime.core.node.NodeModel
     *      #validateSettings(org.knime.core.node.NodeSettingsRO)
     */
    @Override
    protected void validateSettings(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        // Delegate this to the settings models
        m_numberOfBins.validateSettings(settings);
        m_column.validateSettings(settings);
    }
When the loadValidatedSettingsFrom method is called, the settings have already been validated and can be loaded into the local fields, in this case the SettingsModels for the number of bins and for the selected column.
    /**
     * @see org.knime.core.node.NodeModel
     *      #loadValidatedSettingsFrom(org.knime.core.node.NodeSettingsRO)
     */
    @Override
    protected void loadValidatedSettingsFrom(final NodeSettingsRO settings)
            throws InvalidSettingsException {
        // Load the values from the settings into the models.
        // It is safe to assume that the settings have already been
        // validated by the validateSettings method.
        m_numberOfBins.loadSettingsFrom(settings);
        m_column.loadSettingsFrom(settings);
    }
In the saveSettingsTo method, the local fields are written to the settings so that the dialog displays the current values.
    /**
     * @see org.knime.core.node.NodeModel
     *      #saveSettingsTo(org.knime.core.node.NodeSettingsWO)
     */
    @Override
    protected void saveSettingsTo(final NodeSettingsWO settings) {
        // Save the settings to the configuration object
        m_numberOfBins.saveSettingsTo(settings);
        m_column.saveSettingsTo(settings);
    }
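The division of labour between validating, loading, and saving can be illustrated without the KNIME classes. The following is only a sketch under stated assumptions: BoundedIntSetting is a hypothetical stand-in that stores its value in a plain Map, and it does not reproduce the real SettingsModelIntegerBounded or NodeSettings API. It shows why validation is a separate step: a rejected value never reaches the field that the node works with.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a bounded integer setting. It only mimics the
// validate/load/save pattern and is NOT the KNIME SettingsModel API.
final class BoundedIntSetting {
    private final String key;
    private final int min;
    private final int max;
    private int value;

    BoundedIntSetting(String key, int defaultValue, int min, int max) {
        this.key = key;
        this.min = min;
        this.max = max;
        this.value = defaultValue;
    }

    /** Checks that the key exists and its value is in range; changes nothing. */
    void validate(Map<String, Integer> settings) {
        Integer candidate = settings.get(key);
        if (candidate == null) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        if (candidate < min || candidate > max) {
            throw new IllegalArgumentException(
                    key + " must be in [" + min + ", " + max + "]: " + candidate);
        }
    }

    /** Takes over the stored value; callers are expected to validate first. */
    void load(Map<String, Integer> settings) {
        value = settings.get(key);
    }

    /** Writes the current value so that another component can read it. */
    void save(Map<String, Integer> settings) {
        settings.put(key, value);
    }

    int getValue() {
        return value;
    }

    public static void main(String[] args) {
        BoundedIntSetting bins =
                new BoundedIntSetting("numberOfbins", 5, 1, Integer.MAX_VALUE);
        Map<String, Integer> settings = new HashMap<>();
        settings.put("numberOfbins", 8);
        bins.validate(settings); // passes, 8 is within the bounds
        bins.load(settings);
        System.out.println(bins.getValue()); // prints 8
    }
}
```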
The methods above only check whether the node can be executed with the current settings. It is equally important to check whether the settings fit the incoming data table. This is done in the configure method, which is executed as soon as the input port is connected. In our small numeric binner example, we check that at least one numeric column is available and that the incoming data table contains a column with the name of the selected column. Otherwise, the node is not executable. The required information is contained in the DataTableSpec, which is passed to the configure method.
    /**
     * @see org.knime.core.node.NodeModel
     *      #configure(org.knime.core.data.DataTableSpec[])
     */
    protected DataTableSpec[] configure(final DataTableSpec[] inSpecs)
            throws InvalidSettingsException {
        // First of all, validate the incoming data table spec
        boolean hasNumericColumn = false;
        boolean containsName = false;
        for (int i = 0; i < inSpecs[IN_PORT].getNumColumns(); i++) {
            DataColumnSpec columnSpec = inSpecs[IN_PORT].getColumnSpec(i);
            // We can only work with the table if it contains at least
            // one numeric column
            if (columnSpec.getType().isCompatible(DoubleValue.class)) {
                // Found a numeric column
                hasNumericColumn = true;
            }
            // If a column name is set, it must be contained in the data
            // table spec
            if (m_column != null
                    && columnSpec.getName().equals(m_column.getStringValue())) {
                containsName = true;
            }
        }
        if (!hasNumericColumn) {
            throw new InvalidSettingsException("Input table must contain "
                    + "at least one numeric column");
        }
        if (!containsName) {
            throw new InvalidSettingsException("Input table does not contain "
                    + "the column " + m_column.getStringValue()
                    + ". Please (re-)configure the node");
        }
        // So far the input is checked and the algorithm can work with
        // the incoming data
        ...
Just as we rely on the incoming data table spec, successor nodes also need information about the format of the data that this node provides after execution. For this reason, the output spec of our node must also be created in the configure method.
        ...
        // Now produce the output table spec,
        // i.e. specify the output of this node
        DataColumnSpec newColumnSpec = createOutputColumnSpec();
        // The DataTableSpec for the appended part
        DataTableSpec appendSpec = new DataTableSpec(newColumnSpec);
        // Since the column is only appended, the new output spec contains
        // both: the original spec and the appended one
        DataTableSpec outputSpec = new DataTableSpec(inSpecs[IN_PORT],
                appendSpec);
        return new DataTableSpec[]{outputSpec};
        ...
Since a DataColumnSpec for the newly appended column must be created in both the configure and the execute method, the code that creates it is extracted into a separate method:
    private DataColumnSpec createOutputColumnSpec() {
        // We want to add a column with the number of the bin
        DataColumnSpecCreator colSpecCreator = new DataColumnSpecCreator(
                "Bin Number", IntCell.TYPE);
        // If we know the number of bins, we also know the possible
        // values of the new column
        DataColumnDomainCreator domainCreator = new DataColumnDomainCreator(
                new IntCell(0), new IntCell(m_numberOfBins.getIntValue() - 1));
        // This domain information can be added to the output spec
        colSpecCreator.setDomain(domainCreator.createDomain());
        // Now the column spec can be created
        DataColumnSpec newColumnSpec = colSpecCreator.createSpec();
        return newColumnSpec;
    }
Once this is done, the actual equidistant binning algorithm can be written. The algorithm that operates on the data must be placed in the execute method. In this example, only one column is appended to the original data. For this purpose, a so-called ColumnRearranger is used. It requires a CellFactory that returns the appended cells for a given row.
        ...
        // Instantiate the cell factory
        CellFactory cellFactory = new NumericBinnerCellFactory(
                createOutputColumnSpec(), splitPoints, colIndex);
        // Create the column rearranger
        ColumnRearranger outputTable = new ColumnRearranger(
                inData[IN_PORT].getDataTableSpec());
        // Append the new column
        outputTable.append(cellFactory);
        ...
After the ColumnRearranger has been created, it is passed to the ExecutionContext together with the input table to create a BufferedDataTable, which is returned by the execute method, that is, provided at the output port. Each node buffers its data in a BufferedDataTable. The ColumnRearranger is used to avoid buffering the same data redundantly: this way, only the appended column is buffered by our node. This is why the BufferedDataTable must be retrieved from the ExecutionContext:
        ...
        // And create the actual output table
        BufferedDataTable bufferedOutput = exec.createColumnRearrangeTable(
                inData[IN_PORT], outputTable, exec);
        // Return it
        return new BufferedDataTable[]{bufferedOutput};
        ...
To provide the CellFactory, a NumericBinnerCellFactory must be implemented. It extends SingleCellFactory and implements only the getCell method. This method inspects the passed row to find out which bin contains the value of the selected column and returns the bin number as a DataCell.
    /**
     * @see org.knime.core.data.container.SingleCellFactory#getCell(
     *      org.knime.core.data.DataRow)
     */
    @Override
    public DataCell getCell(DataRow row) {
        DataCell currCell = row.getCell(m_colIndex);
        // Check the cell for a missing value
        if (currCell.isMissing()) {
            return DataType.getMissingCell();
        }
        double currValue = ((DoubleValue)currCell).getDoubleValue();
        int binNr = 0;
        // The first interval whose upper bound is not exceeded is the bin
        for (Double intervalBound : m_intervalUpperBounds) {
            if (currValue <= intervalBound) {
                return new IntCell(binNr);
            }
            binNr++;
        }
        // The value lies above the last upper bound
        return DataType.getMissingCell();
    }
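The listings above use the upper interval bounds without showing how they are computed. The following plain-Java sketch shows one possible way, assuming that the minimum and maximum of the selected column are already known (for example, from the column domain). The class UpperBoundLookup and both method names are hypothetical, and -1 stands in for the missing cell that the cell factory returns. The lookup follows the same logic as the getCell loop: the first bound that is not exceeded determines the bin.

```java
// Plain-Java illustration of bin lookup via upper bounds; not KNIME code.
final class UpperBoundLookup {

    /** Computes the upper bound of each of numberOfBins equal-width bins. */
    static double[] upperBounds(double min, double max, int numberOfBins) {
        if (numberOfBins < 1 || max < min) {
            throw new IllegalArgumentException("invalid range or bin count");
        }
        double[] bounds = new double[numberOfBins];
        double width = (max - min) / numberOfBins;
        for (int i = 0; i < numberOfBins - 1; i++) {
            bounds[i] = min + (i + 1) * width;
        }
        // Use max itself for the last bin so that rounding errors
        // cannot exclude the largest value.
        bounds[numberOfBins - 1] = max;
        return bounds;
    }

    /** Returns the index of the first bound >= value, or -1 if none exists. */
    static int findBin(double value, double[] upperBounds) {
        for (int binNr = 0; binNr < upperBounds.length; binNr++) {
            if (value <= upperBounds[binNr]) {
                return binNr;
            }
        }
        return -1; // above the last bound; the node returns a missing cell here
    }

    public static void main(String[] args) {
        double[] bounds = upperBounds(0.0, 10.0, 5); // 2, 4, 6, 8, 10
        System.out.println(findBin(2.0, bounds));  // prints 0
        System.out.println(findBin(2.1, bounds));  // prints 1
        System.out.println(findBin(10.5, bounds)); // prints -1
    }
}
```

Note that a value below the minimum ends up in bin 0 with this lookup, because only upper bounds are compared.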
Node dialog:
After creating the NumericBinnerNodeDialog, you will see that the constructor already contains some sample code. You can delete it and add the code for the control elements you need. For the NumericBinnerNodeDialog we need two GUI elements: one to set the number of bins and one to select the column to bin. The KNIME framework provides a very convenient way to add standard dialog elements to a NodeDialog. For this reason, your NumericBinnerNodeDialog extends DefaultNodeSettingsPane by default. If the default dialog components do not suit your needs, for example because some components should be enabled or disabled depending on the user's settings, you can extend NodeDialogPane directly. In our example, we add a DialogComponentNumber for the number of bins and a DialogComponentColumnNameSelection for the column. Each component's constructor requires a new instance of a SettingsModel. The SettingsModel requires a string identifier, which is used to store and load the value of the component, and a default value, which is held until a new value is loaded. Additional parameters may be required, depending on the type of component. Loading from and saving to the settings is performed automatically via the key passed in the constructor. We recommend using the keys defined in the NodeModel; if you do, they must be made public.
public class NumericBinnerNodeDialog extends DefaultNodeSettingsPane {

    /**
     * New pane for configuring the NumericBinner node dialog.
     * It contains control elements to adjust the number of bins
     * and to select the column to bin.
     * The warning is suppressed here: it is unavoidable because
     * the allowed types are passed as a generic array.
     */
    @SuppressWarnings("unchecked")
    protected NumericBinnerNodeDialog() {
        super();
        // Control element for the number of bins
        addDialogComponent(new DialogComponentNumber(
                new SettingsModelIntegerBounded(
                        NumericBinnerNodeModel.CFGKEY_NR_OF_BINS,
                        NumericBinnerNodeModel.DEFAULT_NR_OF_BINS,
                        1, Integer.MAX_VALUE),
                "Number of bins:", /*step*/ 1));
        // Column to bin
        addDialogComponent(new DialogComponentColumnNameSelection(
                new SettingsModelString(
                        NumericBinnerNodeModel.CFGKEY_COLUMN_NAME,
                        "Select a column"),
                "Select the column to bin",
                NumericBinnerNodeModel.IN_PORT,
                DoubleValue.class));
    }
}
After creating the node and implementing the NodeModel and the NodeDialog, do not forget to edit the node description in the XML file (which has the same name as the NodeFactory). Describe your node, the dialog settings, the input and output ports, and, later on, the views. This is explained in detail in Section 8.