Detailed explanation of RowFilter of HBase Filter

Posted by jonners on Tue, 05 May 2020 08:00:28 +0200

This article describes in detail how to use the HBase RowFilter through the Java and shell APIs, and includes sample code for reference. RowFilter filters rows by row key, so it is worth considering whenever data has to be filtered by HBase rowkey. For details on how comparators work, see the previous article: Comparator principle and source code learning of HBase Filter

One. Java API

Header code

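import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedList;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.NullComparator;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.util.Bytes;

public class RowFilterDemo {

    // Set to true on the first run to (re)create the table and load the sample data
    private static boolean isok = false;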
    private static String tableName = "test";
    private static String[] cfs = new String[]{"f"};
    private static String[] data = new String[]{"row-ac:f:c1:v1", "row-ab:f:c2:v2", "row-bc:f:c3:v3", "row-abc:f:c4:v4"};

    public static void main(String[] args) throws IOException {

        MyBase myBase = new MyBase();
        Connection connection = myBase.createConnection();
        if (isok) {
            myBase.deleteTable(connection, tableName);
            myBase.createTable(connection, tableName, cfs);
            myBase.putRows(connection, tableName, data); // Manufacturing data
        }
        Table table = connection.getTable(TableName.valueOf(tableName));
        Scan scan = new Scan();

Middle code. Use exactly one of the following filter definitions per run; the comment at the end of each line shows the row keys that the scan returns.

1. BinaryComparator construction filter

        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-ac]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-ab, row-abc, row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.GREATER, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-ac, row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.LESS, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-ab, row-abc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryComparator(Bytes.toBytes("row-ac"))); // [row-ab, row-abc, row-ac]
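
The results in the comments above follow from the way row keys are ordered: byte by byte, lexicographically, so row-abc sorts before row-ac because 'b' is smaller than 'c' at the sixth byte. The following is a minimal sketch in plain Java that reproduces this ordering without an HBase cluster. The class BinaryOrderSketch and its filter method are hypothetical names used only for illustration, and Arrays.compareUnsigned (Java 9+) is assumed to be an adequate stand-in for the byte comparison HBase performs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of HBase.
class BinaryOrderSketch {

    // Row keys from the sample data, in the order a scan returns them.
    static final List<String> ROWS = List.of("row-ab", "row-abc", "row-ac", "row-bc");

    // Keeps the row keys whose unsigned lexicographic comparison against
    // `target` satisfies the operator: "=", "!=", ">", ">=", "<" or "<=".
    static List<String> filter(String op, String target) {
        byte[] t = target.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String row : ROWS) {
            int c = Arrays.compareUnsigned(row.getBytes(StandardCharsets.UTF_8), t);
            boolean keep;
            switch (op) {
                case "=":  keep = c == 0; break;
                case "!=": keep = c != 0; break;
                case ">":  keep = c > 0;  break;
                case ">=": keep = c >= 0; break;
                case "<":  keep = c < 0;  break;
                case "<=": keep = c <= 0; break;
                default: throw new IllegalArgumentException("unknown operator: " + op);
            }
            if (keep) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter(">=", "row-ac")); // [row-ac, row-bc]
        System.out.println(filter("<", "row-ac"));  // [row-ab, row-abc]
    }
}
```

The sketch only models the comparison; in a real scan the filter is evaluated on the region servers.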

2. BinaryPrefixComparator construction filter

        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // [row-ab, row-abc, row-ac]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // [row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.GREATER, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // [row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.GREATER_OR_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // [row-ab, row-abc, row-ac, row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.LESS, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // []
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.LESS_OR_EQUAL, new BinaryPrefixComparator(Bytes.toBytes("row-a"))); // [row-ab, row-abc, row-ac]
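
With a prefix comparator, only the leading bytes of the row key take part in the comparison, as many as the prefix has. That is why every key starting with row-a compares as equal, GREATER_OR_EQUAL returns all four rows, and LESS returns nothing. Below is a small plain-Java sketch of that idea. PrefixCompareSketch and compareToPrefix are hypothetical names, and the sign convention (row key relative to prefix) is chosen here for readability rather than copied from the HBase source.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration class; not part of HBase.
class PrefixCompareSketch {

    // Compares only the first prefix.length bytes of the row key with the prefix.
    // Returns 0 if the row key starts with the prefix, a negative value if its
    // leading bytes sort before the prefix, and a positive value if they sort after.
    static int compareToPrefix(String rowKey, String prefix) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(row.length, p.length);
        return Arrays.compareUnsigned(row, 0, n, p, 0, p.length);
    }

    public static void main(String[] args) {
        for (String row : new String[]{"row-ab", "row-abc", "row-ac", "row-bc"}) {
            int sign = Integer.signum(compareToPrefix(row, "row-a"));
            System.out.println(row + " vs prefix row-a: " + sign);
        }
    }
}
```

For the sample data, the three row-a keys give 0 and row-bc gives a positive value, which matches the six results listed above.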

3. SubstringComparator constructs filter

        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new SubstringComparator("ab")); // [row-ab, row-abc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL, new SubstringComparator("ab")); // [row-ac, row-bc]
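
SubstringComparator answers a yes/no question: does the row key contain the given string? According to its Javadoc the comparison is case insensitive. Because the outcome is only "contains" or "does not contain", EQUAL and NOT_EQUAL are the operators that make sense with it. The following plain-Java sketch models that behaviour; SubstringMatchSketch and contains are hypothetical names, and lower-casing both sides is an assumption about how the case insensitivity can be imitated, not a copy of the HBase code.

```java
import java.util.Locale;

// Hypothetical illustration class; not part of HBase.
class SubstringMatchSketch {

    // True if rowKey contains substr, ignoring case.
    static boolean contains(String rowKey, String substr) {
        return rowKey.toLowerCase(Locale.ROOT).contains(substr.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String row : new String[]{"row-ab", "row-abc", "row-ac", "row-bc"}) {
            // EQUAL keeps the rows where this is true, NOT_EQUAL the rows where it is false.
            System.out.println(row + " contains \"ab\": " + contains(row, "ab"));
        }
    }
}
```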

4. RegexStringComparator constructs filter

        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL, new RegexStringComparator("abc")); // [row-ab, row-ac, row-bc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("abc")); // [row-abc]
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("a")); // [row-ab, row-abc, row-ac]

5. NullComparator construction filter

        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL, new NullComparator()); // []
        RowFilter rowFilter = new RowFilter(CompareFilter.CompareOp.NOT_EQUAL, new NullComparator()); // [row-ab, row-abc, row-ac, row-bc]

Tail code

        scan.setFilter(rowFilter);
        ResultScanner scanner = table.getScanner(scan);
        Iterator<Result> iterator = scanner.iterator();
        LinkedList<String> rowkeys = new LinkedList<>();
        while (iterator.hasNext()) {
            Result result = iterator.next();
            String rowkey = Bytes.toString(result.getRow());
            rowkeys.add(rowkey);
        }
        System.out.println(rowkeys);
        scanner.close();
        table.close();
        connection.close();
    }
}

Two. Shell API

1. BinaryComparator construction filter

Mode 1:

hbase(main):006:0> scan 'test',{FILTER=>"RowFilter(=,'binary:row-ab')"}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
1 row(s) in 0.0140 seconds

The supported comparison operators are =, !=, >, >=, < and <=. Further examples are omitted.

Mode 2:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.RowFilter

hbase(main):016:0> scan 'test',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), BinaryComparator.new(Bytes.toBytes('row-ab')))}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
1 row(s) in 0.0310 seconds

Supported comparison operators: LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER, GREATER_OR_EQUAL. Further examples are omitted.

Mode 1 is recommended because it is more concise and convenient.

2. BinaryPrefixComparator construction filter

Mode 1:

hbase(main):023:0> scan 'test',{FILTER=>"RowFilter(=,'binaryprefix:row-ab')"}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
 row-abc                                         column=f:c4, timestamp=1588156704669, value=v4                                                                                                
2 row(s) in 0.0360 seconds

Mode 2:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator
import org.apache.hadoop.hbase.filter.RowFilter

hbase(main):027:0> scan 'test',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), BinaryPrefixComparator.new(Bytes.toBytes('row-ab')))}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
 row-abc                                         column=f:c4, timestamp=1588156704669, value=v4                                                                                                
2 row(s) in 0.0110 seconds

The other comparison operators work as in the BinaryComparator examples above.

3. SubstringComparator constructs filter

Mode 1:

hbase(main):001:0> scan 'test',{FILTER=>"RowFilter(=,'substring:row-ab')"}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
 row-abc                                         column=f:c4, timestamp=1588156704669, value=v4                                                                                                
2 row(s) in 0.3200 seconds

Mode 2:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.filter.RowFilter

hbase(main):007:0> scan 'test',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), SubstringComparator.new('row-ab'))}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
 row-abc                                         column=f:c4, timestamp=1588156704669, value=v4                                                                                                
2 row(s) in 0.0230 seconds

The difference is that the string is passed to the comparator directly, and only the EQUAL and NOT_EQUAL operators are supported.

4. RegexStringComparator constructs filter

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.RegexStringComparator
import org.apache.hadoop.hbase.filter.RowFilter

hbase(main):007:0> scan 'test',{FILTER => RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'), RegexStringComparator.new('row-ab'))}
ROW                                              COLUMN+CELL                                                                                                                                   
 row-ab                                          column=f:c2, timestamp=1588156704669, value=v2                                                                                                
 row-abc                                         column=f:c4, timestamp=1588156704669, value=v4                                                                                                
2 row(s) in 0.0230 seconds

This comparator also takes the string directly and supports only the EQUAL and NOT_EQUAL operators. To use Mode 1, pass regexstring as the comparator type. My HBase version is too old to support it, so it is not demonstrated here.

Note that regular-expression matching here means containment: a row key matches if the pattern is found anywhere inside it, which corresponds to the underlying find() method.
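
The difference between find() and a full match is easy to see with java.util.regex directly. The sketch below assumes the comparator relies on the standard Java regex engine; RegexFindSketch and its two methods are hypothetical names for illustration. It also shows that anchoring the pattern with ^ and $ is one way to get an exact match even though find() is used.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of HBase.
class RegexFindSketch {

    // Containment semantics: true if the pattern occurs anywhere in the row key.
    static boolean findMatch(String regex, String rowKey) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    // Whole-string semantics, shown only for contrast.
    static boolean fullMatch(String regex, String rowKey) {
        return Pattern.compile(regex).matcher(rowKey).matches();
    }

    public static void main(String[] args) {
        System.out.println(findMatch("row-ab", "row-abc"));   // true
        System.out.println(fullMatch("row-ab", "row-abc"));   // false
        System.out.println(findMatch("^row-ab$", "row-abc")); // false
        System.out.println(findMatch("^row-ab$", "row-ab"));  // true
    }
}
```

This explains why the regex row-ab in the shell example returns both row-ab and row-abc.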

In addition, RowFilter does not support LongComparator, and BitComparator and NullComparator are rarely used with it, so they are not covered here.

For the full source code of the article, please visit the following GitHub address:

https://github.com/zhoupengbo/demos-bigdata/blob/master/hbase/hbase-filters-demos/src/main/java/com/zpb/demos/RowFilterDemo.java
