Flink Sql With 1.14 Queries - Overview

Posted by thientanchuong on Wed, 23 Feb 2022 15:22:09 +0100

Inquiry #

The SELECT statement and VALUES statement are TableEnvironment This method takes the result of the SELECT statement (or VALUES statement) as a Table ATable can be used for Subsequent SQL and Table API queries，Convert to DataStream or Write TableSink . SQL and Table API queries can be seamlessly mixed, optimized as a whole and converted into a single program.

In order to access a table in an SQL query, it must Register in TableEnvironment . Can from TableSource,Table,CREATE TABLE statement,DataStream Registry. Alternatively, the user can Register directory in TableEnvironment To specify the location of the data source.

For convenience, Table Tostring() automatically registers TableEnvironment under its unique name and returns that name. Therefore, Table can inline objects directly into SQL queries, as shown in the following example.

Note: queries containing unsupported SQL functions will result in TableException The following sections list the SQL functions supported on batch and stream tables.

Specify query #

The following example shows how to specify SQL queries on the registry and inline tables.

Java

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// ingest a DataStream from an external source
DataStream<Tuple3<Long, String, Integer>> ds = env.addSource(...);

// SQL query with an inlined (unregistered) table
Table table = tableEnv.fromDataStream(ds, $("user"), $("product"), $("amount"));
Table result = tableEnv.sqlQuery(
  "SELECT SUM(amount) FROM " + table + " WHERE product LIKE '%Rubber%'");

// SQL query with a registered table
// register the DataStream as view "Orders"
tableEnv.createTemporaryView("Orders", ds, $("user"), $("product"), $("amount"));
// run a SQL query on the Table and retrieve the result as a new Table
Table result2 = tableEnv.sqlQuery(
  "SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'");

// create and register a TableSink
final Schema schema = Schema.newBuilder()
    .column("product", DataTypes.STRING())
    .column("amount", DataTypes.INT())
    .build();

final TableDescriptor sinkDescriptor = TableDescriptor.forConnector("filesystem")
    .schema(schema)
    .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", ",")
        .build())
    .build();

tableEnv.createTemporaryTable("RubberOrders", sinkDescriptor);

// run an INSERT SQL on the Table and emit the result to the TableSink
tableEnv.executeSql(
  "INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'");

Python

env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(env)

# SQL query with an inlined (unregistered) table
# elements data type: BIGINT, STRING, BIGINT
table = table_env.from_elements(..., ['user', 'product', 'amount'])
result = table_env \
    .sql_query("SELECT SUM(amount) FROM %s WHERE product LIKE '%%Rubber%%'" % table)

# create and register a TableSink
schema = Schema.new_builder()
    .column("product", DataTypes.STRING())
    .column("amount", DataTypes.INT())
    .build()

sink_descriptor = TableDescriptor.for_connector("filesystem")
    .schema(schema)
    .format(FormatDescriptor.for_format("csv")
        .option("field-delimiter", ",")
        .build())
    .build()

t_env.create_temporary_table("RubberOrders", sink_descriptor)

# run an INSERT SQL on the Table and emit the result to the TableSink
table_env \
    .execute_sql("INSERT INTO RubberOrders SELECT product, amount FROM Orders WHERE product LIKE '%Rubber%'")

Back to the top

Execute query #

You can use this tableenvironment The executesql () method executes a SELECT statement or VALUES statement to collect content locally. This method takes the result of the SELECT statement (or VALUES statement) as tableresult Similar to the SELECT statement, a Table can use this Table The execute () method executes the object and collects the contents of the query to the local client. TableResult. The collect () method returns a closable row iterator. The selection job will not complete unless all result data has been collected. We should actively close the job to avoid leaking resources through this CloseableIterator#close() method. TableResult.print() we can also print the selection result to the client console through this method. The result data tableresult in can only be accessed once. Therefore, collect() cannot call print() to each other.

TableResult.collect() and tableresult Print () has slightly different behavior under different checkpoint settings (to enable checkpointing for streaming jobs, see Checkpoint configuration).

There are no checkpoints for job stream or job. Result Collect() is tableresult Print () is not guaranteed once or at least once. Once the query result is generated, the client can access it immediately, but an exception will be thrown when the job fails and restarts.
For streaming jobs with one-time checkpoints, tableresult Collect() and tableresult Print () ensures end-to-end one-time record delivery. The client can access the results only after the corresponding checkpoint is completed.
For streaming jobs with at least one checkpoint, tableresult Collect() and tableresult Print () guarantees at least one record delivery from end to end. Once the query result is generated, the client can access it immediately, but the same result may be delivered multiple times.

Java

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);

tableEnv.executeSql("CREATE TABLE Orders (`user` BIGINT, product STRING, amount INT) WITH (...)");

// execute SELECT statement
TableResult tableResult1 = tableEnv.executeSql("SELECT * FROM Orders");
// use try-with-resources statement to make sure the iterator will be closed automatically
try (CloseableIterator<Row> it = tableResult1.collect()) {
    while(it.hasNext()) {
        Row row = it.next();
        // handle row
    }
}

// execute Table
TableResult tableResult2 = tableEnv.sqlQuery("SELECT * FROM Orders").execute();
tableResult2.print();

Python

env = StreamExecutionEnvironment.get_execution_environment()
table_env = StreamTableEnvironment.create(env, settings)
# enable checkpointing
table_env.get_config().get_configuration().set_string("execution.checkpointing.mode", "EXACTLY_ONCE")
table_env.get_config().get_configuration().set_string("execution.checkpointing.interval", "10s")

table_env.execute_sql("CREATE TABLE Orders (`user` BIGINT, product STRING, amount INT) WITH (...)")

# execute SELECT statement
table_result1 = table_env.execute_sql("SELECT * FROM Orders")
table_result1.print()

# execute Table
table_result2 = table_env.sql_query("SELECT * FROM Orders").execute()
table_result2.print()

Back to the top

Grammar #

Flink uses SQL that supports standard ANSI Apache compute parses SQL.

The following BNF syntax describes a superset of SQL functions supported in batch and streaming queries. operation The section shows examples of supported features and indicates which features support only batch or streaming queries.

grammar ↕

Flink SQL uses a lexical strategy similar to Java identifiers (tables, attributes, function names):

The case of the identifier is preserved regardless of whether the identifier is referenced or not.
After that, the identifier is case sensitive.
Unlike Java, backquotes allow identifiers to contain non alphanumeric characters (for example). “SELECT a AS my field FROM t”

String literals must be enclosed in single quotes (for example, SELECT 'Hello World'). Copy single quotes for escape (for example, SELECT 'It''s me ').

<span style="color:#000000"><span style="background-color:#ffffff"><code class="language-text">Flink SQL> SELECT 'Hello World', 'It''s me';
+-------------+---------+
|      EXPR$0 |  EXPR$1 |
+-------------+---------+
| Hello World | It's me |
+-------------+---------+
1 row in set
</code></span></span>

Unicode characters are supported in string literals. If you need explicit Unicode code points, use the following syntax: