# Source plugin: File [Flink]

Read data from the file system.
| name | type | required | default value |
| --- | --- | --- | --- |
| format.type | string | yes | - |
| path | string | yes | - |
| schema | string | yes | - |
| common-options | string | no | - |
| parallelism | int | no | - |
### format.type [string]

The format for reading files from the file system; currently supports `csv`, `json`, `parquet`, `orc` and `text`.
### path [string]

The file path is required. An HDFS path starts with `hdfs://`, and a local file path starts with `file://`.
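As a sketch of the two path styles, a local-file variant of the example at the end of this page might look like the following (the directory and table name here are illustrative assumptions, not values from this page):

```
FileSource {
    # Local file system paths use the file:// prefix (hypothetical directory)
    path = "file:///tmp/seatunnel/input/"
    format.type = "json"
    schema = "{\"a\":1,\"b\":\"string\"}"
    result_table_name = "local_test"
}
```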
### schema [string]

- csv
    - The `schema` of `csv` is a `jsonArray` string, such as `"[{\"type\":\"long\"},{\"type\":\"string\"}]"`. It can only specify the type of each field; field names cannot be specified here, so the common configuration parameter `field_name` is generally also required.
- json
    - The `schema` parameter of `json` is a `json string` sample of the original data, from which the schema is generated automatically. Provide a sample with the most complete content possible, otherwise the missing fields will be lost.
- parquet
    - The `schema` of `parquet` is an `Avro schema string`, such as `{\"type\":\"record\",\"name\":\"test\",\"fields\":[{\"name\":\"a\",\"type\":\"int\"},{\"name\":\"b\",\"type\":\"string\"}]}`.
- orc
    - The `schema` of `orc` is an `orc schema` string, such as `"struct<name:string,addresses:array<struct<street:string,zip:smallint>>>"`.
- text
    - The `schema` of `text` can simply be filled with `string`.
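Putting the `csv` case together, a source that pairs the type-only `schema` with `field_name` might be sketched as follows (the file name and the `field_name` value are assumptions for illustration):

```
FileSource {
    path = "hdfs://localhost:9000/input/users.csv"
    format.type = "csv"
    # Types only; the column names are supplied via the common parameter below
    schema = "[{\"type\":\"long\"},{\"type\":\"string\"}]"
    field_name = "id,name"
    result_table_name = "csv_test"
}
```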
### common-options [string]

Source plugin common parameters; please refer to Source Plugin for details.
### parallelism [int]

The parallelism of an individual operator, for `FileSource`.
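For example, `parallelism` can be set alongside the other options; this sketch also shows the simple `text` schema (the path and table name are assumptions):

```
FileSource {
    path = "hdfs://localhost:9000/input/"
    format.type = "text"
    schema = "string"
    # Run this source with two parallel subtasks
    parallelism = 2
    result_table_name = "text_test"
}
```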
## Examples

```
FileSource {
    path = "hdfs://localhost:9000/input/"
    format.type = "json"
    schema = "{\"data\":[{\"a\":1,\"b\":2},{\"a\":3,\"b\":4}],\"db\":\"string\",\"q\":{\"s\":\"string\"}}"
    result_table_name = "test"
}
```