
fartzy/sql-to-kafka


SQL To Kafka

Expediting the Refactoring of SQL Batch Queries to Streaming By Using an External DSL

Group By is the first SQL operator to be implemented.

This application performs a group by operation on the records of a Kafka topic, similar to GROUP BY in SQL.

Group by currently supports 5 aggregate operations:

Sum - This will sum up all values in a column.

Avg - This will give the average of all values in a column. The output will be in a decimal format.

Min - This will find the minimum value in a column. This will handle numeric or date columns.

Max - This will find the maximum value in a column. This will handle numeric or date columns.

Concat - This will concatenate values using a separator. The default separator is "||".
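The five operations can be sketched as plain Python functions over the values of one group. This is a minimal illustration, not the project's code: the function names and the AGGREGATIONS table are hypothetical, and date values are assumed to be parsed into datetime objects beforehand so that min and max compare them chronologically.

```python
from datetime import datetime


def agg_sum(values):
    # The README shows sums as decimals (3, 1, 6 -> 10.0), so cast to float.
    return float(sum(values))


def agg_avg(values):
    # Average is always a decimal result; an empty group has no average.
    if not values:
        raise ValueError("avg of an empty group is undefined")
    return sum(values) / len(values)


def agg_min(values):
    # Works for numbers and for datetime objects, which compare chronologically.
    return min(values)


def agg_max(values):
    return max(values)


def agg_concat(values, separator="||"):
    # Values are converted to strings and joined with the separator.
    return separator.join(str(v) for v in values)


# Hypothetical lookup table keyed by the names used in aggregation.operations.
AGGREGATIONS = {
    "sum": agg_sum,
    "avg": agg_avg,
    "min": agg_min,
    "max": agg_max,
    "concat": agg_concat,
}

if __name__ == "__main__":
    print(AGGREGATIONS["sum"]([3, 1, 6]))         # 10.0
    print(AGGREGATIONS["avg"]([5, 8, 7, 5]))      # 6.25
    print(AGGREGATIONS["concat"]([5, 8, 7]))      # 5||8||7
    dates = [datetime(2018, 10, 19, 18), datetime(2018, 10, 19, 17),
             datetime(2018, 10, 20, 18)]
    print(AGGREGATIONS["min"](dates))             # 2018-10-19 17:00:00
```

Using a dictionary keyed by operation name keeps the dispatch data-driven, so the strings in aggregation.operations can select a function directly.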

Multi-Column Aggregations - Any number of input columns can be aggregated in a single step. Three lists are matched by ordinal position: the input columns (aggregation.columns), the output column aliases (aggregation.outputcolumns), and the operations (aggregation.operations). The operation at position n is applied to the input column at position n, and the result is named by the alias at position n.
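The ordinal matching can be sketched as zipping the three lists together. The function name build_aggregation_plan is hypothetical, and rejecting lists of unequal length is an assumption, since the README does not say what happens in that case.

```python
def build_aggregation_plan(columns, output_columns, operations):
    """Pair input columns, operations and aliases by ordinal position."""
    # Assumption: the three lists must be the same length. The README does not
    # say how a mismatch is handled, so this sketch rejects it outright.
    if not (len(columns) == len(output_columns) == len(operations)):
        raise ValueError(
            "aggregation.columns, aggregation.outputcolumns and "
            "aggregation.operations must have the same length"
        )
    # The nth operation is applied to the nth column and named by the nth alias.
    return [
        (column, operation, alias)
        for column, alias, operation in zip(columns, output_columns, operations)
    ]


if __name__ == "__main__":
    plan = build_aggregation_plan(
        ["Col2", "DateCol3", "DateCol4", "Col5", "Col6"],
        ["SumC2", "MinC3", "MaxC4", "ConcatC5", "AvgC6"],
        ["sum", "min", "max", "concat", "avg"],
    )
    for column, operation, alias in plan:
        print(f"{operation}({column}) AS {alias}")
```

With the lists from the example file below, this prints one SQL-style line per aggregation, from "sum(Col2) AS SumC2" to "avg(Col6) AS AvgC6".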

aggregation.dateformat - Optional. The date format of the date fields. Providing it is recommended because it ensures dates are parsed correctly; if no format is given, most common date formats will still be parsed.
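A sketch of this parsing rule, assuming the configured formats use Java-style pattern letters as in the example file: when aggregation.dateformat is given, only those formats are tried; otherwise a fallback list is tried in order. The fallback list and the small pattern translator are hypothetical, because the README does not say which formats are recognized or how patterns are interpreted.

```python
from datetime import datetime

# Hypothetical fallback list. The README says most common date formats are
# parsed when no format is given, but does not list which ones.
FALLBACK_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y %H:%M:%S",
    "%m/%d/%Y",
]

# Minimal translation of Java-style pattern letters (as used in the example
# aggregation.dateformat) to strptime directives; only these six are handled.
JAVA_TO_STRPTIME = [
    ("yyyy", "%Y"),
    ("MM", "%m"),
    ("dd", "%d"),
    ("HH", "%H"),
    ("mm", "%M"),
    ("ss", "%S"),
]


def java_pattern_to_strptime(pattern):
    for java_token, directive in JAVA_TO_STRPTIME:
        pattern = pattern.replace(java_token, directive)
    return pattern


def parse_date(text, date_formats=None):
    """Parse with the configured formats, or with the fallbacks if none are given."""
    if date_formats:
        candidates = [java_pattern_to_strptime(f) for f in date_formats]
    else:
        candidates = FALLBACK_FORMATS
    for fmt in candidates:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"could not parse date: {text!r}")


if __name__ == "__main__":
    print(parse_date("2018-10-19 17:00:00", ["yyyy-MM-dd HH:mm:ss"]))  # configured format
    print(parse_date("10/19/2018"))                                     # fallback list
```

Restricting parsing to the configured formats when they are present mirrors the README's point that an explicit format avoids ambiguous guesses.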

The configuration file must follow exactly the format shown below.

Example file:

groupBySteps: [
  {
    topic.in = "test-input-topic"
    topic.out = "groupby-orders"
    groupBy.columns = ["Col1"]
    aggregation.columns = ["Col2","DateCol3","DateCol4","Col5","Col6"]
    aggregation.outputcolumns = ["SumC2","MinC3","MaxC4","ConcatC5","AvgC6"]
    aggregation.operations = ["sum","min","max","concat","avg"]
    aggregation.dateformat = ["yyyy-MM-dd HH:mm:ss"]
  }
]
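To show how the parts of one step fit together, here is a batch (in-memory, non-streaming) sketch of what the groupBySteps entry above computes. It is not the project's Kafka implementation: the record shape (flat dicts of strings), the helper names, and the sample records are hypothetical, and the Java-style date pattern is rewritten for Python's strptime. Sum and avg convert values to floats, min and max compare numbers or dates but return the original string, and concat joins the raw strings with "||".

```python
from collections import defaultdict
from datetime import datetime

# The example groupBySteps entry as a Python dict. Assumption: records arrive
# as flat dicts of strings; the README does not describe the message format.
STEP = {
    "groupBy.columns": ["Col1"],
    "aggregation.columns": ["Col2", "DateCol3", "DateCol4", "Col5", "Col6"],
    "aggregation.outputcolumns": ["SumC2", "MinC3", "MaxC4", "ConcatC5", "AvgC6"],
    "aggregation.operations": ["sum", "min", "max", "concat", "avg"],
    # "yyyy-MM-dd HH:mm:ss" rewritten as a strptime pattern.
    "aggregation.dateformat": "%Y-%m-%d %H:%M:%S",
}


def order_key(value, date_format):
    # Numeric strings compare as numbers; anything else is parsed as a date.
    try:
        return float(value)
    except ValueError:
        return datetime.strptime(value, date_format)


def aggregate(operation, values, date_format):
    if operation == "sum":
        return sum(float(v) for v in values)
    if operation == "avg":
        return sum(float(v) for v in values) / len(values)
    if operation in ("min", "max"):
        pick = min if operation == "min" else max
        # Return the original string so dates keep their input formatting.
        return pick(values, key=lambda v: order_key(v, date_format))
    if operation == "concat":
        return "||".join(values)
    raise ValueError(f"unsupported operation: {operation}")


def group_by(records, step):
    """Batch group by over a complete list of records."""
    group_columns = step["groupBy.columns"]
    plan = list(zip(step["aggregation.columns"],
                    step["aggregation.outputcolumns"],
                    step["aggregation.operations"]))
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[c] for c in group_columns)].append(record)
    results = []
    for key, rows in groups.items():
        row_out = dict(zip(group_columns, key))
        for column, alias, operation in plan:
            values = [row[column] for row in rows]
            row_out[alias] = aggregate(operation, values, step["aggregation.dateformat"])
        results.append(row_out)
    return results


RECORDS = [  # hypothetical input built from the README's example values
    {"Col1": "A", "Col2": "3", "DateCol3": "2018-10-19 18:00:00",
     "DateCol4": "2018-10-19 18:00:00", "Col5": "5", "Col6": "5"},
    {"Col1": "A", "Col2": "1", "DateCol3": "2018-10-19 17:00:00",
     "DateCol4": "2018-10-19 17:00:00", "Col5": "8", "Col6": "6"},
    {"Col1": "A", "Col2": "6", "DateCol3": "2018-10-20 18:00:00",
     "DateCol4": "2018-10-20 18:00:00", "Col5": "7", "Col6": "7"},
]

if __name__ == "__main__":
    for row in group_by(RECORDS, STEP):
        print(row)
```

For the single group "A" this yields SumC2 = 10.0, MinC3 = '2018-10-19 17:00:00', MaxC4 = '2018-10-20 18:00:00', ConcatC5 = '5||8||7' and AvgC6 = 6.0, matching the examples that follow.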

Examples:

"sum" - input: 3, 1, 6; output: 10.0

"sum" - input: 3.5, 4.5, 1.3; output: 9.3

"avg" - input: 5, 6, 7; output: 6.0

"avg" - input: 5, 8, 7, 5; output: 6.25

"min" - input: '2018-10-19 18:00:00', '2018-10-19 17:00:00', '2018-10-20 18:00:00'; output: '2018-10-19 17:00:00'

"min" - input: 5, 8, 7; output: 5

"max" - input: '2018-10-19 18:00:00', '2018-10-19 17:00:00', '2018-10-20 18:00:00'; output: '2018-10-20 18:00:00'

"max" - input: 5, 8, 7; output: 8

"concat" - input: 5, 8, 7; output: 5||8||7
