single-job-guide

This guide is a quick reference for the single-job command.

Introduction

Unlike batch-job, the single-job command runs only one job per invocation, and single-job must be specified as one of the arguments when running a job. For example, to run a sample job that reads data from source-table and writes it into target-table, use one of the following commands:

# run single job by `spark-submit`
spark-submit --class com.github.sharpdata.sharpetl.spark.Entrypoint spark/build/libs/spark-1.0.0-SNAPSHOT.jar single-job --name=source-table --period=1440 --datasource=sales.online_order --default-start-time="2021-09-30 00:00:00" --local --once

# run single job locally
./gradlew :spark:run --args="single-job --name=source-table --period=1440 --datasource=sales.online_order --default-start-time='2021-09-30 00:00:00' --local --once"

Parameters

common command params

  1. --local

Declare that the job runs in standalone mode. If --local is not provided, the job tries to run with Hive support enabled.

  1. --release-resource

Automatically close the Spark session after the job completes.

  1. --skip-running

After an unexpected crash, use --skip-running to mark the last job run (still in the running state) as failed and start a new one.

  1. --default-start / --default-start-time

Specify the default start time (e.g. 20210101000000) or incremental ID of this job. The value takes effect only the first time the job runs; on later runs the argument is ignored.
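A start time such as 2021-09-30 00:00:00 contains a space, so it has to be quoted for the shell to pass it as part of a single argument. The following sketch illustrates the difference in POSIX sh or bash. The count_args function is a hypothetical helper used only for this demonstration and is not part of the tool.

```shell
# Hypothetical helper (not part of the tool): prints how many arguments it received.
count_args() { echo "$#"; }

start_time="2021-09-30 00:00:00"

# Quoted expansion: the timestamp stays inside one argument.
quoted=$(count_args --default-start-time="$start_time")

# Unquoted expansion: sh/bash split the value at the space into two arguments.
unquoted=$(count_args --default-start-time=$start_time)

echo "quoted=$quoted unquoted=$unquoted"
```

With the unquoted form, the time part would reach the job as a separate, stray argument, which is why the example commands quote the value.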

  1. --once

Run the job only once (intended for testing).

  1. --env

Specify the environment the job runs in: local, test, dev, qa, or prod.
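A wrapper script could check the environment name before passing it to --env. The following is a minimal sketch that assumes the five values listed above are the only valid ones. The validate_env function is hypothetical and is not provided by the tool.

```shell
# Hypothetical helper (not part of the tool): accepts only the environment
# names listed in this guide. Assumes that list is exhaustive.
validate_env() {
  case "$1" in
    local|test|dev|qa|prod) return 0 ;;
    *) echo "unknown env: $1" >&2; return 1 ;;
  esac
}

env_name="qa"
if validate_env "$env_name"; then
  # Only the flag is printed here; append it to the real command as needed.
  echo "--env=$env_name"
fi
```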

  1. --property

Use a specific properties file, e.g. --property=hdfs:///user/admin/etl-conf/etl.properties

  1. --override

Override configuration values from the properties file with comma-separated key=value pairs, e.g. --override=etl.workflow.path=hdfs:///user/hive/sharp-etl,a=b,c=d
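When several overrides are needed, the comma-separated value can be assembled from individual pairs instead of being typed by hand. The sketch below does this in POSIX sh and prints the resulting command rather than running it. The pairs a=b and c=d are the placeholders from the example above, and the job name and period are reused from the sample command in the introduction.

```shell
# Join individual key=value pairs into one comma-separated --override value.
override_arg=""
for pair in "etl.workflow.path=hdfs:///user/hive/sharp-etl" "a=b" "c=d"; do
  # Add a comma only when override_arg already holds a pair.
  override_arg="${override_arg:+$override_arg,}$pair"
done

# Dry run: print the command for review instead of executing it.
echo spark-submit \
  --class com.github.sharpdata.sharpetl.spark.Entrypoint \
  spark/build/libs/spark-1.0.0-SNAPSHOT.jar \
  single-job --name=source-table --period=1440 \
  --property=hdfs:///user/admin/etl-conf/etl.properties \
  "--override=$override_arg"
```

Removing the leading echo would run the command, assuming spark-submit and the jar are available at those locations.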

single-job params

  1. --name

Specify the name of the job to run. This parameter is required.

  1. --period

Specify the period of job execution.

  1. -h / --help

Show usage examples for the parameters. The default value is false.