CloverETL 3.5 Reference Manual

CloverETL Server

Reference Manual

Summary of Contents for CloverETL 3.5

Page 1 CloverETL Server Reference Manual...
Page 2 CloverETL Server: Reference Manual This Reference Manual refers to CloverETL Server 4.3.x release. Copyright © 2016 Javlin, a.s. All rights reserved. Javlin www.cloveretl.com www.javlininc.com Feedback welcome: If you have any comments or suggestions for this documentation, please send them by e-mail to support@cloveretl.com.
Page 3: Table Of Contents
JBoss Enterprise Application Platform ..............23 Oracle WebLogic Server ..................27 Installation of CloverETL Server License ................ 29 Installation of CloverETL Server License using a Web Form ........29 Installation of CloverETL Server License using a license.file Property ......32 Separate License WAR ..................32 IBM InfoSphere MDM Plugin Installation ...............
Page 4 Job Config Properties ....................108 WebDAV Access to Sandboxes ................... 111 WebDAV Clients ....................111 WebDAV Authentication/Authorization ..............111 16. CloverETL Server Monitoring ..................113 Standalone Server Detail ..................... 113 Cluster Overview ....................... 118 Node Detail ......................119 Server Logs ......................120 17.
Page 5 CloverETL Server API Extensibility ................193 Groovy Code API ....................193 Embedded OSGi Framework ................194 25. Recommendations for Transformations Developers ............. 196 26. Extensibility - CloverETL Engine Plugins ................. 197 27. Troubleshooting ......................198 VI. Cluster ..........................199 28. Clustering Features ....................... 200 High Availability .......................
Page 6: Cloveretl Server
Part I. CloverETL Server...
Page 7: What Is Cloveretl Server
CloverETL Server into existing application portfolios and processes. The CloverETL Server is a Java application built to J2EE standards. We support a wide range of application servers including Apache Tomcat, Jetty, IBM WebSphere, Sun Glassfish, JBoss AS, and Oracle WebLogic.
Page 8: Cloveretl Server And Cloveretl Engine Comparison
Chapter 1. What is CloverETL Server? Table 1.1. CloverETL Server and CloverETL Engine comparison CloverETL Server CloverEngine as executable tool possibilities of executing by calling http (or JMX, etc.) APIs (See by executing external process or by graphs details in Simple HTTP API (p.
Page 9: Installation Instructions
Part II. Installation Instructions...
Page 10: System Requirements For Cloveretl Server
> 50 GB This may vary depending on total number of nodes and cores in license. Minimum value, the disk space depends on data. Disk space for shared sandboxes is required only for CloverETL Cluster. Software Requirements Operating system • Microsoft Windows Server 2003/2008/2012 32/64 bit •...
Page 11: System
Chapter 2. System Requirements for CloverETL Server Table 2.2. CloverETL Server Compatibility Matrix CloverETL 3.5 CloverETL 4.0 CloverETL 4.1, 4.2, and 4.3 Application Server Java 6 and 7 Java 7 Java 7 Java 8 Tomcat 6 Tomcat 7 Tomcat 8 Pivotal tc Server Standard (3.1.3,...
Page 12 Chapter 2. System Requirements for CloverETL Server • IMAP/POP3 (EmailReader) • FTP/SFTP/FTPS (readers, writers)
Page 13: Installing
(p. 12) includes details about further testing and production on your chosen app-container and database. To create a fully working instance of Enterprise CloverETL Server you should: • install an application server • create a database dedicated to CloverETL server •...
Page 14: Evaluation Server
Chapter 3. Installing Evaluation Server The default installation of CloverETL Server does not need any extra database server. It uses the embedded Apache Derby DB. What is more, it does not need any subsequent configuration. CloverETL Server configures itself during the first startup. Database tables and some necessary records are automatically created on first startup with an empty database.
Page 15 6. Check whether CloverETL Server is running on URLs: • Web-app root http://[host]:[port]/[contextPath] The default Tomcat port for the http connector is 8080 and the default contextPath for CloverETL Server is "clover", thus the default URL is: http://localhost:8080/clover/ • Web GUI...
Page 16 Chapter 3. Installing 7. CloverETL Server is now installed and prepared for basic evaluation. There are a couple of sandboxes with various demo transformations installed.
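The URL pattern http://[host]:[port]/[contextPath] used in the check above can be assembled programmatically, for example in a deployment smoke test. The following is a minimal sketch; the function name clover_url is hypothetical and the defaults simply mirror the Tomcat defaults quoted above (port 8080, context path "clover").

```python
def clover_url(host="localhost", port=8080, context_path="clover"):
    """Build the web-app root URL in the form http://[host]:[port]/[contextPath]/."""
    # Strip slashes so that "/clover/" and "clover" give the same result.
    path = context_path.strip("/")
    if not path:
        # Application deployed at the root context.
        return f"http://{host}:{port}/"
    return f"http://{host}:{port}/{path}/"


if __name__ == "__main__":
    print(clover_url())
```

Passing a different host, port or context path covers installations that do not use the defaults.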
Page 17: Enterprise Server
This section describes installation of CloverETL Server on various app-containers in detail and also describes how to configure the server. If you just need to quickly evaluate CloverETL Server features that don't need any configuration, the evaluation installation may be suitable: Evaluation Server (p.
Page 18 A restart of the operating system is needed to apply the changes. If Tomcat is installed as a Windows service, CloverETL configuration is performed using the configuration of the respective service. The configuration can be performed either by the graphical utility [tomcat_home]/bin/Tomcat8w.exe or by the command line utility [tomcat_home]/bin/Tomcat8.exe.
Page 19 • JAVA_HOME or JRE_HOME environment variable has to be set. • Apache Tomcat 6.0.x or 7.0.x or 8.0.x is installed. CloverETL Server is developed and tested with the Apache Tomcat 6.0.x, 7.0.x and 8.0.x containers (it may work unpredictably with other versions). See...
Page 20 5. Check whether CloverETL Server is running on URLs: • Web-app root http://[host]:[port]/[contextPath] The default Tomcat port for the http connector is 8080 and the default contextPath for CloverETL Server is "clover", thus the default URL is: http://localhost:8080/clover/ • Web GUI...
Page 21: Jetty
Configuration of CloverETL Server on Jetty (p. 16) Installation of CloverETL Server 1. Download the web archive file (clover.war) containing the CloverETL Server application which is built for Jetty. 2. Check if prerequisites are met: • Oracle JDK or JRE (See Java Virtual Machine (p.
Page 22: Ibm Websphere
(p. 5) for required Java version.) Important In order to ensure reliable function of CloverETL Server always use the latest version of IBM Java SDK. At least SDK 7.0 SR6 (package IBM WebSphere SDK Java Technology Edition V7.0.6.1) is recommended. Using older SDKs may lead to deadlocks during execution of specific ETL graphs.
Page 23 • Go to Integrated Solutions Console ( http://localhost:9060/ibm/console/) • Go to Applications →New Application →New Enterprise Application Here select a WAR archive of the CloverETL server and deploy it to the application server, but do not start it. 4. Configure application class loading...
Page 24 Note Please note that some CloverETL features using third-party libraries don't work properly on IBM WebSphere: • Hadoop is guaranteed to run only on Oracle Java 1.6+, but Hadoop developers do make an effort to remove any Oracle/Sun-specific code. See Hadoop Java Versions on Hadoop Wiki.
Page 25: Glassfish / Sun Java System Application Server
It is accessible at http://localhost:4848/ by default. • Go to Applications →Web Applications and click Deploy ..• Upload WAR file with CloverETL server application or select the file from filesystem if it is present on the machine running Glassfish.
Page 26: Jboss Application Server
Configuration of CloverETL Server on JBoss AS (p. 22) Installation of CloverETL Server 1. Get CloverETL Server web archive file ( clover.war ) that is built for JBoss AS. 2. Check if you meet prerequisites • Oracle JDK or JRE (See Java Virtual Machine (p.
Page 27 Chapter 3. Installing 3. Create a separate JBoss server configuration However it may be useful to use a specific JBoss server configuration, when it is necessary to run CloverETL: • isolated from other JBoss applications • with a different set of services •...
Page 28: Jboss Enterprise Application Platform
Configuration of CloverETL Server on JBoss EAP (p. 25) Installation of CloverETL Server 1. Get CloverETL Server web archive file ( clover.war ) which is built for JBoss EAP. 2. Check if you meet prerequisites • Oracle JDK or JRE (See Java Virtual Machine (p.
Page 29 In order to be able to connect to the database, one needs to define a global module so that the driver is available for the CloverETL web application - copying the driver to the lib/ext directory of the server will not work. Such a module is created and deployed in a few steps (the example is for MySQL and module's name is mysql.driver...
Page 30 <module name="mysql.driver" slot="main" /> </global-modules> <spec-descriptor-property-replacement>false</spec-descriptor-property-replacement> <jboss-descriptor-property-replacement>true</jboss-descriptor-property-replacement> </subsystem> 6. Configure CloverETL Server according to a description in the next section (p. 25) . 7. Deploy WAR file Copy the file clover.war to [jboss-home]/standalone/deployments 8. Run [jboss-home]/bin/standalone.sh (or standalone.bat on Windows OS) to start the JBoss platform.
Page 31 ). You can set the path to the license file, too. • Alternatively, you can set "JDBC" datasource.type and configure the database connection to be managed directly by CloverETL Server (provided that you have deployed proper JDBC driver module to the server): datasource.type=JDBC jdbc.url=jdbc:mysql://localhost:3306/cloverServerDB...
Page 32: Oracle Weblogic Server
Java Virtual Machine (p. 5) for required Java version.) • WebLogic (CloverETL Server is tested with WebLogic Server 11g (10.3.6) and WebLogic Server 12c (12.1.2), see http://www.oracle.com/technetwork/middleware/ias/downloads/wls-main-097127.html) WebLogic has to be running and a domain has to be configured. You can check it by connecting to...
Page 33 • Set JAVA_OPTIONS variable in the WebLogic domain start script [domainHome]/startWebLogic.sh JAVA_OPTIONS="${JAVA_OPTIONS} -Dclover_config_file=/path/to/clover-config.properties" • This change requires restarting WebLogic. Important When CloverETL Server is deployed on WebLogic and JNDI Datasource pointing to Oracle DB is used, there must be an extra config property in the config file: quartz.driverDelegateClass=org.quartz.impl.jdbcjobstore.oracle.weblogic.WebLogicOracleDelegate Continue with: Installation of CloverETL Server License (p.
Page 34: Installation Of Cloveretl Server License
Installation of CloverETL Server License using a Web Form If the CloverETL Server has been started without assigning any license, you can use Add license form in the server gui to install it. In this case the hyperlink No license available in system. Add new license is displayed on...
Page 35 You can paste a license text into License key or use the Browse button to search for a license file in the filesystem. After clicking the Update button the license is validated and saved to the database table clover_licenses. If the license is valid, a table with the license's description appears. To proceed to the CloverETL Server console click Continue to server console.
Page 36 Update of CloverETL Server License in the Configuration Section If the license has already been installed, you can still change it using the form in the server web GUI. • Go to server web GUI →Configuration →CloverETL Info →License • Click Update license.
Page 37: Installation Of Cloveretl Server License Using A License.file Property
If you assign more than one valid license, the most recent one is used. Installation of CloverETL Server License using a license.file Property 1. Get the license.dat file. 2. Set the CloverETL Server license.file parameter to the full path to the license.dat file. See Chapter 9, List of Properties (p.
Page 38: Ibm Infosphere Mdm Plugin Installation
Server. 4. To verify that the plugin was loaded successfully, log in to the Server's Reporting Console and look in the Configuration > CloverETL Info > Plugins page. In the list of plugins you should see cloveretl.engine.initiate.
Page 39: Possible Issues During Installation
Chapter 3. Installing Possible Issues during Installation Since CloverETL Server is considered a universal JEE application running on various application servers, databases and JVM implementations, problems may occur during the installation. These can be solved with a proper configuration of the server environment. This section contains tips for the configuration.
Page 40 Apache Tomcat Context Parameters Do Not Have Any Effect Tomcat may sometimes ignore some of the context parameters. This may cause unexpected CloverETL Server behaviour, since the server appears to be configured, but only partially: some parameters are accepted, some are ignored. Usually it works fine; however, the problem may occur in some environments.
Page 41 Failed to load webapp: Failed to load webapp: Context root /* is already bound. Cannot start application CloverETL If you can see it, then this is the case. The easiest way to get rid of the issue is to stop all other (sample) applications and leave only clover.war running on the server.
Page 42 If you are setting environment variables like clover_license_file or clover_config_file, remember you should not be running more than one CloverETL Server. Therefore, if you ever need to run more instances at once, use other ways of setting parameters (see Part III, “Configuration” (p. 43) for description of all possibilities). The reason is that the environment variable is shared by all applications in use, causing them to share configurations and fail unexpectedly.
Page 43 Chapter 3. Installing could not execute query You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1...
Page 44: Postinstallation Configuration
Thus the whole application server, together with WARs and EARs running on it, shares one memory space. The default JVM memory settings are too low for running an application container with CloverETL Server. Some application servers, like IBM WebSphere, increase the JVM defaults themselves; however, they may still be too low.
Page 45: Maximum Number Of Open Files
Therefore it is recommended to increase the limit for production systems. Reasonable limits vary from 10,000 to about 100,000 depending on the expected load of CloverETL Server and the complexity of your graphs. The current limit can be displayed in most UNIX-like systems using the ulimit -Hn command.
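As a complement to the ulimit command, the limit can also be inspected from a script. The sketch below uses Python's standard resource module (available on UNIX-like systems only); the helper name limit_is_sufficient and the choice of 10,000 as the threshold (the lower end of the range mentioned above) are assumptions made for illustration.

```python
import resource  # standard library, UNIX-like systems only

RECOMMENDED_MIN = 10_000  # lower end of the range recommended in the text


def limit_is_sufficient(limit, minimum=RECOMMENDED_MIN):
    """Return True if an open-files limit meets the given minimum."""
    # RLIM_INFINITY means "unlimited", which always satisfies the check.
    return limit == resource.RLIM_INFINITY or limit >= minimum


if __name__ == "__main__":
    # getrlimit returns the (soft, hard) pair for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    print("soft limit sufficient:", limit_is_sufficient(soft))
```

The soft limit is the one that applies to a running process, while the hard limit (the value shown by ulimit -Hn) is the ceiling up to which the soft limit can be raised.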
Page 46: Upgrading Server To Newer Version
Examples of DB Connection Configuration (p. 54) • Having a separate sandbox with test graph that can be run anytime to verify that CloverETL Server runs correctly and allows for running jobs Upgrade Instructions 1. Suspend all sandboxes, wait for running graphs to finish processing 2.
Page 47 Chapter 5. Upgrading Server to Newer Version 9. Review that contents of all tabs in CloverETL Server Console, especially scheduling and event listeners looks 10. Update graphs to be compatible with the particular version of CloverETL Server (this should be prepared and tested in advance) 11. Resume the test sandbox and run a test graph to verify functionality...
Page 48: Configuration
Part III. Configuration We recommend the default installation (without any configuration) only for evaluation purposes. For production use, we recommend configuring a dedicated database and properly configuring the SMTP server for sending notifications.
Page 49: Configuration Sources And Their Priorities
Source is a common properties file (text file with key-value pairs): [property-key]=[property-value] By default, CloverETL tries to find the config file [workingDir]/cloverServer.properties. Properties File on Specified Location A file has the same file structure as in case above, but its location is specified with a clover_config_file or clover.config.file environment variable or system property.
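The [property-key]=[property-value] format described above can be read with a few lines of code, which is handy for checking a configuration file before the server is started. This is a simplified sketch, not the parser the server itself uses: it handles only "=" separators and full-line comments, whereas the Java properties format also allows ":" separators, line continuations and escape sequences. The function name parse_properties is hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines into a dict (simplified properties format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    sample = "# database\njdbc.username=root\njdbc.password=\n"
    print(parse_properties(sample))
```

Splitting on the first "=" matters for values such as JDBC URLs, which commonly contain "=" in their query string.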
Page 50: Context Parameters (Available On Apache Tomcat)
Example for Apache Tomcat On Tomcat, it is possible to specify context parameters in a context configuration file, [tomcat_home]/conf/Catalina/localhost/clover.xml, which is created automatically just after deployment of a CloverETL Server web application. You can specify a property by adding this element: <Parameter name="[propertyName]"...
Page 51: Setup
Chapter 7. Setup CloverETL Server Setup helps you with configuration of CloverETL Server. Instead of typing the whole configuration file in a text editor, the Setup generates the content of the configuration file according to your instructions. It lets you set up License and configure Database Connection, LDAP Connection, SMTP Server Connection, Sandbox Paths, Encryption and Cluster Configuration.
Page 52 Chapter 7. Setup See also Jetty (p. 16). Glassfish Add clover.config.file property in application server GUI (accessible on http://localhost:4848). The property can be added under Configuration →System Properties. See also Glassfish / Sun Java System Application Server (p. 20). JBoss See also JBoss Application Server (p.
Page 53: Setup Tabs
Chapter 7. Setup Configuring Particular Items Use Setup. Items configured in Setup are saved into a file defined with clover.config.file. If you need encryption, configure the Encryption first. Configure connection to database and then update license. Later, you can configure other setup items. Some Setup items (Database and Cluster) require restart of an application server.
Page 54 Chapter 7. Setup Database Database tab lets you configure the connection to a database. You can connect via JDBC, or you can use JNDI to access the datasource on an application server level. Choose a suitable item of a JNDI tree. Sandboxes Sandboxes tab lets you configure paths to sandboxes: shared, local, partitioned.
Page 55 Chapter 7. Setup Encryption Encryption tab lets you enable encryption of sensitive items of the configuration file. You can choose an encryption provider and an encryption algorithm. An alternative encryption provider can be used; the libs have to be added to classpath.
Page 56 Chapter 7. Setup LDAP LDAP tab lets you use an existing LDAP database for user authentication.
Page 57 Chapter 7. Setup Firstly, you should specify connection to the LDAP server. Secondly, define pattern for user DN. The login can be validated using any user matching the pattern. See also LDAP Authentication (p. 92). Cluster Cluster tab lets you configure clustering features.
Page 58 Chapter 7. Setup Note You can use the setup in a fresh installation of CloverETL Server, even if it has not been activated yet: log in to Server Console and use the Close button to access the menu.
Page 59: Examples Of Db Connection Configuration
Chapter 8. Examples of DB Connection Configuration In standalone deployment (non-clustered), configuration of DB connection is optional, since embedded Apache Derby DB is used by default and it is sufficient for evaluation. However, configuration of external DB connection is strongly recommended for production deployment. It is possible to specify common JDBC DB connection attributes (URL, username, password) or JNDI location of DB DataSource.
Page 60: Embedded Apache Derby
This subdirectory will be created in the directory which is set as derby.system.home (or in the working directory if derby.system.home is not set). Value databases/cloverDb is a default value, you may change it. Derby JDBC 4 compliant driver is bundled with CloverETL Server, thus there is no need to add it on the classpath.
Page 61: Mysql
Chapter 8. Examples of DB Connection Configuration MySQL CloverETL Server supports MySQL 5, up to version 5.5 included. If you use a properties file for configuration, specify these parameters: jdbc.driverClassName, jdbc.url, jdbc.username, jdbc.password, jdbc.dialect. For example:
jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306/clover?useUnicode=true&characterEncoding=utf8
jdbc.username=root
jdbc.password=
jdbc.dialect=org.hibernate.dialect.MySQLDialect...
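Since all five jdbc.* parameters listed above have to be specified, a quick completeness check can catch a forgotten key before startup. The sketch below assumes the properties have already been loaded into a dictionary; the function name missing_jdbc_keys is hypothetical. An empty value (as in jdbc.password= above) counts as present.

```python
REQUIRED_JDBC_KEYS = (
    "jdbc.driverClassName",
    "jdbc.url",
    "jdbc.username",
    "jdbc.password",
    "jdbc.dialect",
)


def missing_jdbc_keys(props):
    """Return the required jdbc.* keys that are absent from props, in a fixed order."""
    # Only presence is checked; an empty password is a legal value.
    return [key for key in REQUIRED_JDBC_KEYS if key not in props]


if __name__ == "__main__":
    config = {
        "jdbc.driverClassName": "com.mysql.jdbc.Driver",
        "jdbc.url": "jdbc:mysql://127.0.0.1:3306/clover",
        "jdbc.username": "root",
        "jdbc.password": "",
    }
    print(missing_jdbc_keys(config))
```

The same check applies unchanged to the other databases in this chapter that use the same five parameters.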
Page 62: Db2
Database clover has to be created with suitable PAGESIZE. DB2 has several possible values for this property: 4096, 8192, 16384 or 32768. CloverETL Server should work on DB with PAGESIZE set to 16384 or 32768. If PAGESIZE value is not set properly, there should be error message in the log file after failed CloverETL Server startup: ERROR: DB2 SQL Error: SQLCODE=-286, SQLSTATE=42727, SQLERRMC=16384;...
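The PAGESIZE rule above is easy to encode as a pre-flight check. This is only an illustrative sketch: the function name pagesize_is_suitable is hypothetical, and the two sets of values are taken directly from the paragraph above.

```python
DB2_PAGE_SIZES = (4096, 8192, 16384, 32768)  # values DB2 accepts
SUITABLE_PAGE_SIZES = (16384, 32768)         # values CloverETL Server should work with


def pagesize_is_suitable(pagesize):
    """Return True if the DB2 PAGESIZE is one CloverETL Server should work with."""
    if pagesize not in DB2_PAGE_SIZES:
        raise ValueError(f"{pagesize} is not a valid DB2 PAGESIZE")
    return pagesize in SUITABLE_PAGE_SIZES


if __name__ == "__main__":
    for size in DB2_PAGE_SIZES:
        print(size, pagesize_is_suitable(size))
```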
Page 63: Db2 On As/400
DB2 does not allow ALTER TABLE which trims DB column length. This problem depends on DB2 configuration and we've experienced this only on some AS400s so far. CloverETL Server applies a set of DB patches during the first installation after application upgrade. Some of these patches may apply column modifications which trim the length of the text columns.
Page 64: Oracle
Please don't forget to add a JDBC 4 compliant driver on the classpath. A JDBC Driver which doesn't meet the JDBC 4 specification won't work properly. These are privileges which have to be granted to schema used by CloverETL Server: CONNECT...
Page 65: Ms Sql
Chapter 8. Examples of DB Connection Configuration MS SQL MS SQL requires configuration of DB server. • Allowing of TCP/IP connection: • execute tool SQL Server Configuration Manager • go to Client protocols • switch on TCP/IP (default port is 1433) •...
Page 66: Postgre Sql
Chapter 8. Examples of DB Connection Configuration Postgre SQL If you use a properties file for configuration, specify these parameters: jdbc.driverClassName, jdbc.url, jdbc.username, jdbc.password, jdbc.dialect. For example:
jdbc.driverClassName=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost/clover?charSet=UTF-8
jdbc.username=postgres
jdbc.password=
jdbc.dialect=org.hibernate.dialect.PostgreSQLDialect
Please don't forget to add a JDBC 4 compliant driver on the classpath. A JDBC Driver which doesn't meet the JDBC 4 specification won't work properly.
Page 67: Jndi Db Datasource
Chapter 8. Examples of DB Connection Configuration JNDI DB DataSource CloverETL Server can connect to database using JNDI DataSource, which is configured in application server or container. However there are some CloverETL parameters which must be set, otherwise the behaviour may be unpredictable: datasource.type=JNDI # type of datasource;...
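The dependency between the JNDI-related parameters can be checked up front. According to the List of Properties chapter, datasource.jndiName and jdbc.dialect must be set whenever datasource.type is JNDI. The sketch below assumes the configuration is available as a dictionary; the function name jndi_config_problems is hypothetical, and the JNDI name used in the example is only a placeholder.

```python
def jndi_config_problems(props):
    """List problems in a JNDI datasource configuration (empty list means OK)."""
    problems = []
    # The extra parameters are required only when JNDI is selected.
    if props.get("datasource.type") != "JNDI":
        return problems
    for key in ("datasource.jndiName", "jdbc.dialect"):
        if not props.get(key, "").strip():
            problems.append(f"{key} must be set when datasource.type=JNDI")
    return problems


if __name__ == "__main__":
    config = {
        "datasource.type": "JNDI",
        # Placeholder JNDI name, not taken from the manual.
        "datasource.jndiName": "java:comp/env/jdbc/example",
    }
    for problem in jndi_config_problems(config):
        print(problem)
```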
Page 68: Encrypted Jndi
Encrypted JNDI on WebLogic (p. 68) Encrypted JNDI on Tomcat You need secure-cfg-tool to encrypt the passwords. Use the version of secure-cfg-tool corresponding to the version of CloverETL Server. Usage of the tool is described in Chapter 10, Secure Configuration Properties (p. 75).
Page 69 Chapter 8. Examples of DB Connection Configuration type="javax.sql.DataSource" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://127.0.0.1:5432/clover410m1?charSet=UTF-8" username="conf#Ws9IuHKo9h7hMjPllr31VxdI1A9LKIaYfGEUmLet9rA=" password="conf#Cj1v59Z5nCBHaktn6Ubgst4Iz69JLQ/q6/32Xwr/IEE=" maxActive="20" maxIdle="10" maxWait="-1"/> Encrypted JNDI on Jetty 9 (9.2.6) http://eclipse.org/jetty/documentation/current/configuring-security-secure-passwords.html Configuration of a JNDI jdbc connection pool is stored in the plain text file, $JETTY_HOME/etc/jetty.xml. <New id="MysqlDB" class="org.eclipse.jetty.plus.jndi.Resource"> <Arg></Arg>...
Page 70: Connection Configuration
Chapter 8. Examples of DB Connection Configuration
<datasources>
  <local-tx-datasource>
    <jndi-name>MysqlDS</jndi-name>
    <connection-url>jdbc:mysql://127.0.0.1:3306/clover</connection-url>
    <driver-class>com.mysql.jdbc.Driver</driver-class>
    <user-name>user</user-name>
    <password>password</password>
  </local-tx-datasource>
</datasources>
Encrypt the data source password
Linux
java -cp client/jboss-logging.jar:lib/jbosssx.jar org.jboss.resource.security.SecureIdentityLoginModule password
Windows
java -cp client\jboss-logging.jar;lib\jbosssx.jar org.jboss.resource.security.SecureIdentityLoginModule password
NOTE: The JBoss documentation uses client/jboss-logging-spi.jar, but no such file exists in JBoss AS [6.0.0.Final "Neo"]; client/jboss-logging.jar can be used instead.
Page 71 <max-pool-size>30</max-pool-size> </pool> <security> <user-name>user</user-name> <password>password</password> </security> </datasource> <drivers> <driver name="mysql" module="com.cloveretl.jdbc"> <driver-class>com.mysql.jdbc.Driver</driver-class> </driver> </drivers> </datasources> In JBOSS_HOME directory run cli command: java -cp modules/system/layers/base/org/picketbox/main/picketbox-4.0.19.SP2-redhat-1.jar:client/jboss-logging.jar The command will return an encrypted password, e.g. 5dfc52b51bd35553df8592078de921bc. Add a new security-domain to security-domains; the password value is the result of the command from the previous step.
Page 72 Chapter 8. Examples of DB Connection Configuration </pool> <security> <security-domain>EncryptDBPassword</security-domain> </security> </datasource> <drivers> <driver name="mysql" module="com.cloveretl.jdbc"> <driver-class>com.mysql.jdbc.Driver</driver-class> </driver> </drivers> </datasources> The same mechanism can probably also be used for JMS. http://middlewaremagic.com/jboss/?p=1026 Encrypted JNDI on Glassfish 3 (3.1.2.2) Configuration of jdbc connection pool is stored in the plain text file, $DOMAIN/config/domain.xml.
Page 73 Chapter 8. Examples of DB Connection Configuration The same mechanism can also be used for a JMS connection. (Configuring an external JMS provider: https://www.ibm.com/developerworks/community/blogs/timdp/entry/using_activemq_as_a_jms_provider_in_websphere_application_server_7149?lang=en) Encrypted JNDI on WebLogic Password in a JNDI datasource file is encrypted by default when created by admin's web console (Service/Datasource).
Page 74: List Of Properties
Extensibility - CloverETL Engine Plugins (p. 197). datasource.type Set this explicitly to JNDI if you need CloverETL Server to connect to a DB using JNDI datasource. In such case, "datasource.jndiName" and "jdbc.dialect" parameters must be set properly. Default: JDBC. Possible values: JNDI...
Page 75 "datasource.type" is set to "JNDI". clover_server jdbc.driverClassName class name for jdbc driver name jdbc.url jdbc url used by CloverETL Server to store data jdbc.username jdbc database user name jdbc.password jdbc database user password jdbc.dialect hibernate dialect to use in ORM quartz.driverDelegateClass...
Page 76 Switch whether the A1 Digest for HTTP Digest Authentication should be generated and stored or not. Since there is no CloverETL Server API using the HTTP Digest Authentication by default, it's recommended to keep it disabled. This option is not automatically enabled when any feature is specified security.digest_authentication.features_list...
Page 77: Parameters
Max number of records deleted in one batch. It is used for deleting of archived run records. launch.http_header_prefix Prefix of HTTP headers added by launch services to the HTTP response. Default: X-cloveretl. task.archivator. Prefix of archive files created by the archivator.
Page 78 Users are strongly discouraged from modifying the property. The property name has changed since CloverETL 4.2; however, the obsolete name is still accepted to maintain backwards compatibility.
Page 79 Chapter 9. List of Properties Table 9.2. Defaults for job execution configuration - see Job Config Properties (p. 108) for details
executor.tracking_interval: An interval in milliseconds for scanning the current status of a running graph. The shorter the interval, the bigger the log file. Default: 2000.
executor.log_level: Log level of graph runs.
Page 80: Secure Configuration Properties
Basic Utility Usage 1. Get a utility archive file (secure-cfg-tool.zip) and unzip it. The utility is available in the download section of your CloverETL account - at the same location as the download of CloverETL Server. 2. Execute the script given for your operating system, encrypt.bat for MS Windows, encrypt.sh for Linux.
Page 81 Chapter 10. Secure Configuration Properties Important Values encrypted by a Secure parameter form (Chapter 13, Secure Parameters (p. 88)) cannot be used as a value of a configuration property. Advanced Usage - Custom Settings The encryption of configuration values described so far uses the default configuration settings (a default provider and algorithm).
Page 82 Configuring an application server CloverETL Server application needs to know how the values have been encrypted, therefore the properties must be passed to the server (see details in Part III, “Configuration” (p. 43)). For example:...
Page 83 Chapter 10. Secure Configuration Properties
security.config_properties.encryptor.providerClassName=org.bouncycastle.jce.provider.BouncyCastleProvider
security.config_properties.encryptor.algorithm=PBEWITHSHA256AND256BITAES-CBC-BC
Important If a third-party provider is used, its classes must be accessible for the application server. Property security.config_properties.encryptor.providerLocation will be ignored.
Page 84: Logging
(p. 80) Main Logs The CloverETL Server uses the log4j library for logging. The WAR file contains the default log4j configuration. The log4j configuration file log4j.xml is placed in WEB-INF/classes directory. By default, log files are produced in the directory specified by system property "java.io.tmpdir" in the cloverlogs subdirectory.
Page 85 By default, these log files are saved in the subdirectory cloverLogs/graph in the directory specified by "java.io.tmpdir" system property. It’s possible to specify a different location for these logs with the CloverETL "graph.logs_path" property. This property does not influence main Server logs.
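The rule for locating graph logs can be written down as a small function: use graph.logs_path when it is set, otherwise fall back to cloverLogs/graph under the java.io.tmpdir directory. The sketch below is illustrative only; the function name graph_log_dir is hypothetical and both values are passed in as plain strings rather than read from a running server.

```python
from pathlib import PurePosixPath


def graph_log_dir(tmpdir, graph_logs_path=None):
    """Return the directory where graph run logs are expected."""
    # An explicit graph.logs_path overrides the default location.
    if graph_logs_path:
        return PurePosixPath(graph_logs_path)
    # Default: cloverLogs/graph under the java.io.tmpdir directory.
    return PurePosixPath(tmpdir) / "cloverLogs" / "graph"


if __name__ == "__main__":
    print(graph_log_dir("/tmp"))
    print(graph_log_dir("/tmp", "/var/log/clover/graphs"))
```

PurePosixPath is used so the example behaves the same on any platform; on a Windows server the native path type would be more appropriate.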
Page 86: Administration
Part IV. Administration...
Page 87: Temp Space Management
Chapter 12. Temp Space Management Many of the components available in the CloverETL Server require temporary files or directories in order to work correctly. Temp space is a physical location on the file system where these files or directories are created and maintained.
Page 88: Management
(p. 86) Initialization When CloverETL Server is starting, the system checks the temp space configuration: if no temp space is configured, a new default temp space is created in the directory the java.io.tmpdir system property points to. The directory is named as follows: •...
Page 89 Chapter 12. Temp Space Management Figure 12.2. Newly added global temp space. Using environment variables and system properties Environment variables and system properties can be used in the temp space path as placeholders; they can be arbitrarily combined, and the resolved paths for each node may differ in accordance with its configuration. Note The environment variables have higher priority than system properties of the same name.
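The priority rule in the note can be illustrated with a small resolver. This is a sketch under stated assumptions, not the server's implementation: the ${NAME} placeholder syntax is assumed here for illustration, the function name resolve_temp_path is hypothetical, and environment variables and system properties are passed in as plain dictionaries.

```python
import re

# Assumed placeholder syntax: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_temp_path(path, env, system_properties):
    """Replace ${NAME} placeholders, preferring environment variables."""
    def lookup(match):
        name = match.group(1)
        # Environment variables win over system properties of the same name.
        if name in env:
            return env[name]
        if name in system_properties:
            return system_properties[name]
        # Unknown placeholders are left untouched.
        return match.group(0)

    return _PLACEHOLDER.sub(lookup, path)


if __name__ == "__main__":
    env = {"TMP_ROOT": "/data/env"}
    props = {"TMP_ROOT": "/data/prop", "java.io.tmpdir": "/tmp"}
    print(resolve_temp_path("${TMP_ROOT}/clover", env, props))
    print(resolve_temp_path("${java.io.tmpdir}/clover", env, props))
```

Because each cluster node has its own environment and system properties, the same configured path can resolve to a different directory on each node.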
Page 90 Chapter 12. Temp Space Management Figure 12.3. Temp spaces using environment variables and system properties Disabling Temp Space To disable a temp space click on "Disable" link in the panel. Once the temp space has been disabled, no new temporary files will be created in it, but the files already created may be still used by running jobs. In case there are files left from previous or current job executions a notification is displayed.
Page 91 Chapter 12. Temp Space Management Figure 12.4. Disable operation reports action performed Enabling Temp Space To enable a temp space click on "Enable" link in the panel. Enabled temp space is active, i.e. available for temporary files and directories creation. Removing Temp Space To remove a temp space click on "Remove"...
Page 92 Chapter 12. Temp Space Management Figure 12.5. Remove operation asks for confirmation in case there are data present in the temp space...
Page 93: Secure Parameters
Secure parameters are automatically decrypted by server in graph runtime. A parameter value can also be encrypted in the CloverETL Server Console in the Configuration > Secure Parameters page - use the Encrypt text section. Figure 13.2. Graph parameters tab with initialized master password If you change the master password, the secure parameters encrypted using the old master password cannot be decrypted correctly anymore.
Page 94: Secure Parameters Configuration Parameters
Castle JCE provider. Another provider would be installed similarly.
1. Download Bouncy Castle provider jar (e.g. bcprov-jdk15on-150.jar) from http://bouncycastle.org/latest_releases.html
2. Add the jar to the classpath of your application container running CloverETL Server, e.g. to directory WEB-INF/lib
3. Set value security.job_parameters.encryptor.providerClassName...
Page 95 Chapter 13. Secure Parameters
security.job_parameters.encryptor.algorithm=PBEWITHSHA256AND256BITAES-CBC-BC
security.job_parameters.encryptor.providerClassName=org.bouncycastle.jce.provider.BouncyCastleProvider...
Page 96: Users And Groups
Chapter 14. Users and Groups The CloverETL Server has a built-in security module that manages users and groups. User groups control access permissions to sandboxes and operations the users can perform on the Server, including authenticated calls to Server API functions. A single user can belong to multiple groups.
Page 97: Ldap Authentication
Each user, even though logged in using LDAP authentication, must have their own "user" record (with related groups) in the CloverETL security module. So there must be a user with the same username and the domain set to "LDAP". Such a record has to be created by a Server administrator before the user can log in.
Page 98 Chapter 14. Users and Groups Basic LDAP connection properties
# Implementation of context factory
security.ldap.ctx_factory=com.sun.jndi.ldap.LdapCtxFactory
# URL of LDAP server
security.ldap.url=ldap://hostname:port
# User DN pattern that will be used to create LDAP user DN from login name.
security.ldap.user_dn_pattern=uid=${username},dc=company,dc=com
Depending on the LDAP server configuration the property security.ldap.user_dn_pattern can be pattern for user's actual distinguished name in the LDAP directory, or just the login name - in such case just set the property to ${username}.
Page 99 Chapter 14. Users and Groups
security.ldap.user_search.filter=(uid=${username})
# Scope specifies type of search in "base". There are three possible values: SUBTREE | ONELEVEL | OBJECT
# http://download.oracle.com/javase/6/docs/api/javax/naming/directory/SearchControls.html
security.ldap.user_search.scope=SUBTREE
Following properties are names of attributes from the search defined above. They are used for getting basic info about the LDAP user in case the user record has to be created/updated by Clover security module: (step [6] in the login process above)
security.ldap.user_search.attribute.firstname=fn...
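How the DN pattern turns a login name into a distinguished name can be sketched like this. It is only an illustration of the `${username}` substitution, not the Server's actual code: the function name is hypothetical, and no escaping of special DN characters is attempted.

```python
def build_user_dn(pattern, username):
    """Substitute the login name into a security.ldap.user_dn_pattern value.

    Illustrative only: a real implementation would also need to escape
    characters that are special in LDAP distinguished names.
    """
    return pattern.replace("${username}", username)


# Pattern taken from the example property above.
print(build_user_dn("uid=${username},dc=company,dc=com", "jsmith"))

# When the LDAP server accepts the bare login name, the pattern is just
# the placeholder itself.
print(build_user_dn("${username}", "jsmith"))
```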
Page 100: Users
First name Last name E-mail E-mail address which may be used by CloverETL administrator or by CloverETL server for automatic notifications. See Task - Send Email (p. 153) for details.
Page 101 Chapter 14. Users and Groups Edit user record User with permission "Create user" or "Edit user" can use this form to set basic user parameters. Figure 14.2. Web GUI - edit user Change users Password If a user loses their password, a new one must be set. A user with the permission "Change passwords" can use this form to do it.
Page 102 Chapter 14. Users and Groups Figure 14.4. Web GUI - groups assignment Disabling / enabling users Since a user record has various relations to the logs and history records, it can't be deleted, so it is disabled instead. This means that the record is not displayed in the list and the user can't log in. However, a disabled user may be enabled again.
Page 103: Groups
Every single CloverETL user is assigned to this group by default. It is possible to remove user from this group, but it is not a recommended approach. This group is useful for some permissions to sandbox or some operation, which you would like to make accessible for all users without exceptions.
Page 104 Chapter 14. Users and Groups Figure 14.6. Web GUI - users assignment Groups permissions Groups permissions are structured as a tree, where permissions are inherited from the root to leafs. Thus if some permission (tree node) is enabled (blue dot), all permissions in sub tree are automatically enabled (white dot). Permissions with red cross are disabled.
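The inheritance rule can be sketched as a small tree walk. This is an illustrative model only: the class, the function, and the permission names are hypothetical and do not reflect the Server's internal data structures.

```python
class Permission:
    """One node of the permission tree (names used below are hypothetical)."""

    def __init__(self, name, enabled=False, children=None):
        self.name = name
        self.enabled = enabled          # explicitly enabled on this node
        self.children = children or []


def effective_permissions(node, inherited=False, result=None):
    """Return {permission name: granted?}.

    A permission is granted if it is enabled itself or if any ancestor
    is enabled, so a grant propagates to the whole subtree.
    """
    if result is None:
        result = {}
    granted = inherited or node.enabled
    result[node.name] = granted
    for child in node.children:
        effective_permissions(child, granted, result)
    return result


tree = Permission("all", children=[
    Permission("sandboxes", enabled=True, children=[
        Permission("sandboxes.read"),
        Permission("sandboxes.write"),
    ]),
    Permission("users", children=[Permission("users.edit")]),
])

print(effective_permissions(tree))
```

In this model, "sandboxes" corresponds to an explicitly enabled node, its two children to automatically enabled ones, and everything under "users" stays disabled.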
Page 105: Server Side Job Files - Sandboxes
Server and accessed remotely. Nonetheless, you can do everything with Server Projects the same way as with local projects – copy and paste files, create, edit, and debug graphs, etcetera. See the CloverETL Designer manual for details on configuring a connection to the Server.
Page 106 Instead of the absolute path, it's recommended to use ${sandboxes.home} placeholder, which may be configurable in the CloverETL Server configuration. So e.g. for the sandbox with ID "dataReports" the specified value of the "root path"...
Page 107: Referencing Files From The Etl Graph Or Jobflow
(ETL graph or Jobflow). • sandbox:// URLs Sandbox URL allows user to reference the resource from different sandboxes with standalone CloverETL Server or the cluster. In cluster environment, CloverETL Server transparently manages remote streaming if the resource is accessible only on some specific cluster node.
Page 108: Sandbox Content Security And Permissions
Other users may have access according to sandbox settings. Figure 15.2. Sandbox Permissions in CloverETL Server Web GUI Permissions to a specific sandbox are modifiable in Permissions tab in sandbox detail. In this tab, selected user groups may be allowed to perform particular operations.
Page 109: Sandbox Content
Chapter 15. Server Side Job Files - Sandboxes Sandbox Content Sandbox should contain jobflows, graphs, metadata, external connection and all related files. Files, especially graph or jobflow files, are identified by relative path from sandbox root. Thus you need two values to identify specific job file: sandbox and path in sandbox.
Page 110: Web Gui - Download Sandbox As Zip
Chapter 15. Server Side Job Files - Sandboxes Figure 15.5. Web GUI - download sandbox as ZIP Upload ZIP to sandbox Select a sandbox in left panel. You must have write permission to the selected sandbox. Then select tab "Upload ZIP"...
Page 111: Web Gui - Upload Zip Results
Chapter 15. Server Side Job Files - Sandboxes Figure 15.7. Web GUI - upload ZIP results Table 15.3. ZIP upload parameters Label Description Encoding of packed file names File names which contain special characters (non-ASCII) are encoded. With this select box, you choose the right encoding, so file names are decoded properly.
Page 112 Chapter 15. Server Side Job Files - Sandboxes Download file HTTP API It is possible to download/view sandbox file accessing "download servlet" by simple HTTP GET request: http://[host]:[port]/[Clover Context]/downloadFile?[Parameters] Server requires BASIC HTTP Authentication. Thus with linux command line HTTP client "wget" it would look like this: wget --user=clover --password=clover http://localhost:8080/clover/downloadFile?sandbox=default\&file=data-out/data.dat...
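The same request can be prepared from a script. The sketch below uses only the Python standard library and builds the request for the "download servlet" with a Basic Authentication header; the helper name is hypothetical, and the host, context path, and credentials simply mirror the wget example.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_download_request(base_url, sandbox, file_path, user, password):
    """Prepare an HTTP GET request for [base_url]/downloadFile.

    The request is only constructed here; nothing is sent.
    """
    # Keep "/" readable in the file path, encode everything else.
    query = urlencode({"sandbox": sandbox, "file": file_path}, safe="/")
    request = Request(f"{base_url}/downloadFile?{query}")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


req = build_download_request(
    "http://localhost:8080/clover", "default", "data-out/data.dat",
    "clover", "clover",
)
print(req.full_url)

# To actually download the file against a running Server:
#   from urllib.request import urlopen
#   data = urlopen(req).read()
```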
Page 113: Job Config Properties
Chapter 15. Server Side Job Files - Sandboxes Job Config Properties Each ETL graph or Jobflow may have set of config properties, which are applied during the execution. Properties are editable in web GUI section "sandboxes". Select job file and go to tab "Config properties". The same config properties are editable even for each sandbox.
Page 114 Chapter 15. Server Side Job Files - Sandboxes Property name Default value Description "DEFAULT_PATH_SEPARATOR_REGEX". Directory path must always end with a slash character "/", otherwise ClassLoader doesn't recognize it's a directory. Server always automatically adds "trans" subdirectory of job's sandbox, so it doesn't have to be added explicitly.
Page 115: Job Config Properties
Chapter 15. Server Side Job Files - Sandboxes Property name Default value Description graph from the server console sets the debug_mode to false. delete_obsolete_temp_files false If true, system will remove temporary files produced during previous finished runs of the respective job. This property is useful together with enabled debug mode ensuring that obsolete debug files from previous runs of a job are removed from temp...
Page 116: Webdav Access To Sandboxes
WebDAV Authentication/Authorization CloverETL Server WebDAV API uses the HTTP Basic Authentication by default. However it may be reconfigured to use HTTP Digest Authentication. Please see Part III, “Configuration” (p. 43) for details. Digest Authentication may be useful, since some WebDAV clients can't work with HTTP Basic Authentication,...
Page 117 Chapter 15. Server Side Job Files - Sandboxes HTTP Digest Authentication is a feature added in version 3.1. If you upgraded from an older CloverETL Server distribution, users created before the upgrade cannot use HTTP Digest Authentication until they reset their passwords.
Page 118: Cloveretl Server Monitoring
Monitoring section in the server Web GUI displays useful information about current performance of the standalone CloverETL Server or all cluster nodes if the clustering is enabled. Monitoring section of the standalone server has slightly different design from cluster environment. In case of standalone server, the server-view is the same as node detail in cluster environment.
Page 119: Cpu Load
Chapter 16. CloverETL Server Monitoring Performance The Performance panel contains a chart with two basic performance statistics: a number of running jobs and an amount of used heap memory. The graph displays values gathered within a specific interval. The interval can be set up with the combo box above the graph or it can be configured by "cluster.node.sendinfo.history.interval"...
Page 120 Chapter 16. CloverETL Server Monitoring Figure 16.5. Running jobs System System panel contains info about operating system and license. Figure 16.6. System Status History Status history panel displays node statuses history since restart of the server. Figure 16.7. Status History User's Access User's Access panel lists info about activities on files performed by users.
Page 121: Classloader Cache
Status panel displays current node status since last server restart. It displays current server status (ready, stopped, ...), exact Java version, exact CloverETL Server version, way of access to database, URLs for synchronous and asynchronous messaging, available heap and non-heap memory, etc.
Page 122 Chapter 16. CloverETL Server Monitoring Threads Threads panel lists java threads and their states. Figure 16.12. Threads Quartz Quartz panel lists scheduled actions: their name, description, start time, end time, time of previous event, time of next event and expected final event.
Page 123: Cluster Overview
Chapter 16. CloverETL Server Monitoring Cluster Overview Cluster overview displays info collected from all cluster nodes. The info is grouped in several panels: • List of nodes with a toolbar - allows manipulation with selected nodes • Status history - Displays last 10 status changes for all cluster nodes •...
Page 124: Node Detail
Chapter 16. CloverETL Server Monitoring Node Detail Node Detail is similar to the "Standalone server detail" mentioned above, however it displays detail info about node selected in the tree on the left. Figure 16.15. Node detail...
Page 125: Server Logs
• CLUSTER - Only cluster - related messages are visible in this log • LAUNCH_SERVICES - Only requests for launch services • AUDIT - Detail logging of operations called on the CloverETL Server core. Since the full logging may affect server performance, it's disabled by default. See Server Audit Logs (p.
Page 126: Server Configuration Migration
Chapter 17. Server Configuration Migration CloverETL Server provides means to migrate its configuration (e.g. event listeners, schedules etc.) or parts of the configuration between separate instances of the server. A typical use case is deployment from test environment to production - this involves not only deployment of CloverETL graphs, but also copying parts of configuration such as file event listeners etc.
Page 127: Server Configuration Export
XSD schema. The schema for a configuration XML document can be found at http://[host]:[port]/[contextPath]/schemas/clover-server-config.xsd. The XML file contains selected items of the CloverETL server instance. The file can be modified before the import to another server instance - for example to import schedules only.
Page 128: Server Configuration Import
Configuration Import Process Uploading Configuration The first step in the configuration import is to upload the XML file to the CloverETL server. After clicking on CloverETL Configuration File button a window is opened where user can select an XML file with the configuration to import.
Page 129: Server Configuration Uploaded
• Changes only button will display only items that have been either added or actually changed by update • All updates button will display all of imported items, even those identical to already present ones Example 17.1. Example of simple configuration defining one new server user. <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <cloverServerConfiguration xmlns="http://cloveretl.com/server/data" timeZone="Europe/Berlin"> <usersList> <user disabled="false"> <username>johnsmith</username>...
Page 130 Chapter 17. Server Configuration Migration <groupCode>job_managers</groupCode> </userGroups> </user> </usersList> </cloverServerConfiguration> Figure 17.4. Outcome of the import preview for configuration from Example 17.1 (p. 124) The Summary in the Import Log says whether the dry run was successful. Should there be any problems with items imported, the item is displayed along with the cause of the error (see Figure 17.4 (p.
Page 131 User is notified about these problems in Import Log with link to the problematic item. One should check such items in appropriate section of the CloverETL Server console and change their settings to fix the issue or remove them.
Page 132: Diagnostics
Chapter 18. Diagnostics CloverETL Server allows you to create a thread dump or a heap dump. The thread and heap dumps are useful for investigation of performance and memory issues. In server GUI, go to Configuration →System Info →Diagnostics. Heap Dump Heap Dump is content of a JVM process memory stored in a binary file.
Page 133: Using Graphs
Part V. Using Graphs...
Page 134: Graph/Jobflow Parameters
Chapter 19. Graph/Jobflow Parameters The CloverETL Server passes a set of parameters to each graph or jobflow execution. Keep in mind that ${paramName} placeholders (parameters) are resolved only during the initialization (loading of XML definition file), so if you need the parameters to be resolved for each job execution, you cannot set the job to be pooled.
Page 135: Parameters By Execution Type
Chapter 19. Graph/ Jobflow Parameters Parameters by Execution Type Additional parameters are passed to the graph depending on how the graph is executed. Executed from Web GUI Graphs executed from the Web GUI have no additional parameters. Executed by Launch Service Invocation Service parameters which have Pass to graph attribute enabled are passed to the graph not only as "dictionary"
Page 136: Executed By Task "Graph Execution" By File Event Listener
Chapter 19. Graph/ Jobflow Parameters ALL parameters from a "source" job are passed to the executed job. This switch is implemented for backwards compatibility. Regarding the default behaviour: in the editor of graph event listener, you can specify a list of parameters to pass.
Page 137: Manual Task Execution
Chapter 20. Manual Task Execution Since 3.1 Manual task execution allows you to invoke a task directly with an immediate effect, without defining and triggering an event. There are a number of task types that are usually associated with a triggering event, such as a file listener or a graph/jobflow listener.
Page 138: Scheduling
Chapter 21. Scheduling The scheduling module allows you to create a time schedule for operations you need to trigger in a repetitive or timely manner. Similar to “cron” from Unix systems, each schedule represents a separate time schedule definition and a task to perform.
Page 139: Timetable Setting
Chapter 21. Scheduling Timetable Setting This section describes how to specify WHEN a schedule should be triggered. Please keep in mind that exact trigger times are not guaranteed; there may be a delay of a couple of seconds. The schedule itself can be specified in different ways.
Page 140: Periodical Schedule Attributes
Chapter 21. Scheduling Figure 21.3. Web GUI - schedule form - calendar Periodical schedule by Interval This type of schedule is the simplest periodical type. Trigger times are specified by these attributes: Table 21.2. Periodical schedule attributes Type "periodic" Periodicity "interval"...
Page 141: Cron Periodical Schedule Attributes
Chapter 21. Scheduling Figure 21.4. Web GUI - periodical schedule form Periodical schedule by timetable (Cron Expression) Timetable is specified by powerful (but a little bit tricky) cron expression. Table 21.3. Cron periodical schedule attributes Type "periodic" Periodicity "interval" Not active before date/time Date and time, specified with minutes precision.
Page 142: Cron Periodical Schedule Form
Chapter 21. Scheduling Figure 21.5. Cron periodical schedule form...
Page 143: Tasks
Chapter 21. Scheduling Tasks Task basically specifies WHAT to do at trigger time. There are several tasks implemented for schedule and for graph event listener as follows: • Task - Execution of Graph (p. 138) • Task - Execution of Jobflow (p.
Page 144 Chapter 21. Scheduling Table 21.4. Attributes of "Graph execution" task Task type "Start a graph" Node IDs to process the task This attribute is accessible only in the cluster environment. It's a comma-separated list of node IDs which may process the task. If it's empty, any node may process the task; if nodes are specified, the task will be processed on the first node which is online and ready.
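The node selection rule for the "Node IDs to process the task" attribute can be sketched as below. This is a simplified model, not the Server's scheduler: the function name and node IDs are hypothetical, and for an empty attribute the sketch arbitrarily picks the first ready node, since the manual only says that any node may be used.

```python
def pick_node(node_ids_attr, ready_nodes):
    """Choose the cluster node that processes a task.

    node_ids_attr: value of the comma-separated "Node IDs" attribute.
    ready_nodes:   IDs of nodes that are currently online and ready.
    Returns a node ID, or None if no suitable node is available.
    """
    wanted = [n.strip() for n in node_ids_attr.split(",") if n.strip()]
    if not wanted:
        # Empty attribute: any node may process the task (assumption:
        # take the first ready one).
        return ready_nodes[0] if ready_nodes else None
    for node_id in wanted:
        # The order of the list matters: first listed node that is ready.
        if node_id in ready_nodes:
            return node_id
    return None


print(pick_node("node01, node02, node03", ["node03", "node02"]))
print(pick_node("", ["node03", "node02"]))
```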
Page 145 Chapter 21. Scheduling Figure 21.6. Web GUI - Graph execution task Task - Execution of Jobflow Please note that behaviour of this task type is almost the same as Task - Execution of Graph (p. 138)
Page 146 Chapter 21. Scheduling Table 21.5. Attributes of "Jobflow execution" task Task type "Start a jobflow" Node IDs to process the task This attribute is accessible only in the cluster environment. It's a comma-separated list of node IDs which may process the task. If it's empty, any node may process the task; if nodes are specified, the task will be processed on the first node which is online and ready.
Page 147 Chapter 21. Scheduling Table 21.6. Attributes of "Abort job" task Task type "Abort job" Node IDs to process the task This attribute is accessible only in the cluster environment. It's a comma-separated list of node IDs which may process the task. If it's empty, any node may process the task; if nodes are specified, the task will be processed on the first node which is online and ready.
Page 148 IDs which may process the task. If it's empty, it may be any node, if there are nodes specified, the task will be processed on the first node which is online and ready. CloverETL Server contains Groovy version 2.0.0 Table 21.8. List of variables available in Groovy code...
Page 149 CloverETL Server
serverFacade com.cloveretl.server.facade.api.ServerFacade Reference to the facade interface. Useful for calling CloverETL Server core. WAR file contains JavaDoc of facade API and it is accessible on URL: http://host:port/clover/javadoc/index.html every time
sessionToken String Valid session token of the user who owns the event. every time
Page 150 Chapter 21. Scheduling Table 21.9. Attributes of "Archivator" task Task type "Archivator" Node IDs to process the task This attribute is accessible only in the cluster environment. It's a comma-separated list of node IDs which may process the task. If it's empty, any node may process the task; if nodes are specified, the task will be processed on the first node which is online and ready.
Page 151 Chapter 21. Scheduling Figure 21.10. Web GUI - archive records...
Page 152: Viewing Job Runs - Executions History
Chapter 22. Viewing Job Runs - Executions History Executions History shows the history of all jobs that the Server has executed – transformation graphs, jobflows, and Data Profiler jobs. You can use it to find out why a job failed, see the parameters that were used for a specific run, and much more.
Page 153: Persistent Run Record Attributes
Job version Revision of the job file. It's a string generated by CloverETL Designer and stored in the job file. Status Status of the job execution.
Page 154: Executions History - Overall Perspective
Chapter 22. Viewing Job Runs - Executions History Figure 22.2. Executions History - overall perspective Since the detail panel and especially job logs may be wide, it may be useful to hide the table on the left, so the detail panel can expand.
Page 155: Listeners
Chapter 23. Listeners Listeners can be seen as hooks. They wait for a specific event and take a user-defined action if the event occurs. The event is specific to the particular listener (Graph Event Listeners (p. 151), Jobflow Event Listeners (p.
Page 156: Graph Event Listeners
(Jobflow Event Listeners (p. 160)) – for CloverETL Server both are simply “jobs”. In the Cluster, the event and the associated task are executed on the same node the job was executed on by default. If the graph is distributed, the task will be executed on the master worker node. However, you can override where the task will be executed by explicitly specifying Node IDs in the task definition.
Page 157: Listener
Chapter 23. Listeners graph timeout A graph timeout event is created when a graph runs longer than a specified interval. Thus you must specify a "Job timeout interval" attribute for each listener of a graph timeout event. You can specify this interval in seconds, minutes, or hours.
Page 158 Chapter 23. Listeners Note: You can use task of any type for both scheduling and graph event listener. Description of task types is divided into two sections just to show the most obvious use cases. In the Cluster environment, all tasks have an additional attribute "Node IDs to process the task". It's the comma separated list of cluster nodes, which may process the task.
Page 159 Chapter 23. Listeners Figure 23.2. Web GUI - send e-mail Note: Do not forget to configure connection to SMTP server (See Part III, “Configuration” (p. 43) for details). Placeholders Placeholder may be used in some fields of tasks. They are especially useful for e-mail tasks, where you can generate content of e-mail according to context variables.
Page 160 Chapter 23. Listeners Some of them may be empty depending on type of event. E.g., if task is processed because of graph event, then run and sandbox variables contain related data, otherwise they are empty. Table 23.2. Placeholders useful in e-mail templates Variable name Contains Current date-time...
Page 161: Attributes Of Jms Message Task
Chapter 23. Listeners Table 23.3. Attributes of JMS message task Task type "JMS message" Initial context class name A full class name of javax.naming.InitialContext implementation. Each JMS provider has its own implementation. E.g., for Apache ActiveMQ it is "org.apache.activemq.jndi.ActiveMQInitialContextFactory". If it is empty, server uses the default initial context.
Page 162: Use Cases
Chapter 23. Listeners Use Cases Possible use cases are the following: • Execute graphs in chain (p. 157) • Email notification about graph failure (p. 158) • Email notification about graph success (p. 158) • Backup of data processed by graph (p.
Page 163: Web Gui - E-Mail Notification About Graph Failure
Chapter 23. Listeners Email notification about graph failure Figure 23.5. Web GUI - e-mail notification about graph failure Email notification about graph success Figure 23.6. Web GUI - email notification about graph success...
Page 164 Chapter 23. Listeners Backup of data processed by graph Figure 23.7. Web GUI - backup of data processed by graph...
Page 165: Jobflow Event Listeners
(p. 152)) in many ways, since ETL Graphs and Jobflows are both "jobs" from the point of view of the CloverETL Server. In the Cluster, the event and the associated task are executed on the same node the job was executed on. If the jobflow is distributed, the task will be executed on the master worker node.
Page 166: Listener
Chapter 23. Listeners jobflow timeout A jobflow timeout event is created when a jobflow runs longer than the specified interval. Thus you have to specify a "Job timeout interval" attribute for each listener of a jobflow timeout event. You can specify this interval in seconds, minutes, or hours.
Page 167: Jms Messages Listeners
Oracle website: http://docs.oracle.com/javaee/6/tutorial/doc/bncdq.html Note that the JMS implementation is dependent on the application server that the CloverETL Server is running in. In Cluster, you can either explicitly specify which node will listen to JMS or not. If unspecified, all nodes will register as listeners.
Page 168 Chapter 23. Listeners Attribute Description URL of a JMS message broker Durable subscriber (only If it is false, message consumer is connected to the broker as "non-durable", so for Topics) it receives only messages which are sent while the connection is active. Other messages are lost.
Page 169: Variables Accessible In Groovy Code
ServletContext javax.jms.Message instance of a JMS message com.cloveretl.server.api.ServerFacade serverFacade instance of serverFacade usable for calling CloverETL Server core features. String sessionToken sessionToken, needed for calling serverFacade methods Message data available for further processing A JMS message is processed and the data it contains is stored into two data structures: Properties and Data.
Page 170: Data" Elements
Chapter 23. Listeners Table 23.6. Properties Elements description JMS_PROP_[property key] For each message property is created one entry, where "key" is made of a "JMS_PROP_" prefix and property key. JMS_MAP_[map entry key] If the message is instance of MapMessage, for each map entry is created one entry, where "key"...
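The key naming scheme for the "Properties" container can be sketched as follows. This is an illustration of the prefixes only: the function and the sample message content are hypothetical, and the treatment of map entries assumes the `JMS_MAP_` prefix is combined with the map entry key in the same way as `JMS_PROP_` is combined with the property key.

```python
def build_properties(message_properties, map_entries=None):
    """Flatten a JMS message into prefixed "Properties" entries.

    message_properties: the message's properties (key -> value).
    map_entries:        entries of a MapMessage, or None for other
                        message types.
    """
    props = {f"JMS_PROP_{key}": value for key, value in message_properties.items()}
    for key, value in (map_entries or {}).items():
        props[f"JMS_MAP_{key}"] = value
    return props


# Hypothetical message content.
print(build_properties({"priority": "high"}, {"fileName": "orders.csv"}))
```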
Page 171 Chapter 23. Listeners The “Data” container is passed to a task that can use it, depending on its implementation. For example, the task "execute graph" passes it to the executed graph as “dictionary entries.” In the Cluster environment, you can specify explicitly node IDs, which can execute the task. However, if the “data” payload is not serializable and the receiving and executing node differ, an error will be thrown as the Cluster cannot pass the “data”...
Page 172: Universal Event Listeners
For example, you can continually check for essential data sources before starting a graph. Or, you can do complex checks of a running graph and, for example, decide to kill it if necessary. You can even call the CloverETL Server core functions using the ServerFacade interface, see Javadoc: http://host:port/clover/javadoc/index.html...
Page 173 ServletContext com.cloveretl.server.api.ServerFacade serverFacade instance of serverFacade usable for calling CloverETL Server core features. String sessionToken sessionToken, needed for calling serverFacade methods...
Page 174: File Event Listeners
Chapter 23. Listeners File event listeners Since 1.3 File Event Listeners allow you to monitor changes on a specific file system path – for example, new files appearing in a folder – and react to such an event with a predefined task. You can either specify an exact path or use a wildcard, then set a checking interval in seconds, and finally, define a task to process the event.
Page 175: Observed File
Chapter 23. Listeners Observed File The observed file is specified by a directory path and a file name pattern. The user may specify just one exact file name, or a file name pattern for observing multiple matching files in the specified directory. If several changed files match the pattern, a separate event is triggered for each of these files. There are three ways to specify the file name pattern of the observed file(s) •...
Page 176: Check Interval, Task And Use Cases
CloverETL Server may detect it. File moving/renaming should be atomic operation. Event of this type does not occur when the file has been updated (change of timestamp or size) between two checks.
Page 177: Api
Chapter 24. API Simple HTTP API The Simple HTTP API is a basic Server automation tool that lets you control the Server from external applications using simple HTTP calls. Most operations are accessible using the HTTP GET method and return plain text. Thus, both “request” and “response”...
Page 178 Chapter 24. API Operation graph_run Call this operation to start execution of the specified job. The operation is called graph_run for backward compatibility, however it may execute ETL graph, jobflow or profiler job. parameters Table 24.1. Parameters of graph_run parameter name mandatory default description...
Page 179 Description is returned as plain text with a pipe as a separator, or as XML. A schema describing XML format of the XML response is accessible on CloverETL Server URL: http://[host]:[port]/clover/schemas/executions.xsd Depending on the waitForStatus parameter, it may return the result immediately or wait for a specified status.
Page 180 Chapter 24. API http://localhost:8080/clover/request_processor/graph_kill?runID=123456&returnType=DESCRIPTION Operation server_jobs parameters returns List of runIDs of currently running jobs. example http://localhost:8080/clover/request_processor/server_jobs Operation sandbox_list parameters returns List of all sandbox text IDs. In next versions will return only accessible ones. example http://localhost:8080/clover/request_processor/sandbox_list Operation sandbox_content parameters Table 24.4.
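The example URLs above all follow one pattern, which can be sketched as a small helper. The helper name and the `BASE` constant are hypothetical; the host, port, and `request_processor` path are taken from the examples on this page.

```python
from urllib.parse import urlencode

# Taken from the example URLs; adjust host, port and context path as needed.
BASE = "http://localhost:8080/clover/request_processor"


def api_url(operation, **params):
    """Build a Simple HTTP API URL for the given operation and parameters."""
    query = urlencode(params)
    return f"{BASE}/{operation}?{query}" if query else f"{BASE}/{operation}"


print(api_url("graph_kill", runID=123456, returnType="DESCRIPTION"))
print(api_url("server_jobs"))
print(api_url("sandbox_list"))
```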
Page 181 Chapter 24. API Table 24.5. Parameters of executions_history parameter name mandatory default description sandbox text ID of sandbox from Lower datetime limit of start of execution. The operation will return only records after (and equal) this datetime. Format: "yyyy-MM-dd HH:mm" (must be URL encoded). Upper datetime limit of start of execution.
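Producing the URL-encoded "yyyy-MM-dd HH:mm" value for the `from` parameter can be sketched like this. The function name is hypothetical; only the `sandbox` and `from` parameters named in the table are used, and the resulting query string would be appended to the executions_history operation URL.

```python
from datetime import datetime
from urllib.parse import quote, urlencode


def history_query(sandbox, start_from):
    """Build the query string for executions_history.

    start_from is formatted as "yyyy-MM-dd HH:mm" and then URL encoded
    (space becomes %20, colon becomes %3A).
    """
    stamp = start_from.strftime("%Y-%m-%d %H:%M")
    # "from" is a Python keyword, so the parameters are passed as a dict.
    return urlencode({"sandbox": sandbox, "from": stamp}, quote_via=quote)


print(history_query("default", datetime(2016, 1, 31, 14, 5)))
```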
Page 182 For returnType==DESCRIPTION_XML returns complex data structure describing one or more selected executions in XML format. A schema describing XML format of the XML response is accessible on CloverETL Server URL: http://[host]:[port]/clover/schemas/executions.xsd Operation suspend Suspends server or sandbox (if specified). Suspension means that no graphs may be executed on the suspended server/sandbox.
Page 183: Parameters Of Sandbox Create
Chapter 24. API Result message Operation sandbox_create This operation creates a specified sandbox. If it is sandbox of "partitioned" or "local" type, it also creates locations by "sandbox_add_location" operation. parameters Table 24.8. Parameters of sandbox create parameter name mandatory default description sandbox Text Id of sandbox to be created.
Page 184 Chapter 24. API parameters Table 24.10. Parameters of sandbox add location parameter name mandatory default description sandbox Removes specified location from its sandbox. location Location storage ID. If the specified location isn't attached to the specified sandbox, sandbox won't be changed. verbose MESSAGE MESSAGE | FULL - how verbose should possible error message be.
Page 185: Parameters Of Server Configuration Export
Chapter 24. API returns Result message example of request (with using curl CLI tool (http://curl.haxx.se/)) curl -u username:password -F "overwriteExisting=true" -F "zipFile=@/tmp/my-sandbox.zip" http://localhost:8080/clover/simpleHttpApi/upload_sandbox_zip Operation cluster_status This operation displays cluster's nodes list. parameters returns Cluster's nodes list. Operation export_server_config This operation exports a current server configuration in XML format. parameters Table 24.13.
Page 186: Parameters Of Server Configuration Import
Chapter 24. API wget http://localhost:8080/clover/simpleHttpApi/export_server_config Operation import_server_config This operation imports server configuration. parameters Table 24.14. Parameters of server configuration import parameter name mandatory default description xmlFile An XML file with server's configuration. dryRun true If true, a dry run is performed with no actual changes written.
Page 187: Jmx Mbean
Chapter 24. API JMX mBean The CloverETL Server JMX mBean is an API that you can use for monitoring the internal status of the Server. MBean is registered with the name: com.cloveretl.server.api.jmx:name=cloverServerJmxMBean JMX Configuration Application's JMX MBeans aren't accessible outside of JVM by default. Making the JMX Beans accessible requires some changes in the application server configuration.
Page 188 JMX server of JVM. Use admin/adminadmin as user/password. (admin/adminadmin are default glassfish values) How to Configure JMX on WebSphere WebSphere does not require any special configuration, but the clover MBean is registered with the name that depends on application server configuration: com.cloveretl.server.api.jmx:cell=[cellName],name=cloverServerJmxMBean,node=[nodeName], process=[instanceName]...
Page 189: Operations
Java version 1.6. The solution is quite easy, just set these two system properties: -Djava.rmi.server.hostname=[hostname address] -Djava.net.preferIPv4Stack=true Operations For details about operations please see the JavaDoc of the MBean interface: JMX API MBean JavaDoc is accessible in the running CloverETL Server instance on URL: http://[host]:[port]/[contextPath]/javadoc-jmx/index.html...
Page 190: Soap Webservice Api
Chapter 24. API SOAP WebService API The CloverETL Server SOAP Web Service is an advanced API that provides an automation alternative to the Simple HTTP API. While most of the HTTP API operations are available in the SOAP interface too (though not all of them), the SOAP API provides additional operations for manipulating sandboxes, monitoring, etc.
Page 191: Launch Services
The architecture of a Launch Service is layered. It follows the basic design of multi-tiered applications utilizing a web browser. Launch services let you build a user-friendly form that the user fills in and sends to the CloverETL Server for processing.
Page 192: Launch Services Section
Dictionary is a key-value temporary data interface between the running transformation and the caller. Usually, although not exclusively, Dictionary is used to pass parameters in and out of the executed transformation. For more information about Dictionary, read the “Dictionary” section in the CloverETL Designer User’s Guide. Passing Files to Launch Services If a Launch Service is designed to pass an input file to a graph or jobflow, the input dictionary entry has to be of type readable.channel.
Page 193: Overview Tab
Chapter 24. API Figure 24.5. Creating a new launch configuration Once you create the new Launch Service, you can set additional attributes like: 1. User and group access restrictions and additional configuration options (Edit Configuration) 2. Bind Launch Service parameters to Dictionary entries (Edit Parameters) Figure 24.6.
Page 194: Edit Configuration Tab
• Group - Restricts the configuration to a specific group of users. • User - Restricts the configuration to a specified user. • Sandbox - The CloverETL Sandbox where the configuration will be launched. • Job file - Selects the job to run.
Page 195: Edit Parameters Tab
Chapter 24. API Figure 24.8. Creating new parameter To add a new parameter binding, click on the “Add parameter” button. Every required graph/jobflow property defined by the job needs to be created here. Figure 24.9. Edit Parameters tab You can set the following fields for each property: •...
Page 196: Launch Services Authentication
(You can use a Launch Services test page, accessible from the login screen, to test drive Launch Services.) [Clover Context]/launch/[Configuration name]?[Parameters] • [Clover Context] is the URL to the context in which the CloverETL Server is running. Usually this is the full URL to the CloverETL Server (for example, for CloverETL Demo Server this would be http://server-demo.cloveretl.com:8080/clover).
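The launch URL pattern [Clover Context]/launch/[Configuration name]?[Parameters] can be assembled programmatically. The sketch below is illustrative only: the function name launch_url, the configuration name and the parameter names are hypothetical, and percent-encoding the configuration name is an assumption about how special characters should be handled.

```python
from urllib.parse import quote, urlencode


def launch_url(clover_context, configuration_name, parameters=None):
    """Build [Clover Context]/launch/[Configuration name]?[Parameters]."""
    # Percent-encode the configuration name so that spaces and other
    # special characters remain a single path segment (assumption).
    url = f"{clover_context.rstrip('/')}/launch/{quote(configuration_name, safe='')}"
    if parameters:
        url += "?" + urlencode(parameters)
    return url


if __name__ == "__main__":
    print(launch_url(
        "http://server-demo.cloveretl.com:8080/clover",
        "my config",                  # hypothetical configuration name
        {"param1": "a b"},            # hypothetical launch parameter
    ))
```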
Page 197 Launch requests are recorded in the log files in the directory specified by the launch.log.dir property in the CloverETL Server configuration. For each launch configuration, one log file named [Configuration name]#[Launch ID].log is created. For each launch request, this file will contain only one line with the following tab-delimited fields: (If the property launch.log.dir is not specified, log files are created in the temp directory...
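When post-processing these logs, the file name and the tab-delimited line can be split as sketched below. The helper names are hypothetical, and the meaning of the individual fields is deliberately left open because the field list is not reproduced on this page.

```python
def parse_launch_log_name(file_name):
    """Split '[Configuration name]#[Launch ID].log' into its two parts."""
    if not file_name.endswith(".log") or "#" not in file_name:
        raise ValueError(f"not a launch log file name: {file_name!r}")
    stem = file_name[: -len(".log")]
    # Split on the last '#' in case the configuration name itself contains one.
    configuration_name, launch_id = stem.rsplit("#", 1)
    return configuration_name, launch_id


def split_log_line(line):
    """Return the tab-delimited fields of one launch log line as a list."""
    return line.rstrip("\n").split("\t")


if __name__ == "__main__":
    print(parse_launch_log_name("myConfig#42.log"))   # hypothetical file name
    print(split_log_line("field1\tfield2\tfield3\n"))  # placeholder field values
```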
Page 198: Cloveretl Server Api Extensibility
Groovy Code API Since 3.3 The CloverETL Server Groovy Code API allows clients to execute Groovy code stored on the Server by an HTTP request. Executed code has access to the ServerFacade instance, the HTTP request and the HTTP response, so it's possible to implement a custom CloverETL Server API in the Groovy code.
Page 199: Embedded Osgi Framework
OSGi bundle). It can add a new API operation or even extend the Server Console UI. It is independent of the standard clover.war. CloverETL itself isn't based on OSGi technology; OSGi is used only optionally for extending server APIs. The OSGi framework is completely disabled by default and is enabled only when the property "plugins.path" is set as described below.
Page 200 OSGi plugin is a better choice, e.g. when a custom API has to use different libraries than the ones on the server classpath. Whereas Groovy uses the same classpath as CloverETL, the OSGi plugin has its own isolated classpath.
Page 201: Recommendations For Transformations Developers
Connections (JDBC/JMS) may require third-party libraries. We strongly recommend adding these libraries to the app-server classpath. CloverETL allows you to specify these libraries directly in a graph definition so that CloverETL can load these libraries dynamically. However, external libraries may cause a memory leak, resulting in "java.lang.OutOfMemoryError: PermGen space"...
Page 202: Extensibility - Cloveretl Engine Plugins
See details about the possibilities with CloverETL configuration in Part III, “Configuration” (p. 43) This property must be the absolute path to the directory or zip file with additional CloverETL engine plugins. Both the directory and zip must contain a subdirectory for each plugin. These plugins are not a substitute for plugins packed in a WAR file.
Page 203: Troubleshooting
Chapter 27. Troubleshooting Graph hangs and is un-killable Graph can sometimes hang and be un-killable if some network connection in it hangs. This can be improved by setting a shorter tcp-keepalive so that the connection times out earlier. The default value on Linux is 2 hours (7,200 seconds).
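Before changing the keepalive setting it can help to check the current value. The following sketch reads the Linux system-wide keepalive time from /proc/sys/net/ipv4/tcp_keepalive_time and compares it with the 7,200 second default mentioned above. The helper names are hypothetical, and the script only inspects the value; it does not change it.

```python
from pathlib import Path

LINUX_DEFAULT_KEEPALIVE_S = 7200  # 2 hours, the default stated above


def parse_keepalive(text):
    """Parse the content of tcp_keepalive_time (seconds as plain text)."""
    return int(text.strip())


def read_tcp_keepalive_time(path="/proc/sys/net/ipv4/tcp_keepalive_time"):
    """Read the system-wide TCP keepalive time in seconds (Linux only)."""
    return parse_keepalive(Path(path).read_text())


def is_default_or_longer(seconds):
    """True if the keepalive time has not been shortened below the default."""
    return seconds >= LINUX_DEFAULT_KEEPALIVE_S


if __name__ == "__main__":
    try:
        value = read_tcp_keepalive_time()
    except OSError:
        print("tcp_keepalive_time is not readable on this system")
    else:
        print(f"tcp_keepalive_time = {value} s")
        if is_default_or_longer(value):
            print("Consider a shorter value so hung connections time out earlier.")
```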
Page 204: Cluster
Part VI. Cluster...
Page 205: Clustering Features
CloverETL Server does not recognize any differences between cluster nodes. Thus, there are no "master" or "slave" nodes meaning all nodes can be virtually equal. There is no single point of failure (SPOF) in the CloverETL cluster itself, however SPOFs may be in the input data or some other external element.
Page 206: Transformation Requests
Basically, the more nodes we have in the cluster, the more transformation requests (or HTTP requests in general) we can process at one time. This type of scalability is the CloverETL server's ability to support a growing number of clients. This feature is closely related to the use of an HTTP load balancer which is mentioned in the previous section.
Page 207: Component Allocation Dialog
Component Allocation Allocation of a single component can be derived in several ways (the list is ordered according to priority): • Explicit definition - all components have a common attribute Allocation. CloverETL Designer allows the user to use a convenient dialog. Figure 28.3. Component allocation dialog Three different approaches are available for explicit allocation definition: •...
Page 208 Chapter 28. Clustering Features allocation is automatically derived from locations of the partitioned sandbox. So if one of these components works with a file in a partitioned sandbox, a suitable allocation is used automatically. • Adoption from neighbour components By default, allocation is inherited from neighbour components. Components on the left side have higher priority.
Page 209: Dialog Form For Creating New Shared Sandbox
As you can see in the screenshot above, you can specify the root path on the filesystem and you can use placeholders or an absolute path. Available placeholders are environment variables, system properties or the CloverETL Server config property intended for this use, sandboxes.home. The default path is set as [user.data.home]/CloverETL/sandboxes/[sandboxID] where the sandboxID is the ID specified by the user.
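Placeholder resolution in a sandbox root path can be pictured as a simple substitution. The sketch below is not the Server's implementation: it assumes the ${name} placeholder syntax that appears in the property defaults elsewhere in this manual, and the function name, paths and sandbox ID are hypothetical.

```python
import re


def expand_placeholders(template, values):
    """Replace ${name} placeholders with entries from the values dict.

    Unknown placeholders are left untouched so that the problem stays
    visible in the resulting path.
    """
    def replace(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return re.sub(r"\$\{([^}]+)\}", replace, template)


if __name__ == "__main__":
    values = {
        "user.data.home": "/home/clover",                       # hypothetical
        "sandboxes.home": "/home/clover/CloverETL/sandboxes",   # hypothetical
    }
    print(expand_placeholders("${sandboxes.home}/mySandbox", values))
    print(expand_placeholders("${user.data.home}/CloverETL/sandboxes/mySandbox", values))
```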
Page 210: Dialog Form For Creating New Local Sandbox
So each physical location will cause a single worker to run. This worker does not have to actually store any data to "its" location. It is just a way to tell the CloverETL Server: "execute this part of ETL graph in parallel on these nodes"...
Page 211 Chapter 28. Clustering Features CloverETL Server evaluates the sandbox URL on each worker and provides an open stream to a local resource to the component. The sandbox URL may be used on a standalone server as well. It is an excellent choice when a graph references some resources from different sandboxes.
Page 212: Graph Allocation Examples
Chapter 28. Clustering Features Graph Allocation Examples Basic component allocation This example shows a two-component graph, where allocation ensures that the first component will be executed on cluster node1 and the second component will be executed on cluster node2. Basic component allocation with remote data transfer Two components connected with an edge can have different allocations.
Page 213: Example Of Distributed Execution
Chapter 28. Clustering Features Example of Distributed Execution The following diagram shows a transformation graph used for parsing invoices generated by a few cell phone network providers in Czech Republic. The size of these input files may be up to a few gigabytes, so it is very beneficial to design the graph to work in the cluster environment.
Page 214 Chapter 28. Clustering Features The part of the graph demarcated by the four cluster components may have specified its allocation by the file URL attribute as well, but this part does not work with files at all, so there is no file URL. Thus, we will use the "node allocation"...
Page 215 Chapter 28. Clustering Features • does not contain any data and since the graph does not read or write to this sandbox, it is used only for the definition of "nodes allocation" • on the following figure, allocation is configured for two cluster nodes •...
Page 216: Scalability Of The Example Transformation
Chapter 28. Clustering Features Scalability of the Example Transformation The example transformation has been tested in the Amazon Cloud environment with the following conditions for all executions: • the same master node • the same input data: 1.2 GB of input data, 27 million records •...
Page 217 Chapter 28. Clustering Features nodes runtime 1 [s] runtime 2 [s] runtime 3 [s] average speedup factor runtime [s] 1.85 316.67 2.72 3.68 205.33 4.19 181.67 4.74 5.13 164.33 5.24...
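The columns of this table relate to each other as sketched below, assuming the speedup factor is the average single-node runtime divided by the average runtime on n nodes (the usual definition; the manual excerpt does not spell it out). All runtimes in the example are made-up illustration values, not the measured results.

```python
def average(runtimes_s):
    """Average of the measured runtimes in seconds."""
    return sum(runtimes_s) / len(runtimes_s)


def speedup_factor(baseline_avg_s, avg_runtime_s):
    """Speedup relative to the single-node baseline (assumed definition)."""
    return baseline_avg_s / avg_runtime_s


if __name__ == "__main__":
    # Hypothetical measurements, three runs per cluster size.
    single_node = [900.0, 910.0, 890.0]
    three_nodes = [300.0, 310.0, 290.0]

    baseline = average(single_node)
    avg = average(three_nodes)
    print(f"average runtime on 3 nodes: {avg:.2f} s")
    print(f"speedup factor: {speedup_factor(baseline, avg):.2f}")
```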
Page 218: Cluster Configuration
Chapter 29. Cluster Configuration Cluster can work properly only if each node is properly configured. Clustering must be enabled, nodeID must be unique on each node, all nodes must have access to shared DB (direct connection or proxied by another cluster node) and shared sandboxes, and all properties for inter-node cooperation must be set according to network environment.
Page 219: Mandatory Properties
String, URL http://localhost:8080/clover description: URL of the CloverETL cluster node. It must be HTTP/HTTPS URL to the root of a web application, thus typically it would be "http://[hostname]:[port]/clover". Primarily it's used for synchronous inter-node communication from other cluster nodes. It's recommended to use a fully qualified hostname or IP address, so it's accessible from client browser or CloverETL Designer.
Page 220: Optional Properties
Chapter 29. Cluster Configuration Optional Properties Table 29.3. Optional properties - these properties aren't vital for cluster configuration - default values are sufficient property type default description cluster.jgroups.external_address String, IP address IP address of the cluster node. Configure this only if the cluster nodes are on the different sub-nets, so IP address of the network interface isn't...
Page 221 Must be the same on all cluster nodes. It's a protection against fake messages. sandboxes.home.partitioned String ${user.data.home}/CloverETL/sandboxes-partitioned This property is intended to be used as placeholder in the location path of partitioned sandboxes. So the sandbox path is specified with the...
Page 222 String local Change this property to "remote" if the node doesn't have a direct connection to the CloverETL Server database, so it has to use some other cluster node as a proxy to handle persistent operations. In such a case, the property "cluster.datasource.delegate.nodeIds" must also be properly configured.
Page 223 At least one of the listed node IDs must be running, otherwise this node will fail. All listed node IDs must have a direct connection to CloverETL Server database properly configured. Property "cluster.datasource.delegate.nodeIds" is ignored by default. Property "cluster.datasource.type" must be set to "remote"...
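The rules for proxied database access can be expressed as a small consistency check. This is a sketch under assumptions: the delegate node IDs are taken to be a comma-separated list, the set of running nodes is supplied by the caller, and the function name is hypothetical.

```python
def check_remote_datasource(props, running_node_ids):
    """Return a list of problems with the proxied-database settings.

    props            -- dict of configuration properties of one node
    running_node_ids -- set of node IDs known to be running
    """
    # "local" is the default; the delegate list is ignored in that case.
    if props.get("cluster.datasource.type", "local") != "remote":
        return []

    raw = props.get("cluster.datasource.delegate.nodeIds", "")
    # Assumption: multiple delegate node IDs are separated by commas.
    delegates = [node.strip() for node in raw.split(",") if node.strip()]

    problems = []
    if not delegates:
        problems.append("cluster.datasource.delegate.nodeIds must list at least one node")
    elif not any(node in running_node_ids for node in delegates):
        problems.append("at least one of the listed delegate nodes must be running")
    return problems


if __name__ == "__main__":
    node_props = {
        "cluster.datasource.type": "remote",
        "cluster.datasource.delegate.nodeIds": "node01",
    }
    print(check_remote_datasource(node_props, running_node_ids={"node01"}))
```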
Page 224: Example Of 2 Node Cluster Configuration
Example of 2 Node Cluster Configuration This section contains examples of CloverETL cluster nodes configuration. We assume that the user "clover" is running the JVM process and the license will be uploaded manually in the web GUI. In addition it is necessary to configure: •...
Page 225: 2-Nodes Cluster With Proxied Access To Database
Chapter 29. Cluster Configuration jdbc.password=clover cluster.enabled=true cluster.node.id=node02 cluster.http.url=http://192.168.1.132:8080/clover cluster.jgroups.bind_address=192.168.1.132 cluster.group.name=TheCloverCluster1 If you use Apache Tomcat, the configuration is placed in the $CATALINA_HOME/webapps/clover/WEB-INF/config.properties file. The location and file name on other application servers may differ. 2-nodes Cluster with Proxied Access to Database This cluster configuration is similar to the previous one, but only one node has direct access to the database.
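The basic requirements on a set of node configurations (clustering enabled, unique node IDs, one shared group name) can be verified before deployment. The sketch below is a hypothetical helper, not a CloverETL tool. The node02 values are taken from the example above; the node01 values are assumed for illustration.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_cluster_nodes(node_configs):
    """Return a list of problems found across the node configurations."""
    problems = []
    ids = [cfg.get("cluster.node.id") for cfg in node_configs]
    if len(set(ids)) != len(ids):
        problems.append("cluster.node.id must be unique on each node")
    if any(cfg.get("cluster.enabled") != "true" for cfg in node_configs):
        problems.append("cluster.enabled must be true on every node")
    if len({cfg.get("cluster.group.name") for cfg in node_configs}) > 1:
        problems.append("cluster.group.name must match on all nodes of one cluster")
    return problems


NODE01 = """
# assumed values for the first node (illustration only)
cluster.enabled=true
cluster.node.id=node01
cluster.http.url=http://192.168.1.131:8080/clover
cluster.group.name=TheCloverCluster1
"""

NODE02 = """
cluster.enabled=true
cluster.node.id=node02
cluster.http.url=http://192.168.1.132:8080/clover
cluster.group.name=TheCloverCluster1
"""

if __name__ == "__main__":
    configs = [parse_properties(NODE01), parse_properties(NODE02)]
    print(check_cluster_nodes(configs) or "configuration looks consistent")
```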
Page 226: 2-Nodes Cluster With Load Balancer
These two lines describe access to the database via another node. 2-nodes cluster with load balancer If you use any external load balancer, the configuration of CloverETL Cluster will be the same as in the first example. Figure 29.3. Configuration of 2-nodes cluster, one node without direct access to database The cluster.http.url and cluster.jgroups.bind_address are URLs of particular cluster nodes...
Page 227: Jobs Load Balancing Properties
Chapter 29. Cluster Configuration Jobs Load Balancing Properties Multiplicators of load balancing criteria. The load balancer decides which cluster node executes a graph. This means that any node may process a request for execution, but the graph may be executed on the same or on a different node according to the current load of the nodes and according to these multiplicators.
Page 228: Running More Clusters
Chapter 29. Cluster Configuration Running More Clusters If you run more clusters, each cluster has to have its own unique name. If the name is not unique, the cluster nodes of different clusters may consider foreign cluster nodes as part of the same cluster. The cluster name is configured using cluster.group.name option.
Page 229: Cluster Reliability In Unreliable Network Environment
Cluster Reliability in Unreliable Network Environment CloverETL Server instances must cooperate with each other to form a cluster together. If the connection between nodes doesn't work at all, or if it's not configured, the cluster can't work properly. This chapter describes the behavior of cluster nodes in an environment where the connection between nodes is somehow unreliable.
Page 230: Nodeb Is Killed Or It Cannot Connect To The Database
Chapter 29. Cluster Configuration the event from NodeB. Also heart-beat is vital for meaningful load-balancing. The same check-task mentioned above also checks heart-beat from all cluster nodes. Time-line describing the scenario: • 0s network connection between NodeA and NodeB is down •...
Page 231 Chapter 29. Cluster Configuration • Since the network is down, also heart-beat can't be delivered and maybe HTTP connections can't be established, the cluster reacts as described in the sections above. Even though the nodes may be suspended, parent job A keeps waiting for the event from job B •...
Page 232: Recommendations For Cluster Deployment
Chapter 30. Recommendations for Cluster Deployment 1. All nodes in the cluster should have a synchronized system date-time. 2. All nodes share sandboxes stored on a shared or replicated filesystem. The filesystem shared among all nodes is a single point of failure. Thus, the use of a replicated filesystem is strongly recommended. 3.
Page 233: Multiple Cloverserver Instances On The Same Host
Chapter 31. Multiple CloverServer Instances on the same Host Running multiple CloverETL Server instances on the same host is not recommended. If you do so, you should ensure that the instances do not interfere with each other. • Each instance must run in a separate application server.
Page 234: List Of Figures
List of Figures 3.1. Adjusting Maximum heap size limit ..................18 3.2. Login page of CloverETL Server without license ................ 30 3.3. Add new license form ......................31 3.4. Update license form ......................32 3.5. Clover Server as the only running application on IBM WebSphere ..........36 12.1.
Page 235 23.9. Web GUI - "File event listeners" section ................169 24.1. Glassfish JMX connector ....................183 24.2. WebSphere configuration ....................184 24.3. Launch Services and CloverETL Server as web application back-end .......... 186 24.4. Launch Services section ...................... 187 24.5. Creating a new launch configuration ..................188 24.6.
Page 236: List Of Tables
List of Tables 1.1. CloverETL Server and CloverETL Engine comparison ..............3 2.1. Hardware requirements of CloverETL Server ................5 2.2. CloverETL Server Compatibility Matrix ..................6 9.1. General configuration ......................69 9.2. Defaults for job execution configuration - see Job Config Properties for details .........
Page 237 CloverETL Server 29.3. Optional properties - these properties aren't vital for cluster configuration - default values are sufficient ..........................215 29.4. Load balancing properties ....................222...
Page 238 List of Examples 17.1. Example of simple configuration defining one new server user........... 124...

This manual is also suitable for:

Cloveretl 4.0 Cloveretl 4.3 Cloveretl 4.1 Cloveretl 4.2
