Wednesday, January 25, 2023

It's a Data Driven World

Ever heard the phrases Data Driven, Data Centric, or Data Fabric? Maybe you've heard British mathematician Clive Humby's phrase, “Data is the new oil”? Yes, we live in a world where data is the most important commodity, and on IBM i we have plenty. Our issue is that data isn't staying on IBM i; rather, it's going elsewhere - off platform for reporting, analytics, machine learning, or some other purpose. I call all this copying, duplicating, extracting, and thrashing of data simply Data Motioning.

Does any of this Data Motioning sound familiar?

  • move, copy, duplicate, extract, subset, filter, etc. your data from your IBM i,
  • optionally clean it by changing data types, substringing or concatenating fields, performing math calculations, etc.,
  • optionally join with other tables, or perhaps with data on other IBM i LPARs or non-IBM i servers,
  • optionally process journal entries for change data capture operations possibly from remote journals,
  • optionally enhance with data from external services, 
  • optionally transform to another data format such as JSON or CSV, and
  • load to another library or table, or to another destination such as the IFS, or to Microsoft SQL Server or Oracle, or to a cloud database? 
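The steps above can be sketched in a few lines. This is a minimal, hypothetical illustration using SQLite as a stand-in source (on IBM i the source would be Db2 for i, and the table and column names here are invented), showing extract, clean, transform, and load to two formats:

```python
import csv
import io
import json
import sqlite3

# Stand-in source database; on IBM i this would be Db2 for i.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (ordno INT, cust TEXT, amt REAL, orddate TEXT)")
src.executemany("INSERT INTO orders VALUES (?,?,?,?)",
                [(1, "ACME", 100.50, "20230124"),
                 (2, "GLOBEX", 75.25, "20230125")])

# Extract + clean: filter rows and reformat the character date in one pass.
rows = src.execute("""
    SELECT ordno, cust, amt,
           substr(orddate,1,4) || '-' || substr(orddate,5,2) || '-' || substr(orddate,7,2)
    FROM orders
    WHERE amt > 80
""").fetchall()

# Transform + load: emit JSON for one destination, CSV for another.
records = [dict(zip(("ordno", "cust", "amt", "orddate"), r)) for r in rows]
as_json = json.dumps(records)
buf = io.StringIO()
csv.writer(buf).writerows(rows)
as_csv = buf.getvalue()
```

Even this toy version shows why hand-written code balloons: every new table repeats the same extract, clean, and format logic.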

If Only All Data Motioning Were Easy!?

You've probably written code to perform these tasks. Writing code may be okay for a couple of tables, but it quickly gets out of hand when there are hundreds or thousands of tables. The development and maintenance investments can be larger than expected, and complexity can increase quickly, consuming valuable development time and resources. Let's take a closer look at some Data Motioning scenarios.

Motioning data from one table to another on a different IBM i system while transforming data

The simple work of changing a character date to an actual date data type, splitting apart a field, or formatting a monetary column with decimal and currency symbols can be routine. The work becomes harder when it's outside of our comfort zone or is complex. For example, it takes a more involved set of instructions to get a character date into a Zulu (UTC) datetime format such as "2023-01-24T13:45:00Z", or an ISO 8601 offset format such as "2023-01-24T13:45:00-04:00".
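As a sketch of that conversion, here is one way to turn a character date and time into a Zulu timestamp. The fixed -04:00 offset is an assumption for illustration (production code would handle the source's actual time zone and daylight saving time):

```python
from datetime import datetime, timedelta, timezone

def to_zulu(char_date: str, char_time: str) -> str:
    """Convert a character date ('20230124') and time ('134500'),
    assumed to carry a fixed -04:00 offset, into a Zulu (UTC) timestamp."""
    local = datetime.strptime(char_date + char_time, "%Y%m%d%H%M%S")
    local = local.replace(tzinfo=timezone(timedelta(hours=-4)))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, `to_zulu("20230124", "134500")` shifts 13:45 at -04:00 forward to 17:45 UTC.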


Motioning data from multiple tables on one IBM i to multiple destinations

Motioning data begins to get harder when multiple tables must be joined or aggregated, cleaned, transformed, and motioned to multiple destinations. This can be especially complex when some or all of the destinations are non-IBM i systems. It becomes complicated when each destination is only to receive subsets of the data and in different formats. And it's even more complicated when some destinations receive summarized data and some detail.
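A sketch of that fan-out logic makes the complexity concrete. The destination names, filters, and the summary/detail split below are all hypothetical:

```python
# Source rows already extracted and cleaned (invented sample data).
rows = [
    {"region": "EAST", "item": "A", "qty": 5, "amt": 50.0},
    {"region": "EAST", "item": "B", "qty": 2, "amt": 20.0},
    {"region": "WEST", "item": "A", "qty": 7, "amt": 70.0},
]

# Each destination declares which subset it gets and whether it
# receives a summary or the detail rows.
destinations = {
    "bi_server": {"filter": lambda r: r["region"] == "EAST", "summary": True},
    "cloud_db":  {"filter": lambda r: True,                  "summary": False},
}

feeds = {}
for name, spec in destinations.items():
    subset = [r for r in rows if spec["filter"](r)]
    if spec["summary"]:
        feeds[name] = {"rows": len(subset),
                       "total_amt": sum(r["amt"] for r in subset)}
    else:
        feeds[name] = subset
```

Every new destination adds another filter, format, and summary rule to maintain - which is exactly where hand-rolled code starts to sprawl.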

 

Motioning journaled data from one or more tables on one IBM i to multiple destinations

One advantage to journal processing is near real-time data capture (change data capture) and the business may need this type of processing to feed Business Intelligence systems. If you have data on multiple systems, remote journals provide a way to capture that remote data without having to copy data to a centralized location.
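Conceptually, change data capture replays journal entries against a target. The sketch below uses entry type codes loosely modeled on Db2 for i journaling ('PT' = record added, 'UP' = after-image of an update, 'DL' = record deleted); the entry structure itself is simplified and hypothetical:

```python
# Simplified journal entries (invented sample data).
entries = [
    {"code": "PT", "key": 1, "row": {"cust": "ACME", "bal": 100}},
    {"code": "UP", "key": 1, "row": {"cust": "ACME", "bal": 250}},
    {"code": "PT", "key": 2, "row": {"cust": "GLOBEX", "bal": 75}},
    {"code": "DL", "key": 2, "row": None},
]

# Replay the entries to build the current state of the replica.
replica = {}
for e in entries:
    if e["code"] in ("PT", "UP"):
        replica[e["key"]] = e["row"]
    elif e["code"] == "DL":
        replica.pop(e["key"], None)
```

Real journal entries arrive as raw record images, so production CDC code must also decompose each entry into the table's column layout before applying it.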

Motioning data from multiple systems, some or all of them non-IBM i, to multiple destinations, some or all of which may also be non-IBM i systems

Consider a scenario where data resides in multiple IBM i LPARs, a Microsoft SQL Server database, and is to be enhanced with data from external services providing weather and geospatial information. Not an easy task, is it?

 

Motioning to Open-source Applications

Many IT environments are beginning to use Apache's open-source technologies like Kafka and Camel to address heterogeneous data motioning. Without knowledge of these open-source technologies, this scenario can be complicated, leading to a long development cycle.


 

Motioning to a Data Warehouse

Building the dimensional model required by a Data Warehouse is enormously complex and challenging. Be aware that creating a Data Warehouse without fully understanding basic concepts such as slowly changing dimensions can have the undesired consequence of a failed implementation.
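To illustrate why slowly changing dimensions matter, here is a minimal sketch of Type 2 handling, where a change to a tracked attribute closes the current dimension row and inserts a new version (the function and field names are invented for illustration):

```python
from datetime import date

def apply_scd2(dim, key, attrs, today):
    """Type 2 slowly changing dimension: keep history by versioning rows."""
    current = next((r for r in dim if r["key"] == key and r["end"] is None), None)
    if current and current["attrs"] == attrs:
        return  # nothing changed; keep the current version
    if current:
        current["end"] = today  # close out the old version
    dim.append({"key": key, "attrs": attrs, "start": today, "end": None})

dim = []
apply_scd2(dim, "CUST1", {"city": "Austin"}, date(2023, 1, 1))
apply_scd2(dim, "CUST1", {"city": "Dallas"}, date(2023, 1, 25))
```

After the second call the dimension holds two rows for CUST1: the Austin version with an end date, and the Dallas version still open. Skipping this versioning silently overwrites history, which is one common way warehouse projects fail.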


Dude, Where's My Data Going? 

It's becoming more common for data to be spread everywhere, anywhere, and even out of your control. Often data is extracted without internal knowledge of the IBM i database specifics (character dates, multi-member files, etc.), resulting in complaints about the quality of the database. Or data is extracted by an SQL developer who doesn't understand IBM i best practices for processing data, resulting in unfounded accusations of a slow and archaic IBM i.

The business needs better stewards of data to ensure the right users get the right data at the right time. I suggest that it's IBM i experts who are those stewards. So how can we be successful and keep control of our data - and maybe our sanity?

Quite simply, we need a cost-friendly, effective, and reliable option for

  • An automated and repeatable process
  • Optionally cleaning the data
  • Optionally transforming the data
  • Motioning data from somewhere to somewhere

Allow Me to Introduce DataMigrator!

DataMigrator is an ETL (Extract, Transform, and Load) application that is part of the IBM Web Query family of products. DataMigrator offers capabilities to schedule processes that extract, consolidate, aggregate, clean, transform, load, and motion data across multiple platforms, not just IBM i. Near real-time capture of data is possible with DataMigrator's built-in journaling and remote journaling support. Building dimensional models is also easy with DataMigrator's built-in support for these types of databases.

DataMigrator is a useful but little-known product for IBM i environments. I think of DataMigrator as an unsung hero of IBM Db2 for i. DataMigrator could save your day, budget, and possibly your well-being. 

Still Not Convinced?

Extract, Transform, and Load Tool (ETL)
  • Create data marts and operational data stores
  • Built-in dimensional modeling for data warehouses
  • Processes journals, including remote journals; DataMigrator natively decomposes each journal entry into the table's format, so no more manual work
  • Can be your ETL pipeline or participate in your existing ones
Low Code
  • GUI
  • Drag and Drop
  • PC Design Client

Built-in functions and formulas

  • Formulas for math operations, statistical functions, concatenation, subsetting, type conversion, data formatting, and more.
  • SQL Pass-Through for direct access to SQL
Integrated with and runs on IBM i
  • DataMigrator jobs run natively on IBM i (7.3+)

SQL based – set-at-a-time vs. row-at-a-time processing

  • All the performance and benefits of SQL without writing any SQL code

Security

  • Honors all IBM i security including object security and RCAC.

Low Cost

  • Compared to other ETL solutions, very cost effective
  • Can pay for itself within months

And much, much more!


Interested in a demo or more information? Please reach out to me, JWestcott@ibm.com or QU2@us.ibm.com.