Helpful Tips For IBM Sterling Integrator’s Data Sweeper Utility
By: Dan Beach | September 13th, 2022
One of the most overlooked yet crucial facets of keeping your IBM B2B Integrator and File Gateway systems healthy is housekeeping.
Housekeeping is a set of tasks that work silently in the background, and if you never receive an email you are led to believe everything is working perfectly. However, some aspects of housekeeping are not automatic, and while they require less frequent attention, they are still important to the efficiency of these applications. The first involves running the Data Sweeper utility. Under normal conditions you should run it daily, especially in higher-volume environments. A Data Sweeper service is available in a business process that can be executed on a schedule.
Data Sweeper can be run within a business process or from the command line and there are compelling reasons for each option. On a regular basis, the Data Sweeper should run with default settings via schedule management within IBM Sterling B2B Integrator.
So first, let’s discuss running it as a scheduled business process.
Data Sweeper as a Business Process
By default, the schedule for this out-of-the-box business process executes the Data Sweeper service once a week, but you must enable the schedule for it to run. In lower environments such as development, this frequency may be sufficient. Enable the schedule to run at a time of day when the environment is relatively idle. For UAT, pre-production, performance, production, and certification environments, set it to run daily when traffic is not heavy.
Within the business process itself, you can set the batch size. This involves balancing how long the Data Sweeper takes to run against how much cleanup each run completes. Increase the batch size if needed, but remember that you can also run the Data Sweeper process more frequently. Balance frequency with batch size for best results.
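To make the trade-off between batch size and frequency concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that each scheduled run clears at most one batch of rows; `runs_needed` is a hypothetical helper, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many scheduled runs are needed to work
# through a backlog, ASSUMING each run clears at most BATCH_SIZE rows.

# runs_needed BACKLOG BATCH_SIZE -> prints ceiling(BACKLOG / BATCH_SIZE)
runs_needed() {
    backlog="$1"
    batch="$2"
    if [ "$batch" -le 0 ]; then
        echo "batch size must be positive" >&2
        return 2
    fi
    # Integer ceiling division
    echo $(( (backlog + batch - 1) / batch ))
}
```

With a made-up backlog of 1,000,000 rows and a batch size of 50,000, `runs_needed 1000000 50000` prints 20. On a daily schedule that is roughly 20 days of catch-up, while an hourly schedule would finish in under a day.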
Data Sweeper from the Command Line
One of the most important uses of Data Sweeper involves GUID tables. This cleanup should be done weekly through cron or another scheduling utility, preferably on a non-token node of a B2B Integrator cluster. GUIDs are generated unique IDs for documents, communication sessions, mailboxes, mailbox messages, and so forth. Over time they accumulate, and each GUID is stored in a table associated with the kind of object it was generated for. For GUIDs that are no longer in service, nothing runs on a regular schedule to remove them. This is where Data Sweeper executed from the command line comes in.
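For the weekly cron run, a small logging wrapper makes it easier to confirm afterwards that each sweep actually ran. The sketch below is one possible shape, not a vendor-supplied script: `log_line` and `run_logged` are hypothetical helper names, and the script path, log path, and schedule in the crontab comment are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper helpers for a weekly cron job.
# Example crontab entry (Sundays at 02:00); both paths are placeholders:
#   0 2 * * 0 /path/to/guid_sweep.sh >> /path/to/guid_sweep.log 2>&1

# log_line MESSAGE: print MESSAGE prefixed with a UTC timestamp
log_line() {
    printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1"
}

# run_logged LABEL COMMAND [ARGS...]
# Runs COMMAND, logging start, success, or failure with its exit status.
run_logged() {
    label="$1"
    shift
    log_line "start $label"
    if "$@"; then
        log_line "ok $label"
    else
        status=$?
        log_line "FAILED $label (exit $status)"
        return "$status"
    fi
}
```

From the bin directory under the install directory, a call such as `run_logged ACT_SESSION_GUID ./dataSweeper.sh -unassociatedRowSweeper.ACT_SESSION_GUID` would then wrap one of the commands described in this post with timestamped log lines.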
GUID tables can be periodically cleared in B2Bi by running Data Sweeper from the command line on one node. Below are the tables affected by Data Sweeper followed by the commands to run to clean these tables appropriately:
- ACT_SESSION_GUID – tied to communication sessions
- DATA_FLOW_GUID – business process-related activity
- DMI_FACT_GUID – DMI visibility (mostly tied to routing activities in FG)
- MBX_MAILBOX_GUID – Mailboxes
- MBX_MESSAGE_GUID – Mailbox messages
To safely gather statistics on the one-time entries accumulated in these tables, without making any changes, run the following commands from the bin directory under B2Bi’s install directory:
- ./dataSweeper.sh -unassociatedRowSweeper.ACT_SESSION_GUID
- ./dataSweeper.sh -unassociatedRowSweeper.DATA_FLOW_GUID
- ./dataSweeper.sh -unassociatedRowSweeper.DMI_FACT_GUID
- ./dataSweeper.sh -unassociatedRowSweeper.MBX_MAILBOX_GUID
- ./dataSweeper.sh -unassociatedRowSweeper.MBX_MESSAGE_GUID
To delete rows, run the same commands with the -autoCorrect flag:
- ./dataSweeper.sh -unassociatedRowSweeper.ACT_SESSION_GUID -autoCorrect
- ./dataSweeper.sh -unassociatedRowSweeper.DATA_FLOW_GUID -autoCorrect
- ./dataSweeper.sh -unassociatedRowSweeper.DMI_FACT_GUID -autoCorrect
- ./dataSweeper.sh -unassociatedRowSweeper.MBX_MAILBOX_GUID -autoCorrect
- ./dataSweeper.sh -unassociatedRowSweeper.MBX_MESSAGE_GUID -autoCorrect
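Since the same command is repeated for each of the five tables, a loop can cut down on copy-and-paste mistakes. The following is a minimal sketch that assumes it is run from the bin directory under the install directory; the `SWEEPER` variable, the `sweep_guid_tables` function, and the `report`/`delete` mode names are introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: run the unassociatedRowSweeper against each GUID table listed above.
# SWEEPER, GUID_TABLES, and sweep_guid_tables are hypothetical names.

# Path to the utility; overridable so the loop can be exercised with a stub.
SWEEPER="${SWEEPER:-./dataSweeper.sh}"

GUID_TABLES="ACT_SESSION_GUID DATA_FLOW_GUID DMI_FACT_GUID MBX_MAILBOX_GUID MBX_MESSAGE_GUID"

# sweep_guid_tables MODE
#   report -> gather statistics only (no -autoCorrect)
#   delete -> add -autoCorrect so rows are deleted and committed
sweep_guid_tables() {
    mode="$1"
    case "$mode" in
        report) extra="" ;;
        delete) extra="-autoCorrect" ;;
        *) echo "usage: sweep_guid_tables report|delete" >&2; return 2 ;;
    esac
    for table in $GUID_TABLES; do
        # $extra is deliberately unquoted: when empty it adds no argument
        "$SWEEPER" "-unassociatedRowSweeper.$table" $extra || return 1
    done
}
```

Running `sweep_guid_tables report` first and reviewing its output before running `sweep_guid_tables delete` keeps the statistics-then-delete order described above. The loop stops at the first table whose command fails.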
Additional arguments to the unassociatedRowSweeper command:
Option | Default Value | Notes |
-commitSize | 5000 | Maximum number of rows in each DB transaction. Runs until table is cleaned up. |
-batchSize | Not Set | Maximum number of rows per execution; runs only once. If set, batchSize needs to be smaller than commitSize to take effect. |
-autoCorrect | Not Set | Only if autoCorrect is set will rows be deleted and committed to DB. |
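To see how these options might combine on one command line, here is a small sketch that only prints the command rather than running it. It assumes, without confirmation from the table above, that each value follows its flag separated by a space (for example `-commitSize 5000`); check the product documentation for the exact syntax before relying on it. `build_sweeper_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: assemble (but do not run) an unassociatedRowSweeper command line.
# ASSUMPTION: option values are passed as "-flag value"; verify against the docs.

# build_sweeper_cmd TABLE [COMMIT_SIZE] [BATCH_SIZE] [yes|no]
build_sweeper_cmd() {
    table="$1"
    commit="${2:-}"
    batch="${3:-}"
    auto="${4:-no}"
    if [ -z "$table" ]; then
        echo "table name required" >&2
        return 2
    fi
    # Per the table's note, batchSize should be smaller than commitSize
    # (default 5000) to take effect; warn on stderr otherwise.
    effective_commit="${commit:-5000}"
    if [ -n "$batch" ] && [ "$batch" -ge "$effective_commit" ]; then
        echo "warning: batchSize $batch is not smaller than commitSize $effective_commit" >&2
    fi
    cmd="./dataSweeper.sh -unassociatedRowSweeper.$table"
    if [ -n "$commit" ]; then cmd="$cmd -commitSize $commit"; fi
    if [ -n "$batch" ]; then cmd="$cmd -batchSize $batch"; fi
    if [ "$auto" = "yes" ]; then cmd="$cmd -autoCorrect"; fi
    echo "$cmd"
}
```

Printing the command first gives you a chance to review it, and to leave -autoCorrect off until the statistics look right.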
To learn more about this process (B2B Integrator 6.1.1 link), check out the link here.
To discover how our experts can help you overcome any of your IBM Sterling integrator challenges, schedule a consultation today.