Data Transformation: Take Back Control Over Your Disparate, Siloed Information
Overview: The data transfer module can be used to move data from any source to any destination. The source is typically a delimited text file or database, and the destination is typically a standard module; however, the system supports much more.
Getting Started with Data Transfer
When creating a new data transfer, you begin by configuring the following settings:
System Title: Give the transfer a descriptive system title.
Notification Options: Set up notifications; these are commonly used to send an email when a scheduled transfer completes.
Execute Attribute: This should be set to Never while designing your transfer, but can be used to execute a transfer immediately or on a schedule. When Now is selected, the transfer will be executed immediately after the transfer record is saved.
File Upload Feature
The General tab contains a File Upload attribute which can be used to upload source files, zip files, etc.
- If you upload a recognized source file, like an Access database, the system will automatically select your source and initialize the attribute accordingly
- Uploading a .ZIP file triggers the system to extract that file into the site's /Uploads/DataTransfer folder, in a sub-folder with the same name as the .ZIP file
- All uploads are placed in the /Uploads/DataTransfer folder
⚠️ Important: Files in the /Uploads/DataTransfer folder are accessible from the web site via an unpublished URL.
Once you've set up the initial tab, proceed to the Source & Destination tab where you determine where the data comes from and where it goes.
Source & Destination Configuration
Selecting a Source
Selecting a source provides you with the attributes required to read the data in that source. The attributes for each source appear when it is selected and typically contain header text that describes their use. The system also validates sources and destinations, so you will be prompted if you don't provide enough information.
In addition to the predefined sources, we have added a Custom Provider option. This can be used to write code to retrieve data from any source by implementing the TransferReader interface.
Selecting a Destination
After filling in the source information, you can make a selection in the destination attribute. This selection also displays different attributes depending upon your destination, and includes a Custom Provider option so that data can be transferred into any system.
Standard Module Destination: Selecting the Standard Module destination provides you with the options necessary to transfer data into a standard module. The majority of the attributes displayed are described adequately within the module.
Bulk Method: Fast, but not very forgiving. No data will be transferred if the data doesn't meet the requirements of the database, and the error messages are typically difficult to decipher.
Incremental Method: Much slower, but supports partial document updates and has much better error handling. In some cases, it will even transfer some records and provide a list of the data that could not be transferred.
Status Interval: This attribute defines how often to report on the progress of the transfer to the process log.
Monitoring Transfers: All transfers record their progress permanently in the Development > Process Log module and can be monitored in real time from the Development > Process Monitor module.
Field Mapping
The Fields tab is used to map source columns to destination columns once the source and destination have been defined and verified. You must select the mapping type for each destination column:
Column: When the mapping type is Column, you select the source column. The data transfer engine will then know to simply move the data from the source column to the destination column.
Text: The Text mapping type allows you to transfer the same text value into every row of the destination column.
Default and Null: The Default and Null mapping types work similarly: neither transfers source data. Null writes a null value, while Default writes the destination column's default value, which is determined by the destination provider.
Scripting: The Scripting mapping type allows you to enter one or more of the data transfer scripts defined below. There are many scripts that generate dates or guids, retrieve and manipulate column data, and even translate text into unique identifiers.
The Audiences, Taxonomy, Roles, Module, and Keywords scripts can be used to translate the data in a column into the respective GUIDs the system uses to relate these records. They can also be used to validate the data in that column, ensuring it is in the required format and exists in the related module.
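For instance, here is a hedged sketch based on the Audiences script documented below (the source column name 'DEPARTMENT' is hypothetical); it replaces each audience system name found in that column with the corresponding AudienceId:
[cp:literal:scripting key='Audiences' name='DEPARTMENT' valueColumn='SystemName' /]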
Types of Data Transformation Supported
Structured Data Sources Supported
SharePoint
Required Fields: Source Site URL, Title, List Title, Username, Password
Oracle
Required Fields: Connection String/table name, any query or filter (credentials)
SQL
Required Fields: Connection String/table name, any query or filter (credentials)
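For example, a typical SQL Server connection string takes the following form (the server, database, and credential values are placeholders):
Server=MY_SERVER;Database=MY_DB;User Id=MY_USER;Password=MY_PASSWORD;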
Office 365 (OneDrive)
Required Fields: Authorization, Source Folder (Office365 file Selector), Source Recursive (y/n), Source End Point (bulk or changes)
OLEDB
Required Fields: Connection String/table name, any query or filter (credentials)
ODBC
Required Fields: Connection String/table name, any query or filter (credentials)
XML - ReadXML
Required Fields: XML Source (path to xml), Available tables and columns, Source Table Index
XML - CpCollection
Required Fields: Source XML (path to xml), Default File, Source Directory, Number of Files to select, XSLT
Delimited TXT (OLEDB)
Required Fields: Source Directory, Source Delimiter used, Source Text Qualifier, Source Header Row, Source Select Command
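As a hedged illustration, a Source Select Command for the OLEDB text driver typically treats the file itself as the table (the file name products.csv is hypothetical):
SELECT * FROM [products.csv]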
Delimited TXT (Stream)
Required Fields: Source Directory, Source Delimiter used, Source Text Qualifier, Source Header Row, Source Select Command
Active Directory
Purpose: Typically used to build employee directories
Required Fields: Directory Path, Filter
Excel
Required Fields: Source File (path to Excel file, if not uploaded into Centralpoint), Source Header Row, Source Select Command (if any)
RSS
Required Fields: Source Feed URL (Web path to RSS Feed), Feed type (atom or rss)
Access
Required Fields: Source File (Path), Source Select Command (query)
Office 365
Required Fields: Source Authorization (Authorize), Source Folder, Source is Recursive, Source End Point
Custom Provider
Purpose: Typically used when ingesting from a Web API or web services
Required Fields: Source Type, Source Parameters
Note: The custom provider covers virtually any configuration in which security credentials or custom methods must be passed in order to authorize or encrypt
Centralpoint (Modules, Audiences, Taxonomy, Roles)
Purpose: This is used when transferring data from Centralpoint to Centralpoint or to an outside system
Unstructured Data Sources
File Path (Spider)
Purpose: Spider network drives or web-based drives (OneDrive)
Required Fields: File path (and/or credentials), Source Directory, Source Pattern, Source Options
When to Consider Indexing vs. Ingesting Data
Centralpoint supports both indexing and ingesting data. In some cases you will want a federated search across all records while preserving where certain records currently live; in other cases, you may want Centralpoint to become the new system of record, and MOVE records into it.
Key Considerations
- You may want to avoid duplication of your records
- You may need to consider the size of certain files (like CAD, high resolution images, etc.)
- You may need to consider the security of the file paths in which your files live today and users' network accessibility to them
The Difference Between Indexing and Ingesting
Indexing: Leave the record where it is found (to avoid duplication) in order to make it searchable within Centralpoint. When creating your data transformation routine and field mapping, use Alternative URL to record the path to the original record (along with any enrichment). The user will be returned to the original source when they access the record (if they have access to see it).
Ingesting: Move the record to Centralpoint as the new system of record (sunset the old system). Do NOT use Alternative URL. Make sure each record is moved during the ingestion via an available File Upload field.
⚠️ Security Consideration: Whether you are ingesting or indexing records using Data Transformation, always be sure to map the security roles from the system or path you're scanning, in order to maintain who may see or access records (either indexed or ingested in Centralpoint).
Data Governance (Data Cleaner) Considerations
Data cleaning is a separate module found under the Centralpoint Data Transformation suite of tools, which allows you to set up data governance rules that may work in unison with your Data Transformation routines.
Keyword Generator
Purpose: Used to enrich, supplement or add new keywords to any record where certain values are found (during mining).
Example: If the word 'apple' is found, add keywords of 'fruit, food, nutrition, pectin' which will enhance search (searching fruit will relate anything apple to any other record which may also relate to fruit or food).
Required Input: Search for?, attributes, add keywords
Taxonomy Generator
Purpose: Used to apply one or more metadata/taxonomy types when certain value(s) are found.
Example: If the word 'apple' is found, apply to N-tiered taxonomy under Food/Fruit, which will enhance search (searching fruit will relate anything apple to any other record which may also relate to fruit or food).
Required Input: Search for?, case sensitive?, regex?, Taxonomy
Attribute Generator
Purpose: Used to apply new values in any field within one or more modules when certain value(s) are found (during mining).
Example: If the term 'Top Secret' is found within any document, apply the value for the Role=Top Secret to the 'Roles' attribute for any module. This will override the security roles for all documents which contain 'Top Secret'.
Required Input: Search for?, regex?, value to add, attribute, which modules?
HTML Cleaner
Purpose: Used to clean or scrub any HTML or code where certain values are found (during mining). Often used to fix older or bad HTML or convert bad characters from MS Word or faulty HTML being ingested.
Example: Replace faulty HTML, codes, or characters; for instance, replace "<b>Apple<b>" with "<b>Apple</b>" to correct the original faulty HTML.
Required Input: Search for?, replacement type (html/text), replace with?
Attribute HTML Cleaner
Purpose: Used to apply new values to certain fields within specific modules whenever certain values are found (during mining), allowing you to update any attribute as a result.
Required Input: Search for?, replacement type (html/text), replace with, attribute to add, which modules?
Data Triage
Purpose: Used to redirect certain file types into certain modules.
Example: When spidering a shared network drive or database, you may want to deposit all documents containing 'Marketing' references into a module designated for only Marketing, or place all Excel documents found into their own module. This is used to organize content by type (within separate modules).
Required Input: Search for?, regex (yes or no), searched attribute, destination modules, logging attribute
See Also: Data Cleaner works in unison with Data Transfer to filter and assign (keywords, taxonomy, regular expressions) based upon your rules.
Developer Notes & Tips Regarding Data Transfers
1. Use Console Utilities
- With so many modules containing triggers, utilizing this option saves a step post-import
- Useful when the DataId Attributes configuration is in use on the module
- Important Step: To activate console utilities on a per-module basis, you must first visit the Module Properties for the receiving module (the module you are ingesting or indexing data into). This allows you to configure the module to leverage the data governance or data cleaning rules
2. "Export" Feature on Scripts
Allows you to insert a human-readable value in place of a GUID on the way out (exporting).
3. UseDefaultValueOnUpdates
Allows you to mix and match your data set:
- If the data being imported is new, the new value in that column will be used
- If the data is being updated, the existing value will be maintained
4. RemoveDuplicateValues
Allows the system to de-dupe a comma-separated list.
5. FTP Files
Use the Source & Destination tab's Execute Scripts on Open to send the generated file (presumably an export) to an FTP site provided by the client.
Scheduling Recurring Imports from Third Party Content Providers
⚠️ Important: When Scheduling Recurring Imports from Third Party Content Providers (RSS, XML or Syndicated Content)
When importing from third party sources (like RSS, JSON, XML or Syndicated News Feeds) and scheduling the recurring import of information from them (daily, weekly, or monthly):
- The Incremental Update setting within the Source & Destination tab (of Data Transfer) should be set to IntegrationId, NOT DataId
- In the Fields tab (of Data Transfer):
- The Autonumber field should be set to the Default radio button option
- The DataId field should be set to the Default radio button option
- The IntegrationId field should be set to the Column option, and the column assigned should be the unique identifier from the third-party source (for example, the item GUID in an RSS feed)
Summary
The combination of Incremental Update by DataId (where no DataId comes from the third-party system) and a field map with DataId set to the NewGuid script means the system can never perform an incremental update. NewGuid forces the system to produce a new record each time, and the system gets confused by the AutoNumber script and a DataId set to NewGuid while attempting to update records by existing DataIds (that don't exist).
Commonly Used Scripts When Executing a Data Transfer
Audiences Script
Replaces a value or values with valid AudienceIds.
[cp:literal:scripting key='Audiences' name='COLUMN_NAME' valueColumn='AudienceId' defaultValue='bf7bb52f-eae7-4d5a-bd20-6849d0260c80:1:1, 5f2d099d-a0df-4e62-bb88-93662c972ae0:0:1' /]
| Property | Description |
| --- | --- |
| DefaultValue | Enter the default value to be used when the value cannot be found. |
| Name | The system name of the column in the current data source. Select the Column mapping type for a list of column names. The system name appears in parentheses. |
| Format | A format string used to prefix or suffix the return value. Include '{0}' in the string where you would like the value to be placed. |
| ValueColumn | Supports the following values: AudienceId, SystemName, Name, IntegrationId. By default this property uses the value 'AudienceId', which validates the values without replacement. |
| isCda | Optional parameter; default value is 1. Supported values: 0, 1 or true, false. |
| isConsole | Optional parameter; default value is 1. Supported values: 0, 1 or true, false. |
| Export | When Export='true' the replacement process is reversed, allowing you to replace GUIDs with a plain text value retrieved using the ValueColumn parameter. |
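For example, a hedged export mapping (assuming a source column named 'AudienceId' containing audience GUIDs) that replaces each GUID with the audience's display name on the way out:
[cp:literal:scripting key='Audiences' name='AudienceId' valueColumn='Name' export='true' /]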
AutoNumber Script
Generates the next available auto number for transfers into standard modules. This script can only be included once because it will generate a new auto number each time it is used.
[cp:literal:scripting key='AutoNumber' /]
Column Script
Retrieves the value of a single column from the data source.
[cp:literal:scripting key='Column' name='COLUMN_NAME' /]
| Property | Description |
| --- | --- |
| AddHours | Adds or subtracts hours from the value when dataType='DateTime'. It is primarily used to convert local times to UTC. |
| DataType | Supports the following values: String, DateTime, Integer, Decimal, Double, Guid. By default this property uses the value 'String'. |
| DefaultValue | Enter the default value to be used when the column value cannot be parsed into the selected data type. |
| EmptyValue | An empty string will be returned when this value matches the script's DataType and is equal to the calculated value. Do not use this property when you require the script to return a data type other than String, because an empty string may be returned instead of the desired data type. |
| Format | A format string used to prefix or suffix the return value. Include '{0}' in the string where you would like the value to be placed. This can also be used to format data types: {0:d}. |
| MinValue | The minimum allowed value. This property is ignored with some data types. |
| MaxValue | The maximum allowed value. This property is ignored with some data types. |
| Name | The system name of the column in the current data source. Select the Column mapping type for a list of column names. The system name appears in parentheses. |
| RemoveMarkup | Supports the following values: true, false. Enter true to have all HTML stripped from the column value. By default HTML is not stripped. |
| ReplaceText | Use this parameter to replace a string in the column value with the value of the ReplaceWith parameter. |
| ReplaceWith | This parameter is not used unless the ReplaceText parameter contains a value. Enter the string that will replace the value of the ReplaceText parameter in the column value. |
| ReplaceTextRegexEnabled | Enables regular expression matching for the ReplaceText parameter. If you pass a string into a regular expression, you need to make sure special characters are properly escaped. |
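As a hedged illustration combining several of these properties (the source column name 'PublishDate' is hypothetical), the following mapping parses the source value as a date, adds five hours to convert a local time to UTC, and falls back to a default when the value cannot be parsed:
[cp:literal:scripting key='Column' name='PublishDate' dataType='DateTime' addHours='5' defaultValue='1/1/2000' /]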