Enrich Metadata & Classification

Centralpoint's Data Cleaner (also known as Data Governance) works together with Centralpoint's Data Transfer utility to automate metadata and Taxonomy assignment for your content, so that information lacking proper metadata can be enriched as it is imported. Regardless of the information type, whether raw data from databases such as Oracle, IBM, SQL, Access, or Excel, or documents such as PDFs, MS Word, or Excel files, Data Cleaner applies metadata and Taxonomy to each record upon import. Your terminology, product SKUs, laws, policies, and other pertinent information can then be used to enrich your metadata, taxonomy, and ontology automatically.

Centralpoint's Data Cleaner covers HTML, Taxonomy, and Keywords, granting you the ability to manage condition dictionaries, including determining how many instances of a word or part number correspond to specific Taxonomy or keywords. For further insights, explore the extensive features we offer for MDM (Master Data Management).

The best way to understand how this tool can benefit your organization is to see it in action. We encourage you to set up a demonstration with us so that we can step through the various routines online, via an MS/Teams meeting, and address any of your questions or challenges. If you are interested in this new module, you may also want to see our Taxonomy Pre-Import Tool. Be sure to see our Data Transfer module for more information.

Supported types of Data Transformation and the fields required for each:

Structured Data Sources
- MS/SharePoint - Source Site URL, Title, List Title, Username, Password
- Oracle - Connection String/table name, any query or filter (credentials)
- MS/SQL - Connection String/table name, any query or filter (credentials)
- MS/Office 365 (OneDrive) - Authorization, Source Folder (Office365 file selector), Source Recursive (y/n), Source End Point (bulk or changes)
- OLEDB - Connection String/table name, any query or filter (credentials)
- ODBC - Connection String/table name, any query or filter (credentials)
- XML-ReadXML - XML Source (path to XML), Available tables and columns, Source Table Index
- XML-CpCollection - Source XML (path to XML), Default File, Source Directory, Number of Files to select, XSLT
- Delimited TXT (OLEDB) - Source Directory, Source Delimiter used, Source Text Qualifier, Source Header Row, Source Select Command
- Delimited TXT (Stream) - Source Directory, Source Delimiter used, Source Text Qualifier, Source Header Row, Source Select Command
- Active Directory (typically used to build employee directories) - Directory Path, Filter
- MS/Excel - Source File (path to the Excel file, if not uploaded into Centralpoint), Source Header Row, Source Select Command (if any)
- RSS/ATOM - Source Feed URL (web path to the RSS feed), Feed Type (Atom or RSS)
- MS/Access - Source File (path), Source Select Command (query)
- MS/Office365 - Source Authorization (Authorize), Source Folder, Source is Recursive, Source End Point
- Custom Provider - Source Type, Source Parameters. Typically used when ingesting from a WebAPI/Web Services, where certain security considerations or custom methods must be passed in order to authorize or encrypt (Custom covers virtually any configuration).
- Centralpoint to Centralpoint (Modules, Audiences, Taxonomy, Roles) - Used when transforming data from Centralpoint to Centralpoint or to an outside system, or after staging and testing initial imports.
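
As a rough illustration of what the delimited-text settings in the list above control, the following Python sketch parses a delimited source using a delimiter, a text qualifier, and a header-row flag. It uses Python's standard `csv` module rather than Centralpoint itself; the function name `read_delimited`, its parameters, and the sample data are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter=",", text_qualifier='"', header_row=True):
    """Parse delimited text into a list of dicts (hypothetical helper).

    delimiter      -> plays the role of "Source Delimiter"
    text_qualifier -> plays the role of "Source Text Qualifier"
    header_row     -> plays the role of "Source Header Row"
    """
    reader = csv.reader(
        io.StringIO(text), delimiter=delimiter, quotechar=text_qualifier
    )
    rows = list(reader)
    if not rows:
        return []
    if header_row:
        header, data = rows[0], rows[1:]
    else:
        # No header row: synthesize generic column names.
        header = [f"Column{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


# The qualifier lets a field contain the delimiter itself.
sample = 'sku|title\nA-100|"Widget | large"\n'
records = read_delimited(sample, delimiter="|")
print(records)
```

The text qualifier matters whenever a value may contain the delimiter; without it, the second field above would be split in two.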
Unstructured Data Sources
- File path (and/or credentials) to spider (network drive, any file path, or web-based drive such as OneDrive) - Source Directory, Source Pattern, Source Options. This feature can also spider all subfolders within a larger shared drive, ingesting each record and, where applicable, full-text indexing, converting, and preserving PDF, Word, XLS, PPT, images, videos, and more. Additionally, the names of file folders (within the shared drive) may be included in the Data Cleaner/governance rules to better organize content. Please see Data Triage (below), which allows your shared network drives to be ingested and each file to be automatically organized, by type, into its own module within Centralpoint. This automates the organization of all your information each day (via scheduled routines) and makes it available on the internet, subject to the roles and authentication of each user.
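
To make the spidering idea more concrete, here is a minimal Python sketch of walking a source directory, filtering by a file pattern, and capturing folder names so they could feed governance rules. It is not Centralpoint's implementation; the function `spider`, its parameters (loosely modeled on Source Directory and Source Pattern), and the returned field names are assumptions for illustration only.

```python
import fnmatch
import os


def spider(source_directory, source_pattern="*", recursive=True):
    """Yield one dict per matching file under source_directory (hypothetical)."""
    for root, dirs, files in os.walk(source_directory):
        if not recursive:
            dirs[:] = []  # stop os.walk from descending into subfolders
        for name in sorted(files):
            if not fnmatch.fnmatch(name, source_pattern):
                continue
            rel_folder = os.path.relpath(root, source_directory)
            yield {
                "path": os.path.join(root, name),
                "extension": os.path.splitext(name)[1].lower(),
                # Folder names relative to the source directory, which
                # governance rules could match against.
                "folders": [] if rel_folder == "." else rel_folder.split(os.sep),
            }
```

For example, `list(spider("/shared/drive", "*.pdf"))` would return one entry per PDF found at any depth, each carrying the list of folders it was found under (the path shown is a placeholder).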

When to consider INDEXING vs. INGESTING Data using Centralpoint Data Transformation

Centralpoint supports both indexing and ingesting data. In some cases you will want a federated search across all records while preserving where some records currently live. In other cases, you may need a new system of record, or home, and will MOVE records into Centralpoint. When choosing between the two:
- You may want to avoid duplication of your records.
- You may need to consider the size of certain files (such as CAD files, high-resolution images, etc.).
- You may need to consider the security of the file paths in which your files live today and users' network access to them.
Indexing vs. Ingesting - The difference is whether you want to leave the record where it is found (to avoid duplication) and simply make it searchable within Centralpoint, or whether you want to sunset an old system so that the record lives in Centralpoint as its new home. When creating your data transformation routine, the difference lies in how you field-map ALTERNATIVE URL. Alternative URL is used only when indexing: it records the path to the record in Centralpoint (along with the enrichment), and the user is returned to the original source, provided they have access to see the record. If you intend to move the record to Centralpoint as the new system of record, then DO NOT USE Alternative URL, and make sure each record is moved during ingestion via an available File Upload field.
Security Consideration - Whether you are ingesting or indexing records using Data Transformation, always be sure to map the security roles from the system or path you are scanning, in order to preserve who may see or access the records (whether indexed or ingested in Centralpoint).
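
The indexing/ingesting distinction can be sketched as a small field-mapping function. This is only an illustration of the rule described above (Alternative URL for indexing, File Upload for ingesting, roles mapped in both cases); the function `map_record`, the dictionary keys, and the role names are hypothetical and do not reflect Centralpoint's actual field names.

```python
def map_record(source_path, source_roles, mode, role_map):
    """Build a field mapping for one record (hypothetical field names).

    mode == "index":  leave the file in place; store its path in an
                      alternative-URL field so users return to the source.
    mode == "ingest": move the file via a file-upload field and leave the
                      alternative URL unset.
    """
    if mode not in ("index", "ingest"):
        raise ValueError("mode must be 'index' or 'ingest'")
    record = {
        # Carry source permissions over in both modes.
        "roles": sorted({role_map[r] for r in source_roles if r in role_map}),
    }
    if mode == "index":
        record["alternative_url"] = source_path
    else:
        record["file_upload"] = source_path
    return record


role_map = {"HR-Staff": "Human Resources"}  # source role -> destination role
indexed = map_record("/share/hr/policy.pdf", ["HR-Staff"], "index", role_map)
ingested = map_record("/share/hr/policy.pdf", ["HR-Staff"], "ingest", role_map)
print(indexed)
print(ingested)
```

Keeping the two modes mutually exclusive in one function mirrors the guidance that a record should carry either an Alternative URL or an uploaded file, not both.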

Data Governance (Data Cleaner) considerations when executing your Data Transformation

Data Cleaner is a separate module found under the Centralpoint Data Transformation suite of tools, which allows you to set up data governance rules that work in unison with your Data Transformation routines.

Keyword Generator - Used to enrich, supplement, or add new keywords to any record where certain values are found (during mining).
- Example: If the word ‘apple’ is found, add the keywords ‘fruit, food, nutrition, pectin’, which will enhance search (a search for fruit will find the record, and anything apple-related is tied to other records relating to fruit or food).
  - What this routine will ask you for: Search for?, attributes, add keywords
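
The keyword rule in the ‘apple’ example can be sketched in a few lines of Python. This is an illustrative approximation, not Centralpoint's logic: the function `generate_keywords`, the record layout, and the whole-word matching behavior are all assumptions.

```python
import re


def generate_keywords(record, rules, attributes=("title", "body")):
    """Return a copy of record with keywords added by matching rules.

    rules is a list of (search_for, add_keywords) pairs; the names are
    hypothetical and only mirror the prompts listed above.
    """
    text = " ".join(str(record.get(attr, "")) for attr in attributes)
    keywords = list(record.get("keywords", []))
    for search_for, add_keywords in rules:
        # Whole-word, case-insensitive match so 'apple' does not hit 'pineapple'.
        pattern = r"\b" + re.escape(search_for) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            for keyword in add_keywords:
                if keyword not in keywords:
                    keywords.append(keyword)
    return {**record, "keywords": keywords}


rules = [("apple", ["fruit", "food", "nutrition", "pectin"])]
record = {"title": "Apple harvest report", "body": "Yields rose.", "keywords": ["harvest"]}
enriched = generate_keywords(record, rules)
print(enriched["keywords"])  # ['harvest', 'fruit', 'food', 'nutrition', 'pectin']
```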
Taxonomy Generator - Used to apply one or more metadata/taxonomy types when certain value(s) are found.
- Example: If the word ‘apple’ is found, apply the N-tiered taxonomy Food/Fruit, which will enhance search (a search for fruit will find the record, and anything apple-related is tied to other records relating to fruit or food).
  - What this routine will ask you for: Search for?, case sensitive?, regex?, Taxonomy
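
A minimal sketch of how the four prompts (search term, case sensitivity, regex, taxonomy) might combine into a rule is shown below. The rule dictionary keys, the function names, and the `SKU-` pattern are hypothetical; only the ‘apple’ to Food/Fruit mapping comes from the example above.

```python
import re


def rule_matches(text, search_for, case_sensitive=False, regex=False):
    """Return True if search_for occurs in text under the given options."""
    pattern = search_for if regex else re.escape(search_for)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


def generate_taxonomy(text, rules):
    """Return taxonomy paths for every matching rule, in order, no duplicates."""
    paths = []
    for rule in rules:
        matched = rule_matches(
            text,
            rule["search_for"],
            rule.get("case_sensitive", False),
            rule.get("regex", False),
        )
        if matched and rule["taxonomy"] not in paths:
            paths.append(rule["taxonomy"])
    return paths


rules = [
    {"search_for": "apple", "taxonomy": "Food/Fruit"},
    # Hypothetical part-number rule: regex, case sensitive.
    {"search_for": r"SKU-\d{4}", "regex": True, "case_sensitive": True,
     "taxonomy": "Products/Parts"},
]
print(generate_taxonomy("Apple orchard order, part SKU-1234", rules))
# ['Food/Fruit', 'Products/Parts']
```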
Attribute Generator - Used to apply new values to any field within one or more modules when certain value(s) are found (during mining).
- Example: If the term ‘Top Secret’ is found within any document, apply the value Role=Top Secret to the ‘Roles’ attribute for any module. This will override the security roles for all documents which contain ‘Top Secret’.
  - What this routine will ask you for: Search for?, regex?, value to add, attribute, which modules?
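
The ‘Top Secret’ example can be approximated as follows. This sketch assumes a record is a dictionary with a `module` name and a `body` of mined text, and that a matching rule overwrites the target attribute; those details, and the function name `apply_attribute_rule`, are assumptions rather than documented Centralpoint behavior.

```python
import re


def apply_attribute_rule(record, rule):
    """Return record with rule['attribute'] set when the rule matches.

    The rule only applies to records in one of rule['modules']; matching
    here is case sensitive, which is an assumption of this sketch.
    """
    if record.get("module") not in rule["modules"]:
        return record
    pattern = rule["search_for"] if rule.get("regex") else re.escape(rule["search_for"])
    if re.search(pattern, record.get("body", "")):
        # Overwrite the attribute, mirroring the "override" in the example.
        return {**record, rule["attribute"]: rule["value"]}
    return record


rule = {
    "search_for": "Top Secret",
    "regex": False,
    "value": ["Top Secret"],
    "attribute": "Roles",
    "modules": ["Documents"],
}
doc = {"module": "Documents", "body": "This memo is Top Secret.", "Roles": ["Everyone"]}
print(apply_attribute_rule(doc, rule)["Roles"])  # ['Top Secret']
```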
HTML Cleaner - Used to clean or scrub any HTML or code where certain values are found (during mining). Often used to fix older or bad HTML, or to convert bad characters from MS/Word or faulty HTML being ingested.
- Example: Replace faulty HTML, codes, or characters, such as “<b>Apple<b>”, with “<b>Apple</b>” (correcting the original faulty HTML).
  - What this routine will ask you for: Search for?, replacement type (html/text), replace with?
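
The search-and-replace behavior in the example above can be sketched as below. How Centralpoint interprets the html/text replacement type is not stated here, so this sketch assumes that "html" inserts the replacement as markup and "text" escapes it so that it displays literally; the function name `clean_html` is likewise hypothetical.

```python
import html


def clean_html(content, search_for, replace_with, replacement_type="html"):
    """Replace every occurrence of search_for in content (hypothetical helper).

    replacement_type "html": insert replace_with as markup, unchanged.
    replacement_type "text": escape replace_with so it displays literally.
    This reading of the two replacement types is an assumption.
    """
    if replacement_type not in ("html", "text"):
        raise ValueError("replacement_type must be 'html' or 'text'")
    if replacement_type == "text":
        replace_with = html.escape(replace_with)
    return content.replace(search_for, replace_with)


fixed = clean_html("<p><b>Apple<b> pie</p>", "<b>Apple<b>", "<b>Apple</b>")
print(fixed)  # <p><b>Apple</b> pie</p>
```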
Attribute HTML Cleaner - Used to apply new values to certain fields in specific modules whenever certain values are found (during mining), allowing you to set a new value on any attribute as a result.
- What this routine will ask you for: Search for?, replacement type (html/text), replace with, attribute to add, which modules?
Data Triage - Used to redirect certain file types into certain modules.
- Example: When spidering a shared network drive or database, you may want to deposit all documents containing ‘Marketing’ references into a module designated only for Marketing, or place all Excel documents found into their own module. This is used to organize content by type (within separate modules).
- What this routine will ask you for: Search for?, regex (yes or no), searched attribute, destination modules, logging attribute
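
The routing described for Data Triage can be sketched as an ordered list of rules where the first match decides the destination module. The rule keys, the `triage` function, the "first match wins" ordering, and the `Unsorted` fallback are assumptions made for this illustration; the logging attribute is not modeled.

```python
import re


def triage(record, rules, default_module="Unsorted"):
    """Return the destination module for record; the first matching rule wins."""
    for rule in rules:
        value = str(record.get(rule["searched_attribute"], ""))
        pattern = rule["search_for"] if rule.get("regex") else re.escape(rule["search_for"])
        if re.search(pattern, value, re.IGNORECASE):
            return rule["destination_module"]
    return default_module


rules = [
    # Route Excel files by extension (regex on a hypothetical file_name attribute).
    {"search_for": r"\.xlsx?$", "regex": True,
     "searched_attribute": "file_name", "destination_module": "Spreadsheets"},
    # Route anything mentioning Marketing in its text.
    {"search_for": "Marketing", "regex": False,
     "searched_attribute": "body", "destination_module": "Marketing"},
]

print(triage({"file_name": "budget.xlsx", "body": "Marketing budget"}, rules))  # Spreadsheets
print(triage({"file_name": "plan.docx", "body": "Marketing plan"}, rules))      # Marketing
print(triage({"file_name": "memo.txt", "body": "Lunch menu"}, rules))           # Unsorted
```

Because rules are evaluated in order, a record that satisfies several rules lands in the module of the earliest one, so more specific rules would be listed first under this design.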