
Data Aggregation & Enrichment

Centralpoint features distinctive Data Transformation tools for automating the aggregation of data from diverse sources. It supports both indexing and ingestion of Structured and Unstructured data, which means it can aggregate information from systems such as SQL, Oracle, IBM, XML, JSON and CSV, and even from folders of files (PDF, Word, images, videos). The tools include Data Mining, which auto-classifies each record with accurate metadata and Taxonomy upon import. This enables seamless searching across disparate systems and uncovers potential relationships between records.

Supported types of Data Transformation and the fields required for each:

  • STRUCTURED DATA SOURCES SUPPORTED
    • MS/SharePoint – Source Site URL, Title, List Title, Username, Password
    • Oracle – Connection String/table name, any query or filter (credentials)
    • SQL – Connection String/table name, any query or filter (credentials)
    • MS/Office 365 (OneDrive) – Authorization, Source Folder (Office 365 file selector), Source Recursive (y/n), Source End Point (bulk or changes)
    • OLEDB – Connection String/table name, any query or filter (credentials)
    • ODBC – Connection String/table name, any query or filter (credentials)
    • XML (ReadXML) – XML Source (path to XML), available tables and columns, Source Table Index
    • XML (CpCollection) – Source XML (path to XML), Default File, Source Directory, Number of Files to Select, XSLT
    • Delimited TXT (OLEDB) – Source Directory, Source Delimiter, Source Text Qualifier, Source Header Row, Source Select Command
    • Delimited TXT (Stream) – Source Directory, Source Delimiter, Source Text Qualifier, Source Header Row, Source Select Command
    • Active Directory (typically used to build employee directories) – Directory Path, Filter
    • Excel – Source File (path to the Excel file, if not uploaded into Centralpoint), Source Header Row, Source Select Command (if any)
    • RSS – Source Feed URL (web path to the RSS feed), Feed Type (Atom or RSS)
    • Access – Source File (path), Source Select Command (query)
    • Office 365 – Source Authorization (Authorize), Source Folder, Source is Recursive, Source End Point
    • Custom Provider – Typically used when ingesting from a Web API or web service where specific security parameters or custom methods must be passed to authorize or encrypt requests – Source Type, Source Parameters (a custom provider covers virtually any configuration)
    • Centralpoint (Modules, Audiences, Taxonomy, Roles) – Used when transforming data from Centralpoint to Centralpoint, or from Centralpoint to an outside system

  • UNSTRUCTURED DATA SOURCES SUPPORTED
    • File path (and/or credentials) to spider, such as a network drive, any file path, or a web-based drive like OneDrive – Source Directory, Source Pattern, Source Options. This feature can also spider all subfolders within a larger shared drive, ingesting each record and, where applicable, fully text-indexing, converting and preserving PDF, Word, XLS, PPT, images, videos and more. Additionally, the names of folders within the shared drive may be included in the Data Cleaner/governance rules to better organize content. See Data Triage (below), which allows your shared network drives to be ingested with each file automatically organized by type into its own module within Centralpoint. Run as scheduled routines, this automates the organization of all your information each day and makes it available on the internet, respecting each user's roles and authentication.
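To illustrate what the Delimited TXT settings mean (Source Delimiter, Source Text Qualifier, Source Header Row), here is a minimal sketch using Python's standard `csv` module. It is not how Centralpoint parses files: the helper `read_delimited`, its parameters and the sample data are hypothetical, and the Source Select Command is not modeled.

```python
import csv
import io

# Hypothetical helper: maps the Delimited TXT settings onto Python's csv module.
# It only illustrates what the fields mean; it is not Centralpoint's parser.
def read_delimited(text, delimiter=",", qualifier='"', header_row=True):
    """Parse delimited text into a list of records (dicts keyed by column name).

    delimiter  ~ Source Delimiter: the character that separates fields
    qualifier  ~ Source Text Qualifier: wraps values that contain the delimiter
    header_row ~ Source Header Row: whether the first line holds column names
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=qualifier))
    if not rows:
        return []
    if header_row:
        columns, data = rows[0], rows[1:]
    else:
        # No header row: fall back to generic column names.
        columns = [f"Column{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(columns, row)) for row in data]


sample = 'Title|Department|Summary\nQ3 Report|Finance|"Revenue up | costs flat"\n'
records = read_delimited(sample, delimiter="|")
print(records)
# [{'Title': 'Q3 Report', 'Department': 'Finance', 'Summary': 'Revenue up | costs flat'}]
```

The text qualifier is what keeps the `|` inside "Revenue up | costs flat" from being read as a field separator.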

When to consider INDEXING vs. INGESTING Data using Centralpoint Data Transformation

  • Centralpoint supports both indexing and ingesting data. In some cases you will want a federated search across all records while preserving where some records currently live; in others, you may need Centralpoint to become the new system of record, with records MOVED into it. Considerations include:
    • You may want to avoid duplicating your records.
    • You may need to consider the size of certain files (such as CAD drawings or high-resolution images).
    • You may need to consider the security of the file paths where your files live today, and users' network access to them.

  • Indexing vs. Ingesting – The difference is whether you want to leave each record where it is found (to avoid duplication) and make it searchable within Centralpoint, or whether you want to sunset an old system so that the record lives in Centralpoint as its new home. When creating your Data Transformation routine, the difference comes down to whether you map the ALTERNATIVE URL field. Alternative URL is used only when indexing: it records the path to the original record in Centralpoint (along with the enrichment), so that users are returned to the original source, provided they have access to see the record. If you intend to move the record into Centralpoint as the new system of record, DO NOT USE Alternative URL; instead, make sure each record is moved during ingestion via an available File Upload field.

  • Security Consideration – Whether you are ingesting or indexing records using Data Transformation, always map the security roles from the system or path you are scanning, so that Centralpoint maintains who may see or access each record, whether indexed or ingested.
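The difference between indexing and ingesting comes down to which fields are mapped. The minimal sketch below models it with a hypothetical `Record` class whose field names (`alternative_url`, `file_upload`, `roles`) only echo the Alternative URL and File Upload fields described above; it is not Centralpoint's data model. It also reflects the security consideration: a user is routed to a record only if one of their roles matches the roles mapped from the source. Treating a record with no mapped roles as visible to no one is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """Hypothetical record; field names only echo the Centralpoint fields."""
    title: str
    roles: set = field(default_factory=set)   # security roles mapped from the source
    alternative_url: Optional[str] = None     # mapped only when INDEXING
    file_upload: Optional[bytes] = None       # populated only when INGESTING


def index_record(title, source_path, roles):
    # Indexing: the file stays where it is; only a pointer back to it is stored.
    return Record(title=title, roles=set(roles), alternative_url=source_path)


def ingest_record(title, content, roles):
    # Ingesting: the file itself moves into the record; no Alternative URL is mapped.
    return Record(title=title, roles=set(roles), file_upload=content)


def open_target(record, user_roles):
    """Where a user is sent when opening the record, or None if access is denied."""
    # Assumption: a record with no mapped roles is visible to no one, which is
    # why the roles from the source system should always be mapped.
    if not record.roles & set(user_roles):
        return None
    if record.alternative_url is not None:
        return record.alternative_url      # indexed: back to the original source
    return "<stored in Centralpoint>"      # ingested: Centralpoint is the new home


doc = index_record("Budget 2023", r"\\fileserver\finance\budget.xlsx", {"Finance"})
print(open_target(doc, {"Finance"}))    # \\fileserver\finance\budget.xlsx
print(open_target(doc, {"Marketing"}))  # None
```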

Data Governance (Data Cleaner) considerations when executing your Data Transformation

The Data Cleaner is a separate module within the Centralpoint Data Transformation suite of tools. It allows you to set up data governance rules that work in unison with your Data Transformation routines.

    • Keyword Generator – Used to enrich, supplement or add new keywords to any record where certain values are found (during mining).
      • Example: If the word ‘apple’ is found, add the keywords ‘fruit, food, nutrition, pectin’. This enhances search: a search for ‘fruit’ will find the record, and anything about apples is related to any other record that also relates to fruit or food.
        • What this routine will ask you for: Search for?, attributes, add keywords
    • Taxonomy Generator – Used to apply one or more metadata/taxonomy types when certain value(s) are found.
      • Example: If the word ‘apple’ is found, apply the N-tiered taxonomy Food/Fruit. This enhances search in the same way, relating anything about apples to other records that relate to fruit or food.
        • What this routine will ask you for: Search for?, case sensitive?, regex?, Taxonomy
    • Attribute Generator – Used to apply new values to any field within one or more modules when certain value(s) are found (during mining).
      • Example: If the term ‘Top Secret’ is found within any document, apply the value Role=Top Secret to the ‘Roles’ attribute in any module. This overrides the security roles for every document that contains ‘Top Secret’.
        • What this routine will ask you for: Search for?, regex?, value to add, attribute, which modules?
    • HTML Cleaner – Used to clean or scrub any HTML or code where certain values are found (during mining). Often used to fix older or malformed HTML, or to convert bad characters coming from MS Word or faulty HTML being ingested.
      • Example: Search for the faulty HTML “<b>Apple<b>” and replace it with “<b>Apple</b>”, correcting the original markup.
        • What this routine will ask you for: Search for?, replacement type (html/text), replace with?
    • Attribute HTML Cleaner – Used to apply new values to certain fields in specific modules whenever certain values are found (during mining), allowing you to set new values on any attribute as a result.
      • What this routine will ask you for: Search for?, replacement type (html/text), replace with, attribute to add, which modules?
    • Data Triage – Used to redirect certain files into certain modules based on their type or content.
      • Example: When spidering a shared network drive or database, you may want to deposit all documents containing ‘Marketing’ references into a module designated only for Marketing, or place all Excel documents found into their own module. This organizes content by type into separate modules.
      • What this routine will ask you for: Search for?, regex (yes or no), searched attribute, destination modules, logging attribute
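The Keyword, Taxonomy and Attribute Generator rules share one pattern: test the mined text against “Search for?” (literally, as a regex, or case-sensitively), then enrich the record. Below is a minimal sketch, assuming a hypothetical `MinedRecord` with a single `text` field; the function names mirror the rule names but are not a Centralpoint API, matching is case-insensitive by default as an assumption, and the “attributes” and “which modules?” settings are not modeled.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MinedRecord:
    """Hypothetical record being mined; not Centralpoint's data model."""
    text: str
    keywords: set = field(default_factory=set)
    taxonomy: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


def matches(text, search_for, regex=False, case_sensitive=False):
    """'Search for?' test: literal text by default, or a regular expression."""
    pattern = search_for if regex else re.escape(search_for)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, text, flags) is not None


def keyword_generator(record, search_for, add_keywords):
    # Enrich: add keywords alongside any the record already has.
    if matches(record.text, search_for):
        record.keywords.update(add_keywords)


def taxonomy_generator(record, search_for, taxonomy, case_sensitive=False, regex=False):
    # Classify: attach a taxonomy path such as "Food/Fruit".
    if matches(record.text, search_for, regex, case_sensitive):
        record.taxonomy.add(taxonomy)


def attribute_generator(record, search_for, attribute, value, regex=False):
    # Override: replace the attribute's values, as in the 'Top Secret' example.
    if matches(record.text, search_for, regex):
        record.attributes[attribute] = {value}


doc = MinedRecord("Apple harvest notes. Classification: Top Secret",
                  attributes={"Roles": {"Everyone"}})
keyword_generator(doc, "apple", ["fruit", "food", "nutrition", "pectin"])
taxonomy_generator(doc, "apple", "Food/Fruit")
attribute_generator(doc, "Top Secret", "Roles", "Top Secret")
print(sorted(doc.keywords), doc.taxonomy, doc.attributes)
# ['food', 'fruit', 'nutrition', 'pectin'] {'Food/Fruit'} {'Roles': {'Top Secret'}}
```

Note the design difference: the keyword and taxonomy rules add to what is already there, while the attribute rule replaces the existing value, which is what lets ‘Top Secret’ override a document's original roles.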
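The HTML Cleaner and Data Triage rules can be sketched in the same spirit. Again, `html_cleaner` and `data_triage` are hypothetical illustrations rather than Centralpoint functions, and several simplifications are assumed: records are plain dicts, each triage rule routes to a single destination module, the first matching rule wins, unmatched files land in a hypothetical “Unsorted” module, and “replacement type (html/text)” is not modeled.

```python
import re


def html_cleaner(html, search_for, replace_with, regex=False):
    """Replace faulty markup; literal match by default, regex if requested."""
    if regex:
        return re.sub(search_for, replace_with, html)
    return html.replace(search_for, replace_with)


def data_triage(records, rules, default_module="Unsorted"):
    """Route each record to the destination module of the first matching rule.

    Each rule mirrors the routine's settings: search_for, regex,
    searched_attribute, destination_module and logging_attribute.
    """
    routed = {}
    for record in records:
        destination = default_module
        for rule in rules:
            value = record.get(rule["searched_attribute"], "")
            pattern = rule["search_for"] if rule["regex"] else re.escape(rule["search_for"])
            if re.search(pattern, value, re.IGNORECASE):
                destination = rule["destination_module"]
                # Note which rule fired in the record's logging attribute.
                record[rule["logging_attribute"]] = f"matched {rule['search_for']!r}"
                break
        routed.setdefault(destination, []).append(record["FileName"])
    return routed


print(html_cleaner("<p><b>Apple<b> pie</p>", "<b>Apple<b>", "<b>Apple</b>"))
# <p><b>Apple</b> pie</p>

files = [
    {"FileName": "Q3_budget.xlsx", "Body": "quarterly figures"},
    {"FileName": "launch_plan.docx", "Body": "Marketing launch plan"},
    {"FileName": "notes.txt", "Body": "misc"},
]
rules = [
    {"search_for": r"\.xlsx?$", "regex": True, "searched_attribute": "FileName",
     "destination_module": "Spreadsheets", "logging_attribute": "TriageLog"},
    {"search_for": "Marketing", "regex": False, "searched_attribute": "Body",
     "destination_module": "Marketing", "logging_attribute": "TriageLog"},
]
print(data_triage(files, rules))
# {'Spreadsheets': ['Q3_budget.xlsx'], 'Marketing': ['launch_plan.docx'], 'Unsorted': ['notes.txt']}
```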
