Handling “Duplicate” and “Ghost” Records in Khoros Marketing API Data Pulls (Meta/Facebook)

Overview

Khoros Marketing API data pulls can sometimes appear to contain “duplicate” or “ghost” records, creating noise in reporting and analytics. In many cases, these are not true duplicates: items may share identical (or very similar) post text while still being distinct Meta (Facebook) post entities, each with its own ID and permalink/URL.

In addition, API datasets can include unpublished/dark posts (and sometimes auto-created items such as Facebook “ad relatives”), which may look like “ghost” records when your reporting expects published-only posts.

Solution

Issue

Khoros Marketing API pulls show many entries that look like duplicate or “ghost” records, creating reporting noise.

How to recognize this scenario

  • Multiple rows have identical/similar Post Text, but the rows contain different Facebook (Meta) URLs/permalinks.
  • API output includes items that are not intended to be counted as “published posts” (for example, unpublished/dark posts).
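
To check whether a pull matches the first symptom, one option is to group rows by post text and list the distinct permalinks seen for each text. The following is a minimal sketch, not a Khoros-provided utility. The field names post_text and permalink are hypothetical placeholders; substitute whatever your API payload actually returns.

```python
from collections import defaultdict


def reused_copy(rows):
    """Return post texts that appear with more than one distinct permalink.

    Field names ("post_text", "permalink") are assumptions for illustration.
    """
    permalinks_by_text = defaultdict(set)
    for row in rows:
        text = (row.get("post_text") or "").strip()
        permalink = row.get("permalink")
        if text and permalink:
            permalinks_by_text[text].add(permalink)
    # Keep only texts shared by two or more different permalinks.
    return {
        text: sorted(urls)
        for text, urls in permalinks_by_text.items()
        if len(urls) > 1
    }


# Illustrative sample data (not real API output).
sample_rows = [
    {"post_text": "Spring sale starts today", "permalink": "https://example.com/posts/1"},
    {"post_text": "Spring sale starts today", "permalink": "https://example.com/posts/2"},
    {"post_text": "New store opening", "permalink": "https://example.com/posts/3"},
]
reused = reused_copy(sample_rows)
```

Any text returned here is reused copy across distinct posts, not a true duplicate.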

Root cause (most common)

  1. False duplicates caused by an incorrect dedupe key
    • If deduplication is done by Post Text, distinct posts that reuse the same copy can be incorrectly flagged as duplicates.
    • Different Facebook (Meta) URLs/permalinks generally indicate different post entities with different IDs, even if the text is the same.
  2. “Ghost” records are often unpublished/dark posts
    • API datasets can include items with an Unpublished status or a missing publish timestamp, as well as auto-created items (for example, Facebook “ad relatives”). These should be excluded from published-only reporting.

Resolution / Mitigation

1) Fix deduplication logic (use identifiers, not text)

Update your dedupe logic to use one of the following (in order of preference, depending on what your API returns):

  • Platform post ID (for example, a stable Facebook post ID)
  • Permalink / URL
  • A composite key such as: {platform}:{account_id}:{post_id}

Avoid using Post Text as the primary dedupe key.
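
The preference order above could be sketched as follows. This is an illustrative example under stated assumptions: the field names platform, account_id, post_id, and permalink are hypothetical and may not match your payload.

```python
def dedupe_key(row):
    """Build a dedupe key from identifiers, never from post text.

    Prefers the composite {platform}:{account_id}:{post_id} key and falls
    back to the permalink. All field names are assumptions.
    """
    post_id = row.get("post_id")
    if post_id:
        return "{}:{}:{}".format(
            row.get("platform", ""), row.get("account_id", ""), post_id
        )
    permalink = row.get("permalink")
    if permalink:
        return "url:" + permalink
    return None  # No usable identifier on this row.


def dedupe(rows):
    """Keep the first row for each key; rows without a key are kept as-is."""
    seen = set()
    unique = []
    for row in rows:
        key = dedupe_key(row)
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        unique.append(row)
    return unique


# Illustrative sample: two distinct posts share copy, one row is a true repeat.
sample_rows = [
    {"platform": "facebook", "account_id": "A1", "post_id": "100", "post_text": "Sale!"},
    {"platform": "facebook", "account_id": "A1", "post_id": "101", "post_text": "Sale!"},
    {"platform": "facebook", "account_id": "A1", "post_id": "100", "post_text": "Sale!"},
]
unique_rows = dedupe(sample_rows)
```

Rows with no identifier are kept rather than dropped, because they cannot be shown to be duplicates. Note that this sketch does not reconcile a row keyed by post ID with a row for the same post keyed only by permalink.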

2) Filter out unpublished/dark posts in your API consumer (client-side)

For published-only reporting, filter out records that indicate they are not published.

Common filters:

  • Exclude where status = "Unpublished" (unpublished = dark post)
  • Exclude where Published Date (or equivalent publish timestamp field) is blank/null
  • If your payload distinguishes auto-created unpublished items (for example, Facebook “ad relatives”), exclude those categories/types as well to reduce noise

Example (pseudo-logic):

Keep record only if:
  status != "Unpublished"
  AND published_date IS NOT NULL
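
The pseudo-logic above might be implemented along these lines. This is a sketch only: status, published_date, and type are hypothetical field names, and "ad_relative" is an invented type label used for illustration. The case-insensitive status comparison is also an assumption about how the value may be formatted.

```python
def is_published(row, excluded_types=frozenset()):
    """Return True if the row looks like a published post.

    Assumed fields: "status", "published_date", and an optional "type".
    """
    status = str(row.get("status") or "").strip().lower()
    if status == "unpublished":
        return False
    published_date = row.get("published_date")
    if published_date is None or str(published_date).strip() == "":
        return False
    # Optionally drop auto-created categories if the payload labels them.
    if row.get("type") in excluded_types:
        return False
    return True


# Illustrative sample data (not real API output).
sample_rows = [
    {"post_id": "1", "status": "Published", "published_date": "2024-03-01T10:00:00Z"},
    {"post_id": "2", "status": "Unpublished", "published_date": None},
    {"post_id": "3", "status": "Published", "published_date": ""},
    {"post_id": "4", "status": "Published", "published_date": "2024-03-02T09:30:00Z",
     "type": "ad_relative"},
]
published = [r for r in sample_rows if is_published(r)]
published_strict = [
    r for r in sample_rows if is_published(r, excluded_types={"ad_relative"})
]
```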

3) Cross-check against Khoros Marketing export

Run a Khoros Marketing Posts data export using the product option “Exclude Unpublished Posts” (from your instance at https://your_instance.domain.com) and compare:

  • Number of records
  • Presence/absence of the “ghost” items
  • Whether “duplicate” items are actually distinct when comparing permalinks/URLs

This confirms whether your API-side filtering matches the product’s published-only view.
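
One way to make this comparison concrete is to compare the sets of permalinks in the filtered API data and in the export. The sketch below assumes both datasets have already been loaded as lists of dictionaries (for example, the export via Python's csv.DictReader) and that each row carries a permalink field. That field name is hypothetical, and the export's actual column header may differ.

```python
def compare_permalinks(api_rows, export_rows, field="permalink"):
    """Summarize how the filtered API pull and the export differ.

    The "permalink" field name is an assumption; pass the real one via `field`.
    """
    api_links = {r[field] for r in api_rows if r.get(field)}
    export_links = {r[field] for r in export_rows if r.get(field)}
    return {
        "api_count": len(api_links),
        "export_count": len(export_links),
        "only_in_api": sorted(api_links - export_links),
        "only_in_export": sorted(export_links - api_links),
    }


# Illustrative sample data (not real output).
api_rows = [
    {"permalink": "https://example.com/posts/1"},
    {"permalink": "https://example.com/posts/2"},
    {"permalink": "https://example.com/posts/9"},
]
export_rows = [
    {"permalink": "https://example.com/posts/1"},
    {"permalink": "https://example.com/posts/2"},
]
report = compare_permalinks(api_rows, export_rows)
```

Entries under only_in_api are candidates for remaining “ghost” items that the client-side filter did not catch.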

Verification steps

  1. Re-run your API pull after implementing:
    • Dedupe by post ID/permalink/URL (not text)
    • Client-side filtering to exclude unpublished/dark posts
  2. Generate a Posts data export with “Exclude Unpublished Posts” enabled.
  3. Compare totals and spot-check a sample of previously “duplicate/ghost” entries:
    • Distinct permalinks/URLs should remain as separate posts
    • Unpublished/dark posts should no longer appear in published-only reporting

Notes

  • No product defect, version-specific fix, or engineering escalation was identified based on the provided information.
  • No platform publishing impact was indicated; the behavior described affects data pulls/reporting visibility.

Frequently Asked Questions

1. How can I tell whether these are real duplicates or just reused copy?
Check whether the rows have different platform permalinks/URLs (or different post IDs). If the Facebook (Meta) URL/permalink differs, they are typically distinct post entities even if the post text matches.
2. What should I dedupe on for Khoros Marketing API reporting?
Dedupe by stable identifiers such as platform post ID, permalink/URL, or a composite key like {platform}:{account_id}:{post_id}. Avoid deduping solely by Post Text.
3. What are “ghost” records in API pulls most commonly?
They are often unpublished/dark posts (and sometimes auto-created platform items such as Facebook “ad relatives”). These can be present in API datasets but should be excluded from published-only reporting.
4. How do I exclude unpublished/dark posts from API results?

Apply client-side filtering, for example:

  • Exclude records where status = "Unpublished", and/or
  • Exclude records where Published Date is blank/null

If your payload marks auto-created unpublished items separately, exclude those types as well.

5. How do I validate that my filtering matches what Khoros Marketing shows?
Run a Khoros Marketing Posts data export with “Exclude Unpublished Posts” enabled and compare that export to your API dataset after filtering. They should align closely for published-only reporting.
  1. Priyanka Bhotika

  2. Posted
