Overview
Khoros Marketing API data pulls can sometimes appear to contain “duplicate” or “ghost” records, creating noise in reporting and analytics. In many cases, these are not true duplicates: items may share identical (or very similar) post text while still being distinct Meta (Facebook) post entities because they have different permalinks/URLs (and therefore different IDs).
In addition, API datasets can include unpublished/dark posts (and sometimes auto-created items such as Facebook “ad relatives”), which may look like “ghost” records when your reporting expects published-only posts.
Solution
Issue
Khoros Marketing API pulls show many entries that look like duplicate or “ghost” records, creating reporting noise.
How to recognize this scenario
- Multiple rows have identical/similar Post Text, but the rows contain different Facebook (Meta) URLs/permalinks.
- API output includes items that are not intended to be counted as “published posts” (for example, unpublished/dark posts).
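As a rough way to check the first symptom, the sketch below groups rows by post text and reports whether each repeated text maps to one permalink or several. The field names `post_text` and `permalink` are assumptions for illustration, not the actual Khoros Marketing API schema; adjust them to match your payload.

```python
from collections import defaultdict

def classify_text_groups(records):
    """For each post text that appears more than once, report whether the
    rows share one permalink (likely a true duplicate) or have several
    (likely distinct posts that reuse the same copy)."""
    permalinks_by_text = defaultdict(set)
    rows_by_text = defaultdict(int)
    for rec in records:
        # "post_text" and "permalink" are hypothetical field names.
        text = (rec.get("post_text") or "").strip()
        permalinks_by_text[text].add(rec.get("permalink"))
        rows_by_text[text] += 1

    report = {}
    for text, count in rows_by_text.items():
        if count < 2:
            continue  # text appears only once; nothing to classify
        distinct = len(permalinks_by_text[text])
        report[text] = "distinct posts" if distinct > 1 else "true duplicate"
    return report

# Made-up sample rows for illustration only.
sample = [
    {"post_text": "Spring sale!", "permalink": "https://example.com/posts/1"},
    {"post_text": "Spring sale!", "permalink": "https://example.com/posts/2"},
    {"post_text": "New store", "permalink": "https://example.com/posts/3"},
    {"post_text": "New store", "permalink": "https://example.com/posts/3"},
]
result = classify_text_groups(sample)
```

Texts reported as "distinct posts" are the ones a text-based dedupe would wrongly collapse.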
Root cause (most common)
- False duplicates caused by an incorrect dedupe key
  - If deduplication is done by Post Text, distinct posts that reuse the same copy can be incorrectly flagged as duplicates.
  - Different Facebook (Meta) URLs/permalinks generally indicate different post entities (different IDs/permalinks), even if the text is the same.
- “Ghost” records are often unpublished/dark posts
  - API datasets can include posts/items with an Unpublished status, missing publish timestamps, or auto-created items (for example, Facebook “ad relatives”) that should be excluded from published-only reporting.
Resolution / Mitigation
1) Fix deduplication logic (use identifiers, not text)
Update your dedupe logic to use one of the following (in order of preference, depending on what your API returns):
- Platform post ID (for example, a stable Facebook post ID)
- Permalink / URL
- A composite key such as:
{platform}:{account_id}:{post_id}
Avoid using Post Text as the primary dedupe key.
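A minimal sketch of identifier-based deduplication follows. It assumes each record is a dictionary with hypothetical `platform`, `account_id`, `post_id`, and `permalink` keys; these names are illustrative and may not match the fields your API pull returns. It prefers the composite key and falls back to the permalink when no post ID is present.

```python
def dedupe_key(rec):
    """Build a dedupe key from identifiers, never from post text.
    All field names here are assumptions, not the Khoros schema."""
    post_id = rec.get("post_id")
    if post_id:
        return f'{rec.get("platform")}:{rec.get("account_id")}:{post_id}'
    # Fall back to the permalink when no post ID is available.
    return rec.get("permalink")

def dedupe(records):
    """Keep the first record seen for each key. Records with no usable
    identifier are kept rather than silently merged."""
    seen = set()
    unique = []
    for rec in records:
        key = dedupe_key(rec)
        if key is None:
            unique.append(rec)
            continue
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique

# Made-up sample: two distinct posts reuse the same copy; one row repeats.
posts = [
    {"platform": "facebook", "account_id": "acct1", "post_id": "101", "post_text": "Spring sale!"},
    {"platform": "facebook", "account_id": "acct1", "post_id": "102", "post_text": "Spring sale!"},
    {"platform": "facebook", "account_id": "acct1", "post_id": "101", "post_text": "Spring sale!"},
]
unique_posts = dedupe(posts)
```

In this sample, the two posts with different IDs both survive even though their text is identical, and only the repeated row for post 101 is dropped.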
2) Filter out unpublished/dark posts in your API consumer (client-side)
For published-only reporting, filter out records that indicate they are not published.
Common filters:
- Exclude where status = "Unpublished" (unpublished = dark post)
- Exclude where Published Date (or equivalent publish timestamp field) is blank/null
- If your payload distinguishes auto-created unpublished items (for example, Facebook “ad relatives”), exclude those categories/types as well to reduce noise
Example (pseudo-logic):
Keep record only if:
status != "Unpublished"
AND published_date IS NOT NULL
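The pseudo-logic above could be written in Python roughly as follows. The `status` and `published_date` keys are assumed names, and the case-insensitive status comparison is a defensive choice made here, not documented API behavior.

```python
def is_published(rec):
    """Return True only for records that look published.
    "status" and "published_date" are hypothetical field names."""
    status = (rec.get("status") or "").strip().lower()
    if status == "unpublished":
        return False
    published_date = rec.get("published_date")
    # Treat both null and blank/whitespace-only timestamps as missing.
    if published_date is None or str(published_date).strip() == "":
        return False
    return True

# Made-up sample rows for illustration only.
rows = [
    {"post_id": "101", "status": "Published", "published_date": "2024-03-01T10:00:00Z"},
    {"post_id": "102", "status": "Unpublished", "published_date": None},
    {"post_id": "103", "status": "Published", "published_date": ""},
]
published_rows = [r for r in rows if is_published(r)]
```

Only post 101 passes: 102 is explicitly unpublished, and 103 has a blank publish timestamp.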
3) Cross-check against Khoros Marketing export
Run a Khoros Marketing Posts data export using the product option “Exclude Unpublished Posts” (from your instance at https://your_instance.domain.com) and compare:
- Number of records
- Presence/absence of the “ghost” items
- Whether “duplicate” items are actually distinct when comparing permalinks/URLs
This confirms whether your API-side filtering matches the product’s published-only view.
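One way to make this comparison concrete is a set difference on identifiers. The sketch assumes both the filtered API records and the export rows have been loaded as lists of dictionaries (for example with `csv.DictReader`) and that they share an identifier column, here hypothetically named `post_id`. Whether the export exposes the same identifier as the API is an assumption to verify in your instance.

```python
def compare_ids(api_records, export_records, key="post_id"):
    """Compare two datasets by identifier and list what each side is missing.
    The default key name is an assumption; pass the column you actually have."""
    api_ids = {r[key] for r in api_records}
    export_ids = {r[key] for r in export_records}
    return {
        "api_count": len(api_ids),
        "export_count": len(export_ids),
        "only_in_api": sorted(api_ids - export_ids),
        "only_in_export": sorted(export_ids - api_ids),
    }

# Made-up sample data for illustration only.
api_filtered = [{"post_id": "101"}, {"post_id": "102"}, {"post_id": "104"}]
export_rows = [{"post_id": "101"}, {"post_id": "102"}]
diff = compare_ids(api_filtered, export_rows)
```

Identifiers listed under `only_in_api` are candidates for remaining “ghost” records that the client-side filter has not yet excluded.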
Verification steps
- Re-run your API pull after implementing:
  - Dedupe by post ID/permalink/URL (not text)
  - Client-side filtering to exclude unpublished/dark posts
- Generate a Posts data export with “Exclude Unpublished Posts” enabled.
- Compare totals and spot-check a sample of previously “duplicate/ghost” entries:
  - Distinct permalinks/URLs should remain as separate posts
  - Unpublished/dark posts should no longer appear in published-only reporting
Notes
- No product defect, version-specific fix, or engineering escalation was identified based on the provided information.
- No platform publishing impact was indicated; the behavior described affects data pulls/reporting visibility.
Frequently Asked Questions
- 1. How can I tell whether these are real duplicates or just reused copy?
- Check whether the rows have different platform permalinks/URLs (or different post IDs). If the Facebook (Meta) URL/permalink differs, they are typically distinct post entities even if the post text matches.
- 2. What should I dedupe on for Khoros Marketing API reporting?
- Dedupe by stable identifiers such as platform post ID, permalink/URL, or a composite key like {platform}:{account_id}:{post_id}. Avoid deduping solely by Post Text.
- 3. What are “ghost” records in API pulls most commonly?
- They are often unpublished/dark posts (and sometimes auto-created platform items such as Facebook “ad relatives”). These can be present in API datasets but should be excluded from published-only reporting.
- 4. How do I exclude unpublished/dark posts from API results?
- Apply client-side filtering, for example:
  - Exclude records where status = "Unpublished", and/or
  - Exclude records where Published Date is blank/null
  If your payload marks auto-created unpublished items separately, exclude those types as well.
- 5. How do I validate that my filtering matches what Khoros Marketing shows?
- Run a Khoros Marketing Posts data export with “Exclude Unpublished Posts” enabled and compare that export to your API dataset after filtering. They should align closely for published-only reporting.
Priyanka Bhotika