Your Data Migration Was a Crime Scene: What Nobody Tells You About Moving Data into Salesforce
How to plan, validate, and execute a data migration without creating 12,000 duplicates
Every data migration is a crime scene. The question is whether the evidence was preserved or destroyed.
A client migrated 70,000 records from their legacy system last year. The migration was handled by a junior admin and a contractor with Data Loader experience. The timeline was tight: 3 weeks from kickoff to go-live. The stakeholders wanted the old system decommissioned by the end of the quarter.
They skipped deduplication because "we'll clean it up after." They skipped field mapping validation because "the fields are the same." They skipped UAT because "we're behind schedule."
The result: 12,000 duplicate Accounts. 8,000 orphaned Contacts (Contact records with no Account association because the Account lookup mapping failed silently). 3,200 Opportunities with amounts in the wrong field (the source system stored amounts in cents, Salesforce stores them in dollars; nobody caught the conversion issue). The annual revenue dashboard showed $4.2M in pipeline. The actual pipeline was $42,000. Two decimal places.
The cleanup project: 6 weeks, 2 consultants, $25,000. The cleanup cost more than the original migration. The worst part: 4 months after the cleanup, users were still finding orphaned records that the deduplication missed.
"We'll clean it up after" is the most expensive sentence in Salesforce. Nobody cleans it up after.
Why Data Migrations Fail
No Deduplication Before Loading
The source system has duplicates. Every legacy system has duplicates. If you load 70,000 records without deduplicating first, you're importing every duplicate the legacy system accumulated over its lifetime. Those duplicates are now in Salesforce, mixed with any existing records, and far harder to identify because they're in a new system with different record IDs.
No Field Mapping Validation
"The fields are the same" is almost never true. The source system has a field called "Amount" that stores values in cents. Salesforce has a field called "Amount" that stores values in dollars. Both are currency fields. Both are named "Amount." The data loads successfully. Every dollar amount is 100x too high. Nobody notices until the CFO asks why the pipeline report shows $4.2M when last month it showed $42K.
Field mapping validation means checking: data types match, units match, picklist values match, and formatting is consistent. A 2-hour validation process prevents a 6-week cleanup.
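Part of that validation can be scripted. Here is a minimal Python sketch of two mechanical checks, a magnitude test that would have caught the cents-vs-dollars mismatch and a picklist membership test; the column names, the expected median deal size, and the stage values are all assumptions to replace with your own mapping document:

```python
import csv
import statistics

# Assumption: the business knows its typical deal size in dollars.
# If the source stores cents, the median will be roughly 100x this.
EXPECTED_MEDIAN_DOLLARS = 500
# Assumption: the target org's Stage picklist values.
VALID_STAGES = {"Prospecting", "Negotiation", "Closed Won", "Closed Lost"}

def validate(path):
    """Flag likely unit mismatches and unmapped picklist values in a source CSV."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    amounts = [float(r["Amount"]) for r in rows if r.get("Amount")]
    # A median 50x+ above the expected deal size suggests a unit mismatch.
    if amounts and statistics.median(amounts) > EXPECTED_MEDIAN_DOLLARS * 50:
        problems.append("Amount looks ~100x too high -- stored in cents?")
    bad_stages = {r["Stage"] for r in rows} - VALID_STAGES
    if bad_stages:
        problems.append(f"Stage values with no picklist match: {sorted(bad_stages)}")
    return problems
```

Two hours spent writing checks like these against every currency, date, and picklist column is the "2-hour validation process" in practice.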
No UAT with Real Users
The admin loads the data, checks that record counts match, and declares the migration complete. Nobody asks the sales team: "Does this data look right?" Nobody asks the finance team: "Do these amounts match your records?" The people who know the data best are never consulted until they start complaining, which is usually 2-3 weeks after go-live when the problems have compounded.
Lookup Relationship Mismatches
Salesforce uses 18-character record IDs for relationships. The source system uses its own ID format. During migration, Contact records need to be linked to Account records. If the mapping between legacy Account IDs and Salesforce Account IDs is incorrect or incomplete, Contacts get linked to the wrong Accounts or to no Account at all. 8,000 orphaned Contacts is what "no Account at all" looks like.
No Rollback Plan
Most migrations have no "undo" button. If the migration goes wrong, the team has to manually clean up in production. A rollback plan means: before migration, export a complete backup of the target objects. If migration fails, you can delete the migrated records and restore from backup. Without a rollback plan, you're committed to whatever you loaded.
How to Plan a Data Migration Properly
Phase 1: Source Data Assessment (Week 1)
Before moving any data, assess what you're working with.
Export the source data. Get complete exports from the legacy system in CSV format. One file per object/entity type.
Count records per entity. How many Accounts? Contacts? Opportunities? Donations? Get exact counts. These become your validation benchmarks.
Profile the data quality. Open each CSV and check:
- How many records have blank required fields?
- Are there obvious duplicates? (Same name, same email, different records)
- What formats are used? (Date formats, phone formats, currency formats)
- Are there lookup relationships between entities? (Contacts linked to Accounts via a shared ID)
- What character encoding is used? (UTF-8, Latin-1, Windows-1252. This matters for names with accents or special characters.)
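Most of this profiling pass can be scripted against each export. A stdlib-Python sketch; the required-field list and the duplicate-matching key are assumptions to adjust per entity:

```python
import csv
from collections import Counter

# Assumption: replace with the target object's actual required fields.
REQUIRED = ["Name", "Email"]

def profile(path):
    """Quick data-quality profile of one legacy CSV export."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # Blank-required-field counts become your Phase 2 cleanup backlog.
    blanks = {field: sum(1 for r in rows if not (r.get(field) or "").strip())
              for field in REQUIRED}
    # Candidate duplicates: same name + same email, case-insensitive.
    keys = Counter(((r.get("Name") or "").lower(), (r.get("Email") or "").lower())
                   for r in rows)
    dupes = sum(n - 1 for n in keys.values() if n > 1)
    return {"records": len(rows), "blank_required": blanks,
            "duplicate_candidates": dupes}
```

The `records` count doubles as the validation benchmark you'll compare against after loading.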
Identify the deduplication scope. How will you define "duplicate"? Same name + same email? Same name + same address? Same company name? Define matching rules for each entity before you start cleaning.
Phase 2: Data Preparation (Week 2)
Deduplicate the source data. Before loading into Salesforce, remove or merge duplicates in the CSV files. Tools for this: Excel (for small datasets), OpenRefine (for medium datasets), DemandTools from Validity, formerly CRMfusion (for large datasets with fuzzy matching).
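For exact-match rules, the remove step can be as simple as keeping the first record per matching key. A stdlib sketch under that assumption; first-record-wins is the simplest policy, and real merges that combine field values from both records need one of the tools above:

```python
import csv

def dedupe(src, dst, key_fields=("Name", "Email")):
    """Copy src to dst, keeping only the first record per matching key.
    The name+email key is an assumption -- use the matching rules you
    defined per entity in Phase 1."""
    seen = set()
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            key = tuple((row.get(f) or "").strip().lower() for f in key_fields)
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
```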
Validate field mappings. Create a mapping document:
| Source Field | Source Format | Target Field | Target Format | Transformation Needed |
|---|---|---|---|---|
| Amount | Integer (cents) | Amount | Currency (dollars) | Divide by 100 |
| Phone | 10-digit no formatting | Phone | (XXX) XXX-XXXX | Format conversion |
| State | Full name | BillingState | 2-letter abbreviation | Mapping table |
| Created Date | MM/DD/YY | CreatedDate | YYYY-MM-DD | Format conversion |
For every field where the format differs, write the transformation rule. Apply the transformations to the CSV data before loading.
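The four rules in the table above are mechanical enough to script. A Python sketch of a per-record transform; the column names mirror the table, and the state mapping is truncated to two entries for illustration:

```python
import csv
from datetime import datetime

# Assumption: truncated for illustration -- build the full table from a
# standard state-abbreviation list.
STATE_MAP = {"California": "CA", "New York": "NY"}

def transform(row):
    """Apply the mapping document's transformation rules to one record."""
    out = dict(row)
    out["Amount"] = f"{int(row['Amount']) / 100:.2f}"        # cents -> dollars
    d = row["Phone"]                                          # assumes 10 digits
    out["Phone"] = f"({d[:3]}) {d[3:6]}-{d[6:]}"              # -> (XXX) XXX-XXXX
    out["State"] = STATE_MAP.get(row["State"], row["State"])  # full name -> abbrev
    out["Created Date"] = datetime.strptime(
        row["Created Date"], "%m/%d/%y").strftime("%Y-%m-%d")  # MM/DD/YY -> ISO
    return out
```

Run every source row through the transform, write the result to a new CSV, and load only the transformed file.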
Resolve lookup relationships. If Contacts need to be linked to Accounts, you need a mapping key. The most reliable approach: create an External ID field on the target object in Salesforce.
Navigate to: Setup → Object Manager → Account → Fields & Relationships → New → Text data type (check the External ID and Unique checkboxes)
Name the field Legacy_ID__c. Populate it with the source system's unique identifier during the Account migration. Then, when migrating Contacts, use the Account's Legacy_ID__c as the lookup reference instead of the Salesforce Record ID. Data Loader supports External ID lookups natively.
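In practice this means rewriting the Contact CSV so the account-reference column carries the relationship header Data Loader expects for external-ID lookups during upsert. A sketch, assuming the legacy export names that column AccountRef:

```python
import csv

def prepare_contacts(src, dst):
    """Rename the legacy account-reference column to Data Loader's
    relationship-by-External-ID header. "AccountRef" is an assumption
    about your source export's column name."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        fields = ["Account:Legacy_ID__c" if f == "AccountRef" else f
                  for f in reader.fieldnames]
        writer = csv.DictWriter(fout, fieldnames=fields)
        writer.writeheader()
        for row in reader:
            row["Account:Legacy_ID__c"] = row.pop("AccountRef")
            writer.writerow(row)
```

Salesforce then resolves each Contact's Account by looking up the legacy ID, so you never have to export and cross-reference Salesforce record IDs yourself.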
Populate required fields. Salesforce has required fields (both standard and custom) that must be populated for a record to save. Check every required field on the target objects. If the source data doesn't have values for required fields, you need to either populate them with default values or make the fields non-required during migration and re-enable the requirement afterward.
Phase 3: Test Load (Week 2-3)
Load 100 records as a test. Use Data Loader to load 100 records from your prepared CSV into a sandbox. Not production. Never production.
Navigate to: Data Loader → Insert → select the target object and your test CSV (100 records) → map the fields → Finish
Review the test results:
- Did all 100 records load? Check the success file and error file that Data Loader generates.
- Do the field values look correct? Open 10 random records in Salesforce and compare them against the source data.
- Are lookup relationships correct? Do Contacts point to the right Accounts?
- Do currency amounts, dates, and formatted fields display correctly?
- Did any automation fire on the loaded records? (Flows, triggers, assignment rules) Did they produce expected results?
Have a business user review. Ask someone who knows the data to look at 10-20 records and verify accuracy. The admin checks the technical load. The business user checks the semantic accuracy: "Is this the right data for this customer?"
Phase 4: Full Migration (Week 3)
Disable automations before loading. Record-Triggered Flows, validation rules, and workflow rules will fire on every record as it's inserted. This can cause errors (validation rules blocking inserts), performance issues (47 Flows firing on 70,000 records), and unwanted side effects (welcome emails sent to 70,000 imported Contacts).
Navigate to: Setup → Flows → deactivate Record-Triggered Flows on the objects you're migrating
Navigate to: Setup → Object Manager → [Object] → Validation Rules → deactivate each rule
Make a list of everything you deactivate. You'll reactivate them after migration.
Load in batches. Don't load 70,000 records in one batch. Load in batches of 5,000-10,000. This keeps each load within Bulk API batch limits, makes errors easier to diagnose (which batch failed?), and allows you to spot-check between batches.
Validate record counts. After each batch, compare your Salesforce record count against the expected count. If you loaded 5,000 records and Salesforce shows 4,800, investigate the 200 failures before loading the next batch.
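Splitting the prepared file into batches is scriptable too. A stdlib sketch that writes numbered batch files; the 5,000 default matches the guidance above, and the batch_001.csv naming is an assumption:

```python
import csv

def split_batches(src, batch_size=5000, prefix="batch"):
    """Split a prepared CSV into files of batch_size records each, so every
    Data Loader run is small enough to diagnose and spot-check."""
    paths = []
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        batch, n = [], 0

        def flush():
            nonlocal batch, n
            if not batch:
                return
            n += 1
            path = f"{prefix}_{n:03d}.csv"
            with open(path, "w", newline="", encoding="utf-8") as out:
                w = csv.DictWriter(out, fieldnames=reader.fieldnames)
                w.writeheader()
                w.writerows(batch)
            paths.append(path)
            batch = []

        for row in reader:
            batch.append(row)
            if len(batch) == batch_size:
                flush()
        flush()  # write the final partial batch
    return paths
```

Load the files in order, and reconcile counts after each one before moving to the next.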
Run the full count validation at the end (Developer Console → Query Editor):
SELECT COUNT(Id) FROM Account
SELECT COUNT(Id) FROM Contact
SELECT COUNT(Id) FROM Opportunity
Compare against your source data counts. If the counts don't match, check the error files from Data Loader for the discrepancy.
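The arithmetic behind that check is simple: loaded plus failed should equal the source count, and anything else means records vanished silently. A sketch that tallies the success and error CSVs Data Loader writes after each run (the file names are yours to supply):

```python
import csv

def count_rows(path):
    """Count data rows in a CSV, header excluded."""
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))

def reconcile(source_count, success_file, error_file):
    """Reconcile a load against its source count using Data Loader's
    success and error output files."""
    ok = count_rows(success_file)
    failed = count_rows(error_file)
    return {"loaded": ok, "failed": failed,
            "unaccounted": source_count - ok - failed}
```

Anything nonzero in `unaccounted` warrants investigation before you reactivate automations.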
Reactivate automations. Turn everything back on in the order you turned it off. Test with a single new record to verify automations fire correctly on new data while coexisting with migrated data.
Phase 5: Post-Migration Validation (Week 3-4)
Run the Data Quality Assessment. Use the 10-metric template from Article 4 of this series. Check owner population, field completion, duplicate rate, and format consistency on the migrated data.
Business user sign-off. Have representatives from each department review their data in Salesforce. Do their Accounts look right? Are the Contact relationships correct? Do the historical Opportunities show accurate amounts and stages? Get explicit sign-off before decommissioning the legacy system.
Keep the legacy system accessible for 90 days. Don't decommission the source system immediately. Keep it available (read-only) for 90 days so users can reference it if they find discrepancies in the migrated data. After 90 days with no issues, decommission.
How to Prevent Migration Problems
Never skip deduplication. The cost of deduplicating 70,000 records before migration: 4-8 hours. The cost of deduplicating 70,000 records after migration (when they've been mixed with existing data and had automations fire on them): 6 weeks and $25,000. The math is not ambiguous.
External ID on every migration target. Every object that receives migrated data should have an External ID field populated with the source system's unique identifier. This makes post-migration troubleshooting possible: you can trace any Salesforce record back to its source record.
Field mapping review with a business user. The admin handles the technical mapping. A business user validates the semantic accuracy. "Amount in cents vs. dollars" is a business-level error, not a technical one. The person who understands the data catches errors the person who understands the tool doesn't.
Pre-migration backup. Before loading any records, export a complete backup of every object you'll be touching. If the migration creates problems, you have a clean restore point.
Download the Data Migration Pre-Flight Checklist
The Data Migration Pre-Flight Checklist is a one-page printable PDF with 15 validation points organized across pre-migration, during-migration, and post-migration phases. Print it. Tape it to your monitor. The 2 hours this checklist costs you will save you 6 weeks of cleanup.
Download it below. It's free.
Part 8 of 10 in the series: What Your Salesforce Org Says About Your Company.

