PowerQuery Remove Duplicate not working correctly

Do you think the PowerQuery Remove Duplicate is not working correctly? You may have noticed that PowerQuery’s Remove Duplicate tool does not seem to work exactly the same as Remove Duplicates in Excel. In Excel it always leaves the first duplicate and removes duplicates after the first one. But PowerQuery seems to be more ad hoc and sometimes removes the first item even when you have sorted them to have the first item be the one you want to keep.

YouTube PowerQuery Remove Duplicate deletes first items

PowerQuery ignores sort orders when removing duplicates

As shown below, we have sorted a table in PowerQuery so that the first item in the ‘Matching’ column is always the biggest number with the idea being that when we remove duplicates in the ‘Unique’ column, we will be left with the biggest number in each case.

PowerQuery Remove Duplicate not working correctly

However, when we do this, PowerQuery seems to ignore the sort order and (apparently) arbitrarily deletes items. In fact it is deleting duplicates based on some initial sort order and NOT the one you are seeing on the screen.

PowerQuery Remove Duplicate not working correctly

Force PowerQuery to see your sort order (and delete the correct duplicates)

There are some other ways, but the easiest way we have found is to add an index column after you have the data sorted the way you want it to be (as shown below). This seems to reset the sort order so that the Remove Duplicates option now sees the same sort order you are looking at.

PowerQuery Remove Duplicate not working correctly

Now when we remove duplicates it correctly ALWAYS keeps the first item and deletes any other duplicates.

Remove duplicates NOT WORKING if number formats different

Easy fix for promoting headers when the name can change in PowerQuery

Solution to PowerQuery Create Connection Only being Greyed Out

See, and play with the Excel data when PowerQuery editor is open