Do you think the PowerQuery Remove Duplicate is not working correctly? You may have noticed that PowerQuery’s Remove Duplicate tool does not seem to work exactly the same as Remove Duplicates in Excel. In Excel it always leaves the first duplicate and removes duplicates after the first one. But PowerQuery seems to be more ad hoc and sometimes removes the first item even when you have sorted them to have the first item be the one you want to keep.
Table of contents
YouTube PowerQuery Remove Duplicate deletes first items
PowerQuery ignores sort orders when removing duplicates
As shown below, we have sorted a table in PowerQuery so that the first item in the ‘Matching’ column is always the biggest number with the idea being that when we remove duplicates in the ‘Unique’ column, we will be left with the biggest number in each case.
However, when we do this, PowerQuery seems to ignore the sort order and (apparently) arbitrarily deletes items. In fact it is deleting duplicates based on some initial sort order and NOT the one you are seeing on the screen.
Force PowerQuery to see your sort order (and delete the correct duplicates)
There are some other ways, but the easiest way we have found is to add an index column after you have the data sorted the way you want it to be (as shown below). This seems to reset the sort order so that the Remove Duplicates option now sees the same sort order you are looking at.
Now when we remove duplicates it correctly ALWAYS keeps the first item and deletes any other duplicates.
Related
Remove duplicates NOT WORKING if number formats different
Easy fix for promoting headers when the name can change in PowerQuery
Solution to PowerQuery Create Connection Only being Greyed Out
See, and play with the Excel data when PowerQuery editor is open