How to Find Duplicates Automatically in Excel using the COUNTIF function.

Related-View our course on cleaning data with Excel

- - 5:20
- - Skill Level: 5
- - What's my skill level

## Find Duplicates Automatically in Excel

You sometimes need to find duplicates automatically in Excel.

So in this example, what we have got here is a transaction number and you’ll see the first one is 100123 and as you come down you’ll see that same number is repeated a couple of times. So what we want to do is to tell Excel to please find where these situations happen.

The function to use is the **COUNT IF** function. The key logic is how we are going to do the counting. Just working back we want Excel to look at the first row and to say, that transaction number – have I seen it before? – just looking upwards. When it goes to the next cell, it must look between these two and say, have I seen it before, yes or no – how many times? Eventually we will go all the way down here and by the time we get to this cell (B18), what it will do is count how many times its seen this transaction before and in this case it will say, oh, there is another one here. So you are always looking above. So lets do the “**COUNT IF**” function. The range is where we are going to make a major difference. So I’m going to say please look at the range from here and in this case by pushing semi-colon (;), you’ve got the same range (B11;B11) – and the criteria, I’m going to make it this cell (B11).

What is key is, as I copy it down, I want the first cell to stay the same (of the range) but the second reference can move. So I’m going to freeze SBS11 and press ok. If you see Excel’s result – it has found one time of this transaction. If I copy it down to G31. As we go down, you’ll see the area it looks at is ever increasing until it gets to row G18 where its now looking and saying in this range, I have found this transaction twice – and going down you’ll see 3, 4, 5 etc. And now what we want to do is in a separate column, we want to tell Excel where is the first instance of any particular transaction number shown.

So, for example, this 2, 3, 4 – all of them are irrelevant to us – we don’t actually care – we know that that means that they are duplicates – all we care about is the 1’s. So a simple way to do it is use the “**IF”** function and just say, if this is a 1, that’s fine – we’ll just put a 1 for example, if its not a 1 put a 0 or another word. So I’m just going to use the function wizard, go to the “**IF**” function- the logical test here is that if this cell is = to 1, then I know that that is a unique situation, if its true in my case I’m going to put a 1, you may put a word saying “unique” or whatever you want it to be. If its not a 1, then it must be a duplicate, so you can either be a word but I am going to put a 0 in, say ok and now I copy it down. So what you’ll see here is that all unique transaction numbers, get 1s.

Wherever there are duplicates, you now get 0s. If you wanted to now, you could set up the data filter. You could say just show me the 1s and you will see only the unique numbers. What we can now do is take it a step further. What we want to do is uniquely identify the unique items. So what we want to be able to say is- give an absolutely unique number to the lines that are unique – so for example, that will be a 1, that will be a 2. When it gets down here where there are duplicates, we don’t actually want numbers for these, but we want unique numbers here.

This will allow you to do things like “**VLOOKUP**” on the set of data. The way you can do that is by using an “**IF**” function. So let’s activate the function wizard, get the “**IF**” function. What we want to do now, is every time we see a 1, in this first instance column, I want to have a unique number so I’ll take a number and add 1, if there’s not a 1, and there’s a 0, then I want Excel to ignore it and put a 0. So the logical test is, is this cell = 1. If its true, and again we are going to use a growing boundary – I want it to sum. So we go from here to itself (I am just going to freeze that). If its false I want to put a 0.

Now this is the key – the easiest way to show what this is doing is to copy it down – say ok, copy it down. What you’ll see the first time it finds a 1 so it adds the 1 in if you go down again – it sees a 1 so it adds these 2 together. If we go down again, it sees a 1 so adds these 3 together. When it gets to the first 0, it sees 0 and we have told it- please make it a 0. When it finally finds another 1, it then goes and adds all of them together. What you can now see, is that you have got an absolutely unique number for every unique row of information and you’ll see it goes up to 13. At this point now it is possible to extract all this information using something like a **VLOOKUP** command.

What you could do is using a **VLOOKUP** command you could set up a separate spreadsheet, have a grouping of numbers – 1,2 3 etc and all the **VLOOKUP** will do is it will look on this table, find the unique item called 1 and maybe pull through this set of information. Another problem with duplicate values is that sometimes you are not looking for a duplicate value at this level, you maybe want it to be the same transaction number, the date, etc. For this methodology what you can then do is you need to create a single cell that represents absolute uniqueness.

The way you will see I have done it here is I have used **CONCATENATE **and I’ve joined all the cells together in this case – so what I’m going to be looking for is the exact duplicate where columns A, B, C, D etc are the same. The only way to do this is to create a new column, create this unique key and then follow through and do the exact same logic here (count how many times you’ve seen this before).