|
|
Transcripts for the above video clip:
Find Duplicates Automatically
You sometimes need
to use Excel to check in a data set where there are duplicate
values. So in this example, what we have got here is a transaction
number and you’ll see the first one is 100123 and as you come down
you’ll see that same number is repeated a couple of times. So what
we want to do is to tell Excel to please find where these situations
happen. The function to use is the COUNT IF function. The key logic is how we are going to do the
counting. Just working back we want Excel to look at the first row
and to say, that transaction number - have I seen it before? –
just looking upwards. When it goes to the next cell, it must look
between these two and say, have I seen it before, yes or no – how
many times? Eventually we will go all the way down here and by the
time we get to this cell (B18), what it will do is count how many
times its seen this transaction before and in this case it will say,
oh, there is another one here. So you are always looking above. So
lets do the “COUNT IF”
function. The range is where we are going to make a major
difference. So I’m going to say please look at
the range from here and in this case by pushing semi-colon
(;), you’ve got the same range (B11;B11) – and the criteria,
I’m going to make it this cell (B11). What is key is, as I copy it
down, I want the first cell to stay the same (of the range) but the
second reference can move. So I’m going to freeze SBS11 and press
ok. If you see Excel’s result – it has found one time of this
transaction. If I copy it down to G31. As we go down, you’ll see
the area it looks at is ever increasing until it gets to row G18
where its now looking and saying in this range, I have found
this transaction twice – and going down you’ll see 3, 4, 5 etc.
And now what we want to do is in a separate column, we want to tell
Excel where is the first instance of any particular transaction
number shown. So, for example, this 2, 3, 4 – all of them are
irrelevant to us – we don’t actually care – we know that that
means that they are duplicates – all we care about is the 1’s.
So a simple way to do it is use the “IF”
function and just say, if this is a 1, that’s fine – we’ll
just put a 1 for example, if its not a 1 put a 0 or another word. So
I’m just going to use the function wizard, go to the “IF”
function- the logical test here is that if this cell is = to 1, then
I know that that is a unique situation, if its true in my case I’m
going to put a 1, you may put a word saying “unique” or whatever
you want it to be. If its not a 1, then it must be a duplicate, so
you can either be a word but I am going to put a 0 in, say ok and
now I copy it down. So what you’ll see here is that all unique
transaction numbers, get 1s. Wherever there are duplicates, you now
get 0s. If you wanted to now, you could set up the data filter. You
could say just show me the 1s and you will see only the unique
numbers. What we can now do is take it a step further. What we want
to do is uniquely identify the unique items. So what we want to be
able to say is- give an absolutely unique number to the lines
that are unique – so for example, that will be a 1, that
will be a 2. When it gets down here where there are duplicates, we
don’t actually want numbers for these, but we want unique numbers
here. This will allow you to do things like “VLOOKUP”
on the set of data. The way you can do that
is by using an “IF”
function. So let’s activate the function wizard, get the “IF” function. What we want to do now, is every time we see a 1, in
this first instance column, I want to have a unique number so I’ll
take a number and add 1,
if there’s not a 1, and
there’s a 0, then I want Excel to ignore it and put a 0. So the
logical test is, is this cell = 1. If its true, and again we are
going to use a growing boundary – I want it to sum. So we go from
here to itself (I am just going to freeze that). If its false I want
to put a 0. Now this is the key – the easiest way to show what
this is doing is to copy it down – say ok, copy it down. What
you’ll see the first time it finds a
1 so it adds the 1 in if
you go down again – it sees a 1 so it adds these 2 together. If we
go down again, it sees a 1 so adds these 3 together. When it gets to
the first 0, it sees 0 and we have told it- please make it a 0. When
it finally finds another 1, it then goes and adds all of them
together. What you can now see, is that you have got an absolutely
unique number for every unique row of information and you’ll see
it goes up to 13. At
this point now it is possible to extract all this information using
something like a VLOOKUP command. What you could do is using a VLOOKUP command you could set up a separate spreadsheet, have a
grouping of numbers – 1,2 3 etc and all the VLOOKUP will do is it will look on this table, find the unique item
called 1 and maybe pull through this set of information. Another
problem with duplicate values is that sometimes you are not looking
for a duplicate value at this level, you maybe want it to be the
same transaction number, the date, etc. For this methodology what
you can then do is you need to create a single cell that represents
absolute uniqueness. The way you will see I
have done it here is I have used CONCATENATE
and I’ve joined all the cells together in this case - so what
I’m going to be looking for is the exact duplicate where columns
A, B, C, D etc are the same. The only way to do this is to create a
new column, create this unique key and then follow through and do
the exact same logic here (count how many times you’ve seen this
before).
|