python - How to remove duplicates in a numpy array and keep its sorting


asked Jan 27, 2021 in Technique by 深蓝 (71.8m points)


I have a list of numpy arrays and want to remove duplicates and also keep the order of my sorted data. This is my array with duplicates:

dup_arr=[np.array([[0., 10., 10.],
                   [0., 2., 30.],
                   [0., 3., 5.],
                   [0., 3., 5.],
                   [0., 3., 40.]]),
         np.array([[0., -1., -4.],
                   [0., -2., -3.],
                   [0., -3., -5.],
                   [0., -3., -6.],
                   [0., -3., -6.]])]

I tried to do it using the following code:

clean_arr=[]
for i in dup_arr:
    new_array = [tuple(row) for row in i]
    uniques = np.unique(new_array, axis=0)
    clean_arr.append(uniques)

The problem with this method is that it changes the order of my rows, and I do not want to sort them again because that is hard to do for my real data. I want the following result:

clean_arr=[np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.]])]

But the code above reorders the rows. I also tried the following for loops, which did not work either: the inner loop stops one row before the end of each array, so the last row is never checked, and the results are not grouped per array.

clean_arr=[]
for arrays in dup_arr:
    for rows in range(len(arrays) - 1):
        if np.all(arrays[rows] == arrays[rows + 1]):
            continue
        else:
            dat= arrays [rows]
            clean_arr.append(dat)

Thanks in advance for any help.


1 Answer

深蓝 · Answer 1 · 2021-01-26T20:36:11+0000

You can use np.unique with axis=0 to drop duplicate rows, but on its own it returns the rows in sorted order. To keep the order of the original sequence, try this:

[i[np.sort(np.unique(i, axis=0, return_index=True)[1])] for i in dup_arr]

[array([[ 0., 10., 10.],
        [ 0.,  2., 30.],
        [ 0.,  3.,  5.],
        [ 0.,  3., 40.]]),
 array([[ 0., -1., -4.],
        [ 0., -2., -3.],
        [ 0., -3., -5.],
        [ 0., -3., -6.]])]

np.unique(i, axis=0, return_index=True)[1] returns, for each unique row, the index of its first occurrence in i.
np.sort() puts these indexes back in ascending order, so indexing i with them restores the original row order.
The list comprehension applies these two steps to each array in dup_arr.
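
To make the two steps concrete, here is a small sketch that runs them separately on the first array from the question. The variable names (unique_rows, first_idx, keep, result) are only illustrative.

```python
import numpy as np

a = np.array([[0., 10., 10.],
              [0., 2., 30.],
              [0., 3., 5.],
              [0., 3., 5.],
              [0., 3., 40.]])

# Step 1: np.unique returns the unique rows in sorted order, together with
# the index of the first occurrence of each unique row in `a`.
unique_rows, first_idx = np.unique(a, axis=0, return_index=True)

# Step 2: sorting those indexes puts them back in the original row order.
keep = np.sort(first_idx)
result = a[keep]

print(keep)    # row 3 is missing: it duplicates row 2
print(result)
```

Here keep comes out as [0 1 2 4], so the second copy of [0., 3., 5.] is dropped and the remaining rows stay where they were.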

NOTE: You will NOT be able to completely vectorize this operation (say, by calling np.stack on the results), since a different number of duplicates may be removed from each matrix. The resulting arrays would then have unequal shapes along an axis, which a single numpy array cannot hold.
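
A minimal sketch of the shape problem, using made-up input (ragged_input and dedupe_rows are hypothetical names, not from the question) in which only one of the two arrays contains a duplicate row:

```python
import numpy as np

def dedupe_rows(a):
    # Keep the first occurrence of each row, in original order.
    first_idx = np.unique(a, axis=0, return_index=True)[1]
    return a[np.sort(first_idx)]

# Hypothetical data: the first array has one duplicate row, the second has none.
ragged_input = [np.array([[1., 2.], [1., 2.], [3., 4.]]),
                np.array([[5., 6.], [7., 8.], [9., 0.]])]

cleaned = [dedupe_rows(a) for a in ragged_input]
shapes = [c.shape for c in cleaned]
print(shapes)  # the two results no longer have the same number of rows

# np.stack requires all inputs to share one shape, so it fails here.
try:
    np.stack(cleaned)
    stacked_ok = True
except ValueError:
    stacked_ok = False
print(stacked_ok)
```

This is why the result is kept as a plain Python list of arrays rather than a single stacked array.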

The same steps written as a function:

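def f(a):
    # Index of the first occurrence of each unique row
    indexes = np.unique(a, axis=0, return_index=True)[1]
    # Sorting the indexes restores the original row order
    return a[np.sort(indexes)]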

[f(i) for i in dup_arr]
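
If duplicate rows are always next to each other, as they are in the question's sample data, comparing each row with the one before it may be enough and avoids the sort inside np.unique. This is only a sketch under that adjacency assumption; drop_adjacent_duplicates is a hypothetical helper name, and it will not catch duplicates that are separated by other rows.

```python
import numpy as np

def drop_adjacent_duplicates(a):
    # ASSUMPTION: duplicate rows are adjacent; non-adjacent repeats are kept.
    if len(a) == 0:
        return a
    # True where a row differs from the row directly before it.
    changed = np.any(a[1:] != a[:-1], axis=1)
    # The first row has no predecessor, so it is always kept.
    keep = np.concatenate(([True], changed))
    return a[keep]

dup_arr = [np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.],
                     [0., -3., -6.]])]

clean_arr = [drop_adjacent_duplicates(a) for a in dup_arr]
print(clean_arr)
```

Unlike the loop in the question, this compares every row (including the last one) and keeps the output grouped per input array.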
