Deduplication method for a two-dimensional unordered array

Posted by sonic_2k_uk on Mon, 03 Jan 2022 23:32:53 +0100

Introduction

This article addresses the following problem: given a two-dimensional matrix $X=\{x_1,x_2,x_3,\dots\}$, $x_i\in R^2$, whose elements $x_i$ are unordered, i.e. $x_i=(a,b)\Leftrightarrow x_j=(b,a)$, we want to remove the duplicate elements from $X$. Additional constraint: the mapping from the first column of $X$ to the second column is injective.

Solution

Extract duplicate elements

Use the set type in Python. A set is an unordered collection that supports the usual set operations, such as intersection and union. Convert the rows of the two-dimensional matrix into a set of tuples to get the set $S_1$. Swap the order of the two scalar entries of every element to get the set $S_2$. The intersection $S_1\cap S_2$ then gives the repeated elements.
Code:

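import numpy as np

x=np.array([[1,3],[5,9],[3,11],[2,7],[10,12],[11,3],[4,6],[6,4],[7,2],[8,11]])
# s1: every row as a tuple; s2: every row with its two entries swapped
s1=set(tuple([x[i,0],x[i,1]]) for i in range(x.shape[0]))
s2=set(tuple([x[i,1],x[i,0]]) for i in range(x.shape[0]))
# repeated elements: pairs present in both orientations
bond=s1&s2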

Result:

s1
Out: {(1, 3),(2, 7),(3, 11),(4, 6),(5, 9),(6, 4),(7, 2),(8, 11),(10, 12), (11, 3)}
bond
Out: {(2, 7), (3, 11), (4, 6), (6, 4), (7, 2), (11, 3)}
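
The same intersection can be sketched without an index loop. This is only an illustrative variant, not the code used in the rest of the article: it assumes the rows are converted with `tolist()` (so the tuples hold plain Python integers) and that the swapped set is built by reversing the column order with `x[:, ::-1]`.

```python
import numpy as np

x = np.array([[1, 3], [5, 9], [3, 11], [2, 7], [10, 12],
              [11, 3], [4, 6], [6, 4], [7, 2], [8, 11]])

# Each row as a tuple, in its original orientation.
s1 = set(map(tuple, x.tolist()))
# Each row with its two entries swapped (column order reversed).
s2 = set(map(tuple, x[:, ::-1].tolist()))

# A pair lands in both sets only if its mirror image is also a row of x.
bond = s1 & s2
print(sorted(bond))
# [(2, 7), (3, 11), (4, 6), (6, 4), (7, 2), (11, 3)]
```

Rows such as (1, 3) or (8, 11) drop out because their mirrored form never occurs in x.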

Delete duplicate elements

After obtaining the repeated elements, note that they come in pairs of the form $(a,b),(b,a)$. We therefore cannot simply subtract the set of repeated elements from the original set, because that would lose both members of each pair. Instead, we delete exactly one member of each pair, which achieves the deduplication. The duplicates are deleted as follows.
First, sort the repeated elements by the smaller of the two values in each element. The code and result are as follows:

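# collect the repeated pairs into an array
temp_inter=[]
for i in bond:
    temp_inter.append(i)
temp_inter=np.array(temp_inter)
# order rows by their smaller entry so that (a,b) and (b,a) become adjacent
temp_inter=temp_inter[temp_inter.min(axis=1).argsort()]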

Result:

array([[ 2,  7],
       [ 7,  2],
       [11,  3],
       [ 3, 11],
       [ 6,  4],
       [ 4,  6]])

After this step, the two members of each pair occupy adjacent rows, one at an even index and one at an odd index. Taking every second row with a slice therefore extracts one member of each duplicate pair. The code and result are as follows:

temp_inter=temp_inter[::2,:]

Result:

array([[ 2,  7],
       [11,  3],
       [ 6,  4]])
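
Before slicing, it may be worth checking that the sorted array really has the even/odd pairing described above. The following is a minimal sketch under the assumption that the pairs arrive in an arbitrary order; the variable names and the use of `kind="stable"` are illustrative choices, not part of the original code.

```python
import numpy as np

# The duplicate pairs from the example, in an arbitrary order.
pairs = np.array([[7, 2], [3, 11], [6, 4], [2, 7], [4, 6], [11, 3]])

# Sort rows by their smaller entry; a stable sort keeps ties in input order.
order = pairs.min(axis=1).argsort(kind="stable")
sorted_pairs = pairs[order]

# Rows at even positions and rows at odd positions should mirror each other.
evens = sorted_pairs[::2]
odds = sorted_pairs[1::2]
is_paired = np.array_equal(evens, odds[:, ::-1])
print(is_paired)  # True
```

If `is_paired` comes out False, the slice `[::2]` would not pick exactly one member of each pair, and the later set subtraction should not be trusted.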

The overall code is as follows:

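import numpy as np

x=np.array([[1,3],[5,9],[3,11],[2,7],[10,12],[11,3],[4,6],[6,4],[7,2],[8,11]])
# s1: rows as tuples; s2: rows with their two entries swapped
s1=set(tuple([x[i,0],x[i,1]]) for i in range(x.shape[0]))
s2=set(tuple([x[i,1],x[i,0]]) for i in range(x.shape[0]))
# repeated elements: pairs present in both orientations
bond=s1&s2
temp_inter=[]
for i in bond:
    temp_inter.append(i)
temp_inter=np.array(temp_inter)
# sort by the smaller entry so the two members of a pair are adjacent
temp_inter=temp_inter[temp_inter.min(axis=1).argsort()]
# keep one member of each pair and remove it from the original set
temp_inter=temp_inter[::2,:]
temp_inter=set(tuple([temp_inter[i,0],temp_inter[i,1]]) for i in range(temp_inter.shape[0]))
s1=s1-temp_inter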
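
A quick way to sanity-check the final set is to test whether any element still has its mirror image present. The helper below is a hypothetical addition (the name `has_mirrored_pair` does not come from the original code) and is shown on two small hand-written sets.

```python
def has_mirrored_pair(pairs):
    """Return True if some (a, b) with a != b appears together with (b, a)."""
    return any(a != b and (b, a) in pairs for a, b in pairs)

before = {(1, 3), (2, 7), (7, 2), (5, 9)}   # still contains (2, 7) and (7, 2)
after = {(1, 3), (7, 2), (5, 9)}            # one member of the pair removed

print(has_mirrored_pair(before), has_mirrored_pair(after))  # True False
```

Applied to the final `s1`, a result of False would indicate that no unordered duplicate remains.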

Summary

It is worth noting that this deduplication method has a limitation. The implicit condition is as follows.
If the first column of the two-dimensional matrix is regarded as the independent variable and the second column as the dependent variable, the mapping between them must be injective. That is, although repeated elements of the form $x_i=(a,b)\Leftrightarrow x_j=(b,a)$ may exist, there must not exist both $x_i=(a,b)\Leftrightarrow x_j=(b,a)$ and $x'_i=(a,c)\Leftrightarrow x'_j=(c,a)$ at the same time. When two pairs share the same smaller value, sorting by that value no longer guarantees that the two members of a pair end up adjacent, so the above code may give wrong results, for example:

Repeated elements: (2,4), (2,3), (3,2), (4,2)
Order after sorting: (2,4), (2,3), (4,2), (3,2)
Extracted duplicate elements: (2,3), (3,2)
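
One possible way around this limitation is a different technique from the set-based method above: bring every row into a canonical order first, then keep the first occurrence of each canonical row. The sketch below assumes NumPy's `np.sort` and `np.unique` (with `axis=0` and `return_index=True`); the function name `dedup_unordered_rows` is hypothetical.

```python
import numpy as np

def dedup_unordered_rows(x):
    """Drop rows that repeat an earlier row up to the order of its entries."""
    x = np.asarray(x)
    # Canonical form: sort within each row, so (b, a) becomes (a, b).
    canonical = np.sort(x, axis=1)
    # Index of the first occurrence of every distinct canonical row.
    _, first_idx = np.unique(canonical, axis=0, return_index=True)
    # Restore the original row order and return the original orientation.
    return x[np.sort(first_idx)]

# The problematic case from the example above.
x = np.array([[2, 4], [2, 3], [3, 2], [4, 2]])
print(dedup_unordered_rows(x).tolist())  # [[2, 4], [2, 3]]
```

Because rows are compared in canonical form, no sorting-by-minimum or even/odd slicing is needed, so the injectivity condition is not required here. Note that this sketch also removes exact repeats of a row, as converting to a set would.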

Topics: Python, linear algebra