2016-04-29

Python: replacing for loops with NumPy array operations

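The problem:

In the previous post, k-means clustering gave me a 4-by-3 matrix holding the colour values of 4 cluster centres. To find the index of the cluster with the largest colour residual, I have to subtract the mean of the first and second columns from the third column, then find the largest difference and return its index.
For a 4-by-3 matrix a for loop seems perfectly fine, but what if the matrix were 1000 by 1000? Visiting every element would then take a staggering 10^6 iterations.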

Approach:

To get a result quickly, I first wrote the code below, which follows the C++ habit of using a for loop:

import numpy as np

def find_max(center):
    print("Centre coordinates:")
    print(center)
    m, n = np.shape(center)  # m rows (centres), n columns (colour channels)
    center_zero = np.float32(center)
    # Residual of the first row: third column minus the mean of the first two
    red_max = center_zero[0][2] - (center_zero[0][0] + center_zero[0][1]) / 2.0
    print(red_max)
    idmax = 0
    # One iteration per remaining row, so the bound is m (rows), not n (columns)
    for i in range(1, m):
        res = center_zero[i][2] - (center_zero[i][0] + center_zero[i][1]) / 2.0
        print(res)
        if res > red_max:
            red_max = res
            idmax = i
    # Keep only the selected row; every other row stays zero
    zero = np.zeros([m, n])
    zero[idmax] = center[idmax]
    print("Selected class [{}]".format(idmax + 1))
    print("Coordinate values:")
    print(zero)

Output:

cen = np.float32(np.array([[137,143,134],[94,115,107],[52,81,107],[111,127,117]]))
find_max(cen)
>>> Centre coordinates:
>>>	[[137 143 134]
	 [ 94 115 107]
	 [ 52  81 107]
	 [111 127 117]]
>>> -6.0
>>> 2.5
>>> 40.5
>>> -2.0
>>> Selected class [3] as the fish
>>> Coordinate values:
>>>	[[   0.    0.    0.]
	 [   0.    0.    0.]
	 [  52.   81.  107.]
	 [   0.    0.    0.]]
>>> [Finished in 0.4s]

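The NumPy approach

NumPy is a Python library for numerical data that lets you operate on whole arrays and matrices at once, much as you would in MATLAB.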
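To make the idea concrete before the full function, here is a minimal sketch of the column arithmetic on the same four centres. The variable names third, mean_first_two and residual are only illustrative and do not appear in the original code.

```python
import numpy as np

# The four cluster centres used in this post, one row per centre.
center = np.array([[137, 143, 134],
                   [94, 115, 107],
                   [52, 81, 107],
                   [111, 127, 117]], dtype=np.float32)

third = center[:, 2]                                # third column, shape (4,)
mean_first_two = (center[:, 0] + center[:, 1]) / 2  # element-wise mean of columns 0 and 1
residual = third - mean_first_two                   # one value per row, no Python loop

print(residual)
```

Slicing with center[:, k] selects a whole column, and the arithmetic operators then apply element by element, which is what removes the explicit loop.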

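def find_max_numpy(center):
    print("Centre coordinates:")
    print(center)
    m, n = np.shape(center)
    # Third column minus the mean of the first two, computed for all rows at once
    center_res = center[:, 2] - (center[:, 1] + center[:, 0]) / 2
    print("Result: {}".format(center_res))
    imax = center_res.max()  # largest residual
    idmax = np.where(center_res == imax)  # tuple of index arrays where the maximum occurs
    idmax = np.int32(idmax)  # np.where returns index positions, so convert them to an integer array
    zero = np.zeros([m, n])
    zero[idmax] = center[idmax]
    print("Selected class [{}]".format(idmax + 1))
    print("Coordinate values:")
    print(zero)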
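Because np.where returns a tuple of index arrays, the selected class above is printed with nested brackets. One possible alternative, sketched here under the assumption that only the first maximum is wanted, is np.argmax, which returns a single index. The function name find_max_argmax and its return values are hypothetical and not part of the original code.

```python
import numpy as np

def find_max_argmax(center):
    """Return the row index with the largest residual and a copy of
    the matrix in which every other row is zeroed."""
    center = np.asarray(center, dtype=np.float32)
    # Third column minus the mean of the first two, for all rows at once
    residual = center[:, 2] - (center[:, 0] + center[:, 1]) / 2
    idmax = int(np.argmax(residual))  # plain int; the first index wins on ties
    masked = np.zeros_like(center)
    masked[idmax] = center[idmax]
    return idmax, masked

cen = np.array([[137, 143, 134], [94, 115, 107],
                [52, 81, 107], [111, 127, 117]], dtype=np.float32)
idmax, masked = find_max_argmax(cen)
print("Selected class [{}]".format(idmax + 1))
print(masked)
```

Returning the index instead of only printing it also makes the function easier to reuse in later steps.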

Output:

cen = np.float32(np.array([[137,143,134],[94,115,107],[52,81,107],[111,127,117]]))
find_max_numpy(cen)
>>>  Centre coordinates:
>>>	[[137 143 134]
	 [ 94 115 107]
	 [ 52  81 107]
	 [111 127 117]]
>>> Result: [ -6.    2.5  40.5  -2. ]
>>> Selected class [[[3]]]
>>> Coordinate values:
>>>	[[   0.    0.    0.]
	 [   0.    0.    0.]
	 [  52.   81.  107.]
	 [   0.    0.    0.]]
>>> [Finished in 0.1s]

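Running speed

When I have time I should work out the complexity of the two functions properly. As far as I can tell, both do a constant amount of work per row, so both are O(m) in the number of rows m rather than O(n^2). The array version should still be much faster, because its loop runs inside NumPy's compiled code instead of the Python interpreter. The timings point the same way: 0.4 s for the first and 0.1 s for the second, although these are whole-run times, so they also include start-up and printing.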
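A fairer comparison would time only the computation on a larger matrix. The sketch below uses the standard timeit module. The function names residual_loop and residual_numpy, the random test data and the row count are illustrative assumptions, not the original experiment, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

def residual_loop(center):
    # Pure Python loop: one interpreter-level iteration per row
    best, idmax = None, 0
    for i in range(center.shape[0]):
        r = center[i][2] - (center[i][0] + center[i][1]) / 2.0
        if best is None or r > best:
            best, idmax = r, i
    return idmax

def residual_numpy(center):
    # Same computation expressed as whole-column operations
    return int(np.argmax(center[:, 2] - (center[:, 0] + center[:, 1]) / 2))

# Hypothetical test data: 20,000 random colour rows with values in 0..255
rng = np.random.default_rng(0)
center = rng.integers(0, 256, size=(20000, 3)).astype(np.float32)

# Both versions should agree on the selected row
assert residual_loop(center) == residual_numpy(center)

t_loop = timeit.timeit(lambda: residual_loop(center), number=3)
t_numpy = timeit.timeit(lambda: residual_numpy(center), number=3)
print("loop:  {:.4f} s for 3 runs".format(t_loop))
print("numpy: {:.4f} s for 3 runs".format(t_numpy))
```

Both functions still visit every row once. Any difference in the printed times comes from where the loop executes, not from a lower asymptotic complexity.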