TensorFlow: incompatible shapes when creating a custom activation function?

I am trying to build a neural network that uses custom activation functions. I followed the solution given here, and it works when the input and output vectors have the same size, but not when the sizes differ (as in a pooling function). Here is my problem so far:

I am trying to generalize this to the case where the input and output have different sizes. In my code, the input 'x' has shape (2,4), the output 'y' has shape (1,2), and the activation function MEX(.) performs the mapping y = MEX(x). I computed the gradient of MEX() as d_MEX(), where d_MEX(x) has the same shape as x, that is, (2,4). Nevertheless, I get this error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [1,2] vs. [2,4]

Shouldn't the gradient of MEX(x) have the same shape as x? Here is my full code:

import tensorflow as tf
import numpy as np


# This is our target function
def MEX(x):
    '''
    :param x: is a row vector which is the concatenation of [input, beta]
    :return MEX_{beta}(x): scalar output
    '''
    # lenx = np.size(x) # Number of columns (ROW vector)

    lenx = x.shape[1]
    N = x.shape[0]

    out = np.zeros((1,N))
    for ii in range(N):
        c = x[ii,0:lenx-1]
        beta = x[ii,lenx-1]
        out[0,ii] = 1./beta * np.log( np.mean( np.exp(beta*c) ))
    return np.array(out)

# Now we should write its derivative.
def d_MEX(x):
    # lenx = np.size(x) # Number of
    lenx = x.shape[1]
    N = x.shape[0]

    out = np.zeros((N,lenx))
    for ii in range(N):
        c = x[ii,0:lenx-1]
        beta = x[ii,lenx-1]

        d_beta = np.array([0.])
        d_beta[0] = -1./beta*( MEX(np.array([x[ii,:]])) - np.mean( np.multiply( c, np.exp(beta*c)))/np.mean( np.exp(beta*c))  )
        d_c = 1./lenx*np.exp(beta*c) /np.mean( np.exp(beta*c))
        out[ii,:] = np.concatenate((d_c,d_beta), axis=0)

    return out

# The first step is making it into a numpy function, this is easy:
np_MEX = np.vectorize(MEX, excluded=['x']) # IMPORTANT!! Otherwise np.vectorize() doesn't work
np_d_MEX = np.vectorize(d_MEX, excluded=['x']) # IMPORTANT!! Otherwise np.vectorize() doesn't work

# Now we make a tensorflow function
'''
Making a numpy fct to a tensorflow fct: We will start by making np_d_MEX_32 into a tensorflow function.
There is a function in tensorflow tf.py_func(func, inp, Tout, stateful=stateful, name=name) [doc]
which transforms any numpy function to a tensorflow function, so we can use it:
'''
np_d_MEX_32 = lambda x: np_d_MEX(x=x).astype(np.float32)

def tf_d_MEX(x,name=None):
    with tf.name_scope(name, "d_MEX", [x]) as name:
        y = tf.py_func(np_d_MEX_32,
                        [x],
                        [tf.float32],
                        name=name,
                        stateful=False)
        return y[0]

'''
tf.py_func acts on lists of tensors (and returns a list of tensors), that is why we have [x] (and return y[0]).
The stateful option is to tell tensorflow whether the function always gives the same output for the same input (stateful = False)
in which case tensorflow can simplify the tensorflow graph, this is our case and will probably be the case in most situations.
One thing to be careful of at this point is that numpy uses float64 but tensorflow uses float32 so you need to convert
your function to use float32 before you can convert it to a tensorflow function otherwise tensorflow will complain.
This is why we need to make np_d_MEX_32 first.

What about the Gradients? The problem with only doing the above is that even though we now have tf_d_MEX which is the
tensorflow version of np_d_MEX, we couldn't use it as an activation function if we wanted to because tensorflow doesn't
know how to calculate the gradients of that function.

Hack to get Gradients: As explained in the sources mentioned above, there is a hack to define gradients of a function
using tf.RegisterGradient [doc] and tf.Graph.gradient_override_map [doc]. Copying the code from harpone we can modify
the tf.py_func function to make it define the gradient at the same time:
'''

def py_func(func, inp, Tout, stateful=True, name=None, grad=None):

    # Need to generate a unique name to avoid duplicates:
    rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))

    tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": rnd_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

'''
Now we are almost done, the only thing is that the grad function we need to pass to the above py_func function needs to
 take a special form. It needs to take in an operation, and the previous gradients before the operation and propagate
 the gradients backward after the operation.

Gradient Function: So for our MEX activation function that is how we would do it:
'''

def MEXgrad(op, grad):
    x = op.inputs[0]
    # x = op

    n_gr = tf_d_MEX(x)
    return grad * n_gr

'''
The activation function has only one input, that is why x = op.inputs[0]. If the operation had many inputs, we would
need to return a tuple, one gradient for each input. For example if the operation was a-b, the gradient with respect to a
is +1 and with respect to b is -1 so we would have return +1*grad,-1*grad. Notice that we need to return tensorflow
functions of the input, that is why we need tf_d_MEX; np_d_MEX would not have worked because it cannot act on
tensorflow tensors. Alternatively we could have written the derivative using tensorflow functions.
'''


# Combining it all together: Now that we have all the pieces, we can combine them all together:

np_MEX_32 = lambda x: np_MEX(x=x).astype(np.float32)

def tf_MEX(x, name=None):

    with tf.name_scope(name, "MEX",[x]) as name:
        y = py_func(np_MEX_32,
                        [x],
                        [tf.float32],
                        name=name,
                        grad=MEXgrad)  # <-- here's the call to the gradient
        return y[0]


with tf.Session() as sess:

    x = tf.constant([[0.2,0.7,1.2,1.7],[0.2,0.7,1.2,1.7]])
    y = tf_MEX(x)
    tf.global_variables_initializer().run()

    print(x.eval(), y.eval(), tf.gradients(y, [x])[0].eval())

In the console I checked that the variables have the "right" shapes:

x.eval()
Out[9]: 
array([[ 0.2       ,  0.69999999,  1.20000005,  1.70000005],
       [ 0.2       ,  0.69999999,  1.20000005,  1.70000005]], dtype=float32)
y.eval()
Out[10]: array([[ 0.83393127,  0.83393127]], dtype=float32)
tf_d_MEX(x).eval()
Out[11]: 
array([[ 0.0850958 ,  0.19909413,  0.46581003,  0.07051659],
       [ 0.0850958 ,  0.19909413,  0.46581003,  0.07051659]], dtype=float32)
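
The shapes and the derivative itself can also be checked without TensorFlow. Below is a minimal NumPy-only sketch of a finite-difference check. The names mex_rows, d_mex_rows and numeric_grad are hypothetical stand-ins written for this illustration, not the functions from the code above. One assumption to flag: the sketch scales d_c by 1/(lenx-1), the number of entries that are actually averaged, whereas d_MEX above uses 1/lenx. The finite-difference comparison is what suggests the former, so treat that as something to verify rather than a confirmed fix.

```python
import numpy as np

def mex_rows(x):
    """Row-wise MEX: the last column is beta, the other columns are the inputs c."""
    c, beta = x[:, :-1], x[:, -1:]
    # One value per row, returned with shape (1, N) like MEX in the question.
    return (np.log(np.mean(np.exp(beta * c), axis=1, keepdims=True)) / beta).T

def d_mex_rows(x):
    """Analytic derivative of each row's output w.r.t. that row, shape (N, lenx)."""
    c, beta = x[:, :-1], x[:, -1:]
    e = np.exp(beta * c)
    m = np.mean(e, axis=1, keepdims=True)
    # ASSUMPTION: divide by the number of c entries (lenx - 1), not by lenx.
    d_c = e / (c.shape[1] * m)
    d_beta = -(mex_rows(x).T - np.mean(c * e, axis=1, keepdims=True) / m) / beta
    return np.concatenate([d_c, d_beta], axis=1)

def numeric_grad(f, x, eps=1e-6):
    """Central differences of sum(f(x)) with respect to every entry of x."""
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        g[idx] = (f(x + step).sum() - f(x - step).sum()) / (2 * eps)
    return g

x = np.array([[0.2, 0.7, 1.2, 1.7], [0.2, 0.7, 1.2, 1.7]])
analytic = d_mex_rows(x)
numeric = numeric_grad(mex_rows, x)
print(mex_rows(x).shape, analytic.shape)
print(np.allclose(analytic, numeric, atol=1e-5))
```

Because each output entry depends on a single row of x, differentiating the sum of the outputs recovers the per-row derivatives, which is why the numeric result has the same (N, lenx) shape as x.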

1 Answer

Solution

My bad, I just found the mistake.

It is here:

def MEXgrad(op, grad):
    x = op.inputs[0]
    # x = op

    n_gr = tf_d_MEX(x)
    return n_gr

I wonder whether there is a typo here, where this mistake also appears?
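
One caveat: returning n_gr alone discards the incoming gradient grad, so it matches the chain rule only when grad is all ones, as it is for the tf.gradients(y, [x]) call in the question. The NumPy sketch below illustrates the shape arithmetic behind the error and one way to keep grad in the product. It assumes grad has shape (1, N) and the local derivative has shape (N, lenx); mex_backward is a hypothetical helper and the numbers are made up. In the TensorFlow code the analogous expression would presumably be tf.transpose(grad) * n_gr, which I have not run against the code above.

```python
import numpy as np

def mex_backward(grad, local):
    """Combine an upstream gradient of shape (1, N) with a local derivative (N, lenx).

    Transposing grad to (N, 1) lets it broadcast across the columns of each row,
    so row i of the result is grad[0, i] * local[i, :].
    """
    return grad.T * local

grad = np.array([[1.0, 2.0]])               # upstream gradient, shape (1, 2)
local = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8]])    # made-up local derivative, shape (2, 4)

# The plain elementwise product reproduces the shape mismatch from the question.
try:
    grad * local
except ValueError as err:
    print("elementwise product fails:", err)

out = mex_backward(grad, local)
print(out.shape)
```

With an all-ones upstream gradient the result equals the local derivative, which is why returning n_gr on its own gives the expected numbers in the question's test.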
