Monday, August 30, 2010

trace netfilter conntrack nf_conntrack_hash


For a project, I need to cut the netfilter connection tracking timeout down to 1 so that entries expire as soon as possible.
First of all, I need to know how /proc/net/ip_conntrack gets listed.

In early 2.6.x kernels, nf_conntrack_hash was still the hash list of buckets, so the following global variables were still visible:

net/netfilter/nf_conntrack_core.c
struct hlist_head *nf_conntrack_hash __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_hash);
struct nf_conn nf_conntrack_untracked __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_untracked);
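Newer netfilter merges IPv4 and IPv6, and the new net_namespace architecture changed a few things in netfilter's connection tracking.
The namespace architecture exists mainly because the Linux kernel has to support virtual machines; the benefits are security and virtualization,
but the code has become harder to understand (I cannot quite follow it), and the variables above, which used to be global, are now hidden inside net_namespace...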

Anyway, I just want to solve the problem in my project: I need to reset the expire time of tracked sessions (kernel 2.6.28.x).
First, I traced the module nf_conntrack_ipv4.ko. When the module initializes, it creates /proc/net/ip_conntrack (as far as I know, /proc/net/nf_conntrack itself comes from the core nf_conntrack module).
nf_conntrack_l3proto_ipv4_compat.c
static int __net_init ip_conntrack_net_init(struct net *net)
{
        struct proc_dir_entry *proc, *proc_exp, *proc_stat;

        proc = proc_net_fops_create(net, "ip_conntrack", 0440, &ct_file_ops);
        if (!proc)
                goto err1;

        proc_exp = proc_net_fops_create(net, "ip_conntrack_expect", 0440,
                                        &ip_exp_file_ops);
        if (!proc_exp)
                goto err2;

        proc_stat = proc_create("ip_conntrack", S_IRUGO,
                                net->proc_net_stat, &ct_cpu_seq_fops);


nf_conntrack_l3proto_ipv4_compat.c
static const struct file_operations ct_file_ops = {
        .owner   = THIS_MODULE,
        .open    = ct_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = seq_release_net,
};


nf_conntrack_l3proto_ipv4_compat.c

static int ct_open(struct inode *inode, struct file *file)
{
        return seq_open_net(inode, file, &ct_seq_ops,
                            sizeof(struct ct_iter_state));//private data stored in seq_xxx
}


nf_conntrack_l3proto_ipv4_compat.c

static const struct seq_operations ct_seq_ops = {
        .start = ct_seq_start,
        .next  = ct_seq_next,
        .stop  = ct_seq_stop,
        .show  = ct_seq_show
};
fs/proc/proc_net.c
int seq_open_net(struct inode *ino, struct file *f,
                 const struct seq_operations *ops, int size)
{
        struct net *net;
        struct seq_net_private *p;

        BUG_ON(size < sizeof(*p));

        net = get_proc_net(ino);
        if (net == NULL)
                return -ENXIO;

        p = __seq_open_private(f, ops, size);
        if (p == NULL) {
                put_net(net);
                return -ENOMEM;
        }
#ifdef CONFIG_NET_NS
        p->net = net;
#endif
        return 0;
}
EXPORT_SYMBOL_GPL(seq_open_net);
Finally, ct_seq_show displays each session.

static int ct_seq_show(struct seq_file *s, void *v)
{
        const struct nf_conntrack_tuple_hash *hash = v;
        const struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(hash);
        const struct nf_conntrack_l3proto *l3proto;
        const struct nf_conntrack_l4proto *l4proto;

        NF_CT_ASSERT(ct);

        /* we only want to print DIR_ORIGINAL */
        if (NF_CT_DIRECTION(hash))
                return 0;
        if (nf_ct_l3num(ct) != AF_INET)
                return 0;

        l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
        NF_CT_ASSERT(l3proto);
        l4proto = __nf_ct_l4proto_find(nf_ct_l3num(ct), nf_ct_protonum(ct));
        NF_CT_ASSERT(l4proto);
Now I modify the expire time of each session with mod_timer inside ct_seq_show, e.g.:
 mod_timer(&ct->timeout, jiffies + HZ/2); /* expire 0.5 sec from now */
After recompiling the modules, just install the result:
# insmod nf_conntrack_ipv4.ko
then read the file again so that mod_timer runs for every session:
# cat /proc/net/ip_conntrack
Great! All sessions are deleted after 0.5 sec.
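
To confirm the effect from user space, it helps to watch the remaining-lifetime column of /proc/net/ip_conntrack. Below is a minimal sketch in C, assuming the usual line layout (protocol name, protocol number, seconds remaining, then the tuple). The helper name conntrack_seconds_left is made up for illustration and is not part of any kernel or library API.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return the "seconds remaining" column of one /proc/net/ip_conntrack
 * line, or -1 if the line does not start with "<name> <protonum> <secs>".
 * Hypothetical helper for illustration only.
 */
static long conntrack_seconds_left(const char *line)
{
        char proto[16];
        unsigned int protonum;
        long secs;

        assert(line != NULL);
        /* %15s leaves room for the terminating NUL in proto[16] */
        if (sscanf(line, "%15s %u %ld", proto, &protonum, &secs) != 3)
                return -1;
        return secs;
}
```

Feeding it every line read with fgets() before and after the patched read should show the column dropping to 0 for each entry, since less than one second is left.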


--------------------------------------------------------------------------

Extra topic from this case:

net/core/net_namespace.c
int register_pernet_subsys(struct pernet_operations *ops)
{
        int error;
        mutex_lock(&net_mutex);
        error =  register_pernet_operations(first_device, ops);
        mutex_unlock(&net_mutex);
        return error;
}
EXPORT_SYMBOL_GPL(register_pernet_subsys);

include/net/net_namespace.h
struct pernet_operations {
        struct list_head list;
        int (*init)(struct net *net);
        void (*exit)(struct net *net);
};
register_pernet_operations calls the init method once for every existing net namespace.
But where, or how, do I get the hash bucket from a net namespace...?
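
As far as I can tell from the 2.6.28 sources, the conntrack table moved into struct net and is reached as net->ct.hash, with each subsystem's init hook filling in its own part of struct net. The following is only a user-space model of that idea: every *_model name, MAX_NETS and HASH_SIZE are invented for the sketch, and the static tables stand in for the kernel's real allocation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NETS   4
#define HASH_SIZE  8

struct bucket_model { void *first; };      /* stand-in for struct hlist_head */

struct net_model {                         /* stand-in for struct net */
        struct {
                struct bucket_model *hash; /* per-namespace conntrack table */
        } ct;
};

struct pernet_ops_model {                  /* stand-in for struct pernet_operations */
        int  (*init)(struct net_model *net);
        void (*exit)(struct net_model *net);
};

static struct net_model nets[MAX_NETS];    /* the existing namespaces */
static int nr_nets;

/* Call ops->init for every existing namespace; undo the work on failure. */
static int register_pernet_subsys_model(const struct pernet_ops_model *ops)
{
        int i, err;

        assert(ops != NULL && ops->init != NULL);
        for (i = 0; i < nr_nets; i++) {
                err = ops->init(&nets[i]);
                if (err) {
                        while (--i >= 0)
                                if (ops->exit)
                                        ops->exit(&nets[i]);
                        return err;
                }
        }
        return 0;
}

/* One static table per namespace keeps the sketch free of malloc(). */
static struct bucket_model tables[MAX_NETS][HASH_SIZE];

static int ct_net_init_model(struct net_model *net)
{
        net->ct.hash = tables[net - nets];
        return 0;
}

static void ct_net_exit_model(struct net_model *net)
{
        net->ct.hash = NULL;
}

static const struct pernet_ops_model ct_ops_model = {
        .init = ct_net_init_model,
        .exit = ct_net_exit_model,
};
```

After registration, code that used to index the global nf_conntrack_hash would index the table of the namespace it is working in instead.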

-----------------------------------
Functions used in this case
include/linux/moduleparam.h
#define module_param_call(name, set, get, arg, perm)                          \
        __module_param_call(MODULE_PARAM_PREFIX, name, set, get, arg, perm)
rcu_dereference: include/linux/rcupdate.h
#define rcu_dereference(p)     ({ \
                                typeof(p) _________p1 = ACCESS_ONCE(p); \
                                smp_read_barrier_depends(); \
                                (_________p1); \
                                })
http://rd-life.blogspot.com/2009/05/rcu_26.html
http://lxr.linux.no/#linux+v2.6.28/Documentation/RCU/whatisRCU.txt#L122
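
The publish/read pattern behind rcu_dereference can be tried in user space with C11 atomics. This is only an analogy, not kernel RCU: there are no grace periods or deferred frees here, and memory_order_acquire is used as a conservative substitute for the dependency ordering that rcu_dereference provides. The names session_model, publish_session and read_timeout are invented for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct session_model {
        int timeout;
};

/* Shared pointer that readers follow without taking a lock. */
static _Atomic(struct session_model *) current_session;

/* Writer side: fully initialise *s before calling this. */
static void publish_session(struct session_model *s)
{
        atomic_store_explicit(&current_session, s, memory_order_release);
}

/* Reader side: load the pointer once, ordered before the dereference. */
static int read_timeout(void)
{
        struct session_model *s =
                atomic_load_explicit(&current_session, memory_order_acquire);

        return s ? s->timeout : -1;
}
```

The single load into a local variable mirrors what the macro does with ACCESS_ONCE: the pointer is fetched exactly once, and all later accesses go through that copy.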


Monday, August 23, 2010

PF_KEYv2 cipher support internals

A recent project hit a cipher that the kernel could not support, so I need to trace the operations between openswan and the kernel modules.

First, I need to understand how openswan communicates with the kernel SA.
I found that it goes through the PF_KEYv2 socket family to talk to the key engine; see RFC 2367.

Then I found an open source sample in an svn repository:
    http://xbq-code-repository.googlecode.com/svn/trunk/unpcode
This comes from the sample code of the famous "UNIX Network Programming", volume 1, 3rd edition.

The only thing you need to modify is the header file path, from net/pfkeyv2.h to linux/pfkeyv2.h.
I only compiled the lib and libfree subdirectories within unpcode.

You can use the "register" program to get the cipher support list from the kernel through the PF_KEYv2 protocol.

/tmp/rootfs # ./register -t esp
Sending register message:
SADB Message Register, errno 0, satype IPsec ESP, seq 0, pid 17534

Reply returned:
SADB Message Register, errno 0, satype IPsec ESP, seq 0, pid 17534
 Supported authentication algorithms:
  Null ivlen 0 bits 0-0
  HMAC-MD5 ivlen 0 bits 128-128
  HMAC-SHA-1 ivlen 0 bits 160-160
 Supported encryption algorithms:
  Null ivlen 0 bits 0-0
  DES-CBC ivlen 8 bits 64-64
  3DES-CBC ivlen 8 bits 192-192
  [Unknown encryption algorithm 12] ivlen 8 bits 128-256
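
For reference, the exchange above boils down to one fixed-size request header and a reply that carries an array of algorithm descriptors. The sketch below models both in plain C. The structs are local mirrors of the RFC 2367 layouts, and a real program would include linux/pfkeyv2.h and use struct sadb_msg and struct sadb_alg instead. The numeric ids in the name table are the values I remember from that header, so treat them as assumptions; under that reading, the "unknown" algorithm 12 would be AES-CBC, which the sample program simply does not know by name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirrors of the RFC 2367 wire structures (illustration only). */
struct sadb_msg_model {
        uint8_t  version, type, err, satype;
        uint16_t len;            /* message length in 64-bit words */
        uint16_t reserved;
        uint32_t seq, pid;
};

struct sadb_alg_model {
        uint8_t  id, ivlen;
        uint16_t minbits, maxbits, reserved;
};

enum { PFKEY_V2 = 2, MSG_REGISTER = 7, SATYPE_ESP = 3 };

/* Fill in a bare SADB_REGISTER request: header only, no extensions. */
static void build_register_msg(struct sadb_msg_model *m, uint8_t satype,
                               uint32_t pid)
{
        assert(m != NULL);
        memset(m, 0, sizeof(*m));
        m->version = PFKEY_V2;
        m->type    = MSG_REGISTER;
        m->satype  = satype;
        m->len     = sizeof(*m) / 8;
        m->pid     = pid;
}

/* Assumed id-to-name table; NULL means "no name known". */
static const char *ealg_name(uint8_t id)
{
        switch (id) {
        case 2:   return "DES-CBC";
        case 3:   return "3DES-CBC";
        case 7:   return "Blowfish-CBC";
        case 11:  return "Null";
        case 12:  return "AES-CBC";
        case 253: return "Twofish-CBC";
        default:  return NULL;
        }
}

/* Render one descriptor in the same style as the listing above. */
static int format_ealg(char *buf, size_t n, const struct sadb_alg_model *a)
{
        const char *name = ealg_name(a->id);

        if (name)
                return snprintf(buf, n, "%s ivlen %u bits %u-%u", name,
                                (unsigned)a->ivlen, (unsigned)a->minbits,
                                (unsigned)a->maxbits);
        return snprintf(buf, n,
                        "[Unknown encryption algorithm %u] ivlen %u bits %u-%u",
                        (unsigned)a->id, (unsigned)a->ivlen,
                        (unsigned)a->minbits, (unsigned)a->maxbits);
}
```

Sending the request would mean writing the header to a socket(PF_KEY, SOCK_RAW, PF_KEY_V2) descriptor, which needs root, so the sketch stops at building and decoding.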
------------------------------------------------------------------------------------------

Now, I still cannot get blowfish, twofish, etc. cipher support from PF_KEYv2, even though they show up in the list from /proc/crypto:

/tmp/rootfs # cat /proc/crypto |grep driver
driver       : authenc(hmac(sha1-ubicom32),cbc-aes-ubicom32)
....
driver       : michael_mic-generic
driver       : ecb(arc4-generic)
driver       : krng
driver       : seed-generic
driver       : arc4-generic
driver       : cast6-generic
driver       : cast5-generic
driver       : tnepres-generic
driver       : serpent-generic
driver       : twofish-generic
driver       : blowfish-generic
driver       : sha1-generic
...
So the next step is to understand the kernel layers involved: crypto, xfrm and af_key.

net/key/af_key.c: an implementation of PF_KEYv2
A PF_KEYv2 message of type 'REGISTER' calls

     static int pfkey_register(struct sock *sk, struct sk_buff *skb, struct sadb_msg *hdr, void **ext_hdrs)

which then calls

     static struct sk_buff *compose_sadb_supported(struct sadb_msg *orig,
                                              gfp_t allocation)

The main task of compose_sadb_supported is, of course, to compose the lists of supported auth and crypto algs into an skb.


The important subfunctions called by compose_sadb_supported are
  1. xfrm_count_auth_supported: returns the number of supported auth algs
  2. xfrm_count_enc_supported: returns the number of supported crypto algs
  3. if auth algs are present, xfrm_aalg_get_byidx for each index
  4. if crypto algs are present, xfrm_ealg_get_byidx for each index
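
The four steps above can be sketched in user space as follows. This is a simplified model, not the kernel code: the table is cut down to four entries, the *_model names are invented, and the availability flags are made-up values chosen so that blowfish is in the table but not advertised, as in the output above.

```c
#include <assert.h>
#include <stddef.h>

struct algo_desc_model {               /* cut-down struct xfrm_algo_desc */
        const char *name;
        int available;                 /* set by the probe step */
        unsigned char sadb_alg_id;
};

static struct algo_desc_model ealg_model[] = {
        { "ecb(cipher_null)", 1, 11 },
        { "cbc(des)",         1, 2  },
        { "cbc(des3_ede)",    1, 3  },
        { "cbc(blowfish)",    0, 7  },  /* in the table, but not available */
};
#define EALG_ENTRIES (sizeof(ealg_model) / sizeof(ealg_model[0]))

/* Steps 1 and 2: count only the entries flagged as available. */
static int count_enc_supported_model(void)
{
        unsigned int i;
        int n = 0;

        for (i = 0; i < EALG_ENTRIES; i++)
                if (ealg_model[i].available)
                        n++;
        return n;
}

/* Index lookup; NULL marks the end of the table. */
static struct algo_desc_model *ealg_get_byidx_model(unsigned int idx)
{
        if (idx >= EALG_ENTRIES)
                return NULL;
        return &ealg_model[idx];
}

/* Steps 3 and 4: walk by index and copy the ids of available entries. */
static int compose_supported_model(unsigned char *out, int max)
{
        unsigned int i;
        int n = 0;

        assert(out != NULL);
        for (i = 0; ; i++) {
                struct algo_desc_model *d = ealg_get_byidx_model(i);

                if (!d)
                        break;
                if (d->available && n < max)
                        out[n++] = d->sadb_alg_id;
        }
        return n;
}
```

The point of the model is that an entry in /proc/crypto alone does not make a cipher show up in the REGISTER reply; the matching xfrm table entry also has to be flagged available.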
Another interesting function is "void xfrm_probe_algs(void)".
It probes and initializes the internal static lists of the xfrm layer (aalg_list, ealg_list, calg_list)
through crypto_has_hash(), crypto_has_blkcipher() and crypto_has_comp().
The crypto_has_xxx() helpers are defined in include/linux/crypto.h, e.g.:
static inline int crypto_has_blkcipher(const char *alg_name, u32 type, u32 mask)
{
        type &= ~CRYPTO_ALG_TYPE_MASK;
        type |= CRYPTO_ALG_TYPE_BLKCIPHER;
        mask |= CRYPTO_ALG_TYPE_MASK;

        return crypto_has_alg(alg_name, type, mask);
}
crypto_has_blkcipher() sets up the type and mask for a block cipher, then looks the name up through the crypto API. Now let us trace that path.
crypto/api.c:
int crypto_has_alg(const char *name, u32 type, u32 mask)
{
        int ret = 0;
        struct crypto_alg *alg = crypto_alg_mod_lookup(name, type, mask);

        if (!IS_ERR(alg)) {
                crypto_mod_put(alg);
                ret = 1;
        }

        return ret;
}
struct crypto_alg *crypto_alg_mod_lookup(const char *name, u32 type, u32 mask)
{
        struct crypto_alg *alg;
        struct crypto_alg *larval;
        int ok;

        if (!(mask & CRYPTO_ALG_TESTED)) {
                type |= CRYPTO_ALG_TESTED;
                mask |= CRYPTO_ALG_TESTED;
        }

        larval = crypto_larval_lookup(name, type, mask);
        if (IS_ERR(larval) || !crypto_is_larval(larval))
                return larval;

        ok = crypto_probing_notify(CRYPTO_MSG_ALG_REQUEST, larval);

        if (ok == NOTIFY_STOP)
                alg = crypto_larval_wait(larval);
        else {
                crypto_mod_put(larval);
                alg = ERR_PTR(-ENOENT);
        }
        crypto_larval_kill(larval);
        return alg;
}
struct crypto_alg *crypto_larval_lookup(const char *name, u32 type, u32 mask)
{
        struct crypto_alg *alg;

        if (!name)
                return ERR_PTR(-ENOENT);

        mask &= ~(CRYPTO_ALG_LARVAL | CRYPTO_ALG_DEAD);
        type &= mask;

        alg = try_then_request_module(crypto_alg_lookup(name, type, mask),
                                      name);
        if (alg)
                return crypto_is_larval(alg) ? crypto_larval_wait(alg) : alg;

        return crypto_larval_add(name, type, mask);
}

int crypto_probing_notify(unsigned long val, void *v)
{
        int ok;

        ok = blocking_notifier_call_chain(&crypto_chain, val, v);
        if (ok == NOTIFY_DONE) {
                request_module("cryptomgr");
                ok = blocking_notifier_call_chain(&crypto_chain, val, v);
        }

        return ok;
}
crypto/internal.h
static inline int crypto_is_larval(struct crypto_alg *alg)
{
        return alg->cra_flags & CRYPTO_ALG_LARVAL;
}
    net/xfrm/xfrm_algo.c
    int xfrm_count_enc_supported(void)
    {
            int i, n;

            for (i = 0, n = 0; i < ealg_entries(); i++)
                    if (ealg_list[i].available)
                            n++;
            return n;
    }

    struct xfrm_algo_desc *xfrm_ealg_get_byidx(unsigned int idx)
    {
            if (idx >= ealg_entries())
                    return NULL;

            return &ealg_list[idx];
    }
    EXPORT_SYMBOL_GPL(xfrm_ealg_get_byidx);

    ealg_list and aalg_list are static arrays of struct xfrm_algo_desc that
    describe the auth, crypto and compression algs, e.g.:
    net/xfrm/xfrm_algo.c
    static struct xfrm_algo_desc ealg_list[] = {
    {
            .name = "ecb(cipher_null)",
            .compat = "cipher_null",

            .uinfo = {
                    .encr = {
                            .blockbits = 8,
                            .defkeybits = 0,
                    }
            },

            .desc = {
                    .sadb_alg_id =  SADB_EALG_NULL,
                    .sadb_alg_ivlen = 0,
                    .sadb_alg_minbits = 0,
                    .sadb_alg_maxbits = 0
            }
    },
    {
            .name = "cbc(des)",
            .compat = "des",

            .uinfo = {
                    .encr = {
                            .blockbits = 64,
                            .defkeybits = 64,
                    }
            },

            .desc = {
                    .sadb_alg_id = SADB_EALG_DESCBC,
                    .sadb_alg_ivlen = 8,
                    .sadb_alg_minbits = 64,
                    .sadb_alg_maxbits = 64
            }
    },
    {
            .name = "cbc(des3_ede)",
            .compat = "des3_ede", .....
    ...

    --------------------------------------------------------------------------------

    From here on I switch the trace from PF_KEYv2 to the crypto layer.
    Take the commonly supported NULL cipher as an example:
    crypto/crypto_null.c
    crypto_register_alg(&cipher_null) is called at module init.
    crypto/algapi.c:
    __crypto_register_alg is called in the end; if everything is OK, it adds the alg to the list as follows.
       o Check whether the cipher has already been registered.
       o Allocate a "struct crypto_larval *larval" with crypto_larval_alloc.
       o Call crypto_mod_get, which takes a reference on the owning module and increments the alg's reference count,
          then store the result in larval->adult.
            larval->adult = crypto_mod_get(alg);
       o Now everything is ready; add both to the list.

            list_add(&alg->cra_list, &crypto_alg_list);
            list_add(&larval->alg.cra_list, &crypto_alg_list);
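
The registration steps above can be modelled in a few lines of user-space C. This is a heavily simplified sketch: arrays replace the kernel lists, there is no locking or module reference, the duplicate check compares only one name, and the -ENOMEM return for a full table is an invention of the model. All *_model names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALGS 8

struct alg_model {
        const char *name;
        int refcnt;
};

struct larval_model {
        struct alg_model *adult;       /* the real alg behind the placeholder */
};

static struct alg_model   *alg_list[MAX_ALGS];
static struct larval_model larvals[MAX_ALGS];
static int nr_algs;

/* Counterpart of crypto_mod_get(): take one more reference. */
static struct alg_model *alg_get_model(struct alg_model *alg)
{
        alg->refcnt++;
        return alg;
}

static int register_alg_model(struct alg_model *alg)
{
        int i;

        assert(alg != NULL && alg->name != NULL);

        /* Step 1: refuse an algorithm that is already registered. */
        for (i = 0; i < nr_algs; i++)
                if (strcmp(alg_list[i]->name, alg->name) == 0)
                        return -EEXIST;
        if (nr_algs == MAX_ALGS)
                return -ENOMEM;

        /* Steps 2 and 3: the larval placeholder pins the adult alg. */
        larvals[nr_algs].adult = alg_get_model(alg);

        /* Step 4: make the alg visible on the list. */
        alg_list[nr_algs++] = alg;
        return 0;
}
```

Registering the same name twice returns -EEXIST in this model, and the reference held through larval->adult is what keeps the algorithm pinned while the placeholder exists.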

    crypto/api.c

    LIST_HEAD(crypto_alg_list);
    EXPORT_SYMBOL_GPL(crypto_alg_list);
    DECLARE_RWSEM(crypto_alg_sem);
    EXPORT_SYMBOL_GPL(crypto_alg_sem);

    BLOCKING_NOTIFIER_HEAD(crypto_chain);
    EXPORT_SYMBOL_GPL(crypto_chain);
    static inline struct crypto_alg *crypto_alg_get(struct crypto_alg *alg)
    {
            atomic_inc(&alg->cra_refcnt);
            return alg;
    }
    struct crypto_alg *crypto_mod_get(struct crypto_alg *alg)
    {
            return try_module_get(alg->cra_module) ? crypto_alg_get(alg) : NULL;
    }

    crypto/internal.h

    struct crypto_larval {
            struct crypto_alg alg;
            struct crypto_alg *adult;
            struct completion completion;
            u32 mask;
    };
    It would be interesting to dig further into
    __crypto_register_alg
     To be continued.

    Thursday, August 5, 2010

    timezone in uClibc and glibc

    http://lists.uclibc.org/pipermail/uclibc/2002-August/004010.html
    glibc and uclibc handle the time zone differently.

    Below is how uclibc defines the env variable "TZ" and the file "/etc/TZ":
    http://www.sonoracomm.com/support/20-voice-support/107-uclibc-tz

    This is part of the uclibc documentation on time zones:
    http://leaf.sourceforge.net/doc/buci-tz3.html
    It points to http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html
    where the Open Group defines the complete format.

    glibc on an x86 PC comes with the following documentation:
    man tzfile(5), gmtime, tzset(3)
    gmtime(3): converts time_t to struct tm (UTC only)
    localtime(3): converts time_t to struct tm, but calls tzset(3) to convert to local time

    My test on uclibc is as follows:

    /tmp/rootfs # TZ="" date
    Thu Aug  5 02:16:46 UTC 2010


    /tmp/rootfs # cat /etc/TZ
    XXX-16

    /tmp/rootfs # date
    Thu Aug  5 18:15:42 XXX 2010
    /tmp/rootfs #
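    It looks like the first three characters can be chosen freely. Note that the POSIX offset is the value added to local time to reach UTC, so -16 puts local time 16 hours ahead of UTC, which matches the two date outputs above. For Taipei I had to use -16 on this board; with a clock that holds true UTC, Taipei (UTC+8) would normally be written with -8.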
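
The sign convention is easy to check with a few lines of C on any POSIX libc. The sketch below sets TZ, calls tzset() and converts a fixed instant; local_hour_with_tz is a made-up helper name, and 1280974606 is the time_t for 2010-08-05 02:16:46 UTC, the instant of the first date output above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hour of day for instant t when TZ is set to tz; returns -1 on failure. */
static int local_hour_with_tz(const char *tz, time_t t)
{
        struct tm tm;

        assert(tz != NULL);
        if (setenv("TZ", tz, 1) != 0)
                return -1;
        tzset();                        /* re-read TZ before converting */
        if (localtime_r(&t, &tm) == NULL)
                return -1;
        return tm.tm_hour;
}
```

For that instant, "UTC0" gives hour 2, "XXX-16" gives hour 18 as in the test above, and "CST-8" gives hour 10, which is Taipei wall-clock time when the system clock holds true UTC.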