dentry 缓存泄露追查

最近追查某线上服务发现一个问题:频繁的创建&删除目录或者文件,会消耗大量的物理内存。

原因是因为,目录或者文件删除后,dentry并没有立即释放,而是保存在了superblock上的lru链表上。通常只有内核认为内存紧张的时候才会回收此部分内存。

不过由于dentry结构非常小,这种case下内存增长是很慢。

// linux-2.6.32.68/fs/dcache.c
/*
 * dput - release a dentry
 * @dentry: dentry to release 
 *
 * Release a dentry. This will drop the usage count and if 
 * appropriate call the dentry unlink method as well as
 * removing it from the queues and releasing its resources. 
 * If the parent dentries were scheduled for release
 * they too may now get deleted.
 *
 * no dcache lock, please.
 */

void dput(struct dentry *dentry)
{
	if (!dentry)
		return;

repeat:
	if (atomic_read(&dentry->d_count) == 1)
		might_sleep();
	if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
		return;

	spin_lock(&dentry->d_lock);
	if (atomic_read(&dentry->d_count)) {
		spin_unlock(&dentry->d_lock);
		spin_unlock(&dcache_lock);
		return;
	}

	/*
	 * AV: ->d_delete() is _NOT_ allowed to block now.
	 */
	if (dentry->d_op && dentry->d_op->d_delete) {
		if (dentry->d_op->d_delete(dentry))
			goto unhash_it;
	}
	/* Unreachable? Get rid of it */
 	if (d_unhashed(dentry))
		goto kill_it;
  	if (list_empty(&dentry->d_lru)) {
  		dentry->d_flags |= DCACHE_REFERENCED;
                // 这里把dentry放到sb的lru链表里
		dentry_lru_add(dentry);
  	}
 	spin_unlock(&dentry->d_lock);
	spin_unlock(&dcache_lock);
	return;

unhash_it:
	__d_drop(dentry);
kill_it:
	/* if dentry was on the d_lru list delete it from there */
	dentry_lru_del(dentry);
	dentry = d_kill(dentry);
	if (dentry)
		goto repeat;
}

而内存回收有两种方式:

  1. 内核发现内存不足时主动处罚。例如分配不了内存时,达到某种指定的阈值时等等。
  2. 手动触发 sysctl vm.drop_caches。

这里只讨论第二种方法。内核在初始化的时候,注册了一个dcache_shrinker来释放dcache内存。shrink_dcache_memory的最主要的逻辑时prune_dcache()函数

static struct shrinker dcache_shrinker = {
	.shrink = shrink_dcache_memory,
	.seeks = DEFAULT_SEEKS,
};
static void __init dcache_init(void)
{
	int loop;
	/* 
	 * A constructor could be added for stable state like the lists,
	 * but it is probably not worth it because of the cache nature
	 * of the dcache. 
	 */
	dentry_cache = KMEM_CACHE(dentry,
		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
	// 注册钩子函数,当用户手动触发内存回收时,调用dcache_shrinker
	register_shrinker(&dcache_shrinker);
	/* ... */
}

cgroup 进程调度之 cfs 时间片限制

一些资料

  1. https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
  2. CPU bandwidth control for CFS

cpu带宽控制的原理是在给定周期内(cpu.cfs_period_us)内限制进程组的运行时间(cpu.cfs_quota_us),这两个参数值通过cpu子系统来设置。

cfs_rq相关的两个成员是 runtime_remainingruntime_expires ,前者是当前进程组的剩余时间片,后者是时间片的到期时间。

内核的做法是:

  1. 每次进程切换的时候更新当前进程的运行时间,检查进程时钟带宽是否超过阈值。
  2. 通过一个定时器,每隔固定的cpu.cfs_period_us周期,更新进程组的时钟带宽