summaryrefslogtreecommitdiff
path: root/fs/ceph
AgeCommit message (Collapse)AuthorFilesLines
2013-05-01libceph: separate read and write dataAlex Elder2-36/+41
An osd request defines information about where data to be read should be placed as well as where data to write comes from. Currently these are represented by common fields. Keep information about data for writing separate from data to be read by splitting these into data_in and data_out fields. This is the key patch in this whole series, in that it actually identifies which osd requests generate outgoing data and which generate incoming data. It's less obvious (currently) that an osd CALL op generates both outgoing and incoming data; that's the focus of some upcoming work. This resolves: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01libceph: distinguish page and bio requestsAlex Elder2-0/+5
An osd request uses either pages or a bio list for its data. Use a union to record information about the two, and add a data type tag to select between them. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01libceph: separate osd request data infoAlex Elder2-31/+32
Pull the fields in an osd request structure that define the data for the request out into a separate structure. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01libceph: don't assign page info in ceph_osdc_new_request()Alex Elder2-6/+10
Currently ceph_osdc_new_request() assigns an osd request's r_num_pages and r_alignment fields. The only thing it does after that is call ceph_osdc_build_request(), and that doesn't need those fields to be assigned. Move the assignment of those fields out of ceph_osdc_new_request() and into its caller. As a result, the page_align parameter is no longer used, so get rid of it. Note that in ceph_sync_write(), the value for req->r_num_pages had already been calculated earlier (as num_pages, and fortunately it was computed the same way). So don't bother recomputing it, but because it's not needed earlier, move that calculation after the call to ceph_osdc_new_request(). Hold off making the assignment to r_alignment, doing it instead r_pages and r_num_pages are getting set. Similarly, in start_read(), nr_pages already holds the number of pages in the array (and is calculated the same way), so there's no need to recompute it. Move the assignment of the page alignment down with the others there as well. This and the next few patches are preparation work for: http://tracker.ceph.com/issues/4127 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01ceph: simplify ceph_sync_write() page_align calculationAlex Elder1-9/+4
(This is being reposted. The first one had a problem because it erroneously added a similar change elsewhere; that change has been dropped.) The next patch in this series points out that the calculation for the number of pages in an osd request is getting done twice. It is not obvious, but the result of both calculations is identical. This patch simplifies one of them--as a separate step--to make it clear that the transformation in the next patch is valid. In ceph_sync_write() there is some magic that computes page_align for an osd request. But a little analysis shows it can be simplified. First, we have: io_align = pos & ~PAGE_MASK; which is used here: page_align = (pos - io_align + buf_align) & ~PAGE_MASK; Note (pos - io_align) simply rounds "pos" down to the nearest multiple of the page size. We also have: buf_align = (unsigned long)data & ~PAGE_MASK; Adding buf_align to that rounded-down "pos" value will stay within the same page; the result will just be offset by the page offset for the "data" pointer. The final mask therefore leaves just the value of "buf_align". One more simplification. Note that the result of calc_pages_for() is invariant of which page the offset starts in--the only thing that matters is the offset within the starting page. We will have put the proper page offset to use into "page_align", so just use that in calculating num_pages. This resolves: http://tracker.ceph.com/issues/4166 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01ceph: use calc_pages_for() in start_read()Alex Elder1-1/+1
There's a spot that computes the number of pages to allocate for a page-aligned length by just shifting it. Use calc_pages_for() instead, to be consistent with usage everywhere else. The result is the same. The reason for this is to make it clearer in an upcoming patch that this calculation is duplicated. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01libceph: no need for alignment for mds messageAlex Elder1-1/+0
Currently, incoming mds messages never use page data, which means there is no need to set the page_alignment field in the message. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01libceph: define mds_alloc_msg() methodAlex Elder1-0/+23
The only user of the ceph messenger that doesn't define an alloc_msg method is the mds client. Define one, such that it works just like it did before, and simplify ceph_con_in_msg_alloc() by assuming the alloc_msg method is always present. This and the next patch resolve: http://tracker.ceph.com/issues/4322 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01libceph: rename ceph_calc_object_layout()Alex Elder1-2/+3
The purpose of ceph_calc_object_layout() is to fill in the pool number and seed for a ceph_pg structure provided, based on a given osd map and target object id. Currently that function takes a file layout parameter, but the only thing used out of that is its pool number. Change the function so it takes a pool number rather than the full file layout structure. Only update the ceph_pg if the pool is found in the osd map. Get rid of few useless lines of code from the function while there. Since the function now very clearly just fills in the ceph_pg structure it's provided, rename it ceph_calc_ceph_pg(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01libceph: kill ceph_msg->pagelist_countAlex Elder1-1/+0
The pagelist_count field is never actually used, so get rid of it. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01ceph: acquire i_mutex in __ceph_do_pending_vmtruncateYan, Zheng3-13/+13
make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller does not hold the i_mutex, so ceph_aio_read() can call safely. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01ceph: don't early drop Fw capYan, Zheng1-25/+17
ceph_aio_write() has an optimization that marks CEPH_CAP_FILE_WR cap dirty before data is copied to page cache and inode size is updated. The optimization avoids slow cap revocation caused by balance_dirty_pages(), but introduces inode size update race. If ceph_check_caps() flushes the dirty cap before the inode size is updated, MDS can miss the new inode size. So just remove the optimization. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01ceph: revert commit 22cddde104Sage Weil3-77/+48
commit 22cddde104 breaks the atomicity of write operation, it also introduces a deadlock between write and truncate. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Conflicts: fs/ceph/addr.c
2013-05-01ceph: use I_COMPLETE inode flag instead of D_COMPLETE flagYan, Zheng5-95/+34
commit c6ffe10015 moved the flag that tracks if the dcache contents for a directory are complete to dentry. The problem is there are lots of places that use ceph_dir_{set,clear,test}_complete() while holding i_ceph_lock. but ceph_dir_{set,clear,test}_complete() may sleep because they call dput(). This patch basically reverts that commit. For ceph_d_prune(), it's called with both the dentry to prune and the parent dentry are locked. So it's safe to access the parent dentry's d_inode and clear I_COMPLETE flag. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-05-01ceph: set mds_want according to cap import messageYan, Zheng1-1/+5
MDS ignores cap update message if migrate_seq mismatch, so when receiving a cap import message with higher migrate_seq, set mds_want according to the cap import message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01ceph: queue cap release when trimming capYan, Zheng3-3/+7
So the client will later send cap release message to MDS Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01ceph: fix LSSNAP regressionYan, Zheng1-1/+2
commit 6e8575faa8 makes parse_reply_info_extra() return -EIO for LSSNAP Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-05-01libceph: distinguish page array and pagelist countAlex Elder1-2/+2
Use distinct fields for tracking the number of pages in a message's page array and in a message's page list. Currently only one or the other is used at a time, but that will be changing soon. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-03-03fs: Limit sys_mount to only request filesystem modules.Eric W. Biederman1-0/+1
Modify the request_module to prefix the file system type with "fs-" and add aliases to all of the filesystems that can be built as modules to match. A common practice is to build all of the kernel code and leave code that is not commonly needed as modules, with the result that many users are exposed to any bug anywhere in the kernel. Looking for filesystems with a fs- prefix limits the pool of possible modules that can be loaded by mount to just filesystems trivially making things safer with no real cost. Using aliases means user space can control the policy of which filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf with blacklist and alias directives. Allowing simple, safe, well understood work-arounds to known problematic software. This also addresses a rare but unfortunate problem where the filesystem name is not the same as it's module name and module auto-loading would not work. While writing this patch I saw a handful of such cases. The most significant being autofs that lives in the module autofs4. This is relevant to user namespaces because we can reach the request module in get_fs_type() without having any special permissions, and people get uncomfortable when a user specified string (in this case the filesystem type) goes all of the way to request_module. After having looked at this issue I don't think there is any particular reason to perform any filtering or permission checks beyond making it clear in the module request that we want a filesystem module. The common pattern in the kernel is to call request_module() without regards to the users permissions. In general all a filesystem module does once loaded is call register_filesystem() and go to sleep. Which means there is not much attack surface exposed by loading a filesytem module unless the filesystem is mounted. In a user namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT, which most filesystems do not set today. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Reported-by: Kees Cook <keescook@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-28Merge branch 'for-linus' of ↵Linus Torvalds11-105/+265
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "A few groups of patches here. Alex has been hard at work improving the RBD code, layout groundwork for understanding the new formats and doing layering. Most of the infrastructure is now in place for the final bits that will come with the next window. There are a few changes to the data layout. Jim Schutt's patch fixes some non-ideal CRUSH behavior, and a set of patches from me updates the client to speak a newer version of the protocol and implement an improved hashing strategy across storage nodes (when the server side supports it too). A pair of patches from Sam Lang fix the atomicity of open+create operations. Several patches from Yan, Zheng fix various mds/client issues that turned up during multi-mds torture tests. A final set of patches expose file layouts via virtual xattrs, and allow the policies to be set on directories via xattrs as well (avoiding the awkward ioctl interface and providing a consistent interface for both kernel mount and ceph-fuse users)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits) libceph: add support for HASHPSPOOL pool flag libceph: update osd request/reply encoding libceph: calculate placement based on the internal data types ceph: update support for PGID64, PGPOOL3, OSDENC protocol features ceph: update "ceph_features.h" libceph: decode into cpu-native ceph_pg type libceph: rename ceph_pg -> ceph_pg_v1 rbd: pass length, not op for osd completions rbd: move rbd_osd_trivial_callback() libceph: use a do..while loop in con_work() libceph: use a flag to indicate a fault has occurred libceph: separate non-locked fault handling libceph: encapsulate connection backoff libceph: eliminate sparse warnings ceph: eliminate sparse warnings in fs code rbd: eliminate sparse warnings libceph: define connection flag helpers rbd: normalize dout() calls rbd: barriers are hard rbd: ignore zero-length requests ...
2013-02-26Merge branch 'for-linus' of ↵Linus Torvalds7-27/+63
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile (part one) from Al Viro: "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent locking violations, etc. The most visible changes here are death of FS_REVAL_DOT (replaced with "has ->d_weak_revalidate()") and a new helper getting from struct file to inode. Some bits of preparation to xattr method interface changes. Misc patches by various people sent this cycle *and* ocfs2 fixes from several cycles ago that should've been upstream right then. PS: the next vfs pile will be xattr stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) saner proc_get_inode() calling conventions proc: avoid extra pde_put() in proc_fill_super() fs: change return values from -EACCES to -EPERM fs/exec.c: make bprm_mm_init() static ocfs2/dlm: use GFP_ATOMIC inside a spin_lock ocfs2: fix possible use-after-free with AIO ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero target: writev() on single-element vector is pointless export kernel_write(), convert open-coded instances fs: encode_fh: return FILEID_INVALID if invalid fid_type kill f_vfsmnt vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op nfsd: handle vfs_getattr errors in acl protocol switch vfs_getattr() to struct path default SET_PERSONALITY() in linux/elf.h ceph: prepopulate inodes only when request is aborted d_hash_and_lookup(): export, switch open-coded instances 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate() 9p: split dropping the acls from v9fs_set_create_acl() ...
2013-02-26libceph: update osd request/reply encodingSage Weil1-25/+6
Use the new version of the encoding for osd requests and replies. In the process, update the way we are tracking request ops and reply lengths and results in the struct ceph_osd_request. Update the rbd and fs/ceph users appropriately. The main changes are: - we keep pointers into the request memory for fields we need to update each time the request is sent out over the wire - we keep information about the result in an array in the request struct where the users can easily get at it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26libceph: calculate placement based on the internal data typesSage Weil1-4/+1
Instead of using the old ceph_object_layout struct, update our internal ceph_calc_object_layout method to use the ceph_pg type. This allows us to pass the full 32-bit precision of the pgid.seed to the callers. It also allows some callers to avoid reaching into the request structures for the struct ceph_object_layout fields. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26ceph: update support for PGID64, PGPOOL3, OSDENC protocol featuresSage Weil1-4/+8
Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features. These have been present in ceph.git since v0.42, Feb 2012. Require these features to simplify support; nobody is running older userspace. Note that the new request and reply encoding is still not in place, so the new code is not yet functional. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26libceph: decode into cpu-native ceph_pg typeSage Weil1-2/+3
Always decode data into our cpu-native ceph_pg type that has the correct field widths. Limit any remaining uses of ceph_pg_v1 to dealing with the legacy protocol. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26libceph: rename ceph_pg -> ceph_pg_v1Sage Weil1-1/+1
Rename the old version this type to distinguish it from the new version. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
2013-02-26fs: encode_fh: return FILEID_INVALID if invalid fid_typeNamjae Jeon1-2/+2
This patch is a follow up on below patch: [PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63 Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Vivek Trivedi <t.vivek@samsung.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26ceph: prepopulate inodes only when request is abortedSage Weil1-2/+38
If r_aborted is true, we do not hold the dir i_mutex, and cannot touch the dcache. However, we still need to update the inodes with the state returned by the MDS. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25Merge branch 'for-linus' of ↵Linus Torvalds5-23/+29
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace and namespace infrastructure changes from Eric W Biederman: "This set of changes starts with a few small enhnacements to the user namespace. reboot support, allowing more arbitrary mappings, and support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the user namespace root. I do my best to document that if you care about limiting your unprivileged users that when you have the user namespace support enabled you will need to enable memory control groups. There is a minor bug fix to prevent overflowing the stack if someone creates way too many user namespaces. The bulk of the changes are a continuation of the kuid/kgid push down work through the filesystems. These changes make using uids and gids typesafe which ensures that these filesystems are safe to use when multiple user namespaces are in use. The filesystems converted for 3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs. The changes for these filesystems were a little more involved so I split the changes into smaller hopefully obviously correct changes. XFS is the only filesystem that remains. I was hoping I could get that in this release so that user namespace support would be enabled with an allyesconfig or an allmodconfig but it looks like the xfs changes need another couple of days before it they are ready." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits) cifs: Enable building with user namespaces enabled. cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t cifs: Convert struct cifs_sb_info to use kuids and kgids cifs: Modify struct smb_vol to use kuids and kgids cifs: Convert struct cifsFileInfo to use a kuid cifs: Convert struct cifs_fattr to use kuid and kgids cifs: Convert struct tcon_link to use a kuid. cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t cifs: Convert from a kuid before printing current_fsuid cifs: Use kuids and kgids SID to uid/gid mapping cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc cifs: Use BUILD_BUG_ON to validate uids and gids are the same size cifs: Override unmappable incoming uids and gids nfsd: Enable building with user namespaces enabled. nfsd: Properly compare and initialize kuids and kgids nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids nfsd: Modify nfsd4_cb_sec to use kuids and kgids nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion nfsd: Convert nfsxdr to use kuids and kgids nfsd: Convert nfs3xdr to use kuids and kgids ...
2013-02-25ceph: eliminate sparse warnings in fs codeAlex Elder1-2/+2
Fix the causes for sparse warnings reported in the ceph file system code. Here there are only two (and they're sort of silly but they're easy to fix). This partially resolves: http://tracker.ceph.com/issues/4184 Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-22new helper: file_inode(file)Al Viro5-23/+23
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22ceph: fix statvfs fr_sizeSage Weil2-2/+7
Different versions of glibc are broken in different ways, but the short of it is that for the time being, frsize should == bsize, and be used as the multiple for the blocks, free, and available fields. This mirrors what is done for NFS. The previous reporting of the page size for frsize meant that newer glibc and df would report a very small value for the fs size. Fixes http://tracker.ceph.com/issues/3793. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-19ceph: remove a few bogus declarationsAlex Elder1-7/+1
There are three ceph page vector functions declared in "fs/ceph/super.h" that don't belong there. They're probably left over from some long-ago code reorganization. They're properly declared in "include/linux/ceph/libceph.h" so just delete the ones in "super.h". This and the next few commits resolve: http://tracker.ceph.com/issues/4053 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18libceph: update ceph_mds_state_name() and ceph_mds_op_name()Alex Elder1-0/+4
Update ceph_mds_state_name() and ceph_mds_op_name() to include the newly-added definitions in "ceph_fs.h", and to match its counterpart in the user space code. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_new_request() "num_reply" parameterAlex Elder2-3/+3
The "num_reply" parameter to ceph_osdc_new_request() is never used inside that function, so get rid of it. Note that ceph_sync_write() passes 2 for that argument, while all other callers pass 1. It doesn't matter, but perhaps someone should verify this doesn't indicate a problem. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "flags" parameterAlex Elder1-2/+1
There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "flags" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "dosync" parameterAlex Elder1-1/+1
There is only one caller of ceph_osdc_writepages(), and it always passes 0 as its "dosync" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with 0. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-18ceph: kill ceph_osdc_writepages() "nofail" parameterAlex Elder1-1/+1
There is only one caller of ceph_osdc_writepages(), and it always passes the value true as its "nofail" argument. Get rid of that argument and replace its use in ceph_osdc_writepages() with the constant value true. This and a number of cleanup patches that follow resolve: http://tracker.ceph.com/issues/4126 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-13ceph: implement hidden per-field ceph.*.layout.* vxattrsSage Weil1-0/+59
Allow individual fields of the layout to be fetched via getxattr. The ceph.dir.layout.* vxattr with "disappear" if the exists_cb indicates there no dir layout set. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: add ceph.dir.layout vxattrSage Weil1-0/+8
This virtual xattr will only appear when there is a dir layout policy set on the directory. It can be set via setxattr and removed via removexattr (implemented by the MDS). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: change ceph.file.layout.* implementation, contentSage Weil1-14/+53
Implement a new method to generate the ceph.file.layout vxattr using the new framework. Use 'stripe_unit' instead of 'chunk_size'. Include pool name, either as a string or as an integer. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: fix listxattr handling for vxattrsSage Weil1-6/+13
Only include vxattrs in the result if they are not hidden and exist (as determined by the exists_cb callback). Note that the buffer size we return when 0 is passed in always includes vxattrs that *might* exist, forming an upper bound. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: fix getxattr vxattr handlingSage Weil1-12/+8
Change the vxattr handling for getxattr so that vxattrs are checked prior to any xattr content, and never after. Enforce vxattr existence via the exists_cb callback. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: add exists_cb to vxattr structSage Weil1-0/+2
Allow for a callback to dynamically determine if a vxattr exists for the given inode. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: pass ceph.* removexattrs through to MDSSage Weil1-0/+5
If we do not explicitly recognized a vxattr (e.g., as readonly), pass the request through to the MDS and deal with it there. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: pass unhandled ceph.* setxattrs through to MDSSage Weil1-0/+5
If we do not specifically understand a setxattr on a ceph.* virtual xattr, send it through to the MDS. This allows us to implement new functionality via the MDS without direct support on the client side. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: support hidden vxattrsSage Weil1-9/+11
Add ability to flag virtual xattrs as hidden, such that you can getxattr them but they do not appear in listxattr. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-13ceph: remove 'ceph.layout' virtual xattrSage Weil1-7/+0
This has been deprecated since v3.3, 114fc474. Kill it. Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Sam Lang <sam.lang@inktank.com>
2013-02-12ceph: Convert kuids and kgids before printing them.Eric W. Biederman2-4/+8
Before printing kuid and kgids values convert them into the initial user namespace. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-12ceph: Convert struct ceph_mds_request to use kuid_t and kgid_tEric W. Biederman2-4/+4
Hold the uid and gid for a pending ceph mds request using the types kuid_t and kgid_t. When a request message is finally created convert the kuid_t and kgid_t values into uids and gids in the initial user namespace. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>