Monday, June 27, 2016

Verify MD5 of a million files with Spark

Partitioning will not be helpful in all applications—for example, if a given RDD is scanned only once, there is no point in partitioning it in advance. It is useful only when a dataset is reused multiple times in key-oriented operations such as joins.

Spark’s partitioning is available on all RDDs of key/value pairs, and causes the system to group elements based on a function of each key. Although Spark does not give explicit control of which worker node each key goes to (partly because the system is designed to work even if specific nodes fail), it lets the program ensure that a set of keys will appear together on some node.

Many of Spark’s operations involve shuffling data by key across the network. All of these will benefit from partitioning. For operations that act on a single RDD, such as reduceByKey(), running on a pre-partitioned RDD will cause all the values for each key to be computed locally on a single machine, requiring only the final, locally reduced value to be sent from each worker node back to the master.
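For example (a minimal sketch, assuming a hypothetical pair RDD of events keyed by user ID), partitioning once with partitionBy() and persisting the result lets later key-oriented operations reuse that layout instead of reshuffling:

import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

// Hypothetical input: one tab-separated event per line, first field is the user ID.
val events = sc.textFile("/tmp/events.tsv")
  .map(line => (line.split("\t")(0), 1))

// Shuffle once into 100 hash partitions and keep the result in memory.
val partitioned = events
  .partitionBy(new HashPartitioner(100))
  .persist(StorageLevel.MEMORY_ONLY)

// reduceByKey() on the pre-partitioned RDD combines each key's values locally;
// only the locally reduced values are sent back for the final result.
val countsPerUser = partitioned.reduceByKey(_ + _)
countsPerUser.take(5).foreach(println)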





One issue to watch out for when passing functions is inadvertently serializing the
object containing the function. When you pass a function that is the member of an
object, or contains references to fields in an object (e.g., self.field), Spark sends the
entire object to worker nodes, which can be much larger than the bit of information
you need (see Example 3-19). Sometimes this can also cause your program to fail, if
your class contains objects that Python can’t figure out how to pickle.
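The same pitfall exists in the Scala API used elsewhere in these notes; a minimal sketch (the class and field names are made up for illustration):

class WordFilter(val keyword: String) extends Serializable {
  // Bad: referencing the field captures `this`, so the whole WordFilter object
  // is serialized and shipped with every task.
  def findBad(rdd: org.apache.spark.rdd.RDD[String]) =
    rdd.filter(line => line.contains(keyword))

  // Better: copy the field into a local val so only that small String is serialized.
  def findGood(rdd: org.apache.spark.rdd.RDD[String]) = {
    val kw = keyword
    rdd.filter(line => line.contains(kw))
  }
}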
The set of stages produced for a particular action is termed a job. In each case when
we invoke actions such as count(), we are creating a job composed of one or more
stages.
Once the stage graph is defined, tasks are created and dispatched to an internal
scheduler, which varies depending on the deployment mode being used. Stages in the
physical plan can depend on each other, based on the RDD lineage, so they will be
executed in a specific order. For instance, a stage that outputs shuffle data must occur
before one that relies on that data being present.
A physical stage will launch tasks that each do the same thing but on specific partitions
of data. Each task internally performs the same steps:
1. Fetching its input, either from data storage (if the RDD is an input RDD), an
existing RDD (if the stage is based on already cached data), or shuffle outputs.
2. Performing the operation necessary to compute RDD(s) that it represents. For
instance, executing filter() or map() functions on the input data, or performing
grouping or reduction.
3. Writing output to a shuffle, to external storage, or back to the driver (if it is the
final RDD of an action such as count()).
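A minimal Scala sketch of the above (the input path is hypothetical): a shuffle operation such as reduceByKey() introduces a stage boundary, and calling count() submits one job made of those stages; toDebugString prints the lineage the stages are built from.

val words = sc.textFile("/tmp/input.txt")
  .flatMap(_.split(" "))
  .map(word => (word, 1))

val counts = words.reduceByKey(_ + _)   // shuffle boundary => a second stage

println(counts.toDebugString)           // show the RDD lineage behind the stages
counts.count()                          // action: submits one job with two stages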



Choosing an output compression codec can have a big impact on future users of the
data. With distributed systems such as Spark, we normally try to read our data in
from multiple different machines. To make this possible, each worker needs to be
able to find the start of a new record. Some compression formats make this impossible,
which requires a single node to read in all of the data and thus can easily lead to a
bottleneck. Formats that can be easily read from multiple machines are called “splittable.”
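For example, when saving text output from Spark you can choose the codec explicitly. A minimal sketch (output paths are hypothetical), using Hadoop's BZip2Codec, which is splittable, versus GzipCodec, which is not:

import org.apache.hadoop.io.compress.{BZip2Codec, GzipCodec}

val records = sc.parallelize(Seq("a,1", "b,2", "c,3"))

// bzip2 output can later be re-read in parallel splits...
records.saveAsTextFile("/tmp/out-bzip2", classOf[BZip2Codec])

// ...while gzip output forces a single reader per file.
records.saveAsTextFile("/tmp/out-gzip", classOf[GzipCodec])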



While Spark’s textFile() method can handle compressed input, it automatically disables splitting even if the input was compressed in a way that could be read splittably. If you find yourself needing to read in a large, single-file compressed input, consider skipping Spark’s wrapper and instead use either newAPIHadoopFile or hadoopFile and specify the correct compression codec.
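A hedged sketch of that approach (the input path is made up, and LzoTextInputFormat comes from the separate hadoop-lzo library, which must be on the classpath):

import org.apache.hadoop.io.{LongWritable, Text}
import com.hadoop.mapreduce.LzoTextInputFormat   // from the hadoop-lzo package

// Read an indexed .lzo file through the Hadoop input format so it can be split
// across workers, instead of letting textFile() fall back to a single split.
val lines = sc.newAPIHadoopFile(
    "/data/big.lzo",
    classOf[LzoTextInputFormat],
    classOf[LongWritable],
    classOf[Text])
  .map { case (_, text) => text.toString }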




val sourceMD5 = sc.textFile("/home/ano/source.md5")
val destMD5 = sc.textFile("/home/ano/oss.md5")
// Assuming md5sum-style lines ("<md5>  <path>", two spaces), index 0 is the hash and index 2 is the path;
// the source paths have "oss1" replaced with "osstest" so they match the destination listing.
val source_pairs = sourceMD5.map(x => (x.split(" ")(0), x.split(" ")(2).replace("oss1","osstest")))
val dest_pairs = destMD5.map(x => (x.split(" ")(0), x.split(" ")(2)))
val invalids = source_pairs.subtract(dest_pairs).count
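If you also want to see which files failed verification rather than only the count, a small hedged extension of the code above (the output path is made up):

// Keep the mismatched (md5, path) pairs and write them out for inspection.
source_pairs.subtract(dest_pairs)
  .map { case (md5, path) => md5 + " " + path }
  .saveAsTextFile("/home/ano/md5_mismatches")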
Each machine ran at full load; the whole run took three days for 1.8 TB of data.
Raw data size × 2.5 × 2 (FASTQ).
To benchmark the standalone (single-machine) version: 1. put a reasonably large file into memory and stop the standalone service; 2. configure job.cfg and delete the jobName setting; 3. then copy it into multiple jobs with seq 1 100 | xargs -I {} echo "cp job.cfg job.{}.cfg;echo jobName=local_test.{} >> job.{}.cfg" | bash; 4. submit those jobs; 5. start the service and see what steady-state speed it reaches.
seq 1 200 | xargs -I {} echo "cp job.cfg job.{}.cfg;echo jobName=local_test.{} >> job.{}.cfg;echo destPrefix=vpn{}/ >> job.{}.cfg;" | bash
seq 1 100 | xargs -I {} java  -jar bin/ossimport2.jar  -c conf/sys.properties submit jobs/job.{}.cfg
seq 1 100 | xargs -I {} java  -jar bin/ossimport2.jar  -c conf/sys.properties clean jobs/job.{}.cfg

Wednesday, June 22, 2016

network Tuning

For a host with a 10G NIC, optimized for network paths up to 200ms RTT and for friendliness to single- and parallel-stream tools, or a 40G NIC on paths up to 50ms RTT:

# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728 
net.core.wmem_max = 134217728 
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# increase the length of the processor input queue
net.core.netdev_max_backlog = 250000
# recommended default congestion control is htcp 
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1
Also add this to /etc/rc.local (where N is the number for your 10G NIC): 
    /sbin/ifconfig ethN txqueuelen 10000
    ifconfig ethX txqueuelen 300000
   ethtool -K eth3 gso on
   ethtool -k eth3


ip link show eth3


tuning NIC


You can use ethtool to check on the number of descriptors your NIC has, and whether the driver is configured to use them.  Some sample Linux output from an Intel 10GE NIC that is using the default config is below:
[user@perfsonar ~]# ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:          4096
RX Mini:     0
RX Jumbo:    0
TX:          4096
Current hardware settings:
RX:          256
RX Mini:     0
RX Jumbo:    0
TX:          256

Under Linux, to check which driver you are using, do this:
  ethtool -i eth0




If you're using e1000 chips (Intel 1GE, often integrated into motherboards; note that this does not apply to newer variants) the driver defaults to 256 Rx and 256 Tx descriptors. This is because early versions of the chipset only supported this. All recent versions have 4096, but the driver doesn't autodetect this. Increasing the number of descriptors can improve performance dramatically on some hosts.
You can also add this to /etc/rc.local to get the same result.

   ethtool -G ethN rx 4096 tx 4096



[1]https://fasterdata.es.net/host-tuning/linux/

Wednesday, June 15, 2016

Linux File System Read Write Performance Test

sudo apt-get install pktstat
sudo pktstat -i eth0 -nt
dstat -nt
dstat -N eth2,eth3
sudo apt-get install nethogs
sudo nethogs
http://www.cyberciti.biz/faq/fedora-sl-centos-redhat6-enable-epel-repo/
$ cd /tmp
wget http://mirror-fpt-telecom.fpt.net/fedora/epel/6/i386/epel-release-6-8.noarch.rpm
# rpm -ivh epel-release-6-8.noarch.rpm

How do I use EPEL repo?

Simply use the yum commands to search for or install packages from the EPEL repo:
# yum search nethogs
# yum update
# yum --disablerepo="*" --enablerepo="epel" install nethogs










System administrators responsible for Linux servers sometimes get confused when asked to benchmark a file system's performance. The main reason for the confusion is that it does not matter which tool you use to test the file system's performance; what matters is the exact requirement. A file system's performance depends on certain factors, as follows.

The maximum rotational speed of your hard disk
The allocated block size of a file system
Seek time
The performance rate of the file system's metadata
The type of read/write
Seriously speaking, it is wonderful to realize that various technologies made by different people and even different companies work together in coordination inside a single box, and we call that box a computer. It is even more wonderful to realize that hard disks store almost all of the information available in the world in digital format. It is a complex thing to understand how hard disks really store our data safely. Explaining the different aspects of how a hard disk, and a file system on top of it, work together is beyond the scope of this article (but I will surely give it a try in a couple of future posts).

So let's begin our tutorial on file system benchmark testing.

It is advised that you do not run any other disk-I/O-intensive tasks during this file system performance test; otherwise your performance results will be heavily skewed. It is better to stop all other such processes during the test.

The Simplest Performance Test Using dd command

The simplest read/write performance test in Linux can be done with the help of the dd command. This command is used to write to or read from any block device in Linux, and you can do a lot of stuff with it. The main plus point of this command is that it is readily available in almost all distributions out of the box, and it is pretty easy to use.

With the dd command we will only be testing sequential read and sequential write. I will test the speed of my partition /dev/sda1, which is mounted on "/" (the only partition I have on my system), so I can write the test data anywhere in my filesystem.

[root@slashroot2 ~]# dd if=/dev/zero of=speetest bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0897865 seconds, 1.2 GB/s
[root@slashroot2 ~]
In the above command you will be amazed to see that you got 1.2 GB/s. But don't be happy; that figure is misleading, because the speed dd reported is the speed at which the data was cached in RAM, not the speed at which it was written to disk. So we need to ask the dd command to report the speed only after the data is synced with the disk. For that we need to run the command below.

[root@slashroot2 ~]# dd if=/dev/zero of=speetest bs=1M count=100 conv=fdatasync
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 2.05887 seconds, 50.9 MB/s
As you can clearly see, with the conv=fdatasync option the dd command reports the rate only after the data is completely written to the disk, so now we have the actual sequential write speed. Let's now go to a larger amount of data: 200 MB written in 64 KB blocks.

[root@slashroot2 ~]# dd if=/dev/zero of=speedtest bs=64k count=3200 conv=fdatasync
3200+0 records in
3200+0 records out
209715200 bytes (210 MB) copied, 3.51895 seconds, 59.6 MB/s


As you can clearly see, the speed came to about 59 MB/s. Note that if you do not specify a block size, ext3 by default gets formatted with a block size determined by programs like mke2fs. You can verify yours with the following commands.

tune2fs -l /dev/sda1

dumpe2fs /dev/sda1

To test the sequential read speed with the dd command, run the command below.

[root@myvm1 sarath]# dd if=speedtest of=/dev/null bs=64k count=24000
5200+0 records in
5200+0 records out
340787200 bytes (341 MB) copied, 3.42937 seconds, 99.4 MB/s
Performance Test using HDPARM

Now let's use a tool other than dd for our tests. We will start with the hdparm command to test the speed. The hdparm tool is also available out of the box in most Linux distributions.

[root@myvm1 ~]# hdparm -tT /dev/sda1

/dev/sda1:
 Timing cached reads:   5808 MB in  2.00 seconds = 2908.32 MB/sec
 Timing buffered disk reads:   10 MB in  3.12 seconds =   3.21 MB/sec


There are multiple things to understand in the above hdparm results. The -T option shows the speed of reads served from the cache/buffer (that is why it is so much higher).

The -t option shows the speed of buffered disk reads without previously cached data (which in the above output is a low 3.21 MB/sec).

The hdparm output shows both the cached reads and the disk reads separately. As mentioned before, hard disk seek time also matters a lot for speed; seek time is the time required by the hard disk to reach the sector where the data is stored. You can check your hard disk's seek time with the seeker tool, using the simple command below.

[root@slashroot2 ~]# seeker /dev/sda1
Seeker v3.0, 2009-06-17, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda1 [81915372 blocks, 41940670464 bytes, 39 GB, 39997 MB, 41 GiB, 41940 MiB]
[512 logical sector size, 512 physical sector size]
[1 threads]
Wait 30 seconds..............................
Results: 87 seeks/second, 11.424 ms random access time (26606211 < offsets < 41937280284)
[root@slashroot2 ~]#
It is clearly shown that my disk did 87 seeks per second for sectors containing data. That is OK for a desktop Linux machine, but for servers it is not at all OK.

Read Write Benchmark Test using IOZONE:

Now there is one tool in Linux that will do all of these tests in one shot: none other than IOZONE. We will run some benchmark tests against my /dev/sda1 with the help of iozone. Computers and servers are always purchased with some purpose in mind: some servers need to be high-end performance-wise, some need to be fast at sequential reads, and others are ordered with random reads in mind. IOZONE is very helpful for carrying out a large number of performance benchmark tests against a drive. The output produced by iozone is quite extensive.

The default command line option -a is used for full automatic mode, in which iozone will test block sizes ranging from 4k to 16M and file sizes ranging from 64k to 512M. Let's do a test using this -a option and see what happens.

[root@myvm1 ~]# iozone -a /dev/sda1
             Auto Mode
        Command line used: iozone -a /dev/sda1
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
<div id="xdvp"><a href="http://www.ecocertico.com/no-credit-check-direct-lenders&#10;">creditors you never heard</a></div>
                                                            random  random    bkwd   record   stride
              KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
              64       4  172945  581241  1186518  1563640  877647  374157  484928   240642   985893   633901   652867 1017433  1450619
              64       8   25549  345725   516034  2199541 1229452  338782  415666   470666  1393409   799055   753110 1335973  2071017
              64      16   68231  810152  1887586  2559717 1562320  791144 1309119   222313  1421031   790115   538032  694760  2462048
              64      32  338417  799198  1884189  2898148 1733988  864568 1421505   771741  1734912  1085107  1332240 1644921  2788472
              64      64   31775  811096  1999576  3202752 1832347  385702 1421148   771134  1733146   864224   942626 2006627  3057595
             128       4  269540  699126  1318194  1525916  390257  407760  790259   154585   649980   680625   684461 1254971  1487281
             128       8  284495  837250  1941107  2289303 1420662  779975  825344   558859  1505947   815392   618235  969958  2130559
             128      16  277078  482933  1112790  2559604 1505182  630556 1560617   624143  1880886   954878   962868 1682473  2464581
             128      32  254925  646594  1999671  2845290 2100561  554291 1581773   723415  2095628  1057335  1049712 2061550  2850336
             128      64  182344  871319  2412939   609440 2249929  941090 1827150  1007712  2249754  1113206  1578345 2132336  3052578
             128     128  301873  595485  2788953  2555042 2131042  963078  762218   494164  1937294   564075  1016490 2067590  2559306
         

Note: all of the output you see above is in KB/sec.

The first column shows the file size used and the second column shows the record length used.

Let's understand the output in some of the columns.

The third column, Write: this shows the speed of writing a new file. Whenever a new file is created in a file system under Linux, there is extra overhead involved in storing its metadata, for example the inode for the file and its entry in the journal. So creating a new file in a file system is always comparatively slower than overwriting an already created file.

The fourth column, Re-write: this shows the speed reported when overwriting a file which has already been created.

The fifth column, Read: this reports the speed of reading an already existing file.











http://www.slashroot.in/linux-file-system-read-write-performance-test

Sunday, June 5, 2016

SGE

SGE JOB history
/var/lib/gridengine/default/common$ tail -f accounting



qsub sge-root/examples/jobs/simple.sh
After the job finishes executing, check your home directory for the redirected stdout/stderr files script-name.e<job-id> and script-name.o<job-id>.
The job-id is a consecutive unique integer assigned to each job.


/usr/lib/lsb/remove_initd /etc/init.d/sgeexecd.cluster

./install_qmaster -m -noremote -auto util/install_modules/sge_configuration.conf

Default location of Sun Grid Engine log files:
<qmaster_spool_dir>/messages
<qmaster_spool_dir>/schedd/messages
<execd_spool_dir>/<hostname>/messages
<sge_root>/<sge_cell>/common/accounting
<sge_root>/<sge_cell>/common/statistics
add user to submit queue
$qconf -sul
$qconf -su ceshi
name    ceshi
type    ACL DEPT
fshare  0
oticket 0
entries ceshi,ano

$qconf -as master.local

$qconf -shgrp @ceshi >hosts4group
add an entry to hosts4group
$qconf -Mhgrp hosts4group
$qconf -se execd2 >execd.txt
modify execd.txt
$qconf -Me execd.txt

$qconf -sq ceshi.q > queue.txt
$qconf -Mq queue.txt



Adding an execution host
  • Make the new host an administrative host

    qconf -ah <hostname>
  • As root on this new host, run the following script from $SGE_ROOT

    install_execd

Removing an execution host
  • First, delete the queues associated with this host
  •     qconf -sql  -> shows a list of all queues

          The following operations are all performed on the management (master) node.
    1. qconf -mq ceshi.q  (remove the entries for the node being deleted)

    3. Remove the node being deleted from /etc/hosts
  • Delete the host
    qconf -mhgrp @ceshi    to remove entry for <hostname>
    qconf -de <hostname>
     qconf -dh <host>
  • Finally, delete the configuration for the host

    qconf -dconf <hostname>








Install On Master

sudo apt-get install gridengine-client gridengine-common gridengine-master gridengine-qmon sun-java6-jre
  1. Java is needed for it to run.
  2. In the installer, say no to sending email.
  3. Set the cell name to default.
  4. It is important to set the master name to your actual outside name. To find it, type
hostname -I
#this gives you an IP, then type
host thisIP
Then give the fully qualified name of your machine as host.

Install On Workers

sudo apt-get install gridengine-exec gridengine-client

Starting on Master

Try starting it inside a screen session with
sudo su
sge_qmaster
sge_execd
qmon
It will probably fail with some message like
$ sudo qmon
Warning: Cannot convert string "-adobe-courier-medium-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-bold-r-*--14-*-*-*-m-*-*-*" to type FontStruct
Warning: Cannot convert string "-adobe-courier-medium-r-*--12-*-*-*-m-*-*-*" to type FontStruct
X Error of failed request: BadName (named color or font does not exist)
Major opcode of failed request: 45 (X_OpenFont)
Serial number of failed request: 643
Current serial number in output stream: 654
Then install fonts:
sudo apt-get install xfonts-base xfonts-100dpi xfonts-75dpi
Then restart computer (no really!)
If you run into problems that say you cannot connect, make sure you don't have another sge process already running and that you have the fully qualified hostname set correctly
To kill possibly running instances:
ps aux | grep "sge"
To reconfigure the already installed packages after changing your hostname (for Ubuntu, in /etc/hostname) to match the one you got from running the above host command:
sudo dpkg-reconfigure gridengine-master
# or to purge/remove with config files the whole install and try over clean:
sudo apt-get purge gridengine-common gridengine-exec gridengine-client
Then follow part 2 of http://scidom.wordpress.com/tag/parallel/ to use the GUI to set up your mainqueue and try their simple hello world example.
Other useful resources are: http://helms-deep.cable.nu/~rwh/blog/?p=159



sudo apt-get install gridengine-client gridengine-common gridengine-master

Creating config file /etc/default/gridengine with new version
Setting up gridengine-client (6.2u5-7.3) ...
Setting up gridengine-master (6.2u5-7.3) ...
Initializing cluster with the following parameters:
 => SGE_ROOT: /var/lib/gridengine
 => SGE_CELL: default
 => Spool directory: /var/spool/gridengine/spooldb
 => Initial manager user: sgeadmin
Initializing spool (/var/spool/gridengine/spooldb)
Initializing global configuration based on /usr/share/gridengine/default-configuration
Initializing complexes based on /usr/share/gridengine/centry
Initializing usersets based on /usr/share/gridengine/usersets
Adding user sgeadmin as a manager
Cluster creation complete

Friday, June 3, 2016

bcl2fastq v2.15 on ubuntu 14.04

Demultiplexing 

Demultiplexing =  reorganizing the FASTQ files +  generating the statistics and reporting files.

Reorganizing FASTQ Files

The first step of demultiplexing is reorganizing the base call files, based on the index
sequence. This step is done the following way for each cluster:
1 Get the raw index for each Index Read from the BCL file.
2 Identify the appropriate sample for the index based on the sample sheet.
3 Optional: Detect and correct up to two errors on the barcode, and identify the
appropriate sample. If there are multiple Index Reads, detect and correct up to two
errors in each Index Read.
4 Optional: Detect the presence of adapter sequence at the end of read. If adapter
sequence is detected, trim or mask (with N) the corresponding base calls.
5 Append the read to the appropriate new FASTQ file for each read.
6 If the index cannot be identified, the data are written into an Undetermined sample
file, unless the sample sheet specifies a sample for reads without an index.

Compiling bcl2fastq v2.15 on Ubuntu 12.04 and 14.04

Wed 27 August 2014 — Filed under notes; tags: linux
Illumina provides a program for demultiplexing sequencing output called bcl2fastq. They get a gold star for releasing the source - the downside is that they release binaries only for RHEL/CentOS, and no build instructions for Ubuntu. So how hard could it be?

Ubuntu 14.04 (Trusty Tahr)

I thought I'd start here since the packages are more up to date (turns out it's a good thing I did, see the morass below). There's some documentation from Illumina for compiling from source here. There's not a lot to go on, other than a list of dependencies, which boils down to:
  • zlib
  • librt
  • libpthread
  • gcc 4.1.2 (with c++)
  • boost 1.54 (with its dependencies)
  • cmake 2.8.9
Really the only tricky part was figuring out the required packages, which didn't correspond particularly well to the list of dependencies above. I didn't bother trying to install specific versions of any of the dependencies other than boost 1.54.
On an Amazon AWS EC2 instance (m3.medium, ubuntu-trusty-14.04-amd64-server-20140607.1 ami-e7b8c0d7):
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install zlibc
sudo apt-get install libc6 # provides librt and libpthread
sudo apt-get install gcc
sudo apt-get install g++
sudo apt-get install libboost1.54-all-dev
sudo apt-get install cmake
From there, compilation more or less works as advertised:
wget ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/Software/bcl2fastq/bcl2fastq2-v2.15.0.4.tar.gz
tar -xf bcl2fastq2-v2.15.0.4.tar.gz
cd bcl2fastq
mkdir build
cd build
PREFIX=/usr/local
sudo mkdir -p ${PREFIX:?}
../src/configure --prefix=${PREFIX:?}
make
sudo make install
We wanted this version to coexist with an older one, so I renamed the executable:
sudo mv $PREFIX/bin/bcl2fastq $PREFIX/bin/bcl2fastq2
syntax error at /usr/local/lib/bcl2fastq-1.8.4/perl/Casava/Alignment/Config.pm line 761, near "}" /usr/local/lib/bcl2fastq-1.8.4/perl/Casava/Alignment/Config.pm has too many errors. Compilation failed in require at /usr/local/lib/bcl2fastq-1.8.4/perl/Casava/Alignment.pm line 61.
sudo apt-get install libexpat1-dev
sudo apt-get install xsltproc
The reason for the errors is that bcl2fastq is not compatible with the default perl 5.18 of Ubuntu 14.04. You need to install an older perl version to execute the script. Use the following commands to install, e.g. 5.14, to path/perlbrew/:
cd path/perlbrew/
wget http://install.perlbrew.pl -O install_perlbrew.sh
export PERLBREW_ROOT=path/perlbrew/ && bash install_perlbrew.sh
source ./etc/bashrc
perlbrew install perl-5.14.4
perlbrew switch perl-5.14.4
perlbrew install-cpanm
cpanm XML/Simple.pm
http://nhoffman.github.io/borborygmi/compiling-bcl2fastq-on-ubuntu.html

Thursday, June 2, 2016

s3fs-fuse

Maximum file size=64GB

s3fs is stable and is being used in a number of production environments, e.g., rsync backup to S3.
s3fs works with rsync! (as of svn 43). As of r152, s3fs uses x-amz-copy-source for efficient updates of mode, mtime and uid/gid.


enable_content_md5 (default is disable)

verifying uploaded data without multipart by content-md5 header.

fusermount -uz  /opt/oss1



$ ossfs anno-sge /opt/oss1 -ourl=http://vpc100-oss-cn-beijing.aliyuncs.com -o multireq_max=5,use_cache=/mnt/xvdb1/tmp
$ ossfs anno-sge /opt/oss1 -ourl=http://vpc100-oss-cn-beijing.aliyuncs.com -o nomultipart,use_cache=/mnt/xvdb1/tmp


s3fs has a caching mechanism: you can enable local file caching to minimize downloads. The folder specified by use_cache (optional) holds a local file cache that is automatically maintained by s3fs; enable it with the use_cache option, e.g., -ouse_cache=/mnt/xvdb1/tmp.


s3fs supports multipart requests (it sends some requests in parallel); this problem likely depends on the number of parallel requests. If you can, please try setting small values for the multireq_max and parallel_count options.


  • nomultipart
    • disable multipart uploads.
  • multireq_max (default="500")
    • maximum number of parallel request for listing objects.
  • parallel_count (default="5")
    • number of parallel request for downloading/uploading large objects. s3fs uploads large object(over 20MB) by multipart post request, and sends parallel requests. This option limits parallel request count which s3fs requests at once.


https://github.com/s3fs-fuse/s3fs-fuse/issues/94

https://github.com/s3fs-fuse/s3fs-fuse/issues/152




$cat s3fs-watchdog.sh

#!/bin/bash
#
# s3fs-watchdog.sh
#
# Run from the root user's crontab to keep an eye on s3fs which should always
# be mounted.
#
# Note:  If getting the amazon S3 credentials from environment variables
#   these must be entered in the actual crontab file (otherwise use one
#   of the s3fs other ways of getting credentials).
#
# Example:  To run it once every minute getting credentials from environment
# variables enter this via "sudo crontab -e":
#
#   AWSACCESSKEYID=XXXXXXXXXXXXXX
#   AWSSECRETACCESSKEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#   * * * * * /root/s3fs-watchdog.sh
#

NAME=ossfs
BUCKET=anno-sge
MOUNTPATH=/opt/oss1
MOUNT=/bin/mount
UMOUNT=/bin/umount
NOTIFY=whg@anno.com
NOTIFYCC=whg@anno.com
GREP=/bin/grep
PS=/bin/ps
NOP=/bin/true
DATE=/bin/date
MAIL=/usr/bin/mail
RM=/bin/rm

$PS -ef|$GREP -v grep|$GREP $NAME|grep $BUCKET >/dev/null 2>&1
case "$?" in
   0)
   # It is running in this case so we do nothing.
   $NOP
   ;;
   1)
   echo "$NAME is NOT RUNNING for bucket $BUCKET. Remounting $BUCKET with $NAME and sending notices."
   $UMOUNT $MOUNTPATH >/dev/null 2>&1
   $MOUNT $MOUNTPATH >/tmp/watchdogmount.out 2>&1
   NOTICE=/tmp/watchdog.txt
   echo "$NAME for $BUCKET was not running and was started on `$DATE`" > $NOTICE
   $MAIL -n -s "$BUCKET $NAME mount point lost and remounted" -t $NOTIFYCC $NOTIFY < $NOTICE
   $RM -f $NOTICE
   ;;
esac

exit

$cat /etc/fstab

ossfs#anno-sge  /opt/oss1 fuse _netdev,url=http://vpc100-oss-cn-beijing.aliyuncs.com,uid=1001,gid=1001,max_stat_cache_size=100000000,nomultipart,use_cache=/mnt/xvdb1/tmp,allow_other,user,exec  0 0

Make sure the file /etc/passwd-ossfs exists with permission 640, and that the user and the file's owner are in the same group.

$mount /opt/oss1
fusermount: failed to open /etc/fuse.conf: Permission denied

fusermount: option allow_other only allowed if 'user_allow_other' is set in /etc/fuse.conf

The problem can easily be fixed by adding the user to the fuse group and then logging in again:

sudo addgroup <username> fuse