BlogJava - xzclog
http://www.blogjava.net/xzclog/ (last updated Sat, 08 Apr 2023 20:39:42 GMT)

Python: reading, writing, and creating files (xzc, 2018-11-28)
http://www.blogjava.net/xzclog/archive/2018/11/28/433526.html
HDFS replication setting: the default is 3 (xzc, 2018-11-26)
http://www.blogjava.net/xzclog/archive/2018/11/26/433518.html
Keeping Python 2 and Python 3 installed side by side and switching between them, e.g. for pip (xzc, 2018-11-16)
http://www.blogjava.net/xzclog/archive/2018/11/16/433500.html
SQL usage for Hive integrated with Sentry (xzc, 2018-09-03)
http://www.blogjava.net/xzclog/archive/2018/09/03/433353.html
Python encoding and decoding (xzc, 2018-05-18)
http://www.blogjava.net/xzclog/archive/2018/05/18/433218.html
... str; conversely, decoding is str -> unicode. What remains is deciding when to encode and when to decode. On the "coding declaration" at the top of a file, i.e. # -*- codin... (read the full post)
Kafka broker hang: process alive but unresponsive (xzc, 2018-03-08)
http://www.blogjava.net/xzclog/archive/2018/03/08/433087.html
1. Background
    I woke up, opened WeChat, and found a colleague reporting that writes to the Kafka cluster had been failing frequently since the early hours of the previous day. I opened the machine monitoring and the logs for the Kafka cluster and saw that the load on one node had dropped off sharply in the early hours, roughly matching the time the colleague reported, so I logged on to the server and got to work.
2. Troubleshooting
1) Check the machine monitoring to get a rough idea of which node is abnormal.
(monitoring screenshot omitted)
2) With the abnormal node roughly identified from the monitoring, log on to the server and check the Kafka log. It contains errors, and the log stops right at that point in time:
[2017-06-01 16:59:59,851] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:658)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
        at sun.nio.ch.IOUtil.read(IOUtil.java:195)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
        at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:160)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:141)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
        at kafka.network.Processor.run(SocketServer.scala:413)
3) Check the Kafka process and the listening port; both look normal, so the broker has effectively hung: the process is alive but unresponsive.
ps -ef | grep kafka        ## check the Kafka process
netstat -ntlp | grep 9092  ## 9092 is Kafka's listening port
4) Since the broker has hung, the only option is to restart it:
ps -ef | grep kafka | grep -v grep | awk '{print $2}' | xargs kill -9
cd /usr/local/kafka/bin; nohup ./kafka-server-start.sh ../config/server.properties &
5) After the restart, keep watching the node's Kafka log. After a round of index rebuilding, the error above keeps flooding the log. A round of Googling finally turned up the fix.
3. Solution
In /usr/local/kafka/bin/kafka-run-class.sh, remove -XX:+DisableExplicitGC, add -XX:MaxDirectMemorySize=512m, and restart Kafka once more. Problem solved.
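For reference, a hedged sketch of what that edit looks like. The exact contents of KAFKA_JVM_PERFORMANCE_OPTS differ between Kafka versions, so every flag below other than the two named above is a placeholder. The reasoning: -XX:+DisableExplicitGC turns the System.gc() call that NIO issues when direct memory runs low into a no-op, so unreferenced direct buffers are never reclaimed and allocateDirect eventually throws the OutOfMemoryError seen above; dropping that flag and capping direct memory with -XX:MaxDirectMemorySize avoids it.

# in /usr/local/kafka/bin/kafka-run-class.sh
# before (flags other than the two discussed are illustrative, not from the original script):
#   KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:+DisableExplicitGC -Djava.awt.headless=true"
# after: drop DisableExplicitGC and cap direct buffer memory at 512 MB
KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC -XX:MaxDirectMemorySize=512m -Djava.awt.headless=true"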


Setting the number of reducers in Hive (xzc, 2018-03-07)
http://www.blogjava.net/xzclog/archive/2018/03/07/433084.html (read the full post)
Spark: running totals, whole-set aggregates, and rows-to-columns (xzc, 2017-10-23)
http://www.blogjava.net/xzclog/archive/2017/10/23/432867.html (read the full post)
Spark analytic window functions (xzc, 2017-10-23)
http://www.blogjava.net/xzclog/archive/2017/10/23/432866.html (read the full post)
Notes on Spark SQL statements (xzc, 2017-10-23)
http://www.blogjava.net/xzclog/archive/2017/10/23/432865.html

1. IN does not support subqueries, e.g. select * from src where key in(select key from test);
It does support literal value lists, e.g. select * from src where key in(1,2,3,4,5);
An IN list of 40,000 values took 25.766 s;
an IN list of 80,000 values took 78.827 s.

2. UNION ALL / UNION
A top-level UNION ALL is not supported, e.g. select key from src union all select key from test;
Supported when wrapped in a subquery: select * from (select key from src union all select key from test)aa;
UNION itself is not supported;
use select distinct key from (select key from src union all select key from test)aa; instead.

3. INTERSECT is not supported.

4. MINUS is not supported.

5. EXCEPT is not supported.

6. inner join / join / left outer join / right outer join / full outer join / left semi join are all supported.
left outer join / right outer join / full outer join must include the keyword outer.
join is the simplest case: only keys present on both sides are returned;
left outer join is driven by the left table; columns from the right table are NULL for keys it does not contain;
right outer join is driven by the right table; columns from the left table are NULL for keys it does not contain;
full outer join keeps every row from both tables, filling in NULL on whichever side has no match;
left semi join mainly exists to replace EXISTS / IN;
Hive does not support subqueries in the WHERE clause, so the EXISTS / IN idiom common in SQL does not work.
Not supported, e.g. select * from src aa where aa.key in(select bb.key from test bb);
It can be replaced by either of the following:
select * from src aa left outer join test bb on aa.key=bb.key where bb.key is not null;
select * from src aa left semi join test bb on aa.key=bb.key;
In most cases join ... on and left semi join ... on are equivalent.
However, if tables a and b are joined and b contains duplicate keys,
join ... on returns one output row per matching row in b, because the on condition matches each of them,
while left semi join returns a row of a as soon as one match is found in b and does not probe b any further,
so duplicates in b do not produce duplicate output rows (see the example below).
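To make the duplicate-key difference concrete, a minimal hedged example; the table names a and b and the column key are illustrative, not from the original notes:

-- a holds a single row with key=1; b holds the key 1 twice
create table a (key int, val string);
create table b (key int);
select a.* from a join b on a.key = b.key;            -- returns the row of a twice, once per matching row in b
select a.* from a left semi join b on a.key = b.key;  -- returns the row of a once and stops probing b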
left outer join supports subqueries in the FROM clause, e.g. select aa.* from src aa left outer join (select * from test111)bb on aa.key=bb.a;

7. Four ways to load data into Hive
1) Load data into a Hive table from the local file system
create table wyp(id int,name string) row format delimited fields terminated by '\t' stored as textfile;
load data local inpath 'wyp.txt' into table wyp;
2) Load data into a Hive table from HDFS
[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt
hive> load data inpath '/home/wyp/add.txt' into table wyp;
3) Insert data into a Hive table from a query over another table
hive> create table test(
> id int, name string
> ,tel string)
> partitioned by
> (age int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;

Note: the test table uses age as its partition column. Partitions: in Hive, each partition of a table corresponds to a directory under the table, and all data belonging to a partition is stored in that directory.
For example, if table wyp has the two partition columns dt and city, then the partition dt=20131218, city=bj corresponds to the directory /user/hive/warehouse/dt=20131218/city=bj,
and every row belonging to that partition lives in that directory.

hive> insert into table test
> partition (age='25')
> select id, name, tel
> from wyp;

You can also let the partition value be taken dynamically from the select list (see the note after the example):
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
> partition (age)
> select id, name,
> tel, age
> from wyp;
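Depending on the Hive version, dynamic partitioning may also need to be switched on explicitly before the nonstrict setting above takes effect; a hedged sketch:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;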

Hive also supports inserting data with insert overwrite:
hive> insert overwrite table test
> partition (age)
> select id, name, tel, age
> from wyp;

Hive also supports multi-table inserts:
hive> from wyp
> insert into table test
> partition(age)
> select id, name, tel, age
> insert into table test3
> select id, name
> where age>25;
4) Create a table from the result of a query over another table (CTAS):
hive> create table test4
> as
> select id, name, tel
> from wyp;

8. Show the CREATE TABLE statement of a table
hive> show create table test3;

9. Rename a table
hive> alter table events rename to 3koobecaf; 

10. Add columns to a table
hive> alter table pokes add columns (new_col int); 

11. Add a column with a column comment
hive> alter table invites add columns (new_col2 int comment 'a comment'); 

12. Drop a table
hive> drop table pokes; 

13. Top N
hive> select * from test order by key limit 10;
14. Create a database
create database baseball;

15. alter table tablename change oldcolumn newcolumn column_type: rename a column and change its type

alter table yangsy change product_no phone_no string

 

16. Run the SQL contained in a .sql file

 spark-sql --driver-class-path /home/hadoop/hive/lib/mysql-connector-java-5.1.30-bin.jar -f testsql.sql 


insert into table ci_cuser_20141117154351522 select mainresult.product_no,dw_coclbl_m02_3848.l1_01_02_01,dw_coclbl_d01_3845.l2_01_01_04 from (select product_no from ci_cuser_20141114203632267) mainresult left join dw_coclbl_m02_201407 dw_coclbl_m02_3848 on mainresult.product_no = dw_coclbl_m02_3848.product_no left join dw_coclbl_d01_20140515 dw_coclbl_d01_3845 on dw_coclbl_m02_3848.product_no = dw_coclbl_d01_3845.product_no

insert into ci_cuser_20141117142123638 ( product_no,attr_col_0000,attr_col_0001) select mainresult.product_no,dw_coclbl_m02_3848.l1_01_02_01,dw_coclbl_m02_3848.l1_01_03_01 from (select product_no from ci_cuser_20141114203632267) mainresult left join dw_coclbl_m02_201407 dw_coclbl_m02_3848 on mainresult.product_no = dw_coclbl_m02_3848.product_no 


create table ci_cuser_yymmddhhmisstttttt_tmp(product_no string) row format serde 'com.bizo.hive.serde.csv.csvserde' ; 
load data local inpath '/home/ocdc/coc/yuli/test123.csv' overwrite into table test_yuli2;

Create a table that supports the CSV format, stored as textfile:
create table test_yuli7 row format serde 'com.bizo.hive.serde.csv.csvserde' as select * from ci_cuser_20150310162729786;

Create a comma-delimited table without depending on the CSVSerde jar (template; listname and listname1 are placeholders filled in by the calling script):
create table listname row format delimited fields terminated by ','
 as select * from listname1;

create table aaaa row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile as select * from

Enabling FAIR scheduling on the Spark SQL Thrift Server:
1. Edit $SPARK_HOME/conf/spark-defaults.conf and add:
2.    spark.scheduler.mode FAIR
3.    spark.scheduler.allocation.file /users/tianyi/github/community/apache-spark/conf/fair-scheduler.xml
4. Edit $SPARK_HOME/conf/fair-scheduler.xml (creating it if needed) and define the scheduler pools; here one pool uses schedulingMode FAIR with weight 1 and minShare 2, and another uses schedulingMode FIFO with weight 2 and minShare 3 (a sketch of the file follows below).
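A hedged sketch of what fair-scheduler.xml looks like with those settings; the pool names are placeholders, not from the original post:

<?xml version="1.0"?>
<allocations>
  <!-- pool with fair scheduling, weight 1, minimum share of 2 cores -->
  <pool name="poolA">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <!-- pool with FIFO scheduling, weight 2, minimum share of 3 cores -->
  <pool name="poolB">
    <schedulingMode>FIFO</schedulingMode>
    <weight>2</weight>
    <minShare>3</minShare>
  </pool>
</allocations>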

5. Restart the Thrift Server.
6. Before running your SQL, execute
7. set spark.sql.thriftserver.scheduler.pool=<name of the pool to use>

Once those operations are done: create table yangsy555 like ci_cuser_yymmddhhmisstttttt, then insert into yangsy555 select * from yangsy555

 

Create a table with an auto-incrementing sequence column, using row_number() over() to number the rows so the table can be queried page by page (a paging example follows the statement below):

create table yagnsytest2 as select row_number() over() as id,* from yangsytest;
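A hedged usage sketch of paging with that generated id column; the page size of 100 rows is illustrative:
-- page 3 at 100 rows per page, i.e. rows with id 201 through 300
select * from yagnsytest2 where id > 200 and id <= 300;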

 

 

Execution flow of Spark SQL parsing versus HiveQL parsing: (diagram omitted)


