How I imported 40 million rows from Oracle into MySQL

cloudtech


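Our company wanted to switch to a new ad-serving system and needed to test its performance. The old system runs on Oracle, the new one on MySQL, and the table structures differ somewhat. I first created a new user in my local Oracle instance, created a set of tables with the same structure as those on the MySQL side, and then reorganized the old data and inserted it into the new tables. The remaining step was to move the data from the local Oracle instance to MySQL.

Exporting to an SQL script produced a file of several GB, and because the two databases are not on the same network segment, running it was very slow. Exporting to a data file ran into incompatibilities between the two databases. Having found no good export/import approach, I did it with a program. Here is my complete code: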

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.TimeZone;

    public class DumpData {

        // The 39 columns copied from Oracle to MySQL, in insert order.
        private static final String COLUMNS =
                "id,account,memo,channeltype,channelid,bannergroupid,bannerid,configid,"
                + "enable,gid,priority,percent,aid,method,type,startday,starttime,quitday,quittime,campaignid,"
                + "fixedchannelid,nopaying,optflag,keyword,keywordflag,keywordtext,keywordencoding,reserveflag,"
                + "forecastflag,estimatemediaexpense,castingtype,count,crm_priority,crm_limitnum,crm_limitdate,"
                + "deleteflag,solutiontypeid,price,achievedflag";

        public static void main(String[] args) {

            Connection con = null;   // MySQL (target)
            Connection conn = null;  // Oracle (source)

            PreparedStatement pst = null;
            Statement sql_statement = null;

            try {

                long startTime = System.currentTimeMillis();
                // Elapsed milliseconds are printed as HH:mm:ss:SS, so format them in GMT.
                SimpleDateFormat sdf = new SimpleDateFormat("HH:mm:ss:SS");
                sdf.setTimeZone(TimeZone.getTimeZone("GMT"));

                Class.forName("oracle.jdbc.driver.OracleDriver");
                String url = "jdbc:oracle:thin:@219.239.**.***:1521:ITCPN";
                conn = DriverManager.getConnection(url, "***", "***");
                if (conn != null) System.out.println("Connected to Oracle");

                Class.forName("org.gjt.mm.mysql.Driver");
                System.out.println("Success loading Mysql Driver!");
                con = DriverManager.getConnection("jdbc:mysql://219.239.**.***:3306/***?useUnicode=true&characterEncoding=gb2312&rewriteBatchedStatements=true", "****", "*****");
                if (con != null) System.out.println("Connected to MySQL");
                con.setAutoCommit(false);

                sql_statement = conn.createStatement();
                // Fetch 1000 rows per round trip; set before the queries are executed.
                sql_statement.setFetchSize(1000);

                // The parallel hint refers to alias t, so the table must carry that alias.
                String countSql = "select /*+ parallel(t,4)*/ count(1) from \"Solution\" t";
                ResultSet res = sql_statement.executeQuery(countSql);
                int count = 0;
                while (res.next()) {
                    count = res.getInt(1);
                }
                res.close();
                int loop = count / 100000;

                // Build "insert into Solution values(?,?,...,?)" with 39 placeholders.
                StringBuilder insertSql = new StringBuilder("insert into Solution values(?");
                for (int i = 0; i < 38; i++) {
                    insertSql.append(",?");
                }
                insertSql.append(")");
                pst = con.prepareStatement(insertSql.toString());
                System.out.println(insertSql.toString());

                // Read the source table in windows of 100000 rows using a nested ROWNUM query.
                for (int n = 0; n <= loop; n++) {
                    String query = "select " + COLUMNS
                            + " from (select " + COLUMNS + ",rownum rn from \"Solution\""
                            + " where rownum <=" + (n + 1) * 100000 + " ) where rn >" + n * 100000;

                    System.out.println(query);
                    ResultSet result = sql_statement.executeQuery(query);

                    long endTime1 = System.currentTimeMillis();
                    System.out.println("Query executed, elapsed since start: " + sdf.format(new Date(endTime1 - startTime)));

                    while (result.next()) {
                        for (int i = 1; i < 40; i++) {
                            pst.setObject(i, result.getObject(i));
                        }
                        pst.addBatch();
                        // Send the accumulated rows to MySQL every 10000 rows.
                        if (result.getRow() % 10000 == 0) {
                            System.out.println("Inserted 10000 rows");
                            pst.executeBatch();
                            pst.clearBatch();
                        }
                    }
                    // Flush whatever is left in this window, then commit it.
                    pst.executeBatch();
                    pst.clearBatch();
                    con.commit();
                    System.out.println("Committed one window of up to 100000 rows");
                    result.close();
                    System.gc();

                }

                long endTime = System.currentTimeMillis();
                System.out.println("All data inserted, elapsed: " + sdf.format(new Date(endTime - startTime)));

            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            } catch (SQLException e) {
                e.printStackTrace();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                try {
                    if (sql_statement != null) sql_statement.close();
                    if (pst != null) pst.close();
                    if (con != null && !con.isClosed()) {
                        con.setAutoCommit(true);
                        con.close();
                    }
                    // The original checked con here, which could throw if conn was never opened.
                    if (conn != null && !conn.isClosed()) {
                        conn.close();
                    }
                } catch (SQLException e) {
                    e.printStackTrace();
                }

            }
        }
    }

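The inserts use addBatch(). Calling addBatch() only accumulates statements and takes almost no time; executeBatch() is the expensive call. If too many statements are accumulated, however, MySQL can crash, so I chose a moderate batch size of 10000.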
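The accumulate-then-flush pattern described above can be sketched as a small reusable helper. This is only a minimal sketch under assumptions: the class name `BatchCopy`, the method `copyRows`, and its parameters are hypothetical names introduced for illustration, and the batch size is whatever the caller finds workable (10000 in the program above).

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BatchCopy {

    /**
     * Copies every remaining row of source into insert, reading columnCount
     * columns by position and flushing every batchSize rows.
     * Returns the number of rows copied. (Hypothetical helper.)
     */
    public static long copyRows(ResultSet source, PreparedStatement insert,
                                int columnCount, int batchSize) throws SQLException {
        long copied = 0;
        int pending = 0;
        while (source.next()) {
            for (int col = 1; col <= columnCount; col++) {
                insert.setObject(col, source.getObject(col));
            }
            insert.addBatch();          // cheap: only queues the row on the client side
            copied++;
            if (++pending == batchSize) {
                insert.executeBatch();  // expensive: sends the queued rows to the server
                pending = 0;
            }
        }
        if (pending > 0) {
            insert.executeBatch();      // flush the final partial batch
        }
        return copied;
    }
}
```

The caller would still decide when to commit, for example once per 100000-row window as in the program above.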

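For performance I added the rewriteBatchedStatements parameter to the connection URL. It defaults to false and has to be set to true by hand. With rewriteBatchedStatements=false, execution goes down the executeBatchSerially path, which sends the statements one at a time, practically the same as not batching at all; that is where the slowness comes from. With true, the executeBatchedInserts method runs instead. MySQL in fact supports insert statements of this form: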

    insert into t_user(id,uname) values(1,'1'),(2,'2'),(3,'3')...

Reference: http://www.javaeye.com/topic/770032?page=3
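To make the shape of such a statement concrete, here is a small illustration of how several single-row inserts collapse into one multi-row insert. It is not the driver's actual implementation; the class `MultiRowInsert` and the method `combine` are hypothetical names, and the value tuples are passed in as ready-made strings purely for demonstration.

```java
import java.util.Arrays;
import java.util.List;

public class MultiRowInsert {

    /** Appends all value tuples to one insert prefix, separated by commas. */
    public static String combine(String insertPrefix, List<String> valueTuples) {
        if (valueTuples.isEmpty()) {
            throw new IllegalArgumentException("at least one row is required");
        }
        return insertPrefix + " values" + String.join(",", valueTuples);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("(1,'1')", "(2,'2')", "(3,'3')");
        // prints: insert into t_user(id,uname) values(1,'1'),(2,'2'),(3,'3')
        System.out.println(combine("insert into t_user(id,uname)", rows));
    }
}
```

One statement carrying many rows means one round trip instead of many, which matters most when the two databases sit on different network segments, as they did here.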

The MySQL driver used was mysql-connector-java-5.1.14-bin.jar.

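I also optimized the query side. The /*+parallel(t,4)*/ hint turns on parallel execution in the database; parallelism generally applies only when a table or index is scanned in full, and only pays off once the data volume reaches millions of rows. Increasing the fetch size reduces the number of round trips needed to transfer the result and the time the server spends preparing data, which improves performance. I suggest not going above 1000: beyond that there is little further gain, and the risk of running out of memory increases. The query itself was tuned as well: fetching 100000 rows per query through a nested ROWNUM query also improved query speed.

After these optimizations the program ran considerably faster. On a PC on the internal network, the first 200,000 rows took a little over 30 seconds, and I estimated that running on the external server would roughly double that speed. In the end, on the external server, all 40 million rows were imported in a little over fifty minutes. I have heard that data can also be moved with Data Pump; have a look if you are interested.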
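The windowed, nested ROWNUM query mentioned above can be generated by a small helper. The following is a sketch under assumptions: `RownumPager`, `windowCount`, and `windowQuery` are hypothetical names, and unlike the original loop (which runs `count / 100000 + 1` times) it uses ceiling division so that no empty trailing window is queried. Note also that without an ORDER BY Oracle does not guarantee the same row order across queries, so this kind of paging assumes the table is not changing during the copy.

```java
public class RownumPager {

    /** Number of windows needed to cover totalRows (ceiling division). */
    public static int windowCount(long totalRows, int windowSize) {
        return (int) ((totalRows + windowSize - 1) / windowSize);
    }

    /** Builds the nested ROWNUM query for the 0-based window n. */
    public static String windowQuery(String columns, String table, int n, int windowSize) {
        long lower = (long) n * windowSize;   // rows up to and including this number are skipped
        long upper = lower + windowSize;      // last row number included in this window
        return "select " + columns
                + " from (select " + columns + ", rownum rn from " + table
                + " where rownum <= " + upper + ") where rn > " + lower;
    }

    public static void main(String[] args) {
        int windows = windowCount(40_000_000L, 100_000);
        System.out.println(windows + " windows");
        System.out.println(windowQuery("id,account", "\"Solution\"", 1, 100_000));
    }
}
```

The inner `rownum <= upper` filter lets Oracle stop reading once the upper bound is reached, and the outer `rn > lower` filter discards rows that earlier windows already copied.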