|
removeStopWords <- function(x, stopwords) {
  temp <- character(0)  # words that survive the filter
  index <- 1
  xLen <- length(x)
  while (index <= xLen) {
    # keep x[index] only if it does not appear in the stop-word list
    if (length(stopwords[stopwords == x[index]]) < 1)
      temp <- c(temp, x[index])
    index <- index + 1
  }
  temp
}
hlzjTemp2 <- lapply(hlzjTemp, removeStopWords, stopwords)
hlzjTemp2[1:2]
Comparing this with the contents of hlzjTemp[1:2], it is clear that characters such as "的" have been removed.
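The loop in removeStopWords grows temp one element at a time. A more compact variant, offered only as a sketch, uses R's vectorized %in% operator. The name removeStopWordsVec and the toy English tokens (standing in for segmented Chinese words) are made up for illustration and are not from the original post. For ordinary character vectors without missing values it should keep the same words as the loop version.

```r
# Hypothetical vectorized variant of removeStopWords (the name is illustrative).
removeStopWordsVec <- function(x, stopwords) {
  # Keep only the tokens that do not appear in the stop-word list.
  x[!(x %in% stopwords)]
}

# Toy data (assumed, not taken from the original corpus).
tokens <- list(c("the", "cat", "sat"), c("king", "of", "the", "hill"))
stops <- c("the", "of")

# Apply the filter to every token vector, as the post does with lapply().
cleaned <- lapply(tokens, removeStopWordsVec, stops)
```

Because the filter is a single logical index, it avoids repeatedly copying temp with c(), which matters when each post contains many tokens.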
4. Word cloud
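A word cloud is now a very common kind of analysis chart. The words are placed in a single image and each word is drawn at a size that reflects its frequency, so you can see at a glance which words occur most often. It is frequently used in public-opinion analysis.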
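The steps below tally the segmentation results, count how many times each word occurs, and sort the counts. They then take the 150 most frequent words and draw the word cloud with wordcloud().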
words <- lapply(hlzjTemp2, strsplit, " ")
wordsNum <- table(unlist(words))
wordsNum <- sort(wordsNum)  # sort ascending by frequency
# as.vector() keeps freq a plain count column instead of a nested table
wordsData <- data.frame(words = names(wordsNum), freq = as.vector(wordsNum))
library(wordcloud)  # load the word-cloud package
weibo.top150 <- tail(wordsData, 150)  # take the 150 most frequent words
colors <- brewer.pal(8, "Dark2")
wordcloud(weibo.top150$words, weibo.top150$freq, scale = c(8, 0.5), colors = colors, random.order = FALSE)
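To make the counting step easier to follow without the full corpus, here is a minimal sketch of the same table / sort / tail pipeline on toy data. The token lists and the name top2 are assumptions for illustration, and n = 2 replaces the 150 used in the post.

```r
# Toy token lists standing in for the segmented posts (assumed data).
tokens <- list(c("apple", "pear", "apple"), c("pear", "apple", "plum"))

# Count every token across all posts.
counts <- table(unlist(tokens))

# Sort ascending, as in the post, so the most frequent words come last.
counts <- sort(counts)

# Build a plain data frame; as.vector() strips the table class from the counts.
wordsData <- data.frame(words = names(counts),
                        freq = as.vector(counts),
                        stringsAsFactors = FALSE)

# tail() picks the last n rows, i.e. the n most frequent words.
top2 <- tail(wordsData, 2)
```

Since the sort is ascending, tail() rather than head() selects the most frequent words, which is why the post calls tail(wordsData, 150).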
|