Using Mahout in Java code, not the CLI


Question

I want to be able to build a model using Java. I am able to do so with the CLI as follows:

    ./mahout trainlogistic --input Candy-Crush.twtr.csv \
       --output ./model \
       --target hd_click --categories 2 \
       --predictors click_frequency country_code ctr      device_price_range hd_conversion  time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper game_widgets health_and_fitness health_fitness libraries_and_demo libraries_demo lifestyle media_and_video media_video medical music_and_audio news_and_magazines news_magazines personalization photography productivity racing shopping social sports sports_apps sports_games tools transportation travel_and_local weather app_entertainment_percentage app_wallpaper_percentage app_widgets_percentage arcade_percentage books_and_reference_percentage brain_percentage business_percentage cards_percentage casual_percentage comics_percentage communication_percentage education_percentage entertainment_percentage finance_percentage game_wallpaper_percentage game_widgets_percentage health_and_fitness_percentage health_fitness_percentage libraries_and_demo_percentage libraries_demo_percentage lifestyle_percentage media_and_video_percentage media_video_percentage medical_percentage music_and_audio_percentage news_and_magazines_percentage news_magazines_percentage personalization_percentage photography_percentage productivity_percentage racing_percentage shopping_percentage social_percentage sports_apps_percentage sports_games_percentage sports_percentage tools_percentage transportation_percentage travel_and_local_percentage weather_percentage reads_magazine_sum reads_magazine_count interested_in_gardening_sum interested_in_gardening_count kids_birthday_coming_sum kids_birthday_coming_count job_seeker_sum job_seeker_count friends_sum friends_count married_sum married_count charity_donor_sum charity_donor_count student_sum student_count interested_in_real_estate_sum interested_in_real_estate_count sports_fan_sum sports_fan_count bascketball_sum bascketball_count 
interested_in_politics_sum interested_in_politics_count gamer_sum gamer_count activist_sum activist_count traveler_sum traveler_count likes_soccer_sum likes_soccer_count interested_in_celebs_sum interested_in_celebs_count auto_racing_sum auto_racing_count age_group_sum age_group_count healthy_lifestyle_sum healthy_lifestyle_count interested_in_finance_sum interested_in_finance_count sports_teams_usa_sum sports_teams_usa_count interested_in_deals_sum interested_in_deals_count business_oriented_sum business_oriented_count interested_in_cooking_sum interested_in_cooking_count music_lover_sum music_lover_count beauty_sum beauty_count follows_fashion_sum follows_fashion_count likes_wrestling_sum likes_wrestling_count name_sum name_count shopper_sum shopper_count golf_sum golf_count vegetarian_sum vegetarian_count dating_sum dating_count interested_in_fashion_sum interested_in_fashion_count interested_in_news_sum interested_in_news_count likes_tennis_sum likes_tennis_count male_sum male_count interested_in_cars_sum interested_in_cars_count follows_bloggers_sum follows_bloggers_count entertainment_sum entertainment_count interested_in_books_sum interested_in_books_count has_kids_sum has_kids_count interested_in_movies_sum interested_in_movies_count musicians_sum musicians_count tech_oriented_sum tech_oriented_count female_sum female_count has_pet_sum has_pet_count practicing_sports_sum practicing_sports_count \
       --types      numeric         word         numeric  word               word           word        numeric    word       word    word        numeric       \
       --features 100 --passes 1 --rate 50

I can't understand the 20 Newsgroups example because it's too big to learn from. Can anyone give me code that does the same thing as the CLI command?

To clarify, I need something like this:

    model.train(1,0,"monday",6,44,1,7,4,6,78,7,3,4,6,........,"good");
    model.train(1,0,"sunday",6,44,5,7,9,2,4,6,78,7,3,4,6,........,"bad");
    model.train(1,0,"monday",4,99,2,4,6,3,4,6,........,"good");

    model.writeTofile("myModel.model");

PLEASE DO NOT ANSWER IF YOU ARE NOT FAMILIAR WITH CLASSIFICATION AND ONLY WANT TO TELL ME HOW TO EXECUTE A CLI COMMAND FROM JAVA.

Recommended answer

I am not 100% familiar with the Mahout API (I agree that the documentation is very sparse), so I can only give pointers, but I hope it helps:

The Java source code for the trainlogistic example can be found in the mahout-examples library, which is on Maven [0] (in org.apache.mahout.classifier.sgd.TrainLogistic). If you wanted to, you could use the exact same source code, but it depends on a couple of utility classes in the mahout-examples library (and it's not very clean, either).

The class performing the training in this example is org.apache.mahout.classifier.sgd.OnlineLogisticRegression [1]. Considering the large number of predictor variables you have, though, you might want to use AdaptiveLogisticRegression [2] (same package), which uses a number of OnlineLogisticRegression instances internally. You will have to see for yourself which works best with your data.

The API is fairly straightforward: there's a train method, which takes the target category and a Vector of your input data, and a classify method to test your model, as well as learningRate and other methods to change the model's parameters.
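As a rough sketch of how those calls fit together, the following trains an OnlineLogisticRegression on three hand-built rows and scores one of them. It assumes Mahout 0.8 (mahout-core and mahout-math) is on the classpath. The column names come from the CLI command in the question, but the row values, the class name TrainSketch, the encode and trainModel helpers, and the choice of an L1 prior are illustrative assumptions, and only three of the predictors are shown.

```java
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.ConstantValueEncoder;
import org.apache.mahout.vectorizer.encoders.ContinuousValueEncoder;
import org.apache.mahout.vectorizer.encoders.FeatureVectorEncoder;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class TrainSketch {
    static final int CATEGORIES = 2;  // --categories 2
    static final int FEATURES = 100;  // --features 100

    // One hashing encoder per column: "numeric" columns use a continuous
    // encoder, "word" columns use a word encoder (assumed mapping).
    static final FeatureVectorEncoder INTERCEPT = new ConstantValueEncoder("intercept");
    static final FeatureVectorEncoder CLICK_FREQUENCY = new ContinuousValueEncoder("click_frequency");
    static final FeatureVectorEncoder COUNTRY_CODE = new StaticWordValueEncoder("country_code");
    static final FeatureVectorEncoder TIME_OF_DAY = new StaticWordValueEncoder("time_of_day");

    // Hypothetical helper: hashes one row into a fixed-size feature vector.
    static Vector encode(double clickFrequency, String countryCode, String timeOfDay) {
        Vector v = new RandomAccessSparseVector(FEATURES);
        INTERCEPT.addToVector("", 1, v);
        // The continuous encoder parses the string form as a number.
        CLICK_FREQUENCY.addToVector(Double.toString(clickFrequency), v);
        COUNTRY_CODE.addToVector(countryCode, v);
        TIME_OF_DAY.addToVector(timeOfDay, v);
        return v;
    }

    // Hypothetical helper: a single pass over three made-up rows.
    static OnlineLogisticRegression trainModel() {
        OnlineLogisticRegression model =
            new OnlineLogisticRegression(CATEGORIES, FEATURES, new L1())
                .learningRate(50);  // --rate 50

        // The first argument is the target category (hd_click): 0 or 1.
        model.train(1, encode(6.0, "US", "monday"));
        model.train(0, encode(1.0, "DE", "sunday"));
        model.train(1, encode(4.0, "US", "monday"));
        model.close();
        return model;
    }

    public static void main(String[] args) {
        OnlineLogisticRegression model = trainModel();
        // For a two-category model, classifyScalar returns the score of category 1.
        double score = model.classifyScalar(encode(6.0, "US", "monday"));
        System.out.println("score for hd_click = 1: " + score);
    }
}
```

With real data you would loop over the CSV rows and call encode and train once per row, adding one encoder per predictor column.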

To save the model to disk like the command line tool does, use org.apache.mahout.classifier.sgd.ModelSerializer, which has a straightforward API to write and read your model. (There are also write and readFields methods in the OLR class itself, but frankly, I'm not sure what they do or whether they differ from ModelSerializer; they're not documented either.)
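A minimal sketch of that round trip might look like the following. It assumes Mahout 0.8 with its Hadoop dependency on the classpath, since the model classes implement Hadoop's Writable. The class name SaveLoadSketch, the save and load helpers, the three-feature toy vector, and the file name (borrowed from the question) are illustrative. Whether the resulting file can be read by the CLI tools is not something this sketch assumes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.ModelSerializer;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class SaveLoadSketch {
    // Hypothetical helper: writes the model to a file at the given path.
    static void save(OnlineLogisticRegression model, String path) throws IOException {
        ModelSerializer.writeBinary(path, model);
    }

    // Hypothetical helper: reads a model back; the caller names the expected class.
    static OnlineLogisticRegression load(String path) throws IOException {
        try (InputStream in = new FileInputStream(path)) {
            return ModelSerializer.readBinary(in, OnlineLogisticRegression.class);
        }
    }

    public static void main(String[] args) throws IOException {
        // Toy model: 2 categories, 3 features, trained on a single made-up row.
        OnlineLogisticRegression model = new OnlineLogisticRegression(2, 3, new L1());
        Vector row = new DenseVector(new double[] {1.0, 0.5, 2.0});
        model.train(1, row);

        save(model, "myModel.model");
        OnlineLogisticRegression restored = load("myModel.model");
        System.out.println("restored score: " + restored.classifyScalar(row));
    }
}
```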

Lastly, aside from the source code in mahout-examples, here are two other examples of using the Mahout API directly that might be useful [3, 4].

Sources:

[0] http://repo1.maven.org/maven2/org/apache/mahout/mahout-examples/0.8/

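[1] org.apache.mahout.classifier.sgd.OnlineLogisticRegression

[2] org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression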

[3] http://mail-archives.apache.org/mod_mbox/mahout-user/201206.mbox/%3CCAJwFCa3X2fL_SRxT7f7v9uMjS3Tc9WrT7vuMQCVXyH71k0H0zQ@mail.gmail.com%3E

[4] http://skife.org/mahout/2013/02/14/first_steps_with_mahout.html
