Long running program in Google App Engine

Problem description

I have written a servlet in Java that reads lines from a file stored in Google Cloud Storage. I pass each line to the Prediction API, and once I get the prediction for that text, I append it to the original line and store the result in another file in Google Cloud Storage.

The source file is a CSV with more than 10,000 records. Because I parse each record individually, pass it to the Prediction API, and then write it back to Cloud Storage, the job takes a long time. App Engine has a request limit of 30 seconds, and task queues have limits too. Can anyone suggest some options? Restarting the program is not an option, since I won't be able to resume the predictions from where I stopped.

Here's my code:

@SuppressWarnings("serial")
public class PredictionWebAppServlet extends HttpServlet {

    private static final String APPLICATION_NAME = "span-test-app";

    static final String MODEL_ID = "span-senti";
    static final String STORAGE_DATA_LOCATION = "/bigdata/training_set/";
    private static HttpTransport httpTransport;
    private static final JsonFactory JSON_FACTORY = JacksonFactory
            .getDefaultInstance();

    public static final String INPUT_BUCKETNAME = "bigdata";
    public static final String INPUT_FILENAME = "abc.csv";

    public static final String OUTPUT_BUCKETNAME = "bigdata";
    public static final String OUTPUT_FILENAME = "def.csv";

    private static Credential authorize() throws Exception {

        Credential cr = new GoogleCredential.Builder()
                .setTransport(httpTransport)
                .setJsonFactory(JSON_FACTORY)
                .setServiceAccountId(
                        "878482284233-aacp8vd5297aqak7v5r0f507qr63mab4@developer.gserviceaccount.com")
                .setServiceAccountScopes(
                        Collections.singleton(PredictionScopes.PREDICTION))
                .setServiceAccountPrivateKeyFromP12File(
                        new File(
                                "28617ba6faac0a51eb2208edba85d2e20e6081b4-privatekey.p12"))
                .build();
        return cr;
    }



    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        try {
            httpTransport = GoogleNetHttpTransport.newTrustedTransport();
            Credential credential = authorize();

            Prediction prediction = new Prediction.Builder(httpTransport,
                    JSON_FACTORY, credential).setApplicationName(APPLICATION_NAME)
                    .build();


            GcsService gcsService = GcsServiceFactory.createGcsService();

            GcsFilename filename = new GcsFilename(INPUT_BUCKETNAME, INPUT_FILENAME);
            GcsFilename filename1 = new GcsFilename(OUTPUT_BUCKETNAME,
                    OUTPUT_FILENAME);
            GcsFileOptions options = new GcsFileOptions.Builder()
                    .mimeType("text/html").acl("public-read")
                    .addUserMetadata("myfield1", "my field value").build();


            GcsOutputChannel writeChannel = gcsService.createOrReplace(filename1, options);

            PrintWriter writer = new PrintWriter(Channels.newWriter(writeChannel,
                    "UTF8"));


            GcsInputChannel readChannel = null;
            BufferedReader reader = null;

            readChannel = gcsService.openReadChannel(filename, 0);
            reader = new BufferedReader(Channels.newReader(readChannel, "UTF8"));
            String line;
            String cvsSplitBy = ",";
            String temp_record = "";
            Input input = new Input();
            InputInput inputInput = new InputInput();


            while ((line = reader.readLine()) != null) {

                String[] post = line.split(cvsSplitBy);

                inputInput.setCsvInstance(Collections
                        .<Object> singletonList(post[1]));
                input.setInput(inputInput);

                Output output = prediction.trainedmodels()
                        .predict("878482284233", MODEL_ID, input).execute();
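                // Reset per line; otherwise every earlier record is repeated
                // in each output row. Assumes each row has at least 10 columns.
                temp_record = "";
                for (int i = 0; i < 10; i++) {
                    temp_record = temp_record + post[i] + ",";
                }
                temp_record = temp_record + output.getOutputLabel();


                writer.println(temp_record);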

            }

            writer.flush();
            writer.close();

            //resp.getWriter().println(temp_record);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        finally{

        }
    }
}

Solution

You are hinting at it yourself.

If you think your job can finish within 10 minutes, you can do it with task queues alone.
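
One way to stay inside a per-task time limit, and to address the worry about not being able to resume, is to process the file in batches and carry a line offset from one task to the next. The following is only a minimal sketch in plain Java under several assumptions: `BatchSketch`, `processBatch`, `BatchResult`, and the `labeler` function are hypothetical names, the `labeler` stands in for the Prediction API call, and column 1 is assumed to hold the text to classify, as in the question's code. Enqueuing the next batch as a task and writing each batch's rows to Cloud Storage are left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: processes one batch of CSV lines starting at a saved offset. */
public class BatchSketch {

    /** Output rows of one batch plus the line to resume from (-1 when the input is exhausted). */
    public static final class BatchResult {
        public final List<String> rows;
        public final int nextLine;

        BatchResult(List<String> rows, int nextLine) {
            this.rows = rows;
            this.nextLine = nextLine;
        }
    }

    public static BatchResult processBatch(BufferedReader reader, int startLine,
            int batchSize, Function<String, String> labeler) {
        try {
            // Skip the lines that earlier batches already handled.
            for (int i = 0; i < startLine; i++) {
                if (reader.readLine() == null) {
                    return new BatchResult(new ArrayList<String>(), -1);
                }
            }
            List<String> rows = new ArrayList<>();
            String line;
            while (rows.size() < batchSize && (line = reader.readLine()) != null) {
                String[] post = line.split(",");
                // Append the label for column 1 to the original line.
                rows.add(line + "," + labeler.apply(post[1]));
            }
            // A full batch means there may be more input; a short one means end of file.
            int next = rows.size() == batchSize ? startLine + batchSize : -1;
            return new BatchResult(rows, next);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "1,good movie\n2,bad movie\n3,fine movie\n";
        // Stand-in for the real prediction call.
        Function<String, String> fakeLabeler =
                text -> text.startsWith("bad") ? "negative" : "positive";
        int offset = 0;
        while (offset >= 0) {
            BatchResult r = processBatch(
                    new BufferedReader(new StringReader(csv)), offset, 2, fakeLabeler);
            for (String row : r.rows) {
                System.out.println(row);
            }
            offset = r.nextLine;
        }
    }
}
```

In a real setup each call to `processBatch` would presumably run in its own task, with `nextLine` passed along as a task parameter, so a failed task only repeats its own batch.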

If not, you will need to use a combination of task queues and backends: push the job onto a backend instance. Take a look at the documentation on push queues and backends.

UPDATE - with modules instead of backends

Backends are deprecated in favour of modules. A way to do it with modules is to:

  1. convert your app to modules structure
  2. define a module with manual scaling
  3. handle the "/_ah/start" url in that module
  4. execute all of your job in the "/_ah/start" handler

Manual scaling instances have no limit on how long they may run. You can run "forever" in the "/_ah/start" request if the instance has manual scaling. Hey, you can even start threads, if you like. But that should not be necessary for this job. Just run until done.
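
For step 2, a module's appengine-web.xml with manual scaling might look roughly like the fragment below. This is a sketch, not taken from the original answer: the application id `your-app-id` and the module name `worker` are placeholders, and a single instance is assumed.

```xml
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
  <!-- Placeholder application id and module name -->
  <application>your-app-id</application>
  <module>worker</module>
  <version>1</version>
  <threadsafe>true</threadsafe>
  <!-- Manual scaling: a fixed number of resident instances -->
  <manual-scaling>
    <instances>1</instances>
  </manual-scaling>
</appengine-web-app>
```

The servlet that handles "/_ah/start" would then be mapped in that module's web.xml like any other servlet.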
