JSoup "wrap" is not working as expected every time

Problem description

I have an HTML string that contains text, images, or map images. The HTML is generated dynamically. Wherever there is an <img> tag, it should be wrapped in a <center> tag. I use JSoup for this, and it works for static images and text.

But whenever I try to post a map, the HTML loses its structure. I do not understand what is happening. Both the normal images and the map images are <img> tags, so how can the same method give different outputs?

This is the method that does the job:

public String wrapImgWithCenter(String html){
    Document doc = Jsoup.parse(html);
    doc.select("img").wrap("<center></center>");
    return doc.html();
}

Original HTML with image and map images before wrapping:

<p dir="ltr"><img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-4de5af73-68f1-401d-9feb-1dfde1373cff-file" /></p> 
<p dir="ltr"><a href="15.2993265,74.123996"><img src="http://maps.google.com/maps/api/staticmap?center=15.2993265,74.123996&zoom=15&size=960x540&sensor=false&markers=color:blue%7Clabel:!%7C15.2993265,74.123996" /></a><br /><br /></p> 
<p dir="ltr"><a href="22.572646,88.363895,-25.274398,133.775136"><img src="http://maps.google.com/maps/api/staticmap?center=22.572646,88.363895&zoom=2&size=960x540&markers=22.572646,88.363895%7C-25.274398,133.775136&path=color:0xff0000ff%7Cweight:5%7C22.572646,88.363895%7C-25.274398,133.775136&sensor=false" /></a><br /> </p>

Result after wrapping:

<p dir="ltr"> </p>
<center> 
 <img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-01516245-c773-4765-b542-ebecb964b255-file" /> 
</center>
<br />
<br />
<p></p> 
<p dir="ltr"> </p>
<center> 
 <a href="15.2993265,74.123996"><img src="http://maps.google.com/maps/api/staticmap?center=15.2993265,74.123996&zoom=15&size=960x540&sensor=false&markers=color:blue%7Clabel:!%7C15.2993265,74.123996" /></a> 
</center>
<p></p> 
<p dir="ltr"> </p>
<center> 
 <a href="22.572646,88.363895,-25.274398,133.775136"><img src="http://maps.google.com/maps/api/staticmap?center=22.572646,88.363895&zoom=2&size=960x540&markers=22.572646,88.363895%7C-25.274398,133.775136&path=color:0xff0000ff%7Cweight:5%7C22.572646,88.363895%7C-25.274398,133.775136&sensor=false" /></a> 
</center>
<br /> 
<p></p>


For comparison,

Valid output with only images:

<html>
 <head></head>
 <body>
  <p dir="ltr">
   <center>
    <img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-959467a6-f83f-44c6-b6fc-88ba4f49d900-file" />
   </center><br /></p> 
  <p dir="ltr">
   <center>
    <img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-46c38c96-c3b5-402e-a0b4-03209adf5203-file" />
   </center><br /></p> 
  <p dir="ltr">
   <center>
    <img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-626ec909-c65e-452c-a341-61a361584eba-file" />
   </center><br /> </p> 
 </body>
</html>

Valid output with text and image:

<html>
 <head></head>
 <body>
  <p dir="ltr">text </p> 
  <p dir="ltr">
   <center>
    <img src="http://files.parsetfss.com/bcff7108-cbce-4ab8-b5d1-1f82827e6519/tfss-11343a01-7cd2-4f9e-9f9a-025ec3feb828-file" />
   </center><br /> </p> 
 </body>
</html>

++++++++++++++++++++++++++++++++++++

The methods in the class that are responsible for the above functionality:

private void createHtmlWeb(){

        String listOfElements = "null"; // normally found if
                                        // webTextcontains.maps.google.com
        Toast.makeText(getApplicationContext(), "" + mainEditText.getHeight(), Toast.LENGTH_SHORT).show();
        ParseObject postObject = new ParseObject("Post");
        Spannable s = mainEditText.getText();
        String webText = Html.toHtml(s);
        webText = webText.replaceAll("(</?(?:b|i|u)>)\\1+", "$1").replaceAll("</(b|i|u)><\\1>", "");
        // refactoring html
        webText = wrapImgWithCenter(webText);
        // Detect favourite-type links and wrap each one in an
        // <a class=favourite> element.
        if (webText.contains("a href")) {
            String favourite = "favourite";
            // Parse it into jsoup
            Document doc = Jsoup.parse(webText);
            // Copy the links into an array and handle each one individually,
            // since wrap can otherwise affect the whole body.
            Element[] array = new Element[doc.select("a").size()];

            for (int i = 0; i < doc.select("a").size(); i++) {
                if (doc.select("a").get(i) != null) {
                    array[i] = doc.select("a").get(i);
                }
            }

            for (int i = 0; i < array.length; i++) {
                // We don't want to wrap real links. What real links have in
                // common is "http". Should be updated to something more robust.
                if (array[i].toString().contains("http") == false) {
                    array[i] = array[i].wrap("<a class=" + favourite + "></a>");
                }

            }
            // Log.e("From doc.body html *************** ", " " + doc.body());
            Element element = doc.body();
            Log.e("From element html *************** ", " " + element.html());
            listOfElements = element.html();
        }

        // First check whether the HTML contains a Google Maps image
        if (webText.contains("maps.google.com")) {
            Document doc = Jsoup.parse(webText); // Parse it into jsoup

            for (int i = 0; i < doc.select("img").size(); i++) {
                if (doc.select("img").get(i).toString().contains("maps.google.com")) {
                    // Get all numbers + full stops + get all numbers
                    Pattern noImage = Pattern.compile("(\\-?\\d+(\\.\\d+)?),(\\-?\\d+(\\.\\d+))+%7C(\\-?\\d+(\\.\\d+)?),(\\-?\\d+(\\.\\d+))");
                    // Gets the URL SRC basically.. almost.. lets try it
                    Matcher matcherer = noImage.matcher(doc.select("img").get(i).toString());

                    // Have two options - multi route or single route
                    if (matcherer.find() == true) {
                        for (int j = 0; j < matcherer.groupCount(); j++) {
                            latitude_to = Double.parseDouble(matcherer.group(1));
                            longitude_to = Double.parseDouble(matcherer.group(3));
                            latitude_from = Double.parseDouble(matcherer.group(5));
                            longitude_from = Double.parseDouble(matcherer.group(7));
                        }

                        String coOrds = "" + latitude_to + "," + longitude_to + "," + latitude_from + "," + longitude_from;
                        Element ele = doc.body();
                        ele.select("img").get(i).wrap("<a href=" + coOrds + "></a>");
                        listOfElements = ele.html();
                        listOfElements = listOfElements.replace("&amp;", "&");

                    } else if (matcherer.find() == false) {
                        noImage = Pattern.compile("(\\-?\\d+(\\.\\d+)?),\\s*(\\-?\\d+(\\.\\d+)?)");
                        matcherer = noImage.matcher(doc.select("img").get(i).toString());

                        Toast.makeText(getApplicationContext(), "Regex Count:" + matcherer.groupCount(), Toast.LENGTH_LONG).show();
                        if (matcherer.find()) {
                            for (int j = 0; j < matcherer.groupCount(); j++) {
                                latitude = Double.parseDouble(matcherer.group(1));
                                parseGeoPoint.setLatitude(latitude);
                                longitude = Double.parseDouble(matcherer.group(3));
                                parseGeoPoint.setLongitude(longitude);
                            }
                        }

                        String coOrds = "" + latitude + "," + longitude;

                        Element ele = doc.body();
                        ele.select("img").get(i).wrap("<a href=" + coOrds + "></a>");
                        listOfElements = ele.html();
                        listOfElements = listOfElements.replace("&amp;", "&");

                    }

                } else {
                    // standard photo
                    Element ele = doc.body();
                    ele.select("img").get(i);
                    listOfElements = ele.html();

                }

            }
            // Put new value in htmlContent
            postObject.put("htmlContent", listOfElements);

        } else {
            postObject.put("htmlContent", webText);
        }

        mainEditText.getViewTreeObserver().addOnGlobalLayoutListener(new ViewTreeObserver.OnGlobalLayoutListener() {

            @Override
            public void onGlobalLayout(){
                // TODO Auto-generated method stub
                Rect r = new Rect();
                mainEditText.getWindowVisibleDisplayFrame(r);

                // int screenHeight = mainEditText.getRootView().getHeight();
                // int heightDifference = screenHeight - (r.bottom - r.top);
            }
        });

        // See if a trip exists
        if (finalTrip != null) {
        }

        // Want to put the location in the location section
        // if parsegeoPoint != null -- old information
        if (latitude != -10000 && longitude != -10000) {
            // Toast.makeText(getApplicationContext(),
            // "Adding in location co-ods: " + latitude + " : " + longitude ,
            // Toast.LENGTH_SHORT).show();
            postObject.put("location", parseGeoPoint);
        }
        postObject.put("type", Post.PostType.HTML.getPostVal());
        postObject.put("user", ParseObject.createWithoutData("_User", user.getObjectId()));

        // Transfer these details
        Intent i = new Intent(getApplicationContext(), WriteStoryAnimation.class);
        i.putExtra("listOfElements", listOfElements);
        i.putExtra("webText", webText);
        i.putExtra("finalTrip", finalTrip);
        i.putExtra("latitude", latitude);
        i.putExtra("longitude", longitude);

        if (mainEditText.length() > 0) {
            startActivity(i);
        } else {
            Toast.makeText(getApplicationContext(), "Your story is empty", Toast.LENGTH_SHORT).show();
        }

        // finish();
        // Toast.makeText(getApplicationContext(), "EditText Sie: " + height +
        // " : " + desiredHeight, Toast.LENGTH_LONG).show();

    }

    // method to refactor html
    public String wrapImgWithCenter(String html){
        Document doc = Jsoup.parse(html);
        // wrap every image in a center tag
        doc.select("img").wrap("<center></center>");
        // add a gap (two <br> tags) after the last p tag
        for (int i = 0; i <= 1; i++) {
            doc.select("p").last().after("<br>");
        }

        return doc.html();
    }
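The coordinate extraction buried in createHtmlWeb() can be tried in isolation. The following is only a sketch, not the code from the question: the class name MapCoords and the method extract() are made up for illustration, and the patterns are simplified versions of the two above, using non-capturing groups for the fractional part so the captured groups are simply 1..2 or 1..4. It calls find() once per pattern instead of once in the if and again in the else if.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original class.
final class MapCoords {
    // A signed decimal number; the fractional part is optional and non-capturing.
    private static final String NUM = "(-?\\d+(?:\\.\\d+)?)";

    // Route form: "lat,lng%7Clat,lng", as it appears in the markers parameter.
    private static final Pattern ROUTE =
            Pattern.compile(NUM + "," + NUM + "%7C" + NUM + "," + NUM);

    // Single-point form: "lat,lng".
    private static final Pattern SINGLE =
            Pattern.compile(NUM + ",\\s*" + NUM);

    /**
     * Returns four values for a route URL, two for a single-point URL,
     * or an empty Optional when no coordinate pair is found.
     */
    static Optional<double[]> extract(String src) {
        Matcher m = ROUTE.matcher(src);
        if (!m.find()) {
            // Fall back to the single-point pattern; find() runs once per pattern.
            m = SINGLE.matcher(src);
            if (!m.find()) {
                return Optional.empty();
            }
        }
        double[] coords = new double[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            coords[g - 1] = Double.parseDouble(m.group(g));
        }
        return Optional.of(coords);
    }
}
```

Called on the src attribute of a map image, MapCoords.extract(src) gives the values that the question's code stores in latitude/longitude (two values) or in latitude_to, longitude_to, latitude_from, longitude_from (four values).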

Solution

I have solved the issue. Fonkap was right in his comments that something else was altering my output. I just changed where wrapImgWithCenter() is called from: instead of calling it at the start of createHtmlWeb(), before the HTML is parsed again for links and maps, I now call it as the last step.

I changed the end of the createHtmlWeb() method to this:

            Log.e("listOfElements", listOfElements);
            // refactoring html
            listOfElements = wrapImgWithCenter(listOfElements);
            // Put new value in htmlContent
            postObject.put("htmlContent", listOfElements);

        } else {
            // refactoring html
            webText = wrapImgWithCenter(webText);
            postObject.put("htmlContent", webText);
        }

Now the output conforms to the requirements.
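Why the call order matters can be reproduced in isolation. This is a minimal sketch, assuming jsoup is on the classpath; the class name WrapOrderDemo and the sample markup are made up for illustration. Under the HTML parsing rules that jsoup follows, a <center> start tag implicitly closes an open <p>. So wrap() can put <center> inside <p> in the DOM, but once that markup is serialized and passed through Jsoup.parse() again, the <center> ends up next to the <p> rather than inside it, which looks like the broken output in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class, not part of the original code.
final class WrapOrderDemo {
    // Made-up sample input in the same shape as the HTML from the question.
    static final String SAMPLE = "<p dir=\"ltr\"><img src=\"a.png\"></p>";

    /** Wraps every <img> in <center> and returns the live document. */
    static Document wrapOnly(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("img").wrap("<center></center>");
        return doc;
    }

    /** Wraps, serializes the body, then parses the serialized string again. */
    static Document wrapThenReparse(String html) {
        String wrapped = wrapOnly(html).body().html();
        // The second parse sees "<p ...><center>..." as text and closes the
        // <p> as soon as it meets the <center> start tag.
        return Jsoup.parse(wrapped);
    }
}
```

Comparing wrapOnly(SAMPLE).select("p > center") with wrapThenReparse(SAMPLE).select("p > center") shows the difference: the first selection finds the wrapper inside the paragraph, the second finds nothing. Wrapping as the last step, as in the fix above, avoids the second parse.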
