
Converting Word to PDF and HTML to PDF (An Exploration)

Published: 2024-11-26 | Author: 热门IT资讯网 editors

Last updated: November 26, 2024.

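Tracking down the JAR dependencies was painful.

Which JAR provides the ITextRenderer and ITextFontResolver classes, and where can it be downloaded? I struggled with this for three hours.

Finally found it!
For the record, here is the URL:
http://www.java2s.com/Code/Jar/c/Downloadcorerendererr8pre2jar.htm
Here is the test code: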

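    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.OutputStream;

    import com.lowagie.text.pdf.BaseFont;
    import org.xhtmlrenderer.pdf.ITextFontResolver;
    import org.xhtmlrenderer.pdf.ITextRenderer;

    /**
     * Converts an HTML file to PDF.
     */
    public static boolean convertHtmlToPdf(String inputFile, String outputFile, String imagePath)
            throws Exception {
        // try-with-resources closes the output stream even if rendering fails
        try (OutputStream os = new FileOutputStream(outputFile)) {
            ITextRenderer renderer = new ITextRenderer();
            String url = new File(inputFile).toURI().toURL().toString();
            renderer.setDocument(url);

            // Chinese text support: register a font that contains CJK glyphs.
            // Note: simsunb.ttf is SimSun-ExtB; if common Chinese characters still
            // fail to render, simsun.ttc is likely the font to register instead.
            ITextFontResolver fontResolver = renderer.getFontResolver();
            fontResolver.addFont("C:/Windows/Fonts/simsunb.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

            // Resolve relative image paths against imagePath, e.g. D:/test
            renderer.getSharedContext().setBaseURL("file:/" + imagePath);

            renderer.layout();
            renderer.createPDF(os);
            os.flush();
        }
        return true;
    }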
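The method above builds the image base URL by concatenating "file:/" with a directory path. As a minimal sketch of an alternative, assuming nothing beyond the JDK, the base URL could be derived with File.toURI(), which takes care of escaping. The class name BaseUrlDemo and the helper toBaseUrl are hypothetical and not part of the original code.

```java
import java.io.File;

class BaseUrlDemo {

    /** Turns a directory path into a file: URL ending in "/", usable as a base for relative links. */
    static String toBaseUrl(String directoryPath) {
        // File.toURI() normalizes separators and escapes characters such as spaces (%20).
        String url = new File(directoryPath).toURI().toString();
        // toURI() only appends "/" when the directory already exists, so add it if missing.
        return url.endsWith("/") ? url : url + "/";
    }

    public static void main(String[] args) {
        System.out.println(toBaseUrl("D:/test"));
        System.out.println(toBaseUrl("/tmp/my images"));
    }
}
```

The result could then be passed to renderer.getSharedContext().setBaseURL(...) in place of the concatenated string. Whether this changes how the renderer resolves images in a given setup would need to be checked.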

Call it and off we go!

Here I combine it with the POI-based Word-to-HTML conversion from the previous post.
// Convert the .doc file to HTML
String tagPath = "D:\\red_ant_file\\20180915\\image\\";
String sourcePath = "D:\\red_ant_file\\20180915\\RedAnt的实验作业.doc";
String outPath = "D:\\red_ant_file\\20180915\\123.html";
try {
    AllServiceIsHere.docToHtml(tagPath, sourcePath, outPath);
} catch (Exception e) {
    e.printStackTrace();
}
```
// Convert the generated HTML to PDF
String pdfPath = "D:\\red_ant_file\\20180915\\456.pdf";
try {
    AllServiceIsHere.convertHtmlToPdf(outPath, pdfPath, tagPath);
} catch (Exception e) {
    e.printStackTrace();
}
```
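[Note]
When iText (through ITextRenderer) generates a PDF from HTML, it checks that the HTML is well-formed XML. HTML produced by POI, for example, leaves some tags without the closing " /", such as <meta ...> instead of <meta ... />.
In that case you will run into:
Can't load the XML resource (using TRaX transformer). org.xml.sax.SAXParseException; lineNumber: 23; columnNumber: 3; The element type "meta" must be terminated by the matching end-tag "</meta>".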
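
To see this failure in isolation, here is a minimal sketch that feeds two small snippets to the JDK's XML parser: one with an unclosed meta tag and one with the tag closed. It does not use iText or POI. The class WellFormedCheck and the method firstXmlError are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class WellFormedCheck {

    /** Returns null if the markup is well-formed XML, otherwise a description of the first error. */
    static String firstXmlError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unclosed = "<html><head><meta charset=\"UTF-8\"></head><body><p>hi</p></body></html>";
        String closed = "<html><head><meta charset=\"UTF-8\" /></head><body><p>hi</p></body></html>";
        System.out.println("unclosed meta: " + firstXmlError(unclosed));
        System.out.println("closed meta:   " + firstXmlError(closed));
    }
}
```

Running a check like this on the generated HTML before handing it to the renderer points at the offending line without going through the whole PDF step. Real files that declare a DOCTYPE may make the parser try to load the referenced DTD, which this sketch does not handle.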

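I tried the third-party library Jsoup: run the HTML through its parse method and, as far as I could tell, the markup came out well-formed.
This pitfall cost me an hour.

So I had to rewrite the Word-to-HTML code.
Another URL for the record, for downloading the Jsoup JAR:
https://jsoup.org/download
Here is the rewritten Word-to-HTML code: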

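    import java.io.BufferedWriter;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.nio.charset.StandardCharsets;
    import java.util.List;

    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    import org.apache.poi.hwpf.HWPFDocument;
    import org.apache.poi.hwpf.converter.PicturesManager;
    import org.apache.poi.hwpf.converter.WordToHtmlConverter;
    import org.apache.poi.hwpf.usermodel.Picture;
    import org.apache.poi.hwpf.usermodel.PictureType;
    import org.jsoup.Jsoup;
    import org.w3c.dom.Document;

    // Word to HTML
    public static void convert2Html(String fileName, String outPutFile) throws Exception {
        // Load the .doc file (original note: compatible with Word 2007 and later)
        HWPFDocument wordDocument = new HWPFDocument(new FileInputStream(fileName));
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Image references in the generated HTML point to test/<name>
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType, String suggestedName,
                    float widthInches, float heightInches) {
                return "test/" + suggestedName;
            }
        });
        wordToHtmlConverter.processDocument(wordDocument);

        // Save the embedded pictures to disk
        List<Picture> pics = wordDocument.getPicturesTable().getAllPictures();
        if (pics != null) {
            for (Picture pic : pics) {
                try (OutputStream picOut = new FileOutputStream("D:/test/" + pic.suggestFullFileName())) {
                    pic.writeImageContent(picOut);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

        // Serialize the converter's DOM to a UTF-8 HTML string
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "HTML");
        serializer.transform(domSource, streamResult);
        out.close();
        // Decode with the same charset the serializer wrote, not the platform default
        writeFile(new String(out.toByteArray(), StandardCharsets.UTF_8), outPutFile);
    }

    // Write the HTML file
    public static void writeFile(String content, String path) {
        // Let Jsoup normalize the markup
        org.jsoup.nodes.Document doc = Jsoup.parse(content);
        // Added here, not in the original post: recent Jsoup versions default to HTML
        // syntax, which leaves tags such as <meta> unclosed; XML syntax closes them.
        doc.outputSettings().syntax(org.jsoup.nodes.Document.OutputSettings.Syntax.xml);
        content = doc.html();
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            bw.write(content);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }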
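
The converter above serializes the DOM with the "HTML" output method. As far as I can tell, with the JDK's built-in transformer that method writes empty tags such as meta without a closing slash, while the "xml" method writes them self-closed. The sketch below compares the two on a tiny hand-built DOM; it uses only the JDK, not POI. The class SerializerMethodDemo and its helpers are hypothetical, and switching convert2Html to "xml" is an untested assumption, not something the original post did.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class SerializerMethodDemo {

    /** Serializes a DOM tree using the given output method ("xml" or "html"). */
    static String serialize(Document dom, String method) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.METHOD, method);
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a tiny stand-in for the converter's DOM: html > head > meta, plus html > body. */
    static Document sampleDom() {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = dom.createElement("html");
            Element head = dom.createElement("head");
            Element meta = dom.createElement("meta");
            meta.setAttribute("charset", "UTF-8");
            Element body = dom.createElement("body");
            body.setTextContent("hello");
            head.appendChild(meta);
            html.appendChild(head);
            html.appendChild(body);
            dom.appendChild(html);
            return dom;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document dom = sampleDom();
        System.out.println("html method: " + serialize(dom, "html"));
        System.out.println("xml method:  " + serialize(dom, "xml"));
    }
}
```

If the "xml" output turns out to be acceptable to the renderer, it might remove the need for a separate clean-up pass, though other issues (entities, DOCTYPE, namespaces) could still surface.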

Prepare a file and test it.

    String source = "D:\\red_ant_file\\20180915\\1303\\RedAnt的实验作业.doc";
    String out = "D:\\red_ant_file\\20180915\\1303\\789.html";
    try {
        AllServiceIsHere.convert2Html(source, out);
    } catch (Exception e) {
        e.printStackTrace();
    }

This is the result of the Word-to-HTML conversion once the markup has been normalized.

Next, HTML to PDF.

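[Afterword]

I did eventually get this approach to PDF conversion working.
In practice, though, it runs into all sorts of odd pitfalls, so I do not recommend it.
The reason is that HTML rules and authoring styles keep changing, and the HTML-to-PDF step will keep failing later with all kinds of tag errors.
I am posting this code only so that my experiment has a definite outcome, and perhaps so that it can be optimized further until a suitable solution is found.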
