Is there a way to count the number of characters per word for a string, returning values separated by a comma?


Question


I have a list of strings in cells - 1000s of them - and I need to work out the characters per word but separated by word - preferably in 1 swift formula...

For example:

  1. "Black Cup With Handle" > Formula I need > 5,3,4,6
  2. "Giant Bear Statue" > Formula I need > 5,4,6

I need this for a recurring task which has been macro'd in a very inefficient way to count words into columns (of which we need to use up to 20, just in case), but this needs to be tackled.

Usually, we count the spaces and layer nested SEARCH() formulas that piggyback on one another to break down the structure, then character-count the individual words...

Alternatively, I could use the macro to substitute the spaces for commas and use Text to Columns, but that still leaves me with a prolonged counting process for what I'm looking for.

We obviously use =LEN(A1)-LEN(SUBSTITUTE(A1," ","")) to count the spaces in the string.

We currently then use the =SEARCH() function combined with =MID() functions (and some bizarre numbers) to pull each word into its own individual cell,

then =LEN() once again, but on all the individual words - very long-winded.

I'm hoping to find a shorter way to do this, but I have a feeling there may not be a dynamic enough way to do it with a formula alone. Hoping someone can prove me wrong!

Solution

You'll have different options depending on your Excel version.


OPTION 1: TEXTJOIN

I think you are looking for the TEXTJOIN function. Just bear in mind that you can only use it in later versions of Excel (see link to documentation). It could work like this:

Formula in B1:

=TEXTJOIN(",",TRUE,LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")))

NOTE: It's an array formula, so you need to enter it using Ctrl+Shift+Enter.

To make it so that you won't need to use the above key-combo, we can include an INDEX:

=TEXTJOIN(",",TRUE,INDEX(LEN(FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s")),))
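For anyone who wants to sanity-check the expected results outside Excel, here is a minimal Python sketch of what the formula is meant to compute: split on spaces, take each word's length, join with commas. The function name `word_lengths` is made up for illustration, and the sketch assumes words are separated by single spaces.

```python
def word_lengths(text: str, delimiter: str = ",") -> str:
    """Return the length of each space-separated word, joined by `delimiter`."""
    # Assumption: words are separated by exactly one space, as in the question.
    words = text.split(" ")
    # One length per word, converted to text so the values can be joined.
    return delimiter.join(str(len(word)) for word in words)


print(word_lengths("Black Cup With Handle"))  # 5,3,4,6
print(word_lengths("Giant Bear Statue"))      # 5,4,6
```

This is only a cross-check of the logic, not a replacement for the worksheet formula.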


Additional Information:

FILTERXML

This function takes (as per documentation) two required arguments:

  • A string in valid XML
  • A string in valid XPath

Because we want to return an array of elements (words) from the cell, we need to SUBSTITUTE each space with an end-tag followed by a start-tag (</s><s>), then concatenate the start-tags (<t><s>) at the start of the string and the matching end-tags (</s></t>) at the end.

I'll have to rely on an XML explanation of the tags as to why <?><?> works and what it means, because as far as my testing goes I could swap the letters around or replace them with other letters and get the same results, as long as the final XPath uses the same character. It would be great if someone could complement this answer with a better explanation on this matter.
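The same trick can be reproduced outside Excel, which may help show why the tag letters do not matter. Below is a rough Python sketch using the standard library's `xml.etree.ElementTree`; the function name `split_via_xml` and its parameters are hypothetical. Two differences from the formula are assumptions on my part: the text is escaped first (the Excel formula does not do this, so characters such as & or < in a cell would likely make the XML invalid), and ElementTree needs the relative path `.//s` where FILTERXML is given `//s`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def split_via_xml(text, root_tag="t", item_tag="s"):
    """Split `text` on spaces by wrapping every word in an XML element."""
    item_open, item_close = f"<{item_tag}>", f"</{item_tag}>"
    # Each space becomes "close the current word, open the next one".
    body = escape(text).replace(" ", item_close + item_open)
    xml = f"<{root_tag}>{item_open}{body}{item_close}</{root_tag}>"
    root = ET.fromstring(xml)
    # Select every item element, mirroring the "//s" XPath in the formula.
    return [node.text or "" for node in root.findall(f".//{item_tag}")]


# The element names are arbitrary as long as the path uses the same name.
print(split_via_xml("Black Cup With Handle"))
print(split_via_xml("Black Cup With Handle", root_tag="x", item_tag="y"))
```

Both calls return the same list of words, because XML element names carry no built-in meaning; they only have to be well-formed and match the path used to select them.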

For more FILTERXML "tricks", have a look here


TEXTJOIN

If you are an Office 365 subscriber or own Excel 2019, you can make use of this function. There are (as per documentation) at least 3 required arguments:

  • A delimiter which must be a text string, either empty, or one or more characters enclosed by double quotes, or a reference to a valid text string. If a number is supplied, it will be treated as text.
  • The second argument can hold either TRUE or FALSE and determines whether or not you want to exclude/include empty values
  • The third argument is the text item to be joined. A text string, or array of strings, such as a range of cells.

Now this is where we can join the two functions together, FILTERXML returning an array which we can use in TEXTJOIN.
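To make the three arguments concrete, here is a small Python sketch that mimics the behaviour described above. It is an approximation under stated assumptions, not Excel's implementation: `text_join` is a made-up name, and empty cells are modelled as `None` or `""`.

```python
def text_join(delimiter, ignore_empty, items):
    """Approximate TEXTJOIN(delimiter, ignore_empty, items) for a flat list."""
    # Everything is treated as text; None stands in for an empty cell.
    texts = ["" if item is None else str(item) for item in items]
    if ignore_empty:
        # Second argument TRUE: drop empty values before joining.
        texts = [t for t in texts if t != ""]
    # A numeric delimiter is treated as text, as described above.
    return str(delimiter).join(texts)


print(text_join(",", True, [5, 3, 4, 6]))     # 5,3,4,6
print(text_join(",", True, ["a", "", "b"]))   # a,b
print(text_join(",", False, ["a", "", "b"]))  # a,,b
```

In the worksheet formula, the list of word lengths produced by LEN(FILTERXML(...)) plays the role of `items`.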


INDEX + LEN

I'll have to explain the use of these functions together. I don't think LEN and INDEX need much of an introduction on their own, but together they work quite nicely. Natively there is a mechanism called implicit intersection that prevents LEN from returning an array of values when you pass an array of values to the function, in this case through our FILTERXML.

Normally you would disable this mechanism using the key combination Ctrl+Shift+Enter, better known as CSE.

Now what INDEX does is disable this implicit intersection, making LEN able to return an array and removing the need to CSE the formula. INDEX is one of the functions that has this "power". A more in-depth explanation of implicit intersection can be found here


OPTION 2: UDF

Without access to TEXTJOIN, I think you'll need to have a look at using a UDF, possibly looking like below:

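Function TEXTJOIN(rng As Range) As String
    ' Build the formula LEN({"word1","word2",...}) from the cell text, evaluate it
    ' as an array, then join the resulting word lengths with commas.
    TEXTJOIN = Join(Application.Evaluate("LEN({""" & Join(Split(rng, " "), """,""") & """})"), ",")
End Function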

You can call this in B1 like so: =TEXTJOIN(A1)
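The doubled quotes in the UDF make it hard to see which string actually reaches Application.Evaluate. As a reading aid, this Python sketch builds the same text for a given cell value; `build_len_formula` is a hypothetical helper name and the sketch only mirrors the string concatenation, it does not evaluate anything.

```python
def build_len_formula(cell_text: str) -> str:
    """Build the formula text that the UDF hands to Application.Evaluate."""
    # Split(rng, " "): break the cell text into words on single spaces.
    words = cell_text.split(" ")
    # Join(..., """,""") puts "," between the words; the outer pieces add
    # the opening LEN({" and the closing "}).
    return 'LEN({"' + '","'.join(words) + '"})'


print(build_len_formula("Black Cup With Handle"))
# LEN({"Black","Cup","With","Handle"})
```

One consequence of this construction is that a word containing a double quote would produce a malformed formula string, so the UDF is best suited to plain text like the examples in the question.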


Additional Information:

The UDF consists of three main mechanisms that work together:

JOIN

This function takes two parameters, where the first one is required:

  • First parameter is a one-dimensional array containing substrings
  • The second (optional) parameter is a string character used to separate the substrings in the returned string. If omitted, the space character (" ") is used. If delimiter is a zero-length string (""), all items in the list are concatenated with no delimiters.

The function returns a string value


SPLIT

This function takes a string and delimits it by a specified character/substring. It takes the following arguments:

  • 1st: A required string expression containing substrings and delimiters. If expression is a zero-length string(""), Split returns an empty array, that is, an array with no elements and no data.
  • 2nd: The optional delimiter which is a string character used to identify substring limits. If omitted, the space character (" ") is assumed to be the delimiter. If delimiter is a zero-length string, a single-element array containing the entire expression string is returned.
  • 3rd: An optional limit, a number of substrings to be returned; -1 indicates that all substrings are returned.
  • 4th: Compare, also optional, is a numeric value indicating the kind of comparison to use when evaluating substrings. See Settings section for values.

In this case we would only need the first two arguments.
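The parameter rules for Join and Split listed above can be mimicked in a few lines of Python, which may make the edge cases easier to see. `vba_split` and `vba_join` are made-up names, the optional Compare argument is left out, and the limit is assumed to be either -1 or a positive number.

```python
def vba_split(expression, delimiter=" ", limit=-1):
    """Mimic the Split rules described above (Compare argument omitted)."""
    if expression == "":
        return []                      # zero-length input: empty array
    if delimiter == "":
        return [expression]            # zero-length delimiter: one element
    if limit == -1:
        return expression.split(delimiter)
    # `limit` is the number of substrings, so allow limit - 1 splits.
    return expression.split(delimiter, limit - 1)


def vba_join(items, delimiter=" "):
    """Mimic the Join rules described above; the default delimiter is a space."""
    return delimiter.join(str(item) for item in items)


words = vba_split("Giant Bear Statue")
print(words)                                      # ['Giant', 'Bear', 'Statue']
print(vba_join([len(w) for w in words], ","))     # 5,4,6
```

The last two lines follow the same split, measure, join sequence as the UDF, just without the Evaluate step in the middle.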


Application.Evaluate

This is IMO one of the handiest mechanisms you can use to pull off a returned array of values without having to loop through items/cells. It may get slow when you feed the function a large array formula, but in this case it will be fine. The function converts a Microsoft Excel name into an object or value, and when we pass it a formula, it returns the results. In this particular case it will return an array.
